Sequence Determinants of Pri-miRNA Processing
by
Vincent C. Auyeung
B.S., Biology, California Institute of Technology, 2005

SUBMITTED TO THE DEPARTMENT OF BIOLOGY IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY AT THE MASSACHUSETTS INSTITUTE OF TECHNOLOGY
JUNE 2012

© 2012 Massachusetts Institute of Technology. All rights reserved.

Signature of Author: Vincent C. Auyeung, Department of Biology, May 21, 2012
Certified by: David P. Bartel, Professor of Biology, Thesis Supervisor
Accepted by: Robert T. Sauer, Professor of Biology, Chair, Biology Graduate Committee

Submitted to the Department of Biology on May 21, 2012 in partial fulfillment of the requirements for the degree of Doctor of Philosophy

MicroRNAs (miRNAs) are short RNAs that regulate many processes in physiology and pathology by guiding the repression of target messenger RNAs. For classification purposes, miRNAs are defined as ~22 nt RNAs that are produced by the cleavage of endogenously transcribed hairpins. From a cellular perspective, however, miRNAs are the functional products of a multistep maturation pathway, and are thus defined by the ability of their precursors to enter this pathway. The cellular distinction between miRNA precursors and other hairpins is made in the first step of maturation, when the primary miRNA transcript (pri-miRNA) is cleaved by the Microprocessor, a complex containing Drosha, an RNase III enzyme, and an RNA-binding partner DGCR8. However, it is unclear how the Microprocessor distinguishes between these hairpins and authentic pri-miRNAs. In fact, C. elegans pri-miRNAs are not processed in human cells, illustrating the complexity of pri-miRNA recognition and processing. 
To systematically explore sequence determinants of pri-miRNA recognition, hundreds of billions of variants of human pri-miRNAs were generated, and millions of variants that were functional Microprocessor substrates were selected in vitro and sequenced. Analysis of the successful sequences revealed multiple determinants of pri-miRNA binding and cleavage, including hairpin secondary structure and primary sequence preferences in the terminal loop and flanking the hairpin. One of these determinants, a CNNC motif downstream of the Drosha cleavage site, is enriched in pri-miRNAs throughout bilaterian animals. Addition of the primary sequence motifs to C. elegans pri-miRNAs promoted their efficient processing in human cells, underscoring the importance of these determinants. The identification and characterization of specific motifs greatly expands the understanding of the features that cells use to recognize pri-miRNAs, and opens the door to future studies of pri-miRNA recognition in humans and other bilaterian animals. In addition, the approach is applicable to the exploration of a variety of functional RNA elements that have so far resisted functional dissection, including long noncoding RNAs and messenger RNA localization signals.

Thesis Advisor: David P. Bartel
Title: Professor

Acknowledgements

Many individuals have contributed to the work described here, and to my professional and personal development. Of course, none of this would have been possible without the support and guidance of Dave Bartel. I have admired and benefited from his willingness to pursue any approach and master any technique, as long as it moves us closer to answering interesting scientific questions. Beyond that, Dave is an excellent personal and professional role model. One of the greatest things about Dave has been his ability to recruit a group of fantastic scientists to work in his lab. 
The atmosphere of the lab is incredibly open, with people regularly speaking to each other throughout the day to exchange ideas and advice. I have also benefited from members’ backgrounds in diverse disciplines, including developmental biology, cancer biology, computational biology, plant biology, biochemistry, and genetics. I am inspired by the lab members’ ability to creatively combine different experimental approaches and ways of thinking to address a variety of scientific problems. The work described in Chapter 2 relied on the critical contributions of David Shechner and Igor Ulitsky. The circularized-substrate cleavage selection was born of a nighttime brainstorming session with David Shechner, and was just one of his ideas among many good ones. Igor Ulitsky performed most of the conservation analysis described in Chapter 2. I am always amazed by his ability to quickly grasp the biological question and apply his vast computational expertise to finding the answer, all the while maintaining a great sense of humor. I am grateful to the people who listened to my constant stream of stupid ideas and considered them critically: Calvin Jan, J. Graham Ruby, Olivia Rissland, David Weinberg, and Igor Ulitsky. Our conversations have taught me to take idle musings, strengthen their intellectual foundations, and operationalize them into productive experiments. I have particularly benefited from the wisdom of Calvin Jan; without Calvin, I would have wandered around the wildernesses of science much longer than I did. Over the years I have also gained from mentors and role models in my research and medical careers. My thesis committee, Phillip Sharp and Uttam RajBhandary, have been with me nearly every step of my research training. On the medical side, I have benefited from the advice and guidance of Richard Mitchell. On a personal level, many people have helped keep me sane over the past few years. 
My baymates Ines Anna Drinnenberg and David Garcia have helped maintain an “atmosphere” in the bay to make spending hours at the bench that much more palatable. I’ve shared good times in lab and out of lab with Laura Resteghini, Sue-Jean Hong, Huili Guo, Stephen Eichhorn, Igor Ulitsky, Alena Shkumatava, Christine Mayr, Andrew Grimson, Calvin Jan, Olivia Rissland, David Shechner, J. Graham Ruby, and Noah Spies. My other Boston friends have kept things in perspective for me while I was immersed in research; Evgeniy Kreydin, Xavier Rios, and Takahiro Soda deserve special thanks. Others have kept me physically active, like the many graduate students and postdocs who played for the Biograds intramural tennis team, which I had the privilege of organizing. Our consistent losing record never stopped us from loving the game. David Garcia, Calvin Jan, James Patridge, Dave Kenezevic, and Eveline Stein have cycled thousands of miles with me, giving me a chance to see a slice of Massachusetts up close and to get some vitamin D in the process. No set of acknowledgements would be complete without mentioning the support and patience of my parents, Marianna and Michael, my brother William, and my sister Wendy. And, most importantly, Joanne. Thank you for everything.

Table of Contents

Chapter 1. What defines a miRNA?
Chapter 2. Beyond secondary structure: primary-sequence determinants license pri-miRNA hairpins for processing
Chapter 3. Future directions
Appendix 1. Experimental protocols
Appendix 2. Statistical methods
Appendix 3. Mammalian microRNAs: experimental evaluation of novel and previously annotated genes

Chapter 1. What defines a miRNA?

Contents

Introduction
Understanding the cellular definition of an animal miRNA
The biogenesis of miRNAs
Known determinants of pri-miRNA processing
    General preferences of the Microprocessor
    Regulation of cleavage in subsets of animal pri-miRNAs
    Plant pri-miRNA processing: DCL1
Determinants of canonical biogenesis downstream of the Microprocessor
    Specificity of nuclear export mediated by exportin-5
    Specificity of cleavage by Dicer
    Specificity of loading into Argonaute
    Regulation of biogenesis in subsets of animal pre-miRNAs
Finding additional biogenesis determinants in pri-miRNAs
Substrate specificity in RNase III family proteins
    Eubacterial RNase III
    Yeast RNase III: Rnt1p and Pac1
An exhaustive, quantitative approach to defining pri-miRNAs

Introduction

The microRNA (miRNA) field began with the cloning of the nematode gene lin-4 and the realization that it formed Watson–Crick base pairs with the 3′ untranslated region of the lin-14 messenger RNA (mRNA). Both genes had previously been identified as key regulators of developmental timing in the nematode Caenorhabditis elegans, although the molecular mechanism linking the two genes was unknown. Extensive effort to clone and sequence these two genes revealed that lin-4 was a tiny RNA that did not encode a protein (Lee et al., 1993). The realization that lin-4 was complementary to portions of the lin-14 mRNA led to the hypothesis that the genetic relationship between lin-4 and lin-14 was mediated by the physical relationship between regulator RNA and target mRNA (Lee et al., 1993; Wightman et al., 1993). This type of interaction had never been described in animals, and even among prokaryotic regulatory RNAs lin-4 was exceptional: four times smaller than any other noncoding regulatory RNA known at that time (Ruvkun et al., 2004). Still, this regulatory scheme seemed likely to be idiosyncratic (Ruvkun et al., 2004); both examples were from one pathway from one rapidly evolving nematode. As it turns out, lin-4 is the founding member of a much larger class of regulatory RNAs in animals. Several years later, a second small regulatory RNA, let-7, was identified and shown to regulate lin-14 and other developmental timing genes by binding to their 3′ untranslated regions (Reinhart et al., 2000), and was conserved across bilaterian animals (Pasquinelli et al., 2000). 
The initial trickle of small RNA discovery became a torrent when three groups described the existence of a large number of tiny RNAs, ranging from 21 to 24 nucleotides (nt) long (Lagos-Quintana et al., 2001; Lau et al., 2001; Lee and Ambros, 2001). These RNAs were found in multiple bilaterian animal species (Drosophila melanogaster, Caenorhabditis elegans, and humans); were both diverse and individually abundant; and were often conserved between all three species, spanning hundreds of millions of years of evolution. In recognition of their small size, these RNAs were called “microRNAs” (miRNAs). In the past decade, both the number of miRNAs and the catalog of their biological functions have blossomed. Extensive miRNA discovery efforts have identified hundreds of miRNA families in animals and plants, and each family can have many individual miRNA members in each species (Bartel, 2004). For perspective, miRNAs account for over 2% of predicted mammalian genes, and the number of annotated human miRNA families in miRBase (Griffiths-Jones et al., 2006) is comparable to the number of human protein tyrosine kinases annotated by the Gene Ontology Consortium (Ashburner et al., 2000). Like lin-4, miRNAs recognize their targets by base pairing to sites in the mRNA; in plants, the target pairing occurs throughout the length of the miRNA, while in animals target pairing to ~6–8 nt at the 5′ end of the miRNA, termed “seed” pairing, is nearly always necessary and often sufficient for repression (Bartel, 2009). Supplemental elements also contribute to target site efficacy in animals, including the local nucleotide content, position of the target site in the mRNA 3′ untranslated region (3′ UTR), thermodynamic stability of seed pairing, the abundance of target sites in the cell, and proximity of the target site to other miRNA sites (Doench and Sharp, 2004; Grimson et al., 2007; Saetrom et al., 2007; Ui-Tei et al., 2008; Arvey et al., 2010; Garcia et al., 2011). 
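
To make the notion of seed pairing concrete, the following minimal Python sketch scans a 3′ UTR for perfect Watson–Crick matches to a miRNA seed. It is an illustration under stated assumptions rather than a target-prediction method: the seed is taken to be miRNA nucleotides 2–7 (a common convention that is assumed here; the text above says only ~6–8 nt at the 5′ end), the function names `reverse_complement` and `seed_sites` are hypothetical, the let-7a sequence is supplied for demonstration, and the UTR fragment is invented. None of the supplemental elements listed above are modeled.

```python
# Illustrative sketch only: locate candidate seed-match sites in a 3' UTR.
# Assumption (not from the text): the seed is miRNA nucleotides 2-7, and
# only perfect Watson-Crick matches are reported.

COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence (A, C, G, U only)."""
    return rna.translate(COMPLEMENT)[::-1]


def seed_sites(mirna: str, utr: str, start: int = 2, end: int = 7) -> list[int]:
    """Return 0-based UTR positions where a site pairing to the seed begins.

    start and end are 1-based, inclusive miRNA positions from the 5' end.
    """
    seed = mirna[start - 1:end]
    site = reverse_complement(seed)  # mRNA sequence that pairs with the seed
    return [i for i in range(len(utr) - len(site) + 1) if utr.startswith(site, i)]


# let-7a sequence and an invented UTR fragment, used purely for demonstration
LET7A = "UGAGGUAGUAGGUUGUAUAGUU"
TOY_UTR = "AAACUACCUCAGGG"

print(seed_sites(LET7A, TOY_UTR))         # sites matching miRNA positions 2-7
print(seed_sites(LET7A, TOY_UTR, end=8))  # sites matching miRNA positions 2-8
```

Extending `end` from 7 to 8 requires one additional paired nucleotide, so the longer site begins one position further upstream in the UTR.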
The mechanism by which miRNAs repress target mRNAs varies between plants and animals, in accordance with the targeting mechanism. In plants, this extensive pairing guides cleavage of the target mRNA (Llave et al., 2002; Tang et al., 2003; Jones-Rhoades et al., 2006). Although animal miRNAs also guide cleavage of some targets, their limited pairing with most miRNA targets is insufficient to support target-site cleavage (Yekta et al., 2004; Shin et al., 2010). Instead, miRNA targeting results in mRNA destabilization and/or inhibition of translation. Which mode of target gene repression predominates depends on time; recent transcriptome- and proteome-wide studies indicate that steady-state repressive effects are mostly explained by mRNA destabilization (Baek et al., 2008; Hendrickson et al., 2009; Guo et al., 2010), while similar studies in the fish embryo show that translational repression can dominate for a brief period immediately after induction of miRNA expression (Bazzini et al., 2012). Regardless of mechanism, the magnitude of effects is generally modest but nevertheless significant. Despite the subtlety of their effect, the animal miRNAs are critical regulators of the transcriptome. After all, each miRNA can have hundreds of conserved targets, and >60% of mammalian mRNAs have been under selective pressure to maintain at least one target site (Friedman et al., 2009), while other mRNAs have been under selective pressure to avoid targeting by coexpressed miRNAs (Farh et al., 2005; Stark et al., 2005). As a class, the miRNAs are essential for normal mammalian development, since mutations that ablate miRNA biogenesis are lethal in mammals (Bernstein et al., 2003; Babiarz et al., 2008). Individual miRNAs have been implicated in a spectrum of biological processes. A particular theme has been spatiotemporal control in development, consistent with the roles of lin-4 and let-7 in developmental timing in C. elegans. 
For example, the mammalian miRNA miR-155 regulates the differentiation of helper T cells in the immune system, and loss of miR-155 impairs the formation of germinal centers, which are important for effective antibody responses (Thai et al., 2007). By contrast, sustained overexpression of this miRNA perturbs the early differentiation of hematopoietic cells, and ultimately causes the inappropriate proliferation of the myeloid cell compartment in mice (O'Connell et al., 2008). In human cancer, miRNAs are often located at or near sites of genomic damage and have reduced levels, suggesting that the disruption of miRNA regulation is a common feature of cancer (Calin et al., 2004b; Lu et al., 2005). Consistent with this view, a general reduction in miRNA levels by inhibition of biogenesis promotes oncogenic transformation in mice (Kumar et al., 2007). In fact, the disruption of a single miRNA–target relationship between let-7 and the oncogene HMGA2 is enough to promote oncogenic transformation, and disruptions of this relationship occur frequently in human cancers (Mayr et al., 2007). More broadly, miRNAs comprise just one class of molecule in a larger paradigm of biological regulation by small noncoding RNAs. This paradigm emerged from a collection of mysterious observations in plants, animals, and fungi. In plants and fungi, separate efforts to overexpress genes unexpectedly caused silencing of both the exogenously introduced gene and endogenous genes with the same sequence (Napoli et al., 1990; van der Krol et al., 1990; Romano and Macino, 1992). In animals, antisense nucleic acids had been used to inhibit endogenous gene expression, presumably by forming duplexes with the target mRNA (Izant and Weintraub, 1984), although the method was curiously successful when either sense or antisense RNAs were injected (Fire et al., 1991; Guo and Kemphues, 1995). It was later discovered that the efficacy of inhibition could be enhanced over 100-fold in C. 
elegans when both sense and antisense RNAs were injected (Fire et al., 1998). These observations, disparate in method, goals, and even phylogenetic kingdom, likely had a single commonality: the intentional or unintentional introduction of double-stranded RNA (dsRNA) (Montgomery and Fire, 1998). In plants, induction of post-transcriptional gene silencing by dsRNA caused the accumulation of smaller, ~25 nt RNA fragments (Hamilton and Baulcombe, 1999). In animals, the dsRNA was later shown to be processed into small, 21–22 nt fragments (Hammond et al., 2000; Zamore et al., 2000), and these fragments are the active species that mediate silencing (Elbashir et al., 2001a; Elbashir et al., 2001b). Since then, thousands of studies have used artificial dsRNA or the small active species to silence genes of interest, a technique called RNA interference (RNAi). Yet the use of RNAi as a tool belies the importance of the many forms of endogenous silencing, each mediated by distinct small RNAs (Ketting, 2011). With the exception of PIWI-interacting RNAs, the various small RNAs are derived from paired RNA, including transcribed hairpins with long stems, dual sense and antisense transcripts from a genomic locus, and duplexes synthesized by RNA-dependent RNA polymerases. They have a variety of evolutionarily conserved biological functions, including the regulation of gene expression by degrading messages, repressing translation, or modifying chromatin; and the defense against viruses and other invasive genetic elements by cleaving gene products or the genomes themselves. These small RNAs and their diverse biological functions are interconnected by a web of related biogenesis and effector mechanisms, including those mediated by the RNase III and Argonaute protein families, leading to the view that RNAi is ancient and pervasive. It is ironic that the herald of this paradigm, lin-4, was once thought to be an oddity of nematode development. 
Understanding the cellular definition of an animal miRNA

To study the common properties of miRNAs, it is crucial to distinguish those RNAs that belong to the miRNA class from others. Accordingly, a set of criteria was adopted for classifying small RNAs as miRNAs (Ambros et al., 2003). One set of criteria relates to size and expression: miRNAs should be ~22 nt RNAs, and thus detectable in cellular RNA by methods such as small RNA blotting or cDNA sequencing (Ambros et al., 2003). The second relates to origin: miRNAs should be derived from the stem region of relatively regular hairpins, without large internal loops or bulges; ideally, the pairing in the hairpin should be conserved, and the hairpin should be cleaved by an RNase III enzyme called Dicer (discussed below) (Ambros et al., 2003). Although these criteria are useful for human minds to classify certain small RNAs as miRNAs, they do not answer an important question: what is a miRNA to the cell? Since miRNAs are derived from precursor RNAs much longer than the mature miRNA, cells must somehow recognize certain RNA species as miRNA precursors, as distinct from precursors of other RNA species, and process the miRNA precursors accordingly.

[Figure 1 schematic: the pri-miRNA, the pre-miRNA, and later intermediates, annotated with structural determinants at each step: recognition, binding, and cleavage by the Microprocessor; nuclear-cytosolic export by exportin-5; cleavage by Dicer; and Argonaute loading by the RISC loading complex.]

Thus, from the cellular perspective, miRNAs are defined by the ability of their precursors to enter a specific biogenesis pathway; the miRNAs themselves are simply the functional products of this pathway. To understand the cellular definition of an animal miRNA, we must therefore consider their biogenesis and the specificity of each step in the pathway for particular RNAs (Figure 1).

The biogenesis of miRNAs

In the canonical pathway of miRNA biogenesis, primary miRNA transcripts (pri-miRNAs) are synthesized by RNA polymerase II (Lee et al., 2004a) as noncoding transcripts, or as embedded sequences within introns of protein-coding “host” genes. While still in the nucleus, the pri-miRNA is cleaved (Lee et al., 2002). This cleavage is carried out by the “Microprocessor,” a large protein complex composed of Drosha, an RNase III enzyme, and a protein cofactor DGCR8, called Pasha and Psh-1 in Drosophila melanogaster and C. elegans, respectively (Lee et al., 2003; Denli et al., 2004; Gregory et al., 2004; Han et al., 2004; Landthaler et al., 2004). DGCR8 is thought to recognize the junction between the miRNA hairpin and flanking single-stranded RNA, positioning Drosha to cleave approximately one helical turn above the junction (Han et al., 2006; Yeom et al., 2006). The resulting hairpin is termed the precursor miRNA (pre-miRNA), and consists of a ~2-turn stem with a characteristic 2 nt 3′ overhang. This distinctive hairpin is exported from the nucleus to the cytosol by exportin-5 (Yi et al., 2003; Bohnsack et al., 2004; Lund et al., 2004); in species where there is no exportin-5 ortholog, the pre-miRNA presumably makes use of exportin-t instead (Murphy et al., 2008). In the cytosol, the pre-miRNA is cleaved by a complex of proteins containing another RNase III enzyme called Dicer (Grishok et al., 2001; Hutvagner et al., 2001; Ketting et al., 2001). 
For most miRNAs, one strand is preferentially loaded into an Argonaute family member based on the thermodynamic stability of the Dicer product (Khvorova et al., 2003; Schwarz et al., 2003). The mature miRNA strand and its Argonaute protein partner form the core of the silencing complex (Liu et al., 2004; Meister et al., 2004).

Figure 1. Summary of the biogenesis of miRNAs and determinants in intermediate RNA species. For each intermediate along the path to maturity, determinants are shown that promote the processing of that intermediate.

The vast majority of annotated miRNAs mature through the canonical pathway, based on their dependencies on DGCR8/Pasha and Dicer (Calabrese et al., 2007; Wang et al., 2007; Babiarz et al., 2008). However, several miRNAs make use of alternative pathways which bypass various steps of the canonical pathway. For the miRNA introns, or “mirtrons,” one or both ends of the pre-miRNA are established by the spliceosome during intron excision (Okamura et al., 2007; Ruby et al., 2007a). For many mirtrons, the debranched introns have all the features of a canonical pre-miRNA, including a 2 nt 3′-overhang, and they are exported, diced, and loaded like canonical pre-miRNAs (Okamura et al., 2007; Ruby et al., 2007a). In other mirtrons, the 3′ splice sites are downstream of the pre-miRNA ends; in these cases, the intron 3′ end is trimmed by the exosome before dicing (Flynt et al., 2010). Endogenous short hairpin RNAs also bypass Microprocessor cleavage. Although not well-studied as a class, these noncanonical miRNAs are probably derived from short transcription units that intrinsically produce a hairpin with the features of a pre-miRNA (Babiarz et al., 2008). A third alternative pathway bypasses Dicer. 
Like canonical miRNAs, the primary transcript of mir-451 is cleaved by the Microprocessor and the pre-miRNA is exported to the cytosol; however, unlike canonical pre-miRNAs, pre-mir-451 is cleaved by Argonaute 2 (Ago2) (Cheloufi et al., 2010; Cifuentes et al., 2010).

Known determinants of pri-miRNA processing

Two parallel lines of investigation converged on the early identification of Drosha in animals. One was incidental; the drosha locus in Drosophila melanogaster was encountered during genomic analysis of the rnh1 locus encoding RNase H1 amid questions about the role of RNase H proteins in animal biology. Microdeletions in a region adjacent to rnh1 caused lethality, and sequencing of this region revealed an open reading frame predicted to produce a 153 kDa protein with homology to the endonuclease domain of bacterial and yeast RNase III proteins (Filippov et al., 2000). Named Drosha, the novel protein contained two tandem endonuclease domains instead of just one, and gene database searching revealed highly related homologues with tandem RNase III domains in the genomes of both C. elegans and humans (Filippov et al., 2000). Thanks to the growing power of large sequence databases, it was already known that two classes of RNase III enzymes were present in animals (Mian, 1997); Drosha was one type, while the other was helicase-like (Rotondo and Frendewey, 1996) and would later be named Dicer (discussed below). At the same time, a separate group interested in mammalian RNase III proteins used a phage cDNA clone library to build the full-length cDNA of a human protein containing RNase III domains (Wu et al., 2000). This protein degraded long dsRNA, albeit much more poorly than the E. coli RNase III. Importantly, the protein was nuclear-localized, and it was believed to mediate rRNA processing, based on the functions of RNase III in yeast and bacteria (Wu et al., 2000). 
Because of its localization, the protein was named RNASEN, the human nuclear RNase III, but is now called Drosha like its ecdysozoan orthologs. The proteins remained unassociated with RNA interference and the biogenesis of miRNAs until two observations were made: first, that the initial post-transcriptional event of miRNA biogenesis was the cleavage of pri-miRNAs in the basal hairpin stem, an activity that was localized to the nucleus (Lee et al., 2002); and second, that the pre-miRNA product of this first cleavage had 2 nt 3′-overhangs, precisely the expected product of staggered cleavage by RNase III enzymes (Lee et al., 2003). Indeed, immunoprecipitated human Drosha (i.e., RNASEN) accurately excised pre-miRNAs from longer primary transcripts (Lee et al., 2003). In human and Drosophila lysates, Drosha and pri-miRNA cleavage activity fractionates with a ~600 kDa complex which has been called the Microprocessor (Denli et al., 2004; Gregory et al., 2004). In this complex, Drosha is tightly associated with a binding partner called Pasha in Drosophila and C. elegans, and DGCR8 in humans, which are homologues of each other (Denli et al., 2004; Gregory et al., 2004). DGCR8/Pasha is required for Microprocessor cleavage activity both in vitro and in vivo (Denli et al., 2004; Gregory et al., 2004), and recombinant human Drosha and DGCR8 are together sufficient to reconstitute the pri-miRNA cleavage in vitro (Gregory et al., 2004). DGCR8/Pasha contains two double-strand RNA binding domains (dsRBDs), and the presence of at least one is required to support pri-miRNA cleavage by the Microprocessor (Yeom et al., 2006). Thus, DGCR8/Pasha probably contributes significantly to pri-miRNA binding and recognition, since Drosha itself contains just one dsRBD (Lee et al., 2003). 
The functions of Drosha and Pasha/DGCR8 are so entwined that the Drosha protein is unstable in the absence of DGCR8, and Drosha regulates DGCR8 levels by cleaving a hairpin in the 5′ untranslated region of the DGCR8 mRNA (Han et al., 2009).

General preferences of the Microprocessor

Given its role as the gateway to the canonical miRNA biogenesis pathway, the Microprocessor and its substrate preferences have been subjected to intense scrutiny. Minimal substrates for the Microprocessor in vitro are composed of the pre-miRNA hairpin flanked by at least 20–50 nt of genomic sequence; the determinants in these segments are necessary and sufficient to support at least minimal cleavage by the Microprocessor in vitro (Lee et al., 2003) and expression of the mature miRNA in vivo (Chen et al., 2004). This region is important in part because of Watson–Crick base pairing that extends basally to the pre-miRNA hairpin. Mutations that abolish base pairing impair cleavage, while mutations that preserve base pairing preserve cleavage, albeit at reduced efficiency (Lee et al., 2003). This pairing is consistent with the observation that C. elegans, Drosophila, and human pri-miRNAs strongly tend to have base pairing that extends beyond the pre-miRNA hairpin (Lim et al., 2003b; Han et al., 2006). Beyond the stem, a length of unstructured RNA flanking the stem is required for pri-miRNA processing, since the Microprocessor does not cleave substrates lacking flanking RNA (Zeng and Cullen, 2005; Han et al., 2006). The Microprocessor is thought to recognize the flank-stem junction, and uses the junction to guide cleavage approximately one helical turn above the base (Han et al., 2006). Consistent with this model, mutations that shorten or lengthen the base of the stem shift the Microprocessor cleavage site accordingly, at least in vitro (Han et al., 2006). The structural basis of Microprocessor binding to the stem-flank junction is poorly understood. 
It was initially suggested that DGCR8 was responsible for junction recognition (Han et al., 2006), but substrate affinity studies with the DGCR8 dsRBDs have not consistently demonstrated that the dsRBDs can distinguish between hairpins and hairpins with flanking RNA (Sohn et al., 2007). It is worth noting that the dissociation constants measured in this study were in the 2–4 µM range, indicating far weaker binding than expected and suggesting that physiological binding of the dsRBDs to pri-miRNAs may require the presence of other domains in DGCR8, or a functional complex between DGCR8, Drosha, and perhaps other proteins in the Microprocessor complex. Others have suggested that DGCR8 binding to the pri-miRNA is cooperative, and that DGCR8 monomers can bind to multiple regions of the pri-miRNA (Faller et al., 2010). Whether this multimerization contributes to substrate specificity is unclear. The apical stem and terminal loop also contribute to recognition of pri-miRNAs by the Microprocessor, although their importance has been debated. Earlier in the characterization of the Microprocessor, it was suggested that the Drosha cleavage site was established by a molecular ruler two turns away from the loop (Zeng et al., 2005). This model was unsatisfying because it failed to accurately predict the Drosha cleavage site based on the thermodynamically predicted stem terminus; by contrast, basal stem length robustly predicts cleavage site, even in artificial substrates that have no loop at all (Han et al., 2006). As a consequence, loop-based measurement is not widely accepted as the mechanism of cleavage site selection. Nevertheless, shortening of either the terminal loop or the apical stem impairs binding to DGCR8 and cleavage by the Microprocessor (Zeng et al., 2005; Han et al., 2006; Zhang and Zeng, 2010). The optimal length of the loop has not been determined, but the optimal length of the apical stem appears to be ~2 helical turns above the Microprocessor cleavage site. 
Thus the optimal structure of a pri-miRNA is a ~3-turn hairpin (one turn between the base and the cleavage site, and two turns between the cleavage site and the loop) flanked on each side by some length of unstructured RNA (Figure 1).

Existing studies of Microprocessor preferences also hint at the existence of other, less well-defined sequence or structural determinants. First, sequence analysis of miRNA hairpins demonstrates a propensity for these hairpins to contain internal loops that are reasonably symmetric (Lim et al., 2003b; Han et al., 2006; Warf et al., 2011). These could contribute somehow to enhance binding and cleavage by the Microprocessor, although an early study of the hsa-mir-30a pri-miRNA suggested that its internal loops were dispensable (Lee et al., 2003). Alternatively, central internal loops could inhibit inappropriate or non-productive binding or cleavage in the apical stem, one helical turn from the stem-loop junction, thus biasing the Microprocessor to cleave at the appropriate location (Han et al., 2006). Central loops or bulges could also facilitate miRNA biogenesis downstream of the Microprocessor, particularly at the step of loading into Argonaute (discussed below). Second, a number of stem and loop mutations that impair Drosha processing in vitro have been described (Zeng et al., 2005; Gottwein et al., 2006), along with some single nucleotide polymorphisms (SNPs) thought to impair in vivo processing (Duan et al., 2007; Sun et al., 2009). The significance of these mutations in the hairpin is unclear. One view is that these mutations could be altering critical sequence motifs that are recognized by the Microprocessor or auxiliary recognition proteins, although such motifs were not delineated in the studies. Another view is that such mutations could substantially change the pri-miRNA folding landscape, biasing the ensemble of folding isoforms away from the optimal structure and thus preventing proper cleavage (P. Dallaire, personal communication). Additional investigation is needed to differentiate between these two models.

Finally, the lengths of upstream and downstream RNA required for pri-miRNA processing in vivo (Chen et al., 2004) are not fully explained by the stem-flank junction model for pri-miRNA cleavage, since just a few flanking nucleotides are sufficient for cleavage site determination in artificial substrates (Han et al., 2006). It is possible that additional primary sequence or structural determinants could reside in the RNA sequence flanking the stem. Indeed, a SNP downstream of the hsa-mir-16-1 pri-miRNA impairs its processing in cell lines, and is associated with B-cell chronic lymphocytic leukemia in humans (Calin et al., 2002; Calin et al., 2005). It is tempting to speculate that this SNP has affected recognition of a motif downstream of the pri-miRNA hairpin, although no specific motifs were identified in those studies.

Regulation of cleavage in subsets of animal pri-miRNAs

Several mechanisms for the dynamic regulation of pri-miRNA cleavage have been described. These mechanisms are typically mediated by individual proteins, and may only affect a subset of miRNAs at specific times. Most described regulatory mechanisms are thought to mildly enhance pri-miRNA cleavage by the Microprocessor, while the remaining ones sequester the pri-miRNA or induce its active degradation without necessarily inhibiting the activity of the Microprocessor per se. Some regulators of pri-miRNA cleavage appear to depend on binding to the pri-miRNA terminal loop. One example is the heterogeneous nuclear ribonucleoprotein A1 (hnRNPA1), a highly abundant protein thought to function in general mRNA metabolism, including splicing, mRNA export, and regulation of stability (Dreyfuss et al., 1993).
A transcriptome-wide study of hnRNPA1 binding sites by crosslinking and immunoprecipitation incidentally found binding sites in the terminal loop of hsa-mir-18a; of note, there were no observed binding events to other terminal loops in the mir-17~92 cluster of miRNAs, of which mir-18a is a member (Guil and Caceres, 2007). Binding of hnRNPA1 is thought to promote Microprocessor cleavage of the mir-18a pri-miRNA by altering the conformation of the mid-stem (Michlewski et al., 2008), based on changes in RNase V1 accessibility, although it is not clear whether this is the consequence of stem melting or is actually due to occlusion of the RNase V1 cleavage sites by hnRNPA1 binding. This study noted that miRNA loops are generally more conserved than nearby sequence (although considerably less conserved than the mature miRNA and the miRNA* strand), suggesting that other proteins may also have conserved binding sites in many pri-miRNAs (Michlewski et al., 2008). Nonetheless, the role of hnRNPA1 in pri-miRNA cleavage may not be straightforward; its binding to the let-7a pri-miRNA appears to antagonize cleavage by the Microprocessor (Michlewski and Caceres, 2010). How hnRNPA1 enhances processing in some pri-miRNAs but represses processing in others is not clear.

Similarly, an RNA-binding protein that plays multiple roles in RNA metabolism, the KH-type splicing regulatory protein (KSRP), appears to bind the terminal loop of the let-7a pri-miRNA and enhance its cleavage by both the Microprocessor and Dicer (Trabucchi et al., 2010). KSRP may also regulate other miRNAs, based on reductions in the mature miRNA levels after KSRP knockdown. It has also been suggested that KSRP may compete with hnRNPA1 for binding to the let-7a terminal loop, and that the relative expression levels of these antagonistic proteins may dynamically establish the processing efficiency for let-7a (Michlewski and Caceres, 2010).
The cleavage of a subset of pri-miRNAs is thought to be regulated by extracellular signals. Signaling through the transforming growth factor β (TGFβ) pathway upregulates mature miR-21 by increasing processing of the mir-21 pri-miRNA (Davis et al., 2008). The effector proteins downstream of TGFβ are the Smad proteins, which trimerize and act as transcription factors. TGFβ signaling appears to induce the association of both Smad proteins and the helicase p68 with the hsa-mir-21 pri-miRNA, which results in enhanced cleavage by the Microprocessor (Davis et al., 2008). This effect of TGFβ signaling does not depend on Smad4, which is usually a necessary cofactor in trimeric Smad complexes that regulate transcription (Davis et al., 2008). Many miRNAs appear to be regulated by TGFβ signaling, and appear to share a common duplex sequence motif (5′-CAGAC-3′ paired with 3′-GUCUG-5′) in the mid-stem adjacent to the Microprocessor cleavage site (Davis et al., 2010). This motif, when grafted onto non-regulated pri-miRNAs, is sufficient to confer TGFβ regulation (Davis et al., 2010). Oddly, this is nearly exactly the canonical Smad binding element in DNA (5′-AGAC-3′ paired with 3′-TCTG-5′), which the Smad MH1 domain recognizes by inserting a β-sheet “hairpin” into the major groove (Shi et al., 1998). The MH1 domain is also thought to mediate binding to pri-miRNAs (Davis et al., 2010), but it is not clear whether the domain is capable of inserting into the deeper and narrower major groove of the presumably A-form pri-miRNA stem.

The Smad proteins are not the only transcription factors proposed to bind pri-miRNAs and regulate their cleavage. The transcription factor All1 has been suggested to enhance Microprocessor cleavage, but its association with the Microprocessor is DNA-dependent (Nakamura et al., 2007), raising the possibility that the Microprocessor could be regulated by recruitment to sites of active transcription.
Two other examples are p68 and p72, multifunctional DEAD-box helicases that, among other things, activate transcription in collaboration with other transcriptional regulators like p53 and steroid receptors. The mouse knockouts of these proteins resulted in reductions in the levels of a handful of mature miRNAs (Fukuda et al., 2007). The ability of Drosha to crosslink to these pri-miRNAs is impaired in the absence of these helicases, and antibodies against the helicases inhibit processing in vitro (Fukuda et al., 2007). It is not clear how these proteins promote cleavage by the Microprocessor, and how the effect is restricted to a subset of pri-miRNAs. Nevertheless, the association of these helicases with the critical DNA damage response regulator p53 instigated an analysis of the effect of p53 activation and loss-of-function on miRNA levels. Activation of p53 by doxorubicin increased the expression of a subset of mature miRNAs without affecting pri-miRNA levels, and this effect was dependent on p68 and p72 (Suzuki et al., 2009). Addition of p53 also enhanced pri-miRNA cleavage by the Microprocessor for some of these miRNAs (Suzuki et al., 2009). As with p68 and p72, the mechanism of p53-mediated enhancement of pri-miRNA cleavage and how the effect is restricted to a few pri-miRNAs remain to be elucidated. Further study could also shed light on whether different p53 mutations affect the levels of subsets of mature miRNAs, and whether these effects contribute to the pathophysiology of cancer.

Cleavage of pri-miRNAs can also be regulated by “anti-determinants” that inhibit cleavage or stimulate degradation of the pri-miRNA. One of the earliest examples was the regulation of let-7 family members by binding of Lin-28A to the pri-miRNA terminal loop; this binding was mediated by a specific motif which may be present in other pri-miRNAs (Newman et al., 2008; Piskounova et al., 2008; Nam et al., 2011).
Binding was thought to inhibit cleavage of let-7 pri-miRNAs, and this negative regulation helped maintain the pluripotent state in embryonic stem cells (Viswanathan et al., 2008). This mechanism was controversial because Lin-28A is largely localized to the cytosol in embryonic stem cells, and later work demonstrated that Lin-28A could induce the 3′ terminal polyuridylation of the let-7 pre-miRNA and subsequent degradation (discussed below). Recently, however, it has been shown that a closely related paralog, Lin-28B, also inhibits biogenesis of let-7 family members, but is localized to the nucleus and does not appear to interact with terminal uridyl transferases (Piskounova et al., 2011). In particular, Lin-28B localizes to nucleoli, where there is little Microprocessor localization, leading to the model that Lin-28B sequesters let-7 pri-miRNAs to a location where they are inaccessible to the Microprocessor.

Another example of negative regulation is ADAR1-mediated RNA editing, which causes the active degradation of some pri-miRNAs. ADAR1, an adenosine deaminase, catalyzes the conversion of adenosine to inosine in dsRNA. Editing can be highly specific, with editing of some adenosines but not others within the same substrate, although the contextual determinants that specify the edited adenosines are not well understood (Nishikura, 2010). A-to-I editing is detectable in human and mouse precursor and mature miRNAs, and is confined to brain-expressed miRNAs, consistent with the tissue expression pattern of ADAR1 (Blow et al., 2006; Landgraf et al., 2007; Kawahara et al., 2008; Chiang et al., 2010). The significance of mature miRNA editing is unclear, although the editing has a propensity to occur in the miRNA seed, opening the possibility that editing alters the target profile of the mature miRNA (Chiang et al., 2010). By contrast, the editing of two pri-miRNAs, mir-142 and mir-151, induces the degradation of the pri-miRNAs by TudorSN (Yang et al., 2006).
Thus A-to-I editing may inhibit the processing of a subset of miRNAs in a tissue-specific manner.

In summary, several regulatory paradigms have been described that influence the cleavage of subsets of pri-miRNAs. Since these regulatory schemes affect only subsets of pri-miRNAs, it seems unlikely that they, either individually or in aggregate, could explain how the Microprocessor recognizes pri-miRNAs. Nevertheless, these studies add nuance to our understanding of how processing can be sensitive to cell type, gene expression state, and extracellular signals, and could provide insight into how the dysregulation of pri-miRNA processing could contribute to human disease.

Plant pri-miRNA processing: DCL1

Like animal miRNA biogenesis, plant miRNA biogenesis depends on an RNase III protein. CARPEL FACTORY (CAF) was identified in a genetic screen for abnormal flower development (Jacobsen et al., 1999), and encodes an RNase III protein. Its developmental phenotypes and homology to the animal Dicer proteins inspired experiments that demonstrated its importance in miRNA biogenesis (Park et al., 2002; Reinhart et al., 2002). CAF was later renamed DICER-LIKE1 (DCL1) in recognition of its sequence and functional homology (Schauer et al., 2002). DCL1 has functions in plants equivalent to those of both Drosha and Dicer in animals (Park et al., 2002; Reinhart et al., 2002; Kurihara and Watanabe, 2004). Like the animal Microprocessor, DCL1 and its partners SERRATE and HYPONASTIC LEAVES 1 (HYL1) appear to recognize junctions between unstructured RNA in internal loops and base-paired stems; the cleavage site is typically 15 bp above the junction (Dong et al., 2008; Mateos et al., 2010; Song et al., 2010).
However, no additional determinants have been identified that might shed light on how the DCL1 complex distinguishes the appropriate loop-stem junction corresponding to the pri-miRNA cleavage site from other loop-stem junctions in the pri-miRNA, much less how DCL1 distinguishes pri-miRNAs from other structured RNAs. One study partially randomized pri-miRNA sequences, expressed the sequences in plants, and selected functional molecules based on the miRNA overexpression phenotype (Mateos et al., 2010). In principle, this approach could explore the DCL1 cleavage determinants in great detail, but, in practice, the low numbers of variants that could be tested limited the study’s ability to find determinants other than the 15 bp basal stem.

Determinants of canonical biogenesis downstream of the Microprocessor

Specificity of nuclear export mediated by exportin-5

The product of Microprocessor cleavage, called the pre-miRNA, is exported from the nucleus by exportin-5 in animals (Yi et al., 2003; Bohnsack et al., 2004; Lund et al., 2004), and by the homologous protein HASTY in Arabidopsis (Park et al., 2005). Initially characterized due to its sequence homology to other karyopherin β proteins, exportin-5 was at first thought to recognize and export dsRBD-containing proteins (Brownawell and Macara, 2002). It was soon shown to export tRNAs and the adenovirus VA1 RNA, both RNAs that contain helices with 3′ overhangs (Bohnsack et al., 2002; Calado et al., 2002; Gwizdek et al., 2003). In fact, short artificial helices with single-stranded 3′-overhangs are sufficient for exportin-5 recognition and RanGTP-mediated nuclear export (Gwizdek et al., 2003). A crystal structure of exportin-5 in complex with a pre-miRNA stem explains this preference: the protein is shaped like a mitt with a positively charged “palm” that partially wraps the helix, with a positively charged tunnel at the base of the mitt that accommodates the 3′-overhang (Okada et al., 2009).
This tunnel is oriented in such a way that threading a 2 nt 5′-overhang through it results in a steric clash between the 3′ end and the protein (Okada et al., 2009), explaining the specificity for 3′-overhangs. Importantly, all protein–RNA contacts are mediated through the phosphate backbone (Okada et al., 2009), consistent with the view that exportin-5 substrates are defined by their end structure rather than by sequence. Since RNase III family proteins produce 2 nt 3′-overhangs, exportin-5 is theoretically capable of exporting any product of Drosha cleavage, and seems unlikely to impose additional constraints on miRNA maturation.

Specificity of cleavage by Dicer

The identification of Dicer emerged from studies of animal RNA interference (RNAi), a phenomenon where the introduction of exogenous dsRNA derived from the sequences of a protein-coding gene induced post-transcriptional silencing of that gene (Fire et al., 1998). Although the dsRNA used to induce RNAi was hundreds of nucleotides long, studies of RNAi using a Drosophila in vitro lysate system revealed that the long dsRNA was cleaved at regular, 21–22 nt intervals (Zamore et al., 2000). Fractionation of the lysate showed that these short fragments were associated with target mRNA cleavage activity (Hammond et al., 2000). These observations led to the view that the long dsRNA is actually a precursor molecule, and the small fragments derived from it are the active species that guide and induce target mRNA cleavage. This view was later strengthened by the demonstration that synthetic 21–22 nt fragments were sufficient to induce mRNA cleavage in the Drosophila lysate and post-transcriptional gene silencing in mammalian cells (Elbashir et al., 2001a; Elbashir et al., 2001b; Nykanen et al., 2001). Given that the RNase III family of enzymes was known to cleave dsRNA into discretely sized products, it seemed likely that the dsRNA-cleaving enzyme would contain an RNase III domain (Bass, 2000).
At that time, the only animal RNase III enzyme that had been described was Drosha (discussed previously), but analysis of the then-newly available Drosophila melanogaster and C. elegans genomes picked up three additional, unnamed proteins containing tandem RNase III domains, one in C. elegans and two in Drosophila (Bernstein et al., 2001). The Drosophila and human enzymes were sufficient to produce the ~22 nt active fragments from long dsRNA, and were named “Dicer” accordingly (Bernstein et al., 2001). Consistent with its role in generating the active RNAi-inducing species, loss of Dicer in Drosophila cells and C. elegans abolished RNAi (Bernstein et al., 2001; Knight and Bass, 2001). This characterization of the RNAi phenomenon dovetailed with studies of miRNA biogenesis when Dicer was shown to be necessary for the maturation of lin-4 and let-7 (Grishok et al., 2001; Hutvagner et al., 2001; Ketting et al., 2001). In Drosophila, Dicer proteins are functionally specialized; Dicer-1 (Dcr-1) cleaves pre-miRNA hairpins, while Dicer-2 (Dcr-2) processes long dsRNA into siRNAs (Lee et al., 2004b).

Like Drosha, Dicer requires the association of dsRBD-containing partners for full activity. Drosophila Dcr-1 is associated with Loquacious (Loqs) (Forstemann et al., 2005; Jiang et al., 2005; Saito et al., 2005), while Dcr-2 is associated with R2D2 (Liu et al., 2003) and additionally depends on isoforms of Loqs for some substrates (Czech et al., 2008; Okamura et al., 2008; Hartig et al., 2009; Zhou et al., 2009). In mammals, Dicer is associated with the TAR-element binding protein (TRBP) (Chendrimada et al., 2005; Haase et al., 2005) and another related protein called PACT (Lee et al., 2006). In humans, TRBP does not appear to be required for Dicer cleavage of pre-miRNAs (Chendrimada et al., 2005), although its presence enhances Dicer catalysis of pre-miRNAs and long dsRNA (Chakravarthy et al., 2010).
Similarly, Drosophila Dcr-2 processes dsRNA efficiently without R2D2 (Liu et al., 2003). By contrast, the association of Drosophila Dcr-1 with Loqs is required for processing of pre-miRNAs in flies (Forstemann et al., 2005; Jiang et al., 2005; Saito et al., 2005), although some pre-miRNAs may not depend on Loqs (Liu et al., 2007). No canonical set of substrate binding preferences has been ascribed to the individual Dicer binding partners, but the Dicer binding partners can assist in restricting the specificity of Dicer paralogs to specific substrates (Cenik et al., 2011).

The principal determinant of Dicer cleavage is the structure of the dsRNA ends. Human Dicer preferentially cleaves from the ends of a dsRNA, suggesting that the phase of dsRNA cleavage products is set by successive cleavage from the ends (Zhang et al., 2002). Based on systematic mutagenesis of Dicer, a model for substrate recognition was developed where the Dicer PAZ domain binds the duplex ends and positions the RNase III domains to cleave the dsRNA helix (Zhang et al., 2004); once the PAZ domain is positioned, the three-dimensional organization of domains in Dicer proteins of different organisms sets the length of Dicer products (Lau et al., 2012). The PAZ domain has a specific preference for 3′ overhangs at the end of the duplex; a 2 nt overhang with a free 3′-OH is the optimal structure, consistent with crystal structures of the PAZ domain in complex with a duplex RNA (Ma et al., 2004). In fact, the recognition of this end structure resides entirely in the PAZ domain: swapping the PAZ domain for the RNA binding domain of the spliceosomal protein U1A converts Dicer’s preference for 2 nt 3′-overhangs to a preference for the U1 RNA loop (MacRae et al., 2007). Consistent with the binding of its PAZ domain, Dicer cleavage of hairpins with 3′-overhangs is more efficient than cleavage of substrates with blunt ends (Vermeulen et al., 2005).
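The idea that the phase of Dicer products is set by successive measurement from a duplex end can be illustrated with a small sketch. This is only a toy model under stated assumptions: the function name `dice_from_end` is hypothetical, the fixed product length of 22 is taken from the ~22 nt figure quoted above, and the model ignores the 2 nt 3′-overhangs, the opposite strand, and any variation in product length.

```python
def dice_from_end(duplex_length, product_length=22):
    """Toy model of end-anchored, successive cleavage of a long duplex.

    Returns (fragments, remainder), where fragments is a list of half-open
    (start, end) intervals measured from one duplex end, and remainder is
    the length left over that is too short to yield a full product.
    product_length=22 is an assumed, fixed value used for illustration only.
    """
    if duplex_length < 0 or product_length <= 0:
        raise ValueError("lengths must be non-negative / positive")
    fragments = []
    start = 0
    # Each cleavage is measured from the end created by the previous one,
    # so every fragment boundary is in phase with the original duplex end.
    while start + product_length <= duplex_length:
        fragments.append((start, start + product_length))
        start += product_length
    return fragments, duplex_length - start


fragments, remainder = dice_from_end(100)
print(fragments, remainder)
```

In this sketch, shifting the position of the starting end shifts every downstream boundary by the same amount, which is the sense in which the ends set the phase of the products.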
The nucleotide identities of the overhanging bases influence binding affinity to the PAZ domain and Dicer cleavage efficiency, but their contribution is small relative to that of the 2 nt 3′-overhang (Ma et al., 2004; Vermeulen et al., 2005). Whether the 5′-phosphate or the 3′-OH in the overhang is more important for defining the cleavage site is currently debated. Based on the crystal structure of Giardia Dicer, it was believed that Dicer measured its cleavage site from the 3′-OH of the overhang (MacRae et al., 2006; MacRae et al., 2007). However, it has been recently proposed that human Dicer primarily measures from the 5′-phosphate, and uses measurement from the 3′-OH as a backup mechanism, explaining why Dicer cleavage sites are preserved in 3′ uridylated pre-miRNAs (Park et al., 2011). Either way, pre-miRNAs with 2 nt 3′-overhangs are optimal substrates, since 5′ or 3′ single-stranded RNA extensions reduce cleavage efficiency, even when the cleavage site per se is unaffected (Park et al., 2011). Thus Dicer and its PAZ domain have evolved to recognize substrates produced by other RNase III cleavage events, consistent with its role in biogenesis downstream of Drosha and exportin-5.

In addition to the overhang structure, the Drosophila Dcr-1/Loqs complex appears to prefer pre-miRNA-like hairpins with ~22 bp stems capped by a 14 nt loop over substrates with longer stems and/or shorter loops (Miyoshi et al., 2010; Tsutsumi et al., 2011). Recognition of pre-miRNA loops depends on the Dcr-1 helicase domain, consistent with a structural model based on electron microscopy that localizes the Dicer helicase domains to the pre-miRNA apical stem and loop (Lau et al., 2012). Similarly, human Dicer cleavage is moderately impaired when the pre-miRNA contains an unusually small loop or short stem (Zhang and Zeng, 2010). This preference may contribute to the propensity of pri-miRNA hairpins to be ~3 helical turns long.
One possible model is that the Microprocessor has an intrinsic preference for 3-turn helices with appropriately sized terminal loops (Zeng et al., 2005; Gottwein et al., 2006; Zhang and Zeng, 2010). Dicer then reinforces this structural requirement by preferentially cleaving pre-miRNAs with ~2 helical turns and the same terminal loop, corresponding to the Microprocessor cleavage product.

To the extent that Dicer has primary sequence preferences in the pre-miRNA stem, they are likely to pale beside the structural determinants. Short hairpin RNAs (shRNAs) have been used extensively to repress the expression of target genes. In most experimental systems, shRNAs are transcribed by RNA polymerase III to produce hairpins similar in structure to pre-miRNAs, which are cleaved by Dicer and ultimately induce repression of target mRNAs (Brummelkamp et al., 2002; Paddison et al., 2002). Libraries consisting of hundreds of thousands of artificial shRNAs have been generated for the purpose of performing loss-of-function genetic screens in mammalian cells; the sequences of the shRNA stems are very diverse, since it is these sequences which specify targets for cleavage by Argonaute2 within the RNA-induced silencing complex. Despite the sequence diversity of the library, most examined shRNAs are cleaved by Dicer and repress target gene expression (Moffat et al., 2006), demonstrating that Dicer cleaves many hairpins with different sequences but identical structures.

Specificity of loading into Argonaute

Argonaute family proteins form the core of the effector complex that represses targets in a variety of RNA-induced silencing pathways, including the miRNA pathways in both animals and plants. The founding member of the family, Arabidopsis AGO1, was identified genetically in a screen for mutants that cause altered leaf morphology (Bohmert et al., 1998). At the same time, genetic studies of C. elegans RNAi identified an Argonaute homolog, rde-1, which was required for RNAi.
RNAi seemed similar to a plant phenomenon called post-transcriptional gene silencing (PTGS), in which the introduction of exogenous transgenes appeared to silence endogenous genes with the same sequence. This led to the hypothesis that Arabidopsis AGO1 might be related to PTGS; indeed, AGO1 mutants were defective in PTGS (Fagard et al., 2000). Meanwhile, studies of RNAi in extracts had shown the existence of a nuclease activity that specifically cleaved target mRNAs, and the protein complex that contained the activity was called the RNA-induced silencing complex (RISC) (Tuschl et al., 1999; Hammond et al., 2000; Zamore et al., 2000). When RISC was fractionated, a human Argonaute homolog, Argonaute2 (Ago2), copurified with the nuclease activity (Hammond et al., 2001), and purified Ago2 carried out small RNA-guided target cleavage (Liu et al., 2004; Meister et al., 2004; Rivas et al., 2005).

Within a given animal or plant species, many Argonaute homologs can be present, which may be associated with different small RNAs derived from various biogenesis pathways (Ketting, 2011). For example, the 27 Argonaute superfamily members in C. elegans associate with Piwi-associated RNAs, endogenously derived siRNAs, exogenously derived siRNAs, and other small RNAs; of the 27, only ALG-1 and ALG-2 associate with miRNAs (Grishok et al., 2001). In Drosophila, miRNAs primarily associate with Ago1 (Caudy et al., 2002; Miyoshi et al., 2009), while in mammals miRNAs are associated with Ago1, Ago2, Ago3, and Ago4 (Liu et al., 2004; Meister et al., 2004). Given the plethora of small RNAs and their sorting into different Argonautes, it is understandable that the Argonaute-loading process inspects Dicer cleavage products for specific features. Among miRNA-associated Argonautes in animals, Drosophila Ago1 and its loading are best understood.
One principal determinant is stem secondary structure: Ago1 prefers RNA duplexes with central mismatches, particularly at positions 9–10 of the loaded strand (Forstemann et al., 2007; Tomari et al., 2007; Kawamata et al., 2009). Likewise, central mismatches drive small RNAs into C. elegans ALG-1 (Steiner et al., 2007). Two other determinants relate to strand selection: of the two strands in the Dicer cleavage product, only one strand is preferentially loaded into Ago1. The 5′ end of the loaded strand is usually derived from the less thermodynamically stable end of the duplex (Schwarz et al., 2003), and the 5′ nucleotide of the loaded strand is usually U (Czech et al., 2009; Okamura et al., 2009; Ghildiyal et al., 2010; Seitz et al., 2011). Finally, loading requires a 5′ phosphate, consistent with the products of Dicer cleavage (Kawamata et al., 2011).

The sorting of small RNAs between Ago1 and Ago2 blurs the distinction between miRNAs and other small RNAs in flies. On one hand, only Drosophila Ago1 is capable of mediating targeting using the miRNA seed, which is the basis of most miRNA targeting in animals; Ago2 efficiently cleaves its targets but requires nearly perfect matches, which is atypical of metazoan miRNA target sites (Forstemann et al., 2007; Bartel, 2009). Thus RNA species that are predominantly loaded into Ago2 might not be considered proper miRNAs. On the other hand, there are hairpins (such as dme-mir-277) encoded in the Drosophila genome that are processed by the Microprocessor and Dcr-1/Loqs, but the Dcr-1 products are sorted primarily into Ago2 due to extensive pairing (Tomari et al., 2007). Even when the miRNA strand is selectively loaded into Ago1, the miRNA* strand is often loaded into Ago2 (Czech et al., 2009; Okamura et al., 2009; Ghildiyal et al., 2010). Thus, different small RNA species can be derived from the same precursor molecules but come to rest in different maturation endpoints.
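The loading preferences described above (a loaded strand whose 5′ end comes from the less stable duplex end, a 5′ U, and central mismatches at positions 9–10) can be written down as a rough heuristic. The following is a minimal sketch, not a published algorithm: the function names, the 4-bp end window, and the pairing weights (G:C = 3, A:U = 2, G:U = 1) are arbitrary illustrative assumptions rather than measured thermodynamic parameters, and both strands are assumed to be equal-length 5′→3′ sequences with 2 nt 3′-overhangs.

```python
# Crude, assumed pairing weights; a real analysis would use
# nearest-neighbor thermodynamic parameters instead.
PAIR_SCORE = {
    ("G", "C"): 3, ("C", "G"): 3,
    ("A", "U"): 2, ("U", "A"): 2,
    ("G", "U"): 1, ("U", "G"): 1,
}


def partner_index(i, n, overhang=2):
    """Index on the opposite strand that pairs with position i (0-based),
    for two length-n strands that each carry a 3' overhang."""
    return n - overhang - 1 - i


def end_score(strand, other, window=4, overhang=2):
    """Sum of pairing weights over the first `window` pairs at strand's 5' end."""
    n = len(strand)
    return sum(
        PAIR_SCORE.get((strand[i], other[partner_index(i, n, overhang)]), 0)
        for i in range(window)
    )


def pick_loaded_strand(strand_a, strand_b):
    """Prefer the strand whose 5' end sits at the less stable duplex end;
    break ties in favor of a 5' U."""
    score_a = end_score(strand_a, strand_b)
    score_b = end_score(strand_b, strand_a)
    if score_a != score_b:
        return strand_a if score_a < score_b else strand_b
    if strand_b[0] == "U" and strand_a[0] != "U":
        return strand_b
    return strand_a


def has_central_mismatch(loaded, other, positions=(9, 10), overhang=2):
    """True if any of the given 1-based positions of the loaded strand is unpaired."""
    n = len(loaded)
    return any(
        (loaded[p - 1], other[partner_index(p - 1, n, overhang)]) not in PAIR_SCORE
        for p in positions
    )


a = "UAUAGCGCGCGCGCGCGCGCUU"  # hypothetical strand with an A:U-rich 5' end
b = "GCGCGCGCGCGCGCGCUAUAUU"  # its partner, with a G:C-rich 5' end
print(pick_loaded_strand(a, b), has_central_mismatch(a, b))
```

In this toy duplex the strand with the A:U-rich 5′ end is chosen, and the duplex is reported as fully paired at positions 9–10; by the fly sorting preferences summarized above, such a centrally paired duplex would be expected to favor Ago2 over Ago1.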
From a biogenesis standpoint, it seems that sorting preferences are not really requirements for miRNA authenticity; instead, one might view them as different ways for the cell to utilize the RNA precursors that enter the miRNA biogenesis pathway.

Ago loading preferences in vertebrates are less well studied, but are likely to be similar to those of Drosophila Ago1. Like Drosophila Ago1, loading of mammalian Ago proteins is more efficient when the duplex contains central mismatches (Yoda et al., 2010). Mammalian Ago proteins prefer 5′ U or A nucleotides, since mammalian miRNAs and functional shRNAs tend to start with U or A (Bartel, 2004; Fellmann et al., 2011); structurally, this preference is mediated by contacts between the Ago MID domain and A and U bases at the 5′ end (Frank et al., 2010). The 5′ end of the loaded strand is usually derived from the less thermodynamically stable end of the duplex (Khvorova et al., 2003). No evidence of significant miRNA sorting between Ago proteins has been found to date (Liu et al., 2004; Meister et al., 2004; Wang et al., 2012).

Regulation of biogenesis in subsets of animal pre-miRNAs

In addition to positive determinants that promote miRNA biogenesis, it is possible in principle to evolve “anti-determinants” that induce the active elimination of RNA species that are not authentic pre-miRNAs. There are no convincing examples of this paradigm in the literature, but individual miRNAs can be dynamically regulated by the active degradation of the pre-miRNA. This regulation does not help the cell define miRNAs as a class, but does help control the levels of specific miRNAs in response to internal or external cues. One example is the ADAR1-mediated editing of pre-mir-151, which inhibits its cleavage by Dicer (Kawahara et al., 2007). Another, well-studied example is the regulation of let-7 by Lin-28A.
Lin-28A binds two specific motifs in the pre-miRNA loops of let-7 family members (Newman et al., 2008; Piskounova et al., 2008; Nam et al., 2011) and recruits a terminal uridyl transferase to add uridines to the 3′ end of the pre-miRNA (Heo et al., 2008; Hagan et al., 2009; Heo et al., 2009; Lehrbach et al., 2009). Uridylation inhibits Dicer processing of the pre-miRNA and recruits an unknown nuclease to degrade the pre-miRNA (Heo et al., 2008; Hagan et al., 2009; Heo et al., 2009; Lehrbach et al., 2009). Although Lin-28A regulation was thought to be exclusive to let-7 family members, several other pre-miRNAs have part of the Lin-28 binding motif in their loops, and the presence of the partial motif correlates with evidence of uridylation, albeit less than that of pre-let-7 (Heo et al., 2009). Analysis of mature miRNA sequences has shown that terminal uridylation of mature miRNAs is 3-fold more common in miRNAs derived from the pre-miRNA 3′ arm, suggesting that regulation by uridylation may occur surprisingly frequently (Chiang et al., 2010). Indeed, this is likely to be an underestimate of regulation by polyuridylation, since polyuridylated pre-miRNAs are both less likely to be Dicer processed and more likely to be degraded. The proteins that mediate this putative regulation have not been identified to date, and it is possible that proteins other than Lin-28 can recruit terminal uridyl transferases to pre-miRNAs.

Finding additional biogenesis determinants in pri-miRNAs

In summary, the substrate specificity of each successive step in miRNA biogenesis is largely dictated by the biochemistry of the previous step. This observation is not surprising, since these steps are joined in a contiguous maturation pathway, but it does reinforce the notion that authentic miRNA precursors are primarily defined at the first step of biogenesis, when pri-miRNAs are recognized and cleaved by the Microprocessor.
Given the broad substrate specificity of the Microprocessor, it is not understood how the complex differentiates between miRNA structures and other hairpins. On an intellectual level, hairpins are common motifs in structured RNAs, and other RNAs may stochastically assemble into secondary structures that contain hairpins. Indeed, a genome-wide search found some 11 million hairpins in the human genome (Bentwich et al., 2005). To the extent that they are transcribed and functionally important, inappropriate cleavage of many of these structures by Drosha is surely detrimental to the cell. On a practical level, attempts to predict pri-miRNAs based on canonical secondary structure produce many false positives, which must be eliminated using additional criteria, such as evolutionary conservation or experimental evaluation (Lim et al., 2003a; Lim et al., 2003b; Bentwich et al., 2005; Berezikov et al., 2006; Chiang et al., 2010). Of course, the Microprocessor has no direct way of assessing the conservation of a pri-miRNA substrate, so our inability to predict pri-miRNAs from the sequence of a single genome illustrates our poor understanding of how the Microprocessor recognizes its substrates.

The mystery of how Drosha can distinguish pri-miRNAs from other hairpins is part of a recurrent mystery of how an enzyme with minimal apparent preferences can distinguish its authentic substrates from other, superficially similar substrates. For example, questions about the substrate specificity of E. coli RNase III emerged early in the investigation of this enzyme. Polyoma virus dsRNA could be cleaved exhaustively to produce 11-13 nt fragments (Robertson and Dunn, 1975), suggesting that RNase III lacked strong nucleotide preferences. It seemed that RNase III was a general dsRNA endonuclease, at least in vitro, yet it did not seem possible that an enzyme with few if any discernible substrate preferences (other than secondary structure) could function in vivo. Indeed, Hugh D.
Robertson and John J. Dunn concluded their 1975 paper on this note:

“In conclusion, we can expect the specific sites in cellular RNAs which are processed by RNase III to have substantial double helical structure; to be greater than 20 base pairs in length; to contain 5′-phosphate and 3′-hydroxyl endgroups after cleavage; and to contain, in all probability, at least one further characteristic feature, either a common sequence or an additional structural element, to differentiate them from the many regions of potential secondary structure now thought to reside at frequent intervals in biological RNA sequence.” (Robertson and Dunn, 1975)

Just as investigators studying RNase III in the 1970s recognized that the enzyme had to have additional determinants for substrate recognition, it is virtually certain that the Microprocessor recognizes pri-miRNA features beyond the common hairpin structure. I will describe the known specificity determinants in two classes of RNase III enzymes: the eubacterial RNase III enzymes, including the eponymous E. coli RNase III; and the yeast RNase III enzymes Rnt1p and Pac1. Considering these enzymes will provide inspiration about the location and nature of additional determinants that might define pri-miRNAs, and review experimental approaches that have been successful for defining substrate specificity in other recognition paradigms.

Substrate specificity in RNase III family proteins

RNase III family proteins have evolved to have divergent cellular roles and substrates, so the specific preferences of individual RNase III family members may not translate well to the Microprocessor. However, it is reasonable to believe that the locations and types of preferences may overlap despite evolutionary divergence.
Indeed, Cα superposition analysis shows considerable overlap of the structures of Mycobacterium tuberculosis RNase III, Aquifex aeolicus RNase III, the two endonuclease domains of Giardia intestinalis Dicer, and Saccharomyces castellii Dcr-1 (Akey and Berger, 2005; Gan et al., 2006; Macrae et al., 2006; Weinberg et al., 2011). Thus, even though the different classes of RNase III proteins may have their own idiosyncratic modes of recognition, the preferences of the endonuclease domains themselves could be relatively well-preserved.

Eubacterial RNase III

The founding member of the RNase III family, the Escherichia coli RNase III, was first identified as an enzyme which specifically caused dsRNA to become soluble in trichloroacetic acid (Robertson et al., 1967) in a Mg2+-dependent manner (Robertson et al., 1968), although it did not have a known biological function at that time. Several years later, another group determined that the conversion of the T7 phage early transcript from a ~7 kb primary transcript into five distinct mRNAs depended on a post-transcriptional “sizing factor” which had chromatographic qualities comparable to those of RNase III (Dunn and Studier, 1973b). Indeed, processing of the T7 early RNA was defective in an RNase III deficient E. coli strain, and purified RNase III was sufficient to generate the five messenger RNA products from in vitro transcribed T7 early RNA (Dunn and Studier, 1973a). Furthermore, the RNase III deficient strain had delayed production of 16S and 23S ribosomal RNA (rRNA) from a larger RNA species; as with the T7 early transcript, treatment of this larger RNA with purified RNase III generated products the same size as the 16S and 23S rRNAs (Dunn and Studier, 1973a). Consistent with the dsRNA cleavage activity of RNase III, these cleavage sites were later shown to reside in regions of contiguous pairing between RNA separated by thousands of nucleotides (Young and Steitz, 1978; Bram et al., 1980).
This work was followed by the characterization of other RNase III cleavage sites, including ones in other phage RNAs (Hughes et al., 1987; Daniels et al., 1988) and E. coli mRNAs (Barry et al., 1980; Regnier and Portier, 1986; Portier et al., 1987; Regnier and Grunberg-Manago, 1989), including the mRNA encoding RNase III itself (Bardwell et al., 1989). RNase III also regulates other RNAs through binding without cleavage (Altuvia et al., 1987).

Efforts to study RNase III substrate specificity centered on comparative analysis of the known RNase III cleavage sites. The observation that cleavage sites in the T7 early transcript were at least superficially related to each other led to the view that RNase III could be a restriction endonuclease for RNA: an enzyme which preferentially cleaved dsRNA with a specific consensus sequence at or near the cleavage site (Robertson, 1982). However, this model eroded as more RNase III cleavage sites were characterized; in particular, lack of significant homology between the 16S and 23S rRNA cleavage sites and the T7 early transcript sites demonstrated that any consensus sequence was, at best, degenerate, if it existed at all. Nevertheless, aggregation of sequences flanking characterized cleavage sites resulted in the identification of a common motif:

5′−CUUNNN|−3′
3′−GAAN|NN−5′

where N denotes any nucleotide and “|” marks the RNase III cleavage sites (Daniels et al., 1988), along with additional preferences further from the cleavage site (Krinke and Wulff, 1990). However, studies of T7 R1.1 variants that had altered nucleotide identities but retained Watson–Crick base-pairing showed nearly identical cleavage rates compared to wildtype (Chelladurai et al., 1991).
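The comparative approach described above amounts to asking whether a degenerate two-strand motif occurs in a duplex. A minimal sketch of such a check follows, assuming a perfectly aligned duplex with no bulges; the function names and the example duplex are hypothetical and serve only as illustration. Only the IUPAC codes used in this chapter (N, and W for A or U) are included.

```python
import re

# IUPAC RNA codes used in the text: N = any nucleotide, W = A or U.
IUPAC = {"A": "A", "C": "C", "G": "G", "U": "U", "N": "[ACGU]", "W": "[AU]"}


def motif_to_regex(motif):
    """Translate a degenerate motif such as 'CUUN' into a compiled regex."""
    return re.compile("".join(IUPAC[base] for base in motif))


def find_duplex_motif(top, bottom, top_motif, bottom_motif):
    """Return start indices where both strands match their motifs.

    `top` is written 5'->3' and `bottom` 3'->5', so that position i of
    each string is one base pair (assumption: no bulges or loops).
    """
    if len(top) != len(bottom) or len(top_motif) != len(bottom_motif):
        raise ValueError("strands and motifs must be aligned, equal-length strings")
    top_re, bottom_re = motif_to_regex(top_motif), motif_to_regex(bottom_motif)
    width = len(top_motif)
    return [
        i
        for i in range(len(top) - width + 1)
        if top_re.fullmatch(top[i:i + width]) and bottom_re.fullmatch(bottom[i:i + width])
    ]


# Hypothetical duplex containing the CUUN/GAAN element described above.
print(find_duplex_motif("GGCUUAGCC", "CCGAAUCGG", "CUUN", "GAAN"))
```

A match of this kind only reports that the element is present; as the variant studies cited above show, presence of the element does not by itself predict cleavage rate.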
Instead, it has been argued that RNase III preferences in this region (“proximal box”) and a second region 5 bp further away from the cleavage site (“distal box”) are driven by disfavored base pairs or “antideterminants.” Substitution of the T7 R1.1 base pairs with these disfavored base pairs results in considerable inhibition of cleavage (Zhang and Nicholson, 1997). On a practical front, it is difficult to tell whether the observed inhibition by antideterminants is due to truly inhibitory base pairs, or if the inhibition simply represents the difference between substrates with the most optimal base pairs and those with the least optimal base pairs. Indeed, a more recent analysis of determinants of RNase III cleavage has pushed the pendulum back towards the concept of a consensus sequence. In this study, RNase III preferred

5′−CWUWNN|−3′
3′−GWAW|NN−5′

in the proximal box, and

5′−AG−3′
3′−UU−5′

in the distal box; shifting these motifs in a dsRNA context was sufficient to correspondingly shift the cleavage site (Pertzev and Nicholson, 2006).

X-ray crystal structures of Aquifex aeolicus RNase III in complex with model RNA have revealed RNA-protein contacts in the proximal and distal boxes, and additional contacts between the two in a region termed the “middle box” (Figure 2A and 2B) (Gan et al., 2006; Gan et al., 2008). The study identified four RNA binding motifs (RBMs). RBMs 1 and 2 occur in the double-stranded RNA binding domain (dsRBD) and contact the proximal and middle boxes, respectively. RBM1 forms 8 contacts to the ribose 2′-OH or the backbone phosphates along the proximal box and at the cleavage site, explaining the specificity for A-form helical RNA, but only makes one base contact in the proximal box (Gan et al., 2006; Gan et al., 2008). Likewise, RBM2 forms a contact to a 2′-OH and a base in the middle box (Gan et al., 2006; Gan et al., 2008).
RBMs 3 and 4 occur in the RNase III domain, and contact the cleavage site and proximal box, and the distal box, respectively. RBM3 contacts the two bases immediately adjacent to the cleavage site, which could translate into base identity preferences at the cleavage site (Gan et al., 2006; Gan et al., 2008). Interestingly, although RBM4 protrudes into the minor groove, it did not appear to make any contacts to the bases in the distal box (Gan et al., 2006; Gan et al., 2008). Overall, the structures explain the specificity of RNase III for dsRNA, but do not really address the weak preferences for (or against) individual base pairs.

The substrate regions where RNase III has base pair preferences could be important in the recognition of pri-miRNAs (Figure 2C). Most RNase III contacts with RNA bases occur in the dsRBDs; since the dsRBDs in different RNase III family proteins contact different parts of the substrate RNA relative to the cleavage site, it is difficult to know where the dsRBDs of Drosha and DGCR8 bind the pri-miRNA. Thus, in drawing parallels between RNase III and the Microprocessor, the most relevant analogy is between the endonuclease domains in RNase III and Drosha, corresponding to RBM3 and RBM4. The RBM3 of Drosha may contact the two bases adjacent to the cleavage site (Gan et al., 2006; Gan et al., 2008), which may translate into a specific nucleotide preference near its cleavage site. Likewise, RBM4 would be situated on the 3p side of the basal stem, and extends into the minor groove on that side, corresponding to the distal box of RNase III substrates. Although RBM4 did not make base contacts in the distal box, it is possible that the RBM4 of Drosha could do so.

Yeast RNase III: Rnt1p and Pac1

The RNase III family member with the most defined substrate preferences is Rnt1p. The gene encoding this protein was sequenced in the process of exploring the Saccharomyces cerevisiae genome adjacent to the spliceosome factor gene CUS1.
Consistent with its homology with E. coli RNase III and the Schizosaccharomyces pombe gene pac1, Rnt1p cleaved dsRNA in vitro, and disruption of RNT1 impaired the removal of both 5′ and 3′ external transcribed spacers (ETS) in pre-rRNA (Elela et al., 1996). Correspondingly, purified Rnt1p cleaved the 5′ ETS at the A0 cleavage site, and cleaved in the 3′ ETS at a site 21 nt downstream of the 3′ end of mature 28S rRNA (Elela et al., 1996). Intriguingly, selection of the A0 site had been previously shown to be dependent on U3 snoRNA binding upstream (Beltrame and Tollervey, 1995), suggesting some interaction between snoRNAs and Rnt1p, although U3 binding is dispensable for actual cleavage (Elela et al., 1996).

Despite the superficial similarity to eubacterial rRNA processing, the actual substrate preferences of Rnt1p are considerably more defined. Further characterization of the RNT1 disrupted strain demonstrated accumulation of many snoRNA precursors (of both H/ACA and C/D snoRNAs), all of which contained an internal or terminal tetraloop with the degenerate motif AGNN (Chanfreau et al., 1998).
Analysis of an expanded catalog of Rnt1p substrate sequences, including a panel of snoRNAs, the U1, U2, U4, and U5 small nuclear RNAs (snRNAs), and the 3′ ETS cleavage site invariably showed the presence of a (U/A)GNN tetraloop (nearly always AGNN), situated 13-16 base pairs from the Rnt1p cleavage site (Figure 2) (Chanfreau et al., 2000). Mutation of the A and G nucleotides either severely impaired or completely abolished cleavage, and changing the tetraloop position shifted the Rnt1p cleavage site accordingly, demonstrating that the AGNN tetraloop is required for both substrate recognition and cleavage site selection (Chanfreau et al., 2000).

Figure 2. Specificity determinants in substrates of eubacterial RNase III. (A) Structural basis of RNase III recognition of substrates. Structure information was taken from PDB 2EZ6. (B) Locations of published specificity determinants, correlated with protein motifs in the RNase III dsRBD and RNase III endonuclease domain. The proximal, middle, and distal boxes are shaded in gray, outlined with the color of the corresponding region in (A). (C) Inference of potential specificity determinants in pri-miRNAs. The location of determinants is aligned based on the cleavage site of RNase III and Drosha.

NMR structure analysis of AGNN tetraloops shows that the tetraloop has a distinct conformation notable for a syn conformation in the G nucleotide, allowing the base to stack with the first base of the tetraloop and hydrogen bond to the phosphate in ApG (Figure 2A) (Wu et al., 2001). The syn conformation causes a backbone turn at the GpN junction (Wu et al., 2001). The structure is stabilized by non-Watson–Crick base pairs between the first and last bases in the tetraloop (Wu et al., 2001) and the adjacent Watson–Crick pairs of the hairpin helix, consistent with the necessity of these nearby pairs in what has been termed the “binding and stability box” (Lamontagne et al., 2003).
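The tetraloop rule described above, a (U/A)GNN tetraloop capping a stem long enough to place the cleavage site 13-16 base pairs away, can be illustrated with a simple sequence scan. The following is a minimal sketch under strong simplifying assumptions: the stem is treated as perfectly contiguous (no bulges or internal loops), G:U wobble pairs are counted as pairs, and no structure prediction is attempted. All names and the test sequence are hypothetical.

```python
import re

# Pairs allowed in the idealized stem (Watson-Crick plus G:U wobble).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
COMPLEMENT = str.maketrans("ACGU", "UGCA")

# Lookahead so that overlapping (U/A)GNN candidates are all reported.
TETRALOOP = re.compile(r"(?=([AU]G[ACGU]{2}))")


def revcomp(seq):
    """Reverse complement of an RNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def stem_length(seq, loop_start, loop_len=4):
    """Count contiguous base pairs closing the loop, walking outward."""
    i, j = loop_start - 1, loop_start + loop_len
    paired = 0
    while i >= 0 and j < len(seq) and (seq[i], seq[j]) in PAIRS:
        paired += 1
        i -= 1
        j += 1
    return paired


def candidate_tetraloops(seq, min_stem=13):
    """Return (loop_start, loop_sequence, stem_length) for each candidate."""
    hits = []
    for match in TETRALOOP.finditer(seq):
        stem = stem_length(seq, match.start())
        if stem >= min_stem:
            hits.append((match.start(), match.group(1), stem))
    return hits


# Hypothetical hairpin: a 14-bp perfect stem capped by an AGAA tetraloop.
arm = "GGCAUCGAUCGGAC"
print(candidate_tetraloops(arm + "AGAA" + revcomp(arm)))
```

Replacing the loop with ACAA in the same hairpin yields no candidates, mirroring the loss of recognition upon mutation of the G. A sequence scan like this cannot capture the conformational basis of recognition discussed in the text.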
The conformation was conserved between the AGAA, AGUU, and UGAA tetraloops, but was lost when G was mutated to C to form an ACAA tetraloop (Butcher et al., 1997; Wu et al., 2001). This opens the possibility that the apparent nucleotide preferences of Rnt1p actually translate into an RNA conformation preference, which is most easily (or exclusively) adopted in this specific nucleotide context. Indeed, an NMR structure of the Rnt1p dsRBD in complex with an AGAA tetraloop (Figure 3A) is remarkable for its lack of base-specific contacts; instead, the α1 helix of the dsRBD contacts the tetraloop backbone and the minor groove formed by its conformation (Wu et al., 2004). In order to distinguish the AGNN tetraloop from dsRNA, the Rnt1p α1 helix is oriented differently from α1 helices in other dsRBDs (Wu et al., 2004). Other contacts between the dsRBD and the hairpin-tetraloop structure include the α1 helix extending into the minor groove of the apical stem region, the β3α2 loop resting superficially along the stem major groove, and the β1β2 loop reaching into the minor groove one turn away from the loop (Wu et al., 2004). These contacts may explain why Rnt1p prefers certain Watson–Crick pairs in the binding and stability box (Lamontagne et al., 2003). The lack of base-specific contacts suggests that these preferences may also be conformational in nature.

Even though a structure of the full-length Rnt1p protein is not available at present, it is very likely that the position of the dsRBD relative to the endonuclease domain in Rnt1p is quite different from that of other RNase III family members. The uniqueness of the Rnt1p dsRBD position and its unique conformation compared to other dsRBDs make it difficult to draw significant parallels between the substrate recognition of Rnt1p and that of the Microprocessor. In fact, the other well-characterized yeast RNase III, Pac1, has no discernible preference for AGNN tetraloops, and instead resembles E.
coli RNase III in its relatively relaxed substrate specificity (Figure 3B) (Rotondo and Frendewey, 1996).

However, it is worth noting that, in C. elegans, trans-splicing between SL1 and cel-let-7 has been reported to be required for the processing of the pri-miRNA; it was argued that SL1 trans-splicing could alter the predicted structure of the let-7 pri-miRNA (Bracht et al., 2004). Intriguingly, NMR studies of the donated portion of the SL1 RNA have demonstrated the presence of an AGUU tetraloop structure above a buckled A:U pair (Greenbaum et al., 1996). Although the reported G is in a syn conformation, the base is oriented in the opposite direction from the G in the yeast AGNN tetraloops. It has been argued that the difference is due to a misassigned resonance peak in the SL1 structure (Wu et al., 2001). Thus, it is tempting to speculate that AGNN tetraloop recognition plays a role in nematode pri-miRNA processing, and that SL1 trans-splicing is important to pri-let-7a processing because it brings an AGNN tetraloop into proximity with the Drosha cleavage site.

Regions recognized by the endonuclease domains of yeast RNase III proteins are more likely to be relevant to pri-miRNA processing. For Rnt1p, changes to the base pairs immediately flanking the cleavage site alter the cleavage rate without affecting binding affinity (Lamontagne et al., 2003). This suggests that Rnt1p substrate affinity is driven primarily by the dsRBD. It may be more useful to consider Pac1, which has a preference for an internal loop near the cleavage site in a region analogous to the E. coli proximal box; the position of this loop may guide the Pac1 cleavage site (Figure 3B) (Lamontagne and Elela, 2004). Although it is not known whether this internal loop preference is read by the Pac1 dsRBD or the endonuclease domain, its positioning relative to the cleavage site suggests that the internal loop might be an endonuclease domain preference.
If so, this observation reinforces the idea that RNase III domains may have substrate preferences in a common region relative to the cleavage site, although the specific preferences have diverged (Figure 3C).

An exhaustive, quantitative approach to defining pri-miRNAs

It is likely that the Microprocessor has substrate preferences beyond secondary structure, but no such recognition paradigm has emerged despite many published studies with individual pri-miRNA variants. When one compares studies of the Microprocessor and studies of bacterial RNase III, it is apparent that Microprocessor studies are missing two things that were critical to elucidating RNase III substrate determinants.

First, pri-miRNA cleavage must be measured quantitatively. Nearly all published pri-miRNA cleavage experiments thus far have been non-quantitative; typical read-outs are “cleaved” or “uncleaved.” But these qualitative experiments will not suffice to elucidate the determinants that distinguish pri-miRNA hairpins from other hairpins. For bacterial RNase III, several determinants contribute relatively subtly to differences in binding affinity and cleavage rate; together, the presence of these determinants (or the absence of anti-determinants, if they indeed exist) defines the RNase III substrate. The distinguishing features of pri-miRNA hairpins may individually contribute just the difference between moderate and maximal cleavage efficiency.
Indeed, many mutations introduced into pri-miRNA substrates alter cleavage efficiency without completely abolishing cleavage, but these substrates are often lumped into the “cleaved” category, masking the potential contribution of additional determinants. For example, mutations in the basal stem that shift the Microprocessor cleavage site also reduce the overall efficiency of cleavage (Han et al., 2006), suggesting either that the mutations perturb an important sequence or that the new cleavage site is somehow in a suboptimal context. Either way, it is clear that our understanding of recognition and cleavage is incomplete.

Second, studies must use large sets of variants derived from individual pri-miRNAs, enabling a more precise understanding of what specific sequence motifs or structural features are important to recognition by the Microprocessor. For example, conclusions about the optimal apical stem length for cleavage were primarily based on deletions or mutations that abolished base pairing (Zeng et al., 2005; Han et al., 2006; Zhang and Zeng, 2010). Based on these experiments, it is extremely difficult to tell whether recognition by the Microprocessor was impaired because stem length per se was important, or because a specific recognition determinant resides in this region and had been deleted or mutated. The approach of mutating away base pairs is even more prone to misinterpretation, since at least three mechanisms could be in play: stem length, loop or unstructured RNA length, and primary sequence. Similarly, studies of the flanking RNA sequence were largely based on deletions (Lee et al., 2003; Zeng and Cullen, 2005), and the results are compatible with the idea that additional, subtle determinants reside in the flanking RNA. By contrast, studies of the proximal, mid, and distal boxes in the substrates of bacterial RNase III systematically tested most or all possible Watson–Crick base pairs at the interrogated positions (Pertzev and Nicholson, 2006).
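To make concrete what testing all possible Watson–Crick base pairs at the interrogated positions entails, the following is a minimal sketch that enumerates such a variant set for an aligned duplex. It is an illustration only and is not taken from the cited study; the function name, the example duplex, and the chosen positions are hypothetical.

```python
from itertools import product

# The four Watson-Crick base pairs, written as (top base, bottom base).
WC_PAIRS = [("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")]


def wc_variants(top, bottom, positions):
    """Yield (top, bottom) duplexes carrying every combination of
    Watson-Crick pairs at the given positions.

    `top` is written 5'->3' and `bottom` 3'->5', so index i in both
    strings is one base pair. Pairing is preserved in every variant;
    only base identity changes.
    """
    for combo in product(WC_PAIRS, repeat=len(positions)):
        new_top, new_bottom = list(top), list(bottom)
        for pos, (top_base, bottom_base) in zip(positions, combo):
            new_top[pos], new_bottom[pos] = top_base, bottom_base
        yield "".join(new_top), "".join(new_bottom)


# Two interrogated positions give 4**2 = 16 variants, including the parent.
variants = list(wc_variants("GGCUUAGCC", "CCGAAUCGG", [3, 4]))
print(len(variants))
```

The number of variants grows as 4 to the power of the number of positions, which is why exhaustive testing by individual assays is feasible only for short elements.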
Similarly detailed studies will be needed to make significant progress in understanding the cleavage of pri-miRNAs.

Figure 3. Specificity determinants in substrates of yeast RNase III enzymes Rnt1p and Pac1. (A) Structural basis of Rnt1p recognition of the AGNN tetraloop. Structure information was taken from PDB 1T4L. (B) Locations of published specificity determinants, correlated with protein motifs where information is available. Recognition boxes for Rnt1p and Pac1 are shown. (C) Inference of potential specificity determinants in pri-miRNAs. The location of determinants is aligned based on the cleavage sites of Rnt1p, Pac1, and Drosha.

In seeking large sets of pri-miRNA variants, it will probably not be sufficient to gather collections of the pri-miRNAs encoded in animal genomes and derive determinants from their common sequences. Many groups have attempted to use sequence analysis and computational learning models to analyze pri-miRNAs for predictive determinants, largely for the purpose of computationally predicting novel pri-miRNAs (Grad et al., 2003; Lim et al., 2003a; Lim et al., 2003b; Bentwich et al., 2005; Nam et al., 2005; Berezikov et al., 2006). The resulting algorithms perform surprisingly poorly unless conservation or experimental data are taken into account (Lim et al., 2003a; Lim et al., 2003b; Ruby et al., 2006; Ruby et al., 2007b; Stark et al., 2007; Chiang et al., 2010). Others have attempted to quantify the Microprocessor cleavage of 250 pri-miRNAs, in the hope that measured cleavage efficiencies would offer an extra dimension of information that straight sequence analysis of pri-miRNAs may have lacked. This study was not much more successful than comparative sequence analysis: it found that conserved pri-miRNAs were processed more efficiently than less conserved pri-miRNAs, and that the general structural features defined previously were correlated with cleavage efficiency (Feng et al., 2011).
The success of these studies was likely limited by the small number of “true positive” hairpins in the training sets, and their wide sequence and structural divergence. In particular, sequence and structural divergence severely hampers alignment, which in turn limits the power of computational analysis to discover short or degenerate motifs. What is needed is an experimental system that systematically generates a large number of related hairpin substrates, and quantifies the cleavage of these hairpins. These hairpins must be sufficiently divergent to sample suboptimal pri-miRNA sequences and structures, but sufficiently similar to enable computational sequence analysis.

In the following chapters I will describe such an experimental and computational approach. Hundreds of billions of pri-miRNA variants were generated, each related to one of four human pri-miRNAs. Of these variants, millions of functional variants cleaved by the Microprocessor were sequenced. Computational analysis of the successful variants revealed a panel of important determinants of pri-miRNA recognition, and quantified the relative contribution of these determinants. Together, these evolutionarily conserved features define the majority of authentic human pri-miRNAs. The elucidation of these features greatly expands the understanding of what pri-miRNAs are, and how the cell recognizes the correct hairpins to process into mature miRNAs.

Bibliography and References Cited

Akey, D.L., and Berger, J.M. (2005). Structure of the nuclease domain of ribonuclease III from M. tuberculosis at 2.1 Å. Protein Sci 14, 2744-2750.
Altuvia, S., Locker-Giladi, H., Koby, S., Ben-Nun, O., and Oppenheim, A.B. (1987). RNase III stimulates the translation of the cIII gene of bacteriophage lambda. Proc Natl Acad Sci U S A 84, 6511-6515.
Ambros, V., Bartel, B., Bartel, D.P., Burge, C.B., Carrington, J.C., Chen, X., Dreyfuss, G., Eddy, S.R., Griffiths-Jones, S., Marshall, M., et al. (2003).
A uniform system for microRNA annotation. RNA 9, 277-279.
Arvey, A., Larsson, E., Sander, C., Leslie, C.S., and Marks, D.S. (2010). Target mRNA abundance dilutes microRNA and siRNA activity. Mol Syst Biol 6, 363.
Ashburner, M., Ball, C.A., Blake, J.A., Botstein, D., Butler, H., Cherry, J.M., Davis, A.P., Dolinski, K., Dwight, S.S., Eppig, J.T., et al. (2000). Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet 25, 25-29.
Babiarz, J.E., Ruby, J.G., Wang, Y., Bartel, D.P., and Blelloch, R. (2008). Mouse ES cells express endogenous shRNAs, siRNAs, and other Microprocessor-independent, Dicer-dependent small RNAs. Genes Dev 22, 2773-2785.
Baek, D., Villen, J., Shin, C., Camargo, F.D., Gygi, S.P., and Bartel, D.P. (2008). The impact of microRNAs on protein output. Nature 455, 64-71.
Bardwell, J.C., Regnier, P., Chen, S.M., Nakamura, Y., Grunberg-Manago, M., and Court, D.L. (1989). Autoregulation of RNase III operon by mRNA processing. EMBO J 8, 3401-3407.
Barry, G., Squires, C., and Squires, C.L. (1980). Attenuation and processing of RNA from the rplJL--rpoBC transcription unit of Escherichia coli. Proc Natl Acad Sci U S A 77, 3331-3335.
Bartel, D.P. (2004). MicroRNAs: genomics, biogenesis, mechanism, and function. Cell 116, 281-297.
Bartel, D.P. (2009). MicroRNAs: target recognition and regulatory functions. Cell 136, 215-233.
Bass, B.L. (2000). Double-stranded RNA as a template for gene silencing. Cell 101, 235-238.
Bazzini, A.A., Lee, M.T., and Giraldez, A.J. (2012). Ribosome profiling shows that miR-430 reduces translation before causing mRNA decay in zebrafish. Science 336, 233-237.
Beltrame, M., and Tollervey, D. (1995). Base pairing between U3 and the pre-ribosomal RNA is required for 18S rRNA synthesis. EMBO J 14, 4350-4356.
Bentwich, I., Avniel, A., Karov, Y., Aharonov, R., Gilad, S., Barad, O., Barzilai, A., Einat, P., Einav, U., Meiri, E., et al. (2005).
Identification of hundreds of conserved and nonconserved human microRNAs. Nat Genet 37, 766-770.
Berezikov, E., van Tetering, G., Verheul, M., van de Belt, J., van Laake, L., Vos, J., Verloop, R., van de Wetering, M., Guryev, V., Takada, S., et al. (2006). Many novel mammalian microRNA candidates identified by extensive cloning and RAKE analysis. Genome Res 16, 1289-1298.
Bernstein, E., Caudy, A.A., Hammond, S.M., and Hannon, G.J. (2001). Role for a bidentate ribonuclease in the initiation step of RNA interference. Nature 409, 363-366.
Bernstein, E., Kim, S.Y., Carmell, M.A., Murchison, E.P., Alcorn, H., Li, M.Z., Mills, A.A., Elledge, S.J., Anderson, K.V., and Hannon, G.J. (2003). Dicer is essential for mouse development. Nat Genet 35, 215-217.
Blow, M.J., Grocock, R.J., van Dongen, S., Enright, A.J., Dicks, E., Futreal, P.A., Wooster, R., and Stratton, M.R. (2006). RNA editing of human microRNAs. Genome Biol 7, R27.
Bohmert, K., Camus, I., Bellini, C., Bouchez, D., Caboche, M., and Benning, C. (1998). AGO1 defines a novel locus of Arabidopsis controlling leaf development. EMBO J 17, 170-180.
Bohnsack, M.T., Czaplinski, K., and Gorlich, D. (2004). Exportin 5 is a RanGTP-dependent dsRNA-binding protein that mediates nuclear export of pre-miRNAs. RNA 10, 185-191.
Bohnsack, M.T., Regener, K., Schwappach, B., Saffrich, R., Paraskeva, E., Hartmann, E., and Gorlich, D. (2002). Exp5 exports eEF1A via tRNA from nuclei and synergizes with other transport pathways to confine translation to the cytoplasm. EMBO J 21, 6205-6215.
Bracht, J., Hunter, S., Eachus, R., Weeks, P., and Pasquinelli, A.E. (2004). Trans-splicing and polyadenylation of let-7 microRNA primary transcripts. RNA 10, 1586-1594.
Bram, R.J., Young, R.A., and Steitz, J.A. (1980). The ribonuclease III site flanking 23S sequences in the 30S ribosomal precursor RNA of E. coli. Cell 19, 393-401.
Brownawell, A.M., and Macara, I.G. (2002).
Exportin-5, a novel karyopherin, mediates nuclear export of double-stranded RNA binding proteins. J Cell Biol 156, 53-64.
Brummelkamp, T.R., Bernards, R., and Agami, R. (2002). A system for stable expression of short interfering RNAs in mammalian cells. Science 296, 550-553.
Butcher, S.E., Dieckmann, T., and Feigon, J. (1997). Solution structure of the conserved 16S-like ribosomal RNA UGAA tetraloop. J Mol Biol 268, 348-358.
Calabrese, J.M., Seila, A.C., Yeo, G.W., and Sharp, P.A. (2007). RNA sequence analysis defines Dicer's role in mouse embryonic stem cells. Proc Natl Acad Sci U S A 104, 18097-18102.
Calado, A., Treichel, N., Muller, E.C., Otto, A., and Kutay, U. (2002). Exportin-5-mediated nuclear export of eukaryotic elongation factor 1A and tRNA. EMBO J 21, 6216-6224.
Calin, G.A., Dumitru, C.D., Shimizu, M., Bichi, R., Zupo, S., Noch, E., Aldler, H., Rattan, S., Keating, M., Rai, K., et al. (2002). Frequent deletions and down-regulation of micro-RNA genes miR15 and miR16 at 13q14 in chronic lymphocytic leukemia. Proc Natl Acad Sci U S A 99, 15524-15529.
Calin, G.A., Ferracin, M., Cimmino, A., Di Leva, G., Shimizu, M., Wojcik, S.E., Iorio, M.V., Visone, R., Sever, N.I., Fabbri, M., et al. (2005). A MicroRNA signature associated with prognosis and progression in chronic lymphocytic leukemia. N Engl J Med 353, 1793-1801.
Calin, G.A., Sevignani, C., Dumitru, C.D., Hyslop, T., Noch, E., Yendamuri, S., Shimizu, M., Rattan, S., Bullrich, F., Negrini, M., et al. (2004). Human microRNA genes are frequently located at fragile sites and genomic regions involved in cancers. Proc Natl Acad Sci U S A 101, 2999-3004.
Caudy, A.A., Myers, M., Hannon, G.J., and Hammond, S.M. (2002). Fragile X-related protein and VIG associate with the RNA interference machinery. Genes Dev 16, 2491-2496.
Cenik, E.S., Fukunaga, R., Lu, G., Dutcher, R., Wang, Y., Tanaka Hall, T.M., and Zamore, P.D. (2011).
Phosphate and R2D2 restrict the substrate specificity of Dicer-2, an ATP-driven ribonuclease. Mol Cell 42, 172-184. Chakravarthy, S., Sternberg, S.H., Kellenberger, C.A., and Doudna, J.A. (2010). Substrate-specific kinetics of Dicer-catalyzed RNA processing. J Mol Biol 404, 392-402. Chanfreau, G., Buckle, M., and Jacquier, A. (2000). Recognition of a conserved class of RNA tetraloops by Saccharomyces cerevisiae RNase III. Proc Natl Acad Sci U S A 97, 3142-3147. Chanfreau, G., Legrain, P., and Jacquier, A. (1998). Yeast RNase III as a key processing enzyme in small nucleolar RNAs metabolism. J Mol Biol 284, 975-988. Chelladurai, B.S., Li, H., and Nicholson, A.W. (1991). A conserved sequence element in ribonuclease III processing signals is not required for accurate in vitro enzymatic cleavage. Nucleic Acids Res 19, 1759-1766. Cheloufi, S., Dos Santos, C.O., Chong, M.M., and Hannon, G.J. (2010). A dicer-independent miRNA biogenesis pathway that requires Ago catalysis. Nature 465, 584-589. Chen, C.Z., Li, L., Lodish, H.F., and Bartel, D.P. (2004). MicroRNAs modulate hematopoietic lineage differentiation. Science 303, 83-86. Chendrimada, T.P., Gregory, R.I., Kumaraswamy, E., Norman, J., Cooch, N., Nishikura, K., and Shiekhattar, R. (2005). TRBP recruits the Dicer complex to Ago2 for microRNA processing and gene silencing. Nature 436, 740-744. Chiang, H.R., Schoenfeld, L.W., Ruby, J.G., Auyeung, V.C., Spies, N., Baek, D., Johnston, W.K., Russ, C., Luo, S., Babiarz, J.E., et al. (2010). Mammalian microRNAs: experimental evaluation of novel and previously annotated genes. Genes Dev 24, 992-1009. Cifuentes, D., Xue, H., Taylor, D.W., Patnode, H., Mishima, Y., Cheloufi, S., Ma, E., Mane, S., Hannon, G.J., Lawson, N.D., et al. (2010). A novel miRNA processing pathway independent of Dicer requires Argonaute2 catalytic activity. Science 328, 1694-1698.
Czech, B., Malone, C.D., Zhou, R., Stark, A., Schlingeheyde, C., Dus, M., Perrimon, N., Kellis, M., Wohlschlegel, J.A., Sachidanandam, R., et al. (2008). An endogenous small interfering RNA pathway in Drosophila. Nature 453, 798-802. Czech, B., Zhou, R., Erlich, Y., Brennecke, J., Binari, R., Villalta, C., Gordon, A., Perrimon, N., and Hannon, G.J. (2009). Hierarchical rules for Argonaute loading in Drosophila. Mol Cell 36, 445-456. Daniels, D.L., Subbarao, M.N., Blattner, F.R., and Lozeron, H.A. (1988). Q-mediated late gene transcription of bacteriophage lambda: RNA start point and RNase III processing sites in vivo. Virology 167, 568-577. Davis, B.N., Hilyard, A.C., Lagna, G., and Hata, A. (2008). SMAD proteins control DROSHA-mediated microRNA maturation. Nature 454, 56-61. Davis, B.N., Hilyard, A.C., Nguyen, P.H., Lagna, G., and Hata, A. (2010). Smad proteins bind a conserved RNA sequence to promote microRNA maturation by Drosha. Mol Cell 39, 373-384. Denli, A.M., Tops, B.B., Plasterk, R.H., Ketting, R.F., and Hannon, G.J. (2004). Processing of primary microRNAs by the Microprocessor complex. Nature 432, 231-235. Doench, J.G., and Sharp, P.A. (2004). Specificity of microRNA target selection in translational repression. Genes Dev 18, 504-511. Dong, Z., Han, M.H., and Fedoroff, N. (2008). The RNA-binding proteins HYL1 and SE promote accurate in vitro processing of pri-miRNA by DCL1. Proc Natl Acad Sci U S A 105, 9970-9975. Dreyfuss, G., Matunis, M.J., Pinol-Roma, S., and Burd, C.G. (1993). hnRNP proteins and the biogenesis of mRNA. Annu Rev Biochem 62, 289-321. Duan, R., Pak, C., and Jin, P. (2007). Single nucleotide polymorphism associated with mature miR-125a alters the processing of pri-miRNA. Hum Mol Genet 16, 1124-1131. Dunn, J.J., and Studier, F.W. (1973a). T7 early RNAs and Escherichia coli ribosomal RNAs are cut from large precursor RNAs in vivo by ribonuclease 3. Proc Natl Acad Sci U S A 70, 3296-3300. Dunn, J.J., and Studier, F.W. (1973b).
T7 early RNAs are generated by site-specific cleavages. Proc Natl Acad Sci U S A 70, 1559-1563. Elbashir, S.M., Harborth, J., Lendeckel, W., Yalcin, A., Weber, K., and Tuschl, T. (2001a). Duplexes of 21-nucleotide RNAs mediate RNA interference in cultured mammalian cells. Nature 411, 494-498. Elbashir, S.M., Lendeckel, W., and Tuschl, T. (2001b). RNA interference is mediated by 21- and 22-nucleotide RNAs. Genes Dev 15, 188-200. Elela, S.A., Igel, H., and Ares, M., Jr. (1996). RNase III cleaves eukaryotic preribosomal RNA at a U3 snoRNP-dependent site. Cell 85, 115-124. Fagard, M., Boutet, S., Morel, J.B., Bellini, C., and Vaucheret, H. (2000). AGO1, QDE-2, and RDE-1 are related proteins required for post-transcriptional gene silencing in plants, quelling in fungi, and RNA interference in animals. Proc Natl Acad Sci U S A 97, 11650-11654. Faller, M., Toso, D., Matsunaga, M., Atanasov, I., Senturia, R., Chen, Y., Zhou, Z.H., and Guo, F. (2010). DGCR8 recognizes primary transcripts of microRNAs through highly cooperative binding and formation of higher-order structures. RNA 16, 1570-1583. Farh, K.K., Grimson, A., Jan, C., Lewis, B.P., Johnston, W.K., Lim, L.P., Burge, C.B., and Bartel, D.P. (2005). The widespread impact of mammalian MicroRNAs on mRNA repression and evolution. Science 310, 1817-1821. Fellmann, C., Zuber, J., McJunkin, K., Chang, K., Malone, C.D., Dickins, R.A., Xu, Q., Hengartner, M.O., Elledge, S.J., Hannon, G.J., et al. (2011). Functional identification of optimized RNAi triggers using a massively parallel sensor assay. Mol Cell 41, 733-746. Feng, Y., Zhang, X., Song, Q., Li, T., and Zeng, Y. (2011). Drosha processing controls the specificity and efficiency of global microRNA expression. Biochim Biophys Acta 1809, 700-707. Filippov, V., Solovyev, V., Filippova, M., and Gill, S.S. (2000). A novel type of RNase III family proteins in eukaryotes. Gene 245, 213-221. Fire, A., Albertson, D., Harrison, S.W., and Moerman, D.G. (1991).
Production of antisense RNA leads to effective and specific inhibition of gene expression in C. elegans muscle. Development 113, 503-514. Fire, A., Xu, S., Montgomery, M.K., Kostas, S.A., Driver, S.E., and Mello, C.C. (1998). Potent and specific genetic interference by double-stranded RNA in Caenorhabditis elegans. Nature 391, 806-811. Flynt, A.S., Greimann, J.C., Chung, W.J., Lima, C.D., and Lai, E.C. (2010). MicroRNA biogenesis via splicing and exosome-mediated trimming in Drosophila. Mol Cell 38, 900-907. Forstemann, K., Horwich, M.D., Wee, L., Tomari, Y., and Zamore, P.D. (2007). Drosophila microRNAs are sorted into functionally distinct argonaute complexes after production by dicer-1. Cell 130, 287-297. Forstemann, K., Tomari, Y., Du, T., Vagin, V.V., Denli, A.M., Bratu, D.P., Klattenhoff, C., Theurkauf, W.E., and Zamore, P.D. (2005). Normal microRNA maturation and germ-line stem cell maintenance requires Loquacious, a double-stranded RNA-binding domain protein. PLoS Biol 3, e236. Frank, F., Sonenberg, N., and Nagar, B. (2010). Structural basis for 5'-nucleotide base-specific recognition of guide RNA by human AGO2. Nature 465, 818-822. Friedman, R.C., Farh, K.K., Burge, C.B., and Bartel, D.P. (2009). Most mammalian mRNAs are conserved targets of microRNAs. Genome Res 19, 92-105. Fukuda, T., Yamagata, K., Fujiyama, S., Matsumoto, T., Koshida, I., Yoshimura, K., Mihara, M., Naitou, M., Endoh, H., Nakamura, T., et al. (2007). DEAD-box RNA helicase subunits of the Drosha complex are required for processing of rRNA and a subset of microRNAs. Nat Cell Biol 9, 604-611. Gan, J., Shaw, G., Tropea, J.E., Waugh, D.S., Court, D.L., and Ji, X. (2008). A stepwise model for double-stranded RNA processing by ribonuclease III. Mol Microbiol 67, 143-154. Gan, J., Tropea, J.E., Austin, B.P., Court, D.L., Waugh, D.S., and Ji, X. (2006). Structural insight into the mechanism of double-stranded RNA processing by ribonuclease III. Cell 124, 355-366.
Garcia, D.M., Baek, D., Shin, C., Bell, G.W., Grimson, A., and Bartel, D.P. (2011). Weak seed-pairing stability and high target-site abundance decrease the proficiency of lsy-6 and other microRNAs. Nat Struct Mol Biol 18, 1139-1146. Ghildiyal, M., Xu, J., Seitz, H., Weng, Z., and Zamore, P.D. (2010). Sorting of Drosophila small silencing RNAs partitions microRNA* strands into the RNA interference pathway. RNA 16, 43-56. Gottwein, E., Cai, X., and Cullen, B.R. (2006). A novel assay for viral microRNA function identifies a single nucleotide polymorphism that affects Drosha processing. J Virol 80, 5321-5326. Grad, Y., Aach, J., Hayes, G.D., Reinhart, B.J., Church, G.M., Ruvkun, G., and Kim, J. (2003). Computational and experimental identification of C. elegans microRNAs. Mol Cell 11, 1253-1263. Greenbaum, N.L., Radhakrishnan, I., Patel, D.J., and Hirsh, D. (1996). Solution structure of the donor site of a trans-splicing RNA. Structure 4, 725-733. Gregory, R.I., Yan, K.P., Amuthan, G., Chendrimada, T., Doratotaj, B., Cooch, N., and Shiekhattar, R. (2004). The Microprocessor complex mediates the genesis of microRNAs. Nature 432, 235-240. Griffiths-Jones, S., Grocock, R.J., van Dongen, S., Bateman, A., and Enright, A.J. (2006). miRBase: microRNA sequences, targets and gene nomenclature. Nucleic Acids Res 34, D140-144. Grimson, A., Farh, K.K., Johnston, W.K., Garrett-Engele, P., Lim, L.P., and Bartel, D.P. (2007). MicroRNA targeting specificity in mammals: determinants beyond seed pairing. Mol Cell 27, 91-105. Grishok, A., Pasquinelli, A.E., Conte, D., Li, N., Parrish, S., Ha, I., Baillie, D.L., Fire, A., Ruvkun, G., and Mello, C.C. (2001). Genes and mechanisms related to RNA interference regulate expression of the small temporal RNAs that control C. elegans developmental timing. Cell 106, 23-34. Guil, S., and Caceres, J.F. (2007). The multifunctional RNA-binding protein hnRNP A1 is required for processing of miR-18a. Nat Struct Mol Biol 14, 591-596.
Guo, H., Ingolia, N.T., Weissman, J.S., and Bartel, D.P. (2010). Mammalian microRNAs predominantly act to decrease target mRNA levels. Nature 466, 835-840. Guo, S., and Kemphues, K.J. (1995). par-1, a gene required for establishing polarity in C. elegans embryos, encodes a putative Ser/Thr kinase that is asymmetrically distributed. Cell 81, 611-620. Gwizdek, C., Ossareh-Nazari, B., Brownawell, A.M., Doglio, A., Bertrand, E., Macara, I.G., and Dargemont, C. (2003). Exportin-5 mediates nuclear export of minihelix-containing RNAs. J Biol Chem 278, 5505-5508. Haase, A.D., Jaskiewicz, L., Zhang, H., Laine, S., Sack, R., Gatignol, A., and Filipowicz, W. (2005). TRBP, a regulator of cellular PKR and HIV-1 virus expression, interacts with Dicer and functions in RNA silencing. EMBO Rep 6, 961-967. Hagan, J.P., Piskounova, E., and Gregory, R.I. (2009). Lin28 recruits the TUTase Zcchc11 to inhibit let-7 maturation in mouse embryonic stem cells. Nat Struct Mol Biol 16, 1021-1025. Hamilton, A.J., and Baulcombe, D.C. (1999). A species of small antisense RNA in posttranscriptional gene silencing in plants. Science 286, 950-952. Hammond, S.M., Bernstein, E., Beach, D., and Hannon, G.J. (2000). An RNA-directed nuclease mediates post-transcriptional gene silencing in Drosophila cells. Nature 404, 293-296. Hammond, S.M., Boettcher, S., Caudy, A.A., Kobayashi, R., and Hannon, G.J. (2001). Argonaute2, a link between genetic and biochemical analyses of RNAi. Science 293, 1146-1150. Han, J., Lee, Y., Yeom, K.H., Kim, Y.K., Jin, H., and Kim, V.N. (2004). The Drosha-DGCR8 complex in primary microRNA processing. Genes Dev 18, 3016-3027. Han, J., Lee, Y., Yeom, K.H., Nam, J.W., Heo, I., Rhee, J.K., Sohn, S.Y., Cho, Y., Zhang, B.T., and Kim, V.N. (2006). Molecular basis for the recognition of primary microRNAs by the Drosha-DGCR8 complex. Cell 125, 887-901. Han, J., Pedersen, J.S., Kwon, S.C., Belair, C.D., Kim, Y.K., Yeom, K.H., Yang, W.Y., Haussler, D., Blelloch, R., and Kim, V.N. (2009).
Posttranscriptional crossregulation between Drosha and DGCR8. Cell 136, 75-84. Hartig, J.V., Esslinger, S., Bottcher, R., Saito, K., and Forstemann, K. (2009). Endo-siRNAs depend on a new isoform of loquacious and target artificially introduced, high-copy sequences. EMBO J 28, 2932-2944. Hendrickson, D.G., Hogan, D.J., McCullough, H.L., Myers, J.W., Herschlag, D., Ferrell, J.E., and Brown, P.O. (2009). Concordant regulation of translation and mRNA abundance for hundreds of targets of a human microRNA. PLoS Biol 7, e1000238. Heo, I., Joo, C., Cho, J., Ha, M., Han, J., and Kim, V.N. (2008). Lin28 mediates the terminal uridylation of let-7 precursor MicroRNA. Mol Cell 32, 276-284. Heo, I., Joo, C., Kim, Y.K., Ha, M., Yoon, M.J., Cho, J., Yeom, K.H., Han, J., and Kim, V.N. (2009). TUT4 in concert with Lin28 suppresses microRNA biogenesis through pre-microRNA uridylation. Cell 138, 696-708. Hughes, J.A., Brown, L.R., and Ferro, A.J. (1987). Nucleotide sequence and analysis of the coliphage T3 S-adenosylmethionine hydrolase gene and its surrounding ribonuclease III processing sites. Nucleic Acids Res 15, 717-729. Hutvagner, G., McLachlan, J., Pasquinelli, A.E., Balint, E., Tuschl, T., and Zamore, P.D. (2001). A cellular function for the RNA-interference enzyme Dicer in the maturation of the let-7 small temporal RNA. Science 293, 834-838. Izant, J.G., and Weintraub, H. (1984). Inhibition of thymidine kinase gene expression by antisense RNA: a molecular approach to genetic analysis. Cell 36, 1007-1015. Jacobsen, S.E., Running, M.P., and Meyerowitz, E.M. (1999). Disruption of an RNA helicase/RNAse III gene in Arabidopsis causes unregulated cell division in floral meristems. Development 126, 5231-5243. Jiang, F., Ye, X., Liu, X., Fincher, L., McKearin, D., and Liu, Q. (2005). Dicer-1 and R3D1-L catalyze microRNA maturation in Drosophila. Genes Dev 19, 1674-1679. Jones-Rhoades, M.W., Bartel, D.P., and Bartel, B. (2006). MicroRNAS and their regulatory roles in plants.
Annu Rev Plant Biol 57, 19-53. Kawahara, Y., Megraw, M., Kreider, E., Iizasa, H., Valente, L., Hatzigeorgiou, A.G., and Nishikura, K. (2008). Frequency and fate of microRNA editing in human brain. Nucleic Acids Res 36, 5270-5280. Kawahara, Y., Zinshteyn, B., Chendrimada, T.P., Shiekhattar, R., and Nishikura, K. (2007). RNA editing of the microRNA-151 precursor blocks cleavage by the Dicer-TRBP complex. EMBO Rep 8, 763-769. Kawamata, T., Seitz, H., and Tomari, Y. (2009). Structural determinants of miRNAs for RISC loading and slicer-independent unwinding. Nat Struct Mol Biol 16, 953-960. Kawamata, T., Yoda, M., and Tomari, Y. (2011). Multilayer checkpoints for microRNA authenticity during RISC assembly. EMBO Rep 12, 944-949. Ketting, R.F. (2011). The many faces of RNAi. Dev Cell 20, 148-161. Ketting, R.F., Fischer, S.E., Bernstein, E., Sijen, T., Hannon, G.J., and Plasterk, R.H. (2001). Dicer functions in RNA interference and in synthesis of small RNA involved in developmental timing in C. elegans. Genes Dev 15, 2654-2659. Khvorova, A., Reynolds, A., and Jayasena, S.D. (2003). Functional siRNAs and miRNAs exhibit strand bias. Cell 115, 209-216. Knight, S.W., and Bass, B.L. (2001). A role for the RNase III enzyme DCR-1 in RNA interference and germ line development in Caenorhabditis elegans. Science 293, 2269-2271. Krinke, L., and Wulff, D.L. (1990). The cleavage specificity of RNase III. Nucleic Acids Res 18, 4809-4815. Kumar, M.S., Lu, J., Mercer, K.L., Golub, T.R., and Jacks, T. (2007). Impaired microRNA processing enhances cellular transformation and tumorigenesis. Nat Genet 39, 673-677. Kurihara, Y., and Watanabe, Y. (2004). Arabidopsis micro-RNA biogenesis through Dicer-like 1 protein functions. Proc Natl Acad Sci U S A 101, 12753-12758. Lagos-Quintana, M., Rauhut, R., Lendeckel, W., and Tuschl, T. (2001). Identification of novel genes coding for small expressed RNAs. Science 294, 853-858. Lamontagne, B., and Elela, S.A. (2004).
Evaluation of the RNA determinants for bacterial and yeast RNase III binding and cleavage. J Biol Chem 279, 2231-2241. Lamontagne, B., Ghazal, G., Lebars, I., Yoshizawa, S., Fourmy, D., and Elela, S.A. (2003). Sequence dependence of substrate recognition and cleavage by yeast RNase III. J Mol Biol 327, 985-1000. Landgraf, P., Rusu, M., Sheridan, R., Sewer, A., Iovino, N., Aravin, A., Pfeffer, S., Rice, A., Kamphorst, A.O., Landthaler, M., et al. (2007). A mammalian microRNA expression atlas based on small RNA library sequencing. Cell 129, 1401-1414. Landthaler, M., Yalcin, A., and Tuschl, T. (2004). The human DiGeorge syndrome critical region gene 8 and Its D. melanogaster homolog are required for miRNA biogenesis. Curr Biol 14, 2162-2167. Lau, N.C., Lim, L.P., Weinstein, E.G., and Bartel, D.P. (2001). An abundant class of tiny RNAs with probable regulatory roles in Caenorhabditis elegans. Science 294, 858-862. Lau, P.W., Guiley, K.Z., De, N., Potter, C.S., Carragher, B., and MacRae, I.J. (2012). The molecular architecture of human Dicer. Nat Struct Mol Biol 19, 436-440. Lee, R.C., and Ambros, V. (2001). An extensive class of small RNAs in Caenorhabditis elegans. Science 294, 862-864. Lee, R.C., Feinbaum, R.L., and Ambros, V. (1993). The C. elegans heterochronic gene lin-4 encodes small RNAs with antisense complementarity to lin-14. Cell 75, 843-854. Lee, Y., Ahn, C., Han, J., Choi, H., Kim, J., Yim, J., Lee, J., Provost, P., Radmark, O., Kim, S., et al. (2003). The nuclear RNase III Drosha initiates microRNA processing. Nature 425, 415-419. Lee, Y., Hur, I., Park, S.Y., Kim, Y.K., Suh, M.R., and Kim, V.N. (2006). The role of PACT in the RNA silencing pathway. EMBO J 25, 522-532. Lee, Y., Jeon, K., Lee, J.T., Kim, S., and Kim, V.N. (2002). MicroRNA maturation: stepwise processing and subcellular localization. EMBO J 21, 4663-4670. Lee, Y., Kim, M., Han, J., Yeom, K.H., Lee, S., Baek, S.H., and Kim, V.N. (2004a).
MicroRNA genes are transcribed by RNA polymerase II. EMBO J 23, 4051-4060. Lee, Y.S., Nakahara, K., Pham, J.W., Kim, K., He, Z., Sontheimer, E.J., and Carthew, R.W. (2004b). Distinct roles for Drosophila Dicer-1 and Dicer-2 in the siRNA/miRNA silencing pathways. Cell 117, 69-81. Lehrbach, N.J., Armisen, J., Lightfoot, H.L., Murfitt, K.J., Bugaut, A., Balasubramanian, S., and Miska, E.A. (2009). LIN-28 and the poly(U) polymerase PUP-2 regulate let-7 microRNA processing in Caenorhabditis elegans. Nat Struct Mol Biol 16, 1016-1020. Lim, L.P., Glasner, M.E., Yekta, S., Burge, C.B., and Bartel, D.P. (2003a). Vertebrate microRNA genes. Science 299, 1540. Lim, L.P., Lau, N.C., Weinstein, E.G., Abdelhakim, A., Yekta, S., Rhoades, M.W., Burge, C.B., and Bartel, D.P. (2003b). The microRNAs of Caenorhabditis elegans. Genes Dev 17, 991-1008. Liu, J., Carmell, M.A., Rivas, F.V., Marsden, C.G., Thomson, J.M., Song, J.J., Hammond, S.M., Joshua-Tor, L., and Hannon, G.J. (2004). Argonaute2 is the catalytic engine of mammalian RNAi. Science 305, 1437-1441. Liu, Q., Rand, T.A., Kalidas, S., Du, F., Kim, H.E., Smith, D.P., and Wang, X. (2003). R2D2, a bridge between the initiation and effector steps of the Drosophila RNAi pathway. Science 301, 1921-1925. Liu, X., Park, J.K., Jiang, F., Liu, Y., McKearin, D., and Liu, Q. (2007). Dicer-1, but not Loquacious, is critical for assembly of miRNA-induced silencing complexes. RNA 13, 2324-2329. Llave, C., Xie, Z., Kasschau, K.D., and Carrington, J.C. (2002). Cleavage of Scarecrow-like mRNA targets directed by a class of Arabidopsis miRNA. Science 297, 2053-2056. Lu, J., Getz, G., Miska, E.A., Alvarez-Saavedra, E., Lamb, J., Peck, D., Sweet-Cordero, A., Ebert, B.L., Mak, R.H., Ferrando, A.A., et al. (2005). MicroRNA expression profiles classify human cancers. Nature 435, 834-838. Lund, E., Guttinger, S., Calado, A., Dahlberg, J.E., and Kutay, U. (2004). Nuclear export of microRNA precursors. Science 303, 95-98. Ma, J.B., Ye, K., and Patel, D.J.
(2004). Structural basis for overhang-specific small interfering RNA recognition by the PAZ domain. Nature 429, 318-322. MacRae, I.J., Zhou, K., and Doudna, J.A. (2007). Structural determinants of RNA recognition and cleavage by Dicer. Nat Struct Mol Biol 14, 934-940. MacRae, I.J., Zhou, K., Li, F., Repic, A., Brooks, A.N., Cande, W.Z., Adams, P.D., and Doudna, J.A. (2006). Structural basis for double-stranded RNA processing by Dicer. Science 311, 195-198. Mateos, J.L., Bologna, N.G., Chorostecki, U., and Palatnik, J.F. (2010). Identification of microRNA processing determinants by random mutagenesis of Arabidopsis MIR172a precursor. Curr Biol 20, 49-54. Mayr, C., Hemann, M.T., and Bartel, D.P. (2007). Disrupting the pairing between let-7 and Hmga2 enhances oncogenic transformation. Science 315, 1576-1579. Meister, G., Landthaler, M., Patkaniowska, A., Dorsett, Y., Teng, G., and Tuschl, T. (2004). Human Argonaute2 mediates RNA cleavage targeted by miRNAs and siRNAs. Mol Cell 15, 185-197. Mian, I.S. (1997). Comparative sequence analysis of ribonucleases HII, III, II PH and D. Nucleic Acids Res 25, 3187-3195. Michlewski, G., and Caceres, J.F. (2010). Antagonistic role of hnRNP A1 and KSRP in the regulation of let-7a biogenesis. Nat Struct Mol Biol 17, 1011-1018. Michlewski, G., Guil, S., Semple, C.A., and Caceres, J.F. (2008). Posttranscriptional regulation of miRNAs harboring conserved terminal loops. Mol Cell 32, 383-393. Miyoshi, K., Miyoshi, T., and Siomi, H. (2010). Many ways to generate microRNA-like small RNAs: non-canonical pathways for microRNA production. Mol Genet Genomics 284, 95-103. Miyoshi, K., Okada, T.N., Siomi, H., and Siomi, M.C. (2009). Characterization of the miRNA-RISC loading complex and miRNA-RISC formed in the Drosophila miRNA pathway. RNA 15, 1282-1291. Moffat, J., Grueneberg, D.A., Yang, X., Kim, S.Y., Kloepfer, A.M., Hinkle, G., Piqani, B., Eisenhaure, T.M., Luo, B., Grenier, J.K., et al. (2006).
A lentiviral RNAi library for human and mouse genes applied to an arrayed viral high-content screen. Cell 124, 1283-1298. Montgomery, M.K., and Fire, A. (1998). Double-stranded RNA as a mediator in sequence-specific genetic silencing and co-suppression. Trends Genet 14, 255-258. Murphy, D., Dancis, B., and Brown, J.R. (2008). The evolution of core proteins involved in microRNA biogenesis. BMC Evol Biol 8, 92. Nakamura, T., Canaani, E., and Croce, C.M. (2007). Oncogenic All1 fusion proteins target Drosha-mediated microRNA processing. Proc Natl Acad Sci U S A 104, 10980-10985. Nam, J.W., Shin, K.R., Han, J., Lee, Y., Kim, V.N., and Zhang, B.T. (2005). Human microRNA prediction through a probabilistic co-learning model of sequence and structure. Nucleic Acids Res 33, 3570-3581. Nam, Y., Chen, C., Gregory, R.I., Chou, J.J., and Sliz, P. (2011). Molecular Basis for Interaction of let-7 MicroRNAs with Lin28. Cell. Napoli, C., Lemieux, C., and Jorgensen, R. (1990). Introduction of a Chimeric Chalcone Synthase Gene into Petunia Results in Reversible Co-Suppression of Homologous Genes in trans. Plant Cell 2, 279-289. Newman, M.A., Thomson, J.M., and Hammond, S.M. (2008). Lin-28 interaction with the Let-7 precursor loop mediates regulated microRNA processing. RNA 14, 1539-1549. Nishikura, K. (2010). Functions and regulation of RNA editing by ADAR deaminases. Annu Rev Biochem 79, 321-349. Nykanen, A., Haley, B., and Zamore, P.D. (2001). ATP requirements and small interfering RNA structure in the RNA interference pathway. Cell 107, 309-321. O'Connell, R.M., Rao, D.S., Chaudhuri, A.A., Boldin, M.P., Taganov, K.D., Nicoll, J., Paquette, R.L., and Baltimore, D. (2008). Sustained expression of microRNA-155 in hematopoietic stem cells causes a myeloproliferative disorder. J Exp Med 205, 585-594. Okada, C., Yamashita, E., Lee, S.J., Shibata, S., Katahira, J., Nakagawa, A., Yoneda, Y., and Tsukihara, T. (2009). A high-resolution structure of the pre-microRNA nuclear export machinery.
Science 326, 1275-1279. Okamura, K., Chung, W.J., Ruby, J.G., Guo, H., Bartel, D.P., and Lai, E.C. (2008). The Drosophila hairpin RNA pathway generates endogenous short interfering RNAs. Nature 453, 803-806. Okamura, K., Hagen, J.W., Duan, H., Tyler, D.M., and Lai, E.C. (2007). The mirtron pathway generates microRNA-class regulatory RNAs in Drosophila. Cell 130, 89-100. Okamura, K., Liu, N., and Lai, E.C. (2009). Distinct mechanisms for microRNA strand selection by Drosophila Argonautes. Mol Cell 36, 431-444. Paddison, P.J., Caudy, A.A., Bernstein, E., Hannon, G.J., and Conklin, D.S. (2002). Short hairpin RNAs (shRNAs) induce sequence-specific silencing in mammalian cells. Genes Dev 16, 948-958. Park, J.E., Heo, I., Tian, Y., Simanshu, D.K., Chang, H., Jee, D., Patel, D.J., and Kim, V.N. (2011). Dicer recognizes the 5' end of RNA for efficient and accurate processing. Nature 475, 201-205. Park, M.Y., Wu, G., Gonzalez-Sulser, A., Vaucheret, H., and Poethig, R.S. (2005). Nuclear processing and export of microRNAs in Arabidopsis. Proc Natl Acad Sci U S A 102, 3691-3696. Park, W., Li, J., Song, R., Messing, J., and Chen, X. (2002). CARPEL FACTORY, a Dicer homolog, and HEN1, a novel protein, act in microRNA metabolism in Arabidopsis thaliana. Curr Biol 12, 1484-1495. Pasquinelli, A.E., Reinhart, B.J., Slack, F., Martindale, M.Q., Kuroda, M.I., Maller, B., Hayward, D.C., Ball, E.E., Degnan, B., Muller, P., et al. (2000). Conservation of the sequence and temporal expression of let-7 heterochronic regulatory RNA. Nature 408, 86-89. Pertzev, A.V., and Nicholson, A.W. (2006). Characterization of RNA sequence determinants and antideterminants of processing reactivity for a minimal substrate of Escherichia coli ribonuclease III. Nucleic Acids Res 34, 3708-3721. Piskounova, E., Polytarchou, C., Thornton, J.E., LaPierre, R.J., Pothoulakis, C., Hagan, J.P., Iliopoulos, D., and Gregory, R.I. (2011). Lin28A and Lin28B inhibit let-7 microRNA biogenesis by distinct mechanisms.
Cell 147, 1066-1079. Piskounova, E., Viswanathan, S.R., Janas, M., LaPierre, R.J., Daley, G.Q., Sliz, P., and Gregory, R.I. (2008). Determinants of microRNA processing inhibition by the developmentally regulated RNA-binding protein Lin28. J Biol Chem 283, 21310-21314. Portier, C., Dondon, L., Grunberg-Manago, M., and Regnier, P. (1987). The first step in the functional inactivation of the Escherichia coli polynucleotide phosphorylase messenger is a ribonuclease III processing at the 5' end. EMBO J 6, 2165-2170. Regnier, P., and Grunberg-Manago, M. (1989). Cleavage by RNase III in the transcripts of the met Y-nus-A-infB operon of Escherichia coli releases the tRNA and initiates the decay of the downstream mRNA. J Mol Biol 210, 293-302. Regnier, P., and Portier, C. (1986). Initiation, attenuation and RNase III processing of transcripts from the Escherichia coli operon encoding ribosomal protein S15 and polynucleotide phosphorylase. J Mol Biol 187, 23-32. Reinhart, B.J., Slack, F.J., Basson, M., Pasquinelli, A.E., Bettinger, J.C., Rougvie, A.E., Horvitz, H.R., and Ruvkun, G. (2000). The 21-nucleotide let-7 RNA regulates developmental timing in Caenorhabditis elegans. Nature 403, 901-906. Reinhart, B.J., Weinstein, E.G., Rhoades, M.W., Bartel, B., and Bartel, D.P. (2002). MicroRNAs in plants. Genes Dev 16, 1616-1626. Rivas, F.V., Tolia, N.H., Song, J.J., Aragon, J.P., Liu, J., Hannon, G.J., and Joshua-Tor, L. (2005). Purified Argonaute2 and an siRNA form recombinant human RISC. Nat Struct Mol Biol 12, 340-349. Robertson, H.D. (1982). Escherichia coli ribonuclease III cleavage sites. Cell 30, 669-672. Robertson, H.D., and Dunn, J.J. (1975). Ribonucleic acid processing activity of Escherichia coli ribonuclease III. J Biol Chem 250, 3050-3056. Robertson, H.D., Webster, R.E., and Zinder, N.D. (1967). A nuclease specific for double-stranded RNA. Virology 32, 718-719. Robertson, H.D., Webster, R.E., and Zinder, N.D. (1968).
Purification and properties of ribonuclease III from Escherichia coli. J Biol Chem 243, 82-91. Romano, N., and Macino, G. (1992). Quelling: transient inactivation of gene expression in Neurospora crassa by transformation with homologous sequences. Mol Microbiol 6, 3343-3353. Rotondo, G., and Frendewey, D. (1996). Purification and characterization of the Pac1 ribonuclease of Schizosaccharomyces pombe. Nucleic Acids Res 24, 2377-2386. Ruby, J.G., Jan, C., Player, C., Axtell, M.J., Lee, W., Nusbaum, C., Ge, H., and Bartel, D.P. (2006). Large-scale sequencing reveals 21U-RNAs and additional microRNAs and endogenous siRNAs in C. elegans. Cell 127, 1193-1207. Ruby, J.G., Jan, C.H., and Bartel, D.P. (2007a). Intronic microRNA precursors that bypass Drosha processing. Nature 448, 83-86. Ruby, J.G., Stark, A., Johnston, W.K., Kellis, M., Bartel, D.P., and Lai, E.C. (2007b). Evolution, biogenesis, expression, and target predictions of a substantially expanded set of Drosophila microRNAs. Genome Res 17, 1850-1864. Ruvkun, G., Wightman, B., and Ha, I. (2004). The 20 years it took to recognize the importance of tiny RNAs. Cell 116, S93-96, 92 p following S96. Saetrom, P., Heale, B.S., Snove, O., Jr., Aagaard, L., Alluin, J., and Rossi, J.J. (2007). Distance constraints between microRNA target sites dictate efficacy and cooperativity. Nucleic Acids Res 35, 2333-2342. Saito, K., Ishizuka, A., Siomi, H., and Siomi, M.C. (2005). Processing of pre-microRNAs by the Dicer-1-Loquacious complex in Drosophila cells. PLoS Biol 3, e235. Schauer, S.E., Jacobsen, S.E., Meinke, D.W., and Ray, A. (2002). DICER-LIKE1: blind men and elephants in Arabidopsis development. Trends Plant Sci 7, 487-491. Schwarz, D.S., Hutvagner, G., Du, T., Xu, Z., Aronin, N., and Zamore, P.D. (2003). Asymmetry in the assembly of the RNAi enzyme complex. Cell 115, 199-208. Seitz, H., Tushir, J.S., and Zamore, P.D. (2011).
A 5'-uridine amplifies miRNA/miRNA* asymmetry in Drosophila by promoting RNA-induced silencing complex formation. Silence 2, 4. Shi, Y., Wang, Y.F., Jayaraman, L., Yang, H., Massague, J., and Pavletich, N.P. (1998). Crystal structure of a Smad MH1 domain bound to DNA: insights on DNA binding in TGF-beta signaling. Cell 94, 585-594. Shin, C., Nam, J.W., Farh, K.K., Chiang, H.R., Shkumatava, A., and Bartel, D.P. (2010). Expanding the microRNA targeting code: functional sites with centered pairing. Mol Cell 38, 789-802. Sohn, S.Y., Bae, W.J., Kim, J.J., Yeom, K.H., Kim, V.N., and Cho, Y. (2007). Crystal structure of human DGCR8 core. Nat Struct Mol Biol 14, 847-853. Song, L., Axtell, M.J., and Fedoroff, N.V. (2010). RNA secondary structural determinants of miRNA precursor processing in Arabidopsis. Curr Biol 20, 37-41. Stark, A., Brennecke, J., Bushati, N., Russell, R.B., and Cohen, S.M. (2005). Animal MicroRNAs confer robustness to gene expression and have a significant impact on 3'UTR evolution. Cell 123, 1133-1146. Stark, A., Kheradpour, P., Parts, L., Brennecke, J., Hodges, E., Hannon, G.J., and Kellis, M. (2007). Systematic discovery and characterization of fly microRNAs using 12 Drosophila genomes. Genome Res 17, 1865-1879. Steiner, F.A., Hoogstrate, S.W., Okihara, K.L., Thijssen, K.L., Ketting, R.F., Plasterk, R.H., and Sijen, T. (2007). Structural features of small RNA precursors determine Argonaute loading in Caenorhabditis elegans. Nat Struct Mol Biol 14, 927-933. Sun, G., Yan, J., Noltner, K., Feng, J., Li, H., Sarkis, D.A., Sommer, S.S., and Rossi, J.J. (2009). SNPs in human miRNA genes affect biogenesis and function. RNA 15, 1640-1651. Suzuki, H.I., Yamagata, K., Sugimoto, K., Iwamoto, T., Kato, S., and Miyazono, K. (2009). Modulation of microRNA processing by p53. Nature 460, 529-533. Tang, G., Reinhart, B.J., Bartel, D.P., and Zamore, P.D. (2003). A biochemical framework for RNA silencing in plants. Genes Dev 17, 49-63.
Thai, T.H., Calado, D.P., Casola, S., Ansel, K.M., Xiao, C., Xue, Y., Murphy, A., Frendewey, D., Valenzuela, D., Kutok, J.L., et al. (2007). Regulation of the germinal center response by microRNA-155. Science 316, 604-608. Tomari, Y., Du, T., and Zamore, P.D. (2007). Sorting of Drosophila small silencing RNAs. Cell 130, 299-308. Trabucchi, M., Briata, P., Filipowicz, W., Ramos, A., Gherzi, R., and Rosenfeld, M.G. (2010). KSRP promotes the maturation of a group of miRNA precursors. Adv Exp Med Biol 700, 36-42. Tsutsumi, A., Kawamata, T., Izumi, N., Seitz, H., and Tomari, Y. (2011). Recognition of the pre-miRNA structure by Drosophila Dicer-1. Nat Struct Mol Biol 18, 1153-1158. Tuschl, T., Zamore, P.D., Lehmann, R., Bartel, D.P., and Sharp, P.A. (1999). Targeted mRNA degradation by double-stranded RNA in vitro. Genes Dev 13, 3191-3197. Ui-Tei, K., Naito, Y., Nishi, K., Juni, A., and Saigo, K. (2008). Thermodynamic stability and Watson-Crick base pairing in the seed duplex are major determinants of the efficiency of the siRNA-based off-target effect. Nucleic Acids Res 36, 7100-7109. van der Krol, A.R., Mur, L.A., Beld, M., Mol, J.N., and Stuitje, A.R. (1990). Flavonoid genes in petunia: addition of a limited number of gene copies may lead to a suppression of gene expression. Plant Cell 2, 291-299. Vermeulen, A., Behlen, L., Reynolds, A., Wolfson, A., Marshall, W.S., Karpilow, J., and Khvorova, A. (2005). The contributions of dsRNA structure to Dicer specificity and efficiency. RNA 11, 674-682. Viswanathan, S.R., Daley, G.Q., and Gregory, R.I. (2008). Selective blockade of microRNA processing by Lin28. Science 320, 97-100. Wang, D., Zhang, Z., O'Loughlin, E., Lee, T., Houel, S., O'Carroll, D., Tarakhovsky, A., Ahn, N.G., and Yi, R. (2012). Quantitative functions of Argonaute proteins in mammalian development. Genes Dev 26, 693-704. Wang, Y., Medvid, R., Melton, C., Jaenisch, R., and Blelloch, R. (2007).
DGCR8 is essential for microRNA biogenesis and silencing of embryonic stem cell self-renewal. Nat Genet 39, 380-385.
Warf, M.B., Johnson, W.E., and Bass, B.L. (2011). Improved annotation of C. elegans microRNAs by deep sequencing reveals structures associated with processing by Drosha and Dicer. RNA 17, 563-577.
Weinberg, D.E., Nakanishi, K., Patel, D.J., and Bartel, D.P. (2011). The inside-out mechanism of Dicers from budding yeasts. Cell 146, 262-276.
Wightman, B., Ha, I., and Ruvkun, G. (1993). Posttranscriptional regulation of the heterochronic gene lin-14 by lin-4 mediates temporal pattern formation in C. elegans. Cell 75, 855-862.
Wu, H., Henras, A., Chanfreau, G., and Feigon, J. (2004). Structural basis for recognition of the AGNN tetraloop RNA fold by the double-stranded RNA-binding domain of Rnt1p RNase III. Proc Natl Acad Sci U S A 101, 8307-8312.
Wu, H., Xu, H., Miraglia, L.J., and Crooke, S.T. (2000). Human RNase III is a 160-kDa protein involved in preribosomal RNA processing. J Biol Chem 275, 36957-36965.
Wu, H., Yang, P.K., Butcher, S.E., Kang, S., Chanfreau, G., and Feigon, J. (2001). A novel family of RNA tetraloop structure forms the recognition site for Saccharomyces cerevisiae RNase III. EMBO J 20, 7240-7249.
Yang, W., Chendrimada, T.P., Wang, Q., Higuchi, M., Seeburg, P.H., Shiekhattar, R., and Nishikura, K. (2006). Modulation of microRNA processing and expression through RNA editing by ADAR deaminases. Nat Struct Mol Biol 13, 13-21.
Yekta, S., Shih, I.H., and Bartel, D.P. (2004). MicroRNA-directed cleavage of HOXB8 mRNA. Science 304, 594-596.
Yeom, K.H., Lee, Y., Han, J., Suh, M.R., and Kim, V.N. (2006). Characterization of DGCR8/Pasha, the essential cofactor for Drosha in primary miRNA processing. Nucleic Acids Res 34, 4622-4629.
Yi, R., Qin, Y., Macara, I.G., and Cullen, B.R. (2003). Exportin-5 mediates the nuclear export of pre-microRNAs and short hairpin RNAs. Genes Dev 17, 3011-3016.
Yoda, M., Kawamata, T., Paroo, Z., Ye, X., Iwasaki, S., Liu, Q., and Tomari, Y. (2010). ATP-dependent human RISC assembly pathways. Nat Struct Mol Biol 17, 17-23.
Young, R.A., and Steitz, J.A. (1978). Complementary sequences 1700 nucleotides apart form a ribonuclease III cleavage site in Escherichia coli ribosomal precursor RNA. Proc Natl Acad Sci U S A 75, 3593-3597.
Zamore, P.D., Tuschl, T., Sharp, P.A., and Bartel, D.P. (2000). RNAi: double-stranded RNA directs the ATP-dependent cleavage of mRNA at 21 to 23 nucleotide intervals. Cell 101, 25-33.
Zeng, Y., and Cullen, B.R. (2005). Efficient processing of primary microRNA hairpins by Drosha requires flanking nonstructured RNA sequences. J Biol Chem 280, 27595-27603.
Zeng, Y., Yi, R., and Cullen, B.R. (2005). Recognition and cleavage of primary microRNA precursors by the nuclear processing enzyme Drosha. EMBO J 24, 138-148.
Zhang, H., Kolb, F.A., Brondani, V., Billy, E., and Filipowicz, W. (2002). Human Dicer preferentially cleaves dsRNAs at their termini without a requirement for ATP. EMBO J 21, 5875-5885.
Zhang, H., Kolb, F.A., Jaskiewicz, L., Westhof, E., and Filipowicz, W. (2004). Single processing center models for human Dicer and bacterial RNase III. Cell 118, 57-68.
Zhang, K., and Nicholson, A.W. (1997). Regulation of ribonuclease III processing by double-helical sequence antideterminants. Proc Natl Acad Sci U S A 94, 13437-13441.
Zhang, X., and Zeng, Y. (2010). The terminal loop region controls microRNA processing by Drosha and Dicer. Nucleic Acids Res 38, 7689-7697.
Zhou, R., Czech, B., Brennecke, J., Sachidanandam, R., Wohlschlegel, J.A., Perrimon, N., and Hannon, G.J. (2009). Processing of Drosophila endo-siRNAs depends on a specific Loquacious isoform. RNA 15, 1886-1895.

Chapter 2.
Beyond secondary structure: primary-sequence determinants license pri-miRNA hairpins for processing

Contents
Summary
Introduction
Results
    Existence of auxiliary elements for efficient pri-miRNA processing
    Functional substrates from large libraries of pri-miRNA variants.
    Importance of an 11 bp basal stem flanked by ≥ 5 unstructured nucleotides
    A basal UG motif enhances processing
    The broadly conserved CNNC motif enhances processing
    Loop and apical stem elements can enhance processing
    Rescue of C. elegans miRNA expression in human cells
Discussion
Experimental Procedures
    Ectopic pri-miRNA expression in HEK293 cells and S2 cells
    Whole-cell lysate with overexpressed Microprocessor complex
    Competitive binding and cleavage assays
    Synthesis of pools of pri-miRNA variants
    In vitro selection and high-throughput sequencing
    Sequence analysis
    Positional enrichments of sequence motifs
    pri-miRNA collections
    Accession Numbers
    Acknowledgements
Supplemental Materials
    Supplemental Figures
    Supplemental Table S1: Oligonucleotides used in the in vitro selections
    Supplemental Table S2. Pri-miRNA collections

Summary

To use microRNAs to down-regulate mRNA targets, cells must first process these ~22 nt RNAs from primary transcripts (pri-miRNAs). These transcripts form RNA hairpins important for processing, but additional unknown determinants must distinguish pri-miRNAs from the many other hairpin-containing transcripts expressed in each cell. Illustrating the complexity of this recognition, we show that most Caenorhabditis elegans pri-miRNAs lack determinants required for processing in human cells.
To find these determinants, we generated >10¹¹ variants of four human pri-miRNAs, sequenced millions that retained function and compared them with the starting variants. Our results confirmed the importance of pairing in the stem and revealed three primary-sequence determinants, including a CNNC motif found downstream of most pri-miRNA hairpins in bilaterian animals but not in nematodes. Adding this and other determinants to C. elegans pri-miRNAs imparted efficient processing in human cells, further illustrating the importance of primary-sequence determinants for distinguishing pri-miRNAs from other hairpin-containing transcripts.

Introduction

MicroRNAs (miRNAs) are ~22 nt RNAs that pair to the mRNAs of protein-coding genes to direct the post-transcriptional repression of these mRNAs (Bartel, 2004, 2009). In animals, miRNAs are processed from hairpin-containing primary transcripts (pri-miRNAs) that undergo successive cleavage steps before yielding the functional small RNA. In the canonical pathway, pri-miRNAs are first cleaved in the nucleus by the Microprocessor, a protein complex containing an RNase III enzyme Drosha and its cofactor DGCR8 (called Pasha and Psh-1 in Drosophila melanogaster and C. elegans, respectively) (Lee et al., 2003; Denli et al., 2004; Gregory et al., 2004; Han et al., 2004; Landthaler et al., 2004). The liberated portion of the hairpin, termed the pre-miRNA, is exported to the cytosol (Lee et al., 2002; Yi et al., 2003; Bohnsack et al., 2004; Lund et al., 2004), where it is cleaved by the RNase III enzyme Dicer (Grishok et al., 2001; Hutvagner et al., 2001; Ketting et al., 2001) approximately two helical turns from the base of the hairpin to remove the loop (Lee et al., 2004a) and generate two ~22 nt strands that pair to each other with 2 nt 3′ overhangs (Lim et al., 2003b).
One strand of each duplex is loaded into an Argonaute protein to form the core of the silencing complex, whereas the other strand is discarded (Khvorova et al., 2003; Schwarz et al., 2003; Liu et al., 2004; Meister et al., 2004).

Noncanonical pathways also contribute to the miRNA repertoire of animal cells. For example, mirtron miRNA precursors are excised from the primary transcript by the spliceosome rather than Drosha; after debranching, the excised mirtron folds into a hairpin that enters the canonical pathway at the step of export to the cytoplasm (Okamura et al., 2007; Ruby et al., 2007a). Tailed mirtrons and endogenous small-hairpin RNAs (shRNAs) are other types of noncanonical precursors that bypass Microprocessor cleavage (Babiarz et al., 2008), and miR-451 is unusual in bypassing Dicer cleavage (Cheloufi et al., 2010; Cifuentes et al., 2010).

A long-standing mystery of canonical miRNA biogenesis has been how the animal cell determines which of its many hairpin-containing transcripts are recognized by the Microprocessor to enter the miRNA pathway. Determinants of subsequent Dicer cleavage are better understood (Zhang et al., 2004; Macrae et al., 2006; Park et al., 2011), as illustrated by both the success in designing artificial Dicer substrates that bypass Drosha processing (Brummelkamp et al., 2002; Paddison et al., 2002) and the success in accurately predicting mirtrons from a single genomic sequence, without considering evolutionary conservation (Chung et al., 2011). With regard to Microprocessor recognition, sequences within 40 nt upstream and 40 nt downstream of the pre-miRNA hairpin are required for ectopic miRNA expression (Chen et al., 2004), which is consistent with the observation that these flanking sequences tend to pair to each other to extend the stem another turn of the helix beyond the site of cleavage (Lim et al., 2003b).
The pairing within this helical extension and a lack of pairing immediately following the last pair of the basal stem are required for productive Microprocessor recognition, as illustrated by in vitro studies demonstrating that the human Microprocessor complex can cleave artificial sequences that form perfectly paired stems flanked by single-strand RNA (ssRNA) (Han et al., 2006). However, these known structural determinants cannot fully explain the specificity of Microprocessor cleavage. Many cellular transcripts have paired regions flanked by ssRNA, and most of these are not endogenous substrates of the Microprocessor. Indeed, attempts to predict canonical miRNA hairpins from genomic sequence yield thousands or millions of false-positive predictions, which must be eliminated using additional criteria, such as analysis of conservation or experimental evaluation (Lim et al., 2003a; Lim et al., 2003b; Bentwich et al., 2005; Berezikov et al., 2006; Chiang et al., 2010), illustrating a large gap in our understanding of how the Microprocessor distinguishes between bona fide pri-miRNA substrates and other transcribed hairpins.

In this study, we found that transcripts that enter the miRNA pathway in C. elegans failed to do so in human cells. Thus, the definition of a pri-miRNA in one species differs from that in another, which adds a new dimension to the mystery of pri-miRNA recognition. To elucidate sequence and structural features of human pri-miRNAs, we generated >10¹¹ variants of four human pri-miRNA sequences and sequenced millions that were cleaved by the human Microprocessor complex. Comparison of the cleaved variants with the initial pool of variants revealed sequence and structural features important for Microprocessor recognition and cleavage. These features were evolutionarily conserved in non-nematode lineages and sufficient to increase the processing efficiency of C. elegans hairpins in human cells.
Results

Existence of auxiliary elements for efficient pri-miRNA processing

To examine whether miRNA processing features are shared across animals, we ectopically expressed C. elegans, D. melanogaster and human pri-miRNAs in human cells and compared the yields of mature miRNA. For each miRNA investigated, the hairpin and ~100 nt of flanking genomic sequence were expressed upstream of the human mir-1-1 pri-miRNA on a bicistronic transcript under control of the CMV promoter (Figure S1A). Cells transfected with each vector were harvested and pooled, and small RNAs were sequenced. As reported previously, most human miRNAs were efficiently expressed (Chiang et al., 2010), as were four of nine tested Drosophila miRNAs (Figure 1A). However, the tested C. elegans miRNAs were less efficiently expressed in HEK293 cells (Figure 1A, p = 1.4×10⁻⁵, Wilcoxon rank-sum test). Likewise, in Drosophila S2 cells, C. elegans miRNAs were expressed less efficiently than were human miRNAs (p = 0.024). These results indicated that most nematode pri-miRNAs are missing determinants required for proper processing by human or insect cells.

To isolate the processing defect, we probed for processing intermediates. For each inefficiently expressed miRNA examined, the primary transcript was present (Figure S1A and S1B), but no pre-miRNA or mature miRNA was detected (Figure 1B). These results suggested that C. elegans pri-miRNAs are not productively recognized as substrates of the Microprocessor in the first step of miRNA maturation. To assay directly for binding to the human Microprocessor, we established a competitive-binding assay that compared the ability of different pri-miRNAs to bind catalytically deficient Drosha and DGCR8 (Figure 1C). Whereas human pri-mir-122 bound the Microprocessor somewhat better than did human pri-mir-125a, all seven tested C. elegans pri-miRNAs bound worse (Figure 1C). Thus, most C. elegans pri-miRNAs are missing some of the determinants needed for efficient recognition and processing by the Microprocessor complex. As a result, many transcripts recognized as pri-miRNAs in C.
elegans cells are not recognized as pri-miRNAs by human cells. Known features of C. elegans and human pri-miRNAs appear largely similar, as illustrated by the accuracy of an algorithm trained on C. elegans pri-miRNA features in predicting most miRNA genes conserved in human, mouse and fish (Lim et al., 2003a). Nonetheless, the poor specificity of this algorithm when predicting non-conserved miRNAs supports the idea that unknown features also exist and help the cell define authentic pri-miRNAs.

Figure 1. The existence of auxiliary elements that specify human pri-miRNA transcripts.
(A) Processing of human, fly, and worm pri-miRNAs in human HEK293T cells and Drosophila S2 cells. Cells were transfected with plasmids expressing the indicated pri-miRNA hairpins with ~100 flanking genomic nucleotides on each side of each hairpin, and total RNA was pooled for small-RNA sequencing. Graphs plot the small RNA reads derived from the indicated pri-miRNAs.
(B) Attempted detection of pre-miRNA and mature miRNA production in HEK293T cells. RNA blots of total RNA isolated from cells transfected with the indicated pri-miRNA were probed for the cognate miRNA. Blots also included lanes with 15 fmol in vitro transcribed standards derived from the corresponding pri-miRNAs (pri-RNA controls).
(C) Relative binding of C. elegans and human pri-miRNAs to the Microprocessor. In the competitive binding assay (left, schematic), radiolabeled query pri-miRNA was mixed with the radiolabeled shorter reference pri-miRNA (human mir-125a) and incubated with immunopurified, catalytically impaired Drosha (Drosha-TN) and DGCR8 in conditions of RNA excess. Bound RNA was isolated on nitrocellulose filters and eluted for analysis on a denaturing gel. Phosphorimaging (right) indicated the relative amounts of query and reference RNA in the input (–) and bound to the Microprocessor (+). Numbers below each lane indicate the ratio of bound query miRNA relative to bound reference RNA, normalized to the input ratio.
(D) Nucleotide conservation of human pri-miRNAs conserved to mouse. At each position, the average branch-length score (BLS) is plotted, in which each BLS indicates the phylogenetic branch lengths of all the aligned mammalian species in which the ancestral identity was preserved, divided by the branch lengths of all the species in which the miRNA is preserved. Positions are numbered based on the inferred Drosha cleavage site (inset); negative indices are upstream of the 5p Drosha cleavage site, indices with “P” count from the 5′ end of the pre-miRNA, and positive indices are downstream of the 3p Drosha cleavage site.

To look for clues regarding previously unknown features that might be required for human pri-miRNA recognition, we analyzed the sequence immediately flanking human pre-miRNAs for conservation in other vertebrates. In a meta-analysis of human pri-miRNAs conserved in other mammals, residues extending 13 nt upstream of the 5p Drosha cleavage site (i.e., the site corresponding to the 5′ end of the pre-miRNA) and 11 nt downstream of the 3p Drosha cleavage site were conserved above background, consistent with the importance of the ~11 bp basal stem for pri-miRNA processing (Figure 1D). Upstream of the hairpin, conservation dropped rapidly with distance from the cleavage site, with just a few nucleotides immediately flanking the basal stem conserved above background. Conservation also dropped on the 3p side of the pre-miRNA, but not quite as rapidly in the region 15–25 nt from the cleavage site (Figure 1D). This asymmetry in the conservation drop-off hinted at potential determinants downstream of the hairpin.
However, the overall weakness of the conservation signal beyond the basal stem suggested that any determinants in these flanking regions might either be at variable distances from the hairpin or present in some subsets of miRNAs but not others, making them difficult to identify using only comparative sequence analyses.

Functional substrates from large libraries of pri-miRNA variants.

To identify sequence and structural features important for Microprocessor recognition and cleavage, we generated >10¹¹ variants of a pri-miRNA, sequenced millions that retained function and compared these sequences to those of the initial pool of variants (Figure 2A). At each variable nucleotide position, most molecules had the wild-type residue, and a minority had the other three alternatives. This approach resembled classical in vitro selection approaches, particularly those that started with degenerate libraries with the goal of characterizing nucleic acids known to function as ligands and substrates (Ellington and Szostak, 1990; Bartel et al., 1991; Breaker et al., 1994), except we did not perform multiple rounds of selection. Instead, we collected the variants that were cleaved by the Microprocessor and then directly prepared them for high-throughput sequencing. Similar strategies have been used for DNA-binding and ribozyme experiments (Zykovich et al., 2009; Jolma et al., 2010; Pitt and Ferre-D'Amare, 2010; Slattery et al., 2011). Because the differences from the starting pool were from a single round of cleavage, and because both the starting pool and the selected pool were subject to the same number of transcription, reverse-transcription and amplification steps, any differences observed between the two pools were subject to neither the compounding effects of multiple rounds nor the confounding effects of amplification biases. Moreover, because in each sample millions of molecules were sequenced, the differences were not influenced by stochastic sampling of small numbers.
Thus, compared to the results of classical approaches, enrichment or depletion of a residue was a more direct reflection of its contribution to biochemical specificity.

To query sequence and structural determinants at the base of the hairpin and flanking the hairpin, pools of variants were constructed in which residues >8 nt upstream of the 5p Drosha cleavage site or >8 nt downstream of the 3p cleavage site were varied while the remaining hairpin residues were not varied. Suspecting that different pri-miRNAs might use different determinants, four different pools were constructed, based on the pri-miRNAs of human mir-125a, mir-16-1, mir-30a, and mir-223, respectively. Each pool was produced by in vitro transcription of a DNA template constructed using degenerate oligonucleotides in which variable positions had non-wild-type residues introduced at a frequency of 21%. For example, at a variable position in which the wild-type residue was an A, 79% of the pool molecules would have an A, whereas 7% would have a C, 7% would have a G, and 7% would have a U. The other key design element was that, borrowing from a strategy used to identify variants of RNA-cleaving ribozymes (Pan and Uhlenbeck, 1992), each variant was circularized (Figure 2A). Without circularization, some variable nucleotides would have resided in the upstream cleavage product, whereas others would have resided in a separate downstream product, making it impossible to reconstruct the starting variant from the sequenced products. With circularization, all the variable nucleotides resided in a single product, the sequence of which revealed the starting variant, thereby enabling a full analysis of sequence interdependencies and covariation.

In vitro cleavage of circularized pri-miRNA variants was carried out in whole-cell lysate of HEK293T cells overexpressing Drosha and DGCR8.
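The doping scheme described above (21% non-wild-type residues at each variable position) fixes the expected composition at any one position and, if substitutions are assumed to occur independently across positions, gives a binomial distribution for the number of substitutions per variant. The following Python sketch illustrates that arithmetic only. The function names are illustrative, the independence assumption is ours, and `N_VARIABLE = 60` is a hypothetical number of variable positions, since the exact count per pool is not stated in this section.

```python
from math import comb

MUTATION_RATE = 0.21  # frequency of non-wild-type residues, as stated in the text


def doping_frequencies(wild_type: str, mutation_rate: float = MUTATION_RATE) -> dict:
    """Expected nucleotide frequencies at a single doped position."""
    others = [nt for nt in "ACGU" if nt != wild_type]
    freqs = {nt: mutation_rate / 3 for nt in others}  # three alternatives share the rate
    freqs[wild_type] = 1.0 - mutation_rate
    return freqs


def substitution_count_pmf(n_positions: int, k: int,
                           mutation_rate: float = MUTATION_RATE) -> float:
    """Binomial probability that a variant carries exactly k substitutions,
    assuming each position is doped independently (an assumption of this sketch)."""
    return (comb(n_positions, k)
            * mutation_rate ** k
            * (1.0 - mutation_rate) ** (n_positions - k))


N_VARIABLE = 60  # hypothetical; not taken from the source

freqs = doping_frequencies("A")
expected_subs = N_VARIABLE * MUTATION_RATE
p_fully_wild_type = substitution_count_pmf(N_VARIABLE, 0)

print(freqs)
print(f"expected substitutions per variant: {expected_subs:.1f}")
print(f"fraction of fully wild-type molecules: {p_fully_wild_type:.2e}")
```

Under these assumptions most molecules carry many substitutions at once, which is why reconstructing the complete starting variant from a single circularized product matters for the covariation analyses.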
Very little pri-miRNA cleavage occurred in lysate from cells not overexpressing the proteins, which indicated that the dominant pri-miRNA cleavage activity depended on overexpressed Drosha or DGCR8 (Figure 2B). At a time in which the lysate cleaved linear and circularized pri-mir-125a nearly to completion, much of the pool of pri-mir-125a variants remained uncleaved, which indicated that substitutions in the basal stem and flanking regions can attenuate Microprocessor cleavage in vitro (Figure 2C). Analogous results were obtained with pools of variants based on the other three pri-miRNAs (data not shown). Variants that were cleaved by the Microprocessor were recovered by gel purification, ligated to sequencing adaptors, and prepared for high-throughput, paired-end sequencing (Figure 2A). Sequence analyses were restricted to products cleaved at the wild-type processing sites, which were inferred from the dominant reads in small-RNA sequencing data (Landgraf et al., 2007; Bar et al., 2008; Chiang et al., 2010; Witten et al., 2010), except for miR-16-1* and miR-223, which appear to undergo post-cleavage 3′-end trimming (Han et al., 2011). Because product ligation and computational analysis both selected for cleavage at the wild-type site, nucleotide changes that altered the site of Microprocessor cleavage were not distinguished from those that abolished cleavage.

At each variant position, we compared the odds of each nucleotide in the properly cleaved pool to the odds of that nucleotide in the starting pool. These odds ratios were used to calculate the information content of each nucleotide possibility at each variant position—the greater the information content, the more favorable the influence on Microprocessor recognition and cleavage, with positive values indicating a favorable influence and negative values indicating a disruptive influence. Information content was chosen as the metric for displaying enrichment or depletion in the cleaved pool because it effectively indicated the relative influence of the nucleotide, regardless of whether it was the wild-type possibility or one of the other three possibilities.

Some positions had substantial enrichment of one or more nucleotide possibilities, with corresponding depletion of the others (Figure 2D). To validate the influence of representative positions on recognition and cleavage, pri-mir-125a mutants were tested in the competitive-cleavage assay, comparing cleavage to that of wild-type pri-mir-125a (Figure S2A and B). The results of changing specific residues closely matched those predicted from analysis of sequenced variants, thereby confirming that the calculated relative cleavage faithfully reflected the influence on Microprocessor recognition and cleavage in vitro. The effects of these changes were also confirmed in HEK293T cells (Figure S2C).

Figure 2. In vitro selection for functional pri-miRNA variants.
(A) Schematic of the selection in which variable sequences (red) flanked the Drosha cleavage site. Pri-miRNA variants were circularized by ligation and incubated in whole-cell lysate from HEK293T cells overexpressing Drosha and DGCR8. Cleaved variants were gel-purified, ligated to adaptors, reverse transcribed and amplified for high-throughput sequencing.
(B) Cleavage of linear hsa-let-7a in whole-cell lysate from HEK293T cells (mock) and whole-cell lysate from HEK293T cells transfected with plasmids expressing Drosha and DGCR8. Incubations were for 90 minutes. Body-labeled reactants and products were resolved on a denaturing polyacrylamide gel and visualized by phosphorimaging.
(C) Cleavage of linear and circular pri-mir-125a (WT linear and WT circ., respectively) and a pool of circular hsa-mir-125a variants (pool). RNAs were incubated for 5 minutes in the extracts supplemented with Drosha and DGCR8 and analyzed as in (B). The WT linear RNA was 5′ end-labeled; the other RNAs were body-labeled.
(D) Enrichment and depletion at variable residues in functional pri-miRNA variants. At each varied position (inset, red inner line), information content was calculated for each residue (green, cyan, black, and red for A, C, G, and U, respectively).

Importance of an 11 bp basal stem flanked by ≥ 5 unstructured nucleotides

For all four miRNAs, some of the variable residues with the greatest influence fell within the basal stem (Figure 2D). The high information content at these paired positions could be due to either the importance of primary sequence or the need to pair to the nucleotide on the other arm of the hairpin, or both.
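The per-position comparison of nucleotide odds between the cleaved pool and the starting pool, described earlier in this section, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the analysis code used in the study: the log2 odds ratio stands in for the information-content metric (whose exact formula is not given here), the pseudocount is our addition to avoid division by zero, and the toy pools are invented for demonstration.

```python
from collections import Counter
from math import log2


def odds(count: float, total: float) -> float:
    """Odds p / (1 - p) of observing a nucleotide, given its count in a pool."""
    p = count / total
    return p / (1.0 - p)


def log2_odds_ratios(selected, initial, position, pseudocount=0.5):
    """log2(odds in selected pool / odds in initial pool) for each nucleotide
    at one variable position. Positive values indicate enrichment after
    selection; negative values indicate depletion.
    The pseudocount is an assumption of this sketch, not from the source."""
    sel_counts = Counter(seq[position] for seq in selected)
    ini_counts = Counter(seq[position] for seq in initial)
    sel_total = len(selected) + 4 * pseudocount
    ini_total = len(initial) + 4 * pseudocount
    scores = {}
    for nt in "ACGU":
        sel_odds = odds(sel_counts[nt] + pseudocount, sel_total)
        ini_odds = odds(ini_counts[nt] + pseudocount, ini_total)
        scores[nt] = log2(sel_odds / ini_odds)
    return scores


# Toy single-position pools (invented data): the initial pool follows the
# 79/7/7/7 doping ratio with wild-type A; the selected pool favors G.
initial_pool = ["A"] * 79 + ["C"] * 7 + ["G"] * 7 + ["U"] * 7
selected_pool = ["G"] * 50 + ["A"] * 40 + ["C"] * 5 + ["U"] * 5

scores = log2_odds_ratios(selected_pool, initial_pool, position=0)
for nt, score in scores.items():
    print(f"{nt}: {score:+.2f}")
```

Using odds rather than raw frequencies puts the wild-type residue (common in the starting pool) and the three alternatives (rare in the starting pool) on a comparable scale, which matches the rationale given above for the chosen metric.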
To distinguish between these possibilities, we examined covariation matrices, generated by calculating the odds of each pair of nucleotide identities at each predicted base pair, relative to the odds of those identities in the initial pool. These matrices showed overall preference for Watson–Crick geometry at each of these basal pairs, with the G:U wobble being the most frequently preferred non-Watson–Crick alternative (Figure 3A, S3A). For example, the most favored alternatives to the C:G pair at positions –11 and +9 of mir-125a are the G:C and U:A pairs, and to a lesser extent the A:U, G:U and U:G pairs (Figure 3A). In fact, Watson–Crick pairing was strongly preferred even if it did not occur in the wild-type sequence. For example, the wild-type A:C pair at positions –12 and +10 of mir-30a was disfavored, whereas the four Watson–Crick pairs were most strongly favored (Figure 3A). Similarly, the bulged A at position +10 of mir-223 was preferentially incorporated into an alternative continuous helix (Fig. S3A–B). Layered on top of the overall preference for Watson–Crick pairing were primary-sequence preferences specific to each basal pair. For example, at positions –11 and +9 the C:G pair was strongly favored over the A:U alternative. The primary-sequence preference was most acute at position –13. At this position the preference for a G was often stronger than the preference for Watson–Crick pairing, in that the G:U wobble and sometimes the G:A or G:G mismatches were less disruptive than were the other three Watson–Crick alternatives (Figure 3A). We conclude that primary-sequence features supplement and sometimes supersede structural features important for basal-stem recognition.

Using the same covariation analysis, we screened for evidence of Watson–Crick pairing between all possible pairs of varied positions.
For each of these >3000 possible pairs, the degree of Watson–Crick preference was evaluated using a scoring metric that compared the average odds of Watson–Crick pairs to that of non-Watson–Crick alternatives. In each case, the highest-scoring pairs were those of the basal stem (Fig. S3C). In the case of mir-223, the highest-scoring pairs also included the alternative pairs that incorporated the bulged A at +10 into a contiguous helix. For each pri-miRNA, we inspected the next four highest-scoring pairs, and in each case, the covariation matrix did not appear consistent with Watson–Crick pairing (data not shown). These results indicate that in the sequence flanking the pre-miRNA, Watson–Crick pairing important for Microprocessor recognition and cleavage is restricted to the basal stem.

The Microprocessor recognizes the junction between the miRNA hairpin and flanking ssRNA, and thereby positions the active site to cleave approximately one helical turn (11 bp of A-form RNA) from the base of the duplex (Han et al., 2006; Yeom et al., 2006). To examine whether a specific number of pairs was preferred in the basal stem, we calculated the relative cleavage of different stem-length variants, normalizing to that of an 8 bp stem. Invariant mismatches within symmetric internal loops (e.g., the A:C mismatch at positions –6 and +4 of mir-30a) were assumed to be non-canonical pairs that stacked within the stem to contribute to its length, whereas mismatches at varied positions were assumed to disrupt further pairing and thereby terminate the inferred basal stem. For all four pri-miRNAs, an 11 bp basal stem was optimal (Figure 3B), consistent with the single-turn measurement for cleavage-site selection. Indeed, an 11 bp basal stem was preferred for mir-223, even though the wild-type sequence was predicted to form a 12 bp stem (Figures 3A and S3A). For most pri-miRNAs, however, the efficiency of the 12-pair stem approached that of the 11-pair stem (Figure 3B).
This tolerance of a twelfth pair hinted at the influence of other features, such as the G at position –13, in overriding the single-turn measurement to specify the precise site of cleavage. The model for single-turn measurement also posits that the nucleotides immediately flanking the basal stem are unstructured (Han et al., 2006; Yeom et al., 2006). To test this part of the model, we used RNAfold (Hofacker and Stadler, 2006) to predict the minimum free-energy structure of all sequenced miRNA variants in the selected pools and the initial pools. For each sequence with wild-type predicted pairing in the stem, the number of nucleotides between the base of the stem and the most proximal two consecutive structured bases was recorded.

[Figure 3 graphic omitted. Panels: (A) wild-type basal stems and covariation matrices (log2 odds ratio) for pairs 9–11 (positions –11 to –13 and +9 to +11) of hsa-mir-125a, hsa-mir-16-1, and hsa-mir-30a; (B) relative cleavage versus number of basal stem pairs (8–13) at timepoints 1 and 2 for the four pri-miRNAs; (C) enrichment (log2 odds ratio) as a function of 5p and 3p unstructured nucleotides; (D) relative cleavage versus flanking unstructured nucleotides (0–20).]

Although at best this metric was a rough estimate of the size of the unstructured region flanking the base of the helix, we observed a clear correlation between the number of flanking unstructured nucleotides and enrichment in the selection (Figure 3C).
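The flank metric described above can be sketched as follows, assuming that each predicted fold is available as a dot-bracket string (the notation RNAfold reports) and that the 0-based indices of the two nucleotides forming the most basal pair are known. The function name `flanking_unstructured` and its arguments are hypothetical, and the treatment of an isolated structured base as part of the unstructured stretch is this sketch's reading of "two consecutive structured bases".

```python
def flanking_unstructured(structure, stem_5p, stem_3p):
    """Count nucleotides between the basal stem and the nearest run of
    two consecutive structured bases, on each side of the stem.

    structure: dot-bracket string for the whole variant
    stem_5p, stem_3p: 0-based indices of the most basal pair
    Returns (upstream_count, downstream_count).
    """
    n = len(structure)

    def paired(k):
        return structure[k] in "()"

    # Walk 5' of the stem until two consecutive structured bases are met.
    upstream = 0
    k = stem_5p - 1
    while k >= 0:
        if paired(k) and k >= 1 and paired(k - 1):
            break
        upstream += 1
        k -= 1

    # Walk 3' of the stem in the same way.
    downstream = 0
    k = stem_3p + 1
    while k < n:
        if paired(k) and k + 1 < n and paired(k + 1):
            break
        downstream += 1
        k += 1

    return upstream, downstream

# Toy fold: a small upstream hairpin, 5 unpaired nt, the stem, 7 unpaired nt.
toy = "..((..))" + "....." + "((((....))))" + "......."
up, down = flanking_unstructured(toy, stem_5p=13, stem_3p=24)
total_flank = up + down  # summed value of the kind plotted in Figure 3D
```

For the toy fold this gives 5 upstream and 7 downstream unstructured nucleotides, for a total of 12.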
Pairing was tolerated in one flank, provided that the other flank contained at least 5–7 unstructured bases, a result consistent with the observation that pri-miRNAs are cleaved with partial efficiency when only one flanking segment is present (Zeng and Cullen, 2005; Han et al., 2006). When summing the flanking unstructured bases from both sides, the optimum plateaued at ~9–18 nt, depending on the pre-miRNA (Figure 3D).

Figure 3. Basal stem secondary structure in functional pri-miRNA variants. (A) Predicted basal secondary structure and covariation matrices for mir-125a, mir-16-1, and mir-30a. For each pair of positions, joint nucleotide distributions were tabulated from sequencing data and the odds ratio calculated. Favored pairs have positive odds ratios and are colored red, whereas disfavored pairs have negative odds ratios and are colored blue, with color intensity indicating magnitudes, according to the key (left). (B) Relative cleavage of variants with different stem lengths. The number of contiguous Watson–Crick pairs was counted, and the relative cleavage calculated, normalized to the 8 bp stem. (C) Enrichment for unstructured nucleotides flanking the basal stem. Predicted folds of variant sequences were generated, and the subset of sequences with wild-type basal stem pairing were classified based on the distance to the nearest structured consecutive nucleotides upstream of position –13 and the nearest structured consecutive nucleotides downstream of position +11. Enrichment (red) and depletion (blue) of different unstructured lengths for the selected variants are colored according to the key (left). Black indicates that the sequencing data were insufficient to calculate enrichment values. (D) Relative cleavage of variants with different numbers of total unstructured nucleotides flanking the basal stem. Unstructured lengths upstream and downstream calculated in (C) were summed, and the relative cleavage calculated, normalized to zero unstructured nucleotides.

A basal UG motif enhances processing

Among the nucleotides upstream of the stem-loop, the most striking enrichment was for a U at position –14 (Figure 2D). This U immediately preceded the position that, as mentioned above, displayed a strong primary-sequence preference for a G, either when paired with a C at position +11 to form the most basal Watson–Crick pair of the helix or when partnered with a wobble or mismatch. The U and G at positions –14 and –13 both appeared to contribute independently to recognition; variants with either a U or a G were enriched over variants with neither, and variants containing both were even more enriched (Figure 4A). For mir-223, the UG at positions –14 and –13 was preferred (Figure 2D), even though wild-type mir-223 has a UG at positions –15 and –14, respectively. The –14 preference was also observed among variants of mir-125a selected for Microprocessor binding rather than for Microprocessor cleavage (Figure S4B), which indicated that this preference was due at least in part to increased binding affinity to the Microprocessor. We refer to this dinucleotide motif as the basal UG.

The basal UG was conserved in vertebrate orthologs of mir-16-1 and mir-30a (Figure 4B). The motif was also enriched in other mammalian pri-miRNAs, as illustrated by the sequence composition of human pri-miRNAs conserved to mouse, which show clear preferences for U at position –14 and G at position –13 (Figure 4C). Enrichment was also observed in pri-miRNAs of zebrafish (D. rerio) and sea squirt (C. intestinalis) but only sporadically in more distantly related lineages, suggesting that recognition of the UG motif emerged in a common ancestor of the chordates (Figure 4D).

[Figure 4 graphic omitted. Panels: (A) relative cleavage at timepoints 1 and 2 for variants with UG(–14), U(–14) only, G(–13) only, or no (–14) motif, for the four pri-miRNAs; (B) PhyloP vertebrate conservation at 5p positions –17 to –11 for hsa-mir-125a (chr19 52,196,503–52,196,510, +), hsa-mir-16-1 (chr13 50,623,097–50,623,104, –), hsa-mir-30a (chr6 72,113,329–72,113,336, –), and hsa-mir-223 (chrX 65,238,719–65,238,726, +); (C) nucleotide frequency at positions –19 to –9; (D) percentage of miRNAs with a positioned UG, by position relative to the Drosha cleavage site, for H. sapiens, D. rerio, C. intestinalis, A. gambiae, D. melanogaster, D. pulex, C. elegans, C. briggsae, P. pacificus, C. teleta, L. gigantea, S. mediterranea, and N. vectensis.]

The broadly conserved CNNC motif enhances processing

Examination of nucleotides preferred in the 3′ flanking sequence revealed a strong preference for a pair of C residues, separated by two intervening nucleotides, located 17–18 nt downstream of the Drosha cleavage site in mir-16-1, mir-30a, and mir-223 (Figure 2D). The two C residues of this CNNC motif (in which N signifies any nucleotide) seemed to act synergistically, in that variants that retained neither C residue were not disfavored much more than those that retained one (Figure 5A).
As expected, the C residues enriched in the active variants were also conserved in vertebrate orthologs of these three pri-miRNAs (Figure 5B).

Figure 4. Enrichment and conservation of the basal UG motif. (A) Relative cleavage of variants with a full UG motif, a partial motif, and no motif. Relative cleavage values were normalized to that of variants with no motif. (B) PhyloP conservation across 30 vertebrate species in the region of the basal UG motif (red letters) for the four selected miRNAs. Bars extending beyond the scale of the graph are truncated (red). Nucleotides predicted to be paired in the wildtype basal stem are shaded. (C) Frequencies of A, C, G, and U (green, cyan, black, and red, respectively) at the indicated positions of human pri-miRNAs conserved to mouse. Analysis was of 202 pri-miRNAs, each representing a unique miRNA paralog family (Table S2). (D) Enrichment for the UG dinucleotide in the pri-miRNAs of representative animals with sequenced genomes (Table S2). For each species, pri-miRNA sequences were aligned according to the predicted Drosha cleavage site, and upstream UG occurrences tabulated. Species with a statistically significant enrichment at position –14 are indicated (asterisks, empirical p-value < 10⁻⁵).

[Figure 5 graphic omitted. Panels: (A) relative cleavage at timepoints 1 and 2 for variants with the full CNNC, a single C only, or neither C, for the four pri-miRNAs; (B) PhyloP vertebrate conservation at 3p positions 15–23 (chr19 52,196,597–52,196,605, +; chr13 50,623,097–50,623,105, –; chr6 72,113,234–72,113,242, –; chrX 65,238,816–65,238,824, +); (C) signal/background by position (1–28) for spaced dinucleotide motifs in human miRNAs conserved to mouse, D. melanogaster miRNAs, C. elegans miRNAs, and S. mediterranea miRNAs; (D) percentage of miRNAs with a positioned CNNC, by position relative to the Drosha cleavage site, for the same 13 species as in Figure 4D.]

The mir-125a pri-miRNA also had four C residues in the vicinity (positions 16–21), which gave rise to a CNNC at position 16 and the possibility of creating a CNNC at positions 17 or 18 (by changing A20 to a C or changing A18 to a C, respectively). However, neither C of the CNNC at position 16 was preferred in the selection, nor was either of the single-nucleotide changes that could create a CNNC, and the position 16 CNNC was not conserved in vertebrate orthologs (Figure 2D, Figure 5A and 5B). These results indicate that unidentified sequence features present in mir-16-1, mir-30a, and mir-223 but absent in the mir-125a pri-miRNA are required for the CNNC to exert its effect in increasing Microprocessor cleavage efficiency.

For the three pri-miRNAs in which the CNNC motif was effective, its position fell in a small window 17–18 nt downstream of the Drosha cleavage site. In variants in which neither wild-type C was present, an alternative CNNC was strongly preferred one or two nucleotide registers downstream, which further indicated that a CNNC motif within a small range of positions can contribute to pri-miRNA recognition (Figure S5A). Analyses of the 3′ regions of conserved human pri-miRNAs revealed that of the 64 possible dinucleotide motifs with 0–3 intervening nucleotides, CNNC was most highly enriched (Figure 5C).
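The count of 64 motifs follows from 16 ordered dinucleotides times 4 spacings (0, 1, 2, or 3 intervening nucleotides). A minimal sketch of tabulating such motifs and comparing them to a composition-based expectation is given below. It is an illustration under simplifying assumptions: occurrences are pooled over all positions of each flank rather than tabulated position by position, and the background is the pooled mononucleotide composition of the supplied flanks. The names `motif_name` and `spaced_dinucleotide_enrichment` are hypothetical.

```python
from collections import Counter
from itertools import product

NUCS = "ACGU"

def motif_name(a, b, gap):
    """Name a spaced dinucleotide, e.g. ('C', 'C', 2) -> 'CNNC'."""
    return a + "N" * gap + b

def spaced_dinucleotide_enrichment(flanks, max_gap=3):
    """Return observed/expected ratios for all spaced dinucleotide motifs.

    flanks: 3' flanking sequences (strings over A, C, G, U).
    Expected counts assume independent nucleotides drawn from the pooled
    composition of the flanks (an assumption of this sketch).
    """
    comp = Counter(ch for s in flanks for ch in s if ch in NUCS)
    total = sum(comp.values())
    freq = {n: comp[n] / total for n in NUCS}

    observed = Counter()
    windows = Counter()  # number of scored windows per gap size
    for s in flanks:
        for gap in range(max_gap + 1):
            for i in range(len(s) - gap - 1):
                a, b = s[i], s[i + gap + 1]
                windows[gap] += 1
                if a in NUCS and b in NUCS:
                    observed[motif_name(a, b, gap)] += 1

    ratios = {}
    for a, b in product(NUCS, repeat=2):
        for gap in range(max_gap + 1):
            expected = windows[gap] * freq[a] * freq[b]
            name = motif_name(a, b, gap)
            ratios[name] = observed[name] / expected if expected > 0 else float("nan")
    return ratios

# Toy flanks with uniform composition, three of which carry a CNNC.
toy_flanks = ["CAUC", "CGAC", "CUGC", "AAAA", "GGGG", "UUUU"]
ratios = spaced_dinucleotide_enrichment(toy_flanks)
```

A position-specific version would keep one counter per start position relative to the Drosha cleavage site instead of pooling all windows.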
Moreover, enrichment was limited to a small range of positions 16–18 nt downstream of the Drosha cleavage site, peaking at positions 17 and 18, which matched the positions of the motifs originally found within mir-16-1, mir-30a, and mir-223. These results suggest that the CNNC motif enhances recognition and cleavage of many human pri-miRNAs.

Analyses of miRNAs in non-mammalian species indicated strong, position-specific enrichment of the CNNC motif in chordates, arthropods and lophotrochozoans (Figure 5D). Indeed the CNNC motif was the most enriched dinucleotide motif in the downstream region of both Drosophila and planarian pri-miRNAs (Figure 5C). Positional enrichment of CNNC was not observed in sea anemone (Nematostella vectensis), suggesting that usage of the motif for pri-miRNA recognition emerged after the divergence of bilaterians, around the time of the evolution of the core bilaterian miRNAs (Sempere et al., 2006; Grimson et al., 2008). Interestingly, enrichment was also absent in nematodes (Figure 5C and D), suggesting an isolated loss of this mode of recognition in the nematode but not the arthropod branch of the ecdysozoans.

The contributions of basal sequence and structure motifs were confirmed in HEK293T cells (Figure S5C). Mutation of the basal UG and CNNC motifs each reduced accumulation of mature miR-30a; mutation of both together reduced accumulation ~8-fold relative to wild type.

Figure 5. Enrichment and conservation of the downstream CNNC motif. (A) Relative cleavage of variants with a full CNNC motif, a partial motif, and no motif. Relative cleavage values were normalized to that of variants with no motif. (B) PhyloP conservation across 30 vertebrate species in the region of the downstream CNNC motif (blue letters) for the four selected miRNAs. Bars extending beyond the scale of the graph are truncated (red). (C) Comparison of CNNC enrichment to that of 63 other spaced dinucleotide motifs. Pri-miRNA sequences from each species (Table S2) were aligned according to the predicted Drosha cleavage site, and the occurrences of each spaced dinucleotide motif tabulated. Occurrences were normalized to those expected by chance, based on the nucleotide composition downstream of pri-miRNAs in each species. (D) Enrichment of the CNNC motif in the pri-miRNAs of representative bilaterian animals (Table S2). For each species, pri-miRNA sequences were aligned as in (C) and downstream CNNC occurrences tabulated. Species with a statistically significant enrichment at positions 16, 17, or 18 are marked with an asterisk (empirical p-value < 10⁻⁴).

[Figure 6 graphic omitted. Panels: (A) selection schematic (linear pri-miRNA substrate pool, Drosha and DGCR8, functional and nonfunctional variants, library for single-end sequencing); (B) relative cleavage versus number of apical stem pairs at timepoints 1 and 2 for the four pri-miRNAs; (C) apical stem structure of hsa-mir-125a with covariation matrices (log2 odds ratio) for pairs 16–21 (positions P18–P23 and P38–P43); (D) relative cleavage for hsa-mir-30a at positions P20–P31; (E) PhyloP vertebrate conservation at chr6 72,113,290–72,113,300 (–); (F) percentage of miRNAs with a positioned UGU/GUG, by species.]

Loop and apical stem elements can enhance processing

In addition to the basal stem and flanking regions, another potential location for features required for processing is in the pri-miRNA loop and apical stem.
Indeed, this part of the pre-miRNA has been the region most intensively studied as potentially harboring determinants of pri-miRNA recognition. Point mutations in this region abolish cleavage (Zeng et al., 2005; Gottwein et al., 2006) and this region contains binding sites for proteins reported to modulate pri-miRNA processing (Guil and Caceres, 2007; Michlewski et al., 2008; Viswanathan et al., 2008; Trabucchi et al., 2009).

Figure 6. Identification of the apical pairing features and the UGUG motif. (A) Schematic of the in vitro selection for functional pri-miRNA variants with partially-randomized sequences in the apical stem and terminal loop. Linear pri-miRNA variant substrates were incubated in whole-cell lysate from HEK293T cells overexpressing Drosha and DGCR8. Cleaved pre-miRNA variants were gel-purified, reverse transcribed, and amplified for high-throughput sequencing. (B) Relative cleavage of variants with different apical stem lengths. The number of contiguous Watson–Crick pairs was counted and the relative cleavage calculated, normalized to that of the 15 bp stem. (C) Predicted secondary structure and covariation matrices for the apical stem of mir-125a. Otherwise, as in Figure 3A. (D) Relative cleavage of variants with the apical UGUG motif beginning at the indicated positions, normalized to variants without the motif. Nucleotides of the mature miRNA are shaded in yellow. (E) Conservation of the region centered on the apical UGUG of mir-30a. Otherwise, as in Figure 4B. (F) Enrichment for UGU or GUG trinucleotides in the terminal loops of metazoan pri-miRNAs (Table S2). For each species, pri-miRNA sequences were aligned according to the predicted Drosha cleavage site and occurrences of loop UGU or GUG trinucleotides tabulated. Species with statistically significant enrichment are marked with an asterisk (empirical p-value < 10⁻⁵).
Moreover, the distance from the junction of the terminal loop and apical stem was reported to determine the Microprocessor cleavage site of hsa-mir-30a (Zeng et al., 2005), although the terminal loop is dispensable for processing of hsa-mir-16-1 (Han et al., 2006). To find processing determinants in this region, we partially randomized the loop and apical stem sequences of each of the four pri-miRNAs, incubated each pool of variants with Microprocessor-containing extract, gel-purified the pre-miRNA cleavage products, and prepared them for high-throughput sequencing (Figure 6A).

Comparison to sequences of the starting pools showed that pairing at the apical portion of the stem contributed to pri-miRNA recognition and processing for some miRNAs, although the preferred structures differed for different pri-miRNAs, as might have been suspected based on the different conclusions drawn previously from studies of different pri-miRNAs (Zeng et al., 2005; Han et al., 2006). For mir-125a, 22 bp above the 5p Drosha cleavage site was strongly preferred; longer stems were tolerated, whereas shorter stems were disfavored (Figure 6B). Watson–Crick pairing throughout the apical stem was supported by analysis of covariation (Figure 6C). A 22-pair apical stem was also preferred, albeit more weakly, in mir-30a (Figure 6B, Figure S6B). By contrast, no preference for apical pairing was observed in the stems of mir-16-1 and mir-223 (Figure 6B, Figure S6C). Indeed, lengthening of the mir-16-1 apical stem at the expense of loop size was detrimental (Figure 6B), which was consistent with a previous report (Zhang and Zeng, 2010).

Because several loop-binding protein regulators of miRNA processing have been reported, we looked for evidence of primary-sequence motifs in the loops. Overall, enrichment was weaker than that observed for flanking residues, particularly for mir-16-1, which showed almost no primary-sequence enrichment throughout the variable region (Figure S6A).
The best candidate for a loop-binding motif was observed only in mir-30a, in which the wild-type UGUG at positions P24–27 was preferred (Figure S6A). This motif overlapped a region of vertebrate conservation that included the last base of the most commonly sequenced isoform of mature miR-30a (Figure 6E). Human and zebrafish miRNAs were enriched in UGU or GUG in this region of the loop (empirical p < 10⁻⁵ for each species), as were the arthropods and one of three lophotrochozoans examined (empirical p < 10⁻⁵ for each) (Figure 6F). However, the lack of enrichment in several other representative species raises the question of whether the usage of this motif arose independently in multiple lineages or was ancestral and lost multiple times.

Rescue of C. elegans miRNA expression in human cells

The primary-sequence motifs found in this study are absent in the nematode clade, either because an ancestral mode of recognition was lost (e.g., downstream CNNC), or because the use of a particular motif is an innovation more recent than the divergence of the vertebrate lineage from nematodes (e.g., basal UG). Using our newly acquired knowledge of pri-miRNA recognition, we tested whether missing primary-sequence motifs might account for the failure of C. elegans pri-miRNAs to be processed in human cells. We focused on the basal UG and the flanking CNNC motifs because these were implicated in three of the four human pri-miRNAs analyzed in detail and thus seemed most likely to function in a variety of pri-miRNA contexts. These motifs were systematically added to cel-mir-44 in the context of the bicistronic vector, after first disrupting the predicted pairing between positions –14 and +12 and substituting the G:C pair at positions –13 and +11 (Figure 7A, construct mir44.1).
These changes, which were expected to simultaneously enhance processing by shortening the basal stem to its optimal length and inhibit processing by replacing the fortuitous G at position –13, had a marginal net effect on production of mature miR-44 in human HEK293T cells (Figure 7A). Adding a basal UG (construct mir44.3) enhanced production of mature miR-44 by 5-fold (8-fold over the wildtype), primarily from restoring the G at –13 (Figure 7A). Adding a CNNC 17 nt downstream of the cleavage site (mir44.4) enhanced production another 8-fold, thereby yielding a 64-fold increase over wildtype (Figure 7A). Likewise, converting the wild-type, asymmetrically bulged stem of cel-mir-50 to a regular, 11-pair stem and adding the basal UG and CNNC motifs enhanced expression of mature miR-50 by 31-fold (Figure S7A), while adding the basal UG and CNNC motifs to cel-mir-40 enhanced expression of mature miR-40 by 5-fold (Figure S7B). We conclude that primary-sequence motifs discovered in this study enable human cells to distinguish pri-miRNA hairpins from other hairpins and that the absence of these motifs in C. elegans pri-miRNAs helps to explain why human cells do not regard these transcripts as pri-miRNAs.

[Figure 7 graphic omitted. Panel A: bicistronic vector (CMV promoter, query pri-miRNA, control pri-miRNA hsa-mir-1-1, TK pA), predicted basal structures of the mir-44 constructs, RNA blot probed for cel-miR-44-3p and hsa-miR-1, and plot of miR-44-3p expression. Constructs as listed in the figure table (basal stem; nucleotides at positions –14 and –13; CNNC): mir44.wt (WT; AG; none), mir44.1 (–14 mismatch; AC; none), mir44.2 (–14 mismatch; CG; none), mir44.3 (–14 mismatch; UG; none), mir44.4 (–14 mismatch; UG; CNNC at +17). Panel B: schematic of the basal UG, basal stem structure, CNNC, apical stem structure, and loop GUG relative to the 5p and 3p arms. Panel C: average information (bits/nt) for each feature in the four pri-miRNAs. Panels D and E: Venn diagrams of motif occurrence in human and C. elegans pri-miRNAs, each compared with chance.]
Discussion

We find that secondary structure is generally inadequate on its own to specify pri-miRNA hairpins: Primary-sequence features, including the basal UG, the CNNC and the apical GUG motifs, also contribute to efficient processing in human cells (Figure 7B). Complicating the story (and perhaps explaining why these primary-sequence features had not been observed earlier), different pri-miRNAs differentially benefit from the different motifs (Figure 7C). Among human pri-miRNAs, these motifs were nonetheless highly enriched over chance expectation, with 79% of the conserved human miRNAs containing at least one of the three motifs (Figure 7D). The motifs were not enriched in C. elegans pri-miRNAs (Figure 7E), and when added to the C. elegans pri-miRNAs, the motifs conferred more efficient processing in mammalian cells (Figure 7A and Figure S7). These experiments that added mammalian features to C. elegans miRNAs also showed the benefit of disrupting pairing normally present at positions –14 and +12 of the C. elegans miRNAs (Figure 7A and Figure S7). The presence of pairing that is inhibitory to mammalian processing suggests that measurement from the base of the helix might also differ in nematodes. We conclude that despite the many broadly conserved features of miRNAs, some primary-sequence features and some secondary-structure features differ in mammals and nematodes, which implies that important aspects of pri-miRNA biogenesis differ in different metazoan lineages.

Figure 7. Structural and primary-sequence features important for human pri-miRNA processing. (A) Processing enhancement from addition of human pri-miRNA motifs to C. elegans mir-44. Changes that sequentially introduce the listed features were incorporated into mir-44 within the bicistronic expression vector (left). Secondary structures are shown for mutations predicted to affect the wildtype basal stem (middle) with the annotated Drosha cleavage sites (purple arrowheads). After transfection into HEK293T cells, accumulation of miR-44-3p was assessed on an RNA blot, normalizing to the expression of the hsa-miR-1 control, and increased miR-44-3p expression is plotted (geometric mean ± standard error, n = 3, right). (B) Summary of human pri-miRNA recognition determinants. (C) Contributions of individual motifs to in vitro processing. For each pri-miRNA, average information content per nucleotide is plotted for the indicated features and positions. (D) Enrichment of primary-sequence motifs in human pri-miRNAs conserved to mouse (Table S2). Human pri-miRNAs were classified based on whether they had the basal UG, an apical GUG or UGU, or the flanking CNNC motif (left). Expectations by chance (right) were estimated based on the nucleotide composition of human pri-miRNAs upstream of the Drosha cleavage site, in the pre-miRNA, and downstream of the cleavage site for the basal UG, apical GUG or UGU, and flanking CNNC motifs, respectively. (E) A search for human primary-sequence motifs in C. elegans pri-miRNAs conserved in other nematodes (Table S2). Pri-miRNA sequences were analyzed as in (D); the smaller Venn diagrams reflect the smaller number of analyzed miRNAs.

About a fifth of human pri-miRNAs lack all three newly identified primary-sequence determinants (Figure 7D). These are attractive subjects for further study, in that the combinatorial approach implemented here presumably would identify additional unique determinants used by these pri-miRNAs. Sequence and structural determinants probably also exist at the Microprocessor cleavage site and the middle of the pri-miRNA stem, regions that were inaccessible to our approach as implemented.
Indeed, point mutations that disrupt pairing in the middle of the stem dramatically impair processing (Gottwein et al., 2006; Duan et al., 2007; Jazdzewski et al., 2008; Sun et al., 2009), and although the cleavage site has not been directly implicated in processing in human miRNAs, the Drosha cleavage sites of C. elegans miRNAs are enriched for symmetric internal loops, which presumably reflect preferences at the level of pri-miRNA processing, nuclear export, dicing, or loading (Warf et al., 2011). Also hinting at the possibility of additional primary-sequence preferences within the stem are results from bacterial RNase III, which avoids specific base-pair identities in the “proximal box” and the “distal box” (Zhang and Nicholson, 1997), and fungal Rnt1 and Pac1, which are also influenced by similarly positioned motifs (Lamontagne and Elela, 2004). The proximal box is adjacent to the cleavage site, and the distal box is 8 bp away from the cleavage site, but nonetheless in a region not interrogated in our experiments.

Although more needs to be learned about the recognition of pri-miRNAs for processing, the emerging picture is that of a modular phenomenon in which each module contributes modestly to overall discrimination, and different pri-miRNAs depend on any individual module to varying degrees. Our results quantify the relative importance of each module for each pri-miRNA (Figure 7C). Pairing within the basal stem was crucial, as expected from previous analyses (Lim et al., 2003b; Han et al., 2006). In addition, all four miRNAs made use of the basal UG motif, which provided as much or more information content per nucleotide as the basal stem nucleotides. For the three miRNAs that used a CNNC motif, the motif information content per nucleotide was comparable to that of the basal stem nucleotides. Compared to these motifs, other flanking nucleotides contributed very little to the selection information content.
Apical and terminal loop elements were less important than the basal motifs (Figure 7C). We detected significant contributions only in pri-mir-125a, in which the apical stem nucleotides were as important as the basal stem nucleotides, and in pri-mir-30a, in which the loop UGUG motif contributed some information, albeit less than any of the three basal motifs. Together, the basal and apical motifs described here explained 61–78% of the nucleotide enrichment observed in the selected sequences. The remaining information content was diffusely distributed among the other partially randomized positions. Although some of this remaining enrichment could have reflected small beneficial contributions of flanking bases, we suspect that most reflected avoidance of detrimental alternative structures.

A better understanding of features important for miRNA biogenesis will aid in interpreting human mutations that affect mature miRNA levels. For example, loss of mir-16-1 expression associated with chronic lymphocytic leukemia (CLL) is typically due to deletions spanning the intron that contains both hsa-mir-15a and hsa-mir-16-1 (Calin et al., 2002). However, in a study of 75 CLL patients, two had tumors that retained the pri-miRNA hairpins and instead carried a germline C>T single-nucleotide polymorphism (SNP) downstream of the mir-16-1 hairpin (Calin et al., 2005). This SNP lowers overexpression of miR-16-1 in HEK293 cells, and in both patients heterozygosity for the SNP was lost in the tumor, which suggests that it was a driver mutation (Calin et al., 2005). This SNP corresponds to the C at +18, the first C in the mir-16-1 CNNC motif, which explains why this mutation flanking the hairpin lowers miR-16 accumulation and leads to CLL: it affects pri-miRNA processing by disrupting the mir-16-1 CNNC motif.
Discovery of additional motifs for pri-miRNA recognition and processing and identification of proteins that recognize these motifs may lead to improved diagnostic and therapeutic tools in cancer and other diseases in which miRNAs are dysregulated.

Experimental Procedures

Ectopic pri-miRNA expression in HEK293 cells and S2 cells

A genomic fragment corresponding to the human mir-1-1 hairpin and flanking sequences was amplified and cloned into both pcDNA3.2/V5-DEST (Invitrogen) and pMT-DEST (Invitrogen) expression plasmids. Query pri-miRNA sequences were cloned into these plasmids using the Gateway system (Invitrogen), such that they were transcriptionally fused upstream of mir-1-1. Expression plasmids and pMAX-GFP were co-transfected into HEK293 cells using Lipofectamine 2000 (Invitrogen) and co-transfected into S2 cells using Cellfectin (Invitrogen) according to the manufacturer’s instructions. After 36–48 h, total RNA was collected by addition of Tri-Reagent (Ambion) according to the manufacturer’s instructions. RNA blots for detecting mature and pre-miRNAs were as described. Ribonuclease protection assays were performed with the RPA III kit (Invitrogen) according to the manufacturer’s instructions.

For detection of expression by sequencing, total RNA from individual transfections was combined and libraries for small-RNA sequencing prepared as described (Chiang et al., 2010). Sequencing reads were mapped to a miRNA hairpin collection composed of the miRBase-annotated hairpins of miRNAs endogenously expressed in the cell line and the miRBase-annotated hairpins of the transfected miRNAs. Reads were included if they perfectly matched a hairpin in this library and excluded if they matched more than one hairpin corresponding to a transfected miRNA. Read counts were normalized to the total reads matching a set of endogenous hairpins that had no transfected counterparts.
For each expressed pri-miRNA hairpin, the number of reads reported is the number obtained after subtracting the number observed in a normalized, mock-transfected control library.

Whole-cell lysate with overexpressed Microprocessor complex

Microprocessor lysates were prepared as described (Lee and Kim, 2007), with minor modifications. HEK293T cells were transfected with a mixture of pCK-Drosha-FLAG (Lee and Kim, 2007), pFLAG-HA-DGCR8 (Landthaler et al., 2004), and a transfection-control plasmid pMAX-GFP (Amaxa) using Lipofectamine 2000 (Invitrogen) according to the manufacturer’s instructions. For catalytically inactive Microprocessor lysates, pCK-DroshaTN-FLAG replaced the wild-type Drosha plasmid (Han et al., 2009). After 72 h, cells were harvested by rinsing the monolayer in phosphate buffered saline (PBS, 137 mM NaCl, 2.7 mM KCl, 1.5 mM KH2PO4, 8 mM Na2HPO4, [pH 7.4]). Cells were pelleted, resuspended in sonication buffer (100 mM KCl, 0.2 mM EDTA, 20 mM Tris-Cl pH 8.0, and 0.7 µl/ml 2-mercaptoethanol) supplemented with mini-EDTA Free Protease Inhibitor tablets (Roche), and sonicated. The sonication lysate was cleared by centrifugation and cell lysis was confirmed by the liberation of GFP into the supernatant. The supernatant was distributed into single-use aliquots and stored in liquid-nitrogen vapor phase.

Competitive binding and cleavage assays

The competitive binding assay was based on that of Bartel et al. (1991). T7-transcribed ~200 nt pri-miRNA substrates were gel-purified, treated with calf intestinal phosphatase (NEB), extracted in Tri-Reagent (Invitrogen), and 5′ end-labeled by phosphorylation using T4 Polynucleotide Kinase (NEB) and γ-[32P]-ATP. Reference substrates were the same, except for a 10–25 nt difference in length, which enabled separation on denaturing gels. Complexes containing Drosha-TN and DGCR8 were immunopurified as described (Lee and Kim, 2007; Han et al., 2009).
Competitor and reference RNAs were mixed and incubated with immunoprecipitated Drosha-TN and DGCR8 for 15–30 min [final concentrations, 250 nM each RNA, 100 mM KCl, 1 mM MgCl2, 0.2 mM EDTA, 20 mM Tris-Cl (pH 8.0), 0.7 µl/ml 2-mercaptoethanol and 300 ng/µl yeast total RNA (Ambion)]. RNA-protein complexes were filtered on Immobilon-NC nitrocellulose filter discs (Millipore) and washed with at least 10 reaction volumes of sonication buffer. RNA was eluted from the membrane by incubating in elution buffer (300 mM NaCl, 8 M urea, and 25 mM EDTA) for 10 min at 85ºC, ethanol precipitated, and resolved on a denaturing 5% polyacrylamide gel.

For competitive cleavage, 5′ end-labeled query and reference pri-miRNA substrates were mixed and incubated in whole-cell lysate from HEK293T cells overexpressing Drosha and DGCR8 [final concentrations, 50 nM each RNA, 100 mM KCl, 1 mM MgCl2, 0.2 mM EDTA, 20 mM Tris-Cl (pH 8.0), 0.7 µl/ml 2-mercaptoethanol, 300 ng/µl yeast total RNA, 10 nM Microprocessor complex (concentration estimated by exploiting the single-turnover behavior of the Microprocessor when cleaving linear pri-miR-125a)]. After incubation for 30 seconds at 37ºC the reaction was stopped by addition of Tri-Reagent (Ambion) with mixing. Extracted RNA was precipitated with isopropanol, then resuspended and resolved on a denaturing 5% polyacrylamide gel.

Synthesis of pools of pri-miRNA variants

Linear pri-miRNA variants for the apical stem and loop selections were transcribed by T7 RNA polymerase from oligonucleotide templates (Table S1) that were synthesized using nucleoside phosphoramidite mixtures such that they varied at specified positions (IDT). The transcription reaction included α-[32P]-UTP to body-label the product. Pri-miRNA pools were gel-purified on urea-polyacrylamide gels before use.

For circular pri-miRNA variants, body-labeled linear precursors were transcribed by T7 RNA polymerase from synthetic oligonucleotide templates (Table S1).
Each transcript ended with a minimal HDV ribozyme (Schurer et al., 2002) that co-transcriptionally self-cleaved at a defined position to produce a homogeneous 3′ end. After treatment with TurboDNAse (Ambion), transcripts were gel-purified, treated with calf intestinal phosphatase (NEB) to remove the 5′ triphosphate, extracted with Tri-Reagent, precipitated with isopropanol, and treated with T4 polynucleotide kinase (NEB) to remove the 2′-3′ cyclic phosphate as described (Guo et al., 2010). After ethanol precipitation, the transcripts were 5′ phosphorylated with T4 polynucleotide kinase, diluted, and ligated using T4 RNA ligase 1 (NEB). Circularized pri-miRNAs were purified from linear species on denaturing polyacrylamide gels.

In vitro selection and high-throughput sequencing

Pools of variants were incubated in HEK293T whole-cell lysate overexpressing FLAG-tagged Drosha and FLAG-HA-tagged DGCR8 (Lee and Kim, 2007). At one or two time points (for circularized pri-miRNA variants, 1 minute for mir-125a, 1 and 4 minutes for mir-16-1, 1 and 5 minutes for mir-30a, and 3 and 15 minutes for mir-223; for apical stem and loop variants, 5 seconds and 15 seconds for mir-125a, 15 seconds and 2 minutes for mir-16-1, 30 seconds and 2 minutes for mir-30a, and 30 seconds and 2 minutes for mir-223) reactions were stopped by addition of Tri-Reagent (Ambion) with mixing, and cleaved products were purified from denaturing gels.

Cleavage products of circularized pri-miRNA variants were splint-ligated (Moore, 1999) to oligonucleotide adaptors containing barcode sequences, reverse transcribed, and amplified. To sequence the initial pools, a sample of phosphorylated, uncircularized RNA was reverse transcribed and amplified. Amplicons from the initial pools and the cleaved products were pooled for Illumina paired-end sequencing (75 nt reads per end).

Pre-miRNA cleavage products of linear pri-miRNA variants were reverse transcribed and amplified.
To sequence the initial pools, a sample of the pool was taken before cleavage, reverse transcribed, and amplified. Amplicons from the initial pools and cleaved pre-miRNA products were pooled for Illumina single-read sequencing (54 nt reads).

Sequence analysis

High-throughput sequencing reads were divided into individual experimental groups according to constant sequences specific to each pri-miRNA, and further divided based on barcode. After filtering for sequencing quality, discarding any sequences that had an error rate ≥0.1 (phred score ≤10) at any variant position, the sequencing error averaged <0.001 per variant position (average phred score >30). We also discarded sequences in which the length of a partially randomized region differed from that of the wildtype, thereby eliminating many sequences with insertions or deletions. Libraries were collapsed so that sequences that appeared multiple times with the same barcode were considered just once in the analysis (although in retrospect this precaution was not required because there was no group of dominant, multi-copy sequences that would have biased the analysis).

To calculate the information content at each position, we used the data from the initial pool sequences and the product sequences to calculate the relative cleavage of each base versus the other three bases. For example, for the A residue, the three relative cleavage values are given below, where P(N) is estimated by the frequency of a base in the initial pool, and P(N|cleavage) is estimated by the frequency of that base in the product sequences.
\[
\frac{P(\text{cleavage}\mid C)}{P(\text{cleavage}\mid A)} = \frac{P(C\mid\text{cleavage})\,/\,P(C)}{P(A\mid\text{cleavage})\,/\,P(A)}
\]

\[
\frac{P(\text{cleavage}\mid G)}{P(\text{cleavage}\mid A)} = \frac{P(G\mid\text{cleavage})\,/\,P(G)}{P(A\mid\text{cleavage})\,/\,P(A)}
\]

\[
\frac{P(\text{cleavage}\mid U)}{P(\text{cleavage}\mid A)} = \frac{P(U\mid\text{cleavage})\,/\,P(U)}{P(A\mid\text{cleavage})\,/\,P(A)}
\]

We then used Bayes’ Theorem (Pitman, 1993) to infer the nucleotide composition that would have resulted after selection from a pool of variants in which there was an equal probability of an A, C, G, or U at this position. For example, the formula to infer the frequency of A at a particular position after selection from such a pool was

\[
P_{\text{inferred } A} = P(A\mid\text{cleavage}) = \left[\,1 + \frac{P(\text{cleavage}\mid C)}{P(\text{cleavage}\mid A)} + \frac{P(\text{cleavage}\mid G)}{P(\text{cleavage}\mid A)} + \frac{P(\text{cleavage}\mid U)}{P(\text{cleavage}\mid A)}\,\right]^{-1}
\]

The inferred post-selection distribution was then used to calculate information content scores for each nucleotide at each position. For example, the information content for A at a particular position was calculated as

\[
I_A = P_{\text{inferred } A} \times \left[\log_2\!\left(P_{\text{inferred } A}\right) + 2\right]
\]

If results from two time points were available, information content values were averaged.

For evaluating motifs, we calculated a relative cleavage value based on the frequencies of the motif in the reference and selected pools [P(motif_i) and P(motif_i|cleavage), respectively], and the frequencies of a reference motif in the reference and selected pools [P(motif_ref) and P(motif_ref|cleavage), respectively].

\[
\text{Relative cleavage} = \frac{P(\text{motif}_i\mid\text{cleavage})\,/\,P(\text{motif}_i)}{P(\text{motif}_{\text{ref}}\mid\text{cleavage})\,/\,P(\text{motif}_{\text{ref}})}
\]

We also used an odds ratio score to calculate the enrichment for particular motifs by using the frequency of the motif in the reference and selected pools [P(motif_i) and P(motif_i|cleavage), respectively].

\[
\text{Odds ratio} = \frac{P(\text{motif}_i\mid\text{cleavage})\,/\,\left[1 - P(\text{motif}_i\mid\text{cleavage})\right]}{P(\text{motif}_i)\,/\,\left[1 - P(\text{motif}_i)\right]}
\]

If two timepoints were available, the geometric mean of the ratios was reported, unless noted otherwise.
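The per-position calculations described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the analysis code used for the thesis: the function names, the dictionary representation of base frequencies, and the convention that a base with zero inferred frequency contributes zero information are all assumptions made for the example.

```python
import math

BASES = "ACGU"

def relative_cleavage(pool_freq, product_freq, ref="A"):
    """Relative cleavage of each base versus a reference base.

    pool_freq estimates P(N); product_freq estimates P(N|cleavage).
    Both are dicts keyed by base (hypothetical representation).
    """
    ref_enrichment = product_freq[ref] / pool_freq[ref]
    return {b: (product_freq[b] / pool_freq[b]) / ref_enrichment for b in BASES}

def inferred_frequencies(pool_freq, product_freq):
    """Post-selection frequencies expected from an equimolar starting pool.

    Algebraically the same as 1 / (1 + sum of the three relative cleavage
    values), written as enrichment of a base over total enrichment.
    """
    enrichment = {b: product_freq[b] / pool_freq[b] for b in BASES}
    total = sum(enrichment.values())
    return {b: enrichment[b] / total for b in BASES}

def information_content(p):
    """I = p * (log2(p) + 2); taken as 0 when p == 0 (assumed convention)."""
    return 0.0 if p == 0 else p * (math.log2(p) + 2)

def odds_ratio(p_selected, p_reference):
    """Odds of a motif in the selected pool over its odds in the reference pool."""
    return (p_selected / (1 - p_selected)) / (p_reference / (1 - p_reference))

if __name__ == "__main__":
    # Hypothetical frequencies at one partially randomized position.
    pool = {"A": 0.79, "C": 0.07, "G": 0.07, "U": 0.07}
    product = {"A": 0.60, "C": 0.05, "G": 0.05, "U": 0.30}
    inferred = inferred_frequencies(pool, product)
    for base in BASES:
        print(base, round(inferred[base], 3), round(information_content(inferred[base]), 3))
```

Normalizing to an equimolar pool before computing information content keeps a heavily biased starting pool from being mistaken for selection; a position whose product frequencies simply mirror the pool scores zero for every base.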
To screen specifically for Watson–Crick pairing between all possible combinations of randomized positions, we used a scoring metric to compare the geometric average of odds ratios for Watson–Crick pairing to that of odds ratios for non-Watson–Crick pairs.

\[
\text{Pairing score} = \left(\prod_{\text{Watson–Crick}} \text{Odds ratio}\right)^{1/4} - \left(\prod_{\text{non-Watson–Crick}} \text{Odds ratio}\right)^{1/12}
\]

Positional enrichments of sequence motifs

Enrichment of a motif at a set of positions relative to the cleavage site was computed by generating 100,000 cohorts of miRNAs in which the upstream, downstream and pre-miRNA sequences were independently shuffled. An empirical P-value was computed by comparing the number of miRNAs that contained at least one match to the motif in the window to the number of matches in each of the random cohorts.

pri-miRNA collections

A list of representative pri-miRNAs used for analyses is provided (Table S2). Coordinates of miRNA loci in miRBase version 17 (Kozomara and Griffiths-Jones, 2011) were used to extract the sequences of the annotated hairpin and 200 genomic bases flanking each side. miRBase hairpin sequences and flanking genomic sequences (20 nt on each side) were folded using RNAfold (Hofacker and Stadler, 2006). The Microprocessor cleavage site was inferred using the mature sequences annotated in miRBase. Only hairpins for which the predicted folding and the annotated mature sequences could be reconciled as a 2 nt 3′ overhang were carried forward for analysis. For hairpins in miRBase-annotated miRNA families, a single representative was chosen to represent the family in each species. For human, D. melanogaster, and C. elegans, the family member with the most conserved pre-miRNA sequence was chosen. For other species, the representative was chosen at random. Whole-genome alignments and phylogenetic trees were obtained from the UCSC genome browser (Fujita et al., 2011).
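As an aside on the Watson–Crick pairing score defined earlier in this section, the calculation can be sketched as follows. This is a minimal sketch under stated assumptions rather than the code used in the analysis: the dictionary layout of dinucleotide frequencies and the function names are hypothetical, and every base combination is assumed to have a nonzero frequency in both pools (real data might need pseudocounts).

```python
import math
from itertools import product

BASES = "ACGU"
# The four Watson-Crick combinations; G-U wobble pairs are counted as non-Watson-Crick.
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}

def odds_ratio(p_selected, p_initial):
    """Odds of a base combination after selection over its odds in the initial pool."""
    return (p_selected / (1 - p_selected)) / (p_initial / (1 - p_initial))

def geometric_mean(values):
    """Geometric mean of strictly positive values."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

def pairing_score(initial, selected):
    """Pairing score for one pair of varied positions.

    initial and selected map (base_at_position_i, base_at_position_j) to the
    frequency of that combination in the initial and selected pools.
    """
    wc, non_wc = [], []
    for pair in product(BASES, repeat=2):
        ratio = odds_ratio(selected[pair], initial[pair])
        (wc if pair in WATSON_CRICK else non_wc).append(ratio)
    # Four Watson-Crick and twelve non-Watson-Crick combinations.
    return geometric_mean(wc) - geometric_mean(non_wc)
```

A positive score indicates that the four Watson–Crick combinations are enriched, on geometric average, relative to the twelve other combinations, which is the pattern expected for two positions that pair in the functional structure.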
Conservation of a pre-miRNA was defined as the average conservation across the pre-miRNA, and the conservation of a base was defined as the ratio between the total branch length of the species that contained the same base as the reference sequence and the total branch length of the species that had an aligned base at that position.

Accession Numbers

Sequencing data have been deposited into the Short Read Archive (SRA, accession number SRA051323).

Acknowledgements

This work was done in collaboration with Igor Ulitsky, who carried out much of the evolutionary conservation analysis. I thank D. Shechner for help with circularized-substrate selections; C. Jan, O. Rissland, D. Weinberg, J. Ruby, J. Nam, and V.N. Kim for valuable discussions; and L. Schoenfeld and J. Lassar for technical assistance. This work was supported by NIH grant GM067031 (to D.P.B.).

Bibliography and References Cited

Babiarz, J.E., Ruby, J.G., Wang, Y., Bartel, D.P., and Blelloch, R. (2008). Mouse ES cells express endogenous shRNAs, siRNAs, and other Microprocessor-independent, Dicer-dependent small RNAs. Genes Dev 22, 2773-2785. Bar, M., Wyman, S.K., Fritz, B.R., Qi, J., Garg, K.S., Parkin, R.K., Kroh, E.M., Bendoraite, A., Mitchell, P.S., Nelson, A.M., et al. (2008). MicroRNA discovery and profiling in human embryonic stem cells by deep sequencing of small RNA libraries. Stem Cells 26, 2496-2505. Bartel, D.P. (2004). MicroRNAs: genomics, biogenesis, mechanism, and function. Cell 116, 281-297. Bartel, D.P. (2009). MicroRNAs: target recognition and regulatory functions. Cell 136, 215-233. Bartel, D.P., Zapp, M.L., Green, M.R., and Szostak, J.W. (1991). HIV-1 Rev regulation involves recognition of non-Watson-Crick base pairs in viral RNA. Cell 67, 529-536. Bentwich, I., Avniel, A., Karov, Y., Aharonov, R., Gilad, S., Barad, O., Barzilai, A., Einat, P., Einav, U., Meiri, E., et al. (2005). Identification of hundreds of conserved and nonconserved human microRNAs. Nat Genet 37, 766-770.
Berezikov, E., van Tetering, G., Verheul, M., van de Belt, J., van Laake, L., Vos, J., Verloop, R., van de Wetering, M., Guryev, V., Takada, S., et al. (2006). Many novel mammalian microRNA candidates identified by extensive cloning and RAKE analysis. Genome Res 16, 1289-1298. Bohnsack, M.T., Czaplinski, K., and Gorlich, D. (2004). Exportin 5 is a RanGTP-dependent dsRNA-binding protein that mediates nuclear export of pre-miRNAs. RNA 10, 185-191. Breaker, R.R., Banerji, A., and Joyce, G.F. (1994). Continuous in vitro evolution of bacteriophage RNA polymerase promoters. Biochemistry 33, 11980-11986. Brummelkamp, T.R., Bernards, R., and Agami, R. (2002). A system for stable expression of short interfering RNAs in mammalian cells. Science 296, 550-553. Calin, G.A., Dumitru, C.D., Shimizu, M., Bichi, R., Zupo, S., Noch, E., Aldler, H., Rattan, S., Keating, M., Rai, K., et al. (2002). Frequent deletions and down-regulation of micro-RNA genes miR15 and miR16 at 13q14 in chronic lymphocytic leukemia. Proc Natl Acad Sci U S A 99, 15524-15529. Calin, G.A., Ferracin, M., Cimmino, A., Di Leva, G., Shimizu, M., Wojcik, S.E., Iorio, M.V., Visone, R., Sever, N.I., Fabbri, M., et al. (2005). A MicroRNA signature associated with prognosis and progression in chronic lymphocytic leukemia. N Engl J Med 353, 1793-1801. Cheloufi, S., Dos Santos, C.O., Chong, M.M., and Hannon, G.J. (2010). A dicer-independent miRNA biogenesis pathway that requires Ago catalysis. Nature 465, 584-589. Chen, C.Z., Li, L., Lodish, H.F., and Bartel, D.P. (2004). MicroRNAs modulate hematopoietic lineage differentiation. Science 303, 83-86. Chiang, H.R., Schoenfeld, L.W., Ruby, J.G., Auyeung, V.C., Spies, N., Baek, D., Johnston, W.K., Russ, C., Luo, S., Babiarz, J.E., et al. (2010). Mammalian microRNAs: experimental evaluation of novel and previously annotated genes. Genes Dev 24, 992-1009. Chung, W.J., Agius, P., Westholm, J.O., Chen, M., Okamura, K., Robine, N., Leslie, C.S., and Lai, E.C. (2011).
Computational and experimental identification of mirtrons in Drosophila melanogaster and Caenorhabditis elegans. Genome Res 21, 286-300. Cifuentes, D., Xue, H., Taylor, D.W., Patnode, H., Mishima, Y., Cheloufi, S., Ma, E., Mane, S., Hannon, G.J., Lawson, N.D., et al. (2010). A novel miRNA processing pathway independent of Dicer requires Argonaute2 catalytic activity. Science 328, 1694-1698. Denli, A.M., Tops, B.B., Plasterk, R.H., Ketting, R.F., and Hannon, G.J. (2004). Processing of primary microRNAs by the Microprocessor complex. Nature 432, 231-235. Duan, R., Pak, C., and Jin, P. (2007). Single nucleotide polymorphism associated with mature miR-125a alters the processing of pri-miRNA. Hum Mol Genet 16, 1124-1131. Ellington, A.D., and Szostak, J.W. (1990). In vitro selection of RNA molecules that bind specific ligands. Nature 346, 818-822. Fujita, P.A., Rhead, B., Zweig, A.S., Hinrichs, A.S., Karolchik, D., Cline, M.S., Goldman, M., Barber, G.P., Clawson, H., Coelho, A., et al. (2011). The UCSC Genome Browser database: update 2011. Nucleic Acids Res 39, D876-882. Gottwein, E., Cai, X., and Cullen, B.R. (2006). A novel assay for viral microRNA function identifies a single nucleotide polymorphism that affects Drosha processing. J Virol 80, 5321-5326. Gregory, R.I., Yan, K.P., Amuthan, G., Chendrimada, T., Doratotaj, B., Cooch, N., and Shiekhattar, R. (2004). The Microprocessor complex mediates the genesis of microRNAs. Nature 432, 235-240. Grimson, A., Srivastava, M., Fahey, B., Woodcroft, B.J., Chiang, H.R., King, N., Degnan, B.M., Rokhsar, D.S., and Bartel, D.P. (2008). Early origins and evolution of microRNAs and Piwi-interacting RNAs in animals. Nature 455, 1193-1197. Grishok, A., Pasquinelli, A.E., Conte, D., Li, N., Parrish, S., Ha, I., Baillie, D.L., Fire, A., Ruvkun, G., and Mello, C.C. (2001). Genes and mechanisms related to RNA interference regulate expression of the small temporal RNAs that control C. elegans developmental timing. Cell 106, 23-34.
Guil, S., and Caceres, J.F. (2007). The multifunctional RNA-binding protein hnRNP A1 is required for processing of miR-18a. Nat Struct Mol Biol 14, 591-596. Guo, H., Ingolia, N.T., Weissman, J.S., and Bartel, D.P. (2010). Mammalian microRNAs predominantly act to decrease target mRNA levels. Nature 466, 835-840. Han, B.W., Hung, J.H., Weng, Z., Zamore, P.D., and Ameres, S.L. (2011). The 3′-to-5′ Exoribonuclease Nibbler Shapes the 3′ Ends of MicroRNAs Bound to Drosophila Argonaute1. Curr Biol. Han, J., Lee, Y., Yeom, K.H., Kim, Y.K., Jin, H., and Kim, V.N. (2004). The Drosha-DGCR8 complex in primary microRNA processing. Genes Dev 18, 3016-3027. Han, J., Lee, Y., Yeom, K.H., Nam, J.W., Heo, I., Rhee, J.K., Sohn, S.Y., Cho, Y., Zhang, B.T., and Kim, V.N. (2006). Molecular basis for the recognition of primary microRNAs by the Drosha-DGCR8 complex. Cell 125, 887-901. Han, J., Pedersen, J.S., Kwon, S.C., Belair, C.D., Kim, Y.K., Yeom, K.H., Yang, W.Y., Haussler, D., Blelloch, R., and Kim, V.N. (2009). Posttranscriptional crossregulation between Drosha and DGCR8. Cell 136, 75-84. Hofacker, I.L., and Stadler, P.F. (2006). Memory efficient folding algorithms for circular RNA secondary structures. Bioinformatics 22, 1172-1176. Hutvagner, G., McLachlan, J., Pasquinelli, A.E., Balint, E., Tuschl, T., and Zamore, P.D. (2001). A cellular function for the RNA-interference enzyme Dicer in the maturation of the let-7 small temporal RNA. Science 293, 834-838. Jazdzewski, K., Murray, E.L., Franssila, K., Jarzab, B., Schoenberg, D.R., and de la Chapelle, A. (2008). Common SNP in pre-miR-146a decreases mature miR expression and predisposes to papillary thyroid carcinoma. Proc Natl Acad Sci U S A 105, 7269-7274. Jolma, A., Kivioja, T., Toivonen, J., Cheng, L., Wei, G., Enge, M., Taipale, M., Vaquerizas, J.M., Yan, J., Sillanpaa, M.J., et al. (2010). Multiplexed massively parallel SELEX for characterization of human transcription factor binding specificities. Genome Res 20, 861-873.
Ketting, R.F., Fischer, S.E., Bernstein, E., Sijen, T., Hannon, G.J., and Plasterk, R.H. (2001). Dicer functions in RNA interference and in synthesis of small RNA involved in developmental timing in C. elegans. Genes Dev 15, 2654-2659. Khvorova, A., Reynolds, A., and Jayasena, S.D. (2003). Functional siRNAs and miRNAs exhibit strand bias. Cell 115, 209-216. Kozomara, A., and Griffiths-Jones, S. (2011). miRBase: integrating microRNA annotation and deep-sequencing data. Nucleic Acids Res 39, D152-157. Lamontagne, B., and Elela, S.A. (2004). Evaluation of the RNA determinants for bacterial and yeast RNase III binding and cleavage. J Biol Chem 279, 2231-2241. Landgraf, P., Rusu, M., Sheridan, R., Sewer, A., Iovino, N., Aravin, A., Pfeffer, S., Rice, A., Kamphorst, A.O., Landthaler, M., et al. (2007). A mammalian microRNA expression atlas based on small RNA library sequencing. Cell 129, 1401-1414. Landthaler, M., Yalcin, A., and Tuschl, T. (2004). The human DiGeorge syndrome critical region gene 8 and Its D. melanogaster homolog are required for miRNA biogenesis. Curr Biol 14, 2162-2167. Lee, Y., Ahn, C., Han, J., Choi, H., Kim, J., Yim, J., Lee, J., Provost, P., Radmark, O., Kim, S., et al. (2003). The nuclear RNase III Drosha initiates microRNA processing. Nature 425, 415-419. Lee, Y., Jeon, K., Lee, J.T., Kim, S., and Kim, V.N. (2002). MicroRNA maturation: stepwise processing and subcellular localization. EMBO J 21, 4663-4670. Lee, Y., Kim, M., Han, J., Yeom, K.H., Lee, S., Baek, S.H., and Kim, V.N. (2004). MicroRNA genes are transcribed by RNA polymerase II. EMBO J 23, 4051-4060. Lee, Y., and Kim, V.N. (2007). In vitro and in vivo assays for the activity of Drosha complex. Methods Enzymol 427, 89-106. Lim, L.P., Glasner, M.E., Yekta, S., Burge, C.B., and Bartel, D.P. (2003a). Vertebrate microRNA genes. Science 299, 1540. Lim, L.P., Lau, N.C., Weinstein, E.G., Abdelhakim, A., Yekta, S., Rhoades, M.W., Burge, C.B., and Bartel, D.P. (2003b).
The microRNAs of Caenorhabditis elegans. Genes Dev 17, 991-1008. Liu, J., Carmell, M.A., Rivas, F.V., Marsden, C.G., Thomson, J.M., Song, J.J., Hammond, S.M., Joshua-Tor, L., and Hannon, G.J. (2004). Argonaute2 is the catalytic engine of mammalian RNAi. Science 305, 1437-1441. Lund, E., Guttinger, S., Calado, A., Dahlberg, J.E., and Kutay, U. (2004). Nuclear export of microRNA precursors. Science 303, 95-98. Macrae, I.J., Zhou, K., Li, F., Repic, A., Brooks, A.N., Cande, W.Z., Adams, P.D., and Doudna, J.A. (2006). Structural basis for double-stranded RNA processing by Dicer. Science 311, 195-198. Meister, G., Landthaler, M., Patkaniowska, A., Dorsett, Y., Teng, G., and Tuschl, T. (2004). Human Argonaute2 mediates RNA cleavage targeted by miRNAs and siRNAs. Mol Cell 15, 185-197. Michlewski, G., Guil, S., Semple, C.A., and Caceres, J.F. (2008). Posttranscriptional regulation of miRNAs harboring conserved terminal loops. Mol Cell 32, 383-393. Moore, M.J. (1999). Joining RNA molecules with T4 DNA ligase. Methods Mol Biol 118, 11-19. Okamura, K., Hagen, J.W., Duan, H., Tyler, D.M., and Lai, E.C. (2007). The mirtron pathway generates microRNA-class regulatory RNAs in Drosophila. Cell 130, 89-100. Paddison, P.J., Caudy, A.A., Bernstein, E., Hannon, G.J., and Conklin, D.S. (2002). Short hairpin RNAs (shRNAs) induce sequence-specific silencing in mammalian cells. Genes Dev 16, 948-958. Pan, T., and Uhlenbeck, O.C. (1992). In vitro selection of RNAs that undergo autolytic cleavage with Pb2+. Biochemistry 31, 3887-3895. Park, J.E., Heo, I., Tian, Y., Simanshu, D.K., Chang, H., Jee, D., Patel, D.J., and Kim, V.N. (2011). Dicer recognizes the 5′ end of RNA for efficient and accurate processing. Nature 475, 201-205. Pitman, J. (1993). Probability (New York, Springer-Verlag). Pitt, J.N., and Ferre-D'Amare, A.R. (2010). Rapid construction of empirical RNA fitness landscapes. Science 330, 376-379. Ruby, J.G., Jan, C.H., and Bartel, D.P. (2007).
Intronic microRNA precursors that bypass Drosha processing. Nature 448, 83-86. Schurer, H., Lang, K., Schuster, J., and Morl, M. (2002). A universal method to produce in vitro transcripts with homogeneous 3′ ends. Nucleic Acids Res 30, e56. Schwarz, D.S., Hutvagner, G., Du, T., Xu, Z., Aronin, N., and Zamore, P.D. (2003). Asymmetry in the assembly of the RNAi enzyme complex. Cell 115, 199-208. Sempere, L.F., Cole, C.N., McPeek, M.A., and Peterson, K.J. (2006). The phylogenetic distribution of metazoan microRNAs: insights into evolutionary complexity and constraint. J Exp Zool B Mol Dev Evol 306, 575-588. Slattery, M., Riley, T., Liu, P., Abe, N., Gomez-Alcala, P., Dror, I., Zhou, T., Rohs, R., Honig, B., Bussemaker, H.J., et al. (2011). Cofactor binding evokes latent differences in DNA binding specificity between Hox proteins. Cell 147, 1270-1282. Sun, G., Yan, J., Noltner, K., Feng, J., Li, H., Sarkis, D.A., Sommer, S.S., and Rossi, J.J. (2009). SNPs in human miRNA genes affect biogenesis and function. RNA 15, 1640-1651. Trabucchi, M., Briata, P., Garcia-Mayoral, M., Haase, A.D., Filipowicz, W., Ramos, A., Gherzi, R., and Rosenfeld, M.G. (2009). The RNA-binding protein KSRP promotes the biogenesis of a subset of microRNAs. Nature 459, 1010-1014. Viswanathan, S.R., Daley, G.Q., and Gregory, R.I. (2008). Selective blockade of microRNA processing by Lin28. Science 320, 97-100. Warf, M.B., Johnson, W.E., and Bass, B.L. (2011). Improved annotation of C. elegans microRNAs by deep sequencing reveals structures associated with processing by Drosha and Dicer. RNA 17, 563-577. Witten, D., Tibshirani, R., Gu, S.G., Fire, A., and Lui, W.O. (2010). Ultra-high throughput sequencing-based small RNA discovery and discrete statistical biomarker analysis in a collection of cervical tumours and matched controls. BMC Biol 8, 58. Yeom, K.H., Lee, Y., Han, J., Suh, M.R., and Kim, V.N. (2006). 
Characterization of DGCR8/Pasha, the essential cofactor for Drosha in primary miRNA processing. Nucleic Acids Res 34, 4622-4629. Yi, R., Qin, Y., Macara, I.G., and Cullen, B.R. (2003). Exportin-5 mediates the nuclear export of pre-microRNAs and short hairpin RNAs. Genes Dev 17, 3011-3016. Zeng, Y., and Cullen, B.R. (2005). Efficient processing of primary microRNA hairpins by Drosha requires flanking nonstructured RNA sequences. J Biol Chem 280, 27595-27603. Zeng, Y., Yi, R., and Cullen, B.R. (2005). Recognition and cleavage of primary microRNA precursors by the nuclear processing enzyme Drosha. EMBO J 24, 138-148. Zhang, H., Kolb, F.A., Jaskiewicz, L., Westhof, E., and Filipowicz, W. (2004). Single processing center models for human Dicer and bacterial RNase III. Cell 118, 57-68. Zhang, K., and Nicholson, A.W. (1997). Regulation of ribonuclease III processing by double-helical sequence antideterminants. Proc Natl Acad Sci U S A 94, 13437-13441. Zhang, X., and Zeng, Y. (2010). The terminal loop region controls microRNA processing by Drosha and Dicer. Nucleic Acids Res 38, 7689-7697. Zykovich, A., Korf, I., and Segal, D.J. (2009). Bind-n-Seq: high-throughput analysis of in vitro protein-DNA interactions using massively parallel sequencing. Nucleic Acids Res 37, e151.
Supplemental Materials

Supplemental Figures

[Figure S1 image not reproduced. Panel A: schematic of the expression construct (CMV promoter, query pri-miRNA, control pri-miRNA hsa-mir-1-1, TK pA) and RNA blots of mature miR-1, pre-miR-1-1, U6 snRNA, and Glu tRNA for hsa, dme, and cel miRNAs. Panel B: relative pri-miRNA levels for cel-lin-4, cel-lsy-6, cel-mir-40, cel-mir-50, and hsa-mir-1-1.]

Figure S1. Human, fly, and worm pri-miRNA transcription in human cells, related to Figure 1. (A) Schematic of HEK293 overexpression system and detection of miR-1-1 processing products. HEK293 cells were individually transfected with plasmids bearing a human, D. melanogaster, or C. elegans pri-miRNA transcriptionally fused to human pri-mir-1-1. Mature miR-1 and pre-miR-1-1 derived from the transcriptional fusion were detected by RNA blot. Results from vectors in which let-7 and mir-1 were the query pri-miRNAs are shown. However, results from these let-7 and mir-1 vectors are not shown in Figure 1A because the corresponding mature miRNAs were indistinguishable from those of other transfected vectors after total RNA was pooled for small-RNA sequencing. (B) Direct detection of C. elegans pri-miRNA transcription. Selected human and C. elegans pri-miRNAs were detected by a ribonuclease protection assay and signals were normalized to that of neomycin phosphotransferase mRNA expressed from the same plasmid.
[Figure S2 image not reproduced. Panel A: basal stem structures of mir125.wt (wildtype) and variants mir125.1 to mir125.6. Panel B: competitive cleavage and the prediction from the mir-125a selection (1 mM Mg) for wt and variants 1 to 6; y-axis, relative cleavage. Panel C: expression construct and RNA blot of hsa-miR-125a and hsa-miR-1 for wt and variants 1 to 6; y-axis, relative expression.]

Figure S2. Confirmation of hsa-mir-125a selection results in vitro and in HEK293T cells, related to Figure 2. (A) Predicted basal stem secondary structure of mir-125a variants tested in the experiment. (B) Competitive cleavage of individual mir-125a variants, relative to wildtype mir-125a. Variants were mixed with a reference wildtype mir-125a, which was longer at its 5′ end, and incubated in whole-cell lysate from HEK293T cells overexpressing Drosha and DGCR8. Cleavage products were separated on denaturing gels, and the ratio of wildtype and variant products quantified (blue, geometric mean ± standard error, n = 3), together with the relative cleavage inferred from the selection experiment (gray). (C) Evaluation of mir-125a variants in HEK293T cells. Variants were transcriptionally fused to pri-mir-1-1 as in Figure S1A and expressed in HEK293T cells. Accumulation of mature miR-125a was quantified by RNA blot and normalized to the level of mature miR-1 (geometric mean ± standard error, n = 3).
[Figure S3 image. Panel A: wild-type and alternative basal stem structures of hsa-mir-223, with covariation matrices (odds ratio, log2) for the candidate pairs of each structure. Panel B: relative cleavage as a function of the number of basal stem pairs, at timepoints 1 and 2. Panel C: number of candidate pairs as a function of threshold score for hsa-mir-125a, hsa-mir-16-1, hsa-mir-30a, and hsa-mir-223.]

Figure S3. Analysis of mir-223 basal stem structure, related to Figure 3.
(A) Wild-type (left) and alternative (right) basal stem structures for hsa-mir-223. In the wild-type basal stem, the A at +10 is predicted to be bulged, whereas some variants are predicted to have an alternative structure in which the nucleotide at +10 is Watson–Crick paired within a contiguous helix. Covariation matrices for both conformations were calculated as in Figure 3A. (B) Relative cleavage of variants with different lengths of the alternative basal stem. Cleavage values were calculated as in Figure 3B and normalized to the 9 bp stem. (C) Screen for Watson–Crick pairs involving any two varied positions. A Watson–Crick-pairing score was calculated for each of the >3000 possible pairs of varied positions in each of the four pri-miRNAs. The number of Watson–Crick candidates is plotted as a function of threshold score, in which a pair is considered a Watson–Crick candidate if its score exceeds the threshold. The number of pairs corresponding to the basal stem is shown (dashed line).

[Figure S4 image. Panel A: schematic of the binding selection (linear pri-miRNA substrate pool, DroshaTN and DGCR8, nitrocellulose filtration to separate functional from nonfunctional variants, library preparation for paired-end sequencing). Panel B: information content (bits) by position for hsa-mir-125a after round 1 of the binding selection and after the cleavage selection, with the stem-loop and invariant residues marked.]

Figure S4. Selection for Microprocessor-binding variants of hsa-mir-125a, related to Figure 4. (A) Schematic of the in vitro selection.
Linear, partially randomized variants of mir-125a were incubated with immunopurified DGCR8 and catalytically inactive Drosha (DroshaTN). Bound variants were recovered by nitrocellulose filtration, reverse-transcribed, and amplified for high-throughput sequencing. (B) Information content after one round of selection for Microprocessor binding. Information content was calculated as in Figure 2D. Information content after one round of cleavage selection (Figure 2D) is reproduced here for comparison. The nucleotides varied in the initial pools are shown for each selection (insets, red inner lines).

[Figure S5 image. Panel A: odds ratio of CNNC at positions 14 to 21 for hsa-mir-16-1, hsa-mir-30a, and hsa-mir-223 at timepoints 1 and 2, for all sequences and for sequences without the wild-type CNNC. Panel B: schematic of the expression vector, table of constructs (below), predicted secondary structures of mir30.wt, mir30.1, and mir30.7 with the hsa-miR-30a probe binding site, RNA blot probed for hsa-miR-30a and hsa-miR-1-1, and hsa-miR-30a expression for each construct.]

Construct   Basal stem    UG(–14)   CNNC(+17)
mir30.wt    WT            UG        CNNC
mir30.1     –12 paired    UG        CNNC
mir30.2     WT            AG        CNNC
mir30.3     WT            UG        CNNU
mir30.4     WT            UG        UNNC
mir30.5     WT            UG        UNNU
mir30.6     WT            AG        UNNU
mir30.7     7-pair stem   UG        CNNC

Figure S5. Contribution of the CNNC motif in vitro and in HEK293T cells, related to Figure 5.
(A) CNNC odds ratios at alternative positions. Odds ratios were calculated for CNNC dinucleotides starting at the indicated positions downstream of the Drosha cleavage site. Plotted are odds ratios for all sequences (left) and for sequences that lack both wild-type C residues (right). (B) Contributions of basal features, including the CNNC motif, to the accumulation of hsa-miR-30a in HEK293T cells. The listed variants of hsa-mir-30a were transcriptionally fused to hsa-mir-1-1 (left). Predicted secondary structures for variants with non-wild-type structure are shown (center), with the annotated Drosha cleavage sites (purple arrowheads). The accumulation of miR-30a was quantified by RNA blot, normalized to miR-1 (right, geometric mean ± standard error, n = 3).

[Figure S6 image. Panel A: information content (bits) by position in the apical stem and loop for hsa-mir-125a, hsa-mir-16-1, hsa-mir-30a, and hsa-mir-223. Panel B: predicted apical stem structure of hsa-mir-30a, with covariation matrices (odds ratio, log2) for pairs 17 to 22. Panel C: number of candidate pairs as a function of threshold score for each of the four pri-miRNAs.]

Figure S6. Primary-sequence and structural elements in the apical stem and terminal loop, related to Figure 6. (A) Enrichment and depletion at variable residues in pri-miRNA variants selected from a pool with varied nucleotide identities in the apical stem and loop. At each varied position (inset, red inner line), information content was calculated for each residue (green, cyan, black, and red for A, C, G, and U, respectively), as in Figure 2D. (B) Apical stem secondary structure of mir-30a. Predicted secondary structure and covariation matrices were calculated as in Figure 3A. (C) Screen for Watson–Crick pairs involving any two varied positions of the apical stem and loop. A Watson–Crick pairing score was calculated as in Figure S3C for each of the >275 possible pairs of varied positions in each of the four pri-miRNAs. The number of pairs corresponding to the apical stem is shown (dashed line).

[Figure S7 image. Panels A and B: schematic of the expression vector (CMV promoter, query pri-miRNA mutant, control pri-miRNA hsa-mir-1-1, TK pA), tables of constructs (below), predicted secondary structures of mir50.wt and mir50.2 (panel A) and of mir40.wt, mir40.1, and mir40.2 (panel B) with the probe binding sites, RNA blots probed for cel-miR-50 or cel-miR-40 and hsa-miR-1, and relative expression for each construct.]

Construct   Basal stem     UG(–14)   CNNC
mir50.wt    WT             UG        None
mir50.1     11-pair stem   AC        None
mir50.2     11-pair stem   UC        None
mir50.3     11-pair stem   UC        CNNC(+19)
mir50.4     11-pair stem   UC        CNNC(+18)
mir50.5     11-pair stem   UC        CNNC(+17)
mir50.6     11-pair stem   UC        CNNC(+16)
mir50.7     11-pair stem   UC        CNNC(+15)
mir50.8     11-pair stem   UG        None
mir50.9     11-pair stem   UG        CNNC(+18)

Construct   Basal stem     UG(–14)   CNNC
mir40.wt    WT             CC        None
mir40.1     (mutant)       UG        None
mir40.2     WT reverted    UG        None
mir40.3     WT reverted    UG        CNNC(+16)
mir40.4     WT reverted    UG        CNNC(+17)
mir40.5     WT reverted    UG        CNNC(+18)
mir40.6     WT reverted    UG        CNNC(+19)
mir40.7     WT reverted    UG        CNNC(+20)

Figure S7. Rescue of C. elegans pri-miRNA processing in human cells, related to Figure 7. (A) Effects of adding human pri-miRNA motifs to C.
elegans mir-50. Changes that introduced the listed features were incorporated into mir-50 within the bicistronic expression vector (left). Secondary structures are shown for changes that were predicted to affect the wild-type basal stem (middle), with the annotated Drosha cleavage sites (purple arrowheads). After transfection into HEK293T cells, accumulation of miR-50 was assessed on an RNA blot and normalized to the accumulation of the miR-1 control, and the relative miR-50 expression is plotted (geometric mean ± standard error, n = 3, right). (B) Effects of adding human pri-miRNA motifs to C. elegans mir-40, otherwise as in (A).

Supplemental Table S1: Oligonucleotides used in the in vitro selections
Related to the Experimental Procedures. Degenerate and conventional oligonucleotides were commercially synthesized (IDT). Oligonucleotides are DNA unless otherwise noted.

HDV 5′ Polishing Primer hsa-mir-125a circular selection CTTCTCCCTTAGCCTACCGAAGTAGCCCAGG Note: Underlined bases were degenerate. S125circ.004 T7 Adaptor CAGAGATGCATAATACGACTCACTATAGGGTCACAG S125circ.001a Left arm S125circ.003 HDV adaptor GACTCACTATAGGGTCACAGGTGAGGTTCTTGGGAGCCTGGCGTCTGGCCCA ACCACACACCTGGGGAATTGCTGGCCTGACTTCTGACCCCTGACTCCT TCCTCACAGGTTAAAGGGTCTCAGGGACCTAGAGACTGGCAACATGGTGTGC GGTGGCCCGGTAGACCCTGGGGTGGGGGTATGAGGAGTCAGGGGTCAG CTTCTCCCTTAGCCTACCGAAGTAGCCCAGGTCGGACCGCGAGGAGGTGGAG ATGCCATGCCGACCCTGGATGTCCTCACAGGTTAAAGG S125circ.006 3p Arm Splint AGACGCCAAGATCGGA S125circ.007 5p Arm Splint ACGTGTACCCTAGAGA S125circ.009 Ref RT TGGATGTCCTCACAGGTTAAAGGGTCTCAGGGACCTAG S125circ.014 Ref-II Fwd Primer S125circ.015 Ref-II Rev Primer CTTTCCCTACACGACGCTCTTCCGATCTCAGGTGAGGTTCTTGGGAGCCTGG C GCATTCCTGCTGAACCGCTCTTCCGATCTTTAAAGGGTCTCAGGGACCTAGA G hsa-mir-16-1 circular selection Note: Underlined bases were degenerate.
S16-1circ.004 T7 Adaptor S16-1circ.003 HDV Adaptor CAGAGATGCATAATACGACTCACTATACTAAAATTATCTCCAGTATTAACTG TGC ATACTAAAATTATCTCCAGTATTAACTGTGCTGCTGAAGTAAGGTTGACCAT ACTCTACAGTTGTGTTTTAATGTATATTAATGTTGCTTAATTAAGGAC ACCCAATCTTAACGCCAATATTTACGTGCTGCTAAGGCACTGCTGACATTGC TATCATAAGAGCTATGAATAAAAAGAAATATGTCCTTAATTAAGCAAC GCCTACCGAAGTAGCCCAGGTCGGACCGCGAGGAGGTGGAGATGCCATGCCG ACCCAATCTTAACGCCAATATTTAC S16-1circ.006 3p Arm Splint AACCTTACAGATCGGA S16-1circ.006b 3p Arm Splint Variant AACCTTACTAGATCGGA S16-1circ.007 5p Arm Splint ACGTGTACAGGCACTG S16-1circ.009 Ref RT Primer AATCTTAACGCCAATATTTACGTGCTGCTAAGGC S16-1circ.010 Ref Fwd Primer CTTTCCCTACACGACGCTCTTCCGATCTTCCAGTATTAACTGTGCTGCTGA S16-1circ.011 Ref Rev Primer GCATTCCTGCTGAACCGCTCTTCCGATCTCCAATATTTACGTGCTGCTA hsa-mir-30a circular selection Note: Underlined bases were degenerate. S30acirc.004 T7 Adaptor CAGAGATGCATAATACGACTCACTATAGCCACAGATGGGCTTTCAGTCGG S30acirc.001a Left Arm S30a.003 HDV Adaptor CTATAGCCACAGATGGGCTTTCAGTCGGATGTTTGCAGCTGCCTACTGCCTC GGACTTCAAGGGGCTACTTTAGGAGCAATTATCTTGTTAATTAAGGTT CGACCCTTCACAGCTTCCAGTCGAGGATGTTTACAGTCGCTCACTGTCAACA GCAATATACCTTCTTTAGCCTTCTGTTGGGTTAACCTTAATTAACAAG GCCTACCGAAGTAGCCCAGGTCGGACCGCGAGGAGGTGGAGATGCCATGCCG ACCCTTCACAGCTTCCAGTCGAGG S30acirc.006 3p Arm Splint AGTAGGCAAGATCGGA S30acirc.007 5p Arm Splint ACGTGTACGTCGCTCA S125circ.002a Right arm S16-1circ.001a Left Arm S16-1circ.002a Right Arm S30acirc.002a Right Arm 112 S30acirc.009 Ref RT Primer CCGATCTTTCCAGTCGAGGATGTTTACAGTCGC S30acirc.010 Ref Fwd Primer S30acirc.011 Ref Rev Primer CTTTCCCTACACGACGCTCTTCCGATCTGGGCTTTCAGTCGGATGTTTGCAG CTGC GCATTCCTGCTGAACCGCTCTTCCGATCTTTCCAGTCGAGGATGTTTACAGT CGC hsa-mir-223 circular selection Note: Underlined bases were degenerate. 
S223circ.004 T7 Adaptor CAGAGATGCATAATACGACTCACTATAGGTAGAGTGTCAGTTTGTC S223circ.001a Left Arm S223.003 HDV Adaptor ACTATAGGTAGAGTGTCAGTTTGTCAAATACCCCAAGTGCGGCACATGCTTA CCAGCTCTAGGCCAGGGCAGATGGGATATGACGAATTTAATTAAGATC ACATGGAGTGTCCAACTCAGCTTGTCAAATACACGGAGCGTGGCACTGCAGG AGGCCAGGCCAAGAGCTTCTGTGGGGAAGTGAGATCTTAATTAAATTC CTTCTCCCTTAGCCTACCGAAGTAGCCCAGGTCGGACCGCGAGGAGGTGGAG ATGCCATGCCGACCCACATGGAGTGTCCAACTCAGC S223circ.006 3p Arm Splint GCCGCACTAGATCGGA S223circ.007 5p Arm Splint ACGTGTACGAGCGTGG S223circ.009 Ref RT Primer ACATGGAGTGTCCAACTCAGCTTGTCAAATACAC S223circ.010 Ref Fwd Primer S223circ.011 Ref Rev Primer CTTTCCCTACACGACGCTCTTCCGATCTGTCAGTTTGTCAAATACCCCAAGT G GCATTCCTGCTGAACCGCTCTTCCGATCTACTCAGCTTGTCAAATACACGGA GC Common oligonucleotides for circular cleavage selection Note: “p” indicates a phosphate and lowercase letters denote RNA S0circ.005 3p Arm Adaptor S0circ.007B.007.A 5p Arm Adaptor, CAT barcode S0circ.007B.007.B 5p Arm Adaptor, ATG barcode S0circ.007B.007.C 5p Arm Adaptor, TGA barcode S0circ.007B.007.D 5p Arm Adaptor, TAG barcode S0circ.008 RT Primer (-1) GAGATCTACACTCTTTCCCTACACGACGCTCTuccgaucu S0.001 Solexa Fwd Seq S0.002 Solexa Rev Seq, -1 short AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTC CGATCT CAAGCAGAAGACGGCATACGAGATCGGTCTCGGCATTCCTGCTGAACCGCTC TTCCGATC hsa-mir-125a binding selection Note: Underlined bases were degenerate. 
S0.003 T7-fwdSeq CAGAGATGCATAATACGACTCACTATAGGACACGACGCTCTTCCGATCT S125.001 Left arm S125.002 Right arm ACGCTCTTCCGATCTCCCCAGGGTCTACCGGGCCACCGCACACCATGTTGCC AGTCTCTAGGTCCCTGAGACCCTTTAACCTGTGAGGACATCCAGGGTC AGGAGTCAGGGGTCAGAAGTCAGGCCAGCAATTCCCCAGGTGTGTGGTTGGG CCAGACGCCAGGCTCCCAAGAACCTCACCTGTGACCCTGGATGTCCTC S125.003 RT primer TATGAGGAGTCAGGGGTCAG S125.005.A Solexa Reverse, barcoded CAAGCAGAAGACGGCATACGAGATCGGTCTCGGCATTCCTGCTGAACCGCTC TTCCGATCTCATTATGAGGAGTCAGGGGTCAG CAAGCAGAAGACGGCATACGAGATCGGTCTCGGCATTCCTGCTGAACCGCTC TTCCGATCTATGTATGAGGAGTCAGGGGTCAG CAAGCAGAAGACGGCATACGAGATCGGTCTCGGCATTCCTGCTGAACCGCTC TTCCGATCTTGATATGAGGAGTCAGGGGTCAG CAAGCAGAAGACGGCATACGAGATCGGTCTCGGCATTCCTGCTGAACCGCTC TTCCGATCTTAGTATGAGGAGTCAGGGGTCAG S223circ.002a Right Arm S125.005.B Solexa Reverse, barcoded S125.005.C Solexa Reverse, barcoded S125.005.D Solexa Reverse, barcoded p-guacacguaugaGATCGGAAGAGCGGTTCAGCAGGAATGC p-guacacgucauaGATCGGAAGAGCGGTTCAGCAGGAATGC p-guacacguucaaGATCGGAAGAGCGGTTCAGCAGGAATGC p-guacacgucuaaGATCGGAAGAGCGGTTCAGCAGGAATGC GCATTCCTGCTGAACCGCTCTTCCGATC 113 hsa-mir-125a loop and apical selection Note: Underlined bases were degenerate. S125loop.001a Central S125loop.002 Left Arm ACCATGTTGCCAGTCTCTAGGTCCCTGAGACCCTTTAACCTGTGAGGACATC CAGGGTCACAGGTGAGGTTCTTGGGAGCCTGGCGTCTGGCCCAACCAC CAGAGATGCATAATACGACTCACTATAgCCCCCACCCCAGGGTCTACCGGGC CACCGCACACCATGTTGCCAGTCTCTAGG S125loop.003 Right Arm GGTCAGAAGTCAGGCCAGCAATTCCCCAGGTGTGTGGTTGGGCCAGACGCCA S125loop.004 RT Primer GGCATAGGCTCCCAAGAACCTC S125loop.005 Init. PCR Fwd GACGATCTCCCTGAGACCCTTTAA S125loop.005a Init. PCR Fwd, barcoded GACGATCgaTCCCTGAGACCCTTTAA S125loop.005b Init. 
PCR Fwd, barcoded GACGATCctTCCCTGAGACCCTTTAA S125loop.006 Solexa-R Adaptor CAAGCAGAAGACGGCATACGAGGCTCCCAAGAACCTC S125loop.007 Solexa-Seq Adaptor AATGATACGGCGACCACCGACAGGTTCAGAGTTCTACAGTCCGACGATCTCC CTGAGACCCTTTAA AATGATACGGCGACCACCGACAGGTTCAGAGTTCTACAGTCCGACGATCgaT CCCTGAGACCCTTTAA S125loop.007a Solexa-Seq Adaptor, barcoded S125loop.007b Solexa-Seq Adaptor, barcoded AATGATACGGCGACCACCGACAGGTTCAGAGTTCTACAGTCCGACGATCctT CCCTGAGACCCTTTAA hsa-mir-16-1 loop and apical selection Note: Underlined bases were degenerate. S16-1loop.001a Central S16-1loop.003 Right Arm GCAATGTCAGCAGTGCCTTAGCAGCACGTAAATATTGGCGTTAAGATTCTAA AATTATCTCCAGTATTAACTGTGCTGCTGAAGTAAGGTTGACCATAC CAGAGATGCATAATACGACTCACTATAgATATTTCTTTTTATTCATAGCTCT TATGATAGCAATGTCAGCAGTGCCTTAG AACATTAATATACATTAAAACACAACTGTAGAGTATGGTCAACCTTACTTCA G S16-1loop.004 RT Primer GACGGCATATTCAGCAGCACAGTTAATAC S16-1loop.005 Init. PCR Fwd CCGACGATCTAGCAGCACGTAAATATT S16-1loop.005a Init. PCR Fwd, Barcoded CCGACGATCgaTAGCAGCACGTAAATATT S16-1loop.005b Init. PCR Fwd, Barcoded CCGACGATCctTAGCAGCACGTAAATATT S16-1loop.006 Solexa-R Adaptor CAAGCAGAAGACGGCATACGATTCAGCAGCACAGTTAATAC S16-1loop.007 Solexa-Seq Adaptor AATGATACGGCGACCACCGACAGGTTCAGAGTTCTACAGTCCGACGATCTAG CAGCACGTAAATATT AATGATACGGCGACCACCGACAGGTTCAGAGTTCTACAGTCCGACGATCgaT AGCAGCACGTAAATATT S16-1loop.002 Left Arm S16-1loop.007a Solexa-Seq Adaptor, barcoded S16-1loop.007b Solexa-Seq Adaptor, barcoded AATGATACGGCGACCACCGACAGGTTCAGAGTTCTACAGTCCGACGATCctT AGCAGCACGTAAATATT hsa-mir-30a loop and apical selection Note: Underlined bases were degenerate. S30aloop.001a Central S30aloop.003 Right Arm GCTGTTGACAGTGAGCGACTGTAAACATCCTCGACTGGAAGCTGTGAAGCCA CAGATGGGCTTTCAGTCGGATGTTTGCAGCTGCCTACTGCCTCGGAC CAGAGATGCATAATACGACTCACTATAGGTTAACCCAACAGAAGGCTAAAGA AGGTATATTGCTGTTGACAGTGAGCGACTGTAAACATC GTAAACAAGATAATTGCTCCTAAAGTAGCCCCTTGAAGTCCGAGGCAGTAGG CAGCTGCAAACATC S30aloop.004 RT Primer GACGGCATAGCTGCAAACATCCGACTGA S30aloop.005 Init. PCR Fwd CCGACGATCTGTAAACATCCTCGACTG S30aloop.005a Init. 
PCR Fwd, Barcoded CCGACGATCgaTGTAAACATCCTCGACTG S30aloop.005b Init. PCR Fwd, Barcoded CCGACGATCctTGTAAACATCCTCGACTG S30aloop.006 Solexa-R Adaptor CAAGCAGAAGACGGCATACGAGCTGCAAACATCCGACTGA S30aloop.007 Solexa-Seq Adaptor AATGATACGGCGACCACCGACAGGTTCAGAGTTCTACAGTCCGACGATCTGT AAACATCCTCGACTG AATGATACGGCGACCACCGACAGGTTCAGAGTTCTACAGTCCGACGATCgaT GTAAACATCCTCGACTG S30aloop.002 Left Arm S30aloop.007a Solexa-Seq Adaptor, barcoded 114 S30aloop.007b Solexa-Seq Adaptor, barcoded AATGATACGGCGACCACCGACAGGTTCAGAGTTCTACAGTCCGACGATCctT GTAAACATCCTCGACTG hsa-mir-223 loop and apical selection Note: Underlined bases were degenerate. S223loop.001a Central S223loop.003 Right Arm TCCTGCAGTGCCACGCTCCGTGTATTTGACAAGCTGAGTTGGACACTCCATG TGGTAGAGTGTCAGTTTGTCAAATACCCCAAGTGCGGCACATGCTTAC CAGAGATGCATAATACGACTCACTATAGATCTCACTTCCCCACAGAAGCTCT TGGCCTGGCCTCCTGCAGTGCCACGCTCCGTGTATTTGACAAGCTGAG ATTCGTCATATCCCATCTGCCCTGGCCTAGAGCTGGTAAGCATGTGCCGCAC TTGGGGTATTTGACAAAC S223loop.004 RT Primer GACGGCATATGGGGTATTTGACAAAC S223loop.005 Init. PCR Fwd CCGACGATCCGTGTATTTGACAAGCTGA S223loop.005a Init. PCR Fwd, Barcoded CCGACGATCgaCGTGTATTTGACAAGCTGA S223loop.005b Init. PCR Fwd, Barcoded CCGACGATCctCGTGTATTTGACAAGCTGA S223loop.006 Solexa-R Adaptor CAAGCAGAAGACGGCATACGATGGGGTATTTGACAAAC S223loop.007 Solexa-Seq Adaptor AATGATACGGCGACCACCGACAGGTTCAGAGTTCTACAGTCCGACGATCCGT GTATTTGACAAGCTGA AATGATACGGCGACCACCGACAGGTTCAGAGTTCTACAGTCCGACGATCgaC GTGTATTTGACAAGCTGA S223loop.002 Left Arm S223loop.007a Solexa-Seq Adaptor, barcoded S223loop.007b Solexa-Seq Adaptor, barcoded AATGATACGGCGACCACCGACAGGTTCAGAGTTCTACAGTCCGACGATCctC GTGTATTTGACAAGCTGA 115 Supplemental Table S2. Pri-miRNA collections Related to the Experimental Procedures. Pri-miRNAs used in phylogenetic analyses were chosen as described in the Experimental Procedures. H. sapiens D. 
rerio hsa-mir-455, hsa-mir-451, hsa-mir-452, hsa-mir-181b-1, hsa-mir-568, hsa-mir-302a, hsa-mir-7-1, hsa-mir-19a, hsa-mir-708, hsa-mir-219-2, hsa-mir-28, hsa-mir-431, hsa-mir-301a, hsa-mir-432, hsa-mir-21, hsa-mir-433, hsa-mir-22, hsa-mir-574, hsamir-598, hsa-mir-599, hsa-mir-101-1, hsa-mir-592, hsa-mir-18a, hsa-mir-590, hsamir-3960, hsa-mir-449a, hsa-mir-320a, hsa-mir-1306, hsa-mir-125b-1, hsa-mir-31, hsa-mir-32, hsa-mir-216a, hsa-mir-1193, hsa-mir-582, hsa-mir-212, hsa-mir-376a-1, hsa-mir-214, hsa-mir-210, hsa-mir-215, hsa-mir-217, hsa-mir-129-2, hsa-mir-124-2, hsa-mir-412, hsa-mir-411, hsa-mir-224, hsa-mir-328, hsa-mir-222, hsa-mir-325, hsa-mir-223, hsa-mir-326, hsa-mir-718, hsa-mir-324, hsa-mir-1912, hsa-mir-711, hsa-mir-425, hsa-mir-424, hsa-mir-423, hsa-mir-1277, hsa-mir-200a, hsa-mir-615, hsa-mir-96, hsa-mir-105-1, hsa-mir-802, hsa-mir-337, hsa-mir-24-1, hsa-mir-338, hsa-mir-339, hsa-mir-335, hsa-mir-330, hsa-mir-331, hsa-mir-744, hsa-mir-218-1, hsa-mir-135a-2, hsa-mir-202, hsa-mir-203, hsa-mir-551b, hsa-mir-199a-2, hsa-mir345, hsa-mir-346, hsa-mir-204, hsa-mir-205, hsa-mir-342, hsa-let-7a-1, hsa-mir340, hsa-mir-544, hsa-mir-542, hsa-mir-541, hsa-mir-138-1, hsa-mir-33a, hsa-mir1298, hsa-mir-760, hsa-mir-668, hsa-mir-764, hsa-mir-150, hsa-mir-761, hsa-mir23b, hsa-mir-762, hsa-mir-664, hsa-mir-767, hsa-mir-665, hsa-mir-9-2, hsa-mir-3652, hsa-mir-149, hsa-mir-107, hsa-mir-142, hsa-mir-144, hsa-mir-1249, hsa-mir-143, hsa-mir-1247, hsa-mir-145, hsa-mir-140, hsa-mir-193b, hsa-mir-652, hsa-mir-653, hsa-mir-654, hsa-mir-759, hsa-mir-139, hsa-mir-153-2, hsa-mir-137, hsa-mir-136, hsa-mir-134, hsa-mir-196a-2, hsa-mir-1264, hsa-mir-194-1, hsa-mir-127, hsa-mir126, hsa-mir-122, hsa-mir-505, hsa-mir-506, hsa-mir-1251, hsa-mir-770, hsa-mir503, hsa-mir-504, hsa-mir-10b, hsa-mir-1-2, hsa-mir-133a-1, hsa-mir-128-2, hsamir-99a, hsa-mir-486, hsa-mir-147b, hsa-mir-489, hsa-mir-488, hsa-mir-208a, hsamir-483, hsa-mir-485, hsa-mir-484, hsa-mir-384, hsa-mir-383, hsa-mir-191, 
hsa-mir190, hsa-mir-185, hsa-mir-186, hsa-mir-187, hsa-mir-188, hsa-mir-27b, hsa-mir892b, hsa-mir-490, hsa-mir-374c, hsa-mir-491, hsa-mir-3074, hsa-mir-146a, hsamir-375, hsa-mir-184, hsa-mir-500b, hsa-mir-378, hsa-mir-183, hsa-mir-182, hsamir-30c-1, hsa-mir-15b, hsa-mir-367, hsa-mir-362, hsa-mir-361, hsa-mir-363, hsamir-370, hsa-mir-1224, hsa-mir-371, hsa-mir-29a, hsa-mir-92a-1, hsa-mir-2861, hsa-mir-496, hsa-mir-670, hsa-mir-26a-1, hsa-mir-495, hsa-mir-148a, hsa-mir-493, hsa-mir-671, hsa-mir-499, hsa-mir-676, hsa-mir-675, hsa-mir-497, hsa-mir-875, hsa-mir-876, hsa-mir-877, hsa-mir-155, hsa-mir-450a-2, hsa-mir-297, hsa-mir-3064, hsa-mir-298, hsa-mir-296, hsa-mir-3065, hsa-mir-34a, hsa-mir-873, hsa-mir-299, hsa-mir-874 dre-let-7f, dre-mir-100-2, dre-mir-101b, dre-mir-107, dre-mir-10d, dre-mir-122, dremir-124-2, dre-mir-125b-1, dre-mir-126b, dre-mir-128-1, dre-mir-129-2, dre-mir-1322, dre-mir-133a-2, dre-mir-135b, dre-mir-137-2, dre-mir-1388, dre-mir-139, dre-mir140, dre-mir-142b, dre-mir-145, dre-mir-146b-1, dre-mir-148, dre-mir-153a, dre-mir155, dre-mir-15c, dre-mir-1788, dre-mir-181b-1, dre-mir-182, dre-mir-183, dre-mir184, dre-mir-187, dre-mir-190, dre-mir-192, dre-mir-193a-1, dre-mir-194a, dre-mir196a-2, dre-mir-199-3, dre-mir-19b, dre-mir-202, dre-mir-203a, dre-mir-204-2, dremir-206-2, dre-mir-20a, dre-mir-210, dre-mir-21-1, dre-mir-214, dre-mir-216a-1, dremir-2188, dre-mir-218a-1, dre-mir-219-3, dre-mir-222, dre-mir-223, dre-mir-22a, dre-mir-23a-3, dre-mir-24-1, dre-mir-26a-3, dre-mir-27a, dre-mir-29b-3, dre-mir301c, dre-mir-30a, dre-mir-31, dre-mir-338-1, dre-mir-34, dre-mir-363, dre-mir-3652, dre-mir-375-2, dre-mir-429, dre-mir-430c-3, dre-mir-451, dre-mir-454b, dre-mir455, dre-mir-456, dre-mir-458, dre-mir-460, dre-mir-489, dre-mir-499, dre-mir-736, dre-mir-7b, dre-mir-92a-1, dre-mir-9-3, dre-mir-96 116 C. intestinalis A. gambiae D. 
melanogaster cin-mir-4003b, cin-mir-4001a-2, cin-mir-4092, cin-mir-4001a-1, cin-mir-4091, cinmir-4094, cin-mir-4093, cin-mir-4098, cin-mir-196, cin-mir-4097, cin-mir-4099, cinmir-4219, cin-mir-4003d, cin-mir-4003c, cin-mir-4217, cin-mir-4090, cin-mir-4207, cin-mir-4209, cin-mir-4208, cin-mir-4213, cin-mir-4214, cin-mir-4215, cin-mir-4216, cin-mir-4006a-1, cin-mir-4212, cin-mir-4014-1, cin-mir-4006a-3, cin-mir-4006a-2, cin-mir-4010-1, cin-mir-4001d, cin-mir-4001c, cin-mir-4001h, cin-mir-4001g, cin-mir4001f, cin-mir-4001e, cin-mir-4001i, cin-mir-1502c, cin-mir-1502b, cin-mir-1502d, cin-mir-4013a, cin-mir-4013b, cin-mir-4120, cin-mir-4123, cin-mir-4124, cin-mir4121, cin-mir-4122, cin-mir-4127, cin-mir-4128, cin-mir-4118, cin-let-7f, cin-mir4000d, cin-mir-4000e, cin-mir-4000f, cin-mir-4000g, cin-mir-15, cin-mir-375, cin-mir4000c, cin-mir-4110, cin-mir-4112, cin-mir-219, cin-mir-4113, cin-mir-4114, cin-mir4115, cin-mir-4116, cin-mir-4011a, cin-mir-4011b, cin-mir-4000i, cin-mir-4000h, cinmir-4002, cin-mir-29, cin-mir-200, cin-mir-4007, cin-mir-4004, cin-mir-4106, cin-mir4103, cin-mir-4101, cin-mir-367, cin-mir-4100, cin-mir-4201, cin-mir-31, cin-mir4077d, cin-mir-4077c, cin-mir-4205, cin-mir-34, cin-mir-4203, cin-mir-124-1, cin-mir3598, cin-mir-3599, cin-mir-92c, cin-mir-92a, cin-mir-4159, cin-mir-4158, cin-mir4157, cin-mir-4156, cin-mir-4155, cin-mir-92e, cin-mir-4154, cin-mir-4153, cin-mir4019, cin-mir-4169, cin-mir-4166, cin-mir-4165, cin-mir-4021, cin-mir-4022, cin-mir4025, cin-mir-4178a, cin-mir-4163, cin-mir-4024, cin-mir-126, cin-mir-4012-2, cinmir-125, cin-mir-4026, cin-mir-4029, cin-mir-4028, cin-mir-3575, cin-mir-4020b, cinmir-4129, cin-mir-4134, cin-mir-4031, cin-mir-4133, cin-mir-4030, cin-mir-4132, cinmir-4139, cin-mir-135, cin-mir-133, cin-mir-4137, cin-mir-4136, cin-mir-4039, cinmir-4038, cin-mir-4037, cin-mir-4131, cin-mir-4036, cin-mir-4130, cin-mir-4035, cinmir-4034, cin-mir-4033, cin-mir-4041, cin-mir-4144, cin-mir-4143, cin-mir-4146, cinmir-4043, 
cin-mir-141, cin-mir-4145, cin-mir-4009b, cin-mir-4147, cin-mir-4009a, cin-mir-4009c, cin-mir-4049, cin-mir-4048, cin-mir-4140, cin-mir-4045, cin-mir-4044, cin-mir-4142, cin-mir-4047, cin-mir-4046, cin-mir-4141, cin-mir-4057, cin-mir-4196, cin-mir-7, cin-mir-4197, cin-mir-4055, cin-mir-4194, cin-mir-4195, cin-mir-4056, cinmir-4192, cin-mir-4001b-2, cin-mir-4193, cin-mir-4190, cin-mir-4059, cin-mir-9, cinmir-4001b-1, cin-mir-4008b, cin-mir-4008c, cin-mir-4050, cin-mir-4008a, cin-mir155, cin-mir-4053, cin-mir-153, cin-mir-4054, cin-mir-4198, cin-mir-4051, cin-mir-1, cin-mir-4052, cin-mir-1473, cin-mir-96, cin-mir-4066, cin-mir-4067, cin-mir-4068, cin-mir-4069, cin-let-7a-2, cin-mir-4060, cin-mir-4061, cin-mir-4062, cin-mir-4063, cin-mir-4064, cin-mir-4065, cin-mir-132-2, cin-mir-4171, cin-mir-4014-2, cin-mir4174, cin-mir-4079, cin-mir-4006f, cin-mir-4172, cin-mir-4078, cin-mir-4006g, cinmir-4173, cin-mir-4006d, cin-mir-4076, cin-mir-4006e, cin-mir-4179, cin-mir-4073, cin-mir-4006b, cin-mir-4176, cin-mir-4074, cin-mir-4006c, cin-mir-4177, cin-mir4071, cin-mir-4072, cin-mir-4070, cin-mir-182, cin-mir-1497, cin-mir-4017-2, cin-mir4018b, cin-mir-281, cin-mir-4180, cin-mir-4181, cin-mir-4015-1, cin-mir-4088, cinmir-4183, cin-mir-4089, cin-mir-4185, cin-mir-4186, cin-mir-4084, cin-mir-4005a, cin-mir-183, cin-mir-4085, cin-mir-4086, cin-mir-4005c, cin-mir-4087, cin-mir-40162, cin-mir-4081, cin-mir-4083 aga-mir-9a, aga-mir-1000, aga-mir-957, aga-mir-100, aga-mir-iab-4, aga-mir-275, aga-mir-965-1, aga-mir-278, aga-mir-279, aga-mir-276, aga-mir-277, aga-mir-993, aga-mir-1, aga-mir-8, aga-mir-305, aga-mir-996, aga-mir-137, aga-mir-10, aga-mir11, aga-mir-929, aga-mir-12, aga-mir-927, aga-mir-14, aga-mir-283, aga-mir-286, aga-mir-375-2, aga-mir-281, aga-mir-282, aga-mir-309, aga-mir-308, aga-mir-307, aga-bantam, aga-mir-1890, aga-mir-315, aga-mir-981, aga-let-7, aga-mir-87, agamir-184, aga-mir-219, aga-mir-2-1, aga-mir-1891, aga-mir-124, aga-mir-125, agamir-190, aga-mir-210, 
aga-mir-988, aga-mir-989, aga-mir-317, aga-mir-970, aga-mir-34, aga-mir-92b, aga-mir-263, aga-mir-1175

D. melanogaster: dme-mir-31a, dme-mir-932, dme-mir-8, dme-mir-5, dme-mir-4, dme-mir-7, dme-mir-125, dme-mir-1, dme-mir-124, dme-mir-3, dme-mir-318, dme-mir-219, dme-mir-316, dme-mir-317, dme-mir-34, dme-mir-33, dme-mir-193, dme-mir-190, dme-mir-281-2, dme-mir-92a, dme-mir-210, dme-mir-315, dme-mir-314, dme-mir-313, dme-mir-312, dme-mir-305, dme-mir-306, dme-mir-308, dme-mir-375, dme-mir-959, dme-mir-958, dme-mir-184, dme-mir-957, dme-mir-955, dme-mir-100, dme-mir-2494, dme-mir-304, dme-mir-303, dme-mir-969, dme-mir-965, dme-mir-9a, dme-mir-968, dme-mir-967, dme-mir-961, dme-mir-962, dme-mir-276a, dme-mir-963, dme-mir-964, dme-mir-960, dme-mir-2489, dme-mir-307b, dme-mir-263a, dme-mir-971, dme-mir-970, dme-mir-975, dme-mir-976, dme-mir-977, dme-mir-978, dme-bantam, dme-mir-307a, dme-mir-1001, dme-mir-1002, dme-mir-1003, dme-mir-1005, dme-mir-1006, dme-mir-6-3, dme-mir-982, dme-mir-981, dme-mir-980, dme-mir-986, dme-mir-1000, dme-mir-989, dme-mir-987, dme-mir-988, dme-mir-iab-4, dme-mir-iab-8, dme-mir-252, dme-mir-2a-2, dme-mir-1012, dme-mir-1013, dme-mir-1010, dme-mir-995, dme-mir-1011, dme-mir-994, dme-mir-87, dme-mir-996, dme-mir-991, dme-mir-993, dme-mir-992, dme-let-7, dme-mir-288, dme-mir-289, dme-mir-999, dme-mir-286, dme-mir-284, dme-mir-285, dme-mir-282, dme-mir-283, dme-mir-280, dme-mir-927, dme-mir-929, dme-mir-279, dme-mir-11, dme-mir-275, dme-mir-12, dme-mir-277, dme-mir-14, dme-mir-278, dme-mir-133, dme-mir-10, dme-mir-274, dme-mir-137, dme-mir-983-2

C. elegans: cel-mir-2208b, cel-mir-237, cel-mir-90, cel-mir-238, cel-mir-55, cel-mir-233, cel-mir-58, cel-mir-234, cel-mir-57, cel-mir-239a, cel-mir-235, cel-mir-236, cel-mir-59, cel-mir-2209b, cel-mir-230, cel-mir-231, cel-mir-232, cel-mir-50, cel-mir-52, cel-mir-51, cel-mir-4922-1, cel-lin-4, cel-mir-2214, cel-mir-67, cel-mir-65, cel-mir-248, cel-mir-249, cel-mir-246, cel-mir-247, cel-mir-244, cel-mir-245, cel-mir-240, cel-mir-241, cel-mir-63, cel-mir-62, cel-mir-790, cel-mir-61, cel-mir-791, cel-mir-60, cel-mir-7892, cel-mir-2, cel-mir-4812, cel-mir-1, cel-mir-788, cel-mir-787, cel-mir-786, cel-mir-785, cel-mir-71, cel-mir-255, cel-mir-72, cel-mir-35, cel-mir-73, cel-mir-38, cel-mir-353, cel-mir-74, cel-mir-259, cel-mir-392, cel-mir-34, cel-mir-70, cel-mir-79, cel-mir-358, cel-mir-359, cel-mir-250, cel-mir-354, cel-mir-75, cel-mir-251, cel-mir-355, cel-mir-76, cel-mir-252, cel-mir-356, cel-mir-77, cel-mir-253, cel-mir-357, cel-mir-254, cel-let-7, cel-mir-1822, cel-lsy-6, cel-mir-39, cel-mir-124, cel-mir-84, cel-mir-268, cel-mir-49, cel-mir-85, cel-mir-269, cel-mir-48, cel-mir-82, cel-mir-83, cel-mir-46, cel-mir-80, cel-mir-360, cel-mir-44, cel-mir-43, cel-mir-42, cel-mir-228, cel-mir-86, cel-mir-87

C. briggsae: cbr-let-7, cbr-mir-238, cbr-mir-236, cbr-mir-237, cbr-mir-234, cbr-mir-87, cbr-mir-235, cbr-mir-233, cbr-mir-47, cbr-mir-48, cbr-mir-49, cbr-mir-43, cbr-mir-124, cbr-mir-42, cbr-mir-83, cbr-mir-242, cbr-mir-239b, cbr-mir-84, cbr-mir-241, cbr-mir-85, cbr-mir-240, cbr-mir-86, cbr-mir-80, cbr-mir-81, cbr-mir-248, cbr-mir-249, cbr-mir-232-1, cbr-mir-35b-6, cbr-mir-244, cbr-mir-245, cbr-mir-76, cbr-mir-246, cbr-mir-791, cbr-mir-39, cbr-mir-73b, cbr-mir-34, cbr-mir-789, cbr-mir-786, cbr-mir-787, cbr-mir-784, cbr-mir-785, cbr-mir-357-1, cbr-mir-74, cbr-mir-251, cbr-mir-250, cbr-mir-72, cbr-mir-253, cbr-mir-392, cbr-mir-252, cbr-mir-70, cbr-mir-71, cbr-mir-90b, cbr-mir-67, cbr-mir-359, cbr-mir-255, cbr-mir-358, cbr-mir-254, cbr-mir-355, cbr-mir-353, cbr-mir-259, cbr-mir-354, cbr-mir-360, cbr-mir-45-2, cbr-mir-790-1, cbr-mir-2222-2, cbr-mir-60, cbr-mir-61, cbr-mir-1822, cbr-mir-57, cbr-mir-55, cbr-mir-228, cbr-mir-58, cbr-mir-1, cbr-mir-268, cbr-mir-54a, cbr-mir-77-2, cbr-lsy-6, cbr-mir-52, cbr-mir-35d, cbr-mir-231, cbr-mir-50, cbr-mir-230

P. pacificus: ppc-mir-2265, ppc-mir-2266, ppc-mir-2235-1, ppc-mir-2267, ppc-mir-2268, ppc-mir-2269, ppc-mir-279, ppc-mir-72, ppc-mir-71, ppc-mir-63b, ppc-mir-67, ppc-mir-65, ppc-mir-2239-1, ppc-mir-993, ppc-mir-2271, ppc-mir-236, ppc-mir-2270, ppc-mir-2233, ppc-mir-2273, ppc-mir-234, ppc-mir-2272, ppc-mir-2234a, ppc-mir-232, ppc-mir-2274, ppc-lin-4, ppc-mir-2237b, ppc-mir-1, ppc-mir-2, ppc-mir-86, ppc-mir-55a, ppc-mir-55b, ppc-mir-81, ppc-mir-124, ppc-mir-79, ppc-mir-2253b, ppc-mir-2247, ppc-mir-2248, ppc-mir-2249, ppc-let-7, ppc-mir-2243, ppc-mir-252, ppc-mir-2244, ppc-mir-38, ppc-mir-2245, ppc-mir-37, ppc-mir-2246, ppc-mir-312, ppc-mir-2241a-2, ppc-mir-45, ppc-mir-46, ppc-mir-2240c, ppc-mir-2250, ppc-mir-2236a, ppc-mir-2236b, ppc-mir-87, ppc-mir-2258, ppc-mir-2259, ppc-mir-2256, ppc-mir-240, ppc-mir-84b, ppc-mir-2255, ppc-mir-2234b, ppc-mir-242, ppc-mir-2242-2, ppc-mir-2232-1, ppc-mir-2238a-2, ppc-mir-56, ppc-mir-42b, ppc-mir-42a, ppc-mir-2264, ppc-mir-2263, ppc-mir-2262, ppc-mir-2261, ppc-mir-239-2

C. teleta: cte-mir-1999, cte-mir-87a, cte-mir-1997, cte-mir-1998, cte-mir-1991, cte-mir-124, cte-mir-1992, cte-mir-1993, cte-mir-1994, cte-mir-193, cte-mir-277b, cte-mir-216b, cte-mir-12, cte-mir-750, cte-mir-1989, cte-mir-216c, cte-mir-2694, cte-mir-137, cte-mir-1987, cte-mir-2693, cte-mir-2692, cte-mir-2691, cte-mir-133, cte-mir-2690, cte-mir-2686b, cte-mir-2699, cte-mir-2686c, cte-mir-2695, cte-mir-2696, cte-mir-29b, cte-mir-92c, cte-mir-92b, cte-mir-153, cte-mir-219, cte-mir-277a, cte-mir-210, cte-mir-210b, cte-mir-71, cte-mir-182, cte-mir-1175, cte-mir-1996b, cte-mir-1996a, cte-mir-281, cte-mir-981, cte-mir-278, cte-mir-279, cte-mir-67, cte-mir-2001, cte-mir-375, cte-mir-2000, cte-mir-7, cte-mir-1, cte-mir-9, cte-mir-8, cte-mir-2703, cte-mir-2706, cte-mir-993, cte-mir-2705, cte-mir-2708, cte-mir-2707, cte-mir-996, cte-mir-2709, cte-mir-365, cte-mir-2700, cte-mir-2702, cte-mir-317, cte-mir-2687, cte-mir-1990c, cte-mir-2g, cte-mir-2685, cte-mir-2f, cte-mir-2e, cte-mir-315, cte-mir-1990b, cte-mir-2d, cte-mir-2c, cte-mir-2689, cte-mir-2688, cte-mir-2719, cte-mir-2718, cte-mir-2717, cte-mir-252b, cte-mir-2716, cte-mir-2714, cte-mir-2713, cte-mir-2712, cte-mir-2711, cte-mir-2710, cte-mir-184b, cte-mir-31, cte-let-7, cte-bantam, cte-mir-10b, cte-mir-10a, cte-mir-10d, cte-mir-10c, cte-mir-745a, cte-mir-745b, cte-mir-242, cte-mir-2721, cte-mir-33, cte-mir-34, cte-mir-2720, cte-mir-36

L. gigantea: lgi-mir-31, lgi-mir-34, lgi-mir-33, lgi-mir-2001, lgi-mir-745b, lgi-mir-252, lgi-let-7, lgi-mir-87, lgi-mir-981, lgi-mir-193, lgi-mir-216a, lgi-mir-2c, lgi-mir-317, lgi-mir-1, lgi-mir-315, lgi-mir-124, lgi-mir-745a, lgi-mir-96a, lgi-mir-96b, lgi-mir-71, lgi-mir-2d, lgi-mir-182, lgi-mir-184, lgi-mir-183, lgi-mir-375, lgi-mir-8, lgi-mir-10, lgi-mir-9, lgi-mir-7, lgi-mir-12, lgi-mir-1993, lgi-mir-137, lgi-mir-133, lgi-mir-281, lgi-mir-1990, lgi-mir-1992, lgi-mir-1991, lgi-mir-92, lgi-mir-67, lgi-mir-29, lgi-mir-242, lgi-mir-1985, lgi-mir-279, lgi-mir-1984, lgi-mir-278, lgi-mir-1989, lgi-mir-1994b, lgi-mir-1988, lgi-mir-750, lgi-mir-1986, lgi-mir-252b, lgi-mir-100, lgi-mir-1175

S. mediterranea: sme-mir-1175, sme-mir-1a, sme-mir-1c, sme-mir-1b, sme-mir-71c, sme-mir-125a, sme-mir-125b, sme-mir-61b, sme-mir-61a, sme-mir-2156b, sme-mir-2156a, sme-mir-92, sme-lin-4, sme-mir-7d, sme-mir-7c, sme-mir-7b, sme-mir-7a, sme-mir-745, sme-mir-31a, sme-mir-10b, sme-mir-10a, sme-mir-190b, sme-mir-190a, sme-mir-12, sme-mir-2150, sme-mir-8b, sme-mir-13, sme-mir-8a, sme-mir-2152, sme-mir-2151, sme-mir-2153, sme-mir-2155, sme-mir-1993, sme-mir-2158, sme-mir-1992, sme-mir-2157, sme-mir-278, sme-mir-2159, sme-mir-96b, sme-mir-315, sme-mir-9b, sme-mir-281, sme-mir-2154-1, sme-mir-2149, sme-mir-2148, sme-mir-2200, sme-mir-2201, sme-mir-2202, sme-let-7d, sme-mir-124e, sme-mir-277d, sme-mir-2203, sme-mir-124d, sme-mir-277c, sme-let-7c, sme-mir-2204, sme-mir-31b-2, sme-mir-277b, sme-mir-2205, sme-mir-277a, sme-mir-2206, sme-mir-2147b, sme-mir-2e, sme-mir-2f, sme-mir-2177, sme-mir-2b, sme-mir-2178, sme-mir-2179, sme-mir-36c, sme-mir-2173, sme-mir-36b, sme-mir-36a, sme-mir-2175, sme-mir-2176, sme-mir-2170, sme-mir-2171, sme-mir-2172, sme-bantam-a, sme-mir-133a, sme-mir-133b, sme-mir-2168, sme-mir-2169, sme-mir-2166, sme-mir-2167, sme-mir-2164, sme-mir-2165, sme-mir-2162, sme-mir-2163, sme-mir-2161, sme-mir-67, sme-mir-748, sme-mir-749, sme-mir-746, sme-mir-87d, sme-mir-747, sme-mir-87b, sme-mir-87c, sme-mir-219, sme-mir-87a, sme-mir-752, sme-mir-751, sme-mir-750, sme-mir-756, sme-mir-755, sme-mir-79, sme-mir-76, sme-mir-753b, sme-mir-184, sme-mir-2160-1, sme-mir-216, sme-mir-754c-1, sme-mir-2181, sme-let-7b

N. vectensis: nve-mir-2049, nve-mir-2023, nve-mir-2022, nve-mir-2047, nve-mir-2048, nve-mir-100, nve-mir-2041, nve-mir-2042, nve-mir-2029, nve-mir-2028, nve-mir-2046, nve-mir-2027, nve-mir-2026, nve-mir-2044, nve-mir-2025, nve-mir-2040a-1, nve-mir-2024a-3, nve-mir-2033, nve-mir-2034, nve-mir-2031, nve-mir-2030, nve-mir-2050, nve-mir-2035-2, nve-mir-2037, nve-mir-2036, nve-mir-2039, nve-mir-2043b, nve-mir-2032b-2

Chapter 3. Future directions

Contents
Overview
Expanding the understanding of human Microprocessor substrates
    Recognition of human pri-miRNAs without known motifs
    Dynamics of pri-miRNA recognition in different cellular contexts
    Recognition of non-miRNA substrates of Drosha and DGCR8/Pasha
Defining nematode pri-miRNAs
Gaining insight into mechanisms of pri-miRNA recognition
Applying selection approaches to stubborn questions in RNA biology
Overview

In Chapter 2, in vitro selection of partially-randomized pri-miRNAs and high-throughput sequencing of the functional variants confirmed the importance of secondary structure and revealed three novel and conserved primary sequence determinants. These experiments provided a detailed perspective on the substrate specificity of the human Microprocessor, and, correspondingly, the features that define most conserved human pri-miRNAs.

The results in Chapter 2 also inspire several additional questions related to the Microprocessor and the biological definition of a pri-miRNA. I will discuss some attractive avenues of further investigation, ranging from a wider search for determinants in human Microprocessor substrates to the application of in vitro and in vivo selection strategies to find functional elements in other RNAs that have resisted detailed characterization.

Expanding the understanding of human Microprocessor substrates

Recognition of human pri-miRNAs without known motifs

The three recognition motifs identified in Chapter 2 are generally utilized by conserved human pri-miRNAs, with 79% having one or more motifs. Despite the success of the study, the motifs that define the remaining 21% are still unknown. Many of these pri-miRNAs are conserved across vertebrates, and three are conserved across bilaterian animals, suggesting that they have distinct, evolutionarily-conserved processing motifs that were simply not present in the four pri-miRNAs studied (Figure 1A). Partial-randomization and selection of these pri-miRNAs will therefore expand the catalog of pri-miRNA recognition motifs, and perhaps uncover additional determinants that are deeply conserved in bilaterian animals.

Some of these pri-miRNAs are located in clusters of miRNAs, including hsa-mir-15b, hsa-mir-24-1, and hsa-mir-200a, and other members of these clusters have the downstream CNNC motif.
These pri-miRNAs may simply use separate processing motifs that have not been discovered yet (Figure 1B), but an intriguing possibility is that the processing of clustered pri-miRNAs is somehow coordinated, and that a single suitably positioned CNNC instance in the cluster is sufficient to recruit the Microprocessor to process the cluster hairpins (Figure 1C). Consistent with this idea, the ~600 kDa Microprocessor complex has been suggested to consist of multiple functional Drosha and DGCR8 units (Han et al., 2004; Sohn et al., 2007), and a single complex could be capable of simultaneously cleaving multiple pri-miRNAs in a cluster. At the same time, clustered miRNAs could interact at a tertiary level to promote or inhibit joint processing by the Microprocessor (Chakraborty et al., 2012). This question could be addressed by computationally analyzing clustered pri-miRNAs and determining whether the probability that any individual member has a processing motif is dependent on the presence or absence of processing motifs in other cluster members. Suggestive results could be followed by in vitro and in vivo studies of the processing of clustered miRNAs.

Dynamics of pri-miRNA recognition in different cellular contexts

Processing motifs may exist that are dependent on cell identity or cell state, and may not have been identified in Chapter 2 because the lysate used for pri-miRNA cleavage was derived from a single cell line (HEK293T) under standard culture conditions. Such conditional motifs, and the proteins that recognize them, are likely to help regulate levels of pri-miRNA processing and may play important, dynamic roles during physiological changes, such as development or responses to extracellular signals, or during pathological changes, such as oncogenic transformation. For example, the regulation and dysregulation of miRNAs have crucial roles in oncogenesis, primary tumor growth, and distant metastasis (Garzon et al., 2009).
On a cellular and molecular level, studies have demonstrated the importance of miRNAs as both tumor suppressors and oncogenes (Zhang et al., 2007). Indeed, many highly-expressed miRNAs overlap with regions that are genomically unstable in cancer, including deletions, duplications, translocations, or losses of heterozygosity (Calin et al., 2005). At an epidemiological level, miRNA expression patterns are highly correlated with both cancer type (Lu et al., 2005) and clinical course (Calin et al., 2004a; Yanaihara et al., 2006), and thus are of both diagnostic and prognostic value.

In particular, pri-miRNA processing is dysregulated in cancer. For example, a SNP in the hsa-mir-16-1 locus severely impairs the processing of mir-16-1 by the Microprocessor. Mir-16-1 has been shown to be frequently deleted in chronic lymphocytic leukemia (CLL) (Calin et al., 2002), leading to dysregulation of the key apoptosis factor BCL2 (Cimmino et al., 2005). Correspondingly, the SNP identified in mir-16-1 that impairs processing was found in two CLL patients, one meeting the criteria for familial CLL, and was associated with loss of heterozygosity in both (Calin et al., 2005). More broadly, as discussed in Chapter 1, several oncogenic and tumor-suppressor proteins have been shown to interact with pri-miRNAs and regulate their cleavage by the Microprocessor complex, including p53 (Suzuki et al., 2009), Smad family members (Davis et al., 2008; Davis et al., 2010; Hata and Davis, 2011), and Lin-28 [reviewed in (Viswanathan and Daley, 2010)]. In most cases, the RNA motifs that these proteins recognize and bind are poorly understood.

Identification of conditional motifs will require two expansions to the in vitro selection approach described in Chapter 2 (Figure 1D). First, a more extensive panel of pri-miRNAs must be examined, preferably ones for which there is evidence of dynamic processing and important function in physiology or pathology.
For example, four pri-miRNAs might be fertile ground for additional study: hsa-mir-206, hsa-mir-199a-2, hsa-mir-155, and hsa-let-7e. SNPs have been identified in cancer samples for all four of these miRNAs, and these polymorphisms may be responsible for inhibited or enhanced miRNA processing in specific cells or tissues (Calin et al., 2005; Wu et al., 2008). mir-206 is a tumor suppressor miRNA that is downregulated in metastatic subsets of the human breast cancer cell line MDA-MB-231 and the human lung cancer line LM2; restoration of miR-206 expression decreases the colonization capacity of these cells (Tavazoie et al., 2008). mir-199a has been implicated in the regulation of the SWI/SNF complex, and hence may be both a tumor suppressor and an oncogenic miRNA, depending on the tumor (Sakurai et al., 2011). Likewise, miR-155, the major product of the BIC locus, is an oncogenic miRNA activated in human and bird B-lymphomas (Tam et al., 1997; Eis et al., 2005), and its overexpression is sufficient to cause lymphoproliferative (Costinean et al., 2006) and myeloproliferative (O'Connell et al., 2008) disorders in mice. Finally, the let-7 family of miRNAs is a well-studied class of tumor-suppressor miRNAs that have been implicated in a variety of oncologic settings in both molecular and epidemiological studies [reviewed in (Viswanathan and Daley, 2010)].

The second expansion is the preparation and use of lysates from different cell lines or tissues, in order to maximize the opportunity to detect cancer-specific or cell-type-specific motifs. For example, one might use four readily-available cell lines: HeLa, derived from a cervical carcinoma; Huh7, derived from a hepatocellular carcinoma; MCF7, derived from a breast carcinoma; and K562, derived from a chronic myelogenous leukemia.
These lines divide rapidly and are efficiently transfected using cationic lipids, making them suitable for medium-scale preparation of lysates containing the Microprocessor complex and any auxiliary proteins expressed in those cells. These lysates would be used to cleave the expanded panel of partially-randomized miRNA substrates. As in Chapter 2, any novel motifs should be confirmed by in vitro and in vivo analysis.

High-throughput sequencing of the selected pri-miRNA variants is likely to reveal a variety of sequence and structural features important to processing. Some of these elements will recur in selections from all four cell lines and 293T cells, suggesting that these elements are important for general recognition of pri-miRNAs for cleavage. In contrast, other features could be present in some selections but not others; such motifs are of special interest. For example, some motifs may only be observable in the four human cancer lines, and not in the 293T transformed embryonic line; or in some cancer lines, but not others. This heterogeneity suggests that the utilization of such features is dependent on the cellular milieu, including the expression and/or activity of unknown proteins that may dynamically regulate miRNA processing. Variable motifs may therefore be binding sites for such proteins.

In addition to novel motif discovery, the in vitro selection will also provide data on the effect of cancer SNPs previously identified in the four example miRNAs described above. These polymorphisms will be represented millions of times in the proposed selection pool, in addition to the billions of variants mutated at other positions. Analysis of the selected variants will provide quantitative data on the cleavage efficiency of pri-miRNAs with mutations that mimic cancer SNPs.
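As a minimal sketch of how such quantitative data might be extracted, the Python below compares nucleotide frequencies at one randomized position between a selected pool and its unselected reference pool, and reports a log2 enrichment for each base. The function name, the pseudocount, and the toy reads are illustrative assumptions rather than the analysis used in Chapter 2; real reads would first need to be aligned to the parental pri-miRNA so that positions correspond.

```python
from collections import Counter
from math import log2

BASES = "ACGU"

def position_enrichment(selected, reference, position, pseudocount=0.5):
    """Log2 enrichment of each base at one 0-based position.

    `selected` and `reference` are lists of equal-length, pre-aligned RNA
    reads (an assumption of this sketch). A positive score means the base is
    more frequent among cleaved (selected) variants than in the unselected
    reference pool; the pseudocount (hypothetical default) avoids log(0).
    """
    sel_counts = Counter(read[position] for read in selected)
    ref_counts = Counter(read[position] for read in reference)
    sel_total = sum(sel_counts[b] for b in BASES) + pseudocount * len(BASES)
    ref_total = sum(ref_counts[b] for b in BASES) + pseudocount * len(BASES)
    scores = {}
    for base in BASES:
        sel_freq = (sel_counts[base] + pseudocount) / sel_total
        ref_freq = (ref_counts[base] + pseudocount) / ref_total
        scores[base] = log2(sel_freq / ref_freq)
    return scores

# Toy pools (invented for illustration): position 1 is evenly mixed in the
# reference pool but mostly C after selection.
reference_pool = ["ACGU", "AAGU", "AGGU", "AUGU"]
selected_pool = ["ACGU", "ACGU", "ACGU", "AAGU"]
scores = position_enrichment(selected_pool, reference_pool, position=1)
```

In this toy case the score for C is positive and the score for G is negative. Under the same assumptions, a variant that mimics a cancer SNP would be scored by looking up the mutant base at the polymorphic position.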
For let-7e, where the specific polymorphism is already known to impair processing of the pri-miRNA (Wu et al., 2008), further study would help define the complete sequence or structural motif that is mutated in prostate cancer. For mir-206, mir-199a-2, and mir-155, detailed studies of processing motifs would help establish whether the cancer polymorphisms affect pri-miRNA processing, in addition to helping define the affected motif.

[Figure 1, panel contents recovered from the image. Panel A: conserved human pri-miRNAs, of which 79% have one or more known motifs and 21% have no known motifs. Pri-miRNAs with no known motifs, grouped by conservation. Mammalian: hsa-mir-302a, hsa-mir-320a, hsa-mir-325, hsa-mir-326, hsa-mir-328, hsa-mir-335, hsa-mir-337, hsa-mir-383, hsa-mir-384, hsa-mir-423, hsa-mir-484, hsa-mir-486, hsa-mir-505, hsa-mir-568, hsa-mir-582, hsa-mir-653, hsa-mir-654, hsa-mir-664, hsa-mir-665, hsa-mir-668, hsa-mir-676, hsa-mir-711, hsa-mir-718, hsa-mir-759, hsa-mir-764, hsa-mir-767, hsa-mir-877, hsa-mir-1193, hsa-mir-1251, hsa-mir-1912, hsa-mir-2861. Vertebrate: hsa-mir-15b, hsa-mir-24-1, hsa-mir-143, hsa-mir-155, hsa-mir-181b-1, hsa-mir-214, hsa-mir-223, hsa-mir-301a. Bilaterian: hsa-mir-153-2, hsa-mir-200a, hsa-mir-216a. Panels B and C: Microprocessor and CNNC motif in a pri-miRNA cluster. Panel D: partially randomized pri-miRNA sequences (mir-206, let-7 family, mir-199a-2, mir-155); in vitro selection in HeLa, Huh7, MCF7, and K562 lysates overexpressing Drosha and DGCR8; unselected reference pools; high-throughput sequencing; novel motif discovery.]

To the extent that novel elements enhance miRNA biogenesis, this study will also aid efforts towards miRNA-based cancer gene therapy. Several approaches are based on stably reconstituting expression of tumor suppressor miRNAs to inhibit tumor growth or metastasis (Esquela-Kerscher et al., 2008; Kumar et al., 2008; Kota et al., 2009; Aigner, 2011), while others are attempting to stably express miRNA-derived hairpins engineered to downregulate specific oncogenes (Wang et al., 2009). In both cases, highly efficient miRNA processing will be critical to the efficacy of the therapy, and these efforts will be able to use the miRNA processing motifs identified in this study.
In addition, if cancer-specific or tissue-specific motifs are identified, gene therapy platforms could use this information to maximize therapeutic miRNA expression specifically in cancer cells.

Figure 1. Open questions in the recognition of human pri-miRNAs. (A) How does the Microprocessor recognize pri-miRNAs without known primary sequence determinants? Among conserved human pri-miRNA families, 21% do not have any of the three primary sequence determinants elucidated in Chapter 2. The representatives of these human pri-miRNA families are listed, classified by their conservation. (B) To what degree is the recognition of individual pri-miRNAs biochemically independent from the recognition of other pri-miRNAs in a cluster? One model is that recognition of clustered pri-miRNAs is distinct, implying that pri-miRNAs in the cluster that do not have apparent primary sequence determinants utilize other motifs that have yet to be discovered. (C) To what degree is the recognition of clustered pri-miRNAs coordinated? In contrast to (B), sequence determinants in one pri-miRNA could be sufficient to recruit the Microprocessor to the cluster, so that multiple pri-miRNAs are dependent on a single set of determinants for processing. (D) To what extent is the use of individual recognition determinants conditional on cell state or identity? Performing in vitro selection on additional pri-miRNAs in cell lysates derived from a panel of cancer cell lines could identify motifs that are differentially utilized in different cellular environments.

Recognition of non-miRNA substrates of Drosha and DGCR8/Pasha

Several groups have recently described substrates of Drosha and DGCR8/Pasha that are not processed into mature miRNAs, at least not efficiently. Instead, these substrates are mRNAs, and cleavage destabilizes the target messages. The best-described example of this is DGCR8 in humans; its mRNA contains two hairpins, one in the 5′ untranslated region and another in the beginning of the coding region, which are cleaved by Drosha (Han et al., 2009). Since the DGCR8 protein helps stabilize Drosha, cleavage of the DGCR8 mRNA would in turn lead to destabilization of Drosha. Thus, the activity of the Drosha-DGCR8 complex is self-limiting (Han et al., 2009). Two mature miRNAs derived from the DGCR8 hairpins, hsa-mir-3618 and hsa-mir-1306, have been annotated, but the importance of these miRNAs is unclear; the fact that mature miRNAs can be produced might be considered a good illustration of how the specificity of the processing machinery downstream of the Microprocessor is largely dependent on the biochemistry of the previous steps (Chapter 1).

DGCR8 is not the only mRNA target of Drosha cleavage. A microarray comparison of gene expression after Drosha knockdown to expression after Dicer knockdown revealed hundreds of mRNAs upregulated after Drosha knockdown but not after Dicer knockdown, suggesting that Drosha could be directly regulating these mRNAs (Han et al., 2009). Similar results were obtained with Dicer knockout and Drosha knockout cell lines in mice (Chong et al., 2010), and an analogous experiment in Drosophila S2 cells similarly found hundreds of genes upregulated by Drosha knockdown but not Dicer knockdown (Kadener et al., 2009). Thus Drosha could cleave many mRNAs as a mechanism for directly regulating gene expression.

There are several open questions in the direct cleavage of mRNAs. First, are hairpins in mRNA targets recognized in the same manner as pri-miRNA hairpins? A glance at the DGCR8 hairpins indicates that the upstream hairpin does not have any of the three motifs identified in Chapter 2, while the downstream hairpin has both the loop GUG and the downstream CNNC.
A more systematic analysis is needed to determine whether mRNA cleavage substrates have the same propensity as pri-miRNAs to have one or more processing motifs. Second, is the Drosha-DGCR8 complex that cleaves mRNA hairpins the same as the Microprocessor complex that cleaves pri-miRNAs? Cleavage of mRNA hairpins has the same core protein requirements (Drosha and Pasha/DGCR8), but it would be interesting to know whether the ~600 kDa Microprocessor complex isolated by gel filtration (Gregory et al., 2004) is fully competent to cleave these hairpins, or if a distinct complex of Drosha, Pasha/DGCR8, and other unidentified proteins is utilized instead. If these hairpins use processing motifs distinct from those of pri-miRNAs, it would be reasonable to hypothesize that the proteins that recognize those motifs are members of a distinct Drosha-DGCR8 complex. Finally, is Drosha-DGCR8 cleavage of individual mRNAs regulated in a dynamic manner? From a teleological standpoint, constitutive cleavage of the DGCR8 mRNA is comprehensible, since this negative feedback makes the activity of Drosha and DGCR8 self-limiting. However, it is not obvious why Drosha and DGCR8 would constitutively cleave hundreds of other putative mRNA targets. As with pri-miRNA processing, perhaps Drosha and DGCR8 cleavage of individual mRNAs is dynamically regulated.

Defining nematode pri-miRNAs

As described in Chapter 2, a species barrier exists between C. elegans pri-miRNAs and human pri-miRNAs, leading to poor processing of C. elegans pri-miRNAs in human cells. In Chapter 2, I described the discovery of additional motifs that promote efficient pri-miRNA processing in humans, which are absent in C. elegans and other nematode pri-miRNAs. The lack of human processing motifs in nematode pri-miRNAs explains why the nematode pri-miRNAs are poorly processed in human cells, and the species barrier can be bridged by adding the human processing motifs to nematode pri-miRNAs.
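To make concrete what it means for a hairpin to carry or lack the downstream CNNC motif, the Python sketch below scans the 3′ flank of a hairpin for a CNNC and, for illustration, writes one into a flank that lacks it. Everything here is a hedged assumption: the helper names are hypothetical, the flank is taken to start at the first nucleotide downstream of the Drosha cleavage site, and the window of allowed start positions (16 to 18 nt downstream) is an illustrative default that should be replaced by the positions established in Chapter 2.

```python
def has_downstream_cnnc(flank_3p, start_window=(16, 18)):
    """Return True if a CNNC begins within `start_window` of the 3' flank.

    `flank_3p` is read 5'->3' and position 1 is the first nucleotide
    downstream of the cleavage site (assumption of this sketch). DNA-style
    input is accepted by converting T to U.
    """
    flank = flank_3p.upper().replace("T", "U")
    first, last = start_window
    for start in range(first, last + 1):
        motif = flank[start - 1:start + 3]  # four nucleotides: C, N, N, C
        if len(motif) == 4 and motif[0] == "C" and motif[3] == "C":
            return True
    return False

def add_cnnc(flank_3p, start=17):
    """Return a copy of the flank with C placed at `start` and `start` + 3.

    Mimics, in silico only, the idea of adding the motif to a nematode
    pri-miRNA; `start` is a hypothetical default inside the window above.
    """
    if len(flank_3p) < start + 3:
        raise ValueError("flank is too short to hold the motif")
    bases = list(flank_3p.upper().replace("T", "U"))
    bases[start - 1] = "C"
    bases[start + 2] = "C"
    return "".join(bases)

# Invented flank with no C residues, before and after adding the motif.
native_flank = "A" * 30
engineered_flank = add_cnnc(native_flank)
```

Applied across annotated hairpins from several species, a scan of this kind would give the fraction of pri-miRNAs carrying the motif in each lineage, which is the comparison underlying the species barrier described above.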
Although this work has improved the understanding of human pri-miRNA processing, nematode processing continues to be poorly understood. In fact, pri-miRNA processing in nematodes may be unique among bilaterian animals: the downstream CNNC motif described in Chapter 2 is conserved in pri-miRNAs throughout bilaterian animals, but is strikingly absent in nematodes. One interpretation is that mechanisms of pri-miRNA recognition have generally diverged in nematodes from recognition in other bilaterian animals. In fact, pri-miRNA processing would not be alone in this divergence. Nematodes have 2-3 fold higher substitution rates in their 18S ribosomal sequences than other metazoan animals (Aguinaldo et al., 1997), and, among nematodes, the evolution of the C. elegans 18S ribosomal sequence is particularly fast (Holterman et al., 2006). Rapid evolution is reflected in protein coding genes as well; the nematodes have lost several homeobox (Hox) genes compared to the Hox clusters in other animals, and there are high substitution rates in the retained Hox genes (Aboobaker and Blaxter, 2003). Again, even among nematodes, C. elegans and its close relatives have evolved more rapidly, with more Hox gene losses and even a unique expansion of the posterior Hox genes (Aboobaker and Blaxter, 2003). The approach used in Chapter 2 to detail the processing motifs of human pri-miRNAs used a cell lysate to cleave pri-miRNA variants in vitro. This approach can be applied to nematode pri-miRNAs, with some considerations for biological and practical differences between C. elegans and other model animals. One example of a biological difference that could impinge on pri-miRNA processing studies is trans-splicing. C. elegans is unique among the model animals in that some 70% of pre-messenger RNA transcripts are spliced in trans to 22 nt leader exons called SL1 and SL2 (Blumenthal, 2005). 
Since the splice leader RNAs form the 5′ end of the mature mRNA, splice leaders provide the methylguanosine cap, and trans-splicing serves to physically excise individual mRNAs from polycistronic transcripts. In many transcripts, SL1 and SL2 form almost the entirety of the 5′ untranslated region in the mature mRNA, and may play a role in efficient translational initiation (Blumenthal, 2005).

In a similar way, trans-splicing may play a critical role in pri-miRNA processing. The processing of the cel-let-7 pri-miRNA is thought to be dependent on trans-splicing to SL1, which occurs 41 nt upstream of the 5p cleavage site (Bracht et al., 2004). The degree to which trans-splicing occurs and is important in other pri-miRNAs is not clear, although analysis of whole-transcriptome sequencing in C. elegans suggests that trans-splicing to pri-miRNAs may be relatively common (J.-W. Nam, personal communication). This poses an inconvenience for the synthesis of pri-miRNA substrates for in vitro studies. For Drosophila and humans, pri-miRNA substrates have been inferred by simply examining the sequence of the corresponding genome. By contrast, it will be necessary to determine the 5′ ends of the C. elegans pri-miRNAs by examining transcriptome sequencing data when possible, or by direct experimental approaches like amplification and sequencing of cDNA ends (e.g., 5′-RACE).

When trans-splicing to cel-let-7 was characterized, it was suggested that the SL1 exon changed the optimal overall secondary structure of the pri-miRNA, presumably converting a nonfunctional substrate into a recognizable pri-miRNA substrate (Bracht et al., 2004). However, computationally-predicted folding (and changes in folding) is notoriously difficult to interpret when RNAs are hundreds of nucleotides long. To the extent that trans-splicing is generally important in pri-miRNA processing, it seems unlikely that the SL1 sequence optimizes the folding of all these divergent RNAs.
An alternative possibility is that the SL1 exon contains a processing motif that marks nearby hairpins as bona fide pri-miRNAs. For example, the SL1 exon contains an AGUU tetraloop (Greenbaum et al., 1996), and the structure of this tetraloop is thought to be very similar to the AGNN tetraloops recognized by the S. cerevisiae RNase III protein Rnt1p (Wu et al., 2001). Perhaps one of the dsRBDs in the C. elegans Microprocessor complex has convergently evolved to recognize the AGNN tetraloop in SL1, and this recognition serves to recruit the nematode Microprocessor to pri-miRNAs. If the AGUU tetraloop in SL1 is important, it should be easy to see enrichment of the SL1 stem and tetraloop after in vitro selection.

On a practical level, C. elegans has been a difficult platform for biochemistry. Since no C. elegans cell line is currently available, whole animals would have to be raised and used to make cell-free lysates, including their protease- and nuclease-rich gastrointestinal tracts. Nevertheless, lysates derived from C. elegans larvae have been used to study miRNA turnover, indicating that lysates with cytosolic activities and probably nuclear activities can be successfully prepared (Chatterjee and Grosshans, 2009). In addition, a cell-free system from the parasitic nematode Ascaris lumbricoides was developed to study transcription and co-transcriptional events, including capping (Maroney et al., 1990) and trans-splicing (Hannon et al., 1990). Recently, this technique was adapted to generate splicing-competent extracts in C. elegans (Lasda et al., 2010). It is reasonable to believe that either of these extracts from C. elegans could also be competent for pri-miRNA processing.

Regardless of the mechanism of pri-miRNA recognition in nematodes, the divergence from other animals is an opportunity to explore the co-evolution of an enzyme complex and its substrates.
At first glance, an enzyme with multiple substrates seems likely to evolve its substrate preferences very slowly, since any significant change would cause the enzyme to lose the capacity to catalyze the reaction on many (perhaps most) of its substrates. To the extent that the enzyme activity on these substrates is important, changes to enzyme specificity would surely be detrimental, and thus the enzyme would be subjected to strong purifying selection. For example, the 3′ splice site consensus sequence YAG recognized by the U2AF complex is conserved in canonical U2-dependent splice sites in animals, plants, and fungi (Sharp and Burge, 1997; Spingola et al., 1999), perhaps because any alteration to the specificity of U2AF would negatively affect thousands of 3′ splice sites. By contrast, proteins that interact with just one binding partner may be capable of much more rapid evolution in their binding specificities. One of the best-studied examples of this is the abalone sperm enzyme lysin, which specifically binds the egg vitelline protein VERL and induces a conformational change that opens a pore for sperm heads to access the egg membrane. VERL and lysin are among the most rapidly evolving proteins in the abalone genome, perhaps because there is evolutionary pressure to continuously change sperm-egg specificity, particularly in organisms with external fertilization (Swanson and Vacquier, 2002). It is reasonable to believe that this rapid evolution would be severely impeded if VERL had to simultaneously maintain critical interactions with dozens or hundreds of ligands. Nevertheless, Microprocessor specificity does appear to have evolved in nematodes. One potential path to altered enzyme specificity is duplication and sub- or neofunctionalization; the redundancy allows one duplicate to evolve, while the other maintains the biologically important substrate interactions. Over time, substrates could then co-evolve to be recognized by one or both duplicates. 
For example, a functionally distinct spliceosome with different snRNPs, including U12 in place of U2, tends to utilize 3′ splice sites that have an AC rather than AG (Tarn and Steitz, 1996). In worms and other animals, there could be variant Microprocessor complexes that would have allowed the substrate specificity of the complex to change over evolutionary time.

Another potential path is modularity in the substrate, where multiple recognition motifs collaborate to distinguish true substrates from other substrates, even though each individual motif contributes relatively little to recognition. In this scenario, if the enzyme or enzyme complex alters its specificity for any one of the motifs, overall recognition is not significantly impacted; over time, the specificity for all the motifs could change quite considerably. For example, the rapid evolution of VERL is likely facilitated by the presence of 22 tandem repeats of the VERL binding domain, since variations in one repeat will not be too detrimental to lysin binding; in time, the overall VERL-lysin interaction can change dramatically, resulting in speciation (Swanson and Vacquier, 2002). Likewise, recognition of human pri-miRNAs appears to be modular, and it would be reasonable to suppose that recognition of nematode pri-miRNAs is also modular, although the modules themselves have diverged between the two lineages.

Once the recognition motifs have been determined in C. elegans, phylogenetic analysis of pri-miRNA sequences across the nematode clade could permit a reconstruction of the evolutionary history of pri-miRNA recognition. For example, it is possible that the less rapidly-evolving members of the nematode clade might continue to use the downstream CNNC motif, while gaining recognition elements that are used throughout the nematode clade. C. elegans might use the common nematode elements, and may have gained additional elements specific to the rhabditid subclade, in lieu of the CNNC motif.
If the members of the Microprocessor complex that recognize these putative motifs are identified, it may even be possible to correlate the evolutionary history of pri-miRNA motifs with the history of individual Microprocessor components. The next section will describe possible strategies for finding protein components of the Microprocessor that recognize pri-miRNA motifs.

Gaining insight into mechanisms of pri-miRNA recognition

Each sequence or structural preference found in the selection corresponds to a protein component of the Microprocessor that recognizes and binds to the motif, or to an RNA structure whose formation is dependent upon the motif. What proteins bind the motifs identified in Chapter 2, and any additional motifs that might be identified in pri-miRNAs? Given a functional RNA motif, it is possible to specifically incorporate a photoreactive nucleoside analog in the motif, crosslink the nucleoside to a bound protein, and identify the proteins by mass spectrometry. This technique of incorporating a photoreactive nucleoside at a specific site in a long RNA substrate was initially used to identify spliceosome proteins that recognize the 5′ splice site motif (Wyatt et al., 1992). For snoRNAs and tRNAs, structured RNAs with multiple important motifs and protein interactions, this form of site-specific crosslinking has provided physical maps of the functional elements of the ribonucleoprotein complex relative to the substrate RNA (Mishima and Steitz, 1995; Cahill et al., 2002). The classic site-specific crosslinking protocol (Sontheimer, 1994) can be adapted to purify protein/pri-miRNA complexes and identify the peptide components by mass spectrometry.
For example, to identify proteins bound to the CNNC motif, a pri-miRNA substrate could be assembled by splint ligation of three RNA fragments: a synthetic RNA oligonucleotide with a 4-thiouridine moiety in the CNNC motif, a synthetic RNA oligonucleotide with a 3′ biotin modification, and an RNA that includes the 5p flank and the hairpin (Figure 2A). This substrate would be crosslinked to candidate proteins in the Microprocessor-containing lysate described in Chapter 2, and covalently linked protein-RNA complexes purified by binding to streptavidin-coated beads (Figure 2B). As in the classic protocol, RNase T1 or another suitable RNase would be used to digest away the RNA substrate; in the proposed protocol, RNase digestion would additionally serve to elute the protein-RNA complexes from the streptavidin-coated beads. The peptides in the eluted protein-RNA complex would then be identified by mass spectrometry. Nearly all pri-miRNA motifs should be amenable to this crosslinking approach. However, one potential pitfall is that, for certain motifs, bound proteins may not be efficiently crosslinked; the 4-thiouridine moiety can only be crosslinked to peptide moieties within a few angstroms (Sontheimer, 1994). This spatial specificity is advantageous for identifying only those proteins that are directly bound to the motif. However, for some direct binding interactions, it is possible that the thio group may be too distant from a reactive amino acid to crosslink efficiently. An alternative strategy could use 5′-[(4-azidophenacyl)thio]-uridine, a photoreactive uridine analog whose reactive moiety is about 10 angstroms distant from the base, and is thus more likely to react with nearby amino acids (Hanna, 1989). Additionally, one could attempt to co-precipitate the bound protein with biotinylated pri-miRNA without crosslinking.
Binding of biotinylated, wild-type pri-miRNA molecules would be carried out in the presence of competing pri-miRNA molecules in which the motif has been mutated. This approach would selectively co-precipitate proteins whose binding depends on the motif; these proteins can then be identified by either mass spectrometry or Western blot. In any of these approaches, the contributions of the candidate proteins should be delineated by in vitro and in vivo processing assays after loss-of-function or gain-of-function of the candidate proteins.

The identification of proteins that recognize pri-miRNA processing motifs has implications for the regulation and dysregulation of miRNA processing in physiology and pathology. As discussed above, processing motifs may be mutated in cancer, as the CNNC motif is mutated in human chronic lymphoid leukemia. The proteins that recognize these motifs may themselves be dysregulated in cancer, and it may be possible to infer the nature of this dysregulation based on the regulated miRNA (tumor-suppressor or oncogene) and the type of motif (promoting or inhibiting Microprocessor cleavage). For example, proteins that enhance processing of a tumor-suppressor miRNA may themselves be tumor suppressors. One instance of this is found in the biogenesis of let-7, a family of tumor-suppressor miRNAs that target key oncogenes such as c-MYC and HMGA2 (Mayr et al., 2007; Chang et al., 2008). Lin-28A protein inhibits Microprocessor cleavage of let-7 family pri-miRNAs and thus reduces mature levels of this key tumor-suppressor miRNA (Piskounova et al., 2011).
Correspondingly, both isoforms of Lin-28 are upregulated in cancer samples, particularly in samples from advanced malignancy (Viswanathan et al., 2009). It could be possible to characterize the functional roles of identified proteins by analyzing publicly available databases on cancer gene expression, mutations, clinical course, and drug sensitivity (Cowin et al., 2010). For example, candidate proteins could be cross-referenced against protein-coding genes known to be mutated in cancer (Futreal et al., 2004). Data on the NCI-60 cancer cell lines would be particularly appealing for analysis, since many types of data are available for all 60 lines, including protein-coding gene expression data (Ross et al., 2000; Liu et al., 2010) and miRNA expression data (Blower et al., 2007; Liu et al., 2010; Sokilde et al., 2011). For proteins that have not been characterized in the literature, gene expression clustering analysis may help reveal the molecular pathways that impinge on the candidate proteins and thus on miRNA biogenesis.

Figure 2. Proposed strategy for identifying proteins that bind identified pri-miRNA sequence determinants. Identification of CNNC-binding proteins is shown as an example. (A) Assembly of a biotinylated pri-miRNA substrate containing a 4-thiouridine moiety specifically incorporated in the CNNC motif. Three synthetic RNAs would be assembled into a pri-miRNA substrate by splint ligation. The red “p” denotes a 32P phosphate. (B) Purification of crosslinked RNA-protein complexes. The assembled pri-miRNA would be incubated in whole-cell lysate. Proteins that bind in close proximity to the 4-thiouridine moiety would be crosslinked to the biotinylated pri-miRNA by 365 nm ultraviolet light. Crosslinked complexes would be purified by streptavidin bead binding with aggressive washing, and released by RNase T1 digestion of the pri-miRNA. Protein components of the complex would be identified by mass spectrometry.
Identifying the proteins that bind processing motifs and characterizing their effects on miRNA biogenesis would open new avenues of investigation and may lead to opportunities for the development of diagnostic tools or novel therapeutics. At a basic level, identifying proteins that bind pri-miRNAs and regulate their biogenesis would expand the catalog of proteins that comprise the Microprocessor complex, and shed light on the mechanisms by which the activity of the complex can be regulated. In addition, since little is known of the structure of the Microprocessor, determination of which proteins bind to various parts of the pri-miRNA would yield a schematic of the three-dimensional structure of the Microprocessor, albeit at low spatial resolution. From a regulatory pathway standpoint, classification of previously uncharacterized proteins as miRNA regulators may give insight into novel oncogenic or tumor-suppressing regulatory pathways. On an epidemiological level, the expression levels of identified proteins may be useful as biomarkers for tumor progression and prognosis, as is the case for Drosha, Dicer, and Lin-28 (Merritt et al., 2008; Viswanathan et al., 2009). To the extent that identified proteins are mutated in cancer, the proposed work may also aid the interpretation of loci identified in cancer genome efforts. On a therapeutic level, these proteins may also be novel drug targets for regulating miRNA levels.

Applying selection approaches to stubborn questions in RNA biology

The availability of hundreds of organismal genomes has made it possible to identify and characterize functional elements, including those in functional RNAs, simply by computationally analyzing sequence data. The power of comparative sequence analysis is illustrated by some of the original work to identify Dicer: knowing that the precursors of small interfering RNAs were double-stranded and that RNase III enzymes in other organisms cleaved RNA duplexes, investigators simply searched the D.
melanogaster and C. elegans genomes for sequences that would code for proteins resembling RNase III (Bernstein et al., 2001). Computational analysis can even provide powerful evidence for biological importance when actual function is unknown, since conservation of nucleic acid or protein sequence over a long evolutionary timespan is ipso facto evidence for purifying selection. However, computational analysis of genomic sequence data is limited when functional sequence elements are degenerate, as motifs for RNA–protein binding tend to be. The study of pri-miRNA determinants described in Chapter 2 was situated just beyond the limits of computational analysis, given the relatively unsuccessful efforts to find determinants by comparing individual pri-miRNAs to their orthologs, or by comparing pri-miRNAs to other pri-miRNAs in the same species. In fact, the determinants identified by in vitro selection are composed of just two or three informative nucleotides, which are generally too short to be found by multiple-sequence alignment or de novo motif discovery algorithms. In particular, discovery of the CNNC motif by pure computational analysis was probably complicated because it is a “spaced” motif, where two informative nucleotides are separated by less informative nucleotides, and the CNNC position can vary across a narrow range of positions relative to the Drosha cleavage site. Even when subtle enrichment signals were detected in pure sequence analysis, it was often difficult to interpret their significance, since the sequence data was not intrinsically rooted in pri-miRNA processing. These difficulties were overcome by generating hundreds of billions of sequence variants, selecting in vitro for those that were functional in a specific biochemical context, and coupling the in vitro selection to high-throughput sequencing.
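A small sketch shows what searching for a spaced, positionally flexible motif involves once selected sequences are in hand. The Python below is illustrative only and is not the analysis used in Chapter 2. The function names, the convention that each flank string begins at the first nucleotide downstream of the Drosha cleavage site, and the window bounds passed as `lo` and `hi` are assumptions made for this example. It reports every CNNC start within a flank, asks whether any start falls in a chosen window, and compares the fraction of motif-containing flanks in a selected pool with that in the starting pool.

```python
import re
from typing import Iterable, List

# Zero-width lookahead so overlapping occurrences (e.g. in "CCCCC") are all found.
CNNC = re.compile(r"(?=C[ACGU]{2}C)")


def cnnc_offsets(flank: str) -> List[int]:
    """Return 0-based offsets of every CNNC start in a 3' flank.

    Assumption of this sketch: `flank` starts at the first nucleotide
    downstream of the Drosha cleavage site. DNA-style reads (T) are
    converted to RNA (U) before matching.
    """
    rna = flank.upper().replace("T", "U")
    return [m.start() for m in CNNC.finditer(rna)]


def has_cnnc_in_window(flank: str, lo: int, hi: int) -> bool:
    """True if any CNNC starts at an offset in the inclusive range [lo, hi]."""
    return any(lo <= i <= hi for i in cnnc_offsets(flank))


def fraction_with_motif(flanks: Iterable[str], lo: int, hi: int) -> float:
    """Fraction of flanks carrying CNNC in the window (0.0 for an empty pool)."""
    flanks = list(flanks)
    if not flanks:
        return 0.0
    return sum(has_cnnc_in_window(f, lo, hi) for f in flanks) / len(flanks)


def enrichment(selected: Iterable[str], starting: Iterable[str],
               lo: int, hi: int) -> float:
    """Ratio of motif frequency in the selected pool to the starting pool.

    Returns NaN when the motif is absent from the starting pool.
    """
    sel = fraction_with_motif(selected, lo, hi)
    bg = fraction_with_motif(starting, lo, hi)
    return sel / bg if bg > 0 else float("nan")
```

Because the window is a parameter, the same functions can be rerun over a range of `lo` and `hi` values to see how sharply any enrichment depends on position, which is the property that makes such a motif hard to recover from unaligned genomic sequence alone.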
This approach is not limited to the identification of motifs in pri-miRNAs; it can be applied to the study of any functional nucleic acid, and is uniquely suited to study RNA species that have not been amenable to computational analysis, either because of extensive divergence that impairs evolutionary conservation analysis, or because the elements to be found are too subtle to be detected above noise by computational algorithms. One potentially fertile application is the study of long noncoding RNAs (lncRNAs). These RNAs are important regulators of gene expression, but studies of lncRNAs have been hampered by diversity in both sequence and regulatory mechanism (Wang and Chang, 2011). When blocks of sequence conservation can be found, the sequences have been shown to be important for the function of the lncRNA and for normal vertebrate development (Ulitsky et al., 2011). However, the vast majority of lncRNAs are poorly conserved at the sequence level, even when the presence of a noncoding transcript is conserved in syntenic regions of the genome in other organisms (Wang and Chang, 2011).
In vitro and in vivo selection techniques, coupled with high-throughput sequencing, could rapidly identify and characterize the functional elements in such lncRNAs. For example, many lncRNAs are probably molecular scaffolds that assist in the assembly of chromatin-regulation complexes in specific genomic regions. The best-studied example of this is the recruitment of polycomb repressive complex 2 (PRC2) to the HOXD locus by the lncRNA HOTAIR (Rinn et al., 2007). In fact, up to 20% of the annotated lncRNAs in mammalian genomes may bind PRC2, although some of these may be interacting nonspecifically (Khalil et al., 2009; Guttman et al., 2011). The region of HOTAIR that binds PRC2 components has been narrowed to a 300 nt region that is thought to have some secondary structure, but the importance of this structure or any potential primary sequence elements is unclear (Tsai et al., 2010). A relatively straightforward approach to studying this region would be to partially randomize the 300 nt region and select for variants that retain binding to immunopurified PRC2 complexes, followed by sequencing of those variants (Figure 3A). This approach resembles that first used to define the HIV Rev-binding site (Bartel et al., 1991). Any motifs that are found in HOTAIR might directly translate to some of the hundreds of other lncRNAs that are thought to bind PRC2, thus defining a functional domain in many lncRNAs with a single experiment.

Figure 3. Possible selection strategies to study three functional RNAs. (A) Scheme for identifying motifs that specify binding substrates of PRC2. Variants of lncRNAs that bind the immunopurified PRC2 complex would be isolated by nitrocellulose filtration and sequenced. (B) Scheme for mapping functional elements in enhancer-like lncRNAs. The enhancer lncRNA sequences would be partially randomized and cloned into an expression plasmid with a nearby reporter mRNA containing a random barcode. An initial sequencing run would associate the sequence of the enhancer-lncRNA variants with the reporter barcode. Successful variants would promote the transcription of the reporter mRNA, including the barcode, and would be identified by sequencing the barcode. (C) Scheme for identifying and characterizing zipcode motifs that promote the localization of mRNAs to axon terminals. Variant 3′ UTR sequences corresponding to a putative zipcode region would be cloned into a plasmid downstream of a model localized protein-coding sequence. Neurons expressing these variants would be grown in a compartmentalized chamber that permits easy separation of axons from the neuronal soma. Successful variants would be isolated from the axon compartment and sequenced.

Another set of lncRNAs that could be studied by in vivo selection and sequencing are the enhancer-like lncRNAs that promote the expression of nearby protein-coding genes, including a set of transcription factors that are master regulators of hematopoiesis (Orom et al., 2010). The enhancement function of these lncRNAs is dependent on the lncRNA molecule, as opposed to the act of transcription, but the important elements in the lncRNA are unclear. Since the enhancing effects of the lncRNAs can be recapitulated in a plasmid-based reporter gene system, the lncRNAs could be partially randomized and cloned into a reporter plasmid, and functional variants could be selected on the basis of reporter expression. The randomization, selection, and sequencing would be carried out following a strategy developed to study DNA enhancer elements (Patwardhan et al., 2012). This strategy uses a two-step approach to selection and sequencing. In the first step, the plasmid pool is sequenced to determine the sequence of the partially randomized region, which is correlated to a barcode sequence located in the reporter transcript.
In the second step, plasmids are transfected into cells, where functional DNA enhancer elements drive the transcription of the reporter mRNA, including the barcode. These barcoded mRNAs are reverse transcribed and sequenced to determine which variants successfully promoted transcription. For enhancer-like lncRNAs, the strategy would be identical, except the partially-randomized DNA enhancer sequence would be replaced by the partially-randomized lncRNA sequence (Figure 3B).

Another potentially fertile application of selection and high-throughput sequencing is the study of “zipcode” elements that specify the localization of mRNAs. Dynamic mRNA transport is an important component of many biological processes, such as the formation of spatial gradients in developmental syncytia and polarized cells, local regulation at the leading edge of migrating cells, and the specification of cell identity in daughter cells after mitosis (Meignin and Davis, 2010). The regulated localization of mRNAs is particularly important in neurons, where transported mRNAs play important roles in the growth and migration of new dendrites and axons, and in the maintenance of functional synaptic terminals (Jung et al., 2012). The best-characterized zipcode is a bipartite element in the 3′ UTR of the β-actin mRNA, consisting of two 6 nt motifs separated by a 15 nt spacer; this spacer is thought to allow the zipcode region to loop around the zipcode-binding protein (ZBP1) so that the two motifs can be simultaneously recognized by a pair of KH domains (Patel et al., 2012). However, this zipcode is present in just a fraction of synapse-localized mRNAs; for the rest, neither the zipcodes nor the proteins that recognize them are known.
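The two-step barcode bookkeeping described above for the enhancer-like lncRNA selection can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the pipeline of Patwardhan et al. (2012): the function names, the input formats (pairs of barcode and variant sequence from the plasmid sequencing run, and a flat list of barcodes read from reporter cDNA), and the simple RNA-to-DNA read ratio used as an activity score are all hypothetical choices made for the example.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def build_barcode_map(plasmid_reads: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Step 1: associate each barcode with the variant it was cloned next to.

    A barcode observed with more than one variant is ambiguous and is dropped.
    """
    seen = defaultdict(set)
    for barcode, variant in plasmid_reads:
        seen[barcode].add(variant)
    return {bc: next(iter(vs)) for bc, vs in seen.items() if len(vs) == 1}


def variant_activity(plasmid_reads: List[Tuple[str, str]],
                     rna_barcodes: Iterable[str]) -> Dict[str, float]:
    """Step 2: score each variant as (RNA barcode reads) / (plasmid reads).

    Barcodes that are ambiguous or absent from the plasmid pool are ignored.
    The ratio is an assumed, deliberately simple activity measure.
    """
    bc_map = build_barcode_map(plasmid_reads)
    dna = Counter(bc_map[bc] for bc, _ in plasmid_reads if bc in bc_map)
    rna = Counter(bc_map[bc] for bc in rna_barcodes if bc in bc_map)
    return {variant: rna[variant] / n for variant, n in dna.items()}
```

Normalizing by plasmid read counts is one way to keep abundant variants in the input pool from appearing more active than they are; a real analysis would likely also need to handle sequencing errors within barcodes, which this sketch does not attempt.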
For example, the mRNA of the chaperone protein calreticulin is localized to axon terminals, and two 100 nt conserved regions in the 3′ UTR are individually sufficient to drive mRNA transport, but the actual motifs and the proteins that bind them are unknown (Vuppalanchi et al., 2010). Since the 3′ UTR is sufficient to drive mRNA transport, a relatively simple selection approach could be undertaken where a pool of variant 3′ UTRs would be cloned into an expression plasmid downstream of a reporter mRNA; the plasmid pool would be transfected into neurons grown in a compartmentalized culture system that permits easy separation of axon terminals from the neuron soma (Campenot et al., 2009). Successful 3′ UTR variants would be transported into the axon terminals, collected by the compartmentalized culture system, amplified, and sequenced (Figure 3C). Once the specific zipcode motifs have been delineated, the proteins that bind the zipcode and recruit the mRNA to transport granules would be identified by candidate testing or by site-specific crosslinking, as described above.

These three applications illustrate the diversity of questions that could be addressed by the careful design of a selection experiment, coupled to high-throughput sequencing of the selected variants. This type of strategy is becoming increasingly attractive as the cost of sequencing falls and the capacity to generate long, high-quality sequences increases. More broadly, prospects are excellent for approaches that combine experimental savvy with the power of computational analysis to crack stubborn but critical problems in biology.

Bibliography and References Cited

Aboobaker, A., and Blaxter, M. (2003). Hox gene evolution in nematodes: novelty conserved. Curr Opin Genet Dev 13, 593-598. Aguinaldo, A.M., Turbeville, J.M., Linford, L.S., Rivera, M.C., Garey, J.R., Raff, R.A., and Lake, J.A. (1997). Evidence for a clade of nematodes, arthropods and other moulting animals. Nature 387, 489-493.
Aigner, A. (2011). MicroRNAs (miRNAs) in cancer invasion and metastasis: therapeutic approaches based on metastasis-related miRNAs. J Mol Med (Berl) 89, 445-457. Akey, D.L., and Berger, J.M. (2005). Structure of the nuclease domain of ribonuclease III from M. tuberculosis at 2.1 A. Protein Sci 14, 2744-2750. Altuvia, S., Locker-Giladi, H., Koby, S., Ben-Nun, O., and Oppenheim, A.B. (1987). RNase III stimulates the translation of the cIII gene of bacteriophage lambda. Proc Natl Acad Sci U S A 84, 6511-6515. Ambros, V., Bartel, B., Bartel, D.P., Burge, C.B., Carrington, J.C., Chen, X., Dreyfuss, G., Eddy, S.R., Griffiths-Jones, S., Marshall, M., et al. (2003). A uniform system for microRNA annotation. RNA 9, 277-279. Arvey, A., Larsson, E., Sander, C., Leslie, C.S., and Marks, D.S. (2010). Target mRNA abundance dilutes microRNA and siRNA activity. Mol Syst Biol 6, 363. Ashburner, M., Ball, C.A., Blake, J.A., Botstein, D., Butler, H., Cherry, J.M., Davis, A.P., Dolinski, K., Dwight, S.S., Eppig, J.T., et al. (2000). Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet 25, 25-29. Babiarz, J.E., Ruby, J.G., Wang, Y., Bartel, D.P., and Blelloch, R. (2008). Mouse ES cells express endogenous shRNAs, siRNAs, and other Microprocessor-independent, Dicer-dependent small RNAs. Genes Dev 22, 2773-2785. Baek, D., Villen, J., Shin, C., Camargo, F.D., Gygi, S.P., and Bartel, D.P. (2008). The impact of microRNAs on protein output. Nature 455, 64-71. Bar, M., Wyman, S.K., Fritz, B.R., Qi, J., Garg, K.S., Parkin, R.K., Kroh, E.M., Bendoraite, A., Mitchell, P.S., Nelson, A.M., et al. (2008). MicroRNA discovery and profiling in human embryonic stem cells by deep sequencing of small RNA libraries. Stem Cells 26, 2496-2505. Bardwell, J.C., Regnier, P., Chen, S.M., Nakamura, Y., Grunberg-Manago, M., and Court, D.L. (1989). Autoregulation of RNase III operon by mRNA processing. EMBO J 8, 3401-3407. Barry, G., Squires, C., and Squires, C.L. (1980).
Attenuation and processing of RNA from the rplJL--rpoBC transcription unit of Escherichia coli. Proc Natl Acad Sci U S A 77, 3331-3335. Bartel, D.P. (2004). MicroRNAs: genomics, biogenesis, mechanism, and function. Cell 116, 281-297. Bartel, D.P. (2009). MicroRNAs: target recognition and regulatory functions. Cell 136, 215-233. Bartel, D.P., Zapp, M.L., Green, M.R., and Szostak, J.W. (1991). HIV-1 Rev regulation involves recognition of non-Watson-Crick base pairs in viral RNA. Cell 67, 529-536. Bass, B.L. (2000). Double-stranded RNA as a template for gene silencing. Cell 101, 235-238. Bazzini, A.A., Lee, M.T., and Giraldez, A.J. (2012). Ribosome profiling shows that miR-430 reduces translation before causing mRNA decay in zebrafish. Science 336, 233-237. Beltrame, M., and Tollervey, D. (1995). Base pairing between U3 and the pre-ribosomal RNA is required for 18S rRNA synthesis. EMBO J 14, 4350-4356. Bentwich, I., Avniel, A., Karov, Y., Aharonov, R., Gilad, S., Barad, O., Barzilai, A., Einat, P., Einav, U., Meiri, E., et al. (2005). Identification of hundreds of conserved and nonconserved human microRNAs. Nat Genet 37, 766-770. Berezikov, E., van Tetering, G., Verheul, M., van de Belt, J., van Laake, L., Vos, J., Verloop, R., van de Wetering, M., Guryev, V., Takada, S., et al. (2006). Many novel mammalian microRNA candidates identified by extensive cloning and RAKE analysis. Genome Res 16, 1289-1298. Bernstein, E., Caudy, A.A., Hammond, S.M., and Hannon, G.J. (2001). Role for a bidentate ribonuclease in the initiation step of RNA interference. Nature 409, 363-366. Bernstein, E., Kim, S.Y., Carmell, M.A., Murchison, E.P., Alcorn, H., Li, M.Z., Mills, A.A., Elledge, S.J., Anderson, K.V., and Hannon, G.J. (2003). Dicer is essential for mouse development. Nat Genet 35, 215-217. Blow, M.J., Grocock, R.J., van Dongen, S., Enright, A.J., Dicks, E., Futreal, P.A., Wooster, R., and Stratton, M.R. (2006). RNA editing of human microRNAs. Genome Biol 7, R27.
Blower, P.E., Verducci, J.S., Lin, S., Zhou, J., Chung, J.H., Dai, Z., Liu, C.G., Reinhold, W., Lorenzi, P.L., Kaldjian, E.P., et al. (2007). MicroRNA expression profiles for the NCI-60 cancer cell panel. Mol Cancer Ther 6, 1483-1491. Blumenthal, T. (2005). Trans-splicing and operons. WormBook, 1-9. Bohmert, K., Camus, I., Bellini, C., Bouchez, D., Caboche, M., and Benning, C. (1998). AGO1 defines a novel locus of Arabidopsis controlling leaf development. EMBO J 17, 170-180. Bohnsack, M.T., Czaplinski, K., and Gorlich, D. (2004). Exportin 5 is a RanGTP-dependent dsRNA-binding protein that mediates nuclear export of pre-miRNAs. RNA 10, 185-191. Bohnsack, M.T., Regener, K., Schwappach, B., Saffrich, R., Paraskeva, E., Hartmann, E., and Gorlich, D. (2002). Exp5 exports eEF1A via tRNA from nuclei and synergizes with other transport pathways to confine translation to the cytoplasm. EMBO J 21, 6205-6215. Bracht, J., Hunter, S., Eachus, R., Weeks, P., and Pasquinelli, A.E. (2004). Trans-splicing and polyadenylation of let-7 microRNA primary transcripts. RNA 10, 1586-1594. Bram, R.J., Young, R.A., and Steitz, J.A. (1980). The ribonuclease III site flanking 23S sequences in the 30S ribosomal precursor RNA of E. coli. Cell 19, 393-401. Breaker, R.R., Banerji, A., and Joyce, G.F. (1994). Continuous in vitro evolution of bacteriophage RNA polymerase promoters. Biochemistry 33, 11980-11986. Brownawell, A.M., and Macara, I.G. (2002). Exportin-5, a novel karyopherin, mediates nuclear export of double-stranded RNA binding proteins. J Cell Biol 156, 53-64. Brummelkamp, T.R., Bernards, R., and Agami, R. (2002). A system for stable expression of short interfering RNAs in mammalian cells. Science 296, 550-553. Butcher, S.E., Dieckmann, T., and Feigon, J. (1997). Solution structure of the conserved 16S-like ribosomal RNA UGAA tetraloop. J Mol Biol 268, 348-358. Cahill, N.M., Friend, K., Speckmann, W., Li, Z.H., Terns, R.M., Terns, M.P., and Steitz, J.A. (2002).
Site-specific cross-linking analyses reveal an asymmetric protein distribution for a box C/D snoRNP. EMBO J 21, 3816-3828. Calabrese, J.M., Seila, A.C., Yeo, G.W., and Sharp, P.A. (2007). RNA sequence analysis defines Dicer's role in mouse embryonic stem cells. Proc Natl Acad Sci U S A 104, 18097-18102. Calado, A., Treichel, N., Muller, E.C., Otto, A., and Kutay, U. (2002). Exportin-5-mediated nuclear export of eukaryotic elongation factor 1A and tRNA. EMBO J 21, 6216-6224. Calin, G.A., Dumitru, C.D., Shimizu, M., Bichi, R., Zupo, S., Noch, E., Aldler, H., Rattan, S., Keating, M., Rai, K., et al. (2002). Frequent deletions and down-regulation of micro-RNA genes miR15 and miR16 at 13q14 in chronic lymphocytic leukemia. Proc Natl Acad Sci U S A 99, 15524-15529. Calin, G.A., Ferracin, M., Cimmino, A., Di Leva, G., Shimizu, M., Wojcik, S.E., Iorio, M.V., Visone, R., Sever, N.I., Fabbri, M., et al. (2005). A MicroRNA signature associated with prognosis and progression in chronic lymphocytic leukemia. N Engl J Med 353, 1793-1801. Calin, G.A., Liu, C.G., Sevignani, C., Ferracin, M., Felli, N., Dumitru, C.D., Shimizu, M., Cimmino, A., Zupo, S., Dono, M., et al. (2004a). MicroRNA profiling reveals distinct signatures in B cell chronic lymphocytic leukemias. Proc Natl Acad Sci U S A 101, 11755-11760. Calin, G.A., Sevignani, C., Dumitru, C.D., Hyslop, T., Noch, E., Yendamuri, S., Shimizu, M., Rattan, S., Bullrich, F., Negrini, M., et al. (2004b). Human microRNA genes are frequently located at fragile sites and genomic regions involved in cancers. Proc Natl Acad Sci U S A 101, 2999-3004. Campenot, R.B., Lund, K., and Mok, S.A. (2009). Production of compartmented cultures of rat sympathetic neurons. Nat Protoc 4, 1869-1887. Caudy, A.A., Myers, M., Hannon, G.J., and Hammond, S.M. (2002). Fragile X-related protein and VIG associate with the RNA interference machinery. Genes Dev 16, 2491-2496.
Cenik, E.S., Fukunaga, R., Lu, G., Dutcher, R., Wang, Y., Tanaka Hall, T.M., and Zamore, P.D. (2011). Phosphate and R2D2 restrict the substrate specificity of Dicer-2, an ATP-driven ribonuclease. Mol Cell 42, 172-184. Chakraborty, S., Mehtab, S., Patwardhan, A., and Krishnan, Y. (2012). Pri-miR-17-92a transcript folds into a tertiary structure and autoregulates its processing. RNA 18, 1014-1028. Chakravarthy, S., Sternberg, S.H., Kellenberger, C.A., and Doudna, J.A. (2010). Substrate-specific kinetics of Dicer-catalyzed RNA processing. J Mol Biol 404, 392-402. Chanfreau, G., Buckle, M., and Jacquier, A. (2000). Recognition of a conserved class of RNA tetraloops by Saccharomyces cerevisiae RNase III. Proc Natl Acad Sci U S A 97, 3142-3147. Chanfreau, G., Legrain, P., and Jacquier, A. (1998). Yeast RNase III as a key processing enzyme in small nucleolar RNAs metabolism. J Mol Biol 284, 975-988. Chang, T.C., Yu, D., Lee, Y.S., Wentzel, E.A., Arking, D.E., West, K.M., Dang, C.V., Thomas-Tikhonenko, A., and Mendell, J.T. (2008). Widespread microRNA repression by Myc contributes to tumorigenesis. Nat Genet 40, 43-50. Chatterjee, S., and Grosshans, H. (2009). Active turnover modulates mature microRNA activity in Caenorhabditis elegans. Nature 461, 546-549. Chelladurai, B.S., Li, H., and Nicholson, A.W. (1991). A conserved sequence element in ribonuclease III processing signals is not required for accurate in vitro enzymatic cleavage. Nucleic Acids Res 19, 1759-1766. Cheloufi, S., Dos Santos, C.O., Chong, M.M., and Hannon, G.J. (2010). A dicer-independent miRNA biogenesis pathway that requires Ago catalysis. Nature 465, 584-589. Chen, C.Z., Li, L., Lodish, H.F., and Bartel, D.P. (2004). MicroRNAs modulate hematopoietic lineage differentiation. Science 303, 83-86. Chendrimada, T.P., Gregory, R.I., Kumaraswamy, E., Norman, J., Cooch, N., Nishikura, K., and Shiekhattar, R. (2005). TRBP recruits the Dicer complex to Ago2 for microRNA processing and gene silencing.
Nature 436, 740-744. Chiang, H.R., Schoenfeld, L.W., Ruby, J.G., Auyeung, V.C., Spies, N., Baek, D., Johnston, W.K., Russ, C., Luo, S., Babiarz, J.E., et al. (2010). Mammalian microRNAs: experimental evaluation of novel and previously annotated genes. Genes Dev 24, 992-1009. Chong, M.M., Zhang, G., Cheloufi, S., Neubert, T.A., Hannon, G.J., and Littman, D.R. (2010). Canonical and alternate functions of the microRNA biogenesis machinery. Genes Dev 24, 1951-1960. Chung, W.J., Agius, P., Westholm, J.O., Chen, M., Okamura, K., Robine, N., Leslie, C.S., and Lai, E.C. (2011). Computational and experimental identification of mirtrons in Drosophila melanogaster and Caenorhabditis elegans. Genome Res 21, 286-300. Cifuentes, D., Xue, H., Taylor, D.W., Patnode, H., Mishima, Y., Cheloufi, S., Ma, E., Mane, S., Hannon, G.J., Lawson, N.D., et al. (2010). A novel miRNA processing pathway independent of Dicer requires Argonaute2 catalytic activity. Science 328, 1694-1698. Cimmino, A., Calin, G.A., Fabbri, M., Iorio, M.V., Ferracin, M., Shimizu, M., Wojcik, S.E., Aqeilan, R.I., Zupo, S., Dono, M., et al. (2005). miR-15 and miR-16 induce apoptosis by targeting BCL2. Proc Natl Acad Sci U S A 102, 13944-13949. Costinean, S., Zanesi, N., Pekarsky, Y., Tili, E., Volinia, S., Heerema, N., and Croce, C.M. (2006). Pre-B cell proliferation and lymphoblastic leukemia/high-grade lymphoma in E(mu)-miR155 transgenic mice. Proc Natl Acad Sci U S A 103, 7024-7029. Cowin, P.A., Anglesio, M., Etemadmoghadam, D., and Bowtell, D.D. (2010). Profiling the cancer genome. Annu Rev Genomics Hum Genet 11, 133-159. Czech, B., Malone, C.D., Zhou, R., Stark, A., Schlingeheyde, C., Dus, M., Perrimon, N., Kellis, M., Wohlschlegel, J.A., Sachidanandam, R., et al. (2008). An endogenous small interfering RNA pathway in Drosophila. Nature 453, 798-802. Czech, B., Zhou, R., Erlich, Y., Brennecke, J., Binari, R., Villalta, C., Gordon, A., Perrimon, N., and Hannon, G.J. (2009).
Hierarchical rules for Argonaute loading in Drosophila. Mol Cell 36, 445-456. Daniels, D.L., Subbarao, M.N., Blattner, F.R., and Lozeron, H.A. (1988). Q-mediated late gene transcription of bacteriophage lambda: RNA start point and RNase III processing sites in vivo. Virology 167, 568-577. Davis, B.N., Hilyard, A.C., Lagna, G., and Hata, A. (2008). SMAD proteins control DROSHAmediated microRNA maturation. Nature 454, 56-61. Davis, B.N., Hilyard, A.C., Nguyen, P.H., Lagna, G., and Hata, A. (2010). Smad proteins bind a conserved RNA sequence to promote microRNA maturation by Drosha. Mol Cell 39, 373384. Denli, A.M., Tops, B.B., Plasterk, R.H., Ketting, R.F., and Hannon, G.J. (2004). Processing of primary microRNAs by the Microprocessor complex. Nature 432, 231-235. Doench, J.G., and Sharp, P.A. (2004). Specificity of microRNA target selection in translational repression. Genes Dev 18, 504-511. Dong, Z., Han, M.H., and Fedoroff, N. (2008). The RNA-binding proteins HYL1 and SE promote accurate in vitro processing of pri-miRNA by DCL1. Proc Natl Acad Sci U S A 105, 9970-9975. Dreyfuss, G., Matunis, M.J., Pinol-Roma, S., and Burd, C.G. (1993). hnRNP proteins and the biogenesis of mRNA. Annu Rev Biochem 62, 289-321. Duan, R., Pak, C., and Jin, P. (2007). Single nucleotide polymorphism associated with mature miR-125a alters the processing of pri-miRNA. Hum Mol Genet 16, 1124-1131. Dunn, J.J., and Studier, F.W. (1973a). T7 early RNAs and Escherichia coli ribosomal RNAs are cut from large precursor RNAs in vivo by ribonuclease 3. Proc Natl Acad Sci U S A 70, 3296-3300. 145 Dunn, J.J., and Studier, F.W. (1973b). T7 early RNAs are generated by site-specific cleavages. Proc Natl Acad Sci U S A 70, 1559-1563. Eis, P.S., Tam, W., Sun, L., Chadburn, A., Li, Z., Gomez, M.F., Lund, E., and Dahlberg, J.E. (2005). Accumulation of miR-155 and BIC RNA in human B cell lymphomas. Proc Natl Acad Sci U S A 102, 3627-3632. 
Elbashir, S.M., Harborth, J., Lendeckel, W., Yalcin, A., Weber, K., and Tuschl, T. (2001a). Duplexes of 21-nucleotide RNAs mediate RNA interference in cultured mammalian cells. Nature 411, 494-498. Elbashir, S.M., Lendeckel, W., and Tuschl, T. (2001b). RNA interference is mediated by 21- and 22-nucleotide RNAs. Genes Dev 15, 188-200. Elela, S.A., Igel, H., and Ares, M., Jr. (1996). RNase III cleaves eukaryotic preribosomal RNA at a U3 snoRNP-dependent site. Cell 85, 115-124. Ellington, A.D., and Szostak, J.W. (1990). In vitro selection of RNA molecules that bind specific ligands. Nature 346, 818-822. Esquela-Kerscher, A., Trang, P., Wiggins, J.F., Patrawala, L., Cheng, A., Ford, L., Weidhaas, J.B., Brown, D., Bader, A.G., and Slack, F.J. (2008). The let-7 microRNA reduces tumor growth in mouse models of lung cancer. Cell Cycle 7, 759-764. Fagard, M., Boutet, S., Morel, J.B., Bellini, C., and Vaucheret, H. (2000). AGO1, QDE-2, and RDE-1 are related proteins required for post-transcriptional gene silencing in plants, quelling in fungi, and RNA interference in animals. Proc Natl Acad Sci U S A 97, 11650-11654. Faller, M., Toso, D., Matsunaga, M., Atanasov, I., Senturia, R., Chen, Y., Zhou, Z.H., and Guo, F. (2010). DGCR8 recognizes primary transcripts of microRNAs through highly cooperative binding and formation of higher-order structures. RNA 16, 1570-1583. Farh, K.K., Grimson, A., Jan, C., Lewis, B.P., Johnston, W.K., Lim, L.P., Burge, C.B., and Bartel, D.P. (2005). The widespread impact of mammalian MicroRNAs on mRNA repression and evolution. Science 310, 1817-1821. Fellmann, C., Zuber, J., McJunkin, K., Chang, K., Malone, C.D., Dickins, R.A., Xu, Q., Hengartner, M.O., Elledge, S.J., Hannon, G.J., et al. (2011). Functional identification of optimized RNAi triggers using a massively parallel sensor assay. Mol Cell 41, 733-746. Feng, Y., Zhang, X., Song, Q., Li, T., and Zeng, Y. (2011). 
Drosha processing controls the specificity and efficiency of global microRNA expression. Biochim Biophys Acta 1809, 700707. Filippov, V., Solovyev, V., Filippova, M., and Gill, S.S. (2000). A novel type of RNase III family proteins in eukaryotes. Gene 245, 213-221. Fire, A., Albertson, D., Harrison, S.W., and Moerman, D.G. (1991). Production of antisense RNA leads to effective and specific inhibition of gene expression in C. elegans muscle. Development 113, 503-514. Fire, A., Xu, S., Montgomery, M.K., Kostas, S.A., Driver, S.E., and Mello, C.C. (1998). Potent and specific genetic interference by double-stranded RNA in Caenorhabditis elegans. Nature 391, 806-811. Flynt, A.S., Greimann, J.C., Chung, W.J., Lima, C.D., and Lai, E.C. (2010). MicroRNA biogenesis via splicing and exosome-mediated trimming in Drosophila. Mol Cell 38, 900907. Forstemann, K., Horwich, M.D., Wee, L., Tomari, Y., and Zamore, P.D. (2007). Drosophila microRNAs are sorted into functionally distinct argonaute complexes after production by dicer-1. Cell 130, 287-297. 146 Forstemann, K., Tomari, Y., Du, T., Vagin, V.V., Denli, A.M., Bratu, D.P., Klattenhoff, C., Theurkauf, W.E., and Zamore, P.D. (2005). Normal microRNA maturation and germ-line stem cell maintenance requires Loquacious, a double-stranded RNA-binding domain protein. PLoS Biol 3, e236. Frank, F., Sonenberg, N., and Nagar, B. (2010). Structural basis for 5'-nucleotide base-specific recognition of guide RNA by human AGO2. Nature 465, 818-822. Friedman, R.C., Farh, K.K., Burge, C.B., and Bartel, D.P. (2009). Most mammalian mRNAs are conserved targets of microRNAs. Genome Res 19, 92-105. Fujita, P.A., Rhead, B., Zweig, A.S., Hinrichs, A.S., Karolchik, D., Cline, M.S., Goldman, M., Barber, G.P., Clawson, H., Coelho, A., et al. (2011). The UCSC Genome Browser database: update 2011. Nucleic Acids Res 39, D876-882. 
Fukuda, T., Yamagata, K., Fujiyama, S., Matsumoto, T., Koshida, I., Yoshimura, K., Mihara, M., Naitou, M., Endoh, H., Nakamura, T., et al. (2007). DEAD-box RNA helicase subunits of the Drosha complex are required for processing of rRNA and a subset of microRNAs. Nat Cell Biol 9, 604-611. Futreal, P.A., Coin, L., Marshall, M., Down, T., Hubbard, T., Wooster, R., Rahman, N., and Stratton, M.R. (2004). A census of human cancer genes. Nat Rev Cancer 4, 177-183. Gan, J., Shaw, G., Tropea, J.E., Waugh, D.S., Court, D.L., and Ji, X. (2008). A stepwise model for double-stranded RNA processing by ribonuclease III. Mol Microbiol 67, 143-154. Gan, J., Tropea, J.E., Austin, B.P., Court, D.L., Waugh, D.S., and Ji, X. (2006). Structural insight into the mechanism of double-stranded RNA processing by ribonuclease III. Cell 124, 355366. Garcia, D.M., Baek, D., Shin, C., Bell, G.W., Grimson, A., and Bartel, D.P. (2011). Weak seedpairing stability and high target-site abundance decrease the proficiency of lsy-6 and other microRNAs. Nat Struct Mol Biol 18, 1139-1146. Garzon, R., Calin, G.A., and Croce, C.M. (2009). MicroRNAs in Cancer. Annu Rev Med 60, 167-179. Ghildiyal, M., Xu, J., Seitz, H., Weng, Z., and Zamore, P.D. (2010). Sorting of Drosophila small silencing RNAs partitions microRNA* strands into the RNA interference pathway. RNA 16, 43-56. Gottwein, E., Cai, X., and Cullen, B.R. (2006). A novel assay for viral microRNA function identifies a single nucleotide polymorphism that affects Drosha processing. J Virol 80, 53215326. Grad, Y., Aach, J., Hayes, G.D., Reinhart, B.J., Church, G.M., Ruvkun, G., and Kim, J. (2003). Computational and experimental identification of C. elegans microRNAs. Mol Cell 11, 12531263. Greenbaum, N.L., Radhakrishnan, I., Patel, D.J., and Hirsh, D. (1996). Solution structure of the donor site of a trans-splicing RNA. Structure 4, 725-733. Gregory, R.I., Yan, K.P., Amuthan, G., Chendrimada, T., Doratotaj, B., Cooch, N., and Shiekhattar, R. (2004). 
The Microprocessor complex mediates the genesis of microRNAs. Nature 432, 235-240. Griffiths-Jones, S., Grocock, R.J., van Dongen, S., Bateman, A., and Enright, A.J. (2006). miRBase: microRNA sequences, targets and gene nomenclature. Nucleic Acids Res 34, D140-144. 147 Grimson, A., Farh, K.K., Johnston, W.K., Garrett-Engele, P., Lim, L.P., and Bartel, D.P. (2007). MicroRNA targeting specificity in mammals: determinants beyond seed pairing. Mol Cell 27, 91-105. Grimson, A., Srivastava, M., Fahey, B., Woodcroft, B.J., Chiang, H.R., King, N., Degnan, B.M., Rokhsar, D.S., and Bartel, D.P. (2008). Early origins and evolution of microRNAs and Piwiinteracting RNAs in animals. Nature 455, 1193-1197. Grishok, A., Pasquinelli, A.E., Conte, D., Li, N., Parrish, S., Ha, I., Baillie, D.L., Fire, A., Ruvkun, G., and Mello, C.C. (2001). Genes and mechanisms related to RNA interference regulate expression of the small temporal RNAs that control C. elegans developmental timing. Cell 106, 23-34. Guil, S., and Caceres, J.F. (2007). The multifunctional RNA-binding protein hnRNP A1 is required for processing of miR-18a. Nat Struct Mol Biol 14, 591-596. Guo, H., Ingolia, N.T., Weissman, J.S., and Bartel, D.P. (2010). Mammalian microRNAs predominantly act to decrease target mRNA levels. Nature 466, 835-840. Guo, S., and Kemphues, K.J. (1995). par-1, a gene required for establishing polarity in C. elegans embryos, encodes a putative Ser/Thr kinase that is asymmetrically distributed. Cell 81, 611-620. Guttman, M., Donaghey, J., Carey, B.W., Garber, M., Grenier, J.K., Munson, G., Young, G., Lucas, A.B., Ach, R., Bruhn, L., et al. (2011). lincRNAs act in the circuitry controlling pluripotency and differentiation. Nature 477, 295-300. Gwizdek, C., Ossareh-Nazari, B., Brownawell, A.M., Doglio, A., Bertrand, E., Macara, I.G., and Dargemont, C. (2003). Exportin-5 mediates nuclear export of minihelix-containing RNAs. J Biol Chem 278, 5505-5508. 
Haase, A.D., Jaskiewicz, L., Zhang, H., Laine, S., Sack, R., Gatignol, A., and Filipowicz, W. (2005). TRBP, a regulator of cellular PKR and HIV-1 virus expression, interacts with Dicer and functions in RNA silencing. EMBO Rep 6, 961-967. Hagan, J.P., Piskounova, E., and Gregory, R.I. (2009). Lin28 recruits the TUTase Zcchc11 to inhibit let-7 maturation in mouse embryonic stem cells. Nat Struct Mol Biol 16, 1021-1025. Hamilton, A.J., and Baulcombe, D.C. (1999). A species of small antisense RNA in posttranscriptional gene silencing in plants. Science 286, 950-952. Hammond, S.M., Bernstein, E., Beach, D., and Hannon, G.J. (2000). An RNA-directed nuclease mediates post-transcriptional gene silencing in Drosophila cells. Nature 404, 293-296. Hammond, S.M., Boettcher, S., Caudy, A.A., Kobayashi, R., and Hannon, G.J. (2001). Argonaute2, a link between genetic and biochemical analyses of RNAi. Science 293, 11461150. Han, B.W., Hung, J.H., Weng, Z., Zamore, P.D., and Ameres, S.L. (2011). The 3'-to-5' Exoribonuclease Nibbler Shapes the 3' Ends of MicroRNAs Bound to Drosophila Argonaute1. Curr Biol. Han, J., Lee, Y., Yeom, K.H., Kim, Y.K., Jin, H., and Kim, V.N. (2004). The Drosha-DGCR8 complex in primary microRNA processing. Genes Dev 18, 3016-3027. Han, J., Lee, Y., Yeom, K.H., Nam, J.W., Heo, I., Rhee, J.K., Sohn, S.Y., Cho, Y., Zhang, B.T., and Kim, V.N. (2006). Molecular basis for the recognition of primary microRNAs by the Drosha-DGCR8 complex. Cell 125, 887-901. Han, J., Pedersen, J.S., Kwon, S.C., Belair, C.D., Kim, Y.K., Yeom, K.H., Yang, W.Y., Haussler, D., Blelloch, R., and Kim, V.N. (2009). Posttranscriptional crossregulation between Drosha and DGCR8. Cell 136, 75-84. 148 Hanna, M.M. (1989). Photoaffinity cross-linking methods for studying RNA-protein interactions. Methods Enzymol 180, 383-409. Hannon, G.J., Maroney, P.A., Denker, J.A., and Nilsen, T.W. (1990). Trans splicing of nematode pre-messenger RNA in vitro. Cell 61, 1247-1255. 
Hartig, J.V., Esslinger, S., Bottcher, R., Saito, K., and Forstemann, K. (2009). Endo-siRNAs depend on a new isoform of loquacious and target artificially introduced, high-copy sequences. EMBO J 28, 2932-2944. Hata, A., and Davis, B.N. (2011). Regulation of pri-miRNA Processing Through Smads. Adv Exp Med Biol 700, 15-27. Hendrickson, D.G., Hogan, D.J., McCullough, H.L., Myers, J.W., Herschlag, D., Ferrell, J.E., and Brown, P.O. (2009). Concordant regulation of translation and mRNA abundance for hundreds of targets of a human microRNA. PLoS Biol 7, e1000238. Heo, I., Joo, C., Cho, J., Ha, M., Han, J., and Kim, V.N. (2008). Lin28 mediates the terminal uridylation of let-7 precursor MicroRNA. Mol Cell 32, 276-284. Heo, I., Joo, C., Kim, Y.K., Ha, M., Yoon, M.J., Cho, J., Yeom, K.H., Han, J., and Kim, V.N. (2009). TUT4 in concert with Lin28 suppresses microRNA biogenesis through premicroRNA uridylation. Cell 138, 696-708. Hofacker, I.L., and Stadler, P.F. (2006). Memory efficient folding algorithms for circular RNA secondary structures. Bioinformatics 22, 1172-1176. Holterman, M., van der Wurff, A., van den Elsen, S., van Megen, H., Bongers, T., Holovachov, O., Bakker, J., and Helder, J. (2006). Phylum-wide analysis of SSU rDNA reveals deep phylogenetic relationships among nematodes and accelerated evolution toward crown Clades. Mol Biol Evol 23, 1792-1800. Hughes, J.A., Brown, L.R., and Ferro, A.J. (1987). Nucleotide sequence and analysis of the coliphage T3 S-adenosylmethionine hydrolase gene and its surrounding ribonuclease III processing sites. Nucleic Acids Res 15, 717-729. Hutvagner, G., McLachlan, J., Pasquinelli, A.E., Balint, E., Tuschl, T., and Zamore, P.D. (2001). A cellular function for the RNA-interference enzyme Dicer in the maturation of the let-7 small temporal RNA. Science 293, 834-838. Izant, J.G., and Weintraub, H. (1984). Inhibition of thymidine kinase gene expression by antisense RNA: a molecular approach to genetic analysis. Cell 36, 1007-1015. 
Jacobsen, S.E., Running, M.P., and Meyerowitz, E.M. (1999). Disruption of an RNA helicase/RNAse III gene in Arabidopsis causes unregulated cell division in floral meristems. Development 126, 5231-5243. Jazdzewski, K., Murray, E.L., Franssila, K., Jarzab, B., Schoenberg, D.R., and de la Chapelle, A. (2008). Common SNP in pre-miR-146a decreases mature miR expression and predisposes to papillary thyroid carcinoma. Proc Natl Acad Sci U S A 105, 7269-7274. Jiang, F., Ye, X., Liu, X., Fincher, L., McKearin, D., and Liu, Q. (2005). Dicer-1 and R3D1-L catalyze microRNA maturation in Drosophila. Genes Dev 19, 1674-1679. Jolma, A., Kivioja, T., Toivonen, J., Cheng, L., Wei, G., Enge, M., Taipale, M., Vaquerizas, J.M., Yan, J., Sillanpaa, M.J., et al. (2010). Multiplexed massively parallel SELEX for characterization of human transcription factor binding specificities. Genome Res 20, 861873. Jones-Rhoades, M.W., Bartel, D.P., and Bartel, B. (2006). MicroRNAS and their regulatory roles in plants. Annu Rev Plant Biol 57, 19-53. 149 Jung, H., Yoon, B.C., and Holt, C.E. (2012). Axonal mRNA localization and local protein synthesis in nervous system assembly, maintenance and repair. Nat Rev Neurosci 13, 308324. Kadener, S., Rodriguez, J., Abruzzi, K.C., Khodor, Y.L., Sugino, K., Marr, M.T., 2nd, Nelson, S., and Rosbash, M. (2009). Genome-wide identification of targets of the droshapasha/DGCR8 complex. RNA 15, 537-545. Kawahara, Y., Megraw, M., Kreider, E., Iizasa, H., Valente, L., Hatzigeorgiou, A.G., and Nishikura, K. (2008). Frequency and fate of microRNA editing in human brain. Nucleic Acids Res 36, 5270-5280. Kawahara, Y., Zinshteyn, B., Chendrimada, T.P., Shiekhattar, R., and Nishikura, K. (2007). RNA editing of the microRNA-151 precursor blocks cleavage by the Dicer-TRBP complex. EMBO Rep 8, 763-769. Kawamata, T., Seitz, H., and Tomari, Y. (2009). Structural determinants of miRNAs for RISC loading and slicer-independent unwinding. Nat Struct Mol Biol 16, 953-960. 
Kawamata, T., Yoda, M., and Tomari, Y. (2011). Multilayer checkpoints for microRNA authenticity during RISC assembly. EMBO Rep 12, 944-949. Ketting, R.F. (2011). The many faces of RNAi. Dev Cell 20, 148-161. Ketting, R.F., Fischer, S.E., Bernstein, E., Sijen, T., Hannon, G.J., and Plasterk, R.H. (2001). Dicer functions in RNA interference and in synthesis of small RNA involved in developmental timing in C. elegans. Genes Dev 15, 2654-2659. Khalil, A.M., Guttman, M., Huarte, M., Garber, M., Raj, A., Rivea Morales, D., Thomas, K., Presser, A., Bernstein, B.E., van Oudenaarden, A., et al. (2009). Many human large intergenic noncoding RNAs associate with chromatin-modifying complexes and affect gene expression. Proc Natl Acad Sci U S A 106, 11667-11672. Khvorova, A., Reynolds, A., and Jayasena, S.D. (2003). Functional siRNAs and miRNAs exhibit strand bias. Cell 115, 209-216. Knight, S.W., and Bass, B.L. (2001). A role for the RNase III enzyme DCR-1 in RNA interference and germ line development in Caenorhabditis elegans. Science 293, 2269-2271. Kota, J., Chivukula, R.R., O'Donnell, K.A., Wentzel, E.A., Montgomery, C.L., Hwang, H.W., Chang, T.C., Vivekanandan, P., Torbenson, M., Clark, K.R., et al. (2009). Therapeutic microRNA delivery suppresses tumorigenesis in a murine liver cancer model. Cell 137, 1005-1017. Kozomara, A., and Griffiths-Jones, S. (2011). miRBase: integrating microRNA annotation and deep-sequencing data. Nucleic Acids Res 39, D152-157. Krinke, L., and Wulff, D.L. (1990). The cleavage specificity of RNase III. Nucleic Acids Res 18, 4809-4815. Kumar, M.S., Erkeland, S.J., Pester, R.E., Chen, C.Y., Ebert, M.S., Sharp, P.A., and Jacks, T. (2008). Suppression of non-small cell lung tumor development by the let-7 microRNA family. Proc Natl Acad Sci U S A 105, 3903-3908. Kumar, M.S., Lu, J., Mercer, K.L., Golub, T.R., and Jacks, T. (2007). Impaired microRNA processing enhances cellular transformation and tumorigenesis. Nat Genet 39, 673-677. 
Kurihara, Y., and Watanabe, Y. (2004). Arabidopsis micro-RNA biogenesis through Dicer-like 1 protein functions. Proc Natl Acad Sci U S A 101, 12753-12758. Lagos-Quintana, M., Rauhut, R., Lendeckel, W., and Tuschl, T. (2001). Identification of novel genes coding for small expressed RNAs. Science 294, 853-858. 150 Lamontagne, B., and Elela, S.A. (2004). Evaluation of the RNA determinants for bacterial and yeast RNase III binding and cleavage. J Biol Chem 279, 2231-2241. Lamontagne, B., Ghazal, G., Lebars, I., Yoshizawa, S., Fourmy, D., and Elela, S.A. (2003). Sequence dependence of substrate recognition and cleavage by yeast RNase III. J Mol Biol 327, 985-1000. Landgraf, P., Rusu, M., Sheridan, R., Sewer, A., Iovino, N., Aravin, A., Pfeffer, S., Rice, A., Kamphorst, A.O., Landthaler, M., et al. (2007). A mammalian microRNA expression atlas based on small RNA library sequencing. Cell 129, 1401-1414. Landthaler, M., Yalcin, A., and Tuschl, T. (2004). The human DiGeorge syndrome critical region gene 8 and Its D. melanogaster homolog are required for miRNA biogenesis. Curr Biol 14, 2162-2167. Lasda, E.L., Allen, M.A., and Blumenthal, T. (2010). Polycistronic pre-mRNA processing in vitro: snRNP and pre-mRNA role reversal in trans-splicing. Genes Dev 24, 1645-1658. Lau, N.C., Lim, L.P., Weinstein, E.G., and Bartel, D.P. (2001). An abundant class of tiny RNAs with probable regulatory roles in Caenorhabditis elegans. Science 294, 858-862. Lau, P.W., Guiley, K.Z., De, N., Potter, C.S., Carragher, B., and MacRae, I.J. (2012). The molecular architecture of human Dicer. Nat Struct Mol Biol 19, 436-440. Lee, R.C., and Ambros, V. (2001). An extensive class of small RNAs in Caenorhabditis elegans. Science 294, 862-864. Lee, R.C., Feinbaum, R.L., and Ambros, V. (1993). The C. elegans heterochronic gene lin-4 encodes small RNAs with antisense complementarity to lin-14. Cell 75, 843-854. 
Lee, Y., Ahn, C., Han, J., Choi, H., Kim, J., Yim, J., Lee, J., Provost, P., Radmark, O., Kim, S., et al. (2003). The nuclear RNase III Drosha initiates microRNA processing. Nature 425, 415419. Lee, Y., Hur, I., Park, S.Y., Kim, Y.K., Suh, M.R., and Kim, V.N. (2006). The role of PACT in the RNA silencing pathway. EMBO J 25, 522-532. Lee, Y., Jeon, K., Lee, J.T., Kim, S., and Kim, V.N. (2002). MicroRNA maturation: stepwise processing and subcellular localization. EMBO J 21, 4663-4670. Lee, Y., Kim, M., Han, J., Yeom, K.H., Lee, S., Baek, S.H., and Kim, V.N. (2004a). MicroRNA genes are transcribed by RNA polymerase II. EMBO J 23, 4051-4060. Lee, Y., and Kim, V.N. (2007). In vitro and in vivo assays for the activity of Drosha complex. Methods Enzymol 427, 89-106. Lee, Y.S., Nakahara, K., Pham, J.W., Kim, K., He, Z., Sontheimer, E.J., and Carthew, R.W. (2004b). Distinct roles for Drosophila Dicer-1 and Dicer-2 in the siRNA/miRNA silencing pathways. Cell 117, 69-81. Lehrbach, N.J., Armisen, J., Lightfoot, H.L., Murfitt, K.J., Bugaut, A., Balasubramanian, S., and Miska, E.A. (2009). LIN-28 and the poly(U) polymerase PUP-2 regulate let-7 microRNA processing in Caenorhabditis elegans. Nat Struct Mol Biol 16, 1016-1020. Lim, L.P., Glasner, M.E., Yekta, S., Burge, C.B., and Bartel, D.P. (2003a). Vertebrate microRNA genes. Science 299, 1540. Lim, L.P., Lau, N.C., Weinstein, E.G., Abdelhakim, A., Yekta, S., Rhoades, M.W., Burge, C.B., and Bartel, D.P. (2003b). The microRNAs of Caenorhabditis elegans. Genes Dev 17, 9911008. Liu, H., D'Andrade, P., Fulmer-Smentek, S., Lorenzi, P., Kohn, K.W., Weinstein, J.N., Pommier, Y., and Reinhold, W.C. (2010). mRNA and microRNA expression profiles of the NCI-60 integrated with drug activities. Mol Cancer Ther 9, 1080-1091. 151 Liu, J., Carmell, M.A., Rivas, F.V., Marsden, C.G., Thomson, J.M., Song, J.J., Hammond, S.M., Joshua-Tor, L., and Hannon, G.J. (2004). Argonaute2 is the catalytic engine of mammalian RNAi. Science 305, 1437-1441. 
Liu, Q., Rand, T.A., Kalidas, S., Du, F., Kim, H.E., Smith, D.P., and Wang, X. (2003). R2D2, a bridge between the initiation and effector steps of the Drosophila RNAi pathway. Science 301, 1921-1925. Liu, X., Park, J.K., Jiang, F., Liu, Y., McKearin, D., and Liu, Q. (2007). Dicer-1, but not Loquacious, is critical for assembly of miRNA-induced silencing complexes. RNA 13, 23242329. Llave, C., Xie, Z., Kasschau, K.D., and Carrington, J.C. (2002). Cleavage of Scarecrow-like mRNA targets directed by a class of Arabidopsis miRNA. Science 297, 2053-2056. Lu, J., Getz, G., Miska, E.A., Alvarez-Saavedra, E., Lamb, J., Peck, D., Sweet-Cordero, A., Ebert, B.L., Mak, R.H., Ferrando, A.A., et al. (2005). MicroRNA expression profiles classify human cancers. Nature 435, 834-838. Lund, E., Guttinger, S., Calado, A., Dahlberg, J.E., and Kutay, U. (2004). Nuclear export of microRNA precursors. Science 303, 95-98. Ma, J.B., Ye, K., and Patel, D.J. (2004). Structural basis for overhang-specific small interfering RNA recognition by the PAZ domain. Nature 429, 318-322. MacRae, I.J., Zhou, K., and Doudna, J.A. (2007). Structural determinants of RNA recognition and cleavage by Dicer. Nat Struct Mol Biol 14, 934-940. Macrae, I.J., Zhou, K., Li, F., Repic, A., Brooks, A.N., Cande, W.Z., Adams, P.D., and Doudna, J.A. (2006). Structural basis for double-stranded RNA processing by Dicer. Science 311, 195-198. Maroney, P.A., Hannon, G.J., and Nilsen, T.W. (1990). Transcription and cap trimethylation of a nematode spliced leader RNA in a cell-free system. Proc Natl Acad Sci U S A 87, 709-713. Mateos, J.L., Bologna, N.G., Chorostecki, U., and Palatnik, J.F. (2010). Identification of microRNA processing determinants by random mutagenesis of Arabidopsis MIR172a precursor. Curr Biol 20, 49-54. Mayr, C., Hemann, M.T., and Bartel, D.P. (2007). Disrupting the pairing between let-7 and Hmga2 enhances oncogenic transformation. Science 315, 1576-1579. Meignin, C., and Davis, I. (2010). 
Transmitting the message: intracellular mRNA localization. Curr Opin Cell Biol 22, 112-119. Meister, G., Landthaler, M., Patkaniowska, A., Dorsett, Y., Teng, G., and Tuschl, T. (2004). Human Argonaute2 mediates RNA cleavage targeted by miRNAs and siRNAs. Mol Cell 15, 185-197. Merritt, W.M., Lin, Y.G., Han, L.Y., Kamat, A.A., Spannuth, W.A., Schmandt, R., Urbauer, D., Pennacchio, L.A., Cheng, J.F., Nick, A.M., et al. (2008). Dicer, Drosha, and outcomes in patients with ovarian cancer. N Engl J Med 359, 2641-2650. Mian, I.S. (1997). Comparative sequence analysis of ribonucleases HII, III, II PH and D. Nucleic Acids Res 25, 3187-3195. Michlewski, G., and Caceres, J.F. (2010). Antagonistic role of hnRNP A1 and KSRP in the regulation of let-7a biogenesis. Nat Struct Mol Biol 17, 1011-1018. Michlewski, G., Guil, S., Semple, C.A., and Caceres, J.F. (2008). Posttranscriptional regulation of miRNAs harboring conserved terminal loops. Mol Cell 32, 383-393. 152 Mishima, Y., and Steitz, J.A. (1995). Site-specific crosslinking of 4-thiouridine-modified human tRNA(3Lys) to reverse transcriptase from human immunodeficiency virus type I. EMBO J 14, 2679-2687. Miyoshi, K., Miyoshi, T., and Siomi, H. (2010). Many ways to generate microRNA-like small RNAs: non-canonical pathways for microRNA production. Mol Genet Genomics 284, 95103. Miyoshi, K., Okada, T.N., Siomi, H., and Siomi, M.C. (2009). Characterization of the miRNARISC loading complex and miRNA-RISC formed in the Drosophila miRNA pathway. RNA 15, 1282-1291. Moffat, J., Grueneberg, D.A., Yang, X., Kim, S.Y., Kloepfer, A.M., Hinkle, G., Piqani, B., Eisenhaure, T.M., Luo, B., Grenier, J.K., et al. (2006). A lentiviral RNAi library for human and mouse genes applied to an arrayed viral high-content screen. Cell 124, 1283-1298. Montgomery, M.K., and Fire, A. (1998). Double-stranded RNA as a mediator in sequencespecific genetic silencing and co-suppression. Trends Genet 14, 255-258. Moore, M.J. (1999). 
Joining RNA molecules with T4 DNA ligase. Methods Mol Biol 118, 1119. Murphy, D., Dancis, B., and Brown, J.R. (2008). The evolution of core proteins involved in microRNA biogenesis. BMC Evol Biol 8, 92. Nakamura, T., Canaani, E., and Croce, C.M. (2007). Oncogenic All1 fusion proteins target Drosha-mediated microRNA processing. Proc Natl Acad Sci U S A 104, 10980-10985. Nam, J.W., Shin, K.R., Han, J., Lee, Y., Kim, V.N., and Zhang, B.T. (2005). Human microRNA prediction through a probabilistic co-learning model of sequence and structure. Nucleic Acids Res 33, 3570-3581. Nam, Y., Chen, C., Gregory, R.I., Chou, J.J., and Sliz, P. (2011). Molecular Basis for Interaction of let-7 MicroRNAs with Lin28. Cell. Napoli, C., Lemieux, C., and Jorgensen, R. (1990). Introduction of a Chimeric Chalcone Synthase Gene into Petunia Results in Reversible Co-Suppression of Homologous Genes in trans. Plant Cell 2, 279-289. Newman, M.A., Thomson, J.M., and Hammond, S.M. (2008). Lin-28 interaction with the Let-7 precursor loop mediates regulated microRNA processing. RNA 14, 1539-1549. Nishikura, K. (2010). Functions and regulation of RNA editing by ADAR deaminases. Annu Rev Biochem 79, 321-349. Nykanen, A., Haley, B., and Zamore, P.D. (2001). ATP requirements and small interfering RNA structure in the RNA interference pathway. Cell 107, 309-321. O'Connell, R.M., Rao, D.S., Chaudhuri, A.A., Boldin, M.P., Taganov, K.D., Nicoll, J., Paquette, R.L., and Baltimore, D. (2008). Sustained expression of microRNA-155 in hematopoietic stem cells causes a myeloproliferative disorder. J Exp Med 205, 585-594. Okada, C., Yamashita, E., Lee, S.J., Shibata, S., Katahira, J., Nakagawa, A., Yoneda, Y., and Tsukihara, T. (2009). A high-resolution structure of the pre-microRNA nuclear export machinery. Science 326, 1275-1279. Okamura, K., Chung, W.J., Ruby, J.G., Guo, H., Bartel, D.P., and Lai, E.C. (2008). The Drosophila hairpin RNA pathway generates endogenous short interfering RNAs. Nature 453, 803-806. 
Okamura, K., Hagen, J.W., Duan, H., Tyler, D.M., and Lai, E.C. (2007). The mirtron pathway generates microRNA-class regulatory RNAs in Drosophila. Cell 130, 89-100. 153 Okamura, K., Liu, N., and Lai, E.C. (2009). Distinct mechanisms for microRNA strand selection by Drosophila Argonautes. Mol Cell 36, 431-444. Orom, U.A., Derrien, T., Beringer, M., Gumireddy, K., Gardini, A., Bussotti, G., Lai, F., Zytnicki, M., Notredame, C., Huang, Q., et al. (2010). Long noncoding RNAs with enhancerlike function in human cells. Cell 143, 46-58. Paddison, P.J., Caudy, A.A., Bernstein, E., Hannon, G.J., and Conklin, D.S. (2002). Short hairpin RNAs (shRNAs) induce sequence-specific silencing in mammalian cells. Genes Dev 16, 948-958. Pan, T., and Uhlenbeck, O.C. (1992). In vitro selection of RNAs that undergo autolytic cleavage with Pb2+. Biochemistry 31, 3887-3895. Park, J.E., Heo, I., Tian, Y., Simanshu, D.K., Chang, H., Jee, D., Patel, D.J., and Kim, V.N. (2011). Dicer recognizes the 5' end of RNA for efficient and accurate processing. Nature 475, 201-205. Park, M.Y., Wu, G., Gonzalez-Sulser, A., Vaucheret, H., and Poethig, R.S. (2005). Nuclear processing and export of microRNAs in Arabidopsis. Proc Natl Acad Sci U S A 102, 36913696. Park, W., Li, J., Song, R., Messing, J., and Chen, X. (2002). CARPEL FACTORY, a Dicer homolog, and HEN1, a novel protein, act in microRNA metabolism in Arabidopsis thaliana. Curr Biol 12, 1484-1495. Pasquinelli, A.E., Reinhart, B.J., Slack, F., Martindale, M.Q., Kuroda, M.I., Maller, B., Hayward, D.C., Ball, E.E., Degnan, B., Muller, P., et al. (2000). Conservation of the sequence and temporal expression of let-7 heterochronic regulatory RNA. Nature 408, 86-89. Patel, V.L., Mitra, S., Harris, R., Buxbaum, A.R., Lionnet, T., Brenowitz, M., Girvin, M., Levy, M., Almo, S.C., Singer, R.H., et al. (2012). Spatial arrangement of an RNA zipcode identifies mRNAs under post-transcriptional control. Genes Dev 26, 43-53. 
Patwardhan, R.P., Hiatt, J.B., Witten, D.M., Kim, M.J., Smith, R.P., May, D., Lee, C., Andrie, J.M., Lee, S.I., Cooper, G.M., et al. (2012). Massively parallel functional dissection of mammalian enhancers in vivo. Nat Biotechnol 30, 265-270.
Pertzev, A.V., and Nicholson, A.W. (2006). Characterization of RNA sequence determinants and antideterminants of processing reactivity for a minimal substrate of Escherichia coli ribonuclease III. Nucleic Acids Res 34, 3708-3721.
Piskounova, E., Polytarchou, C., Thornton, J.E., LaPierre, R.J., Pothoulakis, C., Hagan, J.P., Iliopoulos, D., and Gregory, R.I. (2011). Lin28A and Lin28B inhibit let-7 microRNA biogenesis by distinct mechanisms. Cell 147, 1066-1079.
Piskounova, E., Viswanathan, S.R., Janas, M., LaPierre, R.J., Daley, G.Q., Sliz, P., and Gregory, R.I. (2008). Determinants of microRNA processing inhibition by the developmentally regulated RNA-binding protein Lin28. J Biol Chem 283, 21310-21314.
Pitman, J. (1993). Probability (New York, Springer-Verlag).
Pitt, J.N., and Ferre-D'Amare, A.R. (2010). Rapid construction of empirical RNA fitness landscapes. Science 330, 376-379.
Portier, C., Dondon, L., Grunberg-Manago, M., and Regnier, P. (1987). The first step in the functional inactivation of the Escherichia coli polynucleotide phosphorylase messenger is a ribonuclease III processing at the 5' end. EMBO J 6, 2165-2170.
Regnier, P., and Grunberg-Manago, M. (1989). Cleavage by RNase III in the transcripts of the met Y-nus-A-infB operon of Escherichia coli releases the tRNA and initiates the decay of the downstream mRNA. J Mol Biol 210, 293-302.
Regnier, P., and Portier, C. (1986). Initiation, attenuation and RNase III processing of transcripts from the Escherichia coli operon encoding ribosomal protein S15 and polynucleotide phosphorylase. J Mol Biol 187, 23-32.
Reinhart, B.J., Slack, F.J., Basson, M., Pasquinelli, A.E., Bettinger, J.C., Rougvie, A.E., Horvitz, H.R., and Ruvkun, G. (2000). The 21-nucleotide let-7 RNA regulates developmental timing in Caenorhabditis elegans. Nature 403, 901-906.
Reinhart, B.J., Weinstein, E.G., Rhoades, M.W., Bartel, B., and Bartel, D.P. (2002). MicroRNAs in plants. Genes Dev 16, 1616-1626.
Rinn, J.L., Kertesz, M., Wang, J.K., Squazzo, S.L., Xu, X., Brugmann, S.A., Goodnough, L.H., Helms, J.A., Farnham, P.J., Segal, E., et al. (2007). Functional demarcation of active and silent chromatin domains in human HOX loci by noncoding RNAs. Cell 129, 1311-1323.
Rivas, F.V., Tolia, N.H., Song, J.J., Aragon, J.P., Liu, J., Hannon, G.J., and Joshua-Tor, L. (2005). Purified Argonaute2 and an siRNA form recombinant human RISC. Nat Struct Mol Biol 12, 340-349.
Robertson, H.D. (1982). Escherichia coli ribonuclease III cleavage sites. Cell 30, 669-672.
Robertson, H.D., and Dunn, J.J. (1975). Ribonucleic acid processing activity of Escherichia coli ribonuclease III. J Biol Chem 250, 3050-3056.
Robertson, H.D., Webster, R.E., and Zinder, N.D. (1967). A nuclease specific for double-stranded RNA. Virology 32, 718-719.
Robertson, H.D., Webster, R.E., and Zinder, N.D. (1968). Purification and properties of ribonuclease III from Escherichia coli. J Biol Chem 243, 82-91.
Romano, N., and Macino, G. (1992). Quelling: transient inactivation of gene expression in Neurospora crassa by transformation with homologous sequences. Mol Microbiol 6, 3343-3353.
Ross, D.T., Scherf, U., Eisen, M.B., Perou, C.M., Rees, C., Spellman, P., Iyer, V., Jeffrey, S.S., Van de Rijn, M., Waltham, M., et al. (2000). Systematic variation in gene expression patterns in human cancer cell lines. Nat Genet 24, 227-235.
Rotondo, G., and Frendewey, D. (1996). Purification and characterization of the Pac1 ribonuclease of Schizosaccharomyces pombe. Nucleic Acids Res 24, 2377-2386.
Ruby, J.G., Jan, C., Player, C., Axtell, M.J., Lee, W., Nusbaum, C., Ge, H., and Bartel, D.P. (2006). Large-scale sequencing reveals 21U-RNAs and additional microRNAs and endogenous siRNAs in C. elegans. Cell 127, 1193-1207.
Ruby, J.G., Jan, C.H., and Bartel, D.P. (2007a). Intronic microRNA precursors that bypass Drosha processing. Nature 448, 83-86.
Ruby, J.G., Stark, A., Johnston, W.K., Kellis, M., Bartel, D.P., and Lai, E.C. (2007b). Evolution, biogenesis, expression, and target predictions of a substantially expanded set of Drosophila microRNAs. Genome Res 17, 1850-1864.
Ruvkun, G., Wightman, B., and Ha, I. (2004). The 20 years it took to recognize the importance of tiny RNAs. Cell 116, S93-96, 92 p following S96.
Saetrom, P., Heale, B.S., Snove, O., Jr., Aagaard, L., Alluin, J., and Rossi, J.J. (2007). Distance constraints between microRNA target sites dictate efficacy and cooperativity. Nucleic Acids Res 35, 2333-2342.
Saito, K., Ishizuka, A., Siomi, H., and Siomi, M.C. (2005). Processing of pre-microRNAs by the Dicer-1-Loquacious complex in Drosophila cells. PLoS Biol 3, e235.
Sakurai, K., Furukawa, C., Haraguchi, T., Inada, K., Shiogama, K., Tagawa, T., Fujita, S., Ueno, Y., Ogata, A., Ito, M., et al. (2011). MicroRNAs miR-199a-5p and -3p target the Brm subunit of SWI/SNF to generate a double-negative feedback loop in a variety of human cancers. Cancer Res 71, 1680-1689.
Schauer, S.E., Jacobsen, S.E., Meinke, D.W., and Ray, A. (2002). DICER-LIKE1: blind men and elephants in Arabidopsis development. Trends Plant Sci 7, 487-491.
Schurer, H., Lang, K., Schuster, J., and Morl, M. (2002). A universal method to produce in vitro transcripts with homogeneous 3' ends. Nucleic Acids Res 30, e56.
Schwarz, D.S., Hutvagner, G., Du, T., Xu, Z., Aronin, N., and Zamore, P.D. (2003). Asymmetry in the assembly of the RNAi enzyme complex. Cell 115, 199-208.
Seitz, H., Tushir, J.S., and Zamore, P.D. (2011). A 5'-uridine amplifies miRNA/miRNA* asymmetry in Drosophila by promoting RNA-induced silencing complex formation. Silence 2, 4.
Sempere, L.F., Cole, C.N., McPeek, M.A., and Peterson, K.J. (2006). The phylogenetic distribution of metazoan microRNAs: insights into evolutionary complexity and constraint. J Exp Zool B Mol Dev Evol 306, 575-588.
Sharp, P.A., and Burge, C.B. (1997). Classification of introns: U2-type or U12-type. Cell 91, 875-879.
Shi, Y., Wang, Y.F., Jayaraman, L., Yang, H., Massague, J., and Pavletich, N.P. (1998). Crystal structure of a Smad MH1 domain bound to DNA: insights on DNA binding in TGF-beta signaling. Cell 94, 585-594.
Shin, C., Nam, J.W., Farh, K.K., Chiang, H.R., Shkumatava, A., and Bartel, D.P. (2010). Expanding the microRNA targeting code: functional sites with centered pairing. Mol Cell 38, 789-802.
Slattery, M., Riley, T., Liu, P., Abe, N., Gomez-Alcala, P., Dror, I., Zhou, T., Rohs, R., Honig, B., Bussemaker, H.J., et al. (2011). Cofactor binding evokes latent differences in DNA binding specificity between Hox proteins. Cell 147, 1270-1282.
Sohn, S.Y., Bae, W.J., Kim, J.J., Yeom, K.H., Kim, V.N., and Cho, Y. (2007). Crystal structure of human DGCR8 core. Nat Struct Mol Biol 14, 847-853.
Sokilde, R., Kaczkowski, B., Podolska, A., Cirera, S., Gorodkin, J., Moller, S., and Litman, T. (2011). Global microRNA analysis of the NCI-60 cancer cell panel. Mol Cancer Ther 10, 375-384.
Song, L., Axtell, M.J., and Fedoroff, N.V. (2010). RNA secondary structural determinants of miRNA precursor processing in Arabidopsis. Curr Biol 20, 37-41.
Sontheimer, E.J. (1994). Site-specific RNA crosslinking with 4-thiouridine. Mol Biol Rep 20, 35-44.
Spingola, M., Grate, L., Haussler, D., and Ares, M., Jr. (1999). Genome-wide bioinformatic and molecular analysis of introns in Saccharomyces cerevisiae. RNA 5, 221-234.
Stark, A., Brennecke, J., Bushati, N., Russell, R.B., and Cohen, S.M. (2005). Animal MicroRNAs confer robustness to gene expression and have a significant impact on 3'UTR evolution. Cell 123, 1133-1146.
Stark, A., Kheradpour, P., Parts, L., Brennecke, J., Hodges, E., Hannon, G.J., and Kellis, M. (2007). Systematic discovery and characterization of fly microRNAs using 12 Drosophila genomes. Genome Res 17, 1865-1879.
Steiner, F.A., Hoogstrate, S.W., Okihara, K.L., Thijssen, K.L., Ketting, R.F., Plasterk, R.H., and Sijen, T. (2007). Structural features of small RNA precursors determine Argonaute loading in Caenorhabditis elegans. Nat Struct Mol Biol 14, 927-933.
Sun, G., Yan, J., Noltner, K., Feng, J., Li, H., Sarkis, D.A., Sommer, S.S., and Rossi, J.J. (2009). SNPs in human miRNA genes affect biogenesis and function. RNA 15, 1640-1651.
Suzuki, H.I., Yamagata, K., Sugimoto, K., Iwamoto, T., Kato, S., and Miyazono, K. (2009). Modulation of microRNA processing by p53. Nature 460, 529-533.
Swanson, W.J., and Vacquier, V.D. (2002). The rapid evolution of reproductive proteins. Nat Rev Genet 3, 137-144.
Tam, W., Ben-Yehuda, D., and Hayward, W.S. (1997). bic, a novel gene activated by proviral insertions in avian leukosis virus-induced lymphomas, is likely to function through its noncoding RNA. Mol Cell Biol 17, 1490-1502.
Tang, G., Reinhart, B.J., Bartel, D.P., and Zamore, P.D. (2003). A biochemical framework for RNA silencing in plants. Genes Dev 17, 49-63.
Tarn, W.Y., and Steitz, J.A. (1996). A novel spliceosome containing U11, U12, and U5 snRNPs excises a minor class (AT-AC) intron in vitro. Cell 84, 801-811.
Tavazoie, S.F., Alarcon, C., Oskarsson, T., Padua, D., Wang, Q., Bos, P.D., Gerald, W.L., and Massague, J. (2008). Endogenous human microRNAs that suppress breast cancer metastasis. Nature 451, 147-152.
Thai, T.H., Calado, D.P., Casola, S., Ansel, K.M., Xiao, C., Xue, Y., Murphy, A., Frendewey, D., Valenzuela, D., Kutok, J.L., et al. (2007). Regulation of the germinal center response by microRNA-155. Science 316, 604-608.
Tomari, Y., Du, T., and Zamore, P.D. (2007). Sorting of Drosophila small silencing RNAs. Cell 130, 299-308.
Trabucchi, M., Briata, P., Filipowicz, W., Ramos, A., Gherzi, R., and Rosenfeld, M.G. (2010). KSRP promotes the maturation of a group of miRNA precursors. Adv Exp Med Biol 700, 36-42.
Trabucchi, M., Briata, P., Garcia-Mayoral, M., Haase, A.D., Filipowicz, W., Ramos, A., Gherzi, R., and Rosenfeld, M.G. (2009). The RNA-binding protein KSRP promotes the biogenesis of a subset of microRNAs. Nature 459, 1010-1014.
Tsai, M.C., Manor, O., Wan, Y., Mosammaparast, N., Wang, J.K., Lan, F., Shi, Y., Segal, E., and Chang, H.Y. (2010). Long noncoding RNA as modular scaffold of histone modification complexes. Science 329, 689-693.
Tsutsumi, A., Kawamata, T., Izumi, N., Seitz, H., and Tomari, Y. (2011). Recognition of the pre-miRNA structure by Drosophila Dicer-1. Nat Struct Mol Biol 18, 1153-1158.
Tuschl, T., Zamore, P.D., Lehmann, R., Bartel, D.P., and Sharp, P.A. (1999). Targeted mRNA degradation by double-stranded RNA in vitro. Genes Dev 13, 3191-3197.
Ui-Tei, K., Naito, Y., Nishi, K., Juni, A., and Saigo, K. (2008). Thermodynamic stability and Watson-Crick base pairing in the seed duplex are major determinants of the efficiency of the siRNA-based off-target effect. Nucleic Acids Res 36, 7100-7109.
Ulitsky, I., Shkumatava, A., Jan, C.H., Sive, H., and Bartel, D.P. (2011). Conserved function of lincRNAs in vertebrate embryonic development despite rapid sequence evolution. Cell 147, 1537-1550.
van der Krol, A.R., Mur, L.A., Beld, M., Mol, J.N., and Stuitje, A.R. (1990). Flavonoid genes in petunia: addition of a limited number of gene copies may lead to a suppression of gene expression. Plant Cell 2, 291-299.
Vermeulen, A., Behlen, L., Reynolds, A., Wolfson, A., Marshall, W.S., Karpilow, J., and Khvorova, A. (2005). The contributions of dsRNA structure to Dicer specificity and efficiency. RNA 11, 674-682.
Viswanathan, S.R., and Daley, G.Q. (2010). Lin28: A microRNA regulator with a macro role. Cell 140, 445-449.
Viswanathan, S.R., Daley, G.Q., and Gregory, R.I. (2008). Selective blockade of microRNA processing by Lin28. Science 320, 97-100.
Viswanathan, S.R., Powers, J.T., Einhorn, W., Hoshida, Y., Ng, T.L., Toffanin, S., O'Sullivan, M., Lu, J., Phillips, L.A., Lockhart, V.L., et al. (2009). Lin28 promotes transformation and is associated with advanced human malignancies. Nat Genet 41, 843-848.
Vuppalanchi, D., Coleman, J., Yoo, S., Merianda, T.T., Yadhati, A.G., Hossain, J., Blesch, A., Willis, D.E., and Twiss, J.L. (2010). Conserved 3'-untranslated region sequences direct subcellular localization of chaperone protein mRNAs in neurons. J Biol Chem 285, 18025-18038.
Wang, D., Zhang, Z., O'Loughlin, E., Lee, T., Houel, S., O'Carroll, D., Tarakhovsky, A., Ahn, N.G., and Yi, R. (2012). Quantitative functions of Argonaute proteins in mammalian development. Genes Dev 26, 693-704.
Wang, K.C., and Chang, H.Y. (2011). Molecular mechanisms of long noncoding RNAs. Mol Cell 43, 904-914.
Wang, S.L., Yao, H.H., and Qin, Z.H. (2009). Strategies for short hairpin RNA delivery in cancer gene therapy. Expert Opin Biol Ther 9, 1357-1368.
Wang, Y., Medvid, R., Melton, C., Jaenisch, R., and Blelloch, R. (2007). DGCR8 is essential for microRNA biogenesis and silencing of embryonic stem cell self-renewal. Nat Genet 39, 380-385.
Warf, M.B., Johnson, W.E., and Bass, B.L. (2011). Improved annotation of C. elegans microRNAs by deep sequencing reveals structures associated with processing by Drosha and Dicer. RNA 17, 563-577.
Weinberg, D.E., Nakanishi, K., Patel, D.J., and Bartel, D.P. (2011). The inside-out mechanism of Dicers from budding yeasts. Cell 146, 262-276.
Wightman, B., Ha, I., and Ruvkun, G. (1993). Posttranscriptional regulation of the heterochronic gene lin-14 by lin-4 mediates temporal pattern formation in C. elegans. Cell 75, 855-862.
Witten, D., Tibshirani, R., Gu, S.G., Fire, A., and Lui, W.O. (2010). Ultra-high throughput sequencing-based small RNA discovery and discrete statistical biomarker analysis in a collection of cervical tumours and matched controls. BMC Biol 8, 58.
Wu, H., Henras, A., Chanfreau, G., and Feigon, J. (2004). Structural basis for recognition of the AGNN tetraloop RNA fold by the double-stranded RNA-binding domain of Rnt1p RNase III. Proc Natl Acad Sci U S A 101, 8307-8312.
Wu, H., Xu, H., Miraglia, L.J., and Crooke, S.T. (2000). Human RNase III is a 160-kDa protein involved in preribosomal RNA processing. J Biol Chem 275, 36957-36965.
Wu, H., Yang, P.K., Butcher, S.E., Kang, S., Chanfreau, G., and Feigon, J. (2001). A novel family of RNA tetraloop structure forms the recognition site for Saccharomyces cerevisiae RNase III. EMBO J 20, 7240-7249.
Wu, M., Jolicoeur, N., Li, Z., Zhang, L., Fortin, Y., L'Abbe, D., Yu, Z., and Shen, S.H. (2008). Genetic variations of microRNAs in human cancer and their effects on the expression of miRNAs. Carcinogenesis 29, 1710-1716.
Wyatt, J.R., Sontheimer, E.J., and Steitz, J.A. (1992). Site-specific cross-linking of mammalian U5 snRNP to the 5' splice site before the first step of pre-mRNA splicing. Genes Dev 6, 2542-2553.
Yanaihara, N., Caplen, N., Bowman, E., Seike, M., Kumamoto, K., Yi, M., Stephens, R.M., Okamoto, A., Yokota, J., Tanaka, T., et al. (2006). Unique microRNA molecular profiles in lung cancer diagnosis and prognosis. Cancer Cell 9, 189-198.
Yang, W., Chendrimada, T.P., Wang, Q., Higuchi, M., Seeburg, P.H., Shiekhattar, R., and Nishikura, K. (2006). Modulation of microRNA processing and expression through RNA editing by ADAR deaminases. Nat Struct Mol Biol 13, 13-21.
Yekta, S., Shih, I.H., and Bartel, D.P. (2004). MicroRNA-directed cleavage of HOXB8 mRNA. Science 304, 594-596.
Yeom, K.H., Lee, Y., Han, J., Suh, M.R., and Kim, V.N. (2006). Characterization of DGCR8/Pasha, the essential cofactor for Drosha in primary miRNA processing. Nucleic Acids Res 34, 4622-4629.
Yi, R., Qin, Y., Macara, I.G., and Cullen, B.R. (2003). Exportin-5 mediates the nuclear export of pre-microRNAs and short hairpin RNAs. Genes Dev 17, 3011-3016.
Yoda, M., Kawamata, T., Paroo, Z., Ye, X., Iwasaki, S., Liu, Q., and Tomari, Y. (2010). ATP-dependent human RISC assembly pathways. Nat Struct Mol Biol 17, 17-23.
Young, R.A., and Steitz, J.A. (1978). Complementary sequences 1700 nucleotides apart form a ribonuclease III cleavage site in Escherichia coli ribosomal precursor RNA. Proc Natl Acad Sci U S A 75, 3593-3597.
Zamore, P.D., Tuschl, T., Sharp, P.A., and Bartel, D.P. (2000). RNAi: double-stranded RNA directs the ATP-dependent cleavage of mRNA at 21 to 23 nucleotide intervals. Cell 101, 25-33.
Zeng, Y., and Cullen, B.R. (2005). Efficient processing of primary microRNA hairpins by Drosha requires flanking nonstructured RNA sequences. J Biol Chem 280, 27595-27603.
Zeng, Y., Yi, R., and Cullen, B.R. (2005). Recognition and cleavage of primary microRNA precursors by the nuclear processing enzyme Drosha. EMBO J 24, 138-148.
Zhang, B., Pan, X., Cobb, G.P., and Anderson, T.A. (2007). microRNAs as oncogenes and tumor suppressors. Dev Biol 302, 1-12.
Zhang, H., Kolb, F.A., Brondani, V., Billy, E., and Filipowicz, W. (2002). Human Dicer preferentially cleaves dsRNAs at their termini without a requirement for ATP. EMBO J 21, 5875-5885.
Zhang, H., Kolb, F.A., Jaskiewicz, L., Westhof, E., and Filipowicz, W. (2004). Single processing center models for human Dicer and bacterial RNase III. Cell 118, 57-68.
Zhang, K., and Nicholson, A.W. (1997). Regulation of ribonuclease III processing by double-helical sequence antideterminants. Proc Natl Acad Sci U S A 94, 13437-13441.
Zhang, X., and Zeng, Y. (2010). The terminal loop region controls microRNA processing by Drosha and Dicer. Nucleic Acids Res 38, 7689-7697.
Zhou, R., Czech, B., Brennecke, J., Sachidanandam, R., Wohlschlegel, J.A., Perrimon, N., and Hannon, G.J. (2009). Processing of Drosophila endo-siRNAs depends on a specific Loquacious isoform. RNA 15, 1886-1895.
Zykovich, A., Korf, I., and Segal, D.J. (2009).
Bind-n-Seq: high-throughput analysis of in vitro protein-DNA interactions using massively parallel sequencing. Nucleic Acids Res 37, e151.

Appendix 1. Experimental protocols

Contents

In Vitro Cleavage Selection for Functional pri-miRNAs
    Preparation of Drosha/DGCR8 Overexpression Lysate
        Overview
        Cell culture and transfection
        Lysate preparation
        Assay for microprocessor activity
    In vitro cleavage selection of circular pri-miRNA substrates
        Overview
        Assembly of partially-randomized, circular pri-miRNA substrate
        Selection of functional variants from circular pri-miRNA pool
        Library preparation for Illumina high-throughput sequencing
    In vitro cleavage selection of linear pri-miRNA substrates
        Overview
        Assembly of linear, partially-randomized, pri-miRNA substrate
        Selection of functional variants from linear pri-miRNA pool
        Library preparation for Illumina high-throughput sequencing
In Vitro Binding Selection for Functional pri-miRNAs
    Preparation of immunopurified Drosha-TN/DGCR8
        Overview
        Cell culture and transfection
        Lysate preparation
        Immunoprecipitation with anti-FLAG beads
    In vitro binding selection of linear pri-miRNA substrates
        Overview
        Assembly of linear, partially-randomized, pri-miRNA substrate
        Selection of functional variants from linear pri-miRNA pool
        PCR amplification for downstream applications
Identification of Motif-Binding Proteins by Site-Specific Crosslinking
    Overview
    Assembly of 4-thiouridine containing RNA substrate
    RNA-protein crosslinking and complex analysis
        Optimization of 365 nm UV dose
        Large-scale purification of crosslinked complexes for analysis
        Candidate protein testing by immunoprecipitation of crosslinked complexes
Supplement 1: Standard Protocols
    Crush-and-soak technique for acrylamide gel extraction
    T7 transcription (“Midi” scale)
    Phosphorylation of nucleic acid 5′ ends
        Cold
        Hot
    Dephosphorylation of nucleic acid 5′ ends
    Dephosphorylation of 2′-3′ cyclic phosphates and 3′ phosphates
    Splint ligation of RNA
        Using T4 RNA Ligase 2 (RNL2)
    Partial hydrolysis of RNA
Supplement 2: Standard Reagents
    Buffers
    Commercial products
In Vitro Cleavage Selection for Functional pri-miRNAs

Preparation of Drosha/DGCR8 Overexpression Lysate

Overview

293T cells are transiently transfected with plasmids encoding C-terminal FLAG-tagged human Drosha and N-terminal FLAG-HA-tagged human DGCR8. Cells are harvested and sonicated. After precipitates and insoluble material are cleared by centrifugation, the whole-cell lysate is stored in single-use aliquots. This protocol is modified from Lee and Kim, Meth. Enz. 427 (2007), with additional input from Jinju Han and V. Narry Kim.

Plasmids:
• pMAX-GFP (transfection marker)
• pFLAG-HA-DGCR8 (courtesy of T. Tuschl)
• pCK-Drosha-FLAG (courtesy of V. Narry Kim)

Cell culture and transfection

Maintain 293T cells under standard conditions. We use Dulbecco’s Modified Eagle Medium (DMEM) supplemented with 10% Fetal Bovine Serum.

To transfect cells, start with three 15-cm dishes at 90-100% confluence. Split all of these cells into eight 15-cm dishes the day before transfection. Use 15 ml media per plate.

The day after splitting (18-24 hours later), begin transfection. Assemble Lipofectamine dilution in one 50 ml Falcon tube:

720 µl    Lipofectamine 2000
24 ml     Opti-MEM I Serum-Free Media

Incubate at room temperature for 5 min. Meanwhile, assemble plasmid mixture in another 50 ml Falcon tube:

24 µg     pMAX-GFP
60 µg     pCK-Drosha-FLAG
60 µg     pFLAG-HA-DGCR8
24 ml     Opti-MEM I Serum-Free Media

Add plasmid mixture to Lipofectamine dilution, and gently invert twice to mix. Allow complexes to form by incubating at room temperature for 20-30 min.

After complexes have formed, add 6 ml of mixture dropwise to each 15-cm dish. Rock plate gently 1-2 times to ensure even dispersal of transfection complexes.

12-24 hours after transfection, check for expression of GFP. At this relatively early point, 50%-75% of cells will already be expressing GFP. Split all the transfected cells into twenty 15-cm dishes. Allow cells to grow for 48 hr. After this period, cells should be at or just past confluence.
Frequently, with a successful transfection containing pMAX-GFP, the monolayer is faintly green in room lighting. Cells are now ready for harvesting.

Lysate preparation

Harvest overexpressing cells by removing media and pipetting PBS onto the monolayer. Due to the poor adherence of 293T cells, this is typically enough to dislodge the monolayer. Collect PBS suspension of cells, and keep on ice.

Pellet PBS suspension by centrifugation. This pellet should be green from GFP in the cells; there should be little or no visible fluorescence in the supernatant.

Prepare 1X Sonic Buffer + protease inhibitors by dissolving a Mini EDTA-Free Protease Inhibitor tablet in 10 ml of 1X Sonic Buffer. Keep on ice.

Resuspend cell pellet in 10 ml of 1X Sonic Buffer + protease inhibitors.

Prepare sonicator for use. We use a Branson Sonifier 250 at 50% duty cycle and output level 4. Clean the probe by sonicating RNase-ZAP solution for 10 pulses (approximately 20 seconds at 50% duty cycle), then sonicating deionized water for 10 pulses.

Lyse 293T cells by sonicating for 10 pulses.

Clear the lysate once by centrifuging at 3500 x g for 15 min. If lysis is successful, the pellet should be yellow and perhaps light-green, while the supernatant is green. Collect supernatant.

Clear supernatant once more by centrifuging at 3500 x g for 15 min. Collect supernatant. This is the cleared whole-cell extract.

Aliquot whole-cell extract into single-use aliquots and store aliquots in liquid nitrogen vapor.

Assay for microprocessor activity

We assess Microprocessor activity by cleavage of a model miRNA substrate, pri-mir-125a. Cleavage of this substrate appears to be essentially complete at 10 minutes, and shows features consistent with a single-turnover reaction. Accordingly, we use this substrate to estimate a functional concentration of Microprocessor complex.

Synthesis of pri-mir-125a reference substrate

Amplify a runoff template for T7 transcription by PCR using standard protocols.
Human genomic DNA or a mir-125a expression plasmid can be used as a template for this reaction.

T7::mir-125a Forward Primer:
TAATACGACTCACTATAggTCTCTGACCCCCACCCCAGGG

mir-125a Reverse Primer:
ATGAGGAGTCAGGGGTCAGAAGTCAGGCCAGC

See Appendix I for a T7 transcription protocol using the PCR product as a template. The expected product is below (underlined region is the pre-miRNA product).

Pri-mir-125a reference (187 nt)
ggUCUCUGACCCCCACCCCAGGGUCUACCGGGCCACCGCACACCAUGUUGCCAGUCUCUAGGUCC
CUGAGACCCUUUAACCUGUGAGGACAUCCAGGGUCACAGGUGAGGUUCUUGGGAGCCUGGCGUCU
GGCCCAACCACACACCUGGGGAAUUGCUGGCCUGACUUCUGACCCCUGACUCCUCAU

After purification on urea-polyacrylamide, dephosphorylate the 5′-triphosphorylated substrate RNA using CIP, according to the standard protocol in Appendix I. After the reaction is complete, phenol extract the reaction and ethanol precipitate the product; resuspend in deionized water.

5′-label the substrate using γ-[32P]-ATP and T4 Polynucleotide Kinase, as described in Appendix I. After the reaction is complete, run over a size-exclusion column such as G-25 or P-30 to remove unincorporated ATP. Purify again on urea-polyacrylamide.

After resuspension, measure concentration of hot substrate or dilute into a known concentration of cold substrate. We typically maintain substrates at 1 µM or greater to minimize concentration changes due to tube wall adsorption.

Titration-timecourse assay for microprocessor activity

Set up a titration of the whole-cell lysate for estimating the effective concentration of Microprocessor complex:

                                    1% Rxn    5% Rxn    10% Rxn   20% Rxn   50% Rxn
10X Cleavage Buffer (5 mM Mg)       10 µl     10 µl     10 µl     10 µl     10 µl
Microprocessor Lysate               1 µl      5 µl      10 µl     20 µl     50 µl
1X Sonic Buffer                     79 µl     75 µl     70 µl     60 µl     30 µl
1 µM pri-mir-125a Ref. Substrate    10 µl     10 µl     10 µl     10 µl     10 µl
Total                               100 µl    100 µl    100 µl    100 µl    100 µl

Before adding substrate, prewarm the reaction mixture to 37°C.
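The volumes in this titration follow a single rule: buffer and substrate are fixed at 10 µl each, lysate is set by the desired percentage, and 1X Sonic Buffer makes up the remainder of the 100 µl reaction. The following is a minimal sketch of that bookkeeping, assuming Sonic Buffer is simply the filler; the function name `titration_volumes` and its defaults are illustrative and not part of the original protocol.

```python
def titration_volumes(lysate_percent, total_ul=100.0, buffer_ul=10.0, substrate_ul=10.0):
    """Return component volumes (in µl) for one titration reaction.

    Assumption: 1X Sonic Buffer is the filler that brings the reaction to
    total_ul, and lysate_percent is lysate volume as a percent of total_ul.
    """
    lysate_ul = total_ul * lysate_percent / 100.0
    sonic_ul = total_ul - buffer_ul - substrate_ul - lysate_ul
    if lysate_ul < 0 or sonic_ul < 0:
        raise ValueError("lysate percentage does not fit in the reaction volume")
    return {
        "10X Cleavage Buffer": buffer_ul,
        "Microprocessor Lysate": lysate_ul,
        "1X Sonic Buffer": sonic_ul,
        "1 uM pri-mir-125a substrate": substrate_ul,
    }


# Print one line per reaction in the titration series.
for pct in (1, 5, 10, 20, 50):
    vols = titration_volumes(pct)
    row = ", ".join(f"{name}: {ul:g} ul" for name, ul in vols.items())
    print(f"{pct}% Rxn -> {row}")
```

With 10 µl of 1 µM substrate in 100 µl, every reaction starts at 100 nM substrate regardless of the lysate percentage.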
Be sure to add the substrate last, as addition of the substrate marks timepoint “0.” After addition of substrate, withdraw 20 µl of the reaction at 0 min, 1 min, 3 min, 9 min, and 27 min. Each 20 µl timepoint should be immediately mixed with at least 100 µl ice-cold Tri-Reagent to stop the reaction.

After extraction from Tri-Reagent and precipitation, load and run a 10% urea-polyacrylamide gel. Labeled Decade markers can be used as a size ladder. The expected product size is 61 nt.

Measure the fraction of substrate cleaved at each time point to estimate the concentration of active enzyme. Results from a representative experiment are shown below. Since the initial substrate concentration was 100 nM, the Microprocessor appears to cleave hsa-mir-125a with burst kinetics. We have used this behavior to estimate the concentration of Microprocessor at about 75 nM.

[Figure: product formed (nM) versus time (min) for the 1%, 5%, 10%, 20%, and 50% lysate reactions.]

In vitro cleavage selection of circular pri-miRNA substrates

Overview

The goal of this selection is to explore Microprocessor cleavage determinants in sequences flanking the pre-miRNA hairpin. The overall strategy is to generate a large pool of variant molecules and subject this pool to Microprocessor cleavage. Products of cleavage are recovered and sequenced by high-throughput sequencing to detect motifs that are enriched in the product (Microprocessor-cleaved) population, relative to the original pool.

A major hindrance to this strategy is that after cleavage of linear pri-miRNA substrates, there are two flanking RNA products, a 5p (upstream) product, and a 3p (downstream) product. We would like to clone and sequence both products, maintaining the relationship between upstream fragment and downstream fragment for any given partially-randomized RNA molecule.
For example, the basal stem of a pri-miRNA hairpin involves pairing between bases in the 5p product and complementary bases in the 3p product. To detect evidence of covariation, we must recover and sequence both bases from the 5p and 3p product of the same substrate molecule. We have accomplished this by circularizing pri-miRNA substrates, and isolating linearized products for high-throughput sequencing.

Assembly of partially-randomized, circular pri-miRNA substrate

The overall strategy is summarized in the following diagram. A T7 template corresponding to the desired RNA is assembled by PCR, then transcribed. An HDV ribozyme is included in the sequence to homogenize the 3′ ends; self-cleavage occurs in the transcription reaction. After gel purification, the 5′ and 3′ ends are treated to produce ligatable ends. The circularization is achieved using T4 RNA Ligase 1.

Template assembly by PCR

Templates for T7 transcription are assembled from synthetic oligonucleotides. IDT will synthesize long oligos with both constant sequence and partially randomized positions (for a price). In addition to the pri-miRNA sequence (both constant and partially-randomized), sequence corresponding to the HDV ribozyme is also added to the template amplicon.

By way of example, oligonucleotide sequences are provided for the constant (unrandomized) C125circ template. However, the clonal pool (composed of many copies of the wildtype miRNA sequence) should be prepared in parallel with the partially randomized pool (composed of a few copies of many variant sequences).
C125circ.001a Left arm
GACTCACTATAGGGTCACAGGTGAGGTTCTTGGGAGCCTGGCGTCTGGCCCAACCACACACCTGGGGAATT
GCTGGCCTGACTTCTGACCCCTGACTCCT

C125circ.002a Right arm (ordering orientation)
TCCTCACAGGTTAAAGGGTCTCAGGGACCTAGAGACTGGCAACATGGTGTGCGGTGGCCCGGTAGACCCTG
GGGTGGGGGTATGAGGAGTCAGGGGTCAG

C125circ.003 HDV adaptor
CTTCTCCCTTAGCCTACCGAAGTAGCCCAGGTCGGACCGCGAGGAGGTGGAGATGCCATGCCGACCCTGGA
TGTCCTCACAGGTTAAAGG

C125circ.004 T7 Adaptor
CAGAGATGCATAATACGACTCACTATAGGGTCACAG

HDV 5′ Polishing Primer
CTTCTCCCTTAGCCTACCGAAGTAGCCCAGG

First, PAGE-purify long primers. In general, primers longer than 60 nt are difficult even for commercial operations to synthesize, and should be purified before use. Resuspend purified oligos to 100 µM where possible.

An initial primer extension between the left and right primers is used to generate a small amount of template for PCR. Assemble the following reaction:

15 µl     Left Arm oligo (100 µM)
15 µl     Right Arm oligo (100 µM)
270 µl    H2O
40 µl     10X “House” PCR Buffer
40 µl     10X “House” dNTP Mix
20 µl     “House” Taq
400 µl    Total

Perform a primer extension using the program “L/R Ext.”

30 sec    95°C
(ramp)    Ramp to 37°C at 0.1°C/sec
1 min     37°C
1 min     72°C
          Repeat for 5 total cycles
(hold)    10°C

Without purifying or concentrating the primer extension, assemble the following reaction:

30 µl     Primer extension reaction
355 µl    H2O
5 µl      C125circ.003 HDV Adaptor (100 µM)
5 µl      C125circ.004 T7 Adaptor (100 µM)
50 µl     10X “House” PCR Buffer
50 µl     10X “House” dNTP Mix
5 µl      “House” Taq
500 µl    Total

Perform an initial PCR amplification using the program “L/R PCR.” Note that the annealing temperature is based on the amount of homology between the T7 adaptor and the left arm oligo, and the amount of homology between the HDV adaptor and the right arm oligo.

30 sec    95°C
30 sec    95°C
30 sec    45°C or appropriate annealing temp.
30 sec    72°C
          Repeat for 4 total cycles
(hold)    10°C

Perform a second PCR amplification to “polish” the template for use.
Frequently, the long HDV oligo has internal lesions that prevent T7 RNA polymerase from proceeding through the template. Incomplete transcripts will not have intact HDV ribozyme sequences, and will not self-cleave to the correct size. A few additional rounds of PCR will selectively amplify molecules that have fewer lesions.

50 µl     Initial PCR reaction
335 µl    H2O
5 µl      HDV Polish Primer (100 µM)
5 µl      C125circ.004 T7 Adaptor (100 µM)
50 µl     10X “House” PCR Buffer
50 µl     10X “House” dNTP Mix
5 µl      “House” Taq
500 µl    Total

Repeat L/R PCR program. The annealing temperature may be raised based on the melting temperatures of the HDV Polish Primer and the T7 Adaptor.

Ethanol precipitate the reaction and resuspend in 50 µl H2O.

Transcription of linear pri-miRNA pool

Use T7 RNA Polymerase to transcribe the template prepared in the previous step. A 200 µl reaction scale should be sufficient for most applications, but more is always possible.

25 µl     Concentrated template pool
85 µl     H2O
20 µl     10X “House” T7 Buffer
40 µl     5X “House” NTP Mix
10 µl     DTT (100 mM)
10 µl     Fresh α-[32P]-UTP
10 µl     “House” T7 RNA Polymerase
200 µl    Total

Incubate 37°C x 2-3 hr
Add 5 µl Turbo DNase (Ambion)
Incubate 37°C x 30 min
Add 1/20 volume 500 mM EDTA
Add 1/10 volume 3M NaCl
Add 1 volume 100% ethanol

Final:
T7 reaction mixture with whatever NTP/Salt/Mg/PPi is present
~25 mM EDTA
~75 mM NaCl (supplemental)
50% ethanol

Incubate at least 15 min at -20°C
Spin 15 min at 4°C

After precipitation, the reaction should be purified on a 5% urea-polyacrylamide gel with an appropriate size marker. The expected products are (1) a full-length transcript that has not been self-cleaved; (2) the mature, self-cleaved transcript; and (3) the HDV ribozyme RNA after liberation from the rest of the transcript.

Cut out and elute the mature, self-cleaved transcript. This is the linear pri-miRNA pool.

For the C125circ primers given above, the expected product sequence is given below.
The S125circ partially randomized transcript was identical to this sequence, except that positions colored red were partially randomized at 79% wildtype (indicated base) and 7% each of the other three bases. These positions comprise the basal miRNA stem and 35-45 nt of flanking sequence.

ppp-5′GGGTCACAGGTGAGGTTCTTGGGAGCCTGGCGTCTGGCCCAACCACACACCTGGGGAATTGCTGG
CCTGACTTCTGACCCCTGACTCCTCATACCCCCACCCCAGGGTCTACCGGGCCACCGCACACCAT
GTTGCCAGTCTCTAGGTCCCTGAGACCCTTTAACCTGTGAGGACATCCA-[2′-3′ cyclic phosphate]

Treatment of transcript ends for ligation

After transcription and self-cleavage, the pool molecules have a 5′ triphosphate and a 2′-3′ cyclic phosphate. The goal of this phase is to convert these ends to a 5′ monophosphate and a 3′-OH, which are the substrates of T4 RNA Ligase 1.

First, the triphosphate is removed using Calf Intestinal Phosphatase or Alkaline Phosphatase (CIP). Assemble the following reaction. It is often convenient to reconstitute the pellet from gel extraction directly in the appropriate amount of water.

(pellet)   Linear pri-miRNA pool from previous
50 µl      H2O
6 µl       NEBuffer 3
4 µl       CIP
60 µl      Total

Incubate at 37°C x 1 hr

Phenol-extract the reaction and precipitate. It is often convenient to directly resuspend the pellet in the next reaction’s buffer.

Next, the 2′-3′ cyclic phosphate is removed using T4 Polynucleotide Kinase, which has this activity in addition to its kinase activity, in keeping with its role in repairing cleaved tRNA anticodon loops. Assemble the following:

(pellet)   First dephosphorylation from above
50 µl      H2O
120 µl     1.5X MES Dephosphorylation Buffer
10 µl      T4 PNK
180 µl     Total

Incubate at 37°C x 6 hr

Since PNK is also used to 5′ phosphorylate the RNA, it is not necessary to phenol extract. Instead, ethanol precipitate the reaction and resuspend it in the phosphorylation reaction mix. Note: T4 DNA Ligase buffer from NEB is equivalent to T4 PNK Buffer supplemented with 1 mM ATP (final). If using PNK buffer, supplement the reaction to 1 mM ATP (final).
(pellet)   Dephosphorylated pri-miRNA pool
170 µl     H2O
20 µl      T4 DNA Ligase Buffer
10 µl      T4 PNK
200 µl     Total

Incubate at 37°C x 1 hr

Phenol-extract the reaction and precipitate. Resuspend in 340 µl H2O. Half will be used directly for circularization.

Circularization of pri-miRNA pool using T4 RNA Ligase 1

Consistent with its role in repairing cleaved tRNA anticodon loops, T4 RNA Ligase 1 is reasonably good at ligating single-stranded RNA ends that are held close to each other by double-stranded RNA. This is the design of the circularization strategy. Assemble the following reaction:

170 µl   Phosphorylated pri-miRNA pool
         Heat to 85°C x 1 min and cool to room temperature
20 µl    T4 RNA Ligase 1 Buffer
10 µl    T4 RNA Ligase 1
200 µl   Total

Incubate at 37°C x 2 hr

Purify the ligation product on a 5% urea-polyacrylamide gel along with an appropriate size ladder. It is frequently convenient to include a small amount of the phosphorylated pri-miRNA pool as a reference. The circularized RNA runs slightly higher than its linearized form. The apparent linear RNA from the ligation lane is a mixture of unligated RNA and nicked circular RNA. Based on experiments using hot 5′ phosphorylated RNA, it appears that much of the apparently linear RNA is actually nicked circularized RNA, suggesting that the ligation reaction is relatively efficient, but that hydrolysis occurs during the reaction for reasons that are unclear. The nicked RNA does not appear to accumulate with time, indicating that its occurrence is not due to contaminating activity.

If desired, the circular nature of the substrate can be verified by two methods. First, cleave the substrate with the Microprocessor to verify that two products form, corresponding to the pre-miRNA and a single molecule with the flanking sequence. Second, perform a partial hydrolysis time course from 30 sec to 10 min, and compare the products with the uncut circular substrate and the phosphorylated, unligated, linear substrate.
At early time points, hydrolysis of the circular substrate should produce a product at the size of the unligated linear substrate. Further hydrolysis will produce a smear of degradation products.

Selection of functional variants from circular pri-miRNA pool

Cleavage timecourse of clonal wildtype pool and partially-randomized pool

This experiment serves three purposes: (1) verification that the circularized pri-miRNA substrates are cleavage-competent circular molecules, (2) estimation of the overall contribution of flanking RNA sequence by comparing the cleavage of the partially-randomized pool to the cleavage of wildtype pri-miRNA, and (3) optimization of cleavage timing for the actual selection.

Assemble the following reaction, assuming 1 µM stock solutions of RNA and 50 nM Microprocessor complex in whole-cell lysate. As concentrations from different substrate and protein preps vary, volumes should be adjusted to match these final concentrations.

39 µl    1X Sonic Buffer
50 µl    Microprocessor Lysate
10 µl    10X Cleavage/Binding Buffer (1 mM Mg)
1 µl     Circularized clonal or randomized pool (1 µM)
100 µl   Total

Prior to addition of the RNA substrate, the reaction mixture should be prewarmed at 37°C. After addition of substrate, withdraw 10 µl at the following times: 0 min, 20 sec, 40 sec, 1 min, 2 min, 4 min, 6 min, 8 min, 10 min, and 12 min. Each timepoint should be mixed immediately with 100 µl of ice-cold Tri-Reagent to stop the reaction.

After extraction from Tri-Reagent, run the reactions on an 8% urea-polyacrylamide gel and estimate the amount of cleavage at each timepoint for both the clonal (wildtype) substrate and the partially-randomized pool. For miRNAs where the flanking sequence contributes to recognition and cleavage, the clonal (wildtype) substrate is cleaved 4-5 fold more rapidly than the partially randomized pool. For miRNAs studied so far, this ratio appears to be constant for most early time points (between 20 sec and 6 min).
When this is the case, we have selected a timepoint where ~1% of the partially randomized pool is cleaved, and used this timepoint for selection. In some situations, to achieve 1% cleavage it was necessary to adjust the ratio between Microprocessor complex and partially-randomized pool substrate.

Selection of functional variants from partially-randomized circular pri-miRNA pool

Using conditions optimized in the previous experiment, perform a selection for functional, Microprocessor-cleaved variants in the partially-randomized circular pri-miRNA pool. The reaction should be scaled such that ~100 fmol of cleaved product can be obtained. This corresponds to a selection for 6 x 10^10 product molecules. After accounting for yield losses in library preparation, ~10^8 to 10^9 molecules will be available for sequencing; as of early 2012, a single lane of Illumina HiSeq is capable of producing upwards of 10^8 sequences. Thus, this scale is the minimum scale needed to fully leverage the current sequencing capacity.

A target of 100 fmol product requires a moderately large cleavage reaction. For example, for cleavage conditions using a (typical) final reaction concentration of 10 nM pri-miRNA pool, a 1000 µl cleavage reaction will use a total of 10,000 fmol substrate. Cleavage of 1% of this amount yields 100 fmol of product for library preparation for high-throughput sequencing.

After performing the selection reaction under optimized conditions, phenol-extract the reaction and precipitate it. The precipitate should be resuspended in the minimum amount of water needed to fully dissolve the pellet, and the resulting nucleic acid should be separated on a 5% urea-polyacrylamide gel. Cut out the band corresponding to the flanking, partially-randomized RNA. For the S125circ example given above, the product is expected to be 119 nt. Since yield is valuable, it is advisable to use the crush-and-soak technique to purify this band (see Standard Protocols). Resuspend in 20 µl water.
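The scale arithmetic in this section can be sketched in a few lines of Python. This is only an illustrative calculation, not part of the protocol: the function names are hypothetical, and the inputs in the example (10 nM pool, 1000 µl reaction, 1% cleavage, 100 fmol target) are the values quoted in the text.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def substrate_fmol(conc_nM, volume_ul):
    """Total substrate in a reaction; 1 nM x 1 µl = 1 fmol."""
    return conc_nM * volume_ul


def product_fmol(conc_nM, volume_ul, fraction_cleaved):
    """Cleaved product (fmol) if a given fraction of the pool is cleaved."""
    return substrate_fmol(conc_nM, volume_ul) * fraction_cleaved


def fmol_to_molecules(fmol):
    """Convert femtomoles to a number of molecules."""
    return fmol * 1e-15 * AVOGADRO


def required_volume_ul(target_fmol, conc_nM, fraction_cleaved):
    """Reaction volume (µl) needed to reach a target amount of product."""
    return target_fmol / (conc_nM * fraction_cleaved)


if __name__ == "__main__":
    # Example values taken from the text above.
    print("substrate (fmol):", substrate_fmol(10, 1000))
    print("product (fmol):", product_fmol(10, 1000, 0.01))
    print("product molecules: %.1e" % fmol_to_molecules(100))
    print("volume for 100 fmol (µl):", required_volume_ul(100, 10, 0.01))
```

If the optimized timepoint gives a cleaved fraction other than 1%, or the pool concentration is changed, `required_volume_ul` gives the reaction volume needed to keep the same product target.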
Library preparation for Illumina high-throughput sequencing

Library preparation from the selected pool

The goal of this phase is to ligate adaptors to the ends of the product sequence; these adaptors will permit amplification of the variant molecules, and contain some necessary sequence for the current (Feb. 2012) Illumina paired-end sequencing technology.

We use splint ligation to assist in ligation of adaptors to the correct arm. Because both T4 RNA Ligase 2 and T4 DNA Ligase are highly sensitive to gaps or overhangs in the substrate, the splint ligation is highly selective for defined ends. In general, we have designed splints corresponding to the miRBase-annotated cleavage site, which in turn is usually based on the ends observed by deep sequencing of mature miRNAs and their * species. However, recent work by the Zamore lab has suggested that the 3p arm cleavage site annotations may be incorrect, and this view is supported by our experience with splint ligation. Instead, we recommend that the 5p splint be designed according to the annotated 5p arm cleavage, and the 3p arm splint be designed against an inferred cleavage site that leaves a 2-nt overhang.
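As a minimal sketch of the splint logic, the snippet below builds a DNA splint as the reverse complement of the junction between two molecules, taking 8 nt from each side, and checks it against the S125circ splints listed in this protocol. The helper names (`to_dna`, `reverse_complement`, `design_splint`) and the 8-nt arm length are assumptions made for illustration; the adaptor, product-end, and splint sequences are copied from the protocol.

```python
# Complement table for DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def to_dna(seq):
    """Uppercase a sequence and write RNA bases (u) as DNA bases (T)."""
    return seq.upper().replace("U", "T")


def reverse_complement(seq):
    """Reverse complement, returned as uppercase DNA."""
    return to_dna(seq).translate(COMPLEMENT)[::-1]


def design_splint(upstream, downstream, arm=8):
    """Splint spanning the junction upstream-3' end / downstream-5' end.

    Takes the last `arm` nt of the upstream molecule and the first `arm` nt
    of the downstream molecule (both 5'->3') and returns the reverse
    complement in ordering orientation.
    """
    junction = to_dna(upstream)[-arm:] + to_dna(downstream)[:arm]
    return reverse_complement(junction)


# Sequences from this protocol (lowercase = RNA).
ADAPTOR_3P = "GAGATCTACACTCTTTCCCTACACGACGCTCTuccgaucu"      # S125circ.005
ADAPTOR_5P_CAT = "guacacguaugaGATCGGAAGAGCGGTTCAGCAGGAATGC"  # S125circ.007.A
PRODUCT_5_END = "uggcgucu"  # 3p arm basal stem, at the 5' end of the product
PRODUCT_3_END = "ucucuagg"  # 5p arm basal stem, at the 3' end of the product

splint_3p = design_splint(ADAPTOR_3P, PRODUCT_5_END)
splint_5p = design_splint(PRODUCT_3_END, ADAPTOR_5P_CAT)

if __name__ == "__main__":
    print("3p arm splint:", splint_3p)
    print("5p arm splint:", splint_5p)
```

For a different miRNA, only the two product-end strings would change, which reflects the point that the same adaptors can be reused with a newly designed splint.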
[Figure: sequence-level schematic of adaptor ligation and amplification for the S125circ product. Recoverable labels and sequences follow.]

Cleaved product (3p Arm at left, 5p Arm at right):
p-uggcgucu---------------------ucucuagg-OH

S125circ.006 3p Arm Splint (ordering orientation)
AGACGCCAAGATCGGA

S125circ.007 5p Arm Splint
ACGTGTACCCTAGAGA

(Splints do not cross-hybridize)

Product flanked by S125circ.005 3p Arm Adaptor and S125circ.007.A 5p Arm Adaptor, CAT barcode:
GAGATCTACACTCTTTCCCTACACGACGCTCTuccgaucuuggcgucu---------------------ucucuaggguacacguaugaGATCGGAAGAGCGGTTCAGCAGGAATGC

S0.002 Solexa Rev Seq, -1 short
CAAGCAGAAGACGGCATACGAGATCGGTCTCGGCATTCCTGCTGAACCGCTCTTCCGATC

S125circ.008 RT Primer (-1)
GCATTCCTGCTGAACCGCTCTTCCGATC

S0.001 Solexa Fwd Seq
AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT

On a sequence level, the schematic above shows the strategy for ligating and amplifying the S125circ product. The same adaptors can be used for other miRNA products, provided that a new splint is designed.

Due to the circular permutation of the substrate, the 5p arm of the miRNA is at the 3′ end, and the 3p arm of the miRNA is at the 5′ end. To avoid confusion, this protocol will continue to use the 5p and 3p arm convention relative to the miRNA sequence.

By way of example, after cleavage and gel purification, the desired product from S125circ selection is as follows. “NNNN” denotes partially randomized sequence in the S125circ.

p-TGGCGTCTNNNNNNNNNNNNN-------------------------------NNNNNNNNNNNNNTCTCTAGG-OH

There are several features of note. First, as noted above, due to circular permutation, the 5p arm basal stem sequence (TCTCTAGG) is at the 3′ end of the product molecule, while the 3p arm basal stem sequence (TGGCGTCT) is at the 5′ end of the product. In the intact circular substrate, the pre-miRNA sequence joined these two ends to form a closed circle.
Second, the partially randomized pool was designed such that eight unrandomized nucleotides cap each end of the product. These will serve as the splint binding sites during ligation. Third, since this product was generated by Drosha (RNase III) cleavage, it has a 5′ phosphate and 3′-OH and is therefore ready for ligation. Molecules in the product pool that were generated by other endonuclease activities, such as metal cation or base catalyzed hydrolysis and cleavage by RNase A or RNase T1, would have 5′ hydroxyls and 3′ phosphates (or cyclic 2′-3′ phosphates) and are therefore unligatable. Thus the splint ligation approach also selects for true products of Microprocessor activity, rather than unintended endonucleolytic degradation products.

First, ligate adaptors to the 5p side. The following sequences were used to ligate the S125circ product pool. Note lowercase letters are RNA sequence. Also, the adaptors are 5′-phosphorylated at the time of synthesis; this service can be requested from most oligo synthesis facilities. Finally, four separate adaptors were synthesized, each with a slight sequence variation. This provides the opportunity to do up to four selections using the same parent miRNA pool, and sequence the selections in the same Illumina lane (provided that each selection library uses a different adaptor barcode). Three nucleotides were used for the barcode, so in principle one could design a total of 4^3, or 64, barcodes.

S125circ.007 5p Arm Splint
ACGTGTACCCTAGAGA

S125circ.007.A 5p Arm Adaptor, CAT barcode
p-guacacguaugaGATCGGAAGAGCGGTTCAGCAGGAATGC

S125circ.007.B 5p Arm Adaptor, ATG barcode
p-guacacgucauaGATCGGAAGAGCGGTTCAGCAGGAATGC

S125circ.007.C 5p Arm Adaptor, TGA barcode
p-guacacguucaaGATCGGAAGAGCGGTTCAGCAGGAATGC

S125circ.007.D 5p Arm Adaptor, TAG barcode
p-guacacgucuaaGATCGGAAGAGCGGTTCAGCAGGAATGC

Assemble the ligation reaction.
1.2 µl   5p Arm Adaptor (100 µM)
5 µl     Gel-purified selection product
1.0 µl   5p Arm Splint (100 µM)
0.8 µl   H2O
8 µl     Subtotal

Heat to 85°C for 5 min, then air cool to room temperature

+1 µl    10X DNA Ligase Buffer
+1 µl    T4 DNA Ligase
10 µl    Total

Incubate at 25°C overnight. Due to the short splint, do not incubate at 37°C.

Add 0.5 µl Turbo DNAse and incubate at 37°C x 10 min to degrade the splint.

After the reaction is complete, gel purify the product on a 5% urea-polyacrylamide gel using appropriate size markers. Yield may range from 10-50%, depending on the amount of product available, and the fraction of product with the desired ends. Cut out the band, elute using the crush-and-soak method, precipitate, and resuspend in 10 µl H2O.

Next, ligate adaptors to the 3p arm side. The following sequences were used to ligate the S125circ product pool. Note lowercase letters are RNA sequence. Do not phosphorylate adaptors, (a) because it is not necessary, and (b) because phosphorylation may make certain side products possible.

S125circ.005 3p Arm Adaptor (do not phosphorylate)
GAGATCTACACTCTTTCCCTACACGACGCTCTuccgaucu

S125circ.006 3p Arm Splint
AGACGCCAAGATCGGA

Assemble the 3p arm ligation reaction.

1.2 µl   3p Arm Adaptor (100 µM)
5 µl     Gel-purified 5p ligation product
1.0 µl   3p Arm Splint (100 µM)
0.8 µl   H2O
8 µl     Subtotal

Heat to 85°C for 5 min, then air cool to room temperature

+1 µl    10X DNA Ligase Buffer
+1 µl    T4 DNA Ligase
10 µl    Total

Incubate at 25°C overnight. Due to the short splint, do not incubate at 37°C.

Add 0.5 µl Turbo DNAse and incubate at 37°C x 10 min to degrade the splint.

After the reaction is complete, gel purify the product on a 5% urea-polyacrylamide gel using appropriate size markers. Yield may range from 10-50%, depending on the amount of product available, and the fraction of product with the desired ends. Cut out the band, elute using the crush-and-soak method, precipitate, and resuspend in 10 µl H2O.

Reverse transcribe the ligated product.
Since the product is partially-randomized, the reverse transcription will be primed from the 5p arm ligation adaptor. Note that reverse transcriptase appears to be error-prone in the first few nucleotides after the primer; ideally, one or more nucleotides of constant sequence should be included before a critical sequence (e.g. the barcode) is encountered. For our selections, we used this primer.

S125circ.008 RT Primer (-1)
GCATTCCTGCTGAACCGCTCTTCCGATC

Assemble the RT reaction:

5 µl       3p Ligation Product
5 µl       H2O
6.25 µl    10X “House” dNTP Mix
0.3 µl     RT primer
16.55 µl   Subtotal

Heat to 85°C for 5 min, then air cool to room temperature

+5 µl      5X First Strand Buffer
+1.25 µl   100 mM DTT
+2 µl      Superscript III
~25 µl     Total

Incubate at 55°C x 2 hr

Base hydrolyze the RT reaction by adding 5 µl 1N NaOH and heating to 85°C x 10 min. Add 25 µl of 1M HEPES-NaOH pH 7.0 and desalt over a P30 column.

PCR amplify the cDNA of the selected pool. The primers hybridize to the cDNA and add the remaining sequences necessary for Illumina paired-end sequencing. It is crucial to use PAGE or some other technique to purify the primers. The sequences are:

S0.001 Solexa Fwd Seq
GAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT

S0.002 Solexa Rev Seq, -1 short
CAAGCAGAAGACGGCATACGAGATCGGTCTCGGCATTCCTGCTGAACCGCTCTTCCGATC

Assemble the following reaction:

38 µl    Desalted RT reaction
0.5 µl   S0.001 Solexa Fwd Primer (100 µM)
0.5 µl   S0.002 Solexa Rev Primer -1 (100 µM)
5 µl     10X “House” dNTP
5 µl     10X “House” Taq Buffer
1 µl     “House” Taq
50 µl    Total

Use the following program.

30 sec   95°C
30 sec   95°C
30 sec   62°C
30 sec   72°C
         Repeat for 10 total cycles
(hold)   10°C

After 10 cycles are complete, withdraw 5 µl of the reaction and run on a 1.5% agarose gel. If product is visible, stop and purify the product. If no product is visible, amplify another 3 cycles and withdraw a second aliquot. If still no product is visible, amplify another 3 cycles and withdraw a third aliquot.
If no product is visible after this stage (16 total cycles), it is likely that one must repeat the selection at a larger scale, taking greater care to preserve yield.

If product is visible at any point, stop and purify. Use a 10% formamide-acrylamide gel and stain with Sybr Gold. After elution, precipitation, and resuspension, the purified DNA is ready for Illumina sequencing.

Library preparation from the reference (uncleaved) pool

To discover contributing sequence elements, we need to determine whether the sequences in the selected pool deviate in abundance from expectation. To minimize bias, we directly sample the unselected pool by amplifying the unselected pool RNA and preparing high-throughput sequencing libraries.

Since no cleavage has occurred, the ligation-RT-amplification approach taken for the selected pool is neither necessary nor possible. Instead, the uncircularized, phosphorylated pool (before circularization ligation) is directly reverse transcribed, amplified, and sequenced.

For the unselected S125circ pool, the reverse transcription primer was:

S125circ.009 Ref RT
TGGATGTCCTCACAGGTTAAAGGGTCTCAGGGACCTAG

Assemble the following reaction:

0.5 µl     Phosphorylated, uncircularized pool RNA
9.5 µl     H2O
6.25 µl    10X “House” dNTP Mix
0.3 µl     RT primer
16.55 µl   Subtotal

Heat to 85°C for 5 min, then air cool to room temperature

+5 µl      5X First Strand Buffer
+1.25 µl   100 mM DTT
+2 µl      Superscript III
~25 µl     Total

Incubate at 55°C x 2 hr

Base hydrolyze the RT reaction by adding 5 µl 1N NaOH and heating to 85°C x 10 min. Add 25 µl of 1M HEPES-NaOH pH 7.0 and desalt over a P30 column.

We will add the Illumina sequences needed for sequencing by PCR, in two stages. The primers for the first stage amplification of S125circ are as follows.
S125circ.014 Ref-II Fwd Primer
CTTTCCCTACACGACGCTCTTCCGATCTCAGGTGAGGTTCTTGGGAGCCTGGC

S125circ.015 Ref-II Rev Primer
GCATTCCTGCTGAACCGCTCTTCCGATCTTTAAAGGGTCTCAGGGACCTAGAG

Assemble the following reaction:

38 µl    Desalted RT reaction
0.5 µl   S125circ.014 Fwd (100 µM)
0.5 µl   S125circ.015 Rev (100 µM)
5 µl     10X “House” dNTP
5 µl     10X “House” Taq Buffer
1 µl     “House” Taq
50 µl    Total

Use the following program.

30 sec   95°C
30 sec   95°C
30 sec   55°C
30 sec   72°C
         Repeat for 3 total cycles
(hold)   10°C

Polish the pool for Illumina sequencing using the following primers (second stage):

S0.001 Solexa Fwd Seq
GAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT

S0.002 Solexa Rev Seq, -1 short
CAAGCAGAAGACGGCATACGAGATCGGTCTCGGCATTCCTGCTGAACCGCTCTTCCGATC

Assemble the following reaction:

5 µl     Stage 1 PCR Reaction
0.5 µl   S0.001 Solexa Fwd (100 µM)
0.5 µl   S0.002 Solexa Rev (100 µM)
5 µl     10X “House” dNTP
5 µl     10X “House” Taq Buffer
33 µl    H2O
1 µl     “House” Taq
50 µl    Total

Use the following program.

30 sec   95°C
30 sec   95°C
30 sec   55°C
30 sec   72°C
         Repeat for 8 total cycles
(hold)   10°C

Run a 5 µl aliquot on a 1.5% agarose gel. If no product is visible, amplify an additional 3 cycles. When product is faintly visible on agarose, run the remaining reaction on a 10% formamide-acrylamide gel and stain with Sybr Gold. After elution, precipitation, and resuspension, the purified DNA is ready for Illumina sequencing.

In vitro cleavage selection of linear pri-miRNA substrates

Overview

The goal of this selection is to explore Microprocessor cleavage determinants in the stem-loop region (what will ultimately be the pre-miRNA). The overall strategy is to generate a large pool of variant molecules and subject this pool to Microprocessor cleavage. Products of cleavage are recovered and sequenced by high-throughput sequencing to detect motifs that are enriched in the product (Microprocessor-cleaved) population, relative to the original pool.
For RNA motifs flanking the pre-miRNA, a major hindrance to this strategy was that after cleavage of linear pri-miRNA substrates, there will be three products: (1) the pre-miRNA, (2) a 5p (upstream) product, and (3) a 3p (downstream) product. By contrast, motifs in the loop and apical stem are relatively easy to explore, since the entire partially randomized region is contained within a single product molecule, which can be recovered and sequenced. We would like to generate a large number of variants, recover the variants that are cleaved by the Microprocessor, and sequence the successful variants.

Assembly of linear, partially-randomized, pri-miRNA substrate

Template assembly by PCR

Templates for T7 transcription are assembled from synthetic oligonucleotides. IDT will synthesize long oligos with both constant sequence and partially randomized positions (for a price). By way of example, oligonucleotide sequences are provided for the constant (unrandomized) C125loop template. However, the clonal pool (composed of many copies of the wildtype miRNA sequence) should be prepared in parallel with the partially randomized pool (composed of a few copies of many variant sequences). Oligos for the clonal pool are the same as oligos for the randomized pool, except that nucleotides in red were partially randomized to 79% wildtype base and 7% chance of each nonwildtype base.

S125loop.001a Central
ACCATGTTGCCAGTCTCTAGGTCCCTGAGACCCTTTAACCTGTGAGGACATCCAGGGTCACAGGTGAGGTTCTTGGGAGCCTGGCGTCTGGCCCAACCAC

S125loop.002 Left Arm
CAGAGATGCATAATACGACTCACTATAgCCCCCACCCCAGGGTCTACCGGGCCACCGCACACCATGTTGCCAGTCTCTAGG

S125loop.003 Right Arm
GGTCAGAAGTCAGGCCAGCAATTCCCCAGGTGTGTGGTTGGGCCAGACGCCA

First, PAGE-purify long primers. In general, primers longer than 60 nt are difficult even for commercial operations to synthesize, and should be purified before use. Resuspend purified oligos to 100 µM where possible.

The T7 template can be generated by a single PCR reaction.
In the first cycle, the “Central” oligo acts as a template for an initial primer extension by the “Right Arm” oligo. In the second cycle, the extended “Right Arm” DNA acts as a template for a second primer extension by the “Left Arm.” After two cycles, the reaction should proceed as a normal PCR. Assemble the following:

370 µl   H2O
0.3 µl   S125loop.001a (100 µM)
10 µl    S125loop.002 (100 µM)
10 µl    S125loop.003 (100 µM)
50 µl    10X “House” PCR Buffer
50 µl    10X “House” dNTP Mix
10 µl    “House” Taq
500 µl   Total

Use this program:

30 sec   95°C
30 sec   95°C
30 sec   52°C or appropriate annealing temp.
30 sec   72°C
         Repeat for 4 total cycles
(hold)   10°C

Ethanol precipitate the reaction and resuspend in 50 µl H2O.

Transcription of linear pri-miRNA pool

Use T7 RNA Polymerase to transcribe the template prepared in the previous step. A 200 µl reaction scale should be sufficient for most applications, but more is always possible.

25 µl    Concentrated template pool
195 µl   H2O
40 µl    10X “House” T7 Buffer
80 µl    5X “House” NTP Mix
20 µl    DTT (100 mM)
20 µl    Fresh α-[32P]-UTP
20 µl    “House” T7 RNA Polymerase
400 µl   Total

Incubate at 37°C x 2-3 hr
Add 5 µl Turbo DNAse (Ambion)
Incubate at 37°C x 30 min
Add 1/20 volume 500 mM EDTA
Add 1/10 volume 3M NaCl
Add 1 volume 100% ethanol

Final:
T7 reaction mixture with whatever NTP/Salt/Mg/PPi is present
~25 mM EDTA
~75 mM NaCl (supplemental)
50% ethanol

Incubate at least 15 min at -20°C
Spin 15 min at 4°C

After precipitation, the reaction should be purified on a 5% urea-polyacrylamide gel with an appropriate size marker. Cut out and elute the mature, self-cleaved transcript. This is the linear pri-miRNA pool.

For the C125loop primers given above, the expected product sequence is given below. The S125loop partially randomized transcript was identical to this sequence, except that positions colored red were partially randomized at 79% wildtype (indicated base) and 7% each of the other three bases. These positions comprise the apical miRNA stem and the loop.
ppp-5′gCCCCCACCCCAGGGTCTACCGGGCCACCGCACACCATGTTGCCAGTCTCTAGGTCCCTGAGACCCTTTA
ACCTGTGAGGACATCCAGGGTCACAGGTGAGGTTCTTGGGAGCCTGGCGTCTGGCCCAACCACACACCTG
GGGAATTGCTGGCCTGACTTCTGACC-3′-OH

Selection of functional variants from linear pri-miRNA pool

Cleavage timecourse of clonal wildtype pool and partially-randomized pool

This experiment serves three purposes: (1) verification that the linear pri-miRNA substrates are cleavage-competent molecules, (2) estimation of the overall contribution of apical stem and/or loop RNA sequence by comparing the cleavage of the partially-randomized pool to the cleavage of wildtype pri-miRNA, and (3) optimization of cleavage timing for the actual selection.

Assemble the following reaction, assuming 1 µM stock solutions of RNA and 50 nM Microprocessor complex in whole-cell lysate. As concentrations from different substrate and protein preps vary, volumes should be adjusted to match these final concentrations.

39 µl    1X Sonic Buffer
50 µl    Microprocessor Lysate
10 µl    10X Cleavage/Binding Buffer (1 mM Mg)
1 µl     Linear clonal or randomized pool (1 µM)
100 µl   Total

Prior to addition of the RNA substrate, the reaction mixture should be prewarmed at 37°C. After addition of substrate, withdraw 10 µl at the following times: 0 min, 20 sec, 40 sec, 1 min, 2 min, 4 min, 6 min, 8 min, 10 min, and 12 min. Each timepoint should be mixed immediately with 100 µl of ice-cold Tri-Reagent to stop the reaction.

After extraction from Tri-Reagent, run the reactions on a 10% urea-polyacrylamide gel and estimate the amount of cleavage at each timepoint for both the clonal (wildtype) substrate and the partially-randomized pool. For selection (next section), we have previously selected a timepoint where ~1% of the partially randomized pool is cleaved, and used this timepoint for selection. In some situations, to achieve 1% cleavage it was necessary to adjust the ratio between Microprocessor complex and partially-randomized pool substrate.
Selection of functional variants from partially-randomized linear pri-miRNA pool

Using conditions optimized in the previous experiment, perform a selection for functional, Microprocessor-cleaved variants in the partially-randomized linear pri-miRNA pool. The reaction should be scaled such that ~10 fmol of cleaved product can be obtained. This corresponds to a selection for 6 x 10^9 product molecules. There is very little loss in the library preparation, so essentially all of these molecules will be available for sequencing.

A target of 10 fmol product requires just a small cleavage reaction. For example, for cleavage conditions using a (typical) final reaction concentration of 10 nM pri-miRNA pool, a 100 µl cleavage reaction will use a total of 1,000 fmol substrate. Cleavage of 1% of this amount yields 10 fmol of product for library preparation for high-throughput sequencing.

After performing the selection reaction under optimized conditions, phenol-extract the reaction and precipitate it. The precipitate should be resuspended in the minimum amount of water needed to fully dissolve the pellet, and the resulting nucleic acid should be separated on a 10% urea-polyacrylamide gel. Cut out the band corresponding to the pre-miRNA. For the S125loop example given above, the product is expected to be 60 nt. Since yield is valuable, it is advisable to use the crush-and-soak technique to purify this band (see Standard Protocols). Resuspend in 20 µl water.

Library preparation for Illumina high-throughput sequencing

Library preparation from the selected pool and reference pool

The goal of this phase is to add adaptors to the ends of the product sequence; these adaptors will permit amplification of the variant molecules, and contain some necessary sequence for the current (Feb. 2012) Illumina paired-end sequencing technology. We use PCR with long primers to achieve this.

Reverse transcribe the pre-miRNA product, or the reference pool RNA.
Note that reverse transcriptase appears to be error-prone in the first few nucleotides after the primer; ideally, one or more nucleotides of constant sequence should be included before a critical sequence (e.g. randomized sequence) is encountered. For our selections, we used this primer.

S125loop.004 RT Primer
GGCATAGGCTCCCAAGAACCTC

Assemble the RT reaction:

10 µl      Pre-miRNA product or reference pool
6.25 µl    10X “House” dNTP Mix
0.3 µl     RT primer
16.55 µl   Subtotal

Heat to 85°C for 5 min, then air cool to room temperature

+5 µl      5X First Strand Buffer
+1.25 µl   100 mM DTT
+2 µl      Superscript III
~25 µl     Total

Incubate at 55°C x 2 hr

Base hydrolyze the RT reaction by adding 5 µl 1N NaOH and heating to 85°C x 10 min. Add 25 µl of 1M HEPES-NaOH pH 7.0 and desalt over a P30 column.

PCR amplify the cDNA of the selected pool. The primers hybridize to the cDNA and will add (1) a short barcode, and (2) part of the sequence needed for Illumina single-end sequencing. It is crucial to use PAGE or some other technique to purify the primers. The reverse primer will be the RT primer used above. The forward primers vary (each contains a different barcode, or no barcode at all). We used:

S125loop.005 Init. PCR Fwd
GACGATCTCCCTGAGACCCTTTAA

S125loop.005a Init. PCR Fwd, barcoded
GACGATCgaTCCCTGAGACCCTTTAA

S125loop.005b Init. PCR Fwd, barcoded
GACGATCctTCCCTGAGACCCTTTAA

Assemble the following reaction:

20 µl    Desalted RT reaction
1 µl     S125loop.005 Series Primer (100 µM)
1 µl     S125loop.004 RT Primer (100 µM)
10 µl    10X “House” dNTP
10 µl    10X “House” Taq Buffer
57 µl    H2O
1 µl     “House” Taq
100 µl   Total

Use the following program.

30 sec   95°C
30 sec   95°C
30 sec   50°C
30 sec   72°C
         Repeat for 5 total cycles
(hold)   10°C

Without purifying the PCR reaction, assemble the second-stage PCR, which adds the remaining sequences necessary for Illumina single-end sequencing. The primers used are specific to the barcode used in the initial reaction, i.e. if 005a was used for the first reaction, 007a should be used for the second reaction. It is essential to purify these primers to get optimal sequencing results.

S125loop.006corr Solexa-R Adaptor
CAAGCAGAAGACGGCATACGAGGCTCCCAAGAACCTC

S125loop.007 Solexa-Seq Adaptor
AATGATACGGCGACCACCGACAGGTTCAGAGTTCTACAGTCCGACGATCTCCCTGAGACCCTTTAA

S125loop.007a Solexa-Seq Adaptor, barcoded
AATGATACGGCGACCACCGACAGGTTCAGAGTTCTACAGTCCGACGATCgaTCCCTGAGACCCTTTAA

S125loop.007b Solexa-Seq Adaptor, barcoded
AATGATACGGCGACCACCGACAGGTTCAGAGTTCTACAGTCCGACGATCctTCCCTGAGACCCTTTAA

Assemble the following reaction:

20 µl    Initial PCR Reaction
2 µl     S125loop.007 Series Primer (100 µM)
2 µl     S125loop.006corr R. Primer (100 µM)
20 µl    10X “House” dNTP
20 µl    10X “House” Taq Buffer
2 µl     “House” Taq
100 µl   Total

Use the following program.

30 sec   95°C
30 sec   95°C
30 sec   55°C
30 sec   72°C
         Repeat for 5 total cycles
(hold)   10°C

Run a 5 µl aliquot on a 1.5% agarose gel. If no product is visible, amplify an additional 3 cycles. When product is faintly visible on agarose, run the remaining reaction on a 10% formamide-acrylamide gel and stain with Sybr Gold.

Due to the nature of the PCR-addition approach, there may be four dominant products visible on the gel:
(1) product from the initial PCR reaction, which may be further amplified by the dilute oligos that are still around, at 73-75 bp;
(2) products of reactions where only the reverse extended arm has been added, at 85-87 bp;
(3) products of reactions where only the forward extended arm has been added, at 115-117 bp; and
(4) full-length product for Illumina sequencing, at 127-129 bp.
The full-length product (4) is the band that should be excised. After elution, precipitation, and resuspension, the purified DNA is ready for Illumina sequencing.
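The barcode-matching rule for the two PCR stages can be checked with a short Python sketch. It assumes, as the sequences listed in this protocol suggest, that a second-stage Solexa-Seq adaptor is compatible with a first-stage forward primer when the adaptor's 3′ end contains the entire first-stage primer, barcode included. The dictionary keys and function names are hypothetical labels; the sequences are copied from the protocol.

```python
# First-stage forward primers (S125loop.005 series); lowercase = barcode.
FIRST_STAGE = {
    "005": "GACGATCTCCCTGAGACCCTTTAA",
    "005a": "GACGATCgaTCCCTGAGACCCTTTAA",
    "005b": "GACGATCctTCCCTGAGACCCTTTAA",
}

# Second-stage forward primers (S125loop.007 series).
SECOND_STAGE = {
    "007": "AATGATACGGCGACCACCGACAGGTTCAGAGTTCTACAGTCCGACGATCTCCCTGAGACCCTTTAA",
    "007a": "AATGATACGGCGACCACCGACAGGTTCAGAGTTCTACAGTCCGACGATCgaTCCCTGAGACCCTTTAA",
    "007b": "AATGATACGGCGACCACCGACAGGTTCAGAGTTCTACAGTCCGACGATCctTCCCTGAGACCCTTTAA",
}


def is_compatible(first_primer, second_primer):
    """True if the second-stage primer ends with the full first-stage primer."""
    return second_primer.upper().endswith(first_primer.upper())


def compatible_pairs():
    """List every (first-stage, second-stage) primer pair that matches."""
    return [
        (first_name, second_name)
        for first_name, first_seq in FIRST_STAGE.items()
        for second_name, second_seq in SECOND_STAGE.items()
        if is_compatible(first_seq, second_seq)
    ]


def added_length(first_name, second_name):
    """Nucleotides the second-stage primer adds beyond the first-stage one."""
    return len(SECOND_STAGE[second_name]) - len(FIRST_STAGE[first_name])


if __name__ == "__main__":
    for first_name, second_name in compatible_pairs():
        print(first_name, "->", second_name,
              "adds", added_length(first_name, second_name), "nt")
```

Under this assumption each first-stage primer pairs with exactly one second-stage adaptor, and the length each adaptor adds can be compared against the size shift between the initial PCR product and the forward-extended product on the gel.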
In Vitro Binding Selection for Functional pri-miRNAs

Preparation of immunopurified Drosha-TN/DGCR8

Overview

293T cells are transiently transfected with plasmids encoding C-terminal FLAG tagged human DroshaTN (a dominant-negative form of Drosha in which the two RNase III domains are mutated), and N-terminal FLAG-HA tagged human DGCR8. Cells are harvested and sonicated. After centrifuge clearing of precipitates and insoluble material, the whole-cell lysate is stored in single-use aliquots. This protocol is modified from Lee and Kim, Meth. Enz. 427 (2007), with additional input from Jinju Han and V. Narry Kim.

Plasmids:
• pMAX-GFP (transfection marker)
• pFLAG-HA-DGCR8 (courtesy of T. Tuschl)
• pCK-DroshaTN-FLAG (courtesy of V. Narry Kim)

Cell culture and transfection

Maintain 293T cells under standard conditions. We use Dulbecco’s Modified Eagle Medium (DMEM) supplemented to 10% Fetal Bovine Serum.

To transfect cells, start with three 15-cm dishes at 90-100% confluence. Split all of these cells into eight 15-cm dishes the day before transfection. Use 15 ml media per plate.

The day after splitting (18-24 hours later), begin transfection. Assemble Lipofectamine dilution in one 50 ml Falcon tube:

720 µl   Lipofectamine 2000
24 ml    Opti-mem I Serum-Free Media

Incubate at room temperature for 5 min. Meanwhile, assemble plasmid mixture in another 50 ml Falcon tube:

24 µg    pMAX-GFP
60 µg    pCK-Drosha-FLAG
60 µg    pFLAG-HA-DGCR8
24 ml    Opti-mem I Serum-Free Media

Add plasmid mixture to Lipofectamine dilution, and gently invert twice to mix. Allow complexes to form by incubating at room temperature for 20-30 min.

After complexes have formed, add 6 ml of mixture dropwise to each 15-cm dish. Rock plate gently 1-2 times to ensure even dispersal of transfection complexes.

12-24 hours after transfection, check for expression of GFP. At this relatively early point, 50%-75% of cells will already be expressing GFP. Split all the transfected cells into twenty 15-cm dishes.
Allow cells to grow for 48 hr. After this period, cells should be at or just past confluence. Frequently, with a successful transfection containing pMAX-GFP, the monolayer is faintly green in room lighting. Cells are now ready for harvesting.

Lysate preparation

Harvest overexpressing cells by removing media and pipetting PBS onto the monolayer. Due to the poor adherence of 293T cells, this is typically enough to dislodge the monolayer. Collect PBS suspension of cells, and keep on ice.

Pellet PBS suspension by centrifugation. This pellet should be green from GFP in the cells; there should be little or no visible fluorescence in the supernatant.

Prepare 1X Sonic Buffer without any reducing agents. Add protease inhibitors by dissolving a Mini EDTA-Free Protease Inhibitor tablet in 10 ml of 1X Sonic Buffer (no reducing agents). Keep on ice.

Resuspend cell pellet in 10 ml of 1X Sonic Buffer + protease inhibitors.

Prepare sonicator for use. We use a Branson Sonifier 250 at 50% duty cycle and output level 4. Clean the probe by sonicating RNase-ZAP solution for 10 pulses (approximately 20 seconds at 50% duty cycle), then sonicating deionized water for 10 pulses.

Lyse 293T cells by sonicating for 10 pulses.

Clear the lysate once by centrifuging lysate at 3500 x g for 15 min. If lysis is successful, the pellet should be yellow and perhaps light-green, while the supernatant is green. Collect supernatant.

Clear supernatant once more by centrifuging at 3500 x g for 15 min. Collect supernatant. This is the cleared whole-cell extract.

Immunoprecipitation with anti-FLAG beads

Prepare 100 µl Anti-FLAG agarose or magnetic beads according to manufacturer’s directions. Wash twice more with 1X Sonic Buffer, without reducing agents.

Add beads to cleared whole-cell extract. Rotate or agitate at 4º x 4-18 hr.

Pellet the beads by centrifugation or magnet. Pull off the supernatant and wash 3 times in 1X Sonic Buffer + protease inhibitors, without reducing agents.
Each wash should be at least 10-fold larger than the volume of the packed beads, and each wash should incubate with the beads at 4º for at least 10 minutes.

After the final wash, elute with a combination of FLAG peptide and reducing agent. Prepare the following elution buffer:

  1X Sonic Buffer with Protease Inhibitors
  0.7 µl/ml 2-mercaptoethanol
  50% glycerol
  100 ng/µl RNase-free BSA
  100 ng/µl Yeast total RNA

Use at least 3-fold more than the packed bead volume.

Aliquot eluted complex into single-use aliquots and store in the vapor phase of liquid nitrogen.

In vitro binding selection of linear pri-miRNA substrates

Overview

The goal of this selection is to explore Microprocessor cleavage determinants in the flanking regions upstream and downstream of the pre-miRNA. The overall strategy is to generate a large pool of variant molecules and subject this pool to a binding reaction with immunopurified Drosha-TN and DGCR8. We would like to generate a large number of variants, recover the variants that are bound by the Microprocessor, and sequence the successful variants. We seek motifs that are enriched in the product (Microprocessor-bound) population, relative to the original pool.

Since dominant-negative Drosha with mutations in the RNase III domains is used in this experiment, the pri-miRNA substrate should remain intact after binding. Thus it is possible to simply reverse-transcribe and amplify the bound RNA for Illumina high-throughput sequencing.

Assembly of linear, partially-randomized, pri-miRNA substrate

Template assembly by PCR

Templates for T7 transcription are assembled from synthetic oligonucleotides. IDT will synthesize long oligos with both constant sequence and partially randomized positions (for a price). By way of example, oligonucleotide sequences are provided for the constant (unrandomized) C125 template.
However, the clonal pool (composed of many copies of the wildtype miRNA sequence) should be prepared in parallel with the partially randomized pool (composed of a few copies of many variant sequences). Oligos for the clonal pool are the same as oligos for the randomized pool, except that nucleotides in red were partially randomized to 79% wildtype base and 7% chance of each nonwildtype base. Note that, on the left arm, 15 nucleotides of nongenomic sequence are included. These are part of the sequences needed for Illumina paired-end sequencing.

C125.001 Left Arm
acgctcttccgatctCCCCAGGGTCTACCGGGCCACCGCACACCATGTTGCCAGTCTCTAGGTCC
CTGAGACCCTTTAACCTGTGAGGACATCCAGGGTC

C125.002 Right Arm
AGGAGTCAGGGGTCAGAAGTCAGGCCAGCAATTCCCCAGGTGTGTGGTTGGGCCAGACGCCAGGC
TCCCAAGAACCTCACCTGTGACCCTGGATGTCCTC

First, PAGE-purify long primers. In general, primers longer than 60 nt are difficult even for commercial operations to synthesize, and should be purified before use. Resuspend purified oligos to 100 µM where possible.

An initial primer extension between the left and right primers is used to generate a small amount of template for PCR.
Assemble the following reaction:

  15 µl    Left Arm oligo (100 µM)
  15 µl    Right Arm oligo (100 µM)
  270 µl   H2O
  40 µl    10X “House” PCR Buffer
  40 µl    10X “House” dNTP Mix
  20 µl    “House” Taq
  400 µl   Total

Perform a primer extension using the program “L/R Ext.”

  95°C                          30 sec
  Ramp to 37°C at 0.1°C/sec     (ramp)
  37°C                          1 min
  72°C                          1 min
  Repeat for 5 total cycles
  10°C                          (hold)

The T7 template can be generated by a PCR reaction, using primers that add the T7 promoter, along with two template C residues (so that T7 can initiate with GTP):

S0.002 T7-fwdSeq
CAGAGATGCATAATACGACTCACTATAggacacgacgctcttccgatct

S125.003 RT primer
TATGAGGAGTCAGGGGTCAG

Without purifying or concentrating the primer extension, assemble the following reaction:

  30 µl    Primer extension reaction
  355 µl   H2O
  5 µl     S0.002 T7-fwdSeq (100 µM)
  5 µl     S125.003 RT Primer (100 µM)
  50 µl    10X “House” PCR Buffer
  50 µl    10X “House” dNTP Mix
  5 µl     “House” Taq
  500 µl   Total

Perform an initial PCR amplification using the program “L/R PCR.” Note that the annealing temperature is based on the amount of homology between the T7 adaptor and the left arm oligo, and the amount of homology between the RT primer and the right arm oligo.

  95°C                                    30 sec
  95°C                                    30 sec
  55°C or appropriate annealing temp.     30 sec
  72°C                                    30 sec
  Repeat for 4 total cycles
  10°C                                    (hold)

Ethanol precipitate the reaction and resuspend in 50 µl H2O.

Transcription of linear pri-miRNA pool

Use T7 RNA Polymerase to transcribe the template prepared in the previous step. A 400 µl reaction scale should be sufficient for most applications, but more is always possible.
  25 µl    Concentrated template pool
  195 µl   H2O
  40 µl    10X “House” T7 Buffer
  80 µl    5X “House” NTP Mix
  20 µl    DTT (100 mM)
  20 µl    Fresh α-[32P]-UTP
  20 µl    “House” T7 RNA Polymerase
  400 µl   Total

Incubate 37º x 2-3 hr
Add 10 µl Turbo DNAse (Ambion)
Incubate 37º x 30 min
Add 1/20 volume 500 mM EDTA
Add 1/10 volume 3M NaCl
Add 1 volume 100% ethanol

Final:
  T7 reaction mixture with whatever NTP/Salt/Mg/PPi is present
  ~25 mM EDTA
  ~75 mM NaCl (supplemental)
  50% ethanol

Incubate at least 15 min at -20º
Spin 15 min at 4º

After precipitation, the reaction should be purified on a 5% Urea-polyacrylamide gel with an appropriate size marker. Cut out and elute the full-length transcript. This is the linear pri-miRNA pool.

For the C125 primers given above, the expected product sequence is given below. The S125 partially randomized transcript was twin to this sequence, except that positions colored red were partially randomized at 82% wildtype (indicated base) and 6% each of the other three bases. These positions comprise the flanking RNA sequence, up to (but not including) the basal stem.

ppp-5′ggacacgacgctcttccgatctCCCCCACCCCAGGGTCTACCGGGCCACCGCACACCATGTTGCCAGTCT
CTAGGTCCCTGAGACCCTTTAACCTGTGAGGACATCCAGGGTCACAGGTGAGGTTCTTGGGAGCCTGGCG
TCTGGCCCAACCACACACCTGGGGAATTGCTGGCCTGACTTCTGACCCCTGACTCCTCATA-3′-OH

Selection of functional variants from linear pri-miRNA pool

Trial competitive binding between clonal wildtype RNA and partially-randomized pool

This experiment serves to estimate the overall contribution of flanking RNA sequence by comparing the binding of the partially-randomized pool to the binding of wildtype pri-miRNA.

First, transcribe a reference pri-miRNA sequence that is 15-20 nt shorter than the pool RNAs. For mir-125a, the following sequence can be ordered from IDT and transcribed using T7 polymerase (see Standard Protocols). The RNA can be body labeled, or 5′ dephosphorylated and 5′ labeled with ATP. See Standard Protocols.
TATGAGGAGTCAGGGGTCAGAAGTCAGGCCAGCAATTCCCCAGGTGTGTGGTTGGGCCA
GACGCCAGGCTCCCAAGAACCTCACCTGTGACCCTGGATGTCCTCACAGGTTAAAGGGT
CTCAGGGACCTAGAGACTGGCAACATGGTGTGCGGTGGCCCGGTAGACCCTGGGGTGGG
GGcTATAGTGAGTCGTATTATGCATCTCTG

To perform the competitive binding experiment, assemble the following reaction, assuming 5 µM stock solutions of RNA. Although we do not have a measurement of the functional Microprocessor concentration, it is likely to be much less than the concentration of RNA used in the experiment.

  20 µl     DroshaTN/DGCR8 eluate
  1.25 µl   Short Reference pri-miRNA (5 µM)
  1.25 µl   C125 wildtype or S125 pool RNA (5 µM)
  2.5 µl    10X Cleavage/Binding Buffer (1 mM Mg)
  25 µl     Total

Allow binding reaction to proceed at room temperature for 15-30 min. Withdraw a 0.5 µl “unbound” sample and store in 1X urea loading dye.

Prepare nitrocellulose filters for binding. Place pedestals on vacuum system (we use an older vacuum setup that is totally unlabeled and inherited from David P. Bartel’s younger days; in principle, a new Qiagen vacuum manifold could be used). Wet filters by adding 500 µl 1X Sonic Buffer.

Apply binding reaction to the center of a nitrocellulose filter, under vacuum.

Wash the filter three times. For each wash, add at least 10-fold more than the reaction volume of 1X Sonic Buffer.

Elute protein-RNA complexes from the filter by carefully lifting the filter and placing in a microcentrifuge tube. Add 500 µl 1X VCA Elution Buffer, and heat to 85º x 10 min. Vortex. Withdraw elution buffer and ethanol precipitate by adding 1000 µl 100% ethanol.

After precipitation, estimate the number of filter-retained counts in the pellet. Resuspend the pellet in an appropriate amount of 1X urea loading dye.

Run equal counts of the bound sample and the unbound sample (retained above) on a 4% urea-polyacrylamide gel. The gel should be run such that the xylene dye is almost at the bottom of the gel in order to maximize the resolution at ~180 nt.
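Once the gel has been imaged, the band intensities from each lane reduce to a single normalized ratio. The sketch below shows that arithmetic only; the function and argument names are hypothetical, and the unbound sample withdrawn before filtration is assumed to serve as the input lane.

```python
def relative_binding(bound_long, bound_short, input_long, input_short):
    """Ratio of long (pool) to short (reference) RNA in the bound lane,
    normalized by the same ratio in the input lane.

    Arguments are background-subtracted band intensities in any consistent
    unit (e.g. phosphorimager counts).
    """
    if min(bound_short, input_long, input_short) <= 0:
        raise ValueError("denominator intensities must be positive")
    return (bound_long / bound_short) / (input_long / input_short)


if __name__ == "__main__":
    # Equal representation in both lanes gives a normalized ratio of 1.
    print(relative_binding(1200.0, 1200.0, 800.0, 800.0))
    # Long RNA depleted two-fold in the bound lane relative to input.
    print(relative_binding(500.0, 1000.0, 900.0, 900.0))
```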
Quantify the ratio of long (pool) RNA to short (reference) RNA, and normalize by the ratio in the input lane. The normalized ratio is an estimate of the relative binding of the long RNA vs. the shorter RNA. Ideally, the relative binding between the reference RNA and C125 (the clonal, wildtype, long RNA) is ~1, while the relative binding between the reference RNA and S125 (the partially-randomized, long RNA) is >1, i.e. binding favors the reference RNA.

Selection of functional variants from partially-randomized linear pri-miRNA pool

Perform a selection for functional, Microprocessor-binding variants in the partially-randomized linear pri-miRNA pool. Assemble the following reaction:

  40 µl   DroshaTN/DGCR8 eluate
  5 µl    C125 wildtype or S125 pool RNA (5 µM)
  5 µl    10X Cleavage/Binding Buffer (1 mM Mg)
  50 µl   Total

Apply binding reaction to the center of a nitrocellulose filter, under vacuum.

Wash the filter three times. For each wash, add at least 10-fold more than the reaction volume of 1X Sonic Buffer.

Elute protein-RNA complexes from the filter by carefully lifting the filter and placing in a microcentrifuge tube. Add 500 µl 1X VCA Elution Buffer, and add 1 µl Yeast Total RNA to act as a tube-blocking and co-precipitating agent. Heat to 85º x 10 min. Vortex. Withdraw elution buffer and ethanol precipitate by adding 1000 µl 100% ethanol. Resuspend the pellet directly in the following reaction.

Reverse transcribe the RNA. For the pri-mir-125 binding selection, the following primer was used:

S125.003 RT primer
TATGAGGAGTCAGGGGTCAG

Assemble the RT reaction.

  (pellet)    Eluted bound RNA
  10 µl       H2O
  6.25 µl     10X “House” dNTP Mix
  0.3 µl      RT primer
  16.55 µl    Total

Heat to 85º for 5 min, then air cool to room temperature

  +5 µl       5X First Strand Buffer
  +1.25 µl    100 mM DTT
  +2 µl       Superscript III
  ~25 µl      Total

Incubate at 55º x 2hr

Base hydrolyze the RT reaction by adding 5 µl 1N NaOH and heating to 85º x 10 min. Add 25 µl of 1M HEPES-NaOH pH 7.0 and desalt over a P30 column.
PCR amplification for downstream applications

Template preparation for the next round of selection

The goal of this phase is to add adaptors to the ends of the product sequence; these adaptors will permit amplification and then transcription of the variant molecules. We use PCR with long primers to achieve this.

PCR amplify the cDNA of the selected pool using the same primers used to generate the original T7 template.

S0.002 T7-fwdSeq
CAGAGATGCATAATACGACTCACTATAggacacgacgctcttccgatct

S125.003 RT primer
TATGAGGAGTCAGGGGTCAG

Assemble the following reaction:

  55 µl    Desalted RT reaction
  4 µl     S125.003 RT Primer (100 µM)
  4 µl     S0.002 T7-fwdSeq (100 µM)
  40 µl    10X “House” dNTP
  40 µl    10X “House” Taq Buffer
  253 µl   H2O
  4 µl     “House” Taq
  400 µl   Total

Use the following program.

  95°C   30 sec
  95°C   30 sec
  55°C   30 sec
  72°C   30 sec
  Repeat for 6 total cycles
  10°C   (hold)

Run a 10 µl aliquot on a 1.5% agarose gel. If no product is visible, amplify an additional 3 cycles, and check again; we often get visible product around 10 cycles.

When product is faintly visible on agarose, ethanol precipitate the PCR reaction and resuspend in 50 µl H2O. This reaction can be used for T7 transcription for the next round of selection.

Library preparation for Illumina high-throughput sequencing

Any selected T7 template pool can be “polished” for Illumina high-throughput sequencing. To sequence the initial pool (Pool 0), the pool RNA should be reverse-transcribed and amplified as above, in order to fully capture any biases that were introduced by reverse transcription and PCR.

The “polishing” primers add (1) a short barcode, and (2) part of the sequence needed for Illumina single-end sequencing. It is crucial to use PAGE or some other technique to purify the primers. The reverse primers vary (each contains a different barcode, in blue).
We used:

S0.001 Solexa Fwd Seq
AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT

S125.005.A Solexa Reverse Paired-End Sequencing, barcoded
CAAGCAGAAGACGGCATACGAGATCGGTCTCGGCATTCCTGCTGAACCGCTCTTCCGATCTCATTATGAGG
AGTCAGGGGTCAG

S125.005.B Solexa Reverse Paired-End Sequencing, barcoded
CAAGCAGAAGACGGCATACGAGATCGGTCTCGGCATTCCTGCTGAACCGCTCTTCCGATCTATGTATGAGG
AGTCAGGGGTCAG

S125.005.C Solexa Reverse Paired-End Sequencing, barcoded
CAAGCAGAAGACGGCATACGAGATCGGTCTCGGCATTCCTGCTGAACCGCTCTTCCGATCTTGATATGAGG
AGTCAGGGGTCAG

S125.005.D Solexa Reverse Paired-End Sequencing, barcoded
CAAGCAGAAGACGGCATACGAGATCGGTCTCGGCATTCCTGCTGAACCGCTCTTCCGATCTTAGTATGAGG
AGTCAGGGGTCAG

Assemble the following reaction:

  0.3 µl   Pool DNA
  59 µl    H2O
  1 µl     S125.005 Series Primer (100 µM)
  1 µl     S0.001 Solexa Fwd Seq (100 µM)
  10 µl    10X “House” dNTP
  10 µl    10X “House” Taq Buffer
  1 µl     “House” Taq
  100 µl   Total

Use the following program.

  95°C   30 sec
  95°C   30 sec
  55°C   30 sec
  72°C   30 sec
  Repeat for 4 total cycles
  10°C   (hold)

Ethanol precipitate the PCR reaction and run on a 10% formamide-acrylamide gel. Stain with Sybr Gold and cut out the product. After elution, precipitation, and resuspension, the purified DNA is ready for Illumina sequencing.

Identification of Motif-Binding Proteins by Site-Specific Crosslinking

Overview

This protocol outlines a method to identify proteins bound to a particular RNA motif. Typically, a specific region of RNA is thought to be functionally important, and the underlying assumption of this protocol is that this region provides a sequence-specific binding site for important proteins. Given the motif, a 4-thiouridine (or, in principle, another reactive nucleoside analog) is incorporated in specific positions, preferably at positions known to be important for function. After 365 nm UV-mediated crosslinking of the 4-thiouridine to protein candidates, the covalently-linked protein-RNA complex is subjected to further analysis to characterize the unknown protein.
A schematic of this process, in the context of pri-miRNAs, is provided below. Protocol development was heavily influenced by Sontheimer EJ. Site-Specific Crosslinking with 4-thiouridine. Mol. Biol. Rep. (1994).

Assembly of 4-thiouridine containing RNA substrate

A 4-thiouridine containing substrate must be assembled or synthesized such that the following conditions are met:

1. The 4-thiouridine is incorporated at a position thought to be important for binding of the protein-of-interest.
2. A radioactive phosphate is incorporated at or near the 4-thiouridine site, such that after complete digestion of the RNA by RNase (preferably RNase T1), the fragment of RNA that contains the 4-thiouridine also contains the radioactive phosphate.
3. A biotin moiety is incorporated such that after binding to the protein complex of interest, the piece of RNA that contains the 4-thiouridine also contains the biotin moiety. However, the biotin should be incorporated such that cleavage by RNase (preferably RNase T1) separates the biotin from the 4-thiouridine.

We have assembled substrates with these properties by splint ligating together a mix of RNAs containing a particular 4-thiouridine, RNAs containing biotin, and RNAs containing a 5′-32P. RNAs with 4-thiouridine were synthesized commercially by Dharmacon (we have not been able to find a supplier for the (4-S-U)pG suggested by Sontheimer); likewise, RNAs containing 3′ or 5′ biotin were synthesized commercially by Dharmacon or IDT. Other RNAs were synthesized by T7 transcription, where possible. See Supplement 1 for protocols related to splint ligation, 5′ phosphorylation, and T7 transcription.

It is essential to perform all operations, to the greatest extent possible, under light-protected conditions.

As an example, we have constructed a pri-miRNA substrate that contains a 4-thiouridine in the CNNC motif, located in the 3p flanking region of the pri-miRNA.
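Conditions 2 and 3 can be checked on paper, or with a few lines of code, before ordering oligonucleotides. The following is a minimal sketch, not part of the original workflow. It assumes that RNase T1 cleaves 3′ of every G regardless of structure and leaves the 3′ phosphate on that G, and it uses `S` as a made-up placeholder for 4-thiouridine. The sequences are the 3′ end of the CNNC substrate whose assembly is described in this section.

```python
def t1_fragments(rna):
    """Split an RNA string after every G; returns half-open (start, end) pairs."""
    fragments, start = [], 0
    for i, base in enumerate(rna):
        if base == "G":
            fragments.append((start, i + 1))
            start = i + 1
    if start < len(rna):
        fragments.append((start, len(rna)))
    return fragments


def fragment_containing(fragments, index):
    for start, end in fragments:
        if start <= index < end:
            return (start, end)
    raise ValueError("index outside RNA")


def check_design(rna, label_index, thio="S"):
    """label_index: position of the nucleotide whose 5' phosphate is 32P.

    After T1 digestion that phosphate stays with the nucleotide on its 5' side
    (as an internal phosphate, or as the 3' phosphate of a G).
    """
    fragments = t1_fragments(rna)
    thio_frag = fragment_containing(fragments, rna.index(thio))
    label_frag = fragment_containing(fragments, max(label_index - 1, 0))
    biotin_frag = fragment_containing(fragments, len(rna) - 1)  # 3' biotin
    return {
        "thio_fragment": rna[thio_frag[0]:thio_frag[1]],
        "label_with_thio": label_frag == thio_frag,
        "biotin_separated": biotin_frag != thio_frag,
    }


LEFT_ARM_3P_END = "CUGCCUCGGA"          # last 10 nt of the T7-transcribed left arm
THIO_OLIGO = "CUSCAAGGG"                # CX30.020RNA, S = 4-thiouridine
BIOTIN_OLIGO = "GCTACTTTAGGAGCAATTATC".replace("T", "U")  # CX30.028, 3' biotin

substrate = LEFT_ARM_3P_END + THIO_OLIGO + BIOTIN_OLIGO
result = check_design(substrate, label_index=len(LEFT_ARM_3P_END))

if __name__ == "__main__":
    print(result)
```

Under these assumptions the labeled phosphate stays with the 4-thiouridine fragment only because the left arm ends in a nucleotide other than G. If the upstream neighbor were G, RNase T1 would carry the label away on that G.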
The first step is to use T7 transcription to synthesize the “left” arm of the substrate, which includes the 5p flanking region of the miRNA, the 5p arm of the hairpin, the loop, the 3p arm, and part of the 3p flanking region. This is the T7 template used for this purpose:

CX30.017 (E) T7-L.Arm Template
TCCGAGGCAGTAGGCAGCTGCAAACATCCGACTGAAAGCCCATCTGTGGCTTCACAGCTTCCAGT
CGAGGATGTTTACAGTCGCTCACTGTCAACAGCAATATACCTTCTTTAGCCTTCTGTTGGGTTAA
CCTATAGTGAGTCGTATTATGCATCTCTG

The transcription product was gel purified. Next, a 4-thiouridine containing RNA was purchased from Dharmacon:

CX30.020RNA
CU(4-S-U)CAAGGG

This RNA was 5′ phosphorylated with γ-32P-ATP and polynucleotide kinase, then ligated to the CX30.017 transcript by splint ligation with T4 RNA Ligase 2 and the following DNA splint from IDT.

CX30.029 3p Splint
GCTCCTAAAGTAGCCCCTTGAAGTCCGAGGCAGTA

Finally, the following RNA was purchased from Dharmacon.

CX30.028 3p Biotin Fragment
GCTACTTTAGGAGCAATTATC-3′-Biotin

This RNA was 5′ phosphorylated with cold ATP and polynucleotide kinase, then ligated to the 017+020 ligation product using T4 RNA Ligase 2 and the same DNA splint (CX30.029).

The resulting ligation product looks like this:

5′pppGGTTAACCCAACAGAAGGCTAAAGAAGGTATATTGCTGTTGACAGTGAGCGACTGTAAA
CATCCTCGACTGGAAGCTGTGAAGCCACAGATGGGCTTTCAGTCGGATGTTTGCAGCTGCCT
ACTGCCTCGGACT(4-S-U)CAAGGGGCTACTTTAGGAGCAATTATC-3′-biotin

The product was validated by RNaseH cleavage at various places in the 5p sequence, the hairpin loop, and the 3p sequence. In addition, the ligation product is cleavage-competent in Drosha/DGCR8 overexpressing lysate.

RNA-protein crosslinking and complex analysis

Optimization of 365 nm UV dose

The goal of UV dose optimization is to maximize the amount of crosslinking to proteins of interest, while minimizing other, possibly nonspecific crosslinks. In practice it seems to be difficult to distinguish between the two, so we generally choose a UV dose to achieve crosslinking just shy of maximum.
First, assemble a binding reaction under conditions where the putative protein of interest is known to bind the RNA. Presumably, conditions in which the RNA is functional should be appropriate conditions to perform the binding reaction. For the Microprocessor complex and binding to the CNNC motif, we have used the following binding and/or cleavage reaction:

  39 µl    1X Sonic Buffer
  50 µl    Microprocessor Lysate from 293T
  10 µl    10X Cleavage/Binding Buffer (1 mM Mg)
  1 µl     4-S-U containing RNA substrate (1 µM)
  100 µl   Total

Prepare a platform for crosslinking. We wrap an aluminum heat block or aluminum tube rack in parafilm, and invert it so that the binding reaction can be spotted onto a flat, parafilm-covered surface, according to this cross-sectional diagram. The advantage of using aluminum as a solid support is that the block can be preheated or prechilled to a desired temperature, and will reasonably hold this temperature during the crosslinking process.

[Cross-sectional diagram, top to bottom: binding reaction (20-100 µl droplet); Parafilm layer; aluminum block]

Spot aliquots of the reaction onto the parafilm surface in 20-100 µl volumes, taking care not to puncture the parafilm layer with the pipette tip (if the RNA is radioactive, the aluminum block will adsorb the radioactive RNA and is quite difficult to clean).

Place the block with the binding reaction droplets into the crosslinker. We use a Stratagene UV 2400 Stratalinker with 365 nm bulbs.

Crosslink the RNA. We use a constant-energy setting rather than a timer for greater consistency, particularly as individual bulbs wear down and become dimmer or fail.
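Because the crosslinker is run at constant energy, the doses from repeated exposures simply add, so a dose series can be built by withdrawing spots between exposures. The sketch below does that bookkeeping. It assumes, as noted in this protocol, that the instrument display is in units of 100 µJ with a maximum of 9999; the function names are illustrative only.

```python
MAX_DISPLAY = 9999  # largest value the energy display accepts (x 100 microjoules)


def stratalinker_setting(dose_mj):
    """Convert a dose in mJ to the display value in units of 100 microjoules."""
    setting = round(dose_mj * 1000 / 100)
    if not 0 < setting <= MAX_DISPLAY:
        raise ValueError("dose outside the range of a single exposure")
    return setting


def cumulative_doses(cycle_doses_mj):
    """Running total after each exposure cycle, starting from the 0 mJ control.

    A spot withdrawn after cycle k has received the k-th running total.
    """
    totals = [0]
    for dose in cycle_doses_mj:
        totals.append(totals[-1] + dose)
    return totals


if __name__ == "__main__":
    print(stratalinker_setting(500))          # display value for 500 mJ
    print(cumulative_doses([500, 500, 1000])) # doses available after each cycle
```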
For the UV dose titration, a good starting point is to place 5 spots of 20 µl, and expose them to UV in multiple iterations, removing one spot at a time, as described in the following table:

NB1: 500 mJ = “5000 x 100 micro Joules”
NB2: The maximum setting is about 1000 mJ = “9999 x 100 micro Joules”

  Cycle      1        2        3         4         5
  Setting    500 mJ   500 mJ   1000 mJ   1000 mJ   1000 mJ   Total Dose:
  Spot #1    X        X        X         X         X         0 mJ
  Spot #2    500 mJ   X        X         X         X         500 mJ
  Spot #3    500 mJ   500 mJ   X         X         X         1000 mJ
  Spot #4    500 mJ   500 mJ   1000 mJ   X         X         2000 mJ
  Spot #5    500 mJ   500 mJ   1000 mJ   1000 mJ   1000 mJ   4000 mJ

After each dose, remove the appropriate spot and place in a light-protected (e.g. amber) Eppendorf tube.

For each dose, aliquot half the recovered reaction (usually about 80% of the original volume, less for higher doses) into separate tubes. Keep one set of crosslinked reactions at room temperature, which will be a set of undigested controls.

For the other set of crosslinked reactions, digest the RNA in a manner such that the RNA is largely degraded, but the hot phosphate label is still associated with the crosslinked RNA-protein complex. For the CNNC crosslinking experiment described as an example, we used RNase T1. We added 0.5 µl high-concentration RNase T1 to each 20 µl reaction and incubated at room temperature for 15 minutes.

After RNase digestion, add SDS-PAGE buffer, e.g. Laemmli Buffer, to every sample and boil for 5 minutes. Run the reactions on an appropriate SDS-PAGE gel. For most crosslinking reactions, we have used Invitrogen Novex Nu-Page 4-12% Bis-Tris gels and MOPS buffer.

After separation on SDS-PAGE, expose the gel to film or phosphorimager plate. Proteins of interest should be visible as discrete, UV-dependent, RNase-resistant bands.

Large-scale purification of crosslinked complexes for analysis

The goal of this section is to purify substantial amounts of crosslinked RNA-protein complex using streptavidin-biotin affinity pulldown.
The first step is to simply increase the scale of the binding reaction, and crosslink using optimized UV dosing. If complex is being prepared for mass-spectrometry analysis, it is worth determining whether the mass-spectrometry operator is willing to accept radioactive samples. If not, it will be necessary to prepare a non-radioactive or “cold” 4-thiouridine substrate, and do experiments in parallel with both hot and cold substrates. For the CNNC binding experiment, a 10-fold scale up was sufficient for mass spectrometry, although just barely so.

  390 µl    1X Sonic Buffer
  500 µl    Microprocessor Lysate from 293T
  100 µl    10X Cleavage/Binding Buffer (1 mM Mg)
  10 µl     4-S-U containing RNA substrate (1 µM)
  1000 µl   Total

As before, half the binding reaction should be kept at room temperature without exposure to UV light. The other half should be crosslinked using previously optimized conditions. The goal of the purification is to only retain proteins that are covalently linked to the RNA in a UV-dependent manner. Consequently, we purify both crosslinked and uncrosslinked complexes; the uncrosslinked preparation is used as a negative control in downstream analysis experiments.

Wash streptavidin beads according to manufacturer’s instructions. Wash twice more with 1X Sonic buffer to equilibrate the beads.

Use enough beads so that the binding capacity is in reasonable excess over the amount of RNA used in the binding reaction. For example, in the 10X scale CNNC binding experiment, 10 pmol biotinylated RNA was used. The binding capacity of the beads is 500 pmol/mg, or 0.5 pmol RNA per µl of beads. A reasonable amount to use would therefore be 40 µl of beads (20 pmol capacity).

Add beads to binding reactions (uncrosslinked and crosslinked). Incubate at 4º, rotating or agitating, for 15 min. Magnetically pellet the beads and withdraw the unbound supernatant. We routinely keep all purification intermediates, including the unbound supernatant.
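The bead-volume estimate used above (10 pmol RNA, 0.5 pmol capacity per µl of beads, 40 µl of beads) corresponds to a two-fold excess of binding capacity. A minimal sketch of that arithmetic follows; the default excess factor and the function names are assumptions chosen to reproduce the worked example, not a manufacturer recommendation.

```python
def bead_capacity_pmol(bead_volume_ul, capacity_pmol_per_ul=0.5):
    """Biotinylated RNA (pmol) that a given volume of beads can bind."""
    return bead_volume_ul * capacity_pmol_per_ul


def bead_volume_ul(rna_pmol, capacity_pmol_per_ul=0.5, excess=2.0):
    """Bead volume giving `excess`-fold capacity over the RNA in the reaction."""
    if rna_pmol < 0 or capacity_pmol_per_ul <= 0 or excess <= 0:
        raise ValueError("inputs must be positive")
    return rna_pmol * excess / capacity_pmol_per_ul


if __name__ == "__main__":
    volume = bead_volume_ul(10)  # 10X scale CNNC experiment: 10 pmol RNA
    print(volume, "ul beads,", bead_capacity_pmol(volume), "pmol capacity")
```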
For radioactive substrates, compare counts on the beads to counts in the supernatant to ensure that most or all counts have been captured on the beads. Additional incubation time is occasionally needed to fully capture the biotinylated RNA.

Wash the beads thoroughly. Wash buffers should be at least 10-fold excess over packed bead volume, and beads should be incubated in each wash cycle for at least 15 min at room temperature, agitated or rotated. We have used the following washing pattern:

  Wash 1: 1X Laemmli buffer
  Wash 2: 1X VCA-EB (see Standard Reagents)
  Move beads to fresh light-protected tubes
  Wash 3: 1X Laemmli buffer
  Wash 4: 1X VCA-EB

After the final wash has been removed, elute the RNA-protein complex from the beads by cleaving the RNA with RNase. We use RNase T1 for our experiments; other RNases can be used subject to the conditions outlined at the beginning of this section. For the 10X scale CNNC complex pulldown, we diluted high-concentration RNase T1 1:25 in 1X Sonic Buffer and added 20 µl of diluted RNase T1 to the beads. Note that small volumes were used in order to preserve our ability to load the sample onto 1.5 mm thick SDS-PAGE gels.

Incubate at room temperature, rotating or agitating, for at least 15 min. Magnetically pellet the beads and withdraw the supernatant. If using radioactive substrate, verify that all or nearly all counts are released into the supernatant. This is the eluate containing the RNA-protein complex (and a lot of RNase).

We have primarily used the eluate for mass-spectrometry. Check with the mass-spectrometry facility or operator before purifying individual bands for this experiment. We have previously run the sample on 1.5 mm 4-12% Bis-Tris gels using MOPS buffer. Hot and cold complex purifications were carried out in parallel, and run on adjacent lanes in the gel. The radioactive lane was used to mark the gel cutting location for the cold lane, and the cold gel slices submitted for mass spectrometry.
We have tried using Invitrogen SilverQuest silver staining to visualize individual protein-RNA complex bands, but this appears to severely impede mass-spectrometric workup.

Another approach is to run the eluate on a 1 mm SDS-PAGE gel, transfer to nitrocellulose or PVDF membrane, and Western blot the eluate for candidate proteins. Due to the low amount of protein purified, we have not had much success with this approach.

If candidate proteins are available by “educated guess” or from mass-spectrometry results, a more robust approach is to immunoprecipitate the protein-RNA complex, as suggested by Sontheimer and as described below.

Candidate protein testing by immunoprecipitation of crosslinked complexes

The goal of this section is to determine whether a protein of interest binds the target RNA site (4-thiouridine location). If an antibody is available for immunoprecipitation, a reasonably robust approach is to crosslink the RNA-protein complex, immunoprecipitate the protein, and demonstrate the association of radioactive RNA with the immunoprecipitated protein.

Assemble a sufficiently large binding reaction such that immunoprecipitated protein-RNA complex can be easily visualized by phosphorimager or film exposure. You will need enough binding reaction to do immunoprecipitation with the candidate protein’s antibody, as well as an isotype control antibody. For the CNNC motif binding experiment, a 100 µl binding reaction was used (for four total immunoprecipitations).

  39 µl    1X Sonic Buffer
  50 µl    Microprocessor Lysate from 293T
  10 µl    10X Cleavage/Binding Buffer (1 mM Mg)
  1 µl     4-S-U containing RNA substrate (1 µM)
  100 µl   Total

Crosslink using optimized crosslinking conditions and recover all droplets. Divide into aliquots. For the CNNC binding experiment, four immunoprecipitations were carried out: (1) Mouse IgG, (2) Anti-FLAG M2 antibody, (3) Anti-Candidate1 antibody, and (4) Anti-Candidate2 antibody.
For each immunoprecipitation, add 1 µl high-concentration RNase T1 per 25 µl binding reaction (for other RNases, use an appropriate concentration). The RNase T1 will digest the substrate while the immunoprecipitation is ongoing.

For each immunoprecipitation, add an appropriate dilution of antibody for immunoprecipitation. For example, 1:500 (10 µg/ml) anti-FLAG is a typical dilution for immunoprecipitation. Incubate at 4º for 1 hour to overnight.

Wash Protein A or Protein G agarose beads or magnetic beads according to manufacturer’s instructions. Wash twice more with 1X Sonic Buffer to equilibrate the beads.

Add sufficient beads to bind the amount of antibody used in the immunoprecipitation. For example, Sigma Protein G agarose has a binding capacity of 8 mg IgG per ml of beads. Thus 10 µl of beads is usually more than enough to bind the added IgG.

Add beads to the immunoprecipitation mixes. Incubate one hour at 4º, agitated or rotating.

Pellet beads and pull off supernatant. Wash 2-3 times in 1X Sonic Buffer. If a more aggressive wash is desired, we have previously supplemented Sonic Buffer to 500 mM KCl (final) and 0.1% Tween-20 (final).

Elute the RNA-protein complex from the beads by adding 20 µl 1X Laemmli Buffer and boiling beads for 10 minutes. Pellet the beads and carefully pull off the eluate. Beads that carry over may clog pipette tips and make gel loading difficult.

Run the eluate on an appropriate SDS-PAGE gel, e.g. 4-12% Bis-Tris with MOPS buffer. Visualize radioactive bands by phosphorimager or film. Candidate proteins that bind the target RNA site will show a radioactive band at or near the expected size of the protein, since the immunoprecipitated protein is covalently linked to the radioactive phosphate in the RNA.
If a radioactive band is seen but is not at the target protein size, consider the possibility that the candidate protein is in a complex with another protein and the target RNA, but the candidate protein does not directly bind the RNA (i.e. the candidate protein coprecipitates with the actual binding protein, but is not directly crosslinked to the RNA).

Supplement 1: Standard Protocols

Crush-and-soak technique for acrylamide gel extraction

This technique uses a centrifuge to force a gel slice through a 22-gauge needle hole. This produces an acrylamide gel slurry, and decreases the diffusion distance between RNA molecules and the elution fluid. This technique requires that the gel slice be small enough to fit in a 0.5 ml Sarstedt Eppendorf tube.

First, the 0.5 ml tube must be prepared for use. Close the tube cap and stand the tube on its cap. Unpackage a new, unused and sterile 22-gauge needle. Attach this needle to a syringe for convenient handling. Heat the tip of the needle using a Bunsen burner to red hot. Obviously, it is important to handle the plastic portion, and not to overheat (i.e. to avoid melting the plastic and the human parts).

Carefully stab the hot needle through the very bottom of the conical portion of the tube (sitting inverted on the bench). For obvious safety reasons, it is critical to avoid holding the tube with one’s hand, as this places the offending hand at needlestick risk. Although this needle is sterile, it is likely that a needlestick with a red-hot needle will be quite painful.

After the hole has been created, carefully place the 0.5 ml tube inside a standard 1.5 ml Eppendorf tube. Minimize the amount of glove contact with the exterior of the 0.5 ml tube, since this will be in contact with the interior of the 1.5 ml tube.

Place the gel slice in the 0.5 ml Eppendorf tube. Place the entire assembly (gel slice in the 0.5 ml tube; 0.5 ml tube inside the 1.5 ml Eppendorf tube) into a centrifuge.
Centrifuge at full speed (~13,000 x g) for 1 min or until the gel slice is spun through the hole. Discard the 0.5 ml tube.

Suspend the gel particles in 500 µl of 300 mM NaCl solution, and rotate overnight for elution.

After elution is complete, filter the slurry through a Costar Spin-X filter. Precipitate the filtrate by adding 1000 µl ethanol, incubating in cold, and centrifuging as usual.

T7 transcription (“Midi” scale)

Note: 1 µl of 1 µM solution = 1 pmol solute

NB1. T7 RNA polymerase requires a double-stranded promoter region. The enzyme will extend on a single-stranded antisense template. If a PCR product is used for runoff transcription, the T7 promoter sense strand oligo should be excluded.

NB2. It is crucial that the first incorporated nucleotide be G, i.e. all T7 transcription products will begin with pppG…

Minimal T7 promoter sequence and template strand
T7 promoter       5′ TAATTACGACTCACTATA 3′
Template strand   3′ ATTAATGCTGAGTGATATCNNNNNNNNNNNNN…NNNNNN 5′

Reaction (400 µl total):
          500 pmol oligo template
  5 µl    500 pmol T7 promoter sense strand (100 µM)
 40 µl    10X House T7 Buffer
 80 µl    5X House NTP Mix
 20 µl    100 mM DTT
 10 µl    House T7 Enzyme
          H2O to 400 µl

Incubate at 37°C for 2-3 hr.
Add 10 µl Turbo DNAse (Ambion).
Incubate at 37°C for 30 min.
Add 1/20 volume 500 mM EDTA.
Add 1/10 volume 3 M NaCl.
Add 1 volume 100% ethanol.

Final:
T7 reaction mixture with whatever NTP/Salt/Mg/PPi is present
~25 mM EDTA
~75 mM NaCl (supplemental)
50% ethanol

Incubate at least 15 min at -20°C.
Spin 15 min at 4°C.

Phosphorylation of nucleic acid 5′ ends

Note: 1 µl of 1 µM solution = 1 pmol solute

Cold

NB1. For cold phosphorylation, a 1 mM final ATP concentration is used. This is sufficiently higher than Km such that PNK quantitatively phosphorylates the substrate in 15 min, as measured by PAGE shift.

NB2. For the 10X Buffer, either supplement NEB 10X PNK buffer to 1 mM ATP (final) or use 10X T4 DNA Ligase Buffer, which has the same buffer composition with 1 mM ATP (final).
Reaction (10 µl total):
          Up to 100 pmol 5′ ends
  1 µl    10X NEB T4 DNA Ligase Buffer (Yes, Ligase.)
  1 µl    T4 PNK
          H2O to 10 µl

Incubate at 37°C for 30 min.

Hot

NB3. For hot phosphorylation, the hot ATP concentration is limiting and is sub-Km.

NB4. NEN/PE ATP is 6000 Ci/mmol, 150 mCi/ml = 2.5e-5 mmol/ml = 25 µM. For quantitative phosphorylation, use less than 25 pmol 5′ ends and be prepared to wait for >1 hr.

Reaction (10 µl total):
          Up to 25 pmol 5′ ends
  1 µl    6000 Ci/mmol γ-ATP
  1 µl    10X NEB T4 PNK Buffer
  1 µl    T4 PNK
          H2O to 10 µl

Incubate at 37°C for 60 min.

Dephosphorylation of nucleic acid 5′ ends

Note: 1 µl of 1 µM solution = 1 pmol solute

NB1. Unit conditions in typical CIP reactions are sufficient to dephosphorylate recessed 5′ ends. Nevertheless, if significant secondary structure shields the 5′ end, the reaction can be carried out at 50°C; CIP is quite heat stable.

Reaction (10 µl total):
          Up to 100 pmol 5′ ends
  1 µl    10X NEBuffer 3
  1 µl    CIP
          H2O to 10 µl

Incubate at 37-50°C for 60 min.

Dephosphorylation of 2′-3′ cyclic phosphates and 3′ phosphates

NB1. Adapted from Huili Guo’s RNaseq protocol.

NB2. I have not tested this protocol to strictly define the parameters for RNA concentration.

Reaction (30 µl total):
          RNA, preferably in water
 20 µl    1.5X MES dephosphorylation buffer
  1 µl    T4 PNK
          H2O to 30 µl

Incubate at 37°C for 6 hr.

Splint ligation of RNA

Note: 1 µl of 1 µM solution = 1 pmol solute

Using T4 RNA Ligase 2 (RNL2)

NB1. T4 RNL2 is exceptionally efficient when the concentration of the nicked helical substrate is around 10 µM. Below 10 µM, yield is either low or dominated by side products.

NB2. Splint should have a Tm of 40-50°C so that the nicked substrate is stable at 37°C.

Annealing mix:
          100 pmol RNA with 5′-P (3p substrate)
          100 pmol RNA with 3′-OH (5p substrate)
          100 pmol Splint Oligo
          H2O to 8 µl

Heat to 85°C for 5 min, then air cool to room temperature.

 +1 µl    10X RNL2 Buffer
 +1 µl    T4 RNL2
          (10 µl total)

Incubate at 37°C for 4 hr or at 25°C overnight.

Optional: add 0.5 µl Turbo DNAse and incubate at 37°C for 10 min to degrade the splint.

Using T4 DNA Ligase

NB3. T4 DNA Ligase was used for the Moore and Sharp ligation protocol.
The Km of DNA Ligase is in the low nM range. Although it is less efficient than RNL2 at high substrate concentrations, it is less sensitive to substrate concentration.

NB4. Typically a ligation target is at low concentration and the other components are in excess. If this is the case, the splint oligo should be at a higher concentration than the less abundant of the two substrates.

Annealing mix:
          1-100 pmol RNA with 5′-P (3p substrate)
          1-100 pmol RNA with 3′-OH (5p substrate)
          1-100 pmol Splint Oligo
          H2O to 8 µl

Heat to 85°C for 5 min, then air cool to room temperature.

 +1 µl    10X DNA Ligase Buffer
 +1 µl    T4 DNA Ligase
          (10 µl total)

Incubate at 37°C for 4 hr or at 25°C overnight.

Optional: add 0.5 µl Turbo DNAse and incubate at 37°C for 10 min to degrade the splint.

Partial hydrolysis of RNA

NB1. Adapted from Huili Guo’s RNaseq protocol.

Reaction (20 µl total):
          RNA, preferably in water
 10 µl    2X Fragmentation Buffer
          H2O to 20 µl

Incubate at 85°C for a variable time, ranging from 30 sec to 20 min. Individual substrates should be optimized.

Chill immediately on ice and neutralize.

Neutralization option 1 (Huili Guo): Add 380 µl of 300 mM NaOAc pH 5.2.

Neutralization option 2: Add 20 µl of 1 M Tris-Cl pH 7.6. Add 360 µl of 300 mM NaCl.

Precipitate RNA by adding 1000 µl 100% ethanol.

Supplement 2: Standard Reagents

Buffers

1X Sonic Buffer
Modified from Lee and Kim, Meth. Enz. 427 (2007), with additional input from Jinju Han and V. Narry Kim.
Concentrations:
20 mM Tris-Cl pH 8.0
100 mM KCl
0.2 mM EDTA
5 mM DTT (equivalent to 0.7 µl/ml 2-mercaptoethanol); add fresh

10X Drosha Cleavage/Binding Buffer
Designed to be added to reactions assembled in 1X Sonic Buffer. Due to the large volume of Yeast Total RNA, this buffer requires assembly of 10X Sonic Buffer concentrate.
5 mM Mg version, concentrations at 10X:
1X Sonic Buffer
3 µg/µl Yeast Total RNA (Ambion)
50 mM MgCl2

1 mM Mg version, concentrations at 10X:
1X Sonic Buffer
3 µg/µl Yeast Total RNA (Ambion)
10 mM MgCl2

1.5X MES dephosphorylation buffer
Concentrations at 1.5X:
150 mM MES-NaOH, pH 5.5
15 mM MgCl2
15 mM β-mercaptoethanol (add fresh)
450 mM NaCl

10X “House” T7 Transcription Buffer
Concentrations at 10X:
400 mM Tris-Cl, pH 7.9 at 20°C
25 mM Spermidine
260 mM MgCl2
0.1% Triton X-100

Urea Loading Buffer
Concentrations at 2X:
8 M Urea
25 mM EDTA
Add bromophenol blue and xylene cyanol FF as needed.

5X “House” NTP Mix
Note: From powder, resuspend 1 gram in 2 ml of H2O. Adjust the pH empirically to pH 7 with 1 M NaOH. The equivalents of NaOH needed can be estimated from the salt form of the NTP and a charge of 3.5 at pH 7.
Concentrations at 5X:
40 mM GTP
25 mM CTP
25 mM ATP
10 mM UTP

10X “House” dNTP Mix
Concentrations at 10X:
2 mM dATP
2 mM dCTP
2 mM dGTP
2 mM dTTP

10X “House” PCR Buffer
Concentrations at 10X:
100 mM Tris pH 8.3 at 20°C
500 mM KCl
15 mM MgCl2
0.1% Gelatin

1X VCA Elution Buffer (VCA-EB)
Modified from DPB Elution buffer (DPBEB)
Concentrations at 1X:
8 M Urea
25 mM EDTA
300 mM NaCl

2X Fragmentation Buffer
Modified from Huili Guo.
Concentrations at 2X:
2 mM EDTA
10 mM Na2CO3
90 mM NaHCO3
(Final carbonate buffer pH 9.3)

Commercial products

γ-[32P]-ATP (NEN/PE)
Catalog number: NEG035C001MC
Concentration: 6000 Ci/mmol, 150 mCi/ml

α-[32P]-UTP (NEN/PE)
Catalog number: BLU007X500UC
Concentration: 800 Ci/mmol, 10 mCi/ml

T4 DNA Ligase (NEB)
Catalog number: M0202
Concentration: 400,000 cohesive end units/ml
Storage Temperature: -20°C

10X T4 DNA Ligase Buffer (NEB)
Catalog number: B0202
Concentrations at 1X: 50 mM Tris-HCl, 10 mM MgCl2, 10 mM Dithiothreitol, 1 mM ATP, pH 7.5 @ 25°C
Storage Temperature: -20°C

T4 RNA Ligase 2 (NEB)
Catalog number: M0239
Concentration: 10,000 units/ml
Storage Temperature: -20°C

10X T4 RNA Ligase 2 Buffer (NEB)
Concentrations at 1X: 50 mM Tris-HCl, 2 mM MgCl2, 1 mM DTT, 400 µM ATP, pH 7.5 @ 25°C

T4 Polynucleotide Kinase (NEB)
Catalog number: M0201
Concentration: 10,000 units/ml
Storage Temperature: -20°C

10X T4 Polynucleotide Kinase Buffer (NEB)
Concentrations at 1X: 70 mM Tris-HCl, 10 mM MgCl2, 5 mM Dithiothreitol, pH 7.6 @ 25°C

Yeast Total RNA (Ambion)
Catalog number: AM7118
Concentration: 10 mg/ml
Storage Temperature: -20°C

Lipofectamine 2000 (Invitrogen)
Catalog number: 11668-027
Storage Temperature: 4°C

Opti-MEM I Reduced Serum Medium (Invitrogen)
Catalog number: 31985-062
Storage Temperature: 4°C

Complete Mini EDTA-Free Protease Inhibitor Tablets (Roche)
Catalog number: 1836170001
Storage Temperature: 4°C

MicroSpin G-25 Columns (GE)
Catalog number: 28917922
Storage Temperature: Room Temperature
Note: G-25 columns have higher overall RNA retention than P-30 columns, and are somewhat less consistent. However, these columns reasonably pass 10-mer RNAs and are therefore useful for preparation of Decade markers.

Micro Bio-Spin 30 (aka P30) Columns (Bio-Rad)
Catalog number: 732-6250
Storage Temperature: 4°C
Note: P-30 columns are highly consistent with low overall RNA retention. The cutoff for this column is somewhere between 10 and 20 nt.
Decade Markers (Ambion)
Catalog number: AM7778
Storage Temperature: -20°C

Tri-Reagent (Ambion)
Catalog number: AM9738
Storage Temperature: 4°C

Turbo DNAse (Ambion)
Catalog number: AM2238
Concentration: 2 units/µl
Storage Temperature: -20°C

0.5 ml Microcentrifuge Tube (Sarstedt)
Catalog number: 72.699

Spin-X centrifuge tube filters (Costar / Corning)
Catalog number: 8161
Pore size: 0.22 µm

Superscript III Reverse Transcriptase (Invitrogen)
Catalog number: 18080044
Concentration: 200 units/µl
Storage Temperature: -20°C

SYBR Gold nucleic acid stain (Invitrogen)
Catalog number: S11494
Concentration: 10,000X
Storage Temperature: -20°C

Stratalinker 2400 bulbs, 365 nm (Thermo Fisher)
Catalog number: 50125580

RNase T1 (Cloned), a.k.a. High-Concentration (Ambion)
Catalog number: AM2280
Concentration: 1000 units/µl
Storage Temperature: -20°C

Dynabeads MyOne Streptavidin C1 Magnetic Beads (Invitrogen)
Catalog number: 650-02
Concentration: 10 mg/ml
Binding Capacity: 500 pmol/mg ssDNA oligo

EZview Red Protein G Affinity Gel (Sigma)
Catalog number: E3403
Binding Capacity: 8 mg/ml rabbit IgG

EZview Red Anti-FLAG M2 (Sigma)
Catalog number: F2426
Binding Capacity: 0.6 mg tagged bacterial alkaline phosphatase / ml packed slurry
Storage Temperature: -20°C

Anti-FLAG M2 Magnetic Beads (Sigma)
Catalog number: M8823
Binding Capacity: 0.6 mg tagged bacterial alkaline phosphatase / ml packed slurry
Storage Temperature: -20°C

Ultrapure BSA (Applied Biosystems)
Catalog number: AM2616
Concentration: 50 mg/ml
Storage Temperature: -20°C

3X FLAG Peptide (Sigma)
Catalog number: F4799
Storage Temperature: 4°C (lyophilized)

Immobilon-NC Membrane Filters (Millipore)
Catalog number: HATF01300
Pore size: 0.45 µm
Diameter: 13 mm
“Surfactant-Free”

Appendix 2.
Statistical methods

Contents
Overview
Calculation of relative cleavage
Calculation of the odds ratio score
Calculation of the Watson–Crick base pairing score
Calculation of Information Content

Overview

After selection and sequencing, analysis of the data is rooted in measuring frequencies of motifs in the reference pool and the selected pool. Intuitively, these frequencies reflect the preference of the Microprocessor for a given motif: if the Microprocessor prefers the motif, it would be enriched by the selection, and would be present more frequently in the selected pool than in the reference pool. By contrast, a disfavored motif would be depleted by the selection, so that the motif would be present less frequently in the selected pool than in the reference pool. The following discussion formalizes this intuition in order to quantify the relative contribution of different motifs to recognition and cleavage by the Microprocessor.

Calculation of relative cleavage

For each base at any given position, we know two values from sequencing: its frequency in the reference pool and its frequency in the selected pool. We want to estimate the selectivity of the complex for a particular base, or a particular motif. Bayes’ Theorem, written in odds form, can be used to compare one motif to one other motif at that position:

\[
\frac{P(\text{motif}_i)}{P(\text{motif}_j)} \times \frac{P(\text{cleavage} \mid \text{motif}_i)}{P(\text{cleavage} \mid \text{motif}_j)} = \frac{P(\text{motif}_i \mid \text{cleavage})}{P(\text{motif}_j \mid \text{cleavage})}
\]
(Eq. 1A: Bayesian statement for $\text{motif}_i$ vs. $\text{motif}_j$)

The first term is the ratio of $\text{motif}_i$ to $\text{motif}_j$ in the initial, unselected pool of sequences, estimated by sequencing the reference pool. The third term is the ratio of $\text{motif}_i$ to $\text{motif}_j$ in the selected pool, estimated by sequencing the cleavage/selection product. The middle term is the relative cleavage of $\text{motif}_i$ over $\text{motif}_j$ at a particular position. This term is ideally a property of the enzyme and the cleavage conditions, independent of the distribution of motifs in the initial pool.

For every $\text{motif}_i$ of length $n$, there are $4^n$ such equations, one for each possible $\text{motif}_j$ of length $n$ at that position (including the trivial comparison of $\text{motif}_i$ with itself):

\[
\begin{aligned}
\frac{P(\text{motif}_i)}{P(\text{motif}_1)} \times \frac{P(\text{cleavage} \mid \text{motif}_i)}{P(\text{cleavage} \mid \text{motif}_1)} &= \frac{P(\text{motif}_i \mid \text{cleavage})}{P(\text{motif}_1 \mid \text{cleavage})} \\
\frac{P(\text{motif}_i)}{P(\text{motif}_2)} \times \frac{P(\text{cleavage} \mid \text{motif}_i)}{P(\text{cleavage} \mid \text{motif}_2)} &= \frac{P(\text{motif}_i \mid \text{cleavage})}{P(\text{motif}_2 \mid \text{cleavage})} \\
&\;\;\vdots \\
\frac{P(\text{motif}_i)}{P(\text{motif}_i)} \times \frac{P(\text{cleavage} \mid \text{motif}_i)}{P(\text{cleavage} \mid \text{motif}_i)} &= \frac{P(\text{motif}_i \mid \text{cleavage})}{P(\text{motif}_i \mid \text{cleavage})} \\
&\;\;\vdots \\
\frac{P(\text{motif}_i)}{P(\text{motif}_{4^n})} \times \frac{P(\text{cleavage} \mid \text{motif}_i)}{P(\text{cleavage} \mid \text{motif}_{4^n})} &= \frac{P(\text{motif}_i \mid \text{cleavage})}{P(\text{motif}_{4^n} \mid \text{cleavage})}
\end{aligned}
\]

In practice, however, we rarely consider a single, isolated motif relative to another isolated motif, due to the design of the selection. In addition, the analysis occasionally necessitates comparison of degenerate motifs, e.g. Watson-Crick pairs vs. nonpairs, or wildtype vs. nonwildtype motifs. In these cases, each probability in Eq. 1A must be partitioned and expressed as a sum over submotifs. Let $I$ be a partition of $\text{motif}_i$ into individual submotifs, and let $J$ be a partition of $\text{motif}_j$ into submotifs, i.e.

\[
P(\text{motif}_i) = \sum_{I} P(\text{submotif} \in I) \quad \text{and} \quad P(\text{motif}_j) = \sum_{J} P(\text{submotif} \in J)
\]

\[
\frac{\sum_{I} P(\text{submotif} \in I) \times P(\text{cleavage} \mid \text{submotif} \in I)}{\sum_{J} P(\text{submotif} \in J) \times P(\text{cleavage} \mid \text{submotif} \in J)} = \frac{\sum_{I} P(\text{submotif} \in I \mid \text{cleavage})}{\sum_{J} P(\text{submotif} \in J \mid \text{cleavage})}
\]

Factoring,

\[
\frac{P(\text{motif}_i)}{P(\text{motif}_j)} \times \frac{\left[ \dfrac{\sum_{I} P(\text{submotif} \in I) \times P(\text{cleavage} \mid \text{submotif} \in I)}{\sum_{I} P(\text{submotif} \in I)} \right]}{\left[ \dfrac{\sum_{J} P(\text{submotif} \in J) \times P(\text{cleavage} \mid \text{submotif} \in J)}{\sum_{J} P(\text{submotif} \in J)} \right]} = \frac{P(\text{motif}_i \mid \text{cleavage})}{P(\text{motif}_j \mid \text{cleavage})}
\]
(Eq. 1B: Bayesian statement after partition of $\text{motif}_i$ and $\text{motif}_j$)

Thus, the relative cleavage (middle) term is the ratio of the average cleavage of submotifs in $I$ to the average cleavage of submotifs in $J$, weighted by the frequencies of the individual submotifs in the initial pool.

The equation is even more straightforward in three special cases:

• Probabilities of cleavage are roughly equal between submotifs. This case is useful in this particular analysis, because each sequence is unique by stipulation. Thus, in order to consider any motif at all, we may reasonably assume that sequences outside the motif contribute equally to cleavage. Similarly, when considering the relative cleavage of Watson-Crick pairs vs. nonpairs, we assume for the sake of argument that individual pairs are equivalent, and that individual nonpairs are equivalent.

• Submotifs are approximately equiprobable in the initial pool. This case primarily occurs when comparing motifs composed exclusively of wildtype bases to motifs composed of nonwildtype bases, e.g. CNNC vs. (not-C)NN(not-C). The initial pool was designed such that nonwildtype bases are equiprobable, and sequencing of the initial pool shows nonwildtype bases are indeed nearly equiprobable.

• Probabilities of cleavage and submotif frequencies are independent. We cannot rely on this case in much of the relevant analysis, since motifs composed of wildtype nucleotides are more frequent in the pool, and are more likely to have higher probabilities of cleavage than nonwildtype motifs, assuming that the wildtype sequence has evolved in nature to optimize cleavage. Nevertheless, in this condition the weighted average converges to the arithmetic average as the number of submotifs increases. A simulation of 2000 combinations of independent submotif frequencies and cleavage probabilities demonstrates this:

[Figure: ratio of (weighted average) / (arithmetic average), y-axis ticks from 0.25 to 8, plotted against the number of submotifs, x-axis ticks from 16 to 1048576.]

In any of these possibly overlapping situations, the equation collapses to:

\[
\frac{P(\text{motif}_i)}{P(\text{motif}_j)} \times \frac{\left[ \dfrac{\sum_{I} P(\text{cleavage} \mid \text{submotif} \in I)}{|I|} \right]}{\left[ \dfrac{\sum_{J} P(\text{cleavage} \mid \text{submotif} \in J)}{|J|} \right]} = \frac{P(\text{motif}_i \mid \text{cleavage})}{P(\text{motif}_j \mid \text{cleavage})}
\]
(Eq. 1C: Bayesian statement after equipartition of $\text{motif}_i$ and $\text{motif}_j$)

and the relative cleavage (middle) term is simply the ratio of (unweighted) average cleavages of the submotifs.

Calculation of the odds ratio score

The calculation of relative cleavage requires us to nominate a specific reference motif ($\text{motif}_j$). In situations where there is no obvious reference motif, and when rapidly screening for enriched motifs, it is convenient to consider a single motif at a time, and simply compare it to the aggregation of all other motifs. We use a score called the “odds ratio.” In principle, our formulation is:

\[
\frac{P(\text{motif}_i)}{1 - P(\text{motif}_i)} \times \text{Odds Ratio} = \frac{P(\text{motif}_i \mid \text{cleavage})}{1 - P(\text{motif}_i \mid \text{cleavage})}
\]
(Eq. 2A: “Odds Ratio” measure of relative cleavage for $\text{motif}_i$ over all other motifs)

This is a specific case of Eq. 1B, rewritten with the set $I$ as before, and $I^c$, the complement of $I$, composed of all submotifs of length $n$ that are not members of set $I$:

\[
\frac{P(\text{motif}_i)}{1 - P(\text{motif}_i)} \times \frac{\left[ \dfrac{\sum_{I} P(\text{submotif} \in I) \times P(\text{cleavage} \mid \text{submotif} \in I)}{\sum_{I} P(\text{submotif} \in I)} \right]}{\left[ \dfrac{\sum_{I^c} P(\text{submotif} \in I^c) \times P(\text{cleavage} \mid \text{submotif} \in I^c)}{\sum_{I^c} P(\text{submotif} \in I^c)} \right]} = \frac{P(\text{motif}_i \mid \text{cleavage})}{1 - P(\text{motif}_i \mid \text{cleavage})}
\]
(Eq. 2B: Reformulation of Eq. 1B for the Odds Ratio)

Thus, the odds ratio is the ratio of the average cleavage of submotifs in $I$ to the average cleavage of submotifs in $I^c$, weighted by the frequencies of the individual submotifs in the initial pool. The odds ratio can be further simplified under the same special conditions as the relative cleavage measure.

One important caveat is that odds ratios cannot strictly be used to compare two motifs. To see why this is so, consider two motifs of length $n$ at the same position, $\text{motif}_i$ and $\text{motif}_j$.

\[
\text{Odds Ratio}_i = \frac{\left[ \dfrac{\sum_{I} P(\text{submotif} \in I) \times P(\text{cleavage} \mid \text{submotif} \in I)}{\sum_{I} P(\text{submotif} \in I)} \right]}{\left[ \dfrac{\sum_{I^c} P(\text{submotif} \in I^c) \times P(\text{cleavage} \mid \text{submotif} \in I^c)}{\sum_{I^c} P(\text{submotif} \in I^c)} \right]}
\]

\[
\text{Odds Ratio}_j = \frac{\left[ \dfrac{\sum_{J} P(\text{submotif} \in J) \times P(\text{cleavage} \mid \text{submotif} \in J)}{\sum_{J} P(\text{submotif} \in J)} \right]}{\left[ \dfrac{\sum_{J^c} P(\text{submotif} \in J^c) \times P(\text{cleavage} \mid \text{submotif} \in J^c)}{\sum_{J^c} P(\text{submotif} \in J^c)} \right]}
\]

Because the sets $I$ and $J$ are mutually exclusive, it follows that $I^c \neq J^c$ and the denominators are not equal. Therefore, when directly comparing two motifs, the relative cleavage score (Eq. 1B) is superior.

However, the Odds Ratio is still useful for considering many motifs at the same time. Since

\[
I^c = J^c + J - I
\]

it is intuitive that, as the number of motifs being considered increases,

\[
P(\text{motif}_i) \to 0 \quad \text{and} \quad P(\text{motif}_j) \to 0
\]

\[
\frac{\sum_{I^c} P(\text{submotif} \in I^c) \times P(\text{cleavage} \mid \text{submotif} \in I^c)}{\sum_{I^c} P(\text{submotif} \in I^c)} \to \frac{\sum_{J^c} P(\text{submotif} \in J^c) \times P(\text{cleavage} \mid \text{submotif} \in J^c)}{\sum_{J^c} P(\text{submotif} \in J^c)}
\]

and the ability to compare two odds ratio scores improves accordingly.
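As a rough numerical illustration of Eq. 1A and Eq. 2A, the sketch below computes a relative cleavage and two odds ratios from motif counts in a reference pool and a selected pool. The counts, the function names, and the single-base "motifs" are all hypothetical and chosen only to make the arithmetic easy to follow; this is not the analysis code used for the selection data.

```python
def frequencies(counts):
    """Convert raw motif counts from one sequenced pool into fractions."""
    total = sum(counts.values())
    return {motif: n / total for motif, n in counts.items()}


def relative_cleavage(ref, sel, motif_i, motif_j):
    """Middle term of Eq. 1A: post-selection ratio over pre-selection ratio."""
    return (sel[motif_i] / sel[motif_j]) / (ref[motif_i] / ref[motif_j])


def odds_ratio(ref, sel, motif_i):
    """Eq. 2A solved for the odds ratio of motif_i vs. all other motifs."""
    prior_odds = ref[motif_i] / (1 - ref[motif_i])
    posterior_odds = sel[motif_i] / (1 - sel[motif_i])
    return posterior_odds / prior_odds


# Hypothetical counts at one position; the wildtype base A dominates the pool.
reference = frequencies({"A": 700, "C": 100, "G": 100, "U": 100})
selected = frequencies({"A": 2800, "C": 200, "G": 100, "U": 100})

rc_a_vs_c = relative_cleavage(reference, selected, "A", "C")  # about 2
or_a = odds_ratio(reference, selected, "A")                   # about 3
or_c = odds_ratio(reference, selected, "C")                   # about 0.6
print(rc_a_vs_c, or_a, or_c, or_a / or_c)
```

In this toy pool the ratio of the two odds ratios (about 5) differs from the relative cleavage of A over C (about 2), which is the caveat described above: each odds ratio is normalized to a different complement set.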
Calculation of the Watson–Crick base pairing score

To screen specifically for Watson–Crick pairing between all possible combinations of randomized positions, we used a scoring metric that compares the geometric average of odds ratios for Watson–Crick pairs to that of odds ratios for non-Watson–Crick pairs. The score has no fundamental meaning, and simply serves to identify position pairs where Watson–Crick nucleotide identities are, on average, more preferred than non-Watson–Crick identities. It is also a useful metric for prioritizing position pairs for follow-up analysis.

\[
\text{Pairing score} = \left( \prod_{\text{Watson–Crick}} \text{Odds ratio} \right)^{1/4} - \left( \prod_{\text{non-Watson–Crick}} \text{Odds ratio} \right)^{1/12}
\]

Calculation of Information Content

To calculate the information content at each position, we use Bayes’ Theorem to infer the distribution of bases after selection from a completely random pool, then calculate an information content score based on the post-selection distribution.

We can use Bayes’ Theorem to infer how a distribution of the four bases changes after selection, for any initial pool. Because we are considering all four bases at once, we must consider the relative cleavage of any given base vs. the other three bases. For clarity, the formula is shown for one base, A. According to Bayes’ Theorem for Total Probability:

\[
\begin{aligned}
P(\text{base} \to A \text{ after selection}) &= P(\text{base} \to A \mid \text{cleavage}) \\
&= \frac{P(\text{cleavage} \mid \text{base} \to A) \times P(\text{base} \to A)}{\left[ \begin{aligned} &P(\text{cleavage} \mid \text{base} \to A) \times P(\text{base} \to A) \\ &+ P(\text{cleavage} \mid \text{base} \to C) \times P(\text{base} \to C) \\ &+ P(\text{cleavage} \mid \text{base} \to G) \times P(\text{base} \to G) \\ &+ P(\text{cleavage} \mid \text{base} \to U) \times P(\text{base} \to U) \end{aligned} \right]}
\end{aligned}
\]

Reorganization shows:

\[
P(\text{base} \to A \mid \text{cleavage}) = \left[ 1 + \frac{P(\text{cleavage} \mid \text{base} \to C)}{P(\text{cleavage} \mid \text{base} \to A)} \times \frac{P(\text{base} \to C)}{P(\text{base} \to A)} + \frac{P(\text{cleavage} \mid \text{base} \to G)}{P(\text{cleavage} \mid \text{base} \to A)} \times \frac{P(\text{base} \to G)}{P(\text{base} \to A)} + \frac{P(\text{cleavage} \mid \text{base} \to U)}{P(\text{cleavage} \mid \text{base} \to A)} \times \frac{P(\text{base} \to U)}{P(\text{base} \to A)} \right]^{-1}
\]

In other words, the new probability depends on the relative cleavage of A vs. the other nucleotides, and the relative abundance of A vs. the other nucleotides in the initial pool.
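The procedure outlined in this section (infer the post-selection base distribution for a completely random pool, then score its information content) can be sketched numerically as follows. The pool frequencies and function names are hypothetical, the choice of A as the comparison base is arbitrary, and the information score is written in the standard $2 + \sum p \log_2 p$ form for a four-letter alphabet; the sketch assumes every base has a nonzero frequency in the reference pool and that the comparison base survives selection.

```python
import math

BASES = "ACGU"


def relative_cleavage_vs(ref, sel, base, anchor="A"):
    """Relative cleavage of `base` over `anchor`, from Eq. 1A."""
    return (sel[base] / sel[anchor]) / (ref[base] / ref[anchor])


def inferred_from_random_pool(ref, sel, anchor="A"):
    """Base distribution expected after selection from an equiprobable pool.

    With equal priors, each posterior is proportional to the base's
    relative cleavage, so normalizing the relative cleavages suffices.
    """
    rel = {b: relative_cleavage_vs(ref, sel, b, anchor) for b in BASES}
    total = sum(rel.values())
    return {b: rel[b] / total for b in BASES}


def information_content(p):
    """Information (bits) at one position: 2 + sum of p * log2(p)."""
    return 2 + sum(q * math.log2(q) for q in p.values() if q > 0)


# Hypothetical observed frequencies at one position.
reference = {"A": 0.7, "C": 0.1, "G": 0.1, "U": 0.1}
selected = {"A": 0.875, "C": 0.0625, "G": 0.03125, "U": 0.03125}

inferred = inferred_from_random_pool(reference, selected)
print(inferred)                       # A near 0.5, C near 0.25, G and U near 0.125
print(information_content(inferred))  # near 0.25 bits
```

Normalizing against a uniform pool removes the bias introduced by the over-represented wildtype base, so the inferred distribution reflects only the relative cleavage of the four bases.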
We can calculate the relative cleavage of one base vs. another, e.g. A vs. C, using the selection data and Eq. 1A.

\[
\left( \begin{array}{c} \text{Ratio of C to A} \\ \text{(pre-selection)} \end{array} \right) \times \left( \begin{array}{c} \text{Relative cleavage} \\ \text{C vs. A} \end{array} \right) = \left( \begin{array}{c} \text{Ratio of C to A} \\ \text{(post-selection)} \end{array} \right)
\]

\[
\frac{P(\text{base} \to C)}{P(\text{base} \to A)} \times \frac{P(\text{cleavage} \mid \text{base} \to C)}{P(\text{cleavage} \mid \text{base} \to A)} = \frac{P(\text{base} \to C \mid \text{cleavage})}{P(\text{base} \to A \mid \text{cleavage})}
\]

\[
\frac{P(\text{cleavage} \mid \text{base} \to C)}{P(\text{cleavage} \mid \text{base} \to A)} = \frac{\left[ \dfrac{P(\text{base} \to C \mid \text{cleavage})}{P(\text{base} \to A \mid \text{cleavage})} \right]}{\left[ \dfrac{P(\text{base} \to C)}{P(\text{base} \to A)} \right]}
\]

For a totally random initial pool, the bases are equiprobable, i.e.

\[
P(\text{base} \to A) = P(\text{base} \to C) = P(\text{base} \to G) = P(\text{base} \to U)
\]

Hence:

\[
P_{\text{inferred } A} = P(\text{base} \to A \mid \text{cleavage}) = \left[ 1 + \frac{P(\text{cleavage} \mid \text{base} \to C)}{P(\text{cleavage} \mid \text{base} \to A)} + \frac{P(\text{cleavage} \mid \text{base} \to G)}{P(\text{cleavage} \mid \text{base} \to A)} + \frac{P(\text{cleavage} \mid \text{base} \to U)}{P(\text{cleavage} \mid \text{base} \to A)} \right]^{-1}
\]

This is the inferred fraction of A bases after selection from a random pool. Knowing this fraction, we can calculate the information “contribution” for that base:

\[
I_A = P_{\text{inferred } A} \times \left( \log_2 \left( P_{\text{inferred } A} \right) + 2 \right)
\]

And the total information at that position is:

\[
I_{\text{total}} = 2 + \sum_{\text{all bases}} P_{\text{inferred}} \times \log_2 \left( P_{\text{inferred}} \right)
\]

Appendix 3. Mammalian microRNAs: experimental evaluation of novel and previously annotated genes

H. Rosaria Chiang1,2, Lori W. Schoenfeld1,2, J. Graham Ruby3, Vincent C. Auyeung1,2,4, Noah Spies1,2, Daehyun Baek1,2, Wendy K. Johnston1,2, Carsten Russ5, Shujun Luo6, Joshua E. Babiarz7, Robert Blelloch7, Gary P. Schroth6, Chad Nusbaum5, David P. Bartel1,2

1 Whitehead Institute for Biomedical Research, Cambridge, MA 02142, USA
2 Howard Hughes Medical Institute and Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
3 Department of Biochemistry and Biophysics, University of California San Francisco, San Francisco, CA 94158, USA
4 Harvard-MIT Division of Health Sciences and Technology, Cambridge, MA 02139, USA
5 Broad Institute of MIT and Harvard, Cambridge, MA 02141, USA
6 Illumina, Inc., Hayward, CA 94545, USA
7 Institute for Regeneration Medicine, Center for Reproductive Sciences, and Department of Urology, University of California San Francisco, San Francisco, CA 94143, USA

Published as: Chiang, et al. Mammalian microRNAs: experimental evaluation of novel and previously annotated genes. Genes and Development. 2010 May 15;24(10):992-1009.

Contribution: V.C.A. performed the analysis of RNA editing.

Mammalian microRNAs: experimental evaluation of novel and previously annotated genes

H. Rosaria Chiang,1,2 Lori W. Schoenfeld,1,2 J. Graham Ruby,1,2,7 Vincent C. Auyeung,1,2,3 Noah Spies,1,2 Daehyun Baek,1,2 Wendy K. Johnston,1,2 Carsten Russ,4 Shujun Luo,5 Joshua E. Babiarz,6 Robert Blelloch,6 Gary P. Schroth,5 Chad Nusbaum,4 and David P. Bartel1,2,8

1 Whitehead Institute for Biomedical Research, Cambridge, Massachusetts 02142, USA; 2 Howard Hughes Medical Institute and Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA; 3 Harvard-Massachusetts Institute of Technology Division of Health Sciences and Technology, Cambridge, Massachusetts 02139, USA; 4 Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, Massachusetts 02141, USA; 5 Illumina, Inc., Hayward, California 94545, USA; 6 Institute for Regeneration Medicine, Center for Reproductive Sciences, and Department of Urology, University of California at San Francisco, San Francisco, California 94143, USA

MicroRNAs (miRNAs) are small regulatory RNAs that derive from distinctive hairpin transcripts. To learn more about the miRNAs of mammals, we sequenced 60 million small RNAs from mouse brain, ovary, testes, embryonic stem cells, three embryonic stages, and whole newborns. Analysis of these sequences confirmed 398 annotated miRNA genes and identified 108 novel miRNA genes. More than 150 previously annotated miRNAs and hundreds of candidates failed to yield sequenced RNAs with miRNA-like features. Ectopically expressing these previously proposed miRNA hairpins also did not yield small RNAs, whereas ectopically expressing the confirmed and newly identified hairpins usually did yield small RNAs with the classical miRNA features, including dependence on the Drosha endonuclease for processing. These experiments, which suggest that previous estimates of conserved mammalian miRNAs were inflated, provide a substantially revised list of confidently identified murine miRNAs from which to infer the general features of mammalian miRNAs.
Our analyses also revealed new aspects of miRNA biogenesis and modification, including tissue-specific strand preferences, sequential Dicer cleavage of a metazoan precursor miRNA (pre-miRNA), consequential 5′ heterogeneity, newly identified instances of miRNA editing, and evidence for widespread pre-miRNA uridylation reminiscent of miRNA regulation by Lin28.

[Keywords: MicroRNA; miRNA biogenesis; noncoding RNA genes; high-throughput sequencing]

Supplemental material is available at http://www.genesdev.org.

Received November 11, 2009; revised version accepted March 19, 2010.

7 Present address: Department of Biochemistry and Biophysics, University of California San Francisco, San Francisco, CA 94158, USA.
8 Corresponding author. E-MAIL dbartel@wi.mit.edu; FAX (617) 258-6768.
Article published online ahead of print. Article and publication date are online at http://www.genesdev.org/cgi/doi/10.1101/gad.1884710.
GENES & DEVELOPMENT 24:992–1009 © 2010 by Cold Spring Harbor Laboratory Press ISSN 0890-9369/10; www.genesdev.org

MicroRNAs (miRNAs) are endogenous ~22-nucleotide (nt) RNAs that post-transcriptionally regulate gene expression (Bartel 2004). miRNAs mature through three intermediates: a primary miRNA transcript (pri-miRNA), a precursor miRNA (pre-miRNA), and a miRNA:miRNA* duplex. RNA Polymerase II transcribes the pri-miRNA, which contains one or more segments that each fold into an imperfect hairpin. For canonical metazoan miRNAs, the RNase III enzyme Drosha together with its partner, the RNA-binding protein DGCR8, recognize the hairpin, and Drosha cleaves both strands ~11 base pairs (bp) from the base of the stem (Han et al. 2006). The cut leaves a 5′ phosphate and 2-nt 3′ overhang (Lee et al. 2003). The liberated pre-miRNA hairpin is then exported to the cytoplasm by Exportin-5 (Yi et al. 2003; Lund et al. 2004). There, the RNase III enzyme Dicer cleaves off the loop of the pre-miRNA, ~22 nt from the Drosha cut (Lee et al. 2003), again leaving a 5′ monophosphate and 2-nt 3′ overhang. The resulting miRNA:miRNA* duplex, comprised of ~22-nt strands from each arm of the original hairpin, then associates with an Argonaute protein such that the miRNA strand is usually the one that becomes stably incorporated, while the miRNA* strand dissociates and is degraded.

In addition to canonical miRNAs, some miRNAs mature through pathways that bypass Drosha/DGCR8 recognition and cleavage. Members of the mirtron subclass of pre-miRNAs are excised as intron lariats from the pri-miRNA by the spliceosome and, following debranching, fold into Dicer substrates (Okamura et al. 2007; Ruby et al. 2007a). For some mirtrons, known as tailed mirtrons, a longer intron is excised such that only one end of the pre-miRNA is generated by the spliceosome, whereas the other end of the pre-miRNA matures through the Drosha-independent trimming of a 5′ or 3′ tail (Ruby et al. 2007a; Babiarz et al. 2008). Members of another subclass of pre-miRNAs, called endogenous shRNAs, are suitable Dicer substrates without preprocessing by either Drosha or the spliceosome (Babiarz et al. 2008).

Other small silencing RNAs are generated from the sequential processing of long hairpins or long bimolecular duplexes. These small RNAs are classified as endogenous siRNAs rather than miRNAs because they derive from extended duplexes that produce many different small RNA species, whereas miRNAs derive from distinctive hairpins that produce one or two dominant species (Bartel 2004).

The first indication of the abundance of miRNA genes came from sequencing small RNAs from mammals, flies, and worms (Lagos-Quintana et al. 2001; Lau et al. 2001; Lee and Ambros 2001). Hundreds of mammalian miRNAs have been identified by Sanger sequencing of cloned small RNA-derived cDNAs (Lagos-Quintana et al. 2001, 2002, 2003; Houbaviy et al. 2003; Berezikov et al. 2006b; Landgraf et al. 2007).
Some miRNAs, however, are expressed only in a limited number of cells or through a limited portion of development, and their rarity makes them difficult to detect. Computational methods have been used to identify mammalian miRNAs initially missed by sequencing, and some of these predicted miRNAs have been evaluated experimentally, e.g., by rapid amplification of cDNA ends (RACE) (Lim et al. 2003; Xie et al. 2005), hybridization to RNA blots (Berezikov et al. 2005), microarrays (Bentwich et al. 2005), and RNA-primed array-based Klenow extension (RAKE) (Berezikov et al. 2006b). Each of these experimental methods, however, can yield false positives. Indeed, recent work in invertebrates and plants (Rajagopalan et al. 2006; Ruby et al. 2006, 2007b) has shown that the fraction of erroneously annotated miRNAs can be quite high, depending on the quality of the initial computational predictions. Even when miRNA genes are predicted correctly, the resolution of the prediction is often insufficient to confidently determine the precise 5′ end of the mature miRNA. Because miRNAs repress target mRNAs by pairing to the seed sequence, which is defined relative to the position of the miRNA 5′ end, single-nucleotide resolution of 5′-end annotations is required for useful downstream analysis of their physiological consequences (Bartel 2009).

Another approach for finding miRNAs and other small RNAs missed in the early discovery efforts is high-throughput sequencing (Lu et al. 2005). In mammals, high-throughput sequencing methods that have contributed to miRNA discovery efforts have included massively parallel signature sequencing (MPSS) (Mineno et al. 2006), miRNA serial analysis of gene expression (miRAGE) (Cummins et al. 2006), 454 pyrosequencing (Berezikov et al. 2006a, 2007; Calabrese et al. 2007), and Illumina sequencing (Babiarz et al. 2008; Kuchenbauer et al. 2008).

Here we use the Illumina sequencing-by-synthesis platform (Seo et al. 2004) for miRNA discovery in mice.
Analyses of these reads, combined with experimental evaluation of newly identified miRNAs as well as previous annotations, led us to substantially revise the set of confidently identified murine miRNAs, thereby providing a more accurate picture of the general features of mammalian miRNAs and their abundance in the genome. In addition, our results revealed new aspects of miRNA biogenesis and modification, including tissue-specific strand preferences, sequential Dicer cleavage of a metazoan pre-miRNA, cases of consequential 5′ heterogeneity, newly identified instances of miRNA editing, and widespread pre-miRNA uridylation reminiscent of Lin28-like miRNA regulation.

Results

We sequenced small-RNA libraries from three mouse tissues (brain, ovary, and testes) as well as embryonic day 7.5 (E7.5), E9.5, E12.5, and newborn. Combining these data with data collected similarly from mouse embryonic stem (ES) cells (Babiarz et al. 2008) yielded 28.7 million reads between 16 nt and 27 nt in length that perfectly matched the mouse genome assembly (Supplemental Table 1). Of these reads, 79.3% mapped to miRNA hairpins, and 7.1% mapped to other annotated noncoding RNA genes (Supplemental Table 2). Because the sequencing protocol was selective for RNAs with 5′ monophosphate and 3′ hydroxyl groups, this dominance of miRNA species was expected (Lau et al. 2001).

miRNA gene discovery

As when analyzing high-throughput data from invertebrates (Ruby et al. 2006, 2007b; Grimson et al. 2008), we identified miRNA genes in mice by applying the following criteria: (1) expression of the candidate miRNA, with a relatively uniform 5′ terminus; (2) pairing characteristics of the predicted hairpin; (3) absence of annotation suggesting non-miRNA biogenesis; (4) absence of proximal reads suggesting that the candidate is a degradation intermediate; and (5) presence of reads corresponding to a miRNA* species with potential to pair to the miRNA candidate with ~2-nt 3′ overhangs.
Using a low-stringency genomic search strategy that considered the first four criteria, 736 miRNA candidates were identified from the total data set of mouse reads. Manual inspection of these candidates, focusing on all five criteria, narrowed the list to 465 canonical miRNA genes, 377 of which were already annotated in miRBase version 14.0 (Griffiths-Jones 2004) and 88 of which were novel (Fig. 1A; Supplemental Fig. S1; Supplemental Table 3). We also found 14 mirtrons (including 10 tailed mirtrons), four of which were already annotated, and 16 endogenous shRNAs, six of which were annotated previously (Fig. 1B). When added to the 88 novel canonical miRNA genes, the newly identified mirtrons and shRNAs raised the total number of novel genes to 108. Of these 108 genes, 36 appeared to be close paralogs of previously annotated miRNA genes (most of which were paralogs of mir-466, mir-467, or mir-669), producing miRNA reads that were identical to the previously annotated miRNAs, creating ambiguity as to which loci contributed to the sequenced reads. Most of these close paralogs (35 of 36), as well as 14 other novel loci, were clustered with annotated miRNAs. The 72 novel genes with reads distinguishable from those of previously identified genes were expressed at a lower level than the previously annotated genes (median read counts 27 and 8206, respectively), and, compared with previously annotated miRNAs, a higher fraction of these novel miRNAs were located within introns of annotated RefSeq (Pruitt et al. 2005) mRNAs (47% and 26%, respectively).

Figure 1. Mouse miRNAs and candidates initially identified by high-throughput sequencing. (A) Overlap between previously annotated miRNA hairpins (miRBase version 14.0; green), miRNA candidates identified in the current study, and the subset of these candidates that met our criteria for classification as confidently identified canonical miRNAs (red). Additional considerations increased the number of confidently identified canonical miRNAs to 475. (B) Overlap between previously annotated mirtrons and shRNAs and the mirtrons and shRNAs supported by our study, colored as in A.

Experimental evaluation of unconfirmed miRNAs

Of 564 miRBase-annotated miRNA genes (including four confirmed mirtrons and six confirmed shRNAs) that map to the mm8 genome assembly, 157 annotated miRNAs did not pass the filters for miRNA candidates (Fig. 1A,B; Supplemental Fig. S1; Supplemental Table 4). Of these 157, 26 mapped to annotated rRNA and tRNA loci, 52 had no reads mapping to them, and another 72 had some reads but in numbers deemed insufficient for confident annotation. The remaining seven either had reads with very heterogeneous 5′ ends, which suggested nonspecific degradation of a non-pri-miRNA transcript (mir-464, mir-1937a, and mir-1937b); had many reads that mapped well into the loop of the putative hairpin, which were inconsistent with Dicer processing (mir-451, mir-469, and mir-805); or did not give a predicted fold with the requisite pairing involving the candidate and predicted miRNA* (mir-484) (Supplemental Fig. S2). For five of these seven, we have no reason to suspect that they might be authentic miRNA genes. Among the remaining two, mir-484 might be regarded as a miRNA candidate because manual refolding was able to generate a hairpin with the requisite pairing, but, even so, this candidate lacked reads for the predicted miRNA*. miR-451 is a noncanonical miRNA generated from an unusual hairpin without production of a miRNA:miRNA* duplex (S Cheloufi and G Hannon, pers. comm.). We do not suspect that any other annotated miRNA genes failed to pass our filters for the same reason as mir-451.
An additional 20 annotated miRNA hairpins were in our set of candidates but failed the manual inspection because they lacked predicted miRNA* reads even after allowing for alternate hairpin structures. Hundreds of candidates from other miRNA discovery efforts (Xie et al. 2005; Berezikov et al. 2006b) also failed to pass the filters, usually because no reads mapped to them.

One of the annotated miRNA genes missing from our data sets was mir-220, which had been predicted computationally using MiRscan as a miRNA gene candidate conserved in humans, mice, and fish, and was supported experimentally using RACE analysis of zebrafish small RNAs (Lim et al. 2003). In contrast, the other 37 miRNAs newly annotated by Lim et al. (2003) were among our confirmed miRNAs. The absence of mir-220 in our data sets might have reflected either very low expression in the sequenced samples or inaccuracy of its annotation. Similarly, mir-207, annotated in a contemporaneous study that cloned novel miRNAs from mouse tissues, was missing from our data set, but another 27 miRNAs annotated from that study were confirmed (Lagos-Quintana et al. 2003).

To evaluate whether the missing annotated miRNAs and candidates represented authentic miRNAs, we developed a moderate-throughput assay to examine if their respective hairpins could be processed as miRNAs in cultured cells (Fig. 2A). If these putative miRNAs were missing from our data sets because they were not expressed in the sequenced tissues or stages, we reasoned that they would probably be detected in cells ectopically expressing their respective hairpins, because most authentic miRNAs are processed correctly from heterologous transcripts that include the full hairpin flanked by ~100 nt of genomic sequence on each side of the hairpin (Chen et al. 2004; Voorhoeve et al. 2006).
Alternatively, if these putative miRNAs were missing because they were not authentic miRNAs and therefore lacked the features needed for Drosha and Dicer processing, they would not be sequenced from cells ectopically expressing their hairpins.

To evaluate many hairpins simultaneously, we transfected pools of hairpin-expressing constructs into HEK293T cells and isolated small RNAs for high-throughput sequencing. The performance of 26 positive controls, chosen from canonical human/mouse miRNAs confirmed by our sequencing from mice, illustrated the value of the assay. For all but one of these controls, miRNA and miRNA* reads were more abundant in the cells ectopically expressing the hairpin than in the cells without the hairpin constructs (Fig. 2B–D; Supplemental Figs. S3, S4). For example, both hsa-miR-193b and mmu-miR-137 (from humans and mice, respectively) were >10-fold overexpressed (Fig. 2B). The positive controls included genes of tissue-specific miRNAs, including mir-122 (liver), mir-133 (muscle), mir-223 (neutrophil), and several neuron-specific miRNAs, with the idea that hairpins of tissue-specific miRNAs might require tissue-specific factors for their processing, and therefore might be sensitive to the potential absence of such factors in HEK293T cells. Differences were observed, ranging from ~100 to 10,000 reads above the control transfection (Fig. 2C, hsa-mir-214 and hsa-mir-9-1, respectively), consistent with the idea that factors absent in HEK293T cells might play a role in processing of some miRNAs. Alternatively, some miRNA hairpins might be processed less efficiently in all cell types, perhaps because our vectors might not present the hairpins in an optimal context for processing. Perhaps hsa-mir-192, the control gene that did not overexpress in our assay, lacked crucial processing determinants needed in all cells. In either scenario, the very high sensitivity of high-throughput sequencing enabled miRNAs to be observed from most of the less efficiently processed hairpins.

Figure 2. Experimental evaluation of annotated miRNAs and previously proposed candidates. (A) Schematic of the expression vector transfected into HEK293T cells. (B) Examples of the standard ectopic expression assay, transfecting plasmids indicated in the key. Reads from the control transfection (no hairpin plasmid) were from endogenous expression in HEK293T cells. (C) Assay results for annotated human miRNAs and published candidates. Bars are colored as in B; asterisks indicate detectable overexpression (≥1 read from both the anticipated miRNA and miRNA*, with miRNA and miRNA* combined expressed more than threefold over endogenous levels). (D) Assay results for unconfirmed annotated mouse miRNAs and published candidates. Mouse controls were selected from miRNAs that were sequenced from our mouse samples. Bars are colored as in B; detectable overexpression is indicated (asterisks). Shown are the results compiled from two experiments (Supplemental Figs. S3, S4).

From the 52 annotated mouse miRNAs that our study did not sequence, 17 miRNAs, including mir-220 and mir-207, were tested in the ectopic expression assay. One, mir-698, generated a single read corresponding to the annotated miRNA, and the rest failed to generate any reads representing the annotated miRNA (Fig. 2D). From the 72 annotated miRNAs that we could not identify due to insufficient number of reads, 28 were tested, and only four of these were found to be overexpressed (Fig. 2D). The difficulty in overexpressing a canonical control miRNA (hsa-miR-192) illustrates that our ectopic expression assay cannot be used to prove conclusively that a particular hairpin does not represent an authentic miRNA gene.
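As a rough illustration of the overexpression call described in the Figure 2 legend (at least one read from both the miRNA and the miRNA*, with the combined reads more than threefold over endogenous levels), the check might be written as below. This is a minimal sketch rather than the authors' code: the function name, the argument names, and the assumption that read counts from the hairpin and control transfections have already been normalized to be comparable are all introduced here for illustration.

```python
def detectable_overexpression(mirna, star, ctrl_mirna, ctrl_star, fold=3.0):
    """Return True if a transfected hairpin meets the overexpression call.

    mirna, star           -- reads for the anticipated miRNA and miRNA* in cells
                             transfected with the hairpin plasmid
    ctrl_mirna, ctrl_star -- the same species in the control transfection
                             (endogenous expression)
    fold                  -- required excess over endogenous levels

    Assumption: all counts are already normalized between libraries.
    """
    # Both the miRNA and the miRNA* must be observed at least once.
    has_both_species = mirna >= 1 and star >= 1
    # Compare combined reads by multiplication, which avoids dividing by
    # zero when nothing is detected endogenously.
    exceeds_endogenous = (mirna + star) > fold * (ctrl_mirna + ctrl_star)
    return has_both_species and exceeds_endogenous


# Hypothetical counts, for illustration only.
print(detectable_overexpression(120, 15, 10, 2))  # both species present, 135 vs 3 x 12
print(detectable_overexpression(50, 0, 1, 0))     # no miRNA* read
```

Under this rule, a hairpin that yields many miRNA reads but no miRNA* read is not called overexpressed, which matches the emphasis on miRNA* evidence elsewhere in the annotation criteria.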
However, the inability to overexpress each of the 17 unsequenced miRNAs, as well as most of the 28 insufficiently sequenced miRNAs, strongly indicated that, overall, these annotations have been faulty, and that our failure to detect previously annotated miRNAs in mouse samples was not merely due to inadequate sequencing coverage.

We also tested 10 of the 20 annotated miRNA genes that we identified as candidates but did not confidently classify as miRNA genes because the predicted miRNA* species was not sequenced. Four of seven genes without a miRNA* read and one of three genes with substantially offset miRNA* reads produced the predicted miRNA* species in our ectopic expression assay (Fig. 2D). mir-184 and mir-489, both of which tested positive in this assay, are conserved. mir-184 is conserved throughout mammals, and mir-489 is conserved to chicken, although the miRNA seed, which is highly conserved in mammals and chickens, differs in mice and rats. Thus, these two genes, as well as mir-875, which is a broadly conserved gene without a miRNA* read, were added to our set of confidently identified miRNA genes. Also added were mir-290, mir-291a, mir-291b, mir-292, mir-293, mir-294, and mir-295, which were missing in the genome assembly (mm8) used in our analysis because they fall in the region of the genome that is difficult to assemble. Including these 10 genes, plus mir-451, brings the total number of confidently identified miRNA genes to 506, which includes 475 canonical genes.

Our sets of confirmed and novel murine miRNAs also provided the opportunity to evaluate results of more recent computational efforts to find miRNAs conserved among mammals. One set of studies predicted miRNAs based on phylogenetic conservation, and then tested these and additional murine-specific hairpins using RAKE and cloning (Berezikov et al. 2005, 2006b).
Among the 322 candidates supported by these experiments, 11 were in our sets of miRNAs (two in our confirmed set, and nine in our novel set), and another nine did not satisfy our annotation criteria but had at least one read consistent with the predictions. Another study started with MiRscan predictions conserved in four mammals, and filtered these predictions for potential seed pairing to conserved motifs in 3′ untranslated regions (UTRs) (Xie et al. 2005). Of their 144 final candidates, 45 were paralogs of miRNAs already published at the time of prediction. Of the remaining 99 candidates, 27 were in our sets of miRNAs (26 in our confirmed set and one in our novel set), and one did not satisfy our annotation criteria but had three reads consistent with the miRNA* of the predicted miRNA. However, only four of the 27 confirmed miRNA genes (4% of the 99 novel predictions) gave rise to the mature miRNA with the predicted seed, suggesting that filtering MiRscan predictions for potential seed pairing provided little, if any, added benefit. This conclusion concurs with a recent analysis of miRNA targeting: miRNAs that are not conserved beyond mammals do not have enough preferentially conserved sites to place these sites as among the most conserved UTR motifs (Friedman et al. 2009). Therefore, it stands to reason that preferentially conserved UTR motifs would provide little value for predicting such miRNAs.

To investigate whether the computational candidates might have been missed because of low expression in tissues and stages from which we sequenced, we included representatives from each study in our ectopic expression assay. We randomly selected 12 Xie et al. (2005) candidates and eight Berezikov et al. (2006b) candidates that our study did not sequence, as well as four human candidates from the Berezikov et al. (2005) set whose mouse orthologs were not sequenced. None generated reads representing the candidate miRNAs (Fig. 2C,D).
Taken together, our results raise new questions regarding the authenticity of these candidates, and suggest that previous extrapolation from these candidates, which had suggested that mammals have a surprisingly high number of conserved miRNA genes (as many as 1000) (Berezikov et al. 2005), should be revised accordingly.

Experimental evaluation of novel miRNAs and new candidates

We also used the ectopic expression assay to evaluate novel miRNAs identified from our sequencing. Of the 25 evaluated hairpins, 18 (72%) generated a significant number of miRNA-like reads in HEK293T cells, indicating that most, although perhaps not all, of our 108 novel annotations represented authentic miRNAs (Fig. 3; Supplemental Figs. S5, S6). These 25 hairpins were selected arbitrarily for evaluation, except for a preference for rare miRNAs; i.e., those that had <10 mature miRNA reads. The rare miRNAs and the higher-abundance miRNAs performed similarly (five of seven and 11 of 14 positives, respectively). To evaluate Drosha and Dicer dependence of the overexpressed hairpins, the experiment was repeated with and without a plasmid encoding a dominant-negative allele of either Drosha or Dicer (Fig. 3A; Han et al. 2009). All but two canonical miRNA controls and most of the novel canonical miRNAs (16 of 17) responded to TNdrosha coexpression (Fig. 3B; Supplemental Fig. S7). Fewer responded to TNdicer, suggesting that this construct was less disruptive of normal miRNA processing (Supplemental Fig. S7).

The tested hairpins included several noncanonical miRNA precursors. The level of mmu-miR-1224, an annotated mirtronic miRNA (Berezikov et al. 2007), increased in the presence of TNdrosha, as expected if this pre-miRNA had more access to Exportin-5 and Dicer when the canonical pre-miRNAs were reduced (Grimm et al. 2006). Although mmu-miR-1839, an annotated shRNA (Babiarz et al. 2008), did not overexpress, mmu-miR-344e and mmu-miR-344f, novel shRNAs, did overexpress from our vector, and, as expected for shRNAs, their biogenesis was Drosha-independent (Fig. 3B; Supplemental Figs. S5–S7). Repeating the ectopic expression assay in Dicer knockout and control cells confirmed that mmu-miR-344e biogenesis was Dicer-dependent (data not shown).

We also evaluated our candidates that had not satisfied our criteria for confident annotation as miRNAs, usually because they lacked reads representing the predicted miRNA*. We tested three sets of these candidates. One set represented our candidates that lacked predicted miRNA* reads, yet, based on small RNA sequencing results from wild-type and mutant ES cells (Babiarz et al. 2008), appeared DGCR8- and Dicer-dependent. Another set represented candidates that appeared conserved in syntenic regions of other mammalian genomes, and the third set was selected at random from among the remaining candidates. All but one of the 28 tested candidates failed to generate miRNA-like reads, and the processing of the candidate that did generate miRNA-like reads in HEK293T cells was not dependent on Dicer, based on its presence in Dicer knockout ES cells (Babiarz et al. 2008).

The results evaluating the novel miRNAs and candidates illustrated the importance of requiring a convincing miRNA* read as a criterion for confident miRNA annotation. Five previously annotated miRNAs that were initially rejected due to lack of a convincing miRNA* read had tested positive in our overexpression assay (Fig. 2D), which indicated that this criterion was too stringent for some of the previously annotated genes. However, the results for the newly identified miRNAs and candidates showed that the presence of a convincing miRNA* read was the primary criterion that distinguished the novel canonical miRNAs (most of which tested positive) from the remaining candidates (nearly all of which tested negative).
By requiring a convincing miRNA* read in addition to the other four annotation criteria, our approach accurately distinguished miRNA reads from the millions of other small RNA reads generated by high-throughput sequencing, with relatively few false positives among the novel annotations and few false negatives among the rejected candidates.

Figure 3. Experimental evaluation of novel miRNAs and candidates. (A) Examples of assays evaluating Drosha dependence, transfecting plasmids indicated in the key. (B) Assay results for control miRNAs, novel miRNAs, and miRNA candidates. Bars are colored as in A; detectable overexpression (black asterisks), overexpression attempted but not detected (black minus sign), detectable Drosha dependence (orange asterisks), and Drosha dependence assayed but not detected (orange minus sign) are all indicated. Shown are the results compiled from three experiments (Supplemental Figs. S5–S7).

miRNA expression profiles

To compare expression levels of each miRNA in different sequenced samples, we constructed relative miRNA expression profiles (Fig. 4; Supplemental Table 5), and to compare the relative expression of various miRNAs with each other, we generated a table of overall miRNA abundance (Supplemental Table 5). Most miRNAs had substantially stronger expression in some tissues or stages than in others, in agreement with previous observations (Wienholds et al. 2005). We expect that strong tissue- or stage-specific expression preferences inferred from our limited sample set will be revised as more tissues and stages are surveyed.

Figure 4. miRNA relative expression profiles. Profiles of mature miRNAs were constructed as described (Ruby et al. 2007b). The relative contribution of each miRNA from each sample and the sum of the normalized reads of all samples are provided (Supplemental Table 5).
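One way to picture the relative expression profiles and the overall abundance table is sketched below. The text states only that profiles were constructed as described previously (Ruby et al. 2007b), so the details here are assumptions made for illustration: each library is normalized to its own total miRNA reads, the normalized values for a miRNA are summed across samples to give its overall abundance, and each sample's share of that sum is taken as its relative contribution. The function name and the toy counts are hypothetical.

```python
def relative_profiles(counts):
    """Build relative expression profiles from raw read counts.

    counts: {sample: {mirna: raw_reads}}
    Returns (profiles, abundance), where
      profiles[mirna][sample] is the sample's share of the miRNA's
                              normalized reads (shares sum to 1), and
      abundance[mirna]        is the sum of normalized reads over samples.
    """
    # Step 1 (assumed normalization): express each count as a fraction
    # of the miRNA reads in its own library.
    normalized = {}
    for sample, reads in counts.items():
        total = sum(reads.values())
        if total == 0:
            raise ValueError(f"sample {sample!r} has no miRNA reads")
        normalized[sample] = {m: n / total for m, n in reads.items()}

    # Step 2: per miRNA, sum across samples and compute relative shares.
    mirnas = sorted({m for reads in counts.values() for m in reads})
    profiles, abundance = {}, {}
    for m in mirnas:
        per_sample = {s: normalized[s].get(m, 0.0) for s in counts}
        abundance[m] = sum(per_sample.values())
        if abundance[m] > 0:
            profiles[m] = {s: v / abundance[m] for s, v in per_sample.items()}
        else:
            profiles[m] = dict(per_sample)
    return profiles, abundance


# Toy counts with placeholder miRNA names, for illustration only.
toy = {
    "brain": {"miR-A": 90, "miR-B": 10},
    "ovary": {"miR-A": 30, "miR-B": 70},
}
profiles, abundance = relative_profiles(toy)
print(profiles["miR-A"])
print(abundance)
```

Normalizing within each library first keeps a deeply sequenced sample from dominating the profile simply because it contributed more reads.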
General features of mammalian miRNAs

Our analyses of high-throughput sequencing data and subsequent experimental evaluation reshaped the set of known murine miRNAs, setting aside 173 questionable annotations and adding 108 novel miRNA genes to bring the total number of confidently identified murine genes to 506. A majority (60%) of the 506 genes appeared conserved in other mammals (Supplemental Fig. S1; Supplemental Table 6). However, only 15 of the 108 novel miRNA genes were conserved in other mammals, suggesting that the number of nonconserved miRNA genes will soon surpass that of conserved ones as high-throughput sequencing is applied more deeply and more broadly.

Five novel miRNAs (mir-3065, mir-3071, mir-3074-1, mir-3074-2, and mir-3111) mapped to the antisense strand of previously annotated miRNAs (mir-338, mir-136, mir-24-1, mir-24-2, and mir-374, respectively), which, when added to the previously identified mir-1-2/mir-1-2-as pair, brings the total number of sense/antisense miRNA pairs to six. In addition, the mir-486 hairpin has a palindromic sequence, which resulted in the same reads mapping to both the sense (mir-486) and antisense (mir-3107) hairpins. Analysis of the antisense loci of all 498 miRNA genes identified six additional loci that gave rise to some antisense reads resembling miRNAs (antisense loci of mir-21, mir-126, mir-150, mir-337, mir-434, and mir-3073). As more high-throughput data is acquired, these as well as other antisense loci are likely to be annotated as miRNA genes. However, <0.00002 of our miRNA reads corresponded to miRNAs from antisense loci (excluding the reads mapping ambiguously to mir-486/mir-3107), raising the possibility that none of the murine antisense miRNAs have a function comparable with that of miR-iab-as in flies (Bender 2008; Stark et al. 2008; Tyler et al. 2008).
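The gene tallies quoted in this and the preceding sections can be cross-checked with a few lines of arithmetic. The snippet below is only a bookkeeping sketch: every number is taken from the text, but the variable names and the way the 173 set-aside annotations are decomposed (157 that failed the candidate filters plus 20 that failed manual inspection, minus mir-184, mir-489, mir-875, and mir-451, which were reinstated) are one reading that reproduces the reported totals, not a breakdown stated by the authors.

```python
# Confidently identified canonical genes after manual inspection.
canonical_annotated, canonical_novel = 377, 88
canonical_initial = canonical_annotated + canonical_novel            # 465

# Mirtrons and shRNAs: totals found, and how many were already annotated.
mirtrons_total, mirtrons_annotated = 14, 4
shrnas_total, shrnas_annotated = 16, 6

novel_total = (canonical_novel
               + (mirtrons_total - mirtrons_annotated)
               + (shrnas_total - shrnas_annotated))                  # 108
close_paralogs = 36
distinguishable_novel = novel_total - close_paralogs                 # 72

# Genes added later: mir-184, mir-489, mir-875, plus the seven-gene
# mir-290 to mir-295 group that was missing from the mm8 assembly.
reinstated = 3
missing_from_assembly = 7
canonical_final = canonical_initial + reinstated + missing_from_assembly  # 475

# mir-451 is counted separately as a noncanonical miRNA.
total_confident = canonical_final + mirtrons_total + shrnas_total + 1     # 506

# Annotations set aside (assumed decomposition, see lead-in).
failed_filters, failed_inspection = 157, 20
set_aside = failed_filters + failed_inspection - reinstated - 1           # 173

print(canonical_initial, novel_total, distinguishable_novel)
print(canonical_final, total_confident, set_aside)
```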
Our substantially revised set of miRNA genes provided the opportunity to speak to the general features of 475 canonical miRNAs in mice, with the properties of the 295 conserved genes applying also to the conserved genes of humans and other mammals (Table 1). Most canonical miRNA genes (61%) were clustered in the genome, falling within 50 kb of another miRNA gene, on the same genomic strand. Even when excluding the four known megaclusters (Calabrese et al. 2007), which are on chromosomes 2, 12 (two clusters), and X (with 69, 35, 16, and 18 genes, respectively), a sizable fraction of the remaining genes (153 of 337) were in clusters of two to seven genes. As observed in humans (Baskerville and Bartel 2005), miRNAs from these loci within 50 kb of each other tended to have correlated expression, consistent with their processing from polycistronic pri-miRNA transcripts (Supplemental Fig. S8). In a scenario of one transcript per cluster, the 475 canonical miRNA genes would derive from 245 transcription units. In addition, many miRNA hairpins mapped to introns. Just over a third (38%) of the hairpins fell within introns of annotated mRNAs. Several lines of evidence, including coexpression correlations, chromatin marks, and directed experiments, indicate that miRNAs can be processed from introns (Baskerville and Bartel 2005; Kim and Kim 2007; Marson et al. 2008). In this scenario, as many as 107 (44%) of the 245 transcription units could double as pre-mRNAs.

Table 1. Properties of canonical miRNAs

                                  Total   Conserved   Nonconserved
Hairpins                            475         295            180
Cluster analysis
  In clusters                       291         163            128
  In small clusters                 153         129             24
  In large clusters                 138          34            104
  Not in clusters                   184         132             52
Intron overlap
  In introns (same strand)          180          77            103
  Opposite introns                   22          18              4
  Not in introns                    273         200             73
Arm preferences
  With miRNA from 5′ arm            202         137             65
  With miRNA from 3′ arm            141         102             39
  With miRNAs from both arms        132          56             76
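The clustering rule described above (a gene counts as clustered when it falls within 50 kb of another miRNA gene on the same genomic strand) can be sketched as a single-linkage grouping over sorted coordinates. This is a hypothetical illustration, not the authors' pipeline: the tuple layout, the use of one coordinate per hairpin, and the function name are assumptions, and the gene names and positions in the example are invented.

```python
from collections import defaultdict


def cluster_genes(genes, max_gap=50_000):
    """Group miRNA genes into clusters.

    genes: iterable of (name, chromosome, strand, position) tuples, where
           position is a single representative coordinate (assumption).
    Two neighbouring genes on the same chromosome and strand are linked
    when their positions differ by at most max_gap. Returns a list of
    clusters, each a list of gene names; singletons are returned as
    one-gene clusters, so len(result) is the number of putative
    transcription units under a one-transcript-per-cluster scenario.
    """
    by_strand = defaultdict(list)
    for name, chrom, strand, pos in genes:
        by_strand[(chrom, strand)].append((pos, name))

    clusters = []
    for members in by_strand.values():
        members.sort()
        current = [members[0][1]]
        last_pos = members[0][0]
        for pos, name in members[1:]:
            if pos - last_pos <= max_gap:
                current.append(name)       # close enough: extend the cluster
            else:
                clusters.append(current)   # gap too large: start a new one
                current = [name]
            last_pos = pos
        clusters.append(current)
    return clusters


# Invented coordinates, for illustration only.
example = [
    ("a", "chr1", "+", 1_000),
    ("b", "chr1", "+", 40_000),
    ("c", "chr1", "+", 85_000),
    ("d", "chr1", "-", 41_000),   # opposite strand, so not linked to a, b, c
    ("e", "chr2", "+", 1_000),
    ("f", "chr1", "+", 500_000),
]
clusters = cluster_genes(example)
clustered = sum(len(c) for c in clusters if len(c) > 1)
print(len(clusters), "transcription units;", clustered, "clustered genes")
```

Because the linkage is transitive, genes a and c end up in one cluster even though they are more than 50 kb apart, as long as an intermediate gene bridges them.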
Other hairpins were found within transcripts that lacked other annotated functions, falling either within introns or exons, or in transcripts without evidence of splicing.

miRNA hairpins are generally thought to each give rise to a single dominant mature guide RNA. This was usually the case for the murine miRNAs, although, as in other species, this result relied on grouping together as a single functional species all the isoforms that share the same 5′ terminus. This grouping is justified based on the current understanding of miRNA target recognition, which stipulates that heterogeneity often observed at miRNA 3′ termini should have no effect on miRNA target recognition (Bartel 2009). Most mature miRNA reads (97%) were 20–24 nt in length, with 20mer, 21mer, 22mer, 23mer, and 24mer comprising 5%, 19%, 47%, 21%, and 4% of the reads, respectively (Supplemental Fig. S9).

Although a single dominant mature species appears to be the most frequent outcome of miRNA biogenesis, some miRNA hairpins give rise to two or more species that each could function to target different sets of mRNAs. This expanded targeting potential arises from multiple mechanisms, including utilization of both strands of the miRNA:miRNA* duplex with similar frequency, 5′ heterogeneity, sequential Dicer cleavage, and RNA editing. Addition of untemplated nucleotides to the 3′ termini of the miRNAs can also occur, and although not thought to change targeting specificity, these changes could indicate post-transcriptional regulation of miRNA stability. Occurrence of each of these phenomena is described below.

miRNAs from both arms, with occasional tissue-specific differences in the preferred arm

Most canonical miRNA genes produced one dominant mature miRNA species, from either the 5′ or 3′ arm of the pre-miRNA hairpin, with an overall tendency to derive from the 5′ arm (Table 1), as reported for previously annotated human miRNAs (Hu et al. 2009).
Some, however, yielded a similar number of reads from both arms, suggesting that the two species enter the silencing complex with similar frequencies. For these genes, mature species from the 5′ and 3′ arms were annotated using the -5p and -3p suffixes, as is conventional in such cases (Griffiths-Jones 2004). Discrimination favoring one arm over the other was less pronounced for both the nonconserved miRNAs and the less highly expressed miRNAs (Fig. 5A), although for the miRNAs with very few reads this trend was likely enhanced by our requirement for a miRNA* read. Overall, the discrimination was high, with the species from the less dominant arm comprising 4.1% of the reads that map to a miRNA or miRNA*. For the 10 most abundant miRNAs (sampling just the most abundant member in cases of repetitive miRNAs), discrimination was even higher, with the less dominant arm comprising only 1.3% of the reads. Nevertheless, the miRNA* species of these more highly expressed miRNAs were sequenced at a median frequency 13-fold greater than that of the median nonconserved miRNA, suggesting that a search for biological function for these miRNA* species might be at least as fruitful as that for the poorly expressed nonconserved miRNAs.

If the mature miRNA accumulated preferentially from one arm of the pre-miRNA hairpin, the preferred arm generally remained consistent across the various libraries. For a few miRNAs, however, the preferred arms switched between samples (Fig. 5B), as reported previously using PCR-based miRNA quantification (Ro et al. 2007). For example, miR-142-5p was sequenced more frequently in ovary, testes, and brain, and miR-142-3p was sequenced more frequently in embryonic and newborn samples. These results imply a developmental switch in targeting preferences. A similar arm-switching phenomenon has been reported for a sponge miRNA (Grimson et al. 2008), and was observed for 20 other nonrepetitive mouse miRNA genes (Fig. 5B).

Figure 5. Reads from both arms of a hairpin, and sequential reads from the same arm. (A) Fraction and abundance of miRNA reads from each miRNA hairpin. To calculate the fraction, the miRNA reads were divided by the total number of miRNA and miRNA* reads, considering on each arm only the major 5′ terminus. The dashed lines indicate the median fraction of miRNA reads and the median number of miRNA reads for conserved (red) and nonconserved (blue) miRNAs. (B) Switching of the dominant arm in different samples. For each sample, the fold enrichment of miRNA reads produced from the 5′ arm over those produced from the 3′ arm and vice versa was calculated. Shown are results for nonrepetitive miRNAs that switch dominant arms, with at least a fivefold differential between two samples. The samples are color-coded (key), and an asterisk indicates samples with statistically significant enrichment of miRNAs produced from one arm over the other (P < 0.05, χ² test). (C) Sequential Dicer cleavage. Predicted secondary structure of mmu-mir-3102 pre-miRNA (Hofacker et al. 1994).

Sequential Dicer cleavage of a mirtron hairpin

In plants, a few pri-miRNA hairpins with long, continuous RNA duplexes are cleaved sequentially by Dicer to generate two adjacent miRNA:miRNA* duplexes (Kurihara and Watanabe 2004; Rajagopalan et al. 2006). Those precursors bear little resemblance to the shorter, imperfectly base-paired hairpins of metazoan miRNA genes. In mice, similar precursors are found in the form of hairpin siRNA (hp-siRNA) precursors, but their expression appears to be limited to germline tissues and totipotent ES cells, which lack a robust interferon response to intracellular dsRNA (Babiarz et al. 2008; Tam et al. 2008; Watanabe et al. 2008).
However, we detected two miRNA:miRNA* duplexes deriving from the mmu-mir-3102 pre-miRNA hairpin, an apparent mirtron as evidenced by reads mapping to both boundaries of an intron (Fig. 5C; Supplemental Table 3). After splicing and debranching, the excised intron was predicted to fold into a 104-nt pre-miRNA hairpin, substantially longer than the average pre-miRNA length of 61 nt (calculated from the set of confirmed miRNAs). Reads from this locus suggested that Dicer cleaved this pre-miRNA twice, with the first cut generating the outer miRNA:miRNA* duplex and the second cut generating the inner miRNA:miRNA* duplex (Fig. 5C). The inner miRNA (miR-3102.2-3p) was among a set of proposed miRNA candidates (Berezikov et al. 2006b), but the most frequently sequenced species from this hairpin was the outer miRNA (miR-3102.1) (Fig. 5C). Of the 16 genomes examined, the extended mir-3102 hairpin with both the inner and outer miRNAs appeared conserved only in rats, although the orthologous loci in cows, dogs, and humans also could fold into shorter hairpins, with miR-3102.1 potentially conserved in cows.

We suspect that it is more than a coincidence that the single metazoan example of a sequentially diced miRNA is initially processed by the spliceosome rather than by Drosha. One way to explain this observation is that DGCR8/Drosha interacts directly with the loop of pri-miRNA stem–loops when recognizing its substrates (Zeng et al. 2005), and that the lack of sequentially diced Drosha-dependent miRNA hairpins in animals reflects the limited reach of this complex.

5′ Heterogeneity

Most conserved miRNAs had very precise 5′ processing, with alternative 5′ isoforms comprising only 8% of all miRNA reads (Fig. 6A,B). These results, analogous to those observed in worms and flies (Ruby et al.
2006, 2007b), are consistent with the idea that selective pressure to avoid off-targeting acts to optimize precision of the cleavage event that produces the 5′ terminus of the dominant species so as to prevent a consequential number of molecules with seed sequences in the wrong register. Moreover, 5′ termini of conserved miRNAs were more precise than those of miRNA* reads (4% and 12% offset reads, respectively, excluding those that produce comparable numbers of small RNAs from each arm). For cases in which Dicer produced the 5′ terminus of the miRNA, the Dicer cut appeared somewhat more precise than the Drosha cut (5% offset reads for miRNAs on the 3′ arm, compared with 7% offset reads for miRNA* on the 5′ arm), hinting that features of the pre-miRNA structure may supplement the distance from the Drosha cut as determinants of Dicer cleavage specificity (Ruby et al. 2006, 2007b).

A few miRNAs had less uniform 5′ termini (Fig. 6A,B). For some miRNAs, 5′ heterogeneity has been documented previously (Ruby et al. 2007b; Stark et al. 2007; Azuma-Mukai et al. 2008; Wu et al. 2009), the most prominent example being hsa-miR-124, a conserved neuronal miRNA for which the 5′-shifted isoform was initially annotated as the miRNA and eventually replaced by the more prominent isoform following more extensive sequencing (Lagos-Quintana et al. 2002; Landgraf et al. 2007). Another prominent miRNA with unusually diverse 5′ termini was miR-133a. This conserved miRNA, which is highly expressed in heart and muscle, had a second dominant isoform (miR-133a.2) that was shifted 1 nt downstream from the annotated miRNA (miR-133a.1) (Fig. 6C; Supplemental Table 3). To test whether this heterogeneity might be explained by differential processing of the two mir-133a paralogous hairpins, as observed for the two Drosophila mir-2 hairpins (Ruby et al. 2007b), we tested the two mir-133a hairpins in our ectopic expression assay.
Although mir-133a-1 was somewhat more prone to produce the miR-133a.2 isoform, both hairpins produced a substantial amount of both isoforms (Fig. 6C). To investigate the functional consequences of miRNA 5′ heterogeneity, we examined published array data showing the responses of mRNAs after deleting either mir-223, a miRNA with substantial heterogeneity, or mir-155, a miRNA with little heterogeneity. miR-223 is highly expressed in neutrophils, and analysis of small RNA sequences from isolated neutrophils (Baek et al. 2008) was consistent with our sequencing results (Supplemental Table 3) in showing 5′ heterogeneity, with 81% of the reads mapping to the 5′ end of the major isoform miRNA and 12% mapping to the 5′ end of a second isoform that was shifted by 1 nt in the 3′ direction (Fig. 6D). As expected, mRNAs with canonical 7–8mer sites (Bartel 2009) matching the seed of the major isoform were significantly derepressed in the mir-223 deletion mutant (P < 10⁻¹², Kolmogorov–Smirnov [K–S] test, compared with the no-site distribution). mRNAs with canonical sites matching the minor isoform also showed a significant tendency to be derepressed, albeit to a lesser degree (P = 0.0022 × 10⁻⁷, 0.013 × 10⁻⁷, and 1.7 × 10⁻⁷, for 8mer, 7mer-m8, and 7–8mers combined, respectively) (Fig. 6D). This result could not be attributed to the overlap between sites matching the major and minor isoforms because all mRNAs with a 6mer seed match to the major isoform (ACUGAC) were excluded, and additional analyses ruled out participation of the “shifted 6mer” match (Friedman et al. 2009) to the major isoform (AACUGA) (Supplemental Fig. S10A). Analogous analysis of miR-155 yielded strong evidence for function of the major isoform (Rodriguez et al. 2007) but no sign of function for the minor isoform, which comprised very few (1%) of our miR-155 reads (Fig. 6E; Supplemental Table 3).
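The comparisons above rest on the two-sample K–S test, whose statistic is the largest vertical gap between two empirical cumulative distributions. The following is a minimal pure-Python sketch of that statistic only; it does not compute a P value, the function name `ks_statistic` is hypothetical, and the fold-change values are invented for illustration and are not data from this study.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample K-S statistic: max |ECDF_a(x) - ECDF_b(x)| over observed x."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    largest_gap = 0.0
    # Both ECDFs are step functions, so the supremum is reached at a data point.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)  # fraction of sample_a <= x
        cdf_b = bisect_right(b, x) / len(b)  # fraction of sample_b <= x
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap


# Invented log2 fold changes: messages with a site vs. messages with no site.
with_site = [0.10, 0.25, 0.30, 0.45, 0.60]
no_site = [-0.20, -0.05, 0.00, 0.05, 0.15]
print(ks_statistic(with_site, no_site))
```

A statistic near 1 means the two distributions barely overlap, and a value near 0 means they are nearly identical; converting it to a P value additionally requires the sample sizes.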
Taken together, our results show that some miRNAs have alternative 5′ miRNA isoforms that are expressed at levels sufficient to direct the repression of a distinct set of endogenous targets and thereby broaden the regulatory impact of the miRNA genes. Therefore, we suggest that, rather than choosing one isoform over the other for annotation as the authentic miRNA, more of these alternative isoforms should be annotated, with the expectation that, for some highly expressed miRNAs, more than one 5′ isoform contributes to miRNA function.

Figure 6. miRNAs with 5′ heterogeneity. (A) The distribution of conserved (red) and nonconserved (blue) miRNAs with reads ≤5 nt offset at their 5′ terminus. (B) The fraction of offset reads and abundance of reads for each miRNA hairpin, colored as in A. The dashed lines indicate the median level of reads for conserved (red) and nonconserved (blue) miRNAs. (C) 5′ Heterogeneity of miR-133a. Data from mouse heart (Rao et al. 2009) and newborn are mapped to the mir-133a-1 hairpin (top), and data from the ectopic expression assay are mapped to the indicated transfected hairpin (bottom). The lines indicate miR-133a.1 (dark blue) and miR-133a.2 (light blue), and red nucleotides indicate those that differ between mir-133a-1 and mir-133a-2. (D) Effect of losing miR-223 on messages with 3′ UTR sites for miR-223 major and minor isoforms. (Top) Small RNA sequencing data from mouse neutrophils (Baek et al. 2008) were mapped to the mir-223 hairpin as in C. For each set of messages with the indicated 3′ UTR site for miR-223 (major isoform sites, bottom left; minor isoform sites, bottom right), the fraction that changed at least to the degree indicated following loss of miR-223 is plotted, using data published for neutrophils differentiated in vivo (Baek et al. 2008). (E) Effect of losing miR-155 on messages with 3′ UTR sites for miR-155 major and minor isoforms, plotted as in D using published data from T cells (Rodriguez et al. 2007). (Top) Sequencing data from our study are mapped to the mir-155 hairpin as in C. The mRNAs with 8mer and 7mer-A1 sites for the minor isoform were excluded from the analysis because these sites overlapped with 7mer-m8 sites for the major isoform.

RNA editing

RNA editing in which adenosine is deaminated and thereby converted to inosine (I) has been reported for some miRNA precursors (Blow et al. 2006; Landgraf et al. 2007; Kawahara et al. 2008). Because I pairs with C, such edits could change miRNA target recognition. Reasoning that the mammalian adenosine deaminases (ADARs) responsible for A-to-I editing are expressed primarily in the brain, we searched for sequencing reads from the brain that did not match the genome and had as their closest match a mature miRNA or miRNA*. After filtering for mismatches occurring >2 nt from the 3′ end, a step taken to avoid considering instances of untemplated 3′-terminal addition, only 4% of the reads had single mismatches to the genome (Supplemental Fig. S11A). Moreover, the fraction of sequences with A-to-G changes (indicative of A-to-I editing) was only 0.61%, a fraction resembling that of other mismatches (Supplemental Fig. S11A). This fraction was also similar to that of the A-to-G changes in our synthetic internal standards used for preparing the sequencing libraries. These results indicate that mature edited miRNAs are very rare and difficult to distinguish above the background level of sequencing errors. The low frequency of editing in mature miRNAs was consistent with the findings that edited processed miRNAs are more than fourfold less common in mice relative to humans (Landgraf et al. 2007), and are less common than edited miRNA precursors (Kawahara et al. 2008). The latter observation might be due to rapid degradation or impaired processing, which has been shown for miR-142 (Yang et al.
2006) and miR-151 (Kawahara et al. 2007a). Although editing did not appear to be a widespread phenomenon among all mature miRNAs, editing at specific sites might still be important for a few individual miRNAs. To investigate this possibility, mismatch fractions were calculated as the fraction of reads bearing a particular mismatch over all reads covering that genomic position. For each library, a change was considered significant if the fraction exceeded 5% and at least 10 reads contained the mismatch. Additional filters designed to remove sequencing errors, alignment artifacts, and instances of untemplated nucleotide addition preferentially retained A-to-G changes while removing nearly all other events (Supplemental Fig. S11B). Sixteen A-to-G events passed the filters and subsequent manual examination, all of which occurred only in the brain library (Table 2). Five of these inferred editing sites were also observed in a low-throughput sequencing effort in human brain samples (Kawahara et al. 2008), indicating that editing of some miRNAs is conserved between mammals. Consistent with that study, eight of 16 editing sites occurred in a UAG motif. A separate examination of read alignments with up to three mismatches showed that the vast majority of edited reads were edited at one position, suggesting that either editing of multiple sites in the same RNA molecule is rare, or multiply edited RNAs are degraded more rapidly. A-to-I editing of a seed nucleotide would dramatically affect targeting. In addition to editing in the miR-376 cluster described previously (Kawahara et al. 2007b, 2008), we found another eight miRNAs that are edited within the seed of either the miRNA or the miRNA*. A-to-I editing could also affect miRNA loading, and thereby indirectly affect targeting. Indeed, the editing of miR-540 might help explain why the 5′ arm is more abundant in the brain than in other tissues, although editing is too infrequent to fully explain the switch in strand bias. Altering Drosha and Dicer processing could also indirectly affect targeting. Analysis of 5′ ends showed that seven of 16 instances of editing were associated with a statistically significant (P < 0.05) shift in the 5′ nucleotide, presumably due to changes in the Drosha and Dicer cleavage site (Supplemental Fig. S11D).

Table 2. Inferred A-to-I editing sites in miRNAs

miRNA           Position   Fraction edited
miR-219-2-3p    15         0.064
miR-337-3p      10         0.062
miR-376a*       4          0.297
miR-376b-3p     6          0.501
miR-376c        6          0.311
miR-378         16         0.087
miR-379*        5          0.095
miR-381         4          0.125
miR-411-5p      5          0.239
miR-421         14         0.054
miR-467d        3          0.094
miR-497         2          0.104
miR-497*        20         0.699
miR-540*        3          0.080
miR-1251        6          0.431
miR-3099        7          0.209

Untemplated nucleotide addition

Much more prevalent than editing of internal nucleotides was addition of untemplated nucleotides to miRNA 3′ termini. As reported previously for miRNAs in mammals (Landgraf et al. 2007), and also observed for those of worms and flies (Ruby et al. 2006, 2007b), nucleotides most frequently added to murine miRNAs were U and A (Fig. 7A). Addition of C or G was no higher than background, as estimated by monitoring apparent addition to tRNA fragments (Fig. 7A). Possible sources of the background rate could be sequencing error, transcription error, or a low level of biological nucleotide addition. Some miRNAs were much more frequently extended than others (Supplemental Table 7). One very frequently extended miRNA was miR-143, for which the extended reads outnumbered the nonextended ones (196,565 compared with 114,980 reads, respectively). For extension by U, RNAs from the pre-miRNA 3′ arm were three times more frequently extended than were those from the 5′ arm (Fig. 7A,B, P = 2.3 × 10⁻⁴, K–S test). This preference, not observed for the A extension (Fig.
7A,C), suggests that much of the U extension occurs to the pre-miRNA, prior to Dicer cleavage, a state in which the 3′ arm but not the 5′ arm would be available for extension (Fig. 7D). TUT4-catalyzed poly(U) addition to the let-7 pre-miRNA, which is specified by Lin28, plays an important role in post-transcriptional repression of let-7 expression (Heo et al. 2008, 2009; Hagan et al. 2009). Our analyses indicating untemplated U extension to many other pre-miRNAs hint that this type of regulation may not be limited to let-7, but that analogous pathways, presumably using mediators other than Lin28, act to regulate the expression of other murine miRNAs.

Figure 7. Untemplated nucleotide addition. (A) Untemplated nucleotide addition rate for miRNA and miRNA* reads from the indicated arm. Rates for each miRNA are provided (Supplemental Table 6). As a control, tRNA degradation fragments were analyzed similarly. Numbers of genes analyzed are indicated in parentheses. (B) Distribution of rates for untemplated U addition to RNAs from the 5′ arm (blue) and from the 3′ arm (red). (C) Distribution of rates for untemplated A addition to RNAs from the 5′ arm (blue) and from the 3′ arm (red). (D) Schematic of the biogenesis stage in which U could be added to the RNA of only one arm (pre-miRNA, left), and the stage in which U could be added to the RNA of either arm (mature miRNA and miRNA*, right).

Discussion

The status of miRNA gene discovery in mammals

Our current study sets aside nearly a third (173 of 564) of the miRBase version 14.0 gene annotations for lack of convincing evidence that these produce authentic miRNAs. It also adds another 108 novel miRNA loci, raising the question of how many more authentic loci remain undiscovered. This question is difficult to answer.
Ever since the recognition that the poorly conserved miRNAs are also the ones expressed at lower levels in mammals, and thus are the most difficult to detect by both computational and experimental methods, we have known that it is impossible to provide a meaningful estimate of the number of mammalian miRNA genes remaining to be discovered (Bartel 2004). The broadly conserved miRNAs are another matter. Only three of the 88 novel canonical miRNAs had recognizable orthologs sequenced in chickens, lizards, frogs, or fish, and these three were antisense to previously annotated broadly conserved miRNA genes. Therefore, apart from miRNAs expressed at very low levels from the antisense strand of known genes, we suspect that the list of broadly conserved miRNA gene families is nearing completion. The current set of murine miRNA genes includes 192 genes that fall into 89 broadly conserved miRNA gene families (Supplemental Table 6). Another 107 miRNA gene families appeared conserved in other mammals (Supplemental Table 6). These were represented by 120 murine genes, including 14 novel genes. Of these novel genes, 11 were founding members of novel conserved gene families. Some of these were identified with only 11 reads, indicating that additional pan-mammalian gene families remain to be found, although we have no evidence supporting the idea that the number of conserved gene families will rise to the very high levels suggested by some earlier computational studies (Berezikov et al. 2005, 2006b; Xie et al. 2005). For now, we can say that mammals have at least 196 conserved miRNA gene families represented in mice by at least 312 pre-miRNA hairpins (303 canonical and nine noncanonical hairpins) produced from at least 194 unique transcription units.
Because a single miRNA hairpin can produce multiple functional isoforms, generated by either 5′ processing heterogeneity or utilization of both arms of the miRNA duplex, a single conserved hairpin can produce more than one conserved miRNA isoform. Because the different isoforms have different seed sequences, they fall into different families of mature miRNAs. Thus, the number of conserved families of miRNAs (i.e., mature guide RNAs) will exceed the number of conserved families of genes (i.e., hairpins). Perhaps the best known example of a hairpin with two broadly conserved isoforms is mir-9, for which conserved miRNAs from both arms of the hairpin are readily detected by using in situ hybridization in both zebrafish and marine annelids (Wienholds et al. 2005; Christodoulou et al. 2010). Numerous conserved genes produce more than one miRNA isoform (Figs. 5A, 6A), but for most of these we do not yet know whether production of the alternative isoform is conserved in other species. High-throughput sequencing from other species will help identify many additional conserved isoforms. We anticipate that the discovery of multiple conserved isoforms will contribute much more to the future growth in the list of broadly conserved miRNA families than will the discovery of new conserved genes. As expected, the conserved miRNAs tended to be expressed at much higher levels than were the nonconserved ones, with the median read frequency of conserved miRNAs 44-fold greater than that of the nonconserved miRNAs (Figs. 5A, 6B). Therefore, even if many nonconserved miRNA genes remained to be found, these would add little to the number of annotated miRNA molecules in a given cell or tissue, and presumably even less to the impact of miRNAs on gene expression (Bartel 2009).
Indeed, even more pressing than the question of how many poorly conserved miRNAs remain undetected is the question of whether any of the known poorly conserved miRNAs have any consequential function in the animal. Most of these poorly conserved miRNAs could have derived from transcripts that fortuitously acquired hairpin regions with features needed for some Drosha/Dicer processing. In this scenario, most of these newly emergent miRNAs will be lost during the course of evolution before ever acquiring the expression levels needed to have a targeting function sufficient for their selective retention in the genome. Consistent with the hypothesis that most of these miRNAs play inconsequential regulatory roles, these miRNAs generally accumulated to much lower levels in our ectopic expression assay, (Fig. 3B, median read frequencies of 58 and 844 for nonconserved and conserved miRNAs, respectively), and they displayed weaker specificity for one arm of the hairpin (Fig. 5A), as would be expected if there was no advantage for the cell to efficiently use their respective hairpins. Nonetheless, some were processed efficiently, and at least a few poorly conserved miRNAs probably have acquired consequential species-specific functions. Although none have known functions, such hairpins are worthy of annotation as miRNA loci (just as protein-coding genes can be annotated before the protein is known to be functional), and as a class these newly emergent miRNAs could provide an important evolutionary substrate for the emergence of new regulatory activities. The major challenge for miRNA gene discovery stems from the difficulty in proving that a nonconserved, poorly expressed candidate is an authentic miRNA, combined with the even greater difficulty in proving that a questionable candidate is not an authentic miRNA. 
This challenge has become all the more acute as miRNA discovery has reached the point at which nearly all of the novel candidates are both nonconserved and poorly expressed. Our approach of testing pools of candidates in an ectopic expression assay provides useful data for evaluating miRNA authenticity. However, our approach cannot provide conclusive proof for or against the authenticity of a proposed candidate, leaving open the possibility that some of the nonconserved, poorly expressed candidates that we classify as “confidently identified miRNAs” are false positives. When considering the limitations of the current tools for miRNA gene identification, this possibility cannot be avoided. Therefore, if any nonconserved, poorly expressed miRNAs are annotated as miRNAs, the resulting list of miRNAs will have to be somewhat fuzzy, with an expectation that some of the annotated genes will not be authentic miRNAs. This expectation should not be viewed as advocating the indiscriminate annotation of all candidates as miRNAs. Our proposal is that miRNA gene discovery efforts should annotate as miRNAs only those novel candidates that both are found in high-throughput sequencing libraries and pass a set of criteria that is sufficiently stringent such that a majority of the novel canonical miRNAs are cleanly processed in a Drosha-dependent manner when using the ectopic expression assay. Although implementing this proposal would not prevent all false positives from entering the databases, it would preserve a higher quality set of miRNAs while eliminating few authentic annotations. Those wanting to take additional measures to avoid false positives could focus on only the subset of miRNAs that both meet these criteria and are conserved in other species.
Unknown features required for Drosha/Dicer processing

Before learning the results of our experiments, we wondered whether any ectopically overexpressed hairpin of suitable length would be processed as if it were a miRNA, a result that would have rendered our assay too permissive to be of value. In this scenario, most of the specificity that distinguished authentic miRNA genes from other regions of the genome with the potential to produce transcripts that fold into seemingly miRNA-like hairpins would have been a function of whether or not the regions were transcribed. This scenario was not realized, however, and our assay turned out to be informative, which illustrates how much of Drosha/Dicer substrate recognition still remains unknown. Many of the previously proposed miRNA hairpins that had no reads in our mouse samples were indistinguishable from authentic miRNA hairpins with regard to the known determinants for Drosha/Dicer recognition, yet none of these unconfirmed hairpins produced miRNA and miRNA* molecules in our very sensitive assay (Fig. 2C,D). These results show that major determinants of processing specificity remain undiscovered and point to the importance of finding them; such efforts, if successful, will mark the next substantive advance in accurately predicting and annotating metazoan miRNAs.

Materials and methods

Library preparation

Total RNA samples from mouse ovary, testes, and brain were purchased from Ambion, and total RNA from mouse E7.5, E9.5, E12.5, and newborn were obtained from the Chess laboratory. The small RNA cDNA libraries were made as described (Grimson et al. 2008), except that the 3′ adaptor used for ligation was 5′-adenylated pTCGTATGCCGTCTTCTGCTTGidT. For a detailed protocol, see http://web.wi.mit.edu/bartel/pub/protocols.html.

miRNA discovery

The reads with inserts of 16–27 nt were processed as described (Babiarz et al. 2008).
The miRNA candidates were identified using reads matching genomic regions that were not very highly repetitive (reads with <500 genomic matches). Reads from all data sets were combined and grouped by their 5′-terminal loci, requiring that each candidate 5′ locus pass five criteria listed in the text.
(1) To pass the expression criterion, a candidate required ≥10 normalized reads.
(2) To address the hairpin requirement, the secondary structure of the candidate was evaluated by selecting for each 5′-terminal locus the most abundant sequence and extending its 5′ end by 2 nt to define the range of the potential miRNA/miRNA* duplex. Three genomic windows were extracted with the 5′ end extended an additional 10 nt and the 3′ end extended either 50 nt, 100 nt, or 150 nt. Three more windows were extracted extending the 3′ end by 10 nt and the 5′ end another 50 nt, 100 nt, or 150 nt. The secondary structure of each of the six windows was predicted using RNAfold (Hofacker et al. 1994), and the number of hairpin base pairs (denoted using bracket notation) involving the 5′-extended miRNA candidate was calculated as the absolute value of ([number of 5′-facing brackets] − [number of 3′-facing brackets]). A candidate with a minimum of 16 bp using at least one of the six genomic windows satisfied the hairpin criteria.
(3) The candidates with non-miRNA biogenesis were found by mapping to annotated noncoding RNA loci (rRNA, tRNA, snRNA, and srpRNA).
(4) The candidates likely produced by degradation were defined as those failing the 5′ homogeneity requirement. A candidate satisfied the 5′ homogeneity requirement if at least half of the reads within 30 nt of the candidate 5′ end were present within 2 nt of the candidate 5′ end and if the candidate 5′ end comprised at least half of the reads within 2 nt of the candidate 5′ end, or if there was only one other 5′ end within 30 nt of the candidate 5′ end that had more than half of the reads mapping to the candidate 5′ end.
(5) Manual inspection of reads mapped to predicted secondary structures identified candidates accompanied by potential miRNA* reads. For 10 previously annotated miRNAs and seven novel miRNAs, a suitable miRNA* read was found only after considering alternative hairpin folds predicted to be suboptimal using mfold (Mathews et al. 1999; Zuker 2003).

For the analysis of mir-290, mir-291a, mir-291b, mir-292, mir-293, mir-294, and mir-295, which are not present in the mm8 genome assembly, we mapped all reads to the mm9 genome assembly corresponding to the region [chr7(+): 3,218,627–3,220,842]. For conservation analysis, a candidate was considered broadly conserved if the hairpin structure and the seed sequence were conserved to chickens, fish, frogs, or lizards (galGal3, danRer5, xenTro2, and anoCar1, respectively) in the University of California at Santa Cruz whole-genome alignments (Kuhn et al. 2009). To identify a candidate conserved in mammals, we looked at 12 additional genomes (bosTau3, canFam2, cavPor2, equCab1, hg18, loxAfr1, monDom4, ornAna1, panTro2, ponAbe2, rheMac2, and rn4) and calculated the branch length score from a phylogenetic tree trained on mouse 3′ UTR data (Friedman et al. 2009), using the cutoff score of 0.7. A gene was considered to be in a conserved miRNA gene family if the hairpin produced a miRNA with a seed matching that of a conserved miRNA (Supplemental Table 6).

Ectopic overexpression assays

To generate expression constructs, pre-miRNA hairpins and the surrounding regions were amplified from human genomic DNA (NCI-BL2126) or from mouse BL6 genomic DNA using Pfu Ultra II polymerase (Stratagene) and primers with Gateway (Invitrogen)-compatible ends designed to anneal ~100 nt upstream of and downstream from the miRNA hairpins. PCR products were inserted into Gateway vector pDONR221 and subsequently into pcDNA3.2/V5-DEST, and the resulting plasmids were transformed into DH5α cells. Positive clones were selected by colony PCR and were sequenced.
Clones that did not have a mutation within pre-miRNA hairpins were selected. Plasmid DNA from the confirmed expression clones was purified for transfection using the Plasmid Mini Kit (Qiagen). For each standard assay, plasmids for up to 10 hairpin expression constructs were mixed in equal amounts to create seven or eight pools of ~1.4 µg of DNA each, with each pool including one to three positive control hairpins. HEK293T cells were cultured in DMEM supplemented with 10% FBS, and were plated in 12-well plates ~24 h prior to transfection to reach ~80%–90% confluency. Each well of cells was transfected with one pool of DNA using Lipofectamine 2000 (Invitrogen). For the standard assays, 145–200 ng of pMaxGFP (Amaxa) was cotransfected with each pool to enable transfection efficiency to be confirmed by GFP expression. Control wells (no hairpin plasmid) were transfected only with 145 ng of pMaxGFP. For the Drosha/Dicer dependency assays, seven to eight hairpin constructs were combined to create six pools of ~400 ng each. Each pool was mixed with 1.2 µg of the pCK-Drosha-Flag(TN) (TNdrosha), pCK-Flag-Dicer(TN) (TNdicer), or pCK-dsRed.T4 (control vector, constructed by replacing the Drosha-coding sequence of TNdrosha with dsRed-coding sequence) and used to transfect one well of HEK293T cells as above. Control wells were transfected with 1.2 µg of either TNdrosha, TNdicer, or control vector. For the dependency assays, each transfection was performed in duplicate wells. Cells from all assays were harvested 39–48 h after transfection. Cells from each treatment were combined, total RNA was extracted using TriReagent (Ambion), and small RNA libraries were prepared for Illumina sequencing. The reads were processed as above, and RNA species were matched to the transfected hairpins. In the standard assay, reads were normalized by the median of the 30 most frequently sequenced endogenous miRNAs.
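The standard-assay normalization can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the function names are hypothetical, the read counts are invented, and dividing each hairpin count by the median is an assumption about how "normalized by the median" was applied.

```python
from statistics import median


def normalization_factor(endogenous_counts, top_n=30):
    """Median read count among the top_n most frequently sequenced endogenous miRNAs."""
    top = sorted(endogenous_counts.values(), reverse=True)[:top_n]
    if not top:
        raise ValueError("no endogenous miRNA counts supplied")
    return median(top)


def normalize(hairpin_counts, endogenous_counts, top_n=30):
    """Divide each transfected-hairpin read count by the normalization factor (assumed form)."""
    factor = normalization_factor(endogenous_counts, top_n)
    return {name: count / factor for name, count in hairpin_counts.items()}


# Invented counts; top_n is reduced to 3 only to keep the example small.
endogenous = {"miR-a": 100, "miR-b": 50, "miR-c": 10, "miR-d": 1}
print(normalize({"test-hairpin": 200}, endogenous, top_n=3))
```

Using a median of abundant endogenous species, rather than total reads, keeps the factor from being dominated by the overexpressed hairpins themselves.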
For assays testing Drosha/Dicer dependency, reads were normalized based on the number of reads corresponding to an 18-nt internal standard that had been spiked into equivalent amounts of total RNA prior to beginning library preparation. Reads matching the transfected hairpins were grouped by their 5′ termini (5′-terminal locus). The locus with the largest number of reads was considered the 5′-terminal locus of the mature miRNA produced by the hairpin, and similarly, the most dominant 5′ locus on the opposite arm was considered the miRNA*. The normalized miRNA and miRNA* read numbers were summed to calculate the expression level. If an overexpressed hairpin generated mature miRNA with the dominant 5′-terminal locus corresponding to the expected locus and at least one read corresponding to the miRNA* with an ~2-nt 3′ overhang, it was considered expressed. A hairpin was classified as overexpressed if there were at least threefold more reads in the hairpin transfection than in the control transfection, after adding pseudocounts of five to both. A hairpin was classified as Drosha- or Dicer-dependent if the knockdown was at least threefold.

Identification of arm-switching miRNAs

To determine the read numbers from the 5′ and 3′ arms, reads from each sample were grouped based on their 5′ termini, and the read numbers were tallied for those corresponding to the miRNA or miRNA* 5′ terminus. Only samples with five or more reads on either arm were considered. The fold enrichment was calculated as the ratio of 5′ and 3′ arm reads after adding pseudocounts of one.

RNA editing analysis

Sequencing libraries from individual tissues were combined and mapped to the genome using the Bowtie alignment tool (Langmead et al. 2009). The alignments were filtered for sequences that uniquely aligned to the genome, contained at most one mismatch to the genome, and had 5′ ends that mapped to within 1 nt of an annotated miRNA or miRNA* 5′ end.
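The three alignment filters just listed can be expressed as a single predicate. The sketch below is illustrative only: the `Alignment` record and its field names are hypothetical and do not correspond to Bowtie's output format, and for brevity it assumes all coordinates lie on one chromosome and strand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alignment:
    """Hypothetical summary of one aligned read (not a Bowtie record)."""
    read_id: str
    genomic_hits: int     # number of genomic loci the read aligns to
    mismatches: int       # mismatches to the genome at this locus
    five_prime_pos: int   # genomic coordinate of the read's 5' end


def passes_editing_filter(aln, annotated_five_prime_ends, max_offset=1):
    """Keep unique alignments with <=1 mismatch whose 5' end lies near an annotated 5' end."""
    if aln.genomic_hits != 1:
        return False
    if aln.mismatches > 1:
        return False
    return any(abs(aln.five_prime_pos - pos) <= max_offset
               for pos in annotated_five_prime_ends)


# Invented coordinates for annotated miRNA and miRNA* 5' ends.
annotated = {1000, 1040}
reads = [
    Alignment("r1", genomic_hits=1, mismatches=1, five_prime_pos=1001),  # kept
    Alignment("r2", genomic_hits=3, mismatches=0, five_prime_pos=1000),  # not unique
    Alignment("r3", genomic_hits=1, mismatches=2, five_prime_pos=1000),  # too many mismatches
    Alignment("r4", genomic_hits=1, mismatches=0, five_prime_pos=1005),  # 5' end too far
]
kept = [r.read_id for r in reads if passes_editing_filter(r, annotated)]
print(kept)
```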
The 12 possible mismatch types were then quantified at each position covered by the filtered reads. For example, to screen for A-to-G mismatches indicative of A-to-I editing sites, the editing fraction was calculated as the number of reads containing an A-to-G mismatch at a particular position, divided by the number of filtered reads covering that position. Sites were considered editing candidates if the editing fraction was >5%, had at least 10 A-to-G mismatch reads, and did not occur in the last 2 nt of the corresponding miRNA or miRNA*. Candidate editing sites were then manually examined and discarded if an alternative explanation was more parsimonious. For example, the only nonbrain editing candidate mapped to let-7c-1, but was most likely due to a handful of let-7b reads containing untemplated nucleotide additions that fortuitously matched the let-7c-1 locus. Consistent with this explanation, the putatively edited reads were unusually long and at unusually low abundance. Candidate editing sites were also checked in the Perlegen SNP database (Frazer et al. 2007) and dbSNP; no editing candidates corresponded to known SNPs.

Untemplated nucleotide analysis

To examine untemplated nucleotide addition, non-genome-mapping reads were filtered for those that matched miRNA or miRNA* sequences but also included a nongenomic poly(N) at the 3′ end. The untemplated nucleotide addition rate was calculated as the ratio of reads with the untemplated nucleotide to the sum of the reads with and without the untemplated nucleotide. After excluding miRNAs that map to multiple loci, and any miRNAs or miRNA*s with a genomic T at the position immediately 3′ of the annotated sequence, there were 343 miRNA/miRNA* species with untemplated U on the 5′ arm and 318 on the 3′ arm. Similarly, there were 287 species with untemplated A on the 5′ arm and 324 on the 3′ arm. The background tRNA untemplated U addition rate was calculated similarly.
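Both the editing fraction and the untemplated addition rate defined above are read-count ratios, and the editing-candidate thresholds are simple comparisons. A minimal sketch follows; the function and parameter names are hypothetical, positions are assumed to be 1-based from the 5′ end, and the only real numbers used are the miR-143 extended and nonextended read counts quoted in the Results.

```python
def event_fraction(reads_with_event, reads_without_event):
    """Fraction of reads carrying an event (a mismatch or an added nucleotide)."""
    total = reads_with_event + reads_without_event
    if total == 0:
        raise ValueError("no reads available for this position or species")
    return reads_with_event / total


def is_editing_candidate(a_to_g_reads, covering_reads, position, mirna_length,
                         min_fraction=0.05, min_reads=10, excluded_3p_nt=2):
    """Apply the stated thresholds: fraction >5%, >=10 A-to-G reads, not in the last 2 nt."""
    fraction = event_fraction(a_to_g_reads, covering_reads - a_to_g_reads)
    # Assumption: position is 1-based, counted from the 5' end of the miRNA.
    in_excluded_tail = position > mirna_length - excluded_3p_nt
    return (fraction > min_fraction
            and a_to_g_reads >= min_reads
            and not in_excluded_tail)


# Extended vs. nonextended miR-143 read counts quoted in the Results text.
mir143_rate = event_fraction(196_565, 114_980)
print(round(mir143_rate, 3))
```

Note that the manual examination and SNP checks described above are not captured by these thresholds and would still be needed before calling a site edited.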
A two-sided K–S test was used to assess significant differences in distributions.

Accession numbers

All small RNA reads are available at the GEO database with accession number GSE20384.

Acknowledgments

We thank N. Lau and A. Chess for embryonic and newborn total RNA, R. Friedman for calculating branch length scores for the analysis of conservation, A. Marson and N. Hannet for technical advice, and V.N. Kim for TNdrosha and TNdicer plasmids. This work was supported by a grant from the NIH (GM067031) to D.B.

References

Azuma-Mukai A, Oguri H, Mituyama T, Qian ZR, Asai K, Siomi H, Siomi MC. 2008. Characterization of endogenous human Argonautes and their miRNA partners in RNA silencing. Proc Natl Acad Sci 105: 7964–7969.
Babiarz JE, Ruby JG, Wang YM, Bartel DP, Blelloch R. 2008. Mouse ES cells express endogenous shRNAs, siRNAs, and other Microprocessor-independent, Dicer-dependent small RNAs. Genes & Dev 22: 2773–2785.
Baek D, Villén J, Shin C, Camargo FD, Gygi SP, Bartel DP. 2008. The impact of microRNAs on protein output. Nature 455: 64–71.
Bartel DP. 2004. MicroRNAs: Genomics, biogenesis, mechanism, and function. Cell 116: 281–297.
Bartel DP. 2009. MicroRNAs: Target recognition and regulatory functions. Cell 136: 215–233.
Baskerville S, Bartel DP. 2005. Microarray profiling of microRNAs reveals frequent coexpression with neighboring miRNAs and host genes. RNA 11: 241–247.
Bender W. 2008. MicroRNAs in the Drosophila bithorax complex. Genes & Dev 22: 14–19.
Bentwich I, Avniel A, Karov Y, Aharonov R, Gilad S, Barad O, Barzilai A, Einat P, Einav U, Meiri E, et al. 2005. Identification of hundreds of conserved and nonconserved human microRNAs. Nat Genet 37: 766–770.
Berezikov E, Guryev V, van de Belt J, Wienholds E, Plasterk RHA, Cuppen E. 2005. Phylogenetic shadowing and computational identification of human microRNA genes. Cell 120: 21–24.
Berezikov E, Thuemmler F, van Laake LW, Kondova I, Bontrop R, Cuppen E, Plasterk RHA. 2006a. Diversity of microRNAs in human and chimpanzee brain. Nat Genet 38: 1375–1377.
Berezikov E, van Tetering G, Verheul M, van de Belt J, van Laake L, Vos J, Verloop R, van de Wetering M, Guryev V, Takada S, et al. 2006b. Many novel mammalian microRNA candidates identified by extensive cloning and RAKE analysis. Genome Res 16: 1289–1298.
Berezikov E, Chung WJ, Willis J, Cuppen E, Lai EC. 2007. Mammalian mirtron genes. Mol Cell 28: 328–336.
Blow MJ, Grocock RJ, van Dongen S, Enright AJ, Dicks E, Futreal PA, Wooster R, Stratton MR. 2006. RNA editing of human microRNAs. Genome Biol 7: R27. doi: 10.1186/gb-2006-7-4-r27.
Calabrese JM, Seila AC, Yeo GW, Sharp PA. 2007. RNA sequence analysis defines Dicer’s role in mouse embryonic stem cells. Proc Natl Acad Sci 104: 18097–18102.
Chen C-Z, Li L, Lodish HF, Bartel DP. 2004. MicroRNAs modulate hematopoietic lineage differentiation. Science 303: 83–86.
Christodoulou F, Raible F, Tomer R, Simakov O, Trachana K, Klaus S, Snyman H, Hannon GJ, Bork P, Arendt D. 2010. Ancient animal microRNAs and the evolution of tissue identity. Nature 463: 1084–1088.
Cummins JM, He YP, Leary RJ, Pagliarini R, Diaz LA, Sjoblom T, Barad O, Bentwich Z, Szafranska AE, Labourier E, et al. 2006. The colorectal microRNAome. Proc Natl Acad Sci 103: 3687–3692.
Frazer KA, Eskin E, Kang HM, Bogue MA, Hinds DA, Beilharz EJ, Gupta RV, Montgomery J, Morenzoni MM, Nilsen GB, et al. 2007. A sequence-based variation map of 8.27 million SNPs in inbred mouse strains. Nature 448: 1050–1053.
Friedman RC, Farh KKH, Burge CB, Bartel DP. 2009. Most mammalian mRNAs are conserved targets of microRNAs. Genome Res 19: 92–105.
Griffiths-Jones S. 2004. The microRNA registry. Nucleic Acids Res 32: D109–D111. doi: 10.1093/nar/gkh023.
Grimm D, Streetz KL, Jopling CL, Storm TA, Pandey K, Davis CR, Marion P, Salazar F, Kay MA. 2006. Fatality in mice due to oversaturation of cellular microRNA/short hairpin RNA pathways. Nature 441: 537–541.
Grimson A, Srivastava M, Fahey B, Woodcroft BJ, Chiang HR, King N, Degnan BM, Rokhsar DS, Bartel DP. 2008. Early origins and evolution of microRNAs and Piwi-interacting RNAs in animals. Nature 455: 1193–1197. Hagan JP, Piskounova E, Gregory RI. 2009. Lin28 recruits the TUTase Zcchc11 to inhibit let-7 maturation in mouse embryonic stem cells. Nat Struct Mol Biol 16: 1021–1025. Han JJ, Lee Y, Yeom KH, Nam JW, Heo I, Rhee JK, Sohn SY, Cho YJ, Zhang BT, Kim VN. 2006. Molecular basis for the recognition of primary microRNAs by the Drosha–DGCR8 complex. Cell 125: 887–901. Han J, Pedersen JS, Kwon SC, Belair CD, Kim Y-K, Yeom K-H, Yang W-Y, Haussler D, Blelloch R, Kim VN. 2009. Posttranscriptional crossregulation between Drosha and DGCR8. Cell 136: 75–84. Heo I, Joo C, Cho J, Ha M, Han JJ, Kim VN. 2008. Lin28 mediates the terminal uridylation of let-7 precursor microRNA. Mol Cell 32: 276–284. Heo I, Joo C, Kim Y-K, Ha M, Yoon M-J, Cho J, Yeom K-H, Han J, Kim VN. 2009. TUT4 in concert with Lin28 suppresses microRNA biogenesis through pre-microRNA uridylation. Cell 138: 696–708. Hofacker IL, Fontana W, Stadler PF, Bonhoeffer LS, Tacker M, Schuster P. 1994. Fast folding and comparison of RNA secondary structures. Monatsh Chem 125: 167–188. Houbaviy HB, Murray MF, Sharp PA. 2003. Embryonic stem cell-specific microRNAs. Dev Cell 5: 351–358. Hu H, Yan Z, Xu Y, Hu H, Menzel C, Zhou Y, Chen W, Khaitovich P. 2009. Sequence features associated with microRNA strand selection in humans and flies. BMC Genomics 10: 413. Kawahara Y, Zinshteyn B, Chendrimada TP, Shiekhattar R, Nishikura K. 2007a. RNA editing of the microRNA-151 precursor blocks cleavage by the Dicer–TRBP complex. EMBO Rep 8: 763–769. Kawahara Y, Zinshteyn B, Sethupathy P, Iizasa H, Hatzigeorgiou AG, Nishikura K. 2007b. Redirection of silencing targets by adenosine-to-inosine editing of miRNAs. Science 315: 1137–1140.
Kawahara Y, Megraw M, Kreider E, Iizasa H, Valente L, Hatzigeorgiou AG, Nishikura K. 2008. Frequency and fate of microRNA editing in human brain. Nucleic Acids Res 36: 5270–5280. Kim Y-K, Kim VN. 2007. Processing of intronic microRNAs. EMBO J 26: 775–783. Kuchenbauer F, Morin RD, Argiropoulos B, Petriv OI, Griffith M, Heuser M, Yung E, Piper J, Delaney A, Prabhu AL, et al. 2008. In-depth characterization of the microRNA transcriptome in a leukemia progression model. Genome Res 18: 1787–1797. Kuhn RM, Karolchik D, Zweig AS, Wang T, Smith KE, Rosenbloom KR, Rhead B, Raney BJ, Pohl A, Pheasant M, et al. 2009. The UCSC Genome Browser Database: Update 2009. Nucleic Acids Res 37: D755–D761. doi: 10.1093/nar/gkn875. Kurihara Y, Watanabe Y. 2004. Arabidopsis micro-RNA biogenesis through Dicer-like 1 protein functions. Proc Natl Acad Sci 101: 12753–12758. Lagos-Quintana M, Rauhut R, Lendeckel W, Tuschl T. 2001. Identification of novel genes coding for small expressed RNAs. Science 294: 853–858. Lagos-Quintana M, Rauhut R, Yalcin A, Meyer J, Lendeckel W, Tuschl T. 2002. Identification of tissue-specific microRNAs from mouse. Curr Biol 12: 735–739. Lagos-Quintana M, Rauhut R, Meyer J, Borkhardt A, Tuschl T. 2003. New microRNAs from mouse and human. RNA 9: 175–179. Landgraf P, Rusu M, Sheridan R, Sewer A, Iovino N, Aravin A, Pfeffer S, Rice A, Kamphorst AO, Landthaler M, et al. 2007. A mammalian microRNA expression atlas based on small RNA library sequencing. Cell 129: 1401–1414. Langmead B, Trapnell C, Pop M, Salzberg S. 2009. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol 10: R25. doi: 10.1186/gb-2009-10-3-r25. Lau NC, Lim LP, Weinstein EG, Bartel DP. 2001. An abundant class of tiny RNAs with probable regulatory roles in Caenorhabditis elegans. Science 294: 858–862. Lee RC, Ambros V. 2001. An extensive class of small RNAs in Caenorhabditis elegans. Science 294: 862–864.
Lee Y, Ahn C, Han JJ, Choi H, Kim J, Yim J, Lee J, Provost P, Radmark O, Kim S, et al. 2003. The nuclear RNase III Drosha initiates microRNA processing. Nature 425: 415–419. Lim LP, Glasner ME, Yekta S, Burge CB, Bartel DP. 2003. Vertebrate microRNA genes. Science 299: 1540. Lu C, Tej SS, Luo S, Haudenschild CD, Meyers BC, Green PJ. 2005. Elucidation of the small RNA component of the transcriptome. Science 309: 1567–1569. Lund E, Guttinger S, Calado A, Dahlberg JE, Kutay U. 2004. Nuclear export of microRNA precursors. Science 303: 95–98. Marson A, Levine SS, Cole MF, Frampton GM, Brambrink T, Johnstone S, Guenther MG, Johnston WK, Wernig M, Newman J, et al. 2008. Connecting microRNA genes to the core transcriptional regulatory circuitry of embryonic stem cells. Cell 134: 521–533. Mathews DH, Sabina J, Zuker M, Turner DH. 1999. Expanded sequence dependence of thermodynamic parameters improves prediction of RNA secondary structure. J Mol Biol 288: 911–940. Mineno J, Okamoto S, Ando T, Sato M, Chono H, Izu H, Takayama M, Asada K, Mirochnitchenko O, Inouye M, et al. 2006. The expression profile of microRNAs in mouse embryos. Nucleic Acids Res 34: 1765–1771. Okamura K, Hagen JW, Duan H, Tyler DM, Lai EC. 2007. The mirtron pathway generates microRNA-class regulatory RNAs in Drosophila. Cell 130: 89–100. Pruitt KD, Tatusova T, Maglott DR. 2005. NCBI Reference Sequence (RefSeq): A curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res 33: D501–D504. doi: 10.1093/nar/gki025. Rajagopalan R, Vaucheret H, Trejo J, Bartel DP. 2006. A diverse and evolutionarily fluid set of microRNAs in Arabidopsis thaliana. Genes & Dev 20: 3407–3425. Rao PK, Toyama Y, Chiang HR, Gupta S, Bauer M, Medvid R, Reinhardt F, Liao R, Krieger M, Jaenisch R, et al. 2009. Loss of cardiac microRNA-mediated regulation leads to dilated cardiomyopathy and heart failure. Circ Res 105: 585–594. Ro S, Park C, Young D, Sanders KM, Yan W. 2007.
Tissue-dependent paired expression of miRNAs. Nucleic Acids Res 35: 5944–5953. Rodriguez A, Vigorito E, Clare S, Warren MV, Couttet P, Soond DR, van Dongen S, Grocock RJ, Das PP, Miska EA, et al. 2007. Requirement of bic/microRNA-155 for normal immune function. Science 316: 608–611. Ruby JG, Jan C, Player C, Axtell MJ, Lee W, Nusbaum C, Ge H, Bartel DP. 2006. Large-scale sequencing reveals 21U-RNAs and additional microRNAs and endogenous siRNAs in C. elegans. Cell 127: 1193–1207. Ruby JG, Jan CH, Bartel DP. 2007a. Intronic microRNA precursors that bypass Drosha processing. Nature 448: 83–86. Ruby JG, Stark A, Johnston WK, Kellis M, Bartel DP, Lai EC. 2007b. Evolution, biogenesis, expression, and target predictions of a substantially expanded set of Drosophila microRNAs. Genome Res 17: 1850–1864. Seo TS, Bai XP, Ruparel H, Li ZM, Turro NJ, Ju JY. 2004. Photocleavable fluorescent nucleotides for DNA sequencing on a chip constructed by site-specific coupling chemistry. Proc Natl Acad Sci 101: 5488–5493. Stark A, Kheradpour P, Parts L, Brennecke J, Hodges E, Hannon GJ, Kellis M. 2007. Systematic discovery and characterization of fly microRNAs using 12 Drosophila genomes. Genome Res 17: 1865–1879. Stark A, Bushati N, Jan CH, Kheradpour P, Hodges E, Brennecke J, Bartel DP, Cohen SM, Kellis M. 2008. A single Hox locus in Drosophila produces functional microRNAs from opposite DNA strands. Genes & Dev 22: 8–13. Tam OH, Aravin AA, Stein P, Girard A, Murchison EP, Cheloufi S, Hodges E, Anger M, Sachidanandam R, Schultz RM, et al. 2008. Pseudogene-derived small interfering RNAs regulate gene expression in mouse oocytes. Nature 453: 534–538. Tyler DM, Okamura K, Chung W-J, Hagen JW, Berezikov E, Hannon GJ, Lai EC. 2008. Functionally distinct regulatory RNAs generated by bidirectional transcription and processing of microRNA loci. Genes & Dev 22: 26–36.
Voorhoeve PM, le Sage C, Schrier M, Gillis AJM, Stoop H, Nagel R, Liu Y-P, van Duijse J, Drost J, Griekspoor A, et al. 2006. A genetic screen implicates miRNA-372 and miRNA-373 as oncogenes in testicular germ cell tumors. Cell 124: 1169–1181. Watanabe T, Totoki Y, Toyoda A, Kaneda M, Kuramochi-Miyagawa S, Obata Y, Chiba H, Kohara Y, Kono T, Nakano T, et al. 2008. Endogenous siRNAs from naturally formed dsRNAs regulate transcripts in mouse oocytes. Nature 453: 539–543. Wienholds E, Kloosterman WP, Miska E, Alvarez-Saavedra E, Berezikov E, de Bruijn E, Horvitz HR, Kauppinen S, Plasterk RHA. 2005. MicroRNA expression in zebrafish embryonic development. Science 309: 310–311. Wu H, Ye C, Ramirez D, Manjunath N. 2009. Alternative processing of primary microRNA transcripts by Drosha generates 5' end variation of mature microRNA. PLoS One 4: e7566. doi: 10.1371/journal.pone.0007566. Xie XH, Lu J, Kulbokas EJ, Golub TR, Mootha V, Lindblad-Toh K, Lander ES, Kellis M. 2005. Systematic discovery of regulatory motifs in human promoters and 3' UTRs by comparison of several mammals. Nature 434: 338–345. Yang W, Chendrimada TP, Wang Q, Higuchi M, Seeburg PH, Shiekhattar R, Nishikura K. 2006. Modulation of microRNA processing and expression through RNA editing by ADAR deaminases. Nat Struct Mol Biol 13: 13–21. Yi R, Qin Y, Macara IG, Cullen BR. 2003. Exportin-5 mediates the nuclear export of pre-microRNAs and short hairpin RNAs. Genes & Dev 17: 3011–3016. Zeng Y, Yi R, Cullen BR. 2005. Recognition and cleavage of primary microRNA precursors by the nuclear processing enzyme Drosha. EMBO J 24: 138–148. Zuker M. 2003. Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res 31: 3406–3415.
Total candidates (736)
  Annotated miRNAs (407)
    Confirmed miRNAs (387)
      DGCR8 & Dicer-dependent (290, 226)
      DGCR8-dependent (2, 2)
      Dicer-dependent (7, 3)
      Not strongly dependent (3, 3)
      Cannot determine (85, 49)
    miRNA candidates not confidently confirmed (20)
  New candidates (329)
    Novel miRNAs (108)
      DGCR8 & Dicer-dependent (37, 0)
      Dicer-dependent (1, 0)
      Not strongly dependent (1, 0)
      Cannot determine (69, 15)
    miRNA candidates (221)
      DGCR8 & Dicer-dependent (45, 0)
      Dicer-dependent (5, 0)
      Not strongly dependent (42, 8)
      Cannot determine (129, 9)
Undetected annotated miRNAs (157)
  Not sequenced (52)
  Not enough reads (72)
  Failed other filters (33)

Supplementary Figure 1. Mouse miRNA candidates identified by Illumina sequencing. MicroRNAs that are annotated in miRBase v.14.0 are boxed in green. The miRNA hairpin loci were further categorized by DGCR8- and Dicer-dependency using sequencing data from wild-type and mutant ES cells (Babiarz et al. 2008). The number in parentheses is the total number of loci in the category. If followed by another number, the second number is the number of conserved loci. A candidate was considered DGCR8- and Dicer-dependent using criteria of a previous study (Babiarz et al. 2008), except that predicted hairpin loci replaced the 100-nt windows, with the read cutoffs scaled to the hairpin lengths.
[Figure: reads (log scale, ≤1 to 100,000) for each tested hairpin, with no hairpin plasmid or with hairpin plasmid. Groups: Human miRNA controls; Lim 2003; Berezikov 2005; Mouse miRNA controls; Not sequenced; Not enough reads; Berezikov 2006b; Xie 2005.]

Supplemental Figure S3. Ectopic-expression assay evaluating unconfirmed annotated miRNAs and predicted miRNAs. Either GFP (red) or miRNA hairpins (blue) were expressed in HEK293T. Asterisk indicates positive results. Results of this experiment were compiled with those of Supplemental Figure S4 to produce Figure 2C and D.
[Figure: reads (log scale, ≤1 to 100,000) for each tested hairpin, with no hairpin plasmid or with hairpin plasmid. Groups: Human miRNA control; Mouse miRNA controls; Not enough reads; No miRNA*; Incorrect miRNA*.]

Supplemental Figure S4. Ectopic-expression assay evaluating unconfirmed annotated miRNAs. Either GFP (red) or miRNA hairpins (blue) were expressed in HEK293T. Asterisk indicates positive results. Results of this experiment were compiled with those of Supplemental Figure S3 to produce Figure 2C and D.
[Figure: reads (log scale, ≤1 to 100,000) for each tested hairpin, with no hairpin plasmid or with hairpin plasmid. Groups: Human miRNA controls; Lim 2003; Berezikov 2005; Mouse miRNA controls; Novel miRNAs; Novel shRNAs; DGCR8- & DCR-dependent candidates; Other candidate.]

Supplemental Figure S5. Ectopic-expression assay evaluating predicted miRNAs, novel miRNAs, and miRNA candidates. Either GFP (red) or miRNA hairpins (blue) were expressed in HEK293T. Asterisk indicates positive results. Results of this experiment were compiled with those of Supplemental Figures S6 and S7 to produce Figure 3B.
[Figure: reads (log scale, ≤1 to 100,000) for each tested hairpin, with no hairpin plasmid or with hairpin plasmid. Groups: Human miRNA controls; Lim 2003; Berezikov 2005; Noncanonical controls; Early miRBase; Novel miRNA; Mouse miRNA controls; Rare novel miRNAs; Novel shRNAs; Conserved candidates; Other candidates.]

Supplemental Figure S6. Ectopic-expression assay evaluating novel miRNAs, miRNA candidates, predicted miRNAs, and an unconfirmed annotated miRNA (mmu-mir-509). Either GFP (red) or miRNA hairpins (blue) were expressed in HEK293T. Asterisk indicates positive results. Results of this experiment were compiled with those of Supplemental Figures S5 and S7 to produce Figure 3B.
[Figure: reads (log scale, ≤1 to 100,000) for each tested hairpin under six conditions: no hairpin plasmid or hairpin plasmid, each with no TNdrosha/TNdicer plasmid, with TNdrosha plasmid, or with TNdicer plasmid. Groups: Human miRNA control; Lim 2003; Mouse miRNA controls; Noncanonical controls; Early miRBase; Novel miRNAs; Novel rare miRNAs; Novel shRNAs; Candidates.]

Supplemental Figure S7. Drosha/Dicer-dependent biogenesis of novel miRNAs. The selected hairpins were transfected into HEK293T with a control vector (blue), TNdrosha (red), or TNdicer (green). Similar transfections using the control vector instead of the hairpins are shown in light blue, orange, and light green, respectively. Results of this experiment were compiled with those of Supplemental Figures S5 and S6 to produce Figure 3B.

[Figure: correlation coefficient (−1 to 1) versus genomic distance (10 to 10^9) for clustered and not clustered miRNA pairs.]

Supplemental Figure S8. Correlation of expression and genomic distance. The correlation of expression with clustering was calculated as described previously (Baskerville and Bartel 2005), except that miRNAs that mapped to the same pre-mRNA transcript were considered clustered regardless of genomic distance.
The clustered miRNAs (red) were more correlated than non-clustered miRNAs (blue). Some miRNA pairs more than 50,000 nt apart were categorized as clustered with each other due to joint proximity to intervening miRNAs, and their correlated expression supported this clustering method. Other miRNAs that are within 50,000 nt of each other were not considered clustered because one mapped within a pre-mRNA, whereas the other one did not; each of these three pairs of miRNAs was not correlated in expression. Correlated expression observed for many miRNAs located ~130,000 nt apart was likely due to co-expression of two megaclusters on chr12.

[Figure: (A) reads versus miRNA length; (B) number of miRNAs versus miRNA length; conserved and nonconserved miRNAs shown separately.]

Supplemental Figure S9. The distribution of lengths of conserved (red) and nonconserved (blue) mature miRNAs. (A) Size distribution plotted in terms of number of normalized reads. (B) Size distribution plotted in terms of the dominant read length for each miRNA.
[Figure: cumulative fraction versus fold change (log2, −0.75 to 0.75) for mRNAs with 8mer, 7mer-m8, 7mer-A1, 6mer, or no site. (A) Control motifs UCAGUUG, UCAGUUC, and UCAGUUA. (B) Control motifs AAUGCUU, AAUGCUG, and AAUGCUC.]

Supplemental Figure S10. Controls to ensure that observed mRNA derepression attributed to the minor isoform was not due to overlap of its sites with offset 6mer sites of the major isoform. (A) Lack of statistically significant derepression by the three control motifs that differed from the miR-223 minor site by a single nt at position 8. (B) Same as in A except for the miR-155 minor site. The mRNAs with 8mer and 7mer-A1 sites for the minor isoform were excluded from the analysis because these sites overlapped with 7mer-m8 sites for the major isoform.
[Figure: (A) Number of mismatches, mismatch type, and distance of mismatches from the 3' end of the read for spiked-in sequence controls and brain miRNA-matching sequences. (B) Significant mismatch events (A>G, C>T, all others) remaining after successive read and event filters; thresholds: fraction edited >5%, edited reads >10. (C) Read alignments at the mir-381 locus in brain, chr12(+): 110965025-110965112. (D) Edited reads versus perfect match reads for the miR-337 3p arm, miR-411 5p arm, and miR-376a 5p arm (star strands).]

Supplemental Figure S11. RNA editing. (A) An overview of mismatches from the sequences indicated. In the two spiked-in synthetic RNAs of known sequence, mismatches were distributed throughout the length of the sequence, with no preference for A-to-G mismatches. In miRNA-mapping small RNA sequences from brain, mismatches were concentrated in the last 2 nt of the read, probably due to cellular terminal-transferase activity. (B) Loss of most mismatch events after applying filters expected to distinguish editing events from background. Mismatch events were considered significant if a position had at least 10 reads corresponding to a particular mismatch, and these reads accounted for at least 5% of reads covering that position. As successive filters were applied to the genome-mapping reads, the number of significant A-to-G mismatch events remained relatively unaffected, whereas nearly all other mismatch events were eliminated. In particular, C-to-T mismatches were mostly eliminated, indicating that C-to-U RNA editing does not occur to any significant degree in miRNAs. A-to-G mismatch events that passed all filters were considered editing candidates and manually examined to see if other plausible models could explain the mismatches.
(C) A display of the most abundant perfectly matching and single-mismatch reads from the mmu-mir-381 locus illustrates that inferred A-to-I editing accounts for essentially all mismatches at the edited position, and the great majority of all mismatched reads mapping to the miRNA or miRNA*. An analogous pattern was found for all 16 miRNAs that passed filters and manual validation. (D) Editing of a miRNA or miRNA* was associated with significantly altered 5' end specificity. In the cases of mmu-mir-337 and mmu-mir-411, edited reads had more homogeneous 5' ends than unedited reads.