Computational and experimental analysis of plant microRNAs

by

Matthew W. Jones-Rhoades

B.A., Chemistry (1999)
Grinnell College

Submitted to the Department of Biology in Partial Fulfillment of the Requirement for the Degree of Doctor of Philosophy in Biology at the Massachusetts Institute of Technology

June 2005

© 2005 Massachusetts Institute of Technology
All rights reserved

Signature of Author: Department of Biology, May 20, 2005

Certified by: David P. Bartel, Professor of Biology, Thesis Supervisor

Accepted by: Stephen P. Bell, Professor of Biology, Chairman, Graduate Student Committee

Computational and experimental analysis of plant microRNAs

Matthew W. Jones-Rhoades

Submitted to the Department of Biology on May 20, 2005 in Partial Fulfillment of the Requirement for the Degree of Doctor of Philosophy in Biology

ABSTRACT

MicroRNAs (miRNAs) are small, endogenous, non-coding RNAs that mediate gene regulation in plants and animals. We demonstrated that Arabidopsis thaliana miRNAs are highly complementary (0-3 mispairs in an ungapped alignment) to more mRNAs than would be expected by chance. These mRNAs are therefore putative regulatory targets of their complementary miRNAs. Many miRNA complementary sites are conserved to the monocot Oryza sativa (rice), implying evolutionary conservation based on function at the nucleotide level. The majority of predicted miRNA targets encode transcription factors and other proteins with known or inferred roles in developmental patterning, implying that the miRNAs themselves are high-level regulators of development.
Our findings indicated that miRNAs are key components of numerous regulatory circuits in plants and set the stage for numerous additional experiments to investigate in depth the significance of miRNA-mediated regulation for particular target families and genes.

We developed a comparative genomics approach to identify miRNAs and miRNA targets conserved between Arabidopsis and Oryza. Seven previously unknown miRNA families were experimentally verified, bringing the total number of known miRNA genes in Arabidopsis to 92, representing 22 families. We expanded the range of functionalities known to be regulated by miRNAs to include F-box proteins, laccases, superoxide dismutases, and ATP-sulfurylases. The expression of miR395, which targets sulfate-metabolizing enzymes, is induced by sulfate starvation, demonstrating that miRNA expression can be responsive to growth conditions.

We investigated the biological role of miR394-mediated regulation of At1g27340, an F-box gene of previously unknown function. Transgenic plants expressing a miR394-resistant version of At1g27340 displayed a range of developmental abnormalities, including radialized and fused cotyledons, absent shoot apical meristems, curled and radialized leaves, and abortive flowers. The severity of these abnormalities correlated with the overaccumulation of At1g27340 mRNA. These findings confirm the biological relevance of the interaction between miR394 and At1g27340, and represent the first insights into the roles of miRNA-mediated regulation of F-box genes. Our results establish that both MIR394 and At1g27340 are important regulators of meristem identity, and suggest that At1g27340 targets an activator of class III HD-ZIP function for ubiquitination and proteolysis.

Thesis Supervisor: David Bartel
Title: Professor of Biology

Acknowledgements

I would like to thank David Bartel for being a tremendous advisor, role model, and friend throughout my time at MIT.
I would like to thank all the past and present members of the Bartel lab, especially Mike Axtell, Scott Baskerville, Nelson Lau, Ben Lewis, Lee Lim, Allison Mallory, Ramya Rajagopalan, Brenda Reinhart, I-hung Shih, Hervé Vaucheret, and Soraya Yekta. You have been a joy to work and play with, and a continual source of reagents, help, and advice. I would also like to thank Bonnie Bartel, without whom the analysis of plant miRNAs would have been ten times harder.

I would like to thank my family, especially my parents, for always believing in me and supporting me.

Most importantly, I would like to thank my wife Melinda for being my partner and my best friend. Thanks for putting up with all the late nights and odd hours; you were always there to encourage me when things went well and to give me perspective when they didn't.

Table of contents

Abstract

Acknowledgements

Table of contents

Introduction

Chapter I
Prediction of plant microRNA targets
Chapter I has been published previously as: Rhoades MW, Reinhart BJ, Lim LP, Burge CB, Bartel B, and Bartel DP, "Prediction of plant microRNA targets." Cell 110(4): 513-520 (2002) © Cell Press.

Chapter II
Computational identification of plant microRNAs and their targets, including a stress-induced miRNA
Chapter II has been published previously as: Jones-Rhoades MW, and Bartel DP, "Computational identification of plant microRNAs and their targets, including a stress-induced miRNA." Molecular Cell 14(6): 787-799 (2004) © Cell Press.

Chapter III
MicroRNA-mediated regulation of an F-box gene is required for embryonic, floral, and vegetative development

Appendix A
Arabidopsis, Oryza, and Populus miRNA complementary sites

Appendix B
Appendix B has been published previously as: Reinhart BJ, Weinstein EG, Rhoades MW, Bartel B, and Bartel DP, "MicroRNAs in plants." Genes & Development 16(13): 1616-1626 © Cold Spring Harbor Laboratory Press.

Appendix C
Appendix C has been published previously as: Lewis BP, Shih IH, Jones-Rhoades MW, Bartel DP, and Burge CB, "Prediction of mammalian microRNA targets." Cell 115(7): 787-798 (2003) © Cell Press.

Introduction

The biology of multicellular organisms requires a complex network of gene regulatory pathways. MicroRNAs (miRNAs) are key components of this network that had been overlooked until recently. Initially discovered as regulators of developmental timing in C. elegans, miRNAs are now known to serve in a variety of regulatory roles in both plants and animals. The hallmark of a miRNA is a short (~20-24 nt), endogenously expressed non-coding RNA which is processed by RNaseIII proteins such as Dicer from a longer ssRNA precursor that contains a stem-loop secondary structure (reviewed in (11)). MicroRNAs are chemically and functionally similar to short interfering RNAs (siRNAs), which are processed by Dicer from long dsRNA precursors and which are central to the related phenomena of RNA interference (RNAi), post-transcriptional gene silencing (PTGS), and transcriptional gene silencing (TGS). Mature miRNAs are incorporated into RNAi-induced silencing complexes (RISCs), in which the miRNA guides repression of target genes.

Although miRNAs are deeply conserved within both the plant and animal kingdoms, there are substantial differences in the mechanism and scope of miRNA-mediated gene regulation between the two kingdoms, several of which have been instrumental in the rapid increase in our understanding of plant miRNA biology. Plant miRNAs are highly complementary to conserved target mRNAs, a fact which has allowed for the rapid and confident bioinformatic identification of plant miRNA targets (46, 113). Plant miRNAs guide the cleavage of their complementary mRNA targets, an activity which is readily assayed in vitro and in vivo (49, 76, 126).
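To make the complementarity criterion concrete, the following is a minimal Python sketch of an ungapped scan for sites with at most three mispairs, the threshold quoted in the abstract. It is an illustration only and not the procedure used in the work described here: the function names are hypothetical, and treating G:U wobble pairs as mispairs is an assumption of this sketch.

```python
# Watson-Crick complements for RNA. Assumption of this sketch: G:U wobble
# pairs are NOT treated as matches and therefore count as mispairs.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the reverse complement of an RNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def find_complementary_sites(mirna, mrna, max_mispairs=3):
    """Scan an mRNA for ungapped sites complementary to a miRNA.

    Returns a list of (start, mispairs) tuples, one for every window of the
    mRNA (0-based start) that pairs with the miRNA with at most
    `max_mispairs` mispairs. Hypothetical helper for illustration.
    """
    perfect_site = reverse_complement(mirna)  # what a perfect site looks like
    site_len = len(perfect_site)
    hits = []
    for start in range(len(mrna) - site_len + 1):
        window = mrna[start:start + site_len]
        mispairs = sum(1 for a, b in zip(window, perfect_site) if a != b)
        if mispairs <= max_mispairs:
            hits.append((start, mispairs))
    return hits
```

In practice such a scan would be run over every annotated mRNA, and the number of hits compared with the number obtained for shuffled or otherwise randomized miRNA sequences to judge whether more sites are found than expected by chance.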
In addition, Arabidopsis is a genetically tractable model organism, which has enabled the study of the genetic pathways which underlie miRNA-mediated regulation and the phenotypic consequences of perturbing miRNA-mediated gene regulation. The picture emerging from this recent research is that plant miRNAs are master regulators of genetic pathways: the majority of genes regulated by plant miRNAs are themselves regulators such as transcription factors, F-box proteins, and RNAi-related proteins.

MicroRNAs: like siRNAs, but different

Before discussing miRNAs, it is useful to consider a highly similar class of small RNAs, the small interfering RNAs (siRNAs). In Arabidopsis, siRNAs are the majority of small RNAs (75, 112, 124, 138), and have been implicated in a variety of pathways, including defense against viruses, the establishment of heterochromatin, silencing of transposons and transgenes, and the post-transcriptional regulation of mRNAs (reviewed in (13)). MicroRNAs and siRNAs have much in common; both types of small RNAs are 20-24 nucleotides long, and both are processed from longer RNA precursors by Dicer ribonucleases (15, 38, 43, 50). Both are incorporated into ribonucleoprotein (RNP) complexes in which the small RNAs, through their base pairing potential, guide repression of target genes, and, as discussed below, the mechanisms by which they repress target genes are also similar. The fundamental difference between the two classes, then, is the nature of their precursors; siRNAs are processed from long dsRNAs, whereas miRNAs are processed from RNAs that are single-stranded but contain imperfect stem-loop secondary structures.

In addition, there are a number of general, if not absolute, characteristics that set miRNAs apart from siRNAs. Many miRNAs are conserved between related organisms, whereas most endogenously expressed siRNAs are not (57, 61, 63, 112). Many (but not all) siRNAs target the gene from which they are derived.
In contrast, a miRNA regulates genes unrelated to the locus from which the miRNA was derived. In addition, although the proteins required for siRNA and miRNA biogenesis are overlapping, in many organisms, including Arabidopsis, the genetic requirements for miRNA and siRNA function are partially distinct. For example, many Arabidopsis siRNAs require RNA-dependent RNA polymerases (RDRPs) for their biogenesis, whereas miRNAs do not (14, 27, 95, 138). Conversely, most Arabidopsis miRNAs require processing by DICER-LIKE1 (DCL1), one of four Dicer-like genes in Arabidopsis, whereas many siRNAs require DICER-LIKE3 (DCL3) (35, 56, 112, 138).

MicroRNA Biogenesis

Like other types of cellular RNAs, miRNAs must be properly processed and localized in order to function. The steps through which a plant miRNA must pass include 1) transcription, 2) processing into a miRNA/miRNA* duplex, 3) covalent modification, 4) export from the nucleus, and 5) selective incorporation of the miRNA into RISC (Figure 1).

Transcription of microRNAs

In most cases, miRNAs have been initially discovered as the mature, 20-24 nucleotide form. Presumably, these mature miRNAs are initially transcribed as part of longer transcripts that must minimally include enough additional sequence to generate the stem-loop structures (typically ~60-300 nucleotides in plants) that are recognized by Dicer. In several cases, miRNA stem-loops have been shown to be contained within much longer transcripts, termed primary miRNAs (pri-miRNAs). The overexpression of 0.5 kb and 1.4 kb transcripts that contain miR319 and miR172 stem-loops, respectively, correlates with overaccumulation of the mature miRNA (7, 100). miR163 is contained within a 0.7 kb transcript that can be processed into mature miR163 (56). In addition, numerous miRNA precursors are found within ESTs from various plant species that contain additional sequence outside of the stem-loop (7, 46, 100).
At least some of these longer pri-miRNA transcripts are spliced and appear to be polyadenylated (7, 56). Indeed, two rice miRNAs are contained within transcripts that contain exon junctions within the presumptive stem-loop precursor, implying that in these cases splicing is a necessary prerequisite for recognition by Dicer (123). Because plant miRNAs are primarily found in genomic regions not associated with protein coding genes (112), it appears that most miRNA genes are their own transcriptional units. The fact that plant pri-miRNAs can be over 1 kb long, along with the fact that they can undergo canonical splicing and polyadenylation, strongly suggests that RNA polymerase II is responsible for transcribing most plant miRNAs, as has been shown to be the case for several animal miRNAs (66). Relatively little is known about the promoters of plant miRNAs or the regulation of miRNA transcription in plants.

MicroRNA processing and export

A central step in the maturation of miRNAs is the excision of the mature miRNA from the pri-miRNA by RNaseIII-type endonucleases such as Dicer. Although the observed sizes of Dicer products in plants range from around 20 to 25 nucleotides (126, 138), the plant miRNAs are primarily 20-21 nt in length (112). In contrast, 24mers are most abundant in the population of siRNAs cloned from Arabidopsis (126, 138). It has been suggested that different Dicer activities are responsible for the different sizes of small RNAs observed in Arabidopsis (126). This idea fits with genetic data, which suggest that the four Dicer-like genes in Arabidopsis have functionally distinct roles. DICER-LIKE3 (DCL3) and DICER-LIKE2 (DCL2) process certain endogenous siRNAs and virus-derived siRNAs, respectively, but each is dispensable for miRNA accumulation (138).
In contrast, partial loss-of-function alleles of DICER-LIKE1 (DCL1) result in reduced accumulation of miRNAs and trans-acting siRNAs, without any obvious effect on the accumulation or function of various other classes of siRNAs (35, 112, 131, 138).

In animals, miRNAs are processed in a stepwise manner. A nuclear localized RNaseIII enzyme known as Drosha makes the initial cuts (one on each arm of the stem-loop) in the pri-miRNAs to liberate the miRNA stem-loop, the "pre-miRNA," from the flanking sequence of the pri-miRNA (65). After export to the cytoplasm, Dicer makes a second set of cuts, separating the miRNA, duplexed with its near reverse complement, the miRNA*, from the loop region of the pre-miRNA (65). The resulting miRNA/miRNA* duplex has two-nucleotide 3' overhangs, similar to the siRNA duplexes produced by Dicer from long double-stranded RNA (15, 31, 32, 55).

The situation in plants appears to be somewhat different, as plants contain no clear ortholog to Drosha. Whereas most animal Dicers are thought to be localized to the cytoplasm (65), in Arabidopsis DCL1 is localized to the nucleus, and miRNA/miRNA* duplexes are excised from the pri-miRNAs within the nucleus (101, 103, 138). RNAs corresponding to the pre-miRNAs of animals are rarely, if ever, detected in plants (112), and it seems likely that both sets of cleavage events happen in rapid succession. It is uncertain if DCL1 makes both sets of cuts, or if additional nucleases are also involved.

One key difference between the biogenesis of siRNAs and miRNAs, other than the nature of the precursor, is the number of small RNA species produced per precursor. A single long, double-stranded RNA precursor can be processed into multiple siRNA duplexes by Dicer (Figure 1b) (106, 131, 144). However, cloning and expression data show that a miRNA precursor produces predominantly a single small RNA species, the mature miRNA (57, 61, 64, 71, 112, 124).
Although there is some heterogeneity at the 5' and 3' ends of plant miRNAs, it is clear that DCL1 cuts preferentially at specific positions in the miRNA stem-loop precursor that result in the accumulation of the appropriate mature miRNA (112). The mechanism by which DCL1 knows where to cut is largely a mystery, although there is evidence for the involvement of the dsRNA-binding domain of DCL1. The dcl1-9 allele, which disrupts the dsRNA-binding domain, cuts the miR163 stem-loop at aberrant positions (56).

In addition to DCL1, several other genes have been shown genetically to be involved in miRNA biogenesis. Mutations in HYPONASTIC LEAVES1 (HYL1) or HUA ENHANCER1 (HEN1) result in reduced miRNA accumulation and function (18, 41, 104, 130, 138). HYL1 contains an NLS and a dsRNA-binding domain, and has some homology to R2D2 in Drosophila and RDE-4 in C. elegans, proteins that are thought to function together with Dicer to load siRNAs into the RISC (74, 125). HEN1 contains a methyltransferase domain, and is capable of methylating miRNA/miRNA* duplexes in vitro (143). Endogenous miRNAs are methylated on either the 2' or 3' ribose hydroxyl group of the 3' nucleotide in wild-type plants, but not in hen1 mutants (143). The function of this miRNA methylation is a mystery, and it remains possible that HEN1 may have additional activities that are important for miRNA biogenesis.

In plants, it appears that most, if not all, processing and modification of miRNAs takes place in the nucleus. However, the majority of mature miRNAs are located in the cytoplasm (103), suggesting that a pathway exists for miRNA export. One component of this pathway is HASTY (HST), a member of the importin β family of nucleocytoplasmic transporters. hst mutants have reduced accumulation of most, but not all miRNAs, suggesting that HST is an important part of the miRNA export pathway, but that other components also exist (103).
A similar pathway exists in animals; Exportin-5, the mammalian ortholog of HST, exports pre-miRNA hairpins from the nucleus to the cytoplasm (78, 142). As pre-miRNAs appear to be very short-lived in plants, it is likely that HST transports either miRNA/miRNA* duplexes or single-stranded miRNAs after they are fully excised by DCL1. Northern blot data suggest that miRNAs are primarily single-stranded in the nucleus (103), suggesting either that a fraction of functional miRNAs are located within the nucleus or that miRNAs are already single-stranded before being transported to the cytoplasm by HST. It is unknown whether plant miRNAs are already associated with components of RISC when transported to the cytoplasm, or if loading into RISC takes place after transport.

MicroRNA incorporation into RISC

MicroRNAs are processed from their pri-miRNA precursors as duplexes with their miRNA* sequences. However, cloning and expression data indicate that the miRNA strand of this duplex accumulates at much higher levels in vivo than does the miRNA* (71, 112). This asymmetry of accumulation is achieved by the preferential loading of the miRNA strand into RISC, where it is presumably protected from degradation, whereas the miRNA* strand is preferentially excluded from RISC and consequently subject to degradation. The key insight into understanding this asymmetry of RISC loading came from bioinformatic and biochemical studies of functional siRNA duplexes: the strand of the siRNA duplex with less energetically strong pairing at its 5' end is selectively loaded into RISC, where it is competent to guide silencing, while the strand with the more stably paired 5' end is excluded from RISC (51, 116). Most miRNA/miRNA* duplexes appear to have this energetic asymmetry; the 5' ends of most miRNAs are less stably paired than are the 5' ends of the corresponding miRNA*s (51, 116).
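The asymmetry rule can be illustrated with a short Python sketch. It is a deliberately crude stand-in for the thermodynamic analyses cited above, resting on several assumptions that are not taken from those studies: pairing strength at each 5' end is approximated by counting hydrogen bonds over the first four base pairs instead of using nearest-neighbor free energies, both strands are assumed to be the same length with two-nucleotide 3' overhangs, and the function names and example sequences are hypothetical.

```python
# Crude per-pair stability scores (approximate hydrogen-bond counts).
# Assumption: a real analysis would use nearest-neighbor free energies.
PAIR_SCORE = {
    ("G", "C"): 3, ("C", "G"): 3,
    ("A", "U"): 2, ("U", "A"): 2,
    ("G", "U"): 1, ("U", "G"): 1,
}


def five_prime_end_score(strand, partner, overhang=2, window=4):
    """Score the pairing of the first `window` nucleotides at the 5' end of
    `strand` against `partner`. Both strands are written 5'->3', are assumed
    to be equally long, and carry `overhang` unpaired nucleotides at their
    3' ends. Unpaired or mismatched positions score 0."""
    last_paired = len(partner) - 1 - overhang
    score = 0
    for i in range(window):
        score += PAIR_SCORE.get((strand[i], partner[last_paired - i]), 0)
    return score


def predict_loaded_strand(strand_a, strand_b, overhang=2, window=4):
    """Return "a" or "b" for the strand whose 5' end is less stably paired
    (the strand expected to enter RISC), or None if the scores tie."""
    score_a = five_prime_end_score(strand_a, strand_b, overhang, window)
    score_b = five_prime_end_score(strand_b, strand_a, overhang, window)
    if score_a < score_b:
        return "a"
    if score_b < score_a:
        return "b"
    return None


# Hypothetical 21-nt duplex: strand_a has an A/U-rich 5' end, whereas the
# 5' end of strand_b pairs with a G/C-rich stretch of strand_a.
strand_a = "UAUAGCAUGCAUGCAGCGCUU"
strand_b = "GCGCUGCAUGCAUGCUAUAUU"
loaded = predict_loaded_strand(strand_a, strand_b)
```

Applied to a miRNA/miRNA* duplex, the strand reported by such a function would correspond to the predicted mature miRNA; a tie indicates that this simple score cannot discriminate between the two strands.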
The exact mechanisms by which siRNA and miRNA duplexes are unwound and asymmetrically incorporated into RISC are still only partially understood, but they appear to involve R2D2-like proteins and perhaps an unidentified RNA helicase (reviewed in (128)).

The final product of the miRNA/siRNA biogenesis pathway is a single-stranded RNA incorporated into an RNP complex. There are several varieties of these RNP complexes that vary at least partially in their composition and function; an RNP that mediates RNA cleavage and PTGS is generally referred to as a RISC, whereas an RNP that mediates chromatin modification and TGS is referred to as an RNAi-induced transcriptional silencing (RITS) complex. A central component of all these RNPs is a member of the Argonaute family of proteins. Argonaute proteins, which have been implicated in a broad range of RNAi-related mechanisms, contain two conserved domains, known as the PAZ and PIWI domains (21). The PAZ domain appears to be an RNA-binding domain (72, 118, 140), and the PIWI domain has structural similarity to RNase H enzymes (73, 119). Many organisms contain multiple members of the Argonaute family; in some of these cases, there is evidence for functional specificity of the different Argonautes. For example, only one of four mammalian Argonautes, Ago2, is capable of mediating RNA cleavage (73).

Arabidopsis contains ten Argonaute proteins, four of which have been investigated experimentally. AGO4 is involved in the methylation of DNA associated with transposons and inverted-repeat transgenes (146, 147). PNH/ZLL/AGO10 and ZIP/AGO7 are required for proper development, but the mechanism by which they act is not known (42, 79, 96, 97). Only one Argonaute gene, AGO1, has thus far been shown to be required for miRNA function in Arabidopsis. ago1 mutants have elevated levels of miRNA targets, consistent with AGO1 being needed for miRNA function (129).
A null allele of AGO1 also shows a sharp decrease in accumulation of most miRNAs compared to wild-type (129). Although this reduction in miRNA levels may stem from AGO1 playing an early role in miRNA processing, it may also be due to the loss of the RISC complexes needed to bind, and thus stabilize, the processed miRNAs.

Mechanisms of miRNA function

There are three basic mechanisms by which Dicer-produced small RNAs have been shown to regulate gene expression: RNA cleavage, translational repression, and transcriptional silencing.

MicroRNA-mediated RNA cleavage

Directed RNA cleavage is perhaps the best studied mechanism by which small RNAs regulate gene expression. In this mechanism, an siRNA/miRNA guides RISC to cleave a single phosphodiester bond within a complementary RNA molecule. This so-called "slicer" activity is thought to reside in the PIWI domains of certain Argonaute proteins (73, 119).

Several lines of evidence indicate that plant miRNAs act to guide the cleavage of complementary mRNAs. MicroRNA-guided slicer activity is present in wheat germ lysate (126). MicroRNA targets are generally expressed at higher levels in plants that have impaired miRNA function as the result of either mutations in the miRNA pathway (e.g. hen1, ago1, and hyl1) (18, 129, 130) or the expression of certain viral suppressors of RNA silencing (22, 24, 30, 49, 82), implying that miRNAs negatively regulate the stability of their targets. Moreover, the 3' cleavage products of many miRNA targets can be detected in vivo, either by Northern blot (49, 76, 80, 120) or by 5' RACE (46, 49, 76, 80, 81, 83, 100, 123, 139).

MicroRNA-mediated translational repression

The first miRNAs to be identified, the lin-4 and let-7 RNAs, regulate the expression of heterochronic genes that are critical for the timing of certain cell divisions during larval development of C. elegans (40, 64, 92, 111, 117, 136).
However, the induction of these miRNAs at specified points in development does not greatly affect the mRNA levels of their targets, but rather the amount of protein produced from the targeted mRNAs (40, 117, 136). The exact mechanism by which this occurs is unclear, but it appears that functional translation of the targeted mRNAs is inhibited at some point after the initiation of translation (99). It is thought that this mode of target regulation is utilized by the majority of animal miRNAs.

What determines whether a small RNA will guide the cleavage of its target, as opposed to directing its target for translational repression? To a certain extent, the outcome seems to depend on the degree of complementarity between the guide RNA and the target. An siRNA or miRNA that is perfectly complementary to a target RNA will generally lead to cleavage, whereas less perfect complementarity is generally associated with translational repression (28, 29). Indeed, the same small RNAs are capable of carrying out either mechanism. In mammalian cell culture, exogenous siRNAs which are competent to direct cleavage when presented with fully complementary targets can repress the translation of other targets which contain multiple, imperfectly complementary sites (28, 29). Conversely, the let-7 miRNA from Drosophila, which is presumed to regulate its endogenous targets through translational repression, can guide cleavage of perfectly complementary RNAs in vitro (44).

To a large extent, then, the tendency of plant miRNAs to cleave their targets is probably due to the fact that they are highly, sometimes perfectly, complementary to them, whereas few mRNAs have extensive complementarity to animal miRNAs. However, there are exceptions; miR-196 guides the cleavage of the highly complementary HoxB8 mRNA (84, 141). Furthermore, the expression of either miR-1 or miR-124 in HeLa cells slightly reduces the levels of over 100 mRNAs with complementarity to the 5' portion of the miRNA (70).
It is unclear if these mRNAs are cleaved by RISC, albeit at a low efficiency, or if miRNA/RISC binding affects mRNA stability through some other mechanism. Conversely, one Arabidopsis miRNA, miR172, appears to affect the accumulation of target protein but not target mRNA, and thus appears to mediate translational repression (7, 25).

Small RNA directed transcriptional silencing

Sections of transcriptionally silent DNA, known as heterochromatic regions, are associated with certain covalent modifications of DNA and histones. Evidence from several organisms now shows that small RNAs are important for the establishment and/or maintenance of these heterochromatic modifications. In fission yeast, Dicer-produced small RNAs corresponding to heterochromatic repeats have been identified (110), and deletion of Dicer or Argonaute disrupts silencing at heterochromatic regions (133, 134). This transcriptional repression has been shown to involve the RITS complex, which, like the RISC, contains Argonaute and a single-stranded Dicer-produced siRNA, as well as Chp1 and Tas3, which are not thought to be present in RISC (93, 98, 132). Small RNAs also guide repressive modifications of DNA and histones in plants (reviewed in (85)). For example, AGO4 is required for siRNA-guided transcriptional silencing of the SUPERMAN gene and the maintenance of transcriptional repression triggered by inverted repeats (146, 147).

Do miRNAs guide transcriptional silencing in plants? Recent evidence suggests that they might (10). Dominant mutations within the miR166 complementary sites of the PHABULOSA (PHB) and PHAVOLUTA (PHV) mRNAs result in abnormal leaf development which correlates with a reduction in miR166-guided mRNA cleavage (83, 88). Curiously, these phb and phv mutations also correlate with a reduction of DNA methylation within the coding region of the mutant alleles (10).
This reduction of methylation occurs only in cis; in heterozygous plants, only the mutant copy of PHB is affected, whereas the wild-type copy is not (10). Because the miRNA complementary site in these mRNAs spans an exon junction, miR166 is presumably not able to interact with the genomic DNA, which suggests that interaction between miR166 and the nascent, but spliced, PHB mRNA somehow results in DNA methylation (10). Although intriguing, the functional significance of this change in methylation is not yet clear. While methylated promoter regions are often associated with transcriptional silencing, the observed methylation in PHB and PHV is near the 3' end of the coding regions (10), and it is unknown what effect it is having on PHB or PHV transcription. It is not known if a reduction in miRNA complementarity generally correlates with a reduction in target gene methylation.

Discovery of plant microRNAs

Discovery of plant miRNAs: Cloning

The most direct method of miRNA discovery has been to isolate and clone small cellular RNAs from biological samples. Quite a few groups have used this approach to identify small RNAs in animals, plants, and fungi (57-59, 61, 63, 71, 75, 89, 94, 104, 107-110, 112, 122-124). Although the specifics of the protocols used by various groups differ in some details, all essentially involve the isolation of small RNAs, followed by ligation of adaptor oligos, reverse transcription, amplification, and sequencing. Some of these protocols incorporate methods to select for RNAs that are products of Dicer cleavage (i.e. that have a 5' phosphate and 3' hydroxyl) and to concatemerize the short cDNAs so that many can be analyzed in a single sequencing read (61). These cloning methods were first used to identify large numbers of miRNAs in animals (57, 61, 63). An initial round of cloning experiments in Arabidopsis identified nineteen miRNAs, as well as hundreds of endogenous siRNAs (75, 89, 104, 112).
Subsequent cloning experiments have expanded our knowledge of both classes of small RNAs in Arabidopsis (124, 138), and more recently, Oryza sativa (rice) (123). The Carrington lab maintains an online database of small RNAs cloned from Arabidopsis (http://asrp.cgrb.oregonstate.edu/db/).

Discovery of plant miRNAs: Forward genetics

Given the abundance of miRNA genes in plants, and the mounting evidence that they are key regulators of developmental events, it is in some ways surprising that plant miRNAs were not discovered genetically long ago. Although it is something of a mystery as to why more miRNAs have not been identified in genetic screens, there are several notable examples where they have. However, in plants at least, in none of these cases was it realized that miRNAs were involved until after cloning experiments had established that plant genomes contained numerous miRNAs. In a sense, the dominant mutations in the HD-ZIP genes PHB, PHV, and REVOLUTA (REV) in Arabidopsis and ROLLED LEAF1 (RLD1) in maize can be thought of as miRNA-related mutations; all result in adaxialization of leaves and/or vasculature as the result of mutations within miR166 complementary sites (33, 87, 88, 145). At least three miRNA genes, miR319 (also known as miR-JAW), miR172 (also known as EAT), and miR166, were isolated as dominant overexpressors in enhancer trap screens for mutants with developmental abnormalities (7, 53, 100). To date, only a single loss-of-function allele at a miRNA gene has been identified genetically in plants; early extra petals1 is caused by the insertion of a transposon 160 bp upstream of miR164c, and results in flowers with extra petals (9). The fact that miRNA loss-of-function mutants have been recovered so rarely is perhaps due to redundancy; most miRNAs exist in multigene families that are likely to have overlapping function, buffering against a loss of function at any single miRNA gene.

Discovery of plant miRNAs: Bioinformatics

In both plants and animals, cloning has been the initial means of large-scale miRNA discovery. However, cloning is biased towards RNAs that are highly and broadly expressed. MicroRNAs that are expressed at low levels, or that are expressed only in specific cell types or in response to certain environmental stimuli, will be relatively difficult to clone. Any sequence-specific biases in the cloning procedure might also cause certain miRNAs to be missed. Because of these limitations, bioinformatic approaches to identify miRNAs have been useful as a complement to cloning.

A relatively straightforward use of bioinformatics has been to find homologs of cloned miRNAs, both within the same genome and in the genomes of other species (57, 61, 63, 105). A more difficult challenge is to identify miRNAs unrelated to previously known miRNAs. This was first done for animal miRNAs, using algorithms that search for conservation of sequence and secondary structure (i.e. miRNA stem-loop precursors) between animal species in patterns that are characteristic of miRNAs (6, 37, 60, 69, 71). Although these methods succeeded in identifying numerous potential animal miRNAs, many of which were subsequently confirmed experimentally, they are not directly useful in finding plant miRNAs because of the longer and more heterogeneous secondary structures of plant miRNA stem-loops.

To address this problem, several groups have devised bioinformatic approaches specific to the identification of plant miRNAs (2, 17, 46, 135). Like the algorithms for the identification of animal miRNAs, these approaches all use conservation of secondary structure as a filter, but are necessarily more relaxed in terms of the allowed structures.
Some of these approaches take advantage of the high complementarity of plant miRNAs to target mRNAs; searching for conserved stem-loops with conserved complementarity to mRNAs not only helps to distinguish authentic miRNAs from false positives, but also identifies putative regulatory targets of the predicted miRNAs (2, 46).

Genomics of plant microRNAs

Taken in aggregate, cloning, genetics, and bioinformatics have identified 114 potential miRNA genes in Arabidopsis (Table 1, Table 2). These 114 miRNA loci can be grouped into 41 multigene families, with each family comprised of stem-loops with the potential to produce identical or highly similar mature miRNAs. 21 families are clearly conserved to additional plant species beyond Arabidopsis (Table 1), whereas for 20 families conservation outside of Arabidopsis has not been observed or is uncertain (Table 2). The following discussion will focus primarily on evolutionarily conserved families, as these generally have more reliable evidence for their expression and regulation of target genes.

Expression of plant microRNAs

Some miRNAs are among the most abundant cellular RNAs in animals, with individual miRNAs having up to 10,000-50,000 copies per cell (71). Although the expression levels of plant miRNAs have not been quantified, it is clear that many of them are abundantly expressed. Certain miRNAs have been cloned hundreds of times, and most miRNAs are readily detectable by Northern blot (3, 112) (http://asrp.cgrb.oregonstate.edu/db/). More recently, microarray technology has been adapted to rapidly survey the expression profile of plant miRNAs (8). Some miRNAs are expressed in a broad range of tissues, whereas others are expressed most strongly in particular organs or developmental stages (8, 112). More precise data on the localization of a few miRNAs in plants has come from in situ hybridization to miRNAs (25, 48, 52) or from miRNA-responsive reporter genes (102).
Little is known about the transcriptional or post-transcriptional regulation of miRNA expression. The expression levels of several miRNAs are responsive to phytohormones or growth conditions; miR159 levels are enhanced by gibberellin signaling (1), and miR393 levels are increased by a variety of stress conditions (124). The dependence of miR395 levels on growth conditions is even more striking. A regulator of sulfate-metabolizing enzymes and sulfate transporters (2, 3, 46), miR395 is undetectable in plants grown on standard MS media, but is induced over 100-fold in plants that are starved for sulfate (46).

Conservation of plant microRNAs

Twenty miRNA families have been identified so far that are conserved between all three sequenced plant genomes: Arabidopsis, Oryza sativa (rice), and Populus trichocarpa (Table 1). There are also several examples of miRNA families that are conserved within specific lineages; miR403 is present in the dicots Arabidopsis and Populus but absent from the monocot Oryza (124). An additional three families identified by cloning in Oryza are conserved to other monocots such as maize, but are not evident in either sequenced dicot (123). Within each family, the mature miRNA is always located on the same arm (5' or 3') of the stem-loop in every family member (Figure 2). Although the sequences of the mature miRNA and, to a lesser extent, the miRNA* are highly conserved between members of the same miRNA family (both within and between species), the sequence, the secondary structure, and even the length of the intervening "loop" region can be highly divergent between family members (Figure 2). The pattern of pairing and non-pairing nucleotides within the mature miRNA and miRNA* is often conserved between homologous miRNA stem-loops from different species (Figure 2). The significance of these conserved bulges is unknown; perhaps they serve to guide DCL1 cleavage to the appropriate positions along the stem-loop.
Most small RNA cloning efforts in plants have focused on Arabidopsis, a dicot, or Oryza, a monocot, and bioinformatic methods have focused on miRNAs conserved between these two species. Both species are angiosperms (flowering plants), and diverged from each other ~145 million years ago (23). Growing evidence shows that many angiosperm miRNA families, and their complementary sites in target mRNAs, are conserved in more basal land plants. Ten miRNA families have conserved target sites in ESTs from gymnosperms or more basal plants, and a miR159 stem-loop is present in an EST from the moss Physcomitrella patens (46). A cDNA containing a miR166 stem-loop has been cloned from the lycopod Selaginella kraussiana, and miR166 mediates cleavage within the highly conserved miR166 complementary sites of HD-ZIP mRNAs from gymnosperms, ferns, lycopods, and mosses (36). A systematic search for miRNA expression using microarray technology revealed that at least 11 miRNA families have detectable expression in gymnosperms, and at least 2 (miR160 and miR390) are detectable in moss (8). Furthermore, a clever approach to experimentally verify miRNA targets in plants without sequenced genomes found evidence that four miRNA families (miR160, miR167, miR171, and miR172) cleave target mRNAs in gymnosperms, ferns, or mosses that are homologous to the verified Arabidopsis miRNA targets (8). Some of these miRNA families have been shown to regulate development in Arabidopsis, being necessary for processes such as the proper specification of floral organ identity (miR172) or leaf polarity (miR166). It is curious, then, that these miRNA families regulate homologous mRNAs in basal plants that have very different reproductive structures and leaf morphology. It is tempting to speculate that these miRNAs are parts of ancient, conserved regulatory pathways that underlie seemingly different developmental outcomes.
Gene count

Counting only the 21 conserved families, the Arabidopsis genome contains at least 91 potential miRNA genes (http://www.sanger.ac.uk/Software/Rfam/mira/index.shtml, Table 1). These families are somewhat expanded in Oryza and Populus, containing 116 and 169 potential miRNA genes, respectively (Table 1). The number of members per family in one genome ranges from 1 to 32. It is unclear why plant genomes contain so many stem-loops encoding similar miRNAs. The number of members in each family seems to be correlated between species; certain families contain numerous members in all three species (e.g. miR156, miR166, miR169), whereas others consistently contain only a few genes (e.g. miR162, miR168, miR394) (Table 1). Although it is unclear why a plant would need, for example, 12 copies of miR156, this correlation suggests a functional significance in the sizes of the various miRNA families.

Non-conserved microRNAs

Although many miRNA families are conserved widely in plants, others are found only in a single genome, and thus appear to be of more recent evolutionary origin (Table 2). Based on extended homology between non-conserved miRNAs and target genes, it has been proposed that some of these young miRNAs arose as tandem duplications of target-gene segments (4). Although several non-conserved miRNAs have been shown to cleave target mRNAs (4, 130), it is difficult to confidently predict targets for many because it is not possible to use conservation of complementary sites as a filter against false positives. In fact, it is difficult to be confident that all annotated non-conserved miRNAs are miRNAs rather than siRNAs. The established minimal standard is that a small RNA with detectable expression and the potential to form a stem-loop when joined to flanking genomic sequence can be annotated as a miRNA (5). In practice, these requirements are too loose to be useful in categorizing small RNAs cloned from plants.
Many plant siRNAs are detectable on blots (138), and hundreds of thousands of non-miRNA genomic sequences can be predicted to fold into secondary structures that resemble the structures of plant miRNA precursors (46). Therefore, without the conservation of a characteristic pattern of sequence and secondary structure, it can be difficult to know if a given cloned RNA originated from a single-stranded stem-loop (i.e. is a miRNA) or from a double-stranded RNA (i.e. is a siRNA). In fact, many of the thousands of cloned Arabidopsis siRNAs (http://asrp.cgrb.oregonstate.edu/) would probably meet the literal requirements for annotation as miRNAs. A few of these sequences probably are miRNAs, but others that might meet the literal criteria probably are not. Because of this difficulty in identifying non-conserved miRNAs, it is not possible to propose a meaningful estimate of the total number of miRNA genes in Arabidopsis or other plant genomes.

Because stem-loop structures that resemble miRNA precursors are so common in genomic sequence, bioinformatic searches for plant miRNAs are also prone to identifying false positives. Nine annotated miRNA families that differ from other miRNAs in several key aspects were identified in a bioinformatic screen for miRNAs conserved between Arabidopsis and Oryza (135). Unlike all the other conserved families, each has a single locus in each genome, and none of these nine families has clearly identifiable homologs in the Populus genome or in ESTs from other plant species (M.W. Jones-Rhoades, personal communication). Taken together with the fact that the stem-loops of many of these miRNAs have more unpaired nucleotides within the miRNA/miRNA* than is typical for miRNAs with more experimental evidence, it appears likely that these sequences are bioinformatic false positives rather than bona fide miRNAs.
Regulatory roles of plant microRNAs

Regulatory roles of animal microRNAs

As cloning experiments in animals identified large numbers of miRNAs, their functions remained largely unknown. Experience from the founding miRNAs, the lin-4 and let-7 RNAs, suggested that many, if not all, cloned miRNAs were also likely to repress the translation of protein-coding genes. However, there was a considerable lag between large-scale miRNA identification in animals and reliable genome-wide prediction of miRNA regulatory targets. Animal miRNAs are capable of repressing mRNAs to which they have quite limited complementarity; 7 or 8 adjacent paired nucleotides in the 5' portion of the miRNA are sufficient for repression in vivo (20, 29). Animal mRNAs contain numerous matches with this degree of complementarity, not only to miRNAs but also to arbitrary sequences with dinucleotide composition similar to that of miRNAs (67, 68). The primary challenge in predicting animal miRNA targets has been to know which of these numerous potential targets are biologically significant. Algorithms that search for the conservation of potential target sites across multiple species and that take into account the pairing requirements for translational repression (e.g. emphasis on pairing to the 5' portion of the miRNA) have identified thousands of mRNAs as probable targets of animal miRNAs (20, 34, 45, 54, 67, 68, 121). The limited pairing required for translational repression in animals, as well as the large number of predicted targets, has led to the "micromanager model" for animal miRNA-mediated regulation, whereby many, if not most, animal mRNAs have their expression modulated to a greater or lesser extent through interaction with miRNAs (12).

Identification of plant miRNA targets

In contrast to the delay in animals, the high degree of complementarity between Arabidopsis miRNAs and their target mRNAs allowed for the confident prediction of targets soon after the discovery of the miRNAs themselves.
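The kind of complementarity search that this high degree of pairing makes possible can be illustrated with a short sketch. The Python example below is illustrative only and is not the pipeline used in the studies cited here: it scans an mRNA for ungapped sites antisense to a miRNA and reports those with at most a chosen number of mismatches, defaulting to the 0-3 mismatch range. The function names and the toy sequences are hypothetical, and treating only Watson-Crick pairs as matches (so that G:U pairs count as mismatches) is a simplifying assumption.

```python
# Minimal sketch of an ungapped antisense complementarity scan.
# All names and sequences here are hypothetical illustrations.

# Watson-Crick complements for RNA (assumption: G:U wobble pairs are
# not treated as matches in this sketch).
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna):
    """Return the reverse complement of an RNA sequence (A, C, G, U)."""
    return rna.translate(COMPLEMENT)[::-1]


def find_complementary_sites(mirna, mrna, max_mismatches=3):
    """Find ungapped sites in `mrna` that are antisense to `mirna`.

    Returns a list of (start, mismatches) tuples, where `start` is the
    0-based position of the site in the mRNA and `mismatches` is the
    number of unpaired positions in the ungapped alignment.
    """
    perfect_site = reverse_complement(mirna)
    site_length = len(perfect_site)
    hits = []
    for start in range(len(mrna) - site_length + 1):
        window = mrna[start:start + site_length]
        mismatches = sum(1 for a, b in zip(window, perfect_site) if a != b)
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits


if __name__ == "__main__":
    toy_mirna = "ACGUACGUUAGCCGAUAGGCA"  # made-up 21-nt sequence
    toy_mrna = "AAAA" + reverse_complement(toy_mirna) + "CCCC"
    print(find_complementary_sites(toy_mirna, toy_mrna))
```

A real screen would also need to judge how many such sites arise by chance, for example by repeating the scan with shuffled miRNA sequences; that step is not shown here.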
The first indication of this plant-specific paradigm for miRNA target recognition came from miR171. miR171 has four matches in the Arabidopsis genome: one is located between protein-coding genes and has a stem-loop structure, whereas the other three are all antisense to SCARECROW-LIKE (SCL) genes and lack stem-loop structures (75, 112). The intergenic miR171 locus with the stem-loop produces a miRNA that guides the cleavage of the complementary SCL mRNAs (76). Although other Arabidopsis miRNAs are not perfectly complementary to mRNAs, most of them are nearly so. An initial genome-wide screen for miRNA targets searched for mRNAs containing ungapped, antisense alignments with 0-3 mismatches to miRNAs, a degree of complementarity highly unlikely to occur by chance (113). Using this cutoff, targets could be predicted for 11 of the 13 miRNA families known at the time, comprising 49 target genes in total (113). For conserved miRNAs, more sensitive predictions that allow for gaps and more mismatches can be made by identifying cases where homologous mRNAs in Arabidopsis and Oryza each have complementarity to the same miRNA family (46).

Because plant miRNAs affect the stability of their targets, mRNA expression arrays can be used in experimental genome-wide screens for miRNA targets. For example, expression array data showed that five mRNAs encoding TCP transcription factors are down-regulated in plants overexpressing miR319 (100). Expression arrays may be especially useful in identifying miRNA targets that have been missed by bioinformatics (i.e. targets with more degenerate or non-conserved complementarity that are nonetheless subject to miRNA-guided cleavage). However, analysis of mRNAs down-regulated in plants overexpressing one of four miRNAs identified only two potentially direct targets not related to those found through bioinformatics (115).
Furthermore, evidence for miRNA-guided cleavage of these targets in wild-type plants was not detected by 5' RACE, suggesting that these mRNAs may only be cleaved in plants that ectopically express miRNAs (115).

The scope of miRNA-mediated regulation in plants

The identity of their predicted targets suggests that plant miRNAs are master regulators; many miRNA targets encode regulatory proteins. The 21 conserved miRNA families have 90 confirmed or predicted conserved regulatory targets in Arabidopsis (Table 3). Of these, 65 (72%) encode transcription factors, pointing to a role for miRNAs in the control of transcriptional regulation. Another six (7%) are F-box proteins or E2 ubiquitin-conjugating enzymes thought to be involved in the selective targeting of proteins for degradation by the proteasome, implying a role for miRNAs in regulating protein stability. DCL1, AGO1, and AGO2 are also miRNA targets, suggesting that miRNAs regulate their own biogenesis and function. Other conserved miRNA targets, such as ATP-sulfurylases, superoxide dismutases, and laccases, have less clear roles as regulators; although in vivo miRNA-mediated cleavage has been shown for many of these targets, the biological significance of their regulation by miRNAs is not known.

All 20 miRNA families that are conserved between Arabidopsis, Populus, and Oryza have complementary sites in target mRNAs that are also conserved in all three species (Table 3). Although these miRNAs may also have targets that are not conserved, this conservation of target sites suggests that miRNAs play similar roles in different plant species. Indeed, mutations in class III HD-ZIP genes that reduce miR166 complementarity in Arabidopsis and Oryza have similar phenotypes (48, 88, 113). However, the expansion of certain miRNA families and target classes in different species suggests that some of these miRNA families may have species-specific roles.
For example, the miR397 family is complementary to mRNAs of 26 putative laccase genes in Populus, whereas it has comparable complementarity to only three in Arabidopsis. Although the roles that laccases play in the biology of plants are not well understood, there is speculation that they may be involved in lignification (86), a process that may be more critical in a woody plant such as Populus.

Validation of plant miRNA targets

While the majority of plant miRNA targets were initially predicted through bioinformatics, a growing number have been validated experimentally. One means of target validation has been to use Agrobacterium infiltration to observe miRNA-dependent cleavage of targets in Nicotiana benthamiana leaves (49, 76). Another has been to assay the endogenous miRNA-mediated cleavage activity that is present in wheat germ lysate (83, 126). Perhaps the most useful method of miRNA target validation has been to use 5' RACE to detect in vivo the products of miRNA-mediated cleavage reactions (46, 49, 76, 80, 81, 83, 100, 123, 139). An adaptor oligo is ligated to the 5' end of the uncapped 3' portion of a cleaved miRNA target, followed by PCR with a gene-specific primer (49, 76). Sequencing of the resulting PCR product maps the precise position of cleavage within the target, usually between the nucleotides that pair to positions 10 and 11 of the miRNA. A more informative level of target validation is to examine the biological significance of the miRNA-mediated regulation of that target. As discussed below, reverse genetic approaches have yielded information about the in vivo relevance of a growing number of miRNA-target interactions.

Regulatory roles of plant microRNAs

The first evidence that small RNAs play roles in plant development came from mutants impaired in small RNA biogenesis or function.
Indeed, several genes central to miRNA function, including DCL1, AGO1, and HEN1, were initially identified based on the developmental consequences of their mutations before they were known to be important for small RNA biogenesis or function. Multiple groups isolated dcl1 mutants; the most severe mutations result in early embryonic arrest, and even partial loss-of-function mutations result in pleiotropic defects, including abnormalities in floral organogenesis, leaf morphology, and axillary meristem initiation (reviewed in (114)). ago1, hen1, hyl1, and hst mutants all have pleiotropic developmental defects that overlap with those of dcl1 plants (16, 26, 77, 91, 127). In addition, plants that express certain viral inhibitors of small RNA processing or function, such as HC-Pro and P19, also exhibit developmental defects reminiscent of dcl1 mutants (22, 24, 30, 49, 82). Although many or all of these developmental defects may be the result of impaired miRNA activity, they may also reflect disruption of other pathways in which these genes are active, such as the generation and function of siRNAs. However, in contrast to mutations in genes needed for miRNA biogenesis, mutations in genes required for the accumulation of certain siRNAs, such as AGO4, RDR6, and DCL3, result in few or mild developmental abnormalities (27, 95, 131, 138, 146).

Mutations that impair a fundamental step in miRNA biogenesis result in the misregulation of numerous miRNA targets (18, 130), making it difficult to assign the observed phenotypes to any particular miRNA family. Fortunately, the ease with which transgenic Arabidopsis can be generated has allowed the investigation of particular miRNA/target interactions through one of two reverse genetic strategies. The first strategy is to make transgenic plants that overexpress a miRNA, typically under the control of the strong double 35S promoter (Table 4).
This approach has the potential to downregulate all mRNAs targeted by the overexpressed miRNA. The second strategy is to make transgenic plants that express a miRNA-resistant version of a miRNA target, in which silent mutations have been introduced into the miRNA complementary site that disrupt miRNA-mediated regulation without altering the encoded protein product (Table 5). In total, eight miRNA families have been investigated in vivo by these strategies. As might have been expected from the identity of their target mRNAs, in all eight cases perturbation of miRNA-mediated regulation results in abnormal development. Taken together, these studies demonstrate that miRNAs are key regulators of many facets of Arabidopsis development.

One of the better studied families of miRNA targets is the class III HD-ZIP transcription factors. The importance of miR166-mediated regulation for the proper regulation of this gene class is underscored by the large number of dominant gain-of-function alleles that map to the miR166 complementary sites of HD-ZIP mRNAs (33, 48, 87, 88, 145). phb and phv mutations result in adaxialization of leaves and overexpression of PHB/PHV mRNA (87, 88), whereas rev mutations result in radialized vasculature (33, 145). Similarly, mutations within the miR166 complementary site of the maize HD-ZIP gene RLD1 result in adaxialization of leaf primordia and overaccumulation of rld1 mRNA (48). All of these HD-ZIP gain-of-function mutations result in a change in the amino acid sequence of the conserved START domain. Before the discovery of miR166, it was hypothesized that the HD-ZIP mutant phenotypes resulted from the loss of a negative regulatory interaction mediated by the START domain (88). However, transgenic plants expressing miR166-resistant versions of PHB, PHV, or REV phenocopy their respective gain-of-function mutants, whereas transgenic plants containing additional wild-type copies of these genes have no or mild phenotypes (33, 83).
This demonstrates that changes in the RNA sequence, rather than in the amino acid sequence, are sufficient to account for the developmental abnormalities observed in HD-ZIP gain-of-function mutants.

miR172-mediated regulation of APETALA2 (AP2) and related AP2-like genes is needed for the proper specification of organs during flower development (7, 25). Plants that overexpress miR172 have floral defects, such as the absence of petals and the transformation of sepals to carpels, which resemble those of ap2 loss-of-function mutants (7, 25). Curiously, overexpression of miR172 substantially decreases the protein levels of target AP2-like genes without a commensurate change in target mRNA levels, suggesting that, unlike other known plant miRNA-target interactions, miR172 represses translation of AP2-like mRNAs in a manner similar to that employed by animal miRNAs (7, 25). However, the extent of complementarity between miR172 and the AP2-like mRNAs is high, comparable to that of other plant miRNA targets that undergo robust miRNA-mediated cleavage, and 3' cleavage fragments of AP2-like mRNAs can be detected by 5' RACE (7, 49). Indeed, Schwab et al. found that cleavage of miR172 targets is increased in miR172-overexpressing plants, and postulated a feedback mechanism whereby AP2-like proteins repress their own transcription, resulting in similar mRNA levels despite an increase in mRNA cleavage (115). It appears that miR172-mediated regulation of AP2-like genes is complex, and it is unclear how similar miR172-mediated regulation is to the miRNA-mediated translational repression observed in animals.

Although most miRNA families are predicted to target a single class of targets, the miR159/319 family regulates both MYB and TCP transcription factors. Although miR159 and miR319 differ by only three nucleotides, they appear to be functionally distinct.
Overexpression of miR319, which specifically downregulates TCP mRNAs, results in plants with uneven leaf shape and delayed flowering time (100). Expression of miR319-resistant TCP4 results in aberrant seedlings that arrest with fused cotyledons and without forming apical meristems (100). Overexpression of miR159, which specifically reduces accumulation of MYB mRNAs, results in male sterility (1, 115), whereas plants that express miR159-resistant MYB33 have upwardly curled leaves, reduced stature, and shortened petioles (90, 100). Thus miR159 and miR319 are related miRNAs that regulate unrelated mRNAs.

In addition to the miRNAs that target transcription factors, two miRNA families are known to target genes central to miRNA biogenesis and function; miR162 targets DCL1 (139) and miR168 targets AGO1 (113, 129). The targeting of these genes suggests a feedback mechanism whereby miRNAs negatively regulate their own activity. Curiously, although plants expressing miR168-resistant AGO1 overaccumulate AGO1 mRNA as expected, they also overaccumulate numerous other miRNA targets and exhibit developmental defects that overlap with those of dcl1, hen1, and hyl1 loss-of-function mutants (129). This suggests that an overabundance of AGO1 inhibits, rather than promotes, RISC activity (129).

MicroRNAs: plants vs. animals

As our understanding of miRNA genomics and function in both plants and animals has grown, so has the realization that there are numerous differences between the kingdoms in the ways miRNAs are made and carry out their regulatory roles. Indeed, the evolutionary relationship between plant and animal miRNAs is unclear. Did the last common ancestor of plants and animals possess miRNAs from which modern miRNAs are descended, or did the plant and animal lineages independently adapt conserved RNAi machinery to use endogenously expressed stem-loop RNAs as trans regulators of other genes?
Although miRNAs are deeply conserved within each kingdom (8, 36, 57, 61, 63, 71, 105), no particular miRNA is known to be conserved between kingdoms. There are several kingdom-specific differences in miRNA biogenesis. For one thing, the stem-loop precursors of plant miRNAs are markedly longer and more variable than their animal counterparts. The cellular localization of processing appears to differ between plant miRNAs, which are entirely processed within the nucleus (101, 103, 138), and animal miRNAs, which are processed both in the nucleus and in the cytoplasm (65). Perhaps more importantly, the scope and mode of regulation carried out by miRNAs appear to be drastically different between the two kingdoms. Most plant miRNAs guide the cleavage of target mRNAs (46, 49, 76, 126), and the predicted targets of Arabidopsis miRNAs, which comprise less than 1% of protein-coding genes, are highly biased towards transcription factors and other regulatory genes (46, 113). Although at least some animal miRNAs guide cleavage of endogenous targets (84, 141), most appear to act through the repression of translation (19, 20, 68, 117, 136). Furthermore, the identification of conserved reverse-complementary matches to the 5' "seed" portions of animal miRNAs suggests that a large percentage (20-30% or more) of animal protein-coding genes are conserved miRNA targets (20, 67, 137). Whatever the evolutionary relationship between plant and animal miRNAs, the functional differences are striking.

Summary

Plant miRNAs were initially identified through cloning, without any indication as to their biological roles. In chapter one of this thesis, I describe the initial genome-wide bioinformatic screen for plant miRNA regulatory targets. We show that Arabidopsis miRNAs are complementary to far more mRNAs than would be expected by chance, and propose that mRNAs that can pair to miRNAs with three or fewer unpaired nucleotides are likely to be miRNA targets.
Furthermore, many of these miRNA complementary sites are conserved in orthologous Oryza mRNAs, implying that miRNA-mediated regulation of many targets predates the divergence of dicots and monocots. In total, we identified 49 predicted targets, of which 34 encode transcription factors. Our findings indicated that miRNAs are key components of numerous regulatory circuits in plants and set the stage for additional experiments to investigate in depth the significance of miRNA-mediated regulation for particular target families and genes.

Cloning is an efficient way to identify abundant miRNAs, but it is likely to miss those expressed at low levels or under specific conditions. In chapter two, I describe the development and implementation of a bioinformatic approach to identify conserved miRNAs unrelated to those discovered by cloning. In conjunction with this, I used the conservation of miRNA target sites to increase the sensitivity and selectivity of plant miRNA target prediction. Seven previously unknown families of miRNAs were identified computationally and verified experimentally. These newly identified families expanded the categories of genes known to be regulated by miRNAs to include F-box genes, sulfate-metabolizing genes, laccases, and superoxide dismutases.

Bioinformatic approaches have proven effective at identifying targets of plant miRNAs, and moderately high-throughput methods such as 5' RACE can detect evidence for the interaction of many miRNA-mRNA pairs. However, our understanding of the biological significance of plant miRNAs has been greatly aided by reverse genetic approaches that allow for the disruption of miRNA-mediated regulation. In chapter three, I describe the role of miR394 in the regulation of the F-box gene At1g27340.
Expression of miR394-resistant At1g27340 results in numerous developmental abnormalities, including downwardly curved rosette leaves, radialized cauline leaves, abortive flowers, and arrested seedlings that lack shoot apical meristems, that correlate with an increase in At1g27340 mRNA levels.

Table 1. Genomic loci of conserved plant miRNA families

miRNA family    A.t.    O.s.    P.t.
miR156          12      12      11
miR159/319      6       8       15
miR160          3       6       8
miR162          2       2       3
miR164          3       5       6
miR166          9       12      17
miR167          4       9       8
miR168          2       2       2
miR169          14      17      32
miR171          4       7       10
miR172          5       3       9
miR390          2       1       4
miR393          2       2       4
miR394          2       1       2
miR395          6       19      10
miR396          2       5       7
miR397          2       2       3
miR398          3       2       3
miR399          6       11      12
miR408          1       1       1
miR403          1       0       2
miR437          0       1+      0
miR444          0       1+      0
miR445          0       9+      0
Total           91      127     169

The number of identified genes in each family of miRNAs is indicated. Only miRNA families with strong evidence for conservation are listed. Oryza miRNA families which appear to be missing from Arabidopsis and Populus but are present in maize are marked with a plus (+).

Table 2. Genomic loci of non-conserved plant miRNA families

miRNA family    A.t.    O.s.    P.t.
miR158          2       0       0
miR161          1       0       0
miR163          1       0       0
miR173          1       0       0
miR400          1       0       0
miR401          1       0       0
miR402          1       0       0
miR404          1       0       0
miR405          3       0       0
miR406          1       0       0
miR407          1       0       0
miR435          0       1       0
miR436          0       1       0
miR438          0       1       0
miR439          0       10      0
miR440          0       1       0
miR441          0       3       0
miR442          0       1       0
miR443          0       1       0
miR446          0       1       0
miR413          1       1       0
miR414          1       1       0
miR415          1       1       0
miR416          1       1       0
miR417          1       1       0
miR418          1       1       0
miR419          1       1       0
miR420          1       1       0
miR426          1       1       0
Total           23      29      0

The number of identified genes in each family of miRNAs is indicated. Only miRNA families without strong evidence for conservation are listed. As discussed in the text, miR413-miR426 were identified bioinformatically as conserved between Arabidopsis and Oryza, but are not evident in Populus, and it is unclear if they are truly miRNAs.

Table 3.
Regulatory targets of plant miRNAs

miRNA family  Target family    Validated targets  |  Validation method  |  A.t.  O.s.  P.t.

miR156        SBP              SPL2, SPL3, SPL4, SPL10 (3, 24, 49, 131)  |  5' RACE  |  11  9  16
miR159/319    MYB              MYB33, MYB65 (1, 90, 100)  |  miRNA-resistant target, Agro-infiltration  |  8  6  5
miR159/319    TCP              TCP2, TCP3, TCP4, TCP10, TCP24 (100)  |  5' RACE, miRNA-resistant target  |  5  4  7
miR160        ARF              ARF10, ARF16, ARF17 (3, 49, 80)  |  5' RACE, miRNA-resistant target  |  3  5  9
miR164        NAC              CUC1, CUC2, NAC1, At5g07680, At5g61430 (39, 49, 62, 81)  |  5' RACE, wheat germ lysate, miRNA-resistant target  |  6  6  6
miR166        HD-ZIPIII        PHB, PHV, REV, ATHB-8, ATHB-15 (33, 53, 83, 126)  |  5' RACE, wheat germ lysate, miRNA-resistant target  |  5  4  9
miR167        ARF              ARF6, ARF8 (3, 49)  |  5' RACE  |  2  4  7
miR169        HAP2             At1g17590, At1g72830, At1g54160, At3g05690, At5g06510 (46)  |  5' RACE  |  8  7  9
miR171        SCL              SCL6-III, SCL6-IV (49, 76)  |  5' RACE, Agro-infiltration  |  3  5  9
miR172        AP2              AP2, TOE1, TOE2, TOE3 (7, 25, 49)  |  5' RACE, miRNA-resistant target  |  6  5  6
miR393        bZIP*            At1g27340 (46)  |  5' RACE  |  1  1  1
miR396        GRF              GRL1, GRL2, GRL3, GRL7, GRL8, GRL9 (46)  |  5' RACE  |  7  9  9
total, transcription factors   |  65  65  93

miR161        PPR              At1g06580 (4, 131)  |  5' RACE  |  9  0  0
miR162        Dicer            DCL1 (139)  |  5' RACE  |  1  1  1
miR163        SAMT             At1g66690, At1g66700, At1g66720, At3g44860 (4)  |  5' RACE  |  5  0  0
miR168        ARGONAUTE        AGO1 (129, 131)  |  5' RACE, miRNA-resistant target  |  1  6  2
miR393        F-box            TIR1, At1g12820, At3g26810, At4g03190, At3g23690 (46)  |  5' RACE  |  4  2  5
miR394        F-box            At1g27340 (46, 47)  |  5' RACE, miRNA-resistant target  |  1  1  2
miR395        APS              APS1, APS4 (46)  |  5' RACE  |  3  1  2
miR395        S transporter    AST68 (3)  |  5' RACE  |  1  2  3
miR396        Rhodanese        -  |  -  |  1  1  1
miR397        Laccase          At2g29130, At2g38080, At5g60020 (46)  |  5' RACE  |  3  15  26
miR398        CSD*             CSD1, CSD2 (46)  |  5' RACE  |  2  2  2
miR398        CytC oxidase*    At3g15640 (46)  |  5' RACE  |  1  1  0
miR399        Ph transporter   -  |  -  |  1  4  4
miR399        E2-UBC           At2g33770 (3)  |  5' RACE  |  1  1  2
miR403        ARGONAUTE        AGO2 (3)  |  5' RACE  |  1  0  1
miR408        Laccase          At2g30210 (115)  |  5' RACE  |  3  2  3
miR408        Plantacyanin     7448.m00137 (123)  |  5' RACE  |  1  3  1
total, non-transcription factors  |  39  42  55

Validated and predicted targets of Arabidopsis miRNAs are listed, grouped into those encoding transcription factors (top) and those encoding other functionalities (bottom). For each target family, the number of genes predicted to be targets in each of three plant species with sequenced genomes (A.t., Arabidopsis thaliana; O.s., Oryza sativa; P.t., Populus trichocarpa) is indicated. To be counted, a potential target must contain a complementary site to at least one member of the indicated miRNA family with a score of 3 or less (as described (46)), with the exception of the target families marked with an asterisk, for which some targets with more relaxed complementarity were included. Non-validated target families are listed only if they are present in all three species. miR408-directed cleavage of plantacyanin mRNAs has been validated only in Oryza. Abbreviations: SBP, SQUAMOSA-promoter binding protein; ARF, AUXIN RESPONSE FACTOR; SCL, SCARECROW-LIKE; GRF, GROWTH REGULATING FACTOR; SAMT, SAM-dependent methyl transferase; APS, ATP-sulfurylase; CSD, COPPER SUPEROXIDE DISMUTASE; E2-UBC, E2 ubiquitin-conjugating protein.

Table 4.
miRNA overexpression affecting development

miRNA | Target family | Consequences of overexpression
miR156 | SPL transcription factors | Increased leaf initiation, decreased apical dominance, delayed flowering time (115)
miR159 | MYB transcription factors | Male sterility, delayed flowering time (1)
miR319 | TCP transcription factors | Uneven leaf shape and curvature, late flowering (100)
miR164 | NAC domain transcription factors | Organ fusion (62, 81)
miR166 | HD-ZIP transcription factors | Seedling arrest, fasciated apical meristems, female sterility (53)
miR172 | AP2-like transcription factors | Early flowering, lack of petals, transformation of sepals to carpels (7, 25)

Table 5. miRNA-resistant target affecting development

miRNA family | miRNA-resistant target | Promoter | Phenotype
miR159 | MYB33 | 35S | Upwardly curled leaves (100)
miR159 | MYB33 | Endogenous | Upwardly curled leaves, reduced stature, shortened petioles (90)
miR319 | TCP4 | Endogenous and 35S | Arrested seedlings, fused cotyledons, lack of SAM (100)
miR319 | TCP2 | 35S | Longer hypocotyls, reduced stature and apical dominance (100)
miR160 | ARF17 | Endogenous | Extra cotyledons (80)
miR164 | CUC1 | Endogenous | Shortened rosette leaf petioles, aberrant leaf shape, extra petals, missing sepals (81)
miR164 | CUC2 | Inducible and 35S | Aberrant leaf shape, extra petals, increased sepal separation (62)
miR164 | NAC1 | 35S | Increased number of lateral roots (39)
miR166 | REV | Endogenous | Radialized vasculature, strands of leaf tissue attached to stem (33)
miR166 | PHB | 35S | Adaxialized leaves, ectopic meristems (83)
miR168 | AGO1 | Endogenous | Curled leaves, disorganized phyllotaxy, reduced fertility (129)
miR172 | AP2 | 35S | Late flowering, excess of petals and stamens (25)
miR394 | At1g27340 | Endogenous | Curled leaves, lack of SAM, abortive flowers, "spiked" cauline leaves (47)

Figure Legends

Figure 1. Mechanisms of small RNA biogenesis and function
(A) A model for miRNA biogenesis in Arabidopsis.
Following transcription (step 1), the pri-miRNA is processed by DCL1, and perhaps other factors, to a miRNA:miRNA* duplex (step 2). Pre-miRNAs, which are readily detectable in animals, appear to be very short-lived in plants. The 3' sugars of the miRNA:miRNA* duplex are methylated by HEN1, presumably within the nucleus (step 3). The miRNA is exported to the cytoplasm by HST, probably with the aid of additional factors (step 4). The mature, methylated miRNA is separated from the miRNA*, perhaps with the aid of a helicase. The miRNA is incorporated into RISC, an Argonaute-containing ribonucleoprotein complex, while the miRNA* is degraded (step 5). For plant miRNAs, unwinding of the miRNA:miRNA* duplex may occur before export to the cytoplasm.
(B) A model for siRNA biogenesis. Long double-stranded RNA, perhaps generated through the action of an RNA-dependent RNA polymerase (RDRP), is iteratively processed by Dicer-like proteins to yield multiple siRNA duplexes. One strand from each siRNA duplex is stably incorporated into RISC, while the other is degraded.
(C) MicroRNAs or siRNAs can guide RISC to cleave mRNAs with extensive complementarity to the small RNA. The complementary site can occur at any point within the target RNA.
(D) MicroRNAs or siRNAs can repress productive translation of target mRNAs. In animals, most known examples of translational repression involve multiple sites in the 3' UTR with imperfect complementarity to the small RNA. The one plant miRNA that has been reported to mediate translational repression, miR172, recognizes its targets through a single site with near-perfect complementarity.
(E) Small RNAs play roles in the establishment of transcriptionally silent heterochromatin. The exact role played by the small RNAs in this pathway is not clear, nor is it known whether they base pair to DNA or RNA.

Figure 2. Representative miR164 stem-loop precursors from Arabidopsis, Oryza, and Populus. The mature microRNAs are shown in red.
[Figure 1: schematic of small RNA biogenesis and function. Labels recoverable from the figure: exogenous RNA, transposon, virus, or endogenous RNA; miRNA gene; pri-miRNA; long dsRNA; pre-miRNA; miRNA:miRNA* duplex; methylated miRNA:miRNA* duplex; siRNA duplexes; nucleus; mature miRNA within RISC; mature siRNAs within RISC; near-perfect complementarity in coding region or UTR; short complementary segments; active chromatin; histone methylation directed by heterochromatic siRNAs; silent chromatin.]

[Figure 2: stem-loop structures of MIR164a and MIR164b from Arabidopsis, Oryza, and Populus.]

1. Achard P, Herr A, Baulcombe DC, Harberd NP. 2004. Modulation of floral development by a gibberellin-regulated microRNA.
Development 131: 3357-65
2. Adai A, Johnson C, Mlotshwa S, Archer-Evans S, Manocha V, et al. 2005. Computational prediction of miRNAs in Arabidopsis thaliana. Genome Res 15: 78-91
3. Allen E, Xie Z, Gustafson AM, Carrington JC. 2005. microRNA-Directed Phasing during Trans-Acting siRNA Biogenesis in Plants. Cell 121: 207-21
4. Allen E, Xie Z, Gustafson AM, Sung GH, Spatafora JW, Carrington JC. 2004. Evolution of microRNA genes by inverted duplication of target gene sequences in Arabidopsis thaliana. Nat Genet 36: 1282-90
5. Ambros V, Bartel B, Bartel DP, Burge CB, Carrington JC, et al. 2003. A uniform system for microRNA annotation. RNA 9: 277-9
6. Ambros V, Lee RC, Lavanway A, Williams PT, Jewell D. 2003. MicroRNAs and other tiny endogenous RNAs in C. elegans. Curr Biol 13: 807-18
7. Aukerman MJ, Sakai H. 2003. Regulation of flowering time and floral organ identity by a MicroRNA and its APETALA2-like target genes. Plant Cell 15: 2730-41
8. Axtell MJ, Bartel DP. 2005. Antiquity of MicroRNAs and Their Targets in Land Plants. Plant Cell 17
9. Baker CC, Sieber P, Wellmer F, Meyerowitz EM. 2005. The early extra petals1 mutant uncovers a role for microRNA miR164c in regulating petal number in Arabidopsis. Curr Biol 15: 303-15
10. Bao N, Lye KW, Barton MK. 2004. MicroRNA binding sites in Arabidopsis class III HD-ZIP mRNAs are required for methylation of the template chromosome. Dev Cell 7: 653-62
11. Bartel DP. 2004. MicroRNAs: genomics, biogenesis, mechanism, and function. Cell 116: 281-97
12. Bartel DP, Chen CZ. 2004. Micromanagers of gene expression: the potentially widespread influence of metazoan microRNAs. Nat Rev Genet 5: 396-400
13. Baulcombe D. 2004. RNA silencing in plants. Nature 431: 356-63
14. Beclin C, Boutet S, Waterhouse P, Vaucheret H. 2002. A branched pathway for transgene-induced RNA silencing in plants. Curr Biol 12: 684-8
15. Bernstein E, Caudy AA, Hammond SM, Hannon GJ. 2001. Role for a bidentate ribonuclease in the initiation step of RNA interference.
Nature 409: 363-6
16. Bohmert K, Camus I, Bellini C, Bouchez D, Caboche M, Benning C. 1998. AGO1 defines a novel locus of Arabidopsis controlling leaf development. EMBO J 17: 170-80
17. Bonnet E, Wuyts J, Rouze P, Van de Peer Y. 2004. Detection of 91 potential conserved plant microRNAs in Arabidopsis thaliana and Oryza sativa identifies important target genes. Proc Natl Acad Sci U S A 101: 11511-6
18. Boutet S, Vazquez F, Liu J, Beclin C, Fagard M, et al. 2003. Arabidopsis HEN1: a genetic link between endogenous miRNA controlling development and siRNA controlling transgene silencing and virus resistance. Curr Biol 13: 843-8
19. Brennecke J, Hipfner DR, Stark A, Russell RB, Cohen SM. 2003. bantam encodes a developmentally regulated microRNA that controls cell proliferation and regulates the proapoptotic gene hid in Drosophila. Cell 113: 25-36
20. Brennecke J, Stark A, Russell RB, Cohen SM. 2005. Principles of microRNA-target recognition. PLoS Biol 3: e85
21. Carmell MA, Xuan Z, Zhang MQ, Hannon GJ. 2002. The Argonaute family: tentacles that reach into RNAi, developmental control, stem cell maintenance, and tumorigenesis. Genes Dev 16: 2733-42
22. Chapman EJ, Prokhnevsky AI, Gopinath K, Dolja VV, Carrington JC. 2004. Viral RNA silencing suppressors inhibit the microRNA pathway at an intermediate step. Genes Dev 18: 1179-86
23. Chaw SM, Chang CC, Chen HL, Li WH. 2004. Dating the monocot-dicot divergence and the origin of core eudicots using whole chloroplast genomes. J Mol Evol 58: 424-41
24. Chen J, Li WX, Xie D, Peng JR, Ding SW. 2004. Viral virulence protein suppresses RNA silencing-mediated defense but upregulates the role of microrna in host gene expression. Plant Cell 16: 1302-13
25. Chen X. 2004. A microRNA as a translational repressor of APETALA2 in Arabidopsis flower development. Science 303: 2022-5
26. Chen X, Liu J, Cheng Y, Jia D. 2002. HEN1 functions pleiotropically in Arabidopsis development and acts in C function in the flower.
Development 129: 1085-94
27. Dalmay T, Hamilton A, Rudd S, Angell S, Baulcombe DC. 2000. An RNA-dependent RNA polymerase gene in Arabidopsis is required for posttranscriptional gene silencing mediated by a transgene but not by a virus. Cell 101: 543-53
28. Doench JG, Petersen CP, Sharp PA. 2003. siRNAs can function as miRNAs. Genes Dev 17: 438-42
29. Doench JG, Sharp PA. 2004. Specificity of microRNA target selection in translational repression. Genes Dev 18: 504-11
30. Dunoyer P, Lecellier CH, Parizotto EA, Himber C, Voinnet O. 2004. Probing the microRNA and small interfering RNA pathways with virus-encoded suppressors of RNA silencing. Plant Cell 16: 1235-50
31. Elbashir SM, Lendeckel W, Tuschl T. 2001. RNA interference is mediated by 21- and 22-nucleotide RNAs. Genes Dev 15: 188-200
32. Elbashir SM, Martinez J, Patkaniowska A, Lendeckel W, Tuschl T. 2001. Functional anatomy of siRNAs for mediating efficient RNAi in Drosophila melanogaster embryo lysate. EMBO J 20: 6877-88
33. Emery JF, Floyd SK, Alvarez J, Eshed Y, Hawker NP, et al. 2003. Radial patterning of Arabidopsis shoots by class III HD-ZIP and KANADI genes. Curr Biol 13: 1768-74
34. Enright AJ, John B, Gaul U, Tuschl T, Sander C, Marks DS. 2003. MicroRNA targets in Drosophila. Genome Biol 5: R1
35. Finnegan EJ, Margis R, Waterhouse PM. 2003. Posttranscriptional gene silencing is not compromised in the Arabidopsis CARPEL FACTORY (DICER-LIKE1) mutant, a homolog of Dicer-1 from Drosophila. Curr Biol 13: 236-40
36. Floyd SK, Bowman JL. 2004. Gene regulation: ancient microRNA target sequences in plants. Nature 428: 485-6
37. Grad Y, Aach J, Hayes GD, Reinhart BJ, Church GM, et al. 2003. Computational and experimental identification of C. elegans microRNAs. Mol Cell 11: 1253-63
38. Grishok A, Pasquinelli AE, Conte D, Li N, Parrish S, et al. 2001. Genes and mechanisms related to RNA interference regulate expression of the small temporal RNAs that control C. elegans developmental timing. Cell 106: 23-34
39. Guo HS, Xie Q, Fei JF, Chua NH. 2005. microRNA164 Directs NAC1 mRNA Cleavage to Downregulate Auxin Signals for Lateral Root Development. Plant Cell
40. Ha I, Wightman B, Ruvkun G. 1996. A bulged lin-4/lin-14 RNA duplex is sufficient for Caenorhabditis elegans lin-14 temporal gradient formation. Genes Dev 10: 3041-50
41. Han MH, Goud S, Song L, Fedoroff N. 2004. The Arabidopsis double-stranded RNA-binding protein HYL1 plays a role in microRNA-mediated gene regulation. Proc Natl Acad Sci U S A 101: 1093-8
42. Hunter C, Sun H, Poethig RS. 2003. The Arabidopsis heterochronic gene ZIPPY is an ARGONAUTE family member. Curr Biol 13: 1734-9
43. Hutvagner G, McLachlan J, Pasquinelli AE, Balint E, Tuschl T, Zamore PD. 2001. A cellular function for the RNA-interference enzyme Dicer in the maturation of the let-7 small temporal RNA. Science 293: 834-8
44. Hutvagner G, Zamore PD. 2002. A microRNA in a multiple-turnover RNAi enzyme complex. Science 297: 2056-60
45. John B, Enright AJ, Aravin A, Tuschl T, Sander C, Marks DS. 2004. Human MicroRNA targets. PLoS Biol 2: e363
46. Jones-Rhoades MW, Bartel DP. 2004. Computational identification of plant microRNAs and their targets, including a stress-induced miRNA. Mol Cell 14: 787-99
47. Jones-Rhoades MW, Bartel DP. 2005. MicroRNA-mediated regulation of an F-box gene is required for embryonic, floral, and vegetative development. In preparation
48. Juarez MT, Kui JS, Thomas J, Heller BA, Timmermans MC. 2004. microRNA-mediated repression of rolled leaf1 specifies maize leaf polarity. Nature 428: 84-8
49. Kasschau KD, Xie Z, Allen E, Llave C, Chapman EJ, et al. 2003. P1/HC-Pro, a viral suppressor of RNA silencing, interferes with Arabidopsis development and miRNA function. Dev Cell 4: 205-17
50. Ketting RF, Fischer SE, Bernstein E, Sijen T, Hannon GJ, Plasterk RH. 2001. Dicer functions in RNA interference and in synthesis of small RNA involved in developmental timing in C. elegans.
Genes Dev 15: 2654-9
51. Khvorova A, Reynolds A, Jayasena SD. 2003. Functional siRNAs and miRNAs exhibit strand bias. Cell 115: 209-16
52. Kidner CA, Martienssen RA. 2004. Spatially restricted microRNA directs leaf polarity through ARGONAUTE. Nature 428: 81-4
53. Kim J, Jung JH, Reyes JL, Kim YS, Kim SY, et al. 2005. microRNA-directed cleavage of ATHB15 mRNA regulates vascular development in Arabidopsis inflorescence stems. Plant J 42: 84-94
54. Kiriakidou M, Nelson PT, Kouranov A, Fitziev P, Bouyioukos C, et al. 2004. A combined computational-experimental approach predicts human microRNA targets. Genes Dev 18: 1165-78
55. Knight SW, Bass BL. 2001. A role for the RNase III enzyme DCR-1 in RNA interference and germ line development in Caenorhabditis elegans. Science 293: 2269-71
56. Kurihara Y, Watanabe Y. 2004. Arabidopsis micro-RNA biogenesis through Dicer-like 1 protein functions. Proc Natl Acad Sci U S A 101: 12753-8
57. Lagos-Quintana M, Rauhut R, Lendeckel W, Tuschl T. 2001. Identification of novel genes coding for small expressed RNAs. Science 294: 853-8
58. Lagos-Quintana M, Rauhut R, Meyer J, Borkhardt A, Tuschl T. 2003. New microRNAs from mouse and human. RNA 9: 175-9
59. Lagos-Quintana M, Rauhut R, Yalcin A, Meyer J, Lendeckel W, Tuschl T. 2002. Identification of tissue-specific microRNAs from mouse. Curr Biol 12: 735-9
60. Lai EC, Tomancak P, Williams RW, Rubin GM. 2003. Computational identification of Drosophila microRNA genes. Genome Biol 4: R42
61. Lau NC, Lim LP, Weinstein EG, Bartel DP. 2001. An abundant class of tiny RNAs with probable regulatory roles in Caenorhabditis elegans. Science 294: 858-62
62. Laufs P, Peaucelle A, Morin H, Traas J. 2004. MicroRNA regulation of the CUC genes is required for boundary size control in Arabidopsis meristems. Development 131: 4311-22
63. Lee RC, Ambros V. 2001. An extensive class of small RNAs in Caenorhabditis elegans. Science 294: 862-4
64. Lee RC, Feinbaum RL, Ambros V. 1993. The C.
elegans heterochronic gene lin-4 encodes small RNAs with antisense complementarity to lin-14. Cell 75: 843-54
65. Lee Y, Ahn C, Han J, Choi H, Kim J, et al. 2003. The nuclear RNase III Drosha initiates microRNA processing. Nature 425: 415-9
66. Lee Y, Kim M, Han J, Yeom KH, Lee S, et al. 2004. MicroRNA genes are transcribed by RNA polymerase II. EMBO J 23: 4051-60
67. Lewis BP, Burge CB, Bartel DP. 2005. Conserved seed pairing, often flanked by adenosines, indicates that thousands of human genes are microRNA targets. Cell 120: 15-20
68. Lewis BP, Shih IH, Jones-Rhoades MW, Bartel DP, Burge CB. 2003. Prediction of mammalian microRNA targets. Cell 115: 787-98
69. Lim LP, Glasner ME, Yekta S, Burge CB, Bartel DP. 2003. Vertebrate microRNA genes. Science 299: 1540
70. Lim LP, Lau NC, Garrett-Engele P, Grimson A, Schelter JM, et al. 2005. Microarray analysis shows that some microRNAs downregulate large numbers of target mRNAs. Nature 433: 769-73
71. Lim LP, Lau NC, Weinstein EG, Abdelhakim A, Yekta S, et al. 2003. The microRNAs of Caenorhabditis elegans. Genes Dev 17: 991-1008
72. Lingel A, Simon B, Izaurralde E, Sattler M. 2003. Structure and nucleic-acid binding of the Drosophila Argonaute 2 PAZ domain. Nature 426: 465-9
73. Liu J, Carmell MA, Rivas FV, Marsden CG, Thomson JM, et al. 2004. Argonaute2 is the catalytic engine of mammalian RNAi. Science 305: 1437-41
74. Liu Q, Rand TA, Kalidas S, Du F, Kim HE, et al. 2003. R2D2, a bridge between the initiation and effector steps of the Drosophila RNAi pathway. Science 301: 1921-5
75. Llave C, Kasschau KD, Rector MA, Carrington JC. 2002. Endogenous and silencing-associated small RNAs in plants. Plant Cell 14: 1605-19
76. Llave C, Xie Z, Kasschau KD, Carrington JC. 2002. Cleavage of Scarecrow-like mRNA targets directed by a class of Arabidopsis miRNA. Science 297: 2053-6
77. Lu C, Fedoroff N. 2000. A mutation in the Arabidopsis HYL1 gene encoding a dsRNA binding protein affects responses to abscisic acid, auxin, and cytokinin.
Plant Cell 12: 2351-66
78. Lund E, Guttinger S, Calado A, Dahlberg JE, Kutay U. 2004. Nuclear export of microRNA precursors. Science 303: 95-8
79. Lynn K, Fernandez A, Aida M, Sedbrook J, Tasaka M, et al. 1999. The PINHEAD/ZWILLE gene acts pleiotropically in Arabidopsis development and has overlapping functions with the ARGONAUTE1 gene. Development 126: 469-81
80. Mallory AC, Bartel DP, Bartel B. 2005. microRNA-Directed Regulation of Arabidopsis AUXIN RESPONSE FACTOR17 Is Essential for Proper Development and Modulates Expression of Early Auxin Response Genes. Plant Cell 17
81. Mallory AC, Dugas DV, Bartel DP, Bartel B. 2004. MicroRNA regulation of NAC-domain targets is required for proper formation and separation of adjacent embryonic, vegetative, and floral organs. Curr Biol 14: 1035-46
82. Mallory AC, Reinhart BJ, Bartel D, Vance VB, Bowman LH. 2002. A viral suppressor of RNA silencing differentially regulates the accumulation of short interfering RNAs and micro-RNAs in tobacco. Proc Natl Acad Sci U S A 99: 15228-33
83. Mallory AC, Reinhart BJ, Jones-Rhoades MW, Tang G, Zamore PD, et al. 2004. MicroRNA control of PHABULOSA in leaf development: importance of pairing to the microRNA 5' region. EMBO J 23: 3356-64
84. Mansfield JH, Harfe BD, Nissen R, Obenauer J, Srineel J, et al. 2004. MicroRNA-responsive 'sensor' transgenes uncover Hox-like and other developmentally regulated patterns of vertebrate microRNA expression. Nat Genet 36: 1079-83
85. Matzke M, Aufsatz W, Kanno T, Daxinger L, Papp I, et al. 2004. Genetic analysis of RNA-mediated transcriptional gene silencing. Biochim Biophys Acta 1677: 129-41
86. Mayer AM, Staples RC. 2002. Laccase: new functions for an old enzyme. Phytochemistry 60: 551-65
87. McConnell JR, Barton MK. 1998. Leaf polarity and meristem formation in Arabidopsis. Development 125: 2935-42
88. McConnell JR, Emery J, Eshed Y, Bao N, Bowman J, Barton MK. 2001.
Role of PHABULOSA and PHAVOLUTA in determining radial patterning in shoots. Nature 411: 709-13
89. Mette MF, van der Winden J, Matzke M, Matzke AJ. 2002. Short RNAs can identify new candidate transposable element families in Arabidopsis. Plant Physiol 130: 6-9
90. Millar AA, Gubler F. 2005. The Arabidopsis GAMYB-Like Genes, MYB33 and MYB65, Are MicroRNA-Regulated Genes That Redundantly Facilitate Anther Development. Plant Cell 17: 705-21
91. Morel JB, Godon C, Mourrain P, Beclin C, Boutet S, et al. 2002. Fertile hypomorphic ARGONAUTE (ago1) mutants impaired in post-transcriptional gene silencing and virus resistance. Plant Cell 14: 629-39
92. Moss EG, Lee RC, Ambros V. 1997. The cold shock domain protein LIN-28 controls developmental timing in C. elegans and is regulated by the lin-4 RNA. Cell 88: 637-46
93. Motamedi MR, Verdel A, Colmenares SU, Gerber SA, Gygi SP, Moazed D. 2004. Two RNAi complexes, RITS and RDRC, physically interact and localize to noncoding centromeric RNAs. Cell 119: 789-802
94. Mourelatos Z, Dostie J, Paushkin S, Sharma A, Charroux B, et al. 2002. miRNPs: a novel class of ribonucleoproteins containing numerous microRNAs. Genes Dev 16: 720-8
95. Mourrain P, Beclin C, Elmayan T, Feuerbach F, Godon C, et al. 2000. Arabidopsis SGS2 and SGS3 genes are required for posttranscriptional gene silencing and natural virus resistance. Cell 101: 533-42
96. Moussian B, Haecker A, Laux T. 2003. ZWILLE buffers meristem stability in Arabidopsis thaliana. Dev Genes Evol 213: 534-40
97. Moussian B, Schoof H, Haecker A, Jurgens G, Laux T. 1998. Role of the ZWILLE gene in the regulation of central shoot meristem cell fate during Arabidopsis embryogenesis. EMBO J 17: 1799-809
98. Noma K, Sugiyama T, Cam H, Verdel A, Zofall M, et al. 2004. RITS acts in cis to promote RNA interference-mediated transcriptional and post-transcriptional silencing. Nat Genet 36: 1174-80
99. Olsen PH, Ambros V. 1999.
The lin-4 regulatory RNA controls developmental timing in Caenorhabditis elegans by blocking LIN-14 protein synthesis after the initiation of translation. Dev Biol 216: 671-80
100. Palatnik JF, Allen E, Wu X, Schommer C, Schwab R, et al. 2003. Control of leaf morphogenesis by microRNAs. Nature 425: 257-63
101. Papp I, Mette MF, Aufsatz W, Daxinger L, Schauer SE, et al. 2003. Evidence for nuclear processing of plant micro RNA and short interfering RNA precursors. Plant Physiol 132: 1382-90
102. Parizotto EA, Dunoyer P, Rahm N, Himber C, Voinnet O. 2004. In vivo investigation of the transcription, processing, endonucleolytic activity, and functional relevance of the spatial distribution of a plant miRNA. Genes Dev 18: 2237-42
103. Park MY, Wu G, Gonzalez-Sulser A, Vaucheret H, Poethig RS. 2005. Nuclear processing and export of microRNAs in Arabidopsis. Proc Natl Acad Sci U S A 102: 3691-6
104. Park W, Li J, Song R, Messing J, Chen X. 2002. CARPEL FACTORY, a Dicer homolog, and HEN1, a novel protein, act in microRNA metabolism in Arabidopsis thaliana. Curr Biol 12: 1484-95
105. Pasquinelli AE, Reinhart BJ, Slack F, Martindale MQ, Kuroda MI, et al. 2000. Conservation of the sequence and temporal expression of let-7 heterochronic regulatory RNA. Nature 408: 86-9
106. Peragine A, Yoshikawa M, Wu G, Albrecht HL, Poethig RS. 2004. SGS3 and SGS2/SDE1/RDR6 are required for juvenile development and the production of trans-acting siRNAs in Arabidopsis. Genes Dev 18: 2368-79
107. Pfeffer S, Sewer A, Lagos-Quintana M, Sheridan R, Sander C, et al. 2005. Identification of microRNAs of the herpesvirus family. Nat Methods 2: 269-76
108. Pfeffer S, Zavolan M, Grasser FA, Chien M, Russo JJ, et al. 2004. Identification of virus-encoded microRNAs. Science 304: 734-6
109. Poy MN, Eliasson L, Krutzfeldt J, Kuwajima S, Ma X, et al. 2004. A pancreatic islet-specific microRNA regulates insulin secretion. Nature 432: 226-30
110. Reinhart BJ, Bartel DP. 2002.
Small RNAs correspond to centromere heterochromatic repeats. Science 297: 1831
111. Reinhart BJ, Slack FJ, Basson M, Pasquinelli AE, Bettinger JC, et al. 2000. The 21-nucleotide let-7 RNA regulates developmental timing in Caenorhabditis elegans. Nature 403: 901-6
112. Reinhart BJ, Weinstein EG, Rhoades MW, Bartel B, Bartel DP. 2002. MicroRNAs in plants. Genes Dev 16: 1616-26
113. Rhoades MW, Reinhart BJ, Lim LP, Burge CB, Bartel B, Bartel DP. 2002. Prediction of plant microRNA targets. Cell 110: 513-20
114. Schauer SE, Jacobsen SE, Meinke DW, Ray A. 2002. DICER-LIKE1: blind men and elephants in Arabidopsis development. Trends Plant Sci 7: 487-91
115. Schwab R, Palatnik JF, Riester M, Schommer C, Schmid M, Weigel D. 2005. Specific Effects of MicroRNAs on the Plant Transcriptome. Dev Cell 8: 517-27
116. Schwarz DS, Hutvagner G, Du T, Xu Z, Aronin N, Zamore PD. 2003. Asymmetry in the assembly of the RNAi enzyme complex. Cell 115: 199-208
117. Slack FJ, Basson M, Liu Z, Ambros V, Horvitz HR, Ruvkun G. 2000. The lin-41 RBCC gene acts in the C. elegans heterochronic pathway between the let-7 regulatory RNA and the LIN-29 transcription factor. Mol Cell 5: 659-69
118. Song JJ, Liu J, Tolia NH, Schneiderman J, Smith SK, et al. 2003. The crystal structure of the Argonaute2 PAZ domain reveals an RNA binding motif in RNAi effector complexes. Nat Struct Biol 10: 1026-32
119. Song JJ, Smith SK, Hannon GJ, Joshua-Tor L. 2004. Crystal structure of Argonaute and its implications for RISC slicer activity. Science 305: 1434-7
120. Souret FF, Kastenmayer JP, Green PJ. 2004. AtXRN4 degrades mRNA in Arabidopsis and its substrates include selected miRNA targets. Mol Cell 15: 173-83
121. Stark A, Brennecke J, Russell RB, Cohen SM. 2003. Identification of Drosophila MicroRNA targets. PLoS Biol 1: E60
122. Suh MR, Lee Y, Kim JY, Kim SK, Moon SH, et al. 2004. Human embryonic stem cells express a unique set of microRNAs.
Dev Biol 270: 488-98
123. Sunkar R, Girke T, Jain PK, Zhu JK. 2005. Cloning and Characterization of microRNAs from Rice. Plant Cell 17: 666-999
124. Sunkar R, Zhu JK. 2004. Novel and stress-regulated microRNAs and other small RNAs from Arabidopsis. Plant Cell 16: 2001-19
125. Tabara H, Yigit E, Siomi H, Mello CC. 2002. The dsRNA binding protein RDE-4 interacts with RDE-1, DCR-1, and a DExH-box helicase to direct RNAi in C. elegans. Cell 109: 861-71
126. Tang G, Reinhart BJ, Bartel DP, Zamore PD. 2003. A biochemical framework for RNA silencing in plants. Genes Dev 17: 49-63
127. Telfer A, Poethig RS. 1998. HASTY: a gene that regulates the timing of shoot maturation in Arabidopsis thaliana. Development 125: 1889-98
128. Tomari Y, Zamore PD. 2005. MicroRNA biogenesis: drosha can't cut it without a partner. Curr Biol 15: R61-4
129. Vaucheret H, Vazquez F, Crete P, Bartel DP. 2004. The action of ARGONAUTE in the miRNA pathway and its regulation by the miRNA pathway are crucial for plant development. Genes Dev 18: 1187-97
130. Vazquez F, Gasciolli V, Crete P, Vaucheret H. 2004. The nuclear dsRNA binding protein HYL1 is required for microRNA accumulation and plant development, but not posttranscriptional transgene silencing. Curr Biol 14: 346-51
131. Vazquez F, Vaucheret H, Rajagopalan R, Lepers C, Gasciolli V, et al. 2004. Endogenous trans-acting siRNAs regulate the accumulation of Arabidopsis mRNAs. Mol Cell 16: 69-79
132. Verdel A, Jia S, Gerber S, Sugiyama T, Gygi S, et al. 2004. RNAi-mediated targeting of heterochromatin by the RITS complex. Science 303: 672-6
133. Volpe T, Schramke V, Hamilton GL, White SA, Teng G, et al. 2003. RNA interference is required for normal centromere function in fission yeast. Chromosome Res 11: 137-46
134. Volpe TA, Kidner C, Hall IM, Teng G, Grewal SI, Martienssen RA. 2002. Regulation of heterochromatic silencing and histone H3 lysine-9 methylation by RNAi. Science 297: 1833-7
135. Wang XJ, Reyes JL, Chua NH, Gaasterland T. 2004. Prediction and identification of Arabidopsis thaliana microRNAs and their mRNA targets. Genome Biol 5: R65
136. Wightman B, Ha I, Ruvkun G. 1993. Posttranscriptional regulation of the heterochronic gene lin-14 by lin-4 mediates temporal pattern formation in C. elegans. Cell 75: 855-62
137. Xie X, Lu J, Kulbokas EJ, Golub TR, Mootha V, et al. 2005. Systematic discovery of regulatory motifs in human promoters and 3' UTRs by comparison of several mammals. Nature 434: 338-45
138. Xie Z, Johansen LK, Gustafson AM, Kasschau KD, Lellis AD, et al. 2004. Genetic and functional diversification of small RNA pathways in plants. PLoS Biol 2: E104
139. Xie Z, Kasschau KD, Carrington JC. 2003. Negative feedback regulation of Dicer-Like1 in Arabidopsis by microRNA-guided mRNA degradation. Curr Biol 13: 784-9
140. Yan KS, Yan S, Farooq A, Han A, Zeng L, Zhou MM. 2003. Structure and conserved RNA binding of the PAZ domain. Nature 426: 468-74
141. Yekta S, Shih IH, Bartel DP. 2004. MicroRNA-directed cleavage of HOXB8 mRNA. Science 304: 594-6
142. Yi R, Qin Y, Macara IG, Cullen BR. 2003. Exportin-5 mediates the nuclear export of pre-microRNAs and short hairpin RNAs. Genes Dev 17: 3011-6
143. Yu B, Yang Z, Li J, Minakhina S, Yang M, et al. 2005. Methylation as a crucial step in plant microRNA biogenesis. Science 307: 932-5
144. Zamore PD, Tuschl T, Sharp PA, Bartel DP. 2000. RNAi: double-stranded RNA directs the ATP-dependent cleavage of mRNA at 21 to 23 nucleotide intervals. Cell 101: 25-33
145. Zhong R, Ye ZH. 2004. Amphivasal vascular bundle 1, a gain-of-function mutation of the IFL1/REV gene, is associated with alterations in the polarity of leaves, stems and carpels. Plant Cell Physiol 45: 369-85
146. Zilberman D, Cao X, Jacobsen SE. 2003. ARGONAUTE4 control of locus-specific siRNA accumulation and DNA and histone methylation. Science 299: 716-9
147. Zilberman D, Cao X, Johansen LK, Xie Z, Carrington JC, Jacobsen SE. 2004.
Role of Arabidopsis ARGONAUTE4 in RNA-directed DNA methylation triggered by inverted repeats. Curr Biol 14: 1214-20

Prediction of Plant MicroRNA Targets

Matthew W. Rhoades1,2, Brenda J. Reinhart, Lee P. Lim1,2, Christopher B. Burge2, Bonnie Bartel3,4, and David P. Bartel1,4

1Whitehead Institute for Biomedical Research, 9 Cambridge Center, Cambridge, Massachusetts 02142
2Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139
3Department of Biochemistry and Cell Biology, Rice University, 6100 Main St., Houston, Texas 77005
4To whom correspondence should be addressed. dbartel@wi.mit.edu, 617-258-5287, fax 617-258-6768; bartel@rice.edu, 713-348-5602

Summary

We predict regulatory targets for 14 Arabidopsis microRNAs (miRNAs) by identifying mRNAs with near complementarity. Complementary sites within predicted targets are conserved in rice. Of the 49 predicted targets, 29 are members of transcription factor gene families involved in developmental patterning or cell differentiation. The near-perfect complementarity between plant miRNAs and their targets suggests that many plant miRNAs act similarly to small interfering RNAs and direct mRNA cleavage. The targeting of developmental transcription factors suggests that many plant miRNAs function during cellular differentiation to clear maternal regulatory transcripts from daughter cell lineages.

Introduction

Nearly 200 genes for tiny, noncoding RNAs termed microRNAs (miRNAs) have been identified in animals and plants (Lagos-Quintana et al., 2001; Lau et al., 2001; Lee and Ambros, 2001; Lagos-Quintana et al., 2002; Llave et al., 2002; Mourelatos et al., 2002; Reinhart et al., 2002). Two miRNAs, lin-4 and let-7 RNAs, have been studied in detail; both control developmental timing in C.
elegans through a mechanism that involves imperfect base pairing to the 3' UTRs of target mRNAs (Lee et al., 1993; Wightman et al., 1993; Ha et al., 1996; Moss et al., 1997; Reinhart et al., 2000; Slack et al., 2000). The remaining miRNAs have unknown functions. Nonetheless, their sequences are typically conserved among different species, and many have intriguing expression patterns in different tissues or stages of development, indicating that these other miRNAs have important functions and might also modulate gene expression. This idea is supported by the observation that Dicer and Argonaute proteins, which are known to be crucial for normal plant and animal development, are needed for proper miRNA accumulation (Robinson-Beers et al., 1992; Ray et al., 1996; Ray et al., 1996; Jacobsen et al., 1999; Grishok et al., 2001; Hutvágner et al., 2001; Ketting et al., 2001; Knight and Bass, 2001; Reinhart et al., 2002).

The major challenge in determining miRNA functions is to identify their regulatory targets. By analogy to lin-4 and let-7 RNAs, it is reasonable to suppose that miRNAs generally recognize their regulatory targets through base pairing. However, the small size of the mature miRNAs (20-24 nt) and the imperfect nature of miRNA:mRNA base pairing have hampered the general prediction of mRNA targets for animal miRNAs. Thus far, prediction of animal miRNA targets has been achieved only after experimental evidence narrowed the number of candidate mRNAs to a small set, either by placing the mRNAs within the same regulatory pathway as the miRNA or by identifying regulatory elements within mRNA 3' UTRs (Lee et al., 1993; Wightman et al., 1993; Moss et al., 1997; Reinhart et al., 2000; Slack et al., 2000; Lai, 2002).
An indication that target prediction for certain plant miRNAs might be more straightforward came with the recent identification of miR171, a plant miRNA with perfect antisense complementarity to the mRNAs of three SCARECROW-like transcription factors (Llave et al., 2002; Reinhart et al., 2002). Here we report that near complementarity to mRNAs, particularly transcription factor mRNAs, is a general trend for plant miRNAs. We have been able to identify potential regulatory targets for 14 of the 16 miRNAs studied by searching for mRNAs capable of base pairing with three or fewer mismatches to one of the miRNAs. The fact that many of these potential targets are members of gene families with roles in plant development supports the idea that the function of miRNAs in mediating development is conserved across kingdoms. Particularly compelling targets include the PHABULOSA and PHAVOLUTA mRNAs, for which the identification of miRNA complementary sites may explain the ectopic expression previously described for mutations in these genes (McConnell et al., 2001). Similar analysis of animal miRNAs did not predict animal regulatory targets, suggesting mechanistic differences between plant and animal miRNA function.

Results and Discussion

Plant MicroRNAs Have Significant Complementarity to Messenger RNAs

To identify potential regulatory targets, we searched for Arabidopsis mRNAs that were complementary, with four or fewer mismatches, to at least one of 16 recently identified Arabidopsis miRNAs (Reinhart et al., 2002). Gaps were not allowed, and G:U and other noncanonical pairs were treated as mismatches. To evaluate the significance of these hits to annotated mRNAs, parallel analyses were performed using cohorts of randomly permuted sequences that had the same sizes and base compositions as the set of authentic miRNAs. There were substantially more antisense hits to the authentic miRNAs than to the randomized sequences (Figure 1).
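
The search just described (ungapped, with G:U pairs counted as mismatches, plus permuted control sequences) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the PatScan search used in the study: the function names are hypothetical, the flanking sequence in the usage note is made up, and the one authentic sequence used is the miR171 complementary site as printed in Table 2.

```python
import random

# Watson-Crick complements only; a G opposite a U therefore counts as a mismatch.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the reverse complement of an RNA sequence (5'->3')."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def count_mismatches(mirna, site):
    """Count mismatched positions between a miRNA and an mRNA site of equal length.

    A perfectly complementary site equals the reverse complement of the miRNA,
    so every position where the two differ is a mismatch (no gaps are allowed).
    """
    if len(mirna) != len(site):
        raise ValueError("miRNA and site must have the same length")
    return sum(a != b for a, b in zip(reverse_complement(mirna), site))


def find_sites(mirna, mrna, max_mismatches=4):
    """Slide an ungapped window along the mRNA and return (start, mismatches) hits."""
    n = len(mirna)
    hits = []
    for start in range(len(mrna) - n + 1):
        mismatches = count_mismatches(mirna, mrna[start:start + n])
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits


def permuted_cohort(mirnas, rng):
    """Shuffle each miRNA so that length and base composition are preserved."""
    cohort = []
    for mirna in mirnas:
        bases = list(mirna)
        rng.shuffle(bases)
        cohort.append("".join(bases))
    return cohort


# miR171 complementary site as printed in Table 2; the miRNA is derived from it.
SITE = "GAUAUUGGCGCGGCUCAAUCA"
MIR171 = reverse_complement(SITE)
```

For example, `find_sites(MIR171, "AAAA" + SITE + "CCCC")` reports a zero-mismatch hit at offset 4 (the `AAAA`/`CCCC` flanks are invented for the illustration), and `permuted_cohort([MIR171], random.Random(0))` yields one control sequence with the same composition. Running the same search with many such cohorts gives the background hit rate against which authentic hits are compared.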
This difference was especially striking at higher stringency; when summing the hits with two or fewer mismatches, the hits to the authentic miRNA set outnumbered those to the randomized cohorts by a ratio of 30:0.2 (Figure 1). Considering the low probability of so many antisense hits occurring by chance, we suggest that these complementary sites reflect a functional relationship between the miRNAs and the identified mRNAs, namely that these protein-coding genes are regulatory targets of the miRNAs to which they can potentially base pair. At lower stringencies, there were also significantly more hits with the authentic set of miRNAs than with the randomized cohorts. Most of the 31 hits with three mismatches are viable miRNA target candidates, although a few are likely to be mRNAs with fortuitous complementarity, as judged by the observation that on average the randomized cohorts hit 4.2 mRNAs when three mismatches were permitted (Figure 1). Some hits with four mismatches might also be genuine targets. However, they are not included in the present analysis because of the greater likelihood that their complementarity is fortuitous or occurs because they are targets of unidentified miRNAs related to our query set of 16 miRNAs. Potential regulatory targets with three or fewer mismatches were found for 14 of the 16 miRNAs (Table 1). Targets for the other two miRNAs might be identified through slight changes in the search algorithm. For example, miR163, one of the two miRNAs without predicted targets in Table 1, has extensive complementarity to members of the AtPP-like gene family (At1g66690, At1g66700, At1g66720, At3g44860, At3g44870), which have unknown functions (Cui et al., 1999). All 24 nucleotides of this miRNA paired to complementary sites within these mRNAs when a single-nucleotide gap was permitted near the 3' terminus of the miRNA.
Nonetheless, when searching for miRNA targets, permitting gaps did not substantially increase the number of targets predicted for the other miRNAs (data not shown). Perhaps a bulge is accommodated near the miRNA terminus more readily for miR163 because this miRNA is 24 nt in length, which is 3 nt longer than the other miRNAs queried.

In all cases where an miRNA was complementary to more than one mRNA, most of the potential targets were members of the same gene family (Table 1). The fraction of the gene family members with miRNA complementary sites varied considerably. Of the 16 Squamosa Promoter Binding Protein (SBP)-like genes in Arabidopsis (Riechmann et al., 2000), 10 have miR156 complementary sites. In contrast, the MYB and NAC families each have over 100 members in Arabidopsis (Riechmann et al., 2000), of which five in each case have sites complementary to miR159 or miR164, respectively. As more miRNAs are identified it will be interesting to learn whether remaining members of these gene families have complementary sites to other miRNAs. In support of this possibility, unrelated miRNAs can be complementary to different members of the same gene family, as illustrated by miR160 and miR167, which apparently target different members of the Auxin Response Factor family (Ulmasov et al., 1999).

When considering the significance of multiple hits to the same gene family, it is important to address the possibility that these hits are merely the consequence of complementarity to a nucleotide sequence that encodes a critical protein motif. Indeed, for miR161, miR165, miR170, and miR171, the miRNA complementary sites were within the context of a domain strongly conserved among family members, as shown for the miR165 complementary sites (Figure 2A). Therefore, we cannot rule out the possibility that only a subset of the hits for these miRNAs are authentic targets. This possibility is less likely in the cases of miR156, miR157, miR159, miR160, miR164, and miR169.
The complementary sites for these miRNAs fell outside the conserved domains that define the families and instead fell within sequence contexts that were only weakly conserved among the family members, as shown for the miR156 sites within SBP-like mRNAs (Figure 2B). Indeed, there are examples where the conservation of the miRNA complementary sites among family members must be independent of conserved protein function. In the case of the MYB genes with miR159 complementary sites, four genes translate the complementary site in the same reading frame, while the fifth gene translates the site in a different reading frame. In four other cases (miR156/157 to At1g53160, miR156 to At2g33810, and miR169 to At1g17590 and At1g54160), the miRNA complementary sites are not in the coding regions at all but rather in the 3'-UTRs, as illustrated for miR156 and its complementary sites (Figure 2B).

MicroRNA Complementary Sites Are Conserved Among Flowering Plants

Many complementary sites observed in Arabidopsis are conserved in rice (Oryza sativa). Analysis of rice homologs focused on the seven miRNAs perfectly conserved in Oryza (Reinhart et al., 2002) for which complementary sites had been identified in Arabidopsis (Table 1). When using a three-mismatch cutoff, six of the seven conserved miRNAs (miR156, miR160, miR164, miR167, miR169, miR171) have at least one potential target gene in Oryza homologous to a corresponding Arabidopsis target. In an analogous control study using 44 hits to the randomized cohorts, there were no miRNA complementary sites in rice homologs of the Arabidopsis hits, even when four mismatches were allowed. The location of the miRNA complementary sites within the mRNAs was conserved between Arabidopsis and rice. Importantly, when there were differences between Arabidopsis and rice complementary sites within homologous genes, these differences were distributed evenly across the three codon positions (Table 2).
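
A tally of nucleotide differences by codon position can be sketched as follows. This is a minimal illustration rather than the authors' procedure. It assumes sites are written in the Table 2 convention (space-separated codons in the mRNA reading frame, lowercase marking a nucleotide that differs from the reference), that the first group may be a partial codon whose last base sits at codon position 3, and that sites in 3' UTRs are excluded beforehand. Both function names are hypothetical.

```python
def mismatch_codon_positions(site):
    """Count lowercase (differing) nucleotides at codon positions 1, 2, and 3.

    The first space-separated group may be a partial codon and is right-aligned,
    so its last base is codon position 3. Every later group starts at position 1.
    """
    counts = {1: 0, 2: 0, 3: 0}
    for index, group in enumerate(site.split()):
        offset = 3 - len(group) if index == 0 else 0
        for j, base in enumerate(group):
            if base.islower():
                counts[offset + j + 1] += 1
    return counts


def pooled_counts(sites):
    """Sum the per-site tallies over a collection of coding-region sites."""
    total = {1: 0, 2: 0, 3: 0}
    for site in sites:
        for position, n in mismatch_codon_positions(site).items():
            total[position] += n
    return total


# Three coding-region rows written in the Table 2 convention, chosen only to
# exercise the function; they are not the full data set behind the analysis.
EXAMPLE_SITES = [
    "a GGa AUA CAG GGA GCC AGG CA",
    "Uu uAC GUG CCC UGC UUC UCC A",
    "aG CAC GUa CCC UGC UUC UCC A",
]
```

Calling `pooled_counts(EXAMPLE_SITES)` returns one count per codon position. A comparison across species would apply the same tally to the positions that differ between homologous Arabidopsis and Oryza sites, and the sample shown here is too small to say anything about the distribution.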
Homologous regions under selection only at the protein level tend to exhibit a higher frequency of differences at codon position 3. Thus, the even distribution of mismatches across the codon positions indicates selection occurring at the nucleic acid level, in addition to any selection at the protein level, as would be expected if these segments act in miRNA recognition.

Most Predicted MicroRNA Targets Are Members of Transcription Factor Families Involved in Development

Perhaps the most intriguing evidence that these genes are regulatory targets of the miRNAs is the identity of the genes themselves. MicroRNA complementary sites were found in 61 mRNAs, which, due to overlap between similar miRNAs, represent 49 unique genes (Table 1). Of these 49 predicted targets, 29 are known or putative transcription factors (Table 1), even though transcription factors are thought to represent only 6% of protein-coding genes in Arabidopsis (Riechmann et al., 2000). Many of these genes specify shoot and floral meristem development or, for those with unknown functions, are in a family that has members involved in meristem development. For example, the predicted targets of miR164 include CUP-SHAPED COTYLEDON2 (CUC2), which is required for shoot apical meristem formation (Aida et al., 1997), and miR165 predicted targets include PHABULOSA (PHB) and PHAVOLUTA (PHV), which encode HD-Zip transcription factors that regulate axillary meristem initiation and leaf development (McConnell et al., 2001). A miR159 predicted target, AtMYB33, can bind to the promoter of the floral meristem identity gene LEAFY (Gocal et al., 2001). Homologs of the SBPs, which are thought to regulate the Antirrhinum floral meristem identity gene SQUAMOSA (Klein et al., 1996), may in turn be regulated by miR156 and miR157. Genetic evidence supports the regulatory roles of miR165 complementary sites within PHB and PHV (Figure 2A).
Multiple gain-of-function alleles have been isolated for both genes, and each of these mutations disrupts the miR165 complementary site, usually as a single-nucleotide substitution (McConnell et al., 2001). In the mutant examined, phb mRNA expression extends more broadly than in wild type (McConnell et al., 2001), suggesting that complementarity to miR165 is required for confining PHB mRNA accumulation to the proper cell types.

A connection between miRNAs and meristem development is consistent with the phenotypes of the Arabidopsis carpel factory (caf) mutant. Dicer and CAF are homologous RNaseIII-domain proteins required for the accumulation of mature miRNAs in animals and plants, respectively (Hutvágner and Zamore, 2002; Reinhart et al., 2002). Mutant alleles of CAF, which is also known as SHORT INTEGUMENT1 (SIN1), delay the meristem switch from vegetative to floral development and cause over-proliferation of the floral meristem (Ray et al., 1996; Jacobsen et al., 1999). Other genes required for miRNA accumulation in animals are homologs of the Arabidopsis gene ARGONAUTE1 (AGO1), which is required for axillary shoot meristem formation and leaf development in Arabidopsis (Bohmert et al., 1998). While AGO1 has not yet been reported to influence miRNA accumulation in plants, it is a predicted target of miR168 (Table 1), suggesting a negative-feedback mechanism for controlling expression of the AGO1 gene.

Other predicted targets of miRNAs do not have direct roles in meristem identity but rather could have roles in cell division or differentiation. For example, miR160 and miR167 are predicted to target auxin response factors, DNA-binding proteins that are thought to control transcription in response to the phytohormone auxin (Ulmasov et al., 1999).
Transcriptional regulation is important for many of the diverse developmental responses to auxin signals, which include cell elongation, division, and differentiation in both roots and shoots (Rogg and Bartel, 2001; Liscum and Reed, 2002). The predicted targets of miR170 and miR171 are three SCARECROW-like proteins, a family of transcription factors whose members have been implicated in radial patterning in roots, signaling by the phytohormone gibberellin, and light signaling (Di Laurenzio et al., 1996; Peng et al., 1997; Silverstone et al., 1998; Bolle et al., 2000; Helariutta et al., 2000). Overall, the high percentage of predicted miRNA targets that act as developmental regulators suggests that miRNAs are involved in a wide range of cell division and cell fate decisions throughout the plant.

Mechanistic and Functional Models for Regulation by MicroRNAs in Plants

The success in identifying potential miRNA targets in Arabidopsis prompted us to examine whether our simple computational approach could also identify miRNA targets in C. elegans and D. melanogaster. In both organisms, the miRNAs had few mRNA hits with complementary sites, essentially the same number of hits as seen for randomized cohorts (data not shown). While the possibility that a few animal miRNAs will recognize their targets with near-perfect complementarity cannot be excluded, the general phenomenon of near-perfect complementarity appears to be specific to plants. Two other key differences emerge when comparing the predicted target sites of plant miRNAs with those of the C. elegans lin-4 and let-7 miRNAs. First, the plant complementary sites are primarily, though not exclusively, within the ORFs, whereas the only proposed lin-4 and let-7 sites are within 3' UTRs (Lee et al., 1993; Wightman et al., 1993; Moss et al., 1997; Reinhart et al., 2000; Slack et al., 2000).
Second, multiple sites within the same target mRNA are not detected in plants, whereas there are typically multiple lin-4 and let-7 sites within each mRNA target (Lee et al., 1993; Wightman et al., 1993; Ha et al., 1996; Reinhart et al., 2000; Slack et al., 2000).

These differences observed between plant and animal miRNA target recognition have intriguing mechanistic implications for plant miRNA function (Figure 3A). Namely, plant miRNA target recognition appears to resemble that of small interfering RNAs (siRNAs) much more than that of animal miRNAs. During RNA interference (RNAi), long double-stranded RNA is processed by Dicer into ~22-nt siRNAs, which serve as guide RNAs to target homologous mRNA sequences for cleavage (Bernstein et al., 2001; Hutvágner and Zamore, 2002). Importantly, targeting either the ORF or the UTRs is effective (McManus et al., 2002), provided that the siRNA has near-perfect complementarity to the targeted mRNA (Elbashir et al., 2001). Plants also have siRNAs. Indeed, these tiny RNAs were first observed in plants and are associated with a process related to RNAi, known as posttranscriptional gene silencing (PTGS), which leads to the destruction of mRNA from plant viruses and transgenes (Hamilton and Baulcombe, 1999; Matzke et al., 2001). Plant miRNAs resemble animal miRNAs in their biogenesis, in that they are derived from endogenous, evolutionarily conserved genes and are processed from stem-loop precursors by a Dicer homolog, with accumulation of mature miRNA from only one arm of the precursor stem-loop (Reinhart et al., 2002). However, plant miRNAs resemble siRNAs in their target recognition, suggesting that they might also resemble siRNAs in their mechanism of action (Figure 3A). We propose that many plant miRNAs hybridize to mRNAs with near-perfect complementarity and target the mRNAs for cleavage.
A function in mediating RNA cleavage might allow the plant miRNAs to target any region of the mRNA, whereas the animal miRNAs that mediate translational attenuation might be relegated to 3'UTRs in order to avoid the mRNA-clearing activity of ribosomes. The efficiency and finality of mRNA cleavage might require only a single complementary site in each message, whereas the regulatory mechanism of lin-4 and let-7 miRNAs, which leaves the mRNA intact, might generally require multiple target sites. In presenting this hypothesis, we leave open the possibility that some plant miRNAs might not specify cleavage of their regulatory targets, and some might specify cleavage of some targets but employ other mechanisms to regulate other targets. Targets with many mismatches, analogous to the targets of lin-4 and let-7 miRNAs, would not have been detected in our analysis. Furthermore, some mismatches for the predicted targets are near the center of the complementary sites (Table 2, data not shown) and might be expected to abrogate siRNA-mediated mRNA cleavage (Elbashir et al., 2001). However, it is difficult to know whether these mismatches are incompatible with mRNA cleavage because the types and locations of mismatches permissive for siRNA-mediated cleavage are still being determined in animals and have not yet been explored in plants. In those cases where the miRNAs might not be mediating mRNA cleavage, they might attenuate translation (Olsen and Ambros, 1999), act as guide RNAs for mRNA modifications (Kiss, 2002), or target DNA for epigenetic modifications, such as methylation (Matzke et al., 2001). Although DNA targeting cannot be excluded as an additional miRNA function for some miRNAs, two observations argue strongly for a role in targeting mRNAs in addition to any possible role in targeting DNA. First, plant miRNAs are complementary to the sense rather than antisense strands of mRNAs (data not shown). 
Second, the complementary sites for miR165 and miR166 span a splice junction within each of the HD-Zip mRNAs.

The observation that many plant miRNAs potentially target the mRNAs of transcription factors involved in development suggests that some miRNAs might function to clear maternal regulatory transcripts from certain daughter-cell lineages (Figure 3B). Through the action of miRNAs, these inherited mRNAs could be eliminated without relying on constitutively unstable messages. Now that potential miRNA binding sites in some of these developmentally important transcription factor mRNAs have been identified, it should be possible to test this speculative model by disrupting the miRNA complementary site in the mRNA without changing the protein sequence of the transcription factor.

The miRNAs analyzed here are likely to be only a small fraction of the miRNAs in Arabidopsis (Llave et al., 2002; Reinhart et al., 2002). Nonetheless, the discovery that so many of these plant miRNAs appear to have readily identifiable regulatory targets will greatly facilitate experimental investigation of the functions of these tiny noncoding RNAs and the many other miRNAs remaining to be found in plants. With the ability to computationally identify candidate targets, the presumed roles of miRNAs in development can be more readily explored, and roles of miRNAs in other processes can be more readily uncovered.

Experimental Procedures

Identification of miRNA Complementary Sites in Annotated mRNAs

The set of annotated Arabidopsis mRNA sequences was extracted from the genomic GenBank files, January 2002 release (Arabidopsis Genome Initiative, 2000). This set was searched for complementary sites to any of 16 miRNAs (GenBank accession numbers AJ493620-AJ493656) using Patscan (Dsouza et al., 1997). When the miRNA was cloned as either a 20- or 21-nt RNA, the 21-nt RNA was used (Reinhart et al., 2002).
Thus, the miR158 sequence was 20 nt, the miR163 sequence was 24 nt, and the remaining 14 miRNA sequences were 21 nt. One mismatch was added to all miR158 complementary sites to compensate for their smaller size and the correspondingly greater chance of fortuitous complementarity. Complementary sites were also found for 10 cohorts of 16 randomly permuted sequences that had identical sizes and base compositions to the authentic miRNAs. One mismatch was added to the sites complementary to the randomly permuted versions of miR158. Analogous searches for animal miRNA complementary sites queried annotated D. melanogaster mRNAs (GenBank October 2000 release) and annotated C. elegans coding regions (GenBank April 1999 release).

Identification of Homologous miRNA Complementary Sites in Oryza mRNAs

For each Arabidopsis target mRNA, the mRNAs of up to 10 homologous Oryza proteins were predicted from the unannotated Oryza contigs (Yu et al., 2002) by GenomeScan, a program which identifies genes within genomic sequence using homology to input protein sequences combined with an ab initio gene-finding algorithm (Yeh et al., 2001). Complementary sites in this dataset were identified by PatScan searches, and homology to the Arabidopsis targets was confirmed by alignment of the inferred protein sequences (ClustalX). One additional target homolog (TC79868) was found by searching the TIGR Rice Gene Index (9.0).

Acknowledgments

We thank Earl Weinstein and Ru-Fang Yeh for computer scripts and helpful discussions. This work was supported by grants from the David H. Koch Cancer Research Fund (DBP), the Alexander and Margaret Stewart Trust (DBP), the Robert A. Welch Foundation (BB), and the NIH (CBB).

Table 1. Potential Regulatory Targets of Arabidopsis miRNAs
MicroRNA | Target protein family | Target gene names (number of mismatches)
miR156 | SQUAMOSA-PROMOTER BINDING PROTEIN (SBP)-like proteins | At3g57920 (1), At2g42200/SPL9 (1), At5g50570 (1), At5g50670 (1), At1g53160/SPL4 (2), At2g33810/SPL3 (2), At1g27370/SPL10 (2), At5g43270/SPL2 (2), At1g69170/SPL6 (2), At1g27360/SPL11 (2)
miR157 | SQUAMOSA-PROMOTER BINDING PROTEIN (SBP)-like proteins | At1g27370/SPL10 (1), At3g57920 (1), At2g42200/SPL9 (1), At5g43270/SPL2 (1), At1g27360/SPL11 (1), At1g69170/SPL6 (2), At5g50570 (2), At5g50670 (2), At1g53160/SPL4 (3)
miR157 | Putative RNA helicase | At5g08620 (3)
miR157 | Unknown proteins | At3g47170 (3), At1g22000 (3)
miR158 | Unknown protein | At1g64100 (3)
miR159 | MYB proteins | At2g32460/AtMYB101 (2), At3g60460 (3), At2g26950/AtMYB104 (3), At5g06100/AtMYB33 (3), At3g11440/AtMYB65 (3)
miR159 | Unknown protein | At1g29010 (3)
miR160 | Auxin Response Factors | At1g77850/ARF17 (1), At2g28350/ARF10 (2), At4g30080 (3)
miR161 | PPR repeat proteins | At1g63150 (3), At1g63400 (3), At1g06580 (3), At1g64580 (3), At5g16640 (3), At1g62670 (3), At1g62720 (3), At5g41170 (3), At1g63080 (3)
miR164 | NAC domain proteins | At5g61430 (2), At5g07680 (2), At1g56010/NAC1 (2), At3g15170 (3), At5g53950/CUC2 (3)
miR165 | HD-Zip transcription factors | At5g60690/REV (3), At3g34710/PHB (3), At4g32880/ATHB-8 (3), At1g30490/PHV (3)
miR166 | HD-Zip transcription factor | At1g52150/ATHB-15 (3)
miR167 | Auxin Response Factor | At5g37020/ARF8 (3)
miR168 | ARGONAUTE | At1g48410/AGO1 (3)
miR169 | CCAAT Binding Factor (CBF)-HAP2-like proteins | At1g17590 (3), At1g54160 (3)
miR170 | GRAS domain proteins (SCARECROW-like) | At2g45160 (2), At3g60630 (2), At4g00150/SCL6 (2)
miR171 | GRAS domain proteins (SCARECROW-like) | At2g45160 (0), At3g60630 (0), At4g00150/SCL6 (0)

For each gene, the number of mismatches between the miRNA and the mRNA is indicated in parentheses. The sequences of three pairs of miRNAs (miR156/miR157, miR165/miR166, and miR170/miR171) are closely related and therefore sometimes complementary to the same sites within the target mRNAs.
Sites complementary to miR158 had an additional mismatch added to compensate for the fact that miR158 is at least 1 nt shorter than the other miRNAs.

Table 2. MicroRNA Complementary Sites in Potential mRNA Targets Conserved Between Arabidopsis and Oryza

Target gene | RNA sequence of complementary site | Peptide sequence
miR156 | UGU GCU CAC UCU CUU CUG UCA |
At5g50570 (1) | UGU GCU CuC UCU CUU CUG UCA | CALSLLS
At5g50670 (1) | UGU GCU CuC UCU CUU CUG UCA | CALSLLS
At3g57920 (1) | UGU GCU CuC UCU CUU CUG UCA | CALSLLS
At2g42200 (1) | UGU GCU CuC UCU CUU CUG UCA | CALSLLS
At1g27370 (2) | aGU GCU CuC UCU CUU CUG UCA | SALSLLS
At1g27360 (2) | cGU GCU CuC UCU CUU CUG UCA | RALSLLS
At5g43270 (2) | gGU GCU CuC UCU CUU CUG UCA | GALSLLS
At1g69170 (2) | cGU GCU CuC UCU CUU CUG UCA | RALSLLS
At2g33810 (2) | UuU GCU uAC UCU CUU CUG UCA | 3' UTR
At1g53160 (2) | UcU GCU CuC UCU CUU CUG UCA | 3' UTR
Os20095 (1) | UGU GCU CuC UCU CUU CUG UCA | CALSLLS
Os06618 (1) | UGU GCU CuC UCU CUU CUG UCA | CALSLLS
Os02878 (1) | UGU GCU CuC UCU CUU CUG UCA | CALSLLS
Os25470 (2) | gGU GCU CuC UCU CUU CUG UCA | GALSLLS
miR160 | U GGC AUA CAG GGA GCC AGG CA |
At1g77850 (1) | U GGC AUg CAG GGA GCC AGG CA | AGMQGARQ
At2g28350 (2) | a GGa AUA CAG GGA GCC AGG CA | AGIQGARQ
At4g30080 (3) | g GGu uUA CAG GGA GCC AGG CA | VGLQGARH
OsTC73519 (1) | a GGC AUA CAG GGA GCC AGG CA | AGIQGARH
OsTC70631 (1) | a GGC AUA CAG GGA GCC AGG CA | AGIQGARH
Os17478 (1) | a GGC AUA CAG GGA GCC AGG CA | AGIQGARH
Os02679 (1) | a GGC AUA CAG GGA GCC AGG CA | AGIQGARH
miR164 | UG CAC GUG CCC UGC UUC UCC A |
At1g56010 (2) | aG CAC GUa CCC UGC UUC UCC A | EHVPCFSN
At5g07680 (2) | Uu uAC GUG CCC UGC UUC UCC A | VYVPCFSN
At5g61430 (2) | Uc uAC GUG CCC UGC UUC UCC A | VYVPCFSN
At3g15170 (3) | aG CAC GUG uCC UGu UUC UCC A | EHVSCFSN
At5g53950 (3) | aG CAC GUG uCC UGu UUC UCC A | EHVSCFST
Os00116 (2) | cG CAC GUG aCC UGC UUC UCC A | AHVTCFSN
miR167 | U AGA UCA UGC UGG CAG CUU CA |
At5g37020 (3) | U AGA UCA gGC UGG CAG CUU gu | LRSGWQLV
OsTC79868 (3) | U AGA UCA gGC UGG CAG CUU gu | DRSGWQLV
miR169 | UCG GCA AGU CAU CCU UGG CUG |
At1g17590 (3) | aaG GgA AGU CAU CCU UGG CUG | 3' UTR
At1g54160 (3) | aCG GgA AGU CAU CCU UGG CUa | 3' UTR
Os04048 (3) | UaG GCA AcU CAU uCU UGG CUG | 3' UTR
Os09843 (3) | UaG GCA AuU CAU CCU UGG CUu | 3' UTR
miR171 | G AUA UUG GCG CGG CUC AAU CA |
At2g45160 (0) | G AUA UUG GCG CGG CUC AAU CA | GILARLNH
At3g60630 (0) | G AUA UUG GCG CGG CUC AAU CA | GILARLNH
At4g00150 (0) | G AUA UUG GCG CGG CUC AAU CA | GILARLNQ
OsTC76755 (0) | G AUA UUG GCG CGG CUC AAU CA | EILARLNQ
OsTC81772 (0) | G AUA UUG GCG CGG CUC AAU CA | EILARLNH
Os00711 (0) | G AUA UUG GCG CGG CUC AAU CA | EILARLNQ
Os12185 (0) | G AUA UUG GCG CGG CUC AAU CA | EILARLNQ
OsTC75254 (1) | G AUA UUG GCG CGG CUC AAU uA | EILARLNY

For each gene, the nucleotide sequence of the miRNA complementary site is broken into codons corresponding to the reading frame of the mRNA. The reverse complement is shown for each miRNA, and for each complementary site, mismatches are shown in lowercase. The peptide sequence of the miRNA complementary site is shown. Oryza genes are labeled either by their tentative consensus (TC) numbers from the TIGR rice gene index (version 9.0) or by the genomic contig of the mRNA predicted by GenomeScan.

Figure Legends

Figure 1. Antisense Hits Between Arabidopsis miRNAs and Annotated mRNAs
Annotated Arabidopsis mRNAs were searched for sites complementary to 16 Arabidopsis miRNAs with 0-4 mismatches (solid bars). Identical searches with cohorts of 16 randomized RNAs were also performed (open bars, mean values from 10 cohorts; error bars, one standard deviation). Note that two hits by similar miRNAs to the same complementary site within an mRNA were counted as separate hits (Table 1).

Figure 2. Sequence Context of miRNA Complementary Sites
(A) The four miR165 complementary sites. These complementary sites lie within the START domain present in a subfamily of HD-Zip transcription factors. The altered protein sequences of the reported phv and phb gain-of-function alleles are indicated (McConnell et al., 2001). Each of these lesions also disrupts the miR165 complementary site. Amino acids conserved in a majority of the proteins are highlighted.
(B) The miR156 complementary sites. All ten predicted targets contain the Squamosa Promoter Binding (SBP) box, but the complementary sites are downstream of this conserved domain, within a poorly conserved protein-coding context or the 3'-UTR. Amino acids conserved in a majority of the proteins are highlighted.

Figure 3. Models for the Biogenesis, Action, and Roles of miRNAs in Plants
(A) Although plant miRNAs are apparently generated through the classical miRNA pathway (Reinhart et al., 2002), we propose that many act as classical siRNAs, pairing with near-perfect complementarity to their mRNA targets to specify mRNA cleavage.
(B) Plant miRNAs might target transcription factor mRNAs for cleavage following cell divisions that require rapid implementation of new transcription factor programs. Following cell division, the daughter cells inherit transcription factor mRNAs from the precursor cell. At the onset of differentiation, one daughter might express not only new transcription factor mRNAs (green) but also miRNAs (red) complementary to mRNAs of key maternal transcription factors (blue). The miRNAs might direct the cleavage of the inherited transcription factor mRNA, preventing the inappropriate expression of the transcription factor protein, thus enabling the rapid differentiation of the daughter cell.

[Rhoades et al., Figure 1. Bar chart; x-axis: stringency of pairing (number of mismatches, 0-4).]

[Rhoades et al., Figure 2. Panel A: miR165 complementary site in REV, ATHB-8, PHV, and PHB, with the 11 aa insertion in the phb gain-of-function allele and the G to D mutation in the phv and phb gain-of-function alleles marked. Panel B: miR156 complementary site in At5g50570, At5g50670, At1g27370, At1g27360, At5g43270, At3g57920, At2g42200, and At1g69170; miR156 complementary site in the 3'-UTR of At2g33810 and At1g53160.]

[Rhoades et al., Figure 3. Panel A: MicroRNA pathway (miRNA precursor, miRNAs, short segments of complementarity in 3'-UTR, attenuated translation) and PTGS/RNAi pathway (long double-stranded RNA, siRNAs, near-perfect complementarity in coding region or UTR, cleaved mRNA), with many plant miRNAs following the latter mode of action. Panel B: precursor cell expressing transcription factor mRNAs; daughter cells with distinct transcription factor profiles.]

References

Aida, M., Ishida, T., Fukaki, H., Fujisawa, H., and Tasaka, M. (1997). Genes involved in organ separation in Arabidopsis: an analysis of the cup-shaped cotyledon mutant. Plant Cell 9, 841-857.
The Arabidopsis Genome Initiative. (2000). Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408, 796-815.
Bernstein, E., Denli, A. M., and Hannon, G. J. (2001). The rest is silence. RNA 7, 1509-1521.
Bohmert, K., Camus, I., Bellini, C., Bouchez, D., Caboche, M., and Benning, C. (1998). AGO1 defines a novel locus of Arabidopsis controlling leaf development. EMBO J. 17, 170-180.
Bolle, C., Koncz, C., and Chua, N. H. (2000). PAT1, a new member of the GRAS family, is involved in phytochrome A signal transduction. Genes Dev. 14, 1269-78.
Cui, Y., Brugiere, N., Jackman, L., Bi, Y.-M., and Rothstein, S. (1999). Structural and transcriptional comparative analysis of the S locus regions in two self-incompatible Brassica napus lines. Plant Cell 11, 2217-2231.
Di Laurenzio, L., Wysocka-Diller, J., Malamy, J. E., Pysh, L., Helariutta, Y., Freshour, G., Hahn, M. G., Feldmann, K. A., and Benfey, P. N. (1996). The SCARECROW gene regulates an asymmetric cell division that is essential for generating the radial organization of the Arabidopsis root. Cell 86, 423-33.
Dsouza, M., Larsen, N., and Overbeek, R. (1997). Searching for patterns in genomic data. Trends in Genetics 13, 497-498.
Elbashir, S., Martinez, J., Patkaniowska, A., Lendeckel, W., and Tuschl, T. (2001). Functional anatomy of siRNAs for mediating efficient RNAi in Drosophila melanogaster embryo lysate. EMBO J. 20, 6877-6888.
Gocal, G. F., Sheldon, C. C., Gubler, F., Moritz, T., Bagnall, D. J., MacMillan, C. P., Li, S. F., Parish, R. W., Dennis, E. S., Weigel, D., and King, R. W. (2001). GAMYB-like genes, flowering, and gibberellin signalling in Arabidopsis. Plant Physiol. 127, 1682-1693.
Grishok, A., Pasquinelli, A. E., Conte, D., Li, N., Parrish, S., Ha, I., Baillie, D. L., Fire, A., Ruvkun, G., and Mello, C. C. (2001). Genes and mechanisms related to RNA interference regulate expression of the small temporal RNAs that control C. elegans developmental timing. Cell 106, 23-34.
Ha, I., Wightman, B., and Ruvkun, G. (1996). A bulged lin-4/lin-14 RNA duplex is sufficient for Caenorhabditis elegans lin-14 temporal gradient formation. Genes Dev. 10, 3041-3050.
Hamilton, A. J., and Baulcombe, D. C. (1999). A novel species of small antisense RNA in posttranscriptional gene silencing. Science 286, 950-952.
Helariutta, Y., Fukaki, H., Wysocka-Diller, J., Nakajima, K., Jung, J., Sena, G., Hauser, M. T., and Benfey, P. N. (2000). The SHORT-ROOT gene controls radial patterning of the Arabidopsis root through radial signaling. Cell 101, 555-67.
Hutvágner, G., McLachlan, J., Pasquinelli, A. E., Balint, E., Tuschl, T., and Zamore, P. D. (2001). A cellular function for the RNA-interference enzyme Dicer in the maturation of the let-7 small temporal RNA. Science 293, 834-838.
Hutvágner, G., and Zamore, P. D. (2002). RNAi: nature abhors a double-strand. Curr. Opin. Genet. Dev. 12, 225-232.
Jacobsen, S. E., Running, M. P., and Meyerowitz, E. M. (1999). Disruption of an RNA helicase/RNAseIII gene in Arabidopsis causes unregulated cell division in floral meristems. Development 126, 5231-5243.
Ketting, R. F., Fischer, S. E. J., Bernstein, E., Sijen, T., Hannon, G. J., and Plasterk, R. H. A. (2001). Dicer functions in RNA interference and in synthesis of small RNA involved in developmental timing in C. elegans. Genes Dev. 15, 2654-2659.
Kiss, T. (2002). Small nucleolar RNAs: an abundant group of noncoding RNAs with diverse cellular functions. Cell 109, 145-148.
Klein, J., Saedler, H., and Huijser, P. (1996). A new family of DNA binding proteins includes putative transcriptional regulators of the Antirrhinum majus floral meristem identity gene SQUAMOSA. Mol. Gen. Genet. 250, 7-16.
Knight, S., and Bass, B. (2001). A role for the RNase III enzyme DCR-1 in RNA interference and germ line development in Caenorhabditis elegans. Science 293, 2269-2271.
Lagos-Quintana, M., Rauhut, R., Lendeckel, W., and Tuschl, T. (2001). Identification of novel genes coding for small expressed RNAs. Science 294, 853-858.
Lagos-Quintana, M., Rauhut, R., Yalcin, A., Meyer, J., Lendeckel, W., and Tuschl, T. (2002). Identification of tissue-specific microRNAs from mouse. Curr. Biol. 12, 735-739.
Lai, E. C. (2002). MicroRNAs are complementary to 3'UTR motifs that mediate negative posttranscriptional regulation. Nat. Genet. 30, 363-364.
Lau, N. C., Lim, L. P., Weinstein, E. G., and Bartel, D. P. (2001). An abundant class of tiny RNAs with probable regulatory roles in Caenorhabditis elegans. Science 294, 858-862.
Lee, R. C., and Ambros, V. (2001). An extensive class of small RNAs in Caenorhabditis elegans. Science 294, 862-864.
Lee, R. C., Feinbaum, R. L., and Ambros, V. (1993). The C.
elegans heterochronic gene lin-4 encodes small RNAs with antisense complementarity to lin-14. Cell 75, 843-854.
Liscum, E., and Reed, J. W. (2002). Genetics of Aux/IAA and ARF action in plant growth and development. Plant Mol. Biol. 49, 387-400.
Llave, C., Kasschau, K., Rector, M., and Carrington, J. (2002). Endogenous and silencing-associated small RNAs in plants. Plant Cell 14, 1-15.
Matzke, M. A., Matzke, A. J., Pruss, G. J., and Vance, V. B. (2001). RNA-based silencing strategies in plants. Curr. Opin. Genet. Dev. 11, 221-227.
McConnell, J. R., Emery, J., Eshed, Y., Bao, N., Bowman, J., and Barton, M. K. (2001). Role of PHABULOSA and PHAVOLUTA in determining radial patterning in shoots. Nature 411, 709-713.
McManus, M. T., Petersen, C. P., Haines, B. B., Chen, J., and Sharp, P. A. (2002). Gene silencing using micro-RNA designed hairpins. RNA 8, 842-850.
Moss, E., Lee, R., and Ambros, V. (1997). The cold shock domain protein LIN-28 controls developmental timing in C. elegans and is regulated by the lin-4 RNA. Cell 88, 637-646.
Mourelatos, Z., Dostie, J., Paushkin, S., Sharma, A., Charroux, B., Abel, L., Rappsilber, J., Mann, M., and Dreyfuss, G. (2002). miRNPs: a novel class of ribonucleoproteins containing numerous microRNAs. Genes Dev. 16, 720-728.
Olsen, P. H., and Ambros, V. (1999). The lin-4 regulatory RNA controls developmental timing in Caenorhabditis elegans by blocking LIN-14 protein synthesis after the initiation of translation. Dev. Biol. 216, 671-680.
Peng, J., Carol, P., Richards, D. E., King, K. E., Cowling, R. J., Murphy, G. P., and Harberd, N. P. (1997). The Arabidopsis GAI gene defines a signaling pathway that negatively regulates gibberellin responses. Genes Dev. 11, 3194-205.
Ray, A., Lang, J. D., Golden, T., and Ray, S. (1996). SHORT INTEGUMENT (SIN1), a gene required for ovule development in Arabidopsis, also controls flowering time. Development 122, 2631-2638.
Ray, S., Golden, T., and Ray, A. (1996).
Maternal effects of the short integument mutation on embryo development. Dev. Biol. 180, 365-369.
Reinhart, B., Weinstein, E., Rhoades, M., Bartel, B., and Bartel, D. (2002). MicroRNAs in plants. Genes Dev. 16, 1616-1626.
Reinhart, B. J., Slack, F. J., Basson, M., Bettinger, J. C., Pasquinelli, A. E., Rougvie, A. E., Horvitz, H. R., and Ruvkun, G. (2000). The 21 nucleotide let-7 RNA regulates developmental timing in Caenorhabditis elegans. Nature 403, 901-906.
Riechmann, J. L., Heard, J., Martin, G., Reuber, L., Jiang, C.-Z., Keddie, J., Adam, L., Pineda, O., Ratcliffe, O. J., Samaha, R. R., Creelman, R., Pilgrim, M., Broun, P., Zhang, J. Z., Ghandehari, D., Sherman, B. K., and Yu, G.-L. (2000). Arabidopsis transcription factors: genome-wide comparative analysis among eukaryotes. Science 290, 2105-2110.
Robinson-Beers, K., Pruitt, R. E., and Gasser, C. S. (1992). Ovule development in wild-type Arabidopsis and two female-sterile mutants. Plant Cell 4, 1237-1249.
Rogg, L. E., and Bartel, B. (2001). Auxin signaling: derepression through regulated proteolysis. Dev. Cell 1, 595-604.
Silverstone, A. L., Ciampaglio, C. N., and Sun, T. (1998). The Arabidopsis RGA gene encodes a transcriptional regulator repressing the gibberellin signal transduction pathway. Plant Cell 10, 155-69.
Slack, F. J., Basson, M., Liu, Z., Ambros, V., Horvitz, H. R., and Ruvkun, G. (2000). The lin-41 RBCC gene acts in the C. elegans heterochronic pathway between the let-7 regulatory RNA and the LIN-29 transcription factor. Mol. Cell 5, 659-669.
Ulmasov, T., Hagen, G., and Guilfoyle, T. (1999). Dimerization and DNA binding of auxin response factors. Plant J. 19, 309-319.
Wightman, B., Ha, I., and Ruvkun, G. (1993). Posttranscriptional regulation of the heterochronic gene lin-14 by lin-4 mediates temporal pattern formation in C. elegans. Cell 75, 855-862.
Yeh, R., Lim, L., and Burge, C. (2001). Computational inference of homologous gene structures in the human genome. Genome Res. 11, 803-16.
Yu, J., Hu, S., Wang, J., Wong, G. K., Li, S., Liu, B., Deng, Y., Dai, L., Zhou, Y., Zhang, X., et al. (2002). A draft sequence of the rice genome (Oryza sativa L. ssp. indica). Science 296, 79-92.

Computational identification of plant microRNAs and their targets, including a stress-induced miRNA
Matthew W. Jones-Rhoades (1) and David P. Bartel (1,2)
(1) Whitehead Institute for Biomedical Research and Department of Biology, Massachusetts Institute of Technology, 9 Cambridge Center, Cambridge, Massachusetts 02142
(2) Correspondence: dbartel@wi.mit.edu
Running title: Plant microRNAs
Key words: computational gene prediction, microRNA regulatory targets, noncoding RNAs

Summary
MicroRNAs (miRNAs) are ~21-nucleotide RNAs, some of which have been shown to play important gene-regulatory roles during plant development. We developed comparative genomic approaches to systematically identify both miRNAs and their targets that are conserved in Arabidopsis thaliana and rice (Oryza sativa). Twenty-three miRNA candidates, representing seven newly identified gene families, were experimentally validated in Arabidopsis, bringing the total number of reported miRNA genes to 92, representing 22 families. Nineteen newly identified target candidates were confirmed by detecting mRNA fragments diagnostic of miRNA-directed cleavage in plants. Overall, plant miRNAs have a strong propensity to target genes controlling development, particularly those of transcription factors and F-box proteins. However, plant miRNAs have conserved regulatory functions extending beyond development, in that they also target superoxide dismutases, laccases, and ATP sulfurylases. The expression of miR395, the sulfurylase-targeting miRNA, increases upon sulfate starvation, showing that miRNAs can be induced by environmental stress.
Introduction
MicroRNAs are endogenous 20- to 24-nucleotide RNAs, some of which are known to play important post-transcriptional regulatory roles in plants and animals (Bartel and Bartel, 2003; Lai, 2003; Bartel, 2004). MicroRNAs are initially transcribed as much longer RNAs that contain imperfect hairpins, from which the mature miRNA is excised by Dicer-like enzymes (Grishok et al., 2001; Hutvágner et al., 2001; Ketting et al., 2001; Lee et al., 2002; Park et al., 2002; Reinhart et al., 2002; Lee et al., 2003). The mature miRNA derives from the double-stranded portion of the hairpin and is initially excised as a duplex comprising two ~22-nt RNAs, one of which is the mature miRNA while the other, known as the miRNA*, comes from the opposite arm of the hairpin (Lau et al., 2001; Reinhart et al., 2002; Khvorova et al., 2003; Lim et al., 2003b; Schwarz et al., 2003). The miRNA of this miRNA:miRNA* duplex is preferentially loaded into the RNA-induced silencing complex (RISC; Hammond et al., 2000), where it functions as a guide RNA to direct the posttranscriptional repression of mRNA targets, while the miRNA* is degraded (Hutvágner and Zamore, 2002; Mourelatos et al., 2002; Khvorova et al., 2003; Schwarz et al., 2003).
The primary method of identifying miRNA genes has been to isolate, reverse transcribe, clone, and sequence small cellular RNAs (Lagos-Quintana et al., 2001; Lau et al., 2001; Lee and Ambros, 2001; Llave et al., 2002a; Park et al., 2002; Reinhart et al., 2002). However, molecular cloning is biased towards finding miRNAs that are relatively abundant. In animals, miRNA gene discovery by molecular cloning has been supplemented by systematic computational approaches that identify evolutionarily conserved miRNA genes by searching for patterns of sequence and secondary structure conservation that are characteristic of metazoan miRNA hairpin precursors (Ambros et al., 2003; Grad et al., 2003; Lai et al., 2003; Lim et al., 2003a; Lim et al., 2003b).
The most sensitive of these methods indicate that miRNAs constitute nearly 1% of all predicted genes in nematodes, flies, and mammals (Lai et al., 2003; Lim et al., 2003a; Lim et al., 2003b). Methods developed in one animal lineage work well when extended to another animal lineage (Lim et al., 2003a), but cannot be directly applied to plants because the hairpins of plant miRNAs are more heterogeneous than those of animal miRNAs (Reinhart et al., 2002). Because the miRNAs recognize their regulatory targets through base pairing, computational methods have been invaluable for identifying these targets. The extensive complementarity between plant miRNAs and mRNAs makes systematic target identification easier in plants than in animals (Rhoades et al., 2002). A search for targets of 13 Arabidopsis miRNA families predicted 49 unique targets, with a signal-to-noise ratio exceeding 10:1, simply by looking for Arabidopsis messages with three or fewer mismatches (Rhoades et al., 2002). Evolutionary conservation of the miRNA:mRNA pairing in rice (Rhoades et al., 2002), together with experimental evidence showing that these miRNAs direct cleavage of their predicted mRNA targets (Llave et al., 2002b; Kasschau et al., 2003; Palatnik et al., 2003; Tang et al., 2003; Mallory et al., 2004; Vazquez et al., 2004) supports the validity of these predictions. Because metazoan miRNAs only rarely recognize their targets with such extensive complementarity (Yekta et al., 2004), more sophisticated methods that search for short segments of conserved complementarity to the miRNAs are required to identify metazoan miRNA targets (Enright et al., 2003; Lewis et al., 2003; Stark et al., 2003). The previously identified plant miRNAs have a remarkable propensity to target genes involved in development, particularly those of transcription factors (Rhoades et al., 2002). In all cases where disruption of plant miRNA regulation has been reported, striking developmental abnormalities are observed. 
Dominant gain-of-function mutations in HD-ZIP transcription factor genes PHABULOSA, PHAVOLUTA, and REVOLUTA that destabilize pairing to miR165/miR166 cause loss of adaxial/abaxial polarity in developing leaves (McConnell et al., 2001; Rhoades et al., 2002; Emery et al., 2003; Kidner and Martienssen, 2003). In maize, similar mutations in the HD-ZIP gene ROLLED LEAF1 also cause adaxialization of the abaxial surface of leaves, indicating that the miR165/miR166 family has a conserved role in determining leaf polarity despite the morphological differences between Arabidopsis and maize leaves (Juarez et al., 2004). Transgenic plants with silent mutations in the miR-JAW complementary sites of TCP transcription factors arrest as seedlings with fused cotyledons and lack shoot apical meristems, while those with mutations in the miR159 complementary site of MYB33 have upwardly curled leaves (Palatnik et al., 2003). Plants deficient in miR172-mediated regulation of APETALA2 have altered patterns of floral organ development (Chen, 2004). Plants deficient in miR164-mediated regulation of CUP-SHAPED COTYLEDON1 have altered patterns of embryonic, vegetative, and floral development (Mallory et al., 2004). Finally, silent mutations in the miR168 complementary site of ARGONAUTE1 lead to misregulation of miRNA targets and numerous developmental defects (Vaucheret et al., 2004).
To gain a more complete understanding of plant miRNAs and their regulatory targets, we devised a computational procedure to identify conserved miRNA genes that were missed in previous cloning efforts, and we refined our computational method for identifying mRNA targets to increase its sensitivity. Using criteria that retain all 11 of the previously identified miRNA gene families conserved between Arabidopsis thaliana and Oryza sativa, we found 13 additional families of candidates.
Molecular evidence showed that at least seven of these newly identified families of candidate miRNAs are authentic, and that at least six out of the seven mediate the cleavage of their predicted mRNA targets. These seven newly identified families are represented by 23 loci. When these are added to those identified by cloning, we count 92 miRNA loci in the Arabidopsis genome. Our updated analysis of the plant miRNA targets indicates a continued very strong overall bias toward transcription factors and genes involved in development. Some targets of the newly identified miRNAs, such as F-box proteins and GRL transcription factors, represent genes with demonstrated or probable roles in controlling developmental processes. Nonetheless, other newly identified miRNA targets, such as ATP sulfurylases, laccases, and superoxide dismutases, show that the range of functionalities regulated by miRNAs is broader than previously known. Furthermore, the expression of miR395, which targets genes involved in sulfate assimilation, is responsive to the sulfate concentration of the growth media, demonstrating that miRNA expression can be modulated by levels of external metabolites.

Results
Identification of 20mers in conserved miRNA-like hairpins
Our computational approach to identify plant miRNAs was based upon six characteristics that describe previously known plant miRNAs.
1) The base pairing of the mature miRNA to its miRNA* within the hairpin precursors is relatively consistent. In contrast, both the size of the foldback and the extent of base pairing outside of the immediate vicinity of the miRNA are highly variable among the hairpins of plant miRNAs, even among those of miRNAs from the same gene family.
2) The majority of known Arabidopsis miRNAs have identifiable homologs in the Oryza sativa genome, in which the predicted mature Oryza miRNAs have 0-2 base substitutions relative to their Arabidopsis homologs.
3) The secondary structures of known miRNA hairpins are robustly predicted by RNAfold if given a sequence sufficiently long to contain both the miRNA and the miRNA*.
4) The sequences of the Arabidopsis and Oryza hairpins are generally more conserved in the miRNA and miRNA* than in the segment joining the miRNA and miRNA*.
5) All matches to known miRNAs in the Arabidopsis genome, with the exception of those antisense to coding regions, have potential miRNA-like hairpins and are thus annotated as miRNA genes.
6) Most known Arabidopsis miRNAs are highly complementary to target mRNAs, and this complementarity is conserved to Oryza.
As the first step to identifying miRNAs in the genomes of Arabidopsis thaliana and Oryza sativa, we considered only those genomic portions contained in imperfect inverted repeats as defined by EINVERTED (Figure 1a, step 1). Within these 133,864 Arabidopsis and 410,167 Oryza inverted repeats were 73 of 86 reference set loci corresponding to the 24 previously reported miRNAs (refset1, Table S1). Secondary structures for the inverted repeats were predicted with RNAfold, and all 20mers within the inverted repeats were checked against MIRcheck, an algorithm written to identify 20mers with the potential to encode miRNAs (Figure 1a, step 2). MIRcheck takes as input a) the sequence of a putative miRNA hairpin, b) a secondary structure of the putative hairpin, and c) a 20mer sequence within the hairpin to be considered as a potential miRNA. MIRcheck takes into account the total number of unpaired nucleotides (no more than 4 in the putative miRNA), the number of bulged or asymmetrically unpaired nucleotides (no more than 1 in the putative miRNA), the number of consecutive unpaired nucleotides (no more than 2 in the putative miRNA) and the length of the hairpin (at least 60 nucleotides inclusive of the putative miRNA and miRNA*).
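The four MIRcheck criteria listed above can be illustrated with a short sketch. This is not the MIRcheck source code: the function names, the dot-bracket input format, and in particular the way asymmetry is counted (the difference in unpaired nucleotides on the two arms between successive base pairs) are assumptions made for illustration only.

```python
def pair_table(structure):
    """Map each paired index to its partner for a dot-bracket string."""
    stack, partner = [], {}
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            j = stack.pop()
            partner[i], partner[j] = j, i
    return partner


def mircheck_like(structure, start, length=20, max_unpaired=4,
                  max_asymmetric=1, max_run=2, min_hairpin=60):
    """Hypothetical re-implementation of the criteria described in the text.

    structure: dot-bracket fold of the putative hairpin
    start:     0-based index of the candidate 20mer within the hairpin
    """
    partner = pair_table(structure)
    window = range(start, start + length)
    paired = [i for i in window if i in partner]
    if not paired:
        return False

    # Criterion 1: total unpaired nucleotides in the candidate.
    unpaired = length - len(paired)

    # Criterion 3: longest run of consecutive unpaired nucleotides.
    run = longest = 0
    for i in window:
        run = 0 if i in partner else run + 1
        longest = max(longest, run)

    # Criterion 2 (assumed definition): between successive base pairs, count
    # how many more nucleotides are unpaired on one arm than on the other.
    asymmetric = 0
    for a, b in zip(paired, paired[1:]):
        here = b - a - 1
        there = abs(partner[a] - partner[b]) - 1
        asymmetric += abs(here - there)

    # Criterion 4: hairpin span from the candidate to the far end of its partner arm.
    partners = [partner[i] for i in paired]
    span = (max(max(partners), start + length - 1)
            - min(min(partners), start) + 1)

    return (unpaired <= max_unpaired and asymmetric <= max_asymmetric
            and longest <= max_run and span >= min_hairpin)
```

For example, a perfectly paired 20mer in a 70-nt hairpin passes, whereas the same stem with a 10-nt loop fails on hairpin length, and a stem with three consecutive unpaired nucleotides fails on the run criterion.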
In contrast to the algorithms designed to identify metazoan miRNAs, MIRcheck has no requirements pertaining to the pattern or extent of base pairing in other parts of the predicted secondary structure. Even though these parameters were chosen to be relatively stringent, only 7 of the 73 remaining Arabidopsis and Oryza refset1 loci were lost at this step. After removal of 20mers that overlap with repetitive elements, or which have highly biased sequence compositions, 389,648 Arabidopsis 20mers (AtSet1) and 1,721,759 Oryza 20mers (OsSet1) had at least 1 locus that passed MIRcheck. We used Patscan to identify 20mers in AtSet1 that matched at least one 20mer in OsSet1 with 0-2 base substitutions, considering only 20mers on the same arm of their putative hairpins (Figure 1a, step 3). 3,851 Arabidopsis 20mers had at least 1 Oryza match (AtSet2), and 5,438 Oryza 20mers were matched at least once (OsSet2).
For the previously known plant miRNAs, RNAfold predicts a secondary structure in which the miRNA is paired to the miRNA*, provided that the flanking sequence is sufficiently long to contain the miRNA*. The presence of additional flanking sequence does not interfere with the prediction of a miRNA-like secondary structure. This robustly predicted folding is observed for all of the loci of each cloned miRNA, even though they have widely divergent flanking sequences. While recognizing that the predicted folds are unlikely to be correct in all their details, it is reasonable to propose that the overall robustness of the predicted folding might reflect an evolutionary optimization for defined folding in the plant. To eliminate candidates that do not fold as robustly as the previously known miRNAs, we required AtSet2 and OsSet2 20mers to pass MIRcheck a second time after being computationally folded in the context of sequences flanking the hairpin.
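The cross-species matching in step 3 above (0-2 base substitutions, same hairpin arm) amounts to a Hamming-distance filter. The sketch below is a brute-force illustration under assumed names and an assumed `(sequence, arm)` tuple format; the thesis used Patscan for this step, and a pairwise loop like this one would be far too slow for sets of the sizes reported.

```python
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def conserved_20mers(at_set, os_set, max_subs=2):
    """Return {Arabidopsis 20mer: [matching Oryza 20mers]}.

    at_set, os_set: lists of (sequence, arm) tuples, where arm is "5p" or "3p"
    (a hypothetical encoding of which hairpin arm carries the 20mer).
    """
    hits = {}
    for at_seq, at_arm in at_set:
        matches = [os_seq for os_seq, os_arm in os_set
                   if os_arm == at_arm and hamming(at_seq, os_seq) <= max_subs]
        if matches:
            hits[at_seq] = matches
    return hits


# Toy, made-up sequences (not real miRNAs).
at = [("ACGUACGUACGUACGUACGU", "5p"), ("GGGGGCCCCCAAAAAUUUUU", "3p")]
rice = [("ACGUACGUACGUACGUACAA", "5p"),   # 2 substitutions, same arm
        ("ACGUACGUACGUACGUACGU", "3p"),   # identical but on the other arm
        ("GGGGGCCCCCAAAAACCCCC", "3p")]   # 5 substitutions
print(conserved_20mers(at, rice))
```

Only the first Arabidopsis 20mer is retained here, matched to the first rice 20mer; the identical sequence on the opposite arm and the 5-substitution match are both rejected.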
Patscan was used to find all matches of AtSet2 and OsSet2 to their respective genomes, RNAfold was used to predict the secondary structure of each match in the context of a 500 nt genomic sequence centered on the 20mer, and each match was evaluated by MIRcheck (Figure 1a, step 4). 2,588 Arabidopsis 20mers (AtSet3) and 3,083 Oryza 20mers (OsSet3) had at least one locus that passed MIRcheck. Because EINVERTED misses some hairpins and because this second MIRcheck evaluation used more relaxed cutoffs (up to 6 unpaired nt each in the putative miRNA and miRNA*), this step also recovered paralogs that were missed in steps 1 or 2.
The genomic matches to known Arabidopsis miRNAs are all either in hairpins or antisense to coding regions. To ensure that computationally identified miRNAs met this criterion, Arabidopsis 20mers were removed from the analysis if fewer than 50% of intergenic matches passed MIRcheck, or if more than 50% of genomic matches overlapped with repetitive sequence elements (Figure 1a, step 5), resulting in 2,506 20mers (AtSet4). Because gene annotation in Oryza is poor, we could not reliably define matches as genic or intergenic. The 2,780 Oryza 20mers that had at least 1 locus pass MIRcheck and had no more than 50% of genomic matches in repetitive sequence elements were included in OsSet4.
The next step in our analysis was to identify pairs of Arabidopsis and Oryza hairpins that have miRNA-like patterns of sequence conservation (Figure 1a, step 6). MicroRNA precursors are generally most conserved in the miRNA:miRNA* portion of the hairpin, a characteristic that has been used to help identify insect miRNA genes (Lai et al., 2003). In our procedure, we retained homologous pairs for which both the miRNA and miRNA* 20mers were more conserved than any 20mer from the loop regions. Doing pairwise comparisons of the hairpins of AtSet4 against those of OsSet4 resulted in 1,145 20mers (AtSet5) with at least 1 acceptable Oryza homolog.
AtSet5 was mapped to the Arabidopsis genome, and overlapping 20mers were joined together to form 379 sequences with miRNA encoding potential. A single miRNA gene could be represented by up to four of these potential miRNA sequences, representing the miRNA, the miRNA*, the antisense miRNA, and the antisense miRNA*. After accounting for multiple potential miRNAs mapping to a single locus, the 379 potential miRNAs represented 228 potential miRNA loci. These 228 loci were grouped into 118 families of potential miRNA loci based on sequence similarity as determined by blastn. Many of these newly identified miRNA candidates had patterns of secondary structure conservation resembling those of previously known plant miRNAs (Figure 1b,c). For many of the miRNA loci corresponding to previously reported miRNAs, the computationally identified sequences extended 1-9 nt on either side of the cloned miRNAs, although in a few cases the actual miRNA overlapped with but extended beyond the predicted sequence.

A refined procedure for predicting miRNA targets
We previously identified mRNAs containing ungapped, antisense matches to miRNAs with 0-3 mispairs (counting G:U pairs as mispairs) as probable miRNA targets (Rhoades et al., 2002). Although the majority of validated plant miRNA targets are captured by this cutoff, there are several authentic targets that are missed. For example, miR162 has a bulged nucleotide as it base-pairs to the mRNA of DCL1, and miR-JAW has 4-5 mispairs to the mRNAs of several TCP transcription factors (Palatnik et al., 2003; Xie et al., 2003). In order to more thoroughly assess the mRNA targeting potential of both known and predicted miRNAs, we developed a more sensitive computational approach to identify target candidates. It allows for gaps and more mismatches in the mRNA:miRNA duplex but requires that the miRNA complementarity be conserved between homologous Arabidopsis and Oryza mRNAs.
Each miRNA complementary site was scored, with perfect matches given a score of 0, and points were added for each G:U wobble (0.5 points), each non-G:U mismatch (1 point) and each bulged nucleotide in the miRNA or target strand (2 points). To allow the same cutoffs to be applied more evenly to miRNAs of different lengths and to avoid penalizing mismatches at the ends of longer miRNAs, those miRNAs that were longer than 20 nt were broken into overlapping 20mers, with the mRNA:miRNA pair receiving the score of the most favorable 20mer. This scoring was tested using a set of 10 unrelated miRNAs that are highly conserved (0-1 substitutions) between Arabidopsis and Oryza (refset2, Table S1). As a control, we generated 5 cohorts of permuted miRNAs, in which each permuted miRNA has the same dinucleotide composition as the corresponding miRNA in refset2. For all 20mers from the sets of real and permuted miRNAs we searched for complementary sites in Arabidopsis and Oryza mRNAs. Compared to their shuffled cohorts, the real miRNAs had many more complementary Arabidopsis mRNAs with scores < 2 (Figure 2a), which was in agreement with our previous results (Rhoades et al., 2002). Filtering the miRNA-complementary mRNAs to include only those conserved to Oryza showed that nearly all the complementary sites to authentic miRNAs with scores of < 2 are conserved (Figure 2b). For the permuted miRNAs, requiring conservation reduced to nearly zero the number of complementary sites with scores of 2-3.5, whereas for the authentic miRNAs a small but significant number of sites scoring in this range were conserved (Figure 2b). Thus, adding a requirement for conservation raised the threshold at which spurious matches were found, thereby enabling confident prediction of targets that were less extensively paired to the miRNAs, in some cases forming Watson-Crick pairs to only 15 of 20 miRNA nucleotides.
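The penalty scheme described at the start of this section (0.5 per G:U wobble, 1 per other mismatch, best overlapping 20mer for longer miRNAs) can be sketched as follows. This is an illustrative simplification, not the thesis code: it handles only ungapped duplexes, so the 2-point bulge penalty is not implemented, and all function names and the toy sequence are assumptions.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def revcomp(rna):
    """Reverse complement of an RNA sequence (the perfectly paired site)."""
    return "".join(COMPLEMENT[b] for b in reversed(rna))


def pair_penalty(m, t):
    """0 for a Watson-Crick pair, 0.5 for G:U wobble, 1 for any other mispair."""
    if (m, t) in WATSON_CRICK:
        return 0.0
    if (m, t) in WOBBLE:
        return 0.5
    return 1.0


def ungapped_score(mirna, site):
    """Score an ungapped duplex; both sequences are written 5'->3',
    so miRNA position i pairs with the site read from its 3' end."""
    if len(mirna) != len(site):
        raise ValueError("miRNA and site must have equal length")
    return sum(pair_penalty(m, t) for m, t in zip(mirna, reversed(site)))


def site_score(mirna, site, window=20):
    """For miRNAs longer than `window`, return the score of the most
    favorable overlapping window, as described in the text."""
    n = len(mirna)
    if n != len(site):
        raise ValueError("miRNA and site must have equal length")
    if n <= window:
        return ungapped_score(mirna, site)
    return min(ungapped_score(mirna[k:k + window], site[n - k - window:n - k])
               for k in range(n - window + 1))


mirna = "UGCAUGGACCUAGGCUAACG"            # made-up 20mer, not a real miRNA
perfect = revcomp(mirna)
wobble = perfect[:-1] + "G"               # miRNA 5' U now faces G
print(site_score(mirna, perfect), site_score(mirna, wobble))
```

A full implementation of the refined method would also need a gapped alignment so that bulged nucleotides can be penalized, plus the conservation filter against homologous Oryza mRNAs.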
Each of the conserved miRNAs had at least one predicted target with score < 3.0, suggesting that the possession of predicted targets could be a criterion for screening the newly identified miRNA candidates. For each 20mer in AtSet5 and OsSet5, miRNA complementary sites were found and scored (Figure 1a, step 7). As would be expected even for permuted sequences, nearly all of the AtSet5 20mers (1,124 out of 1,145) had a complementarity score within the 3.0 cutoff to at least 1 Arabidopsis mRNA. Of these, 278 20mers (AtSet6) had at least one homologous Oryza 20mer with complementarity to a homologous Oryza mRNA. AtSet6 represented 24 families of potential miRNAs, which account for 100 potential miRNA loci. Eleven of these families, represented by 60 loci (including 41 refset1 loci), corresponded to all previously known miRNA families with identifiable Oryza homologs, suggesting that our method also identified most of the previously unknown families that have extensive conserved complementarity in Oryza.

Newly identified miRNAs are expressed
Our computational screen identified 13 previously unreported families of conserved miRNA candidates with conserved complementarity to mRNAs. To determine which of these putative miRNAs are expressed, we used a PCR-based assay (Lim et al., 2003a; Lim et al., 2003b) to search for the predicted miRNAs in a library of small cDNAs (Reinhart et al., 2002). In addition to verifying the expression of the miRNAs, this assay maps the 5' ends of the miRNAs (Table 2). Each PCR reaction used one common primer corresponding to the adaptor oligo attached to the 5' end of all members of the library and one primer specific to the 3' portion of the predicted miRNA. For seven miRNA families, PCR reactions resulted in products in which the specific primer was extended by at least 3 nucleotides that matched the predicted miRNA sequence. In sum, the seven newly identified miRNA families comprised 23 genomic loci in Arabidopsis (Table 2).
All clones for families 393, 396, 397, and 398 had the same 5' end, while for families 394, 395, and 399 miRNAs were detected with differing 5' ends that could result from inconsistent processing of precursor transcripts from a single locus, or from differential processing of precursors from different loci. Several of these miRNA families include loci that would encode distinct but highly similar miRNAs (Table 2). Because the PCR primers overlapped with the residues that differ, it is not possible to know which variants were detected.
Six families of putative miRNAs passed all computational checks but were not validated by the PCR assay. Five of these families had a single locus in Arabidopsis, whereas the sixth had 14 Arabidopsis loci and 52 Oryza loci and likely represented a repetitive element not identified by RepeatMasker. Although the possibility that some of these non-validated predicted candidates are authentic cannot be ruled out, we consider it unlikely that they represent miRNA sequences.
The expression of newly identified miRNAs was also tested by Northern blot analysis. Hybridization probes were designed for representative members of the 7 miRNA families detected by the PCR assay. Probes complementary to miR393, miR394, miR396a, and miR398b detected 20-21 nt RNAs in samples from wild-type, soil-grown Columbia plants (Figure 3a), whereas probes complementary to miR395a, miR397b, and miR399b did not detect expressed small RNAs in these samples. These miRNAs that are difficult to detect on a Northern blot are likely to be expressed only at low levels or only in a subset of tissues or growth conditions.
Because miR395 is complementary to mRNAs of ATP sulfurylase (APS) proteins (Figure 5), and because the expression levels of numerous sulfate metabolizing genes are responsive to sulfate levels (Takahashi et al., 1997; Lappartient et al., 1999; Maruyama-Nakashita et al., 2003), we hypothesized that the expression of miR395 might be dependent on cellular sulfate levels. To test this, we probed RNA samples from plants grown in modified MS media containing various amounts of sulfate. As seen for plants grown in soil, miR395 was not detected in the samples from plants grown in 2 mM SO₄²⁻. However, miR395 was readily detected in the samples grown in very low sulfate (Figure 3b, 0.2 or 0.02 mM SO₄²⁻). Induction of miR395 by low external sulfate concentrations is somewhat reminiscent of the starvation-associated miR-234 increase that has been observed in nematodes (Lim et al., 2003b), although the miR395 induction (greater than 100-fold) is much more striking than that of miR-234 (twofold). We examined whether APS1 expression changed in the conditions that induced miR395, and found that its expression decreased when miR395 increased, as would be expected if APS1 was a cleavage target of miR395 (Figure 3c).

Experimental verification of miRNA targets
MicroRNAs, like small interfering RNAs (siRNAs; Elbashir et al., 2001), can direct the cleavage of their mRNA targets when these messages have extensive complementarity to the miRNAs (Hutvágner and Zamore, 2002; Llave et al., 2002b; Tang et al., 2003; Yekta et al., 2004). This miRNA-directed cleavage can be detected by using a modified form of 5'-RACE (rapid amplification of cDNA ends) because the 3' product of the cleavage has two diagnostic properties: 1) a 5' terminal phosphate, making it a suitable substrate for ligation to an RNA adaptor using T4 RNA ligase, and 2) a 5' terminus that maps precisely to the nucleotide that pairs with the tenth nucleotide of the miRNA (Llave et al., 2002b; Kasschau et al., 2003).
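The second diagnostic property can be turned into a simple coordinate check when analyzing sequenced 5'-RACE clones. The sketch below assumes an ungapped miRNA:mRNA duplex in which the 5' end of the miRNA pairs with the 3' end of the complementary site; the function names and the 0-based coordinate convention are hypothetical.

```python
from collections import Counter


def expected_fragment_5prime(site_start, site_length, mirna_position=10):
    """0-based mRNA index of the nucleotide paired with the given miRNA position.

    site_start:  0-based index of the 5'-most nucleotide of the complementary site
    site_length: length of the site (equal to the miRNA length when ungapped)
    """
    site_end = site_start + site_length - 1      # pairs with miRNA nucleotide 1
    return site_end - (mirna_position - 1)


def most_common_end(clone_ends):
    """Most frequent 5' end among sequenced 5'-RACE clones."""
    return Counter(clone_ends).most_common(1)[0][0]


# Made-up example: a 21-nt site starting at mRNA index 100, four sequenced clones.
expected = expected_fragment_5prime(100, 21)
observed = most_common_end([111, 111, 110, 111])
print(expected, observed, expected == observed)
```

A bulged nucleotide between miRNA positions 1 and 10 would shift the expected coordinate, so gapped duplexes need the alignment itself rather than this arithmetic shortcut.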
To examine whether any of the newly identified miRNAs can direct cleavage of their predicted targets in vivo, we isolated RNA from vegetative and floral tissues and performed the 5'-RACE procedure using primers specific to the predicted targets. For 19 predicted targets the 5'-RACE PCR yielded a distinct band of the predicted size on an agarose gel, which was isolated, cloned and sequenced. In all 19 cases the most common 5' end of the mRNA fragment mapped to the nucleotide that pairs to the tenth nucleotide of one of the miRNAs validated by PCR (Figure 4), indicating cleavage at sites precisely analogous to those seen for other miRNA targets (Llave et al., 2002b; Aukerman and Sakai, 2003; Kasschau et al., 2003; Palatnik et al., 2003; Xie et al., 2003; Mallory et al., 2004; Vazquez et al., 2004), as well as for RNAs complementary to siRNAs and metazoan miRNAs (Elbashir et al., 2001; Hutvágner and Zamore, 2002; Yekta et al., 2004). These observations also corroborate the 5' ends of the miRNAs as mapped by PCR (Table 2).

Identification of miRNA paralogs
Our computational approach found 81 miRNA loci from 18 miRNA families (Table 1, Table S2). We searched for additional members of these families by searching the Arabidopsis genome for near matches (0-3) to the miRNAs of these 81 loci (Figure 1a, step 9). After manual inspection for potential hairpin-like secondary structures, this identified six additional loci in miRNA families that are conserved to Oryza. Together with the five loci in miRNA families without apparent Oryza homologs, this brings to 92 the total number of Arabidopsis loci that meet the criteria for designation as miRNA genes (Ambros et al., 2003) (Table S2). As is generally the case with computational gene prediction, some of these might be pseudogenes. Our de novo miRNA-finding algorithm found 88% of these, and 93% of those with Oryza homologs.
These Arabidopsis genes correspond to 122 Oryza miRNA genes, of which 111 (91%) were found de novo by our algorithm (Figure 1a, step 9; Table S3).

As has been previously observed for numerous animal miRNAs (Lagos-Quintana et al., 2001; Lau et al., 2001), we find that some plant miRNA genes are clustered in the genome, most strikingly the genes of the 395 family. In Arabidopsis, miRNAs of the 395 family are located in two clusters, each containing three hairpins within 4 kb (Figure S1a). In each cluster, two MIR395 hairpins are on one strand while the third is on the opposite strand. Thus each cluster could not be expressed as a single primary transcript, but could be expressed as two transcripts sharing common regulatory elements. The Oryza MIR395 hairpins are also clustered, but with a different arrangement than in Arabidopsis. The two largest Oryza MIR395 clusters contain seven and six hairpins, respectively, within 1 kb, with all hairpins encoded on the same strand of DNA (Figure S1b). These clusters are likely expressed as transcripts containing multiple miRNAs, an idea supported by Oryza EST CA764701, which contains four miR395 hairpins.

Prediction of conserved miRNA targets

Having refined our computational method to more sensitively predict plant miRNA targets, we applied it to the prediction of conserved mRNA targets of all known Arabidopsis and Oryza miRNAs (Figure 5). Control experiments with refset2 and 5 sets of permuted miRNAs suggested that a score cutoff of ≤ 3.5 was appropriate to identify conserved miRNA targets with high sensitivity and selectivity. However, when searching for targets of the entire set of miRNAs, this cutoff identified a number of mRNAs for which miRNA-mediated cleavage products could not be found by 5'-RACE. Thus, a cutoff of ≤ 3 was chosen to minimize the number of non-authentic targets.
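The conservation filter described here can be sketched in a few lines. The Figure 2 legend states that a conserved site takes the higher (less complementary) of its Arabidopsis and Oryza scores, and the Figure 5 legend lists targets scoring 3.0 or less in both species. The per-position penalties below (1 for a mismatch, 0.5 for a G:U wobble, no gaps) are an assumption made for illustration only; the actual scoring rules are given elsewhere in the text, and the function names are hypothetical.

```python
# Sketch: score an ungapped miRNA:mRNA duplex and apply the conserved cutoff.
# Assumed penalties: mismatch = 1.0, G:U wobble = 0.5 (illustrative only).

WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def perfect_site(mirna):
    """mRNA site (5'->3') that pairs perfectly with the miRNA."""
    return "".join(COMPLEMENT[b] for b in reversed(mirna))


def site_score(mirna, site, mismatch=1.0, wobble=0.5):
    """Sum of penalties over an ungapped antiparallel alignment.
    Lower scores mean fewer mismatches; 0 is perfect complementarity."""
    if len(mirna) != len(site):
        raise ValueError("miRNA and site must have the same length")
    score = 0.0
    # miRNA nucleotide i (from its 5' end) pairs with site nucleotide -(i+1).
    for m, t in zip(mirna, reversed(site)):
        if (m, t) in WATSON_CRICK:
            continue
        score += wobble if (m, t) in WOBBLE else mismatch
    return score


def conserved_score(arabidopsis_score, oryza_score):
    """A conserved site is assigned the higher (worse) of the two scores."""
    return max(arabidopsis_score, oryza_score)


def is_conserved_target(arabidopsis_score, oryza_score, cutoff=3.0):
    """True when the site scores at or below the cutoff in both species."""
    return conserved_score(arabidopsis_score, oryza_score) <= cutoff


if __name__ == "__main__":
    mir399a = "UGCCAAAGGAGAUUUGCCCUG"  # sequence from Table 2
    print(site_score(mir399a, perfect_site(mir399a)))
    print(is_conserved_target(1.0, 3.5), is_conserved_target(3.0, 2.5))
```

Taking the maximum means that one well-matched site cannot rescue a poorly matched homolog, which is what makes the conservation requirement selective against chance complementarity.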
All previously validated miRNA targets are identified at this level of sensitivity, although several newly validated targets have scores of 3.5 in one or both species and are not retained using this cutoff. Thus, there is still a threshold at which it is difficult to distinguish authentic targets from potentially spurious complementarity without experimental verification. Nonetheless, a score of ≤ 3.0 in our refined method identifies targets with very high confidence (Figure 2).

Plant miRNAs are deeply conserved

MicroRNAs conserved between the dicot Arabidopsis thaliana and the monocot Oryza sativa are likely to be found in most flowering plants. Homologs of miR-JAW and miR-JAW complementary sites have been found in ESTs from numerous angiosperms (Palatnik et al., 2003). To look for evidence of other miRNAs in additional plant species, we searched for ESTs representing potential homologs of Arabidopsis and Oryza miRNAs, defined here as having ≥19/20 nt matches and a predicted foldback that passes MIRcheck. This search identified 187 putative miRNA homologs in the ESTs (Table S4). A large majority of these appear to be authentic, in that the 10 miRNAs in refset2 each had on average 9.7 EST matches that passed MIRcheck, whereas the set of 50 permuted miRNAs averaged only 0.04 matches that passed MIRcheck. For all 18 miRNA families that are conserved between Arabidopsis and Oryza, potential miRNA precursors were found in at least one additional angiosperm species (Table S4). For miRNAs that are not conserved between Arabidopsis and Oryza, no homologous miRNAs in additional species were identified, suggesting that the lack of conservation in Oryza is a consequence of recent emergence rather than loss in the Oryza lineage.

We also searched for matches to experimentally confirmed miRNA complementary sites in ESTs encoding proteins homologous to Arabidopsis targets (blastx score >10⁻⁶).
For all miRNA families with validated miRNA targets, conserved miRNA complementary sites (19/20 nt matches) were found in at least one additional angiosperm (Table S5). On average, the miRNA complementary sites from 17 unrelated Arabidopsis miRNA targets were each conserved in 191 homologous ESTs, representing 14 species. This is far more than would be expected by chance; when repeating the analysis using 170 sites chosen at random from the same Arabidopsis mRNAs, the average numbers of ESTs and species were 2.6 and 0.5, respectively.

MicroRNAs of the 166 family, as well as their binding sites in mRNAs of HD-ZIP proteins, predate the emergence of seed plants (Floyd and Bowman, 2004). We found nine miRNA families (156, 160, 166, 167, 393, 395, 396, 397 and 398) that had complementary sites conserved in gymnosperms, while a miR171 complementary site was conserved in an SCL mRNA from a fern (Ceratopteris richardii). In addition, a potential miRNA hairpin of the 159/JAW family was present in an EST from moss (Physcomitrella patens). These data suggest that multiple miRNAs have deep origins in plant phylogeny.

Discussion

The scope of miRNAs conserved between dicots and monocots

A combination of computational prediction and experimental verification identified seven families of sequences that had not previously been identified as miRNAs. A set of 2088 small RNAs from Arabidopsis was recently reported (Xie et al., 2004) (http://gac.bcc.orst.edu/smallRNA/). Sequences corresponding to miR397a, miR398b and miR399b were contained in this dataset, each having been cloned a single time, although none were annotated as miRNAs. The cloning of miR397 and miR399, which were not detected by Northern blot, corroborates their expression as determined by PCR. Families 393, 394, 395 and 396 are absent from the reported sets of cloned, sequenced small RNAs.
These are each detectable by Northern analysis and, as with families 397, 398 and 399, were detected by PCR in our library of small cDNAs used for cloning. Therefore they would have been found eventually by sequencing enough small cDNAs. However, given that other miRNAs have been cloned hundreds of times (Xie et al., 2004), it seems that all seven newly identified miRNA families are relatively rare in the tissues and growth conditions from which small RNAs have been cloned. They may represent miRNAs that are needed at low levels, or whose expression is limited to rare cell types or particular growth conditions. The expression of miR395 is greatly increased by sulfate starvation; other miRNAs with seemingly low expression may also be inducible by metabolite levels or environmental stimuli. It is the identification of these difficult-to-clone but potentially important miRNAs that makes computational prediction a useful complement to cloning of small RNAs.

The sensitivity of our computational approach, which found all 11 conserved miRNA families previously identified through cloning, suggests that most plant miRNAs with properties similar to previously cloned miRNAs have been identified. MicroRNA genes not found by our analysis are likely to fall into several categories. One set will be those without apparent conservation to Oryza. This describes four families of currently known Arabidopsis miRNAs (158, 161, 163, and 173). It is difficult to estimate how many additional non-conserved miRNA families exist in either species, but the observation that most of the cloned plant miRNAs have readily identified Oryza homologs indicates either that there are no more than a handful of non-conserved miRNAs remaining to be identified or that non-conserved miRNAs are disproportionately poorly expressed in plants.

Another set of false negatives will be miRNA families that are conserved between Arabidopsis and Oryza but were missed by our analysis.
Most steps in our analysis have the potential to lose authentic miRNA genes. The parameters and cutoffs we used were chosen to be slightly more relaxed than what was needed to retain most loci corresponding to the 11 previously known miRNA families with Oryza homologs in refset1. They found at least one member of each family and 92% (59/64) of all loci in these families. A similar percentage of loci, 96%, were correctly identified for newly discovered miRNA families (22/23), suggesting that our parameters are not over-fitted. Relaxing the parameters of MIRcheck (Figure 1, steps 2 & 4) to allow up to two asymmetric bulges, shorter hairpins (as short as 54 nt), and an additional mismatch did not identify any additional verifiable miRNAs (data not shown). Nonetheless, the low number of previously identified Arabidopsis miRNA gene families (15) precluded splitting the miRNAs into a training set and test set, as was done in our metazoan analysis to evaluate the degree of overtraining and enable firm estimates of the number of genes remaining to be identified (Lim et al., 2003b).

MicroRNA families with few members would be more prone to being missed. For example, MIR393 and MIR394 each have only one identified locus in Oryza; either would have been missed if its Oryza locus had been among the fraction of authentic miRNA loci not identified as an inverted repeat or that did not pass MIRcheck, whereas miRNAs that were members of larger gene families with multiple Oryza homologs were identified even though some Oryza homologs were missed. The observation that some miRNA primary transcripts are spliced (Aukerman and Sakai, 2003) raises the possibility that some miRNA transcripts might have an intron within the hairpin precursor, which could prevent their identification in our analysis of genomic DNA.
Furthermore, any unknown miRNA family that systematically had a pattern of base pairing that failed MIRcheck would also have been lost, but there is no reason to suspect that this was a widespread problem.

More significant uncertainty in plant miRNA gene number arises from the 94 families of candidate miRNAs that had conserved miRNA-like hairpins but lacked extensive and conserved complementarity to mRNAs. Some of these candidates may be authentic miRNAs with different modes of target recognition. For example, any plant miRNA that recognizes all its target mRNAs in a manner similar to that of most animal miRNAs, that is, by recognizing its targets predominantly through "seed matches" (Lewis et al., 2003), would have been missed. Therefore, further analysis will be required before a meaningful upper bound on the number of plant miRNA genes can be estimated. The 92 loci tabulated to date, when considered together with the assumption that a few others might remain undetected because they are refractory to both cloning and computation, place a lower bound on the number of Arabidopsis miRNA genes at ~100, or ~0.4% of the predicted Arabidopsis genes, a percentage somewhat lower than that of animals. The plant miRNAs are generally in larger, more highly related families, further reducing the relative complexity of known miRNA sequences when compared to those of animals. Of course, when considering the vast number of distinct ~22 nt RNAs that have been cloned from plants, which might be endogenous siRNAs but are not miRNAs, the diversity of small RNA silencing in plants could exceed that in animals.

The targets of newly identified miRNAs

The detection of the RNA fragments diagnostic of miRNA-directed cleavage confirms in planta these 19 newly identified miRNA-target interactions. However, these 5'-RACE results do not rule out the possibility that the predominant mode of silencing is translational inhibition.
5'-RACE experiments demonstrate that miR172 directs the cleavage of some APETALA2 mRNA molecules, even though the predominant mode of repression appears to be translational inhibition (Aukerman and Sakai, 2003; Kasschau et al., 2003; Chen, 2004). Nonetheless, for all the other plant miRNA targets examined, inhibition of the miRNA pathway leads to increased accumulation of target mRNA (Kasschau et al., 2003; Vaucheret et al., 2004; Vazquez et al., 2004), suggesting that mRNA cleavage typically plays a significant regulatory role, although in these cases augmentation by translational repression cannot be ruled out. The same is likely to be true for our newly identified targets.

Some of the newly identified targets resemble those of previous predictions with regard to their proven or inferred roles in regulating developmental processes (Figure 5). miR396 targets seven Growth Regulating Factor genes, which are putative transcription factors that regulate cell expansion in leaf and cotyledon (Kim et al., 2003). miR393 and miR394 both target the messages of F-box proteins, which in turn target specific proteins for proteolysis by making them substrates for ubiquitination by SCF E3 ubiquitin ligases (Vierstra, 2003). At1g27340, targeted by miR394, is in the same subfamily of F-box genes as UNUSUAL FLORAL ORGANS (UFO) (Gagne et al., 2002), which is involved in floral initiation and development (Wilkinson and Haughn, 1995; Samach et al., 1999). miR393 targets four closely related F-box genes, including TRANSPORT INHIBITOR RESPONSE1 (TIR1), which targets AUX/IAA proteins for proteolysis in an auxin-dependent manner and is necessary for auxin-induced growth processes (Ruegger et al., 1998; Gray et al., 2001). These five F-box genes constitute a newly identified biochemical class of miRNA targets. The identification of TIR1 as a miRNA target implies that miRNAs regulate auxin responsiveness at multiple points.
Other auxin-related miRNA targets include Auxin Response Factors (miR160 and miR167) (Rhoades et al., 2002; Kasschau et al., 2003), which are thought to regulate transcription in response to auxin (Ulmasov et al., 1999), and NAC1 (miR164) (Rhoades et al., 2002; Mallory et al., 2004), which promotes auxin-induced lateral root growth downstream of TIR1 (Xie et al., 2000). Finally, in addition to targeting F-box genes, miR393 also targets At3g23690, a basic helix-loop-helix transcription factor with homology to GBOF-1 from tulip, which GenBank annotates as auxin-inducible.

Other newly identified miRNA targets have less obvious connections to the control of developmental patterning (Figure 5). miR397 targets putative laccases, members of a family of enzymes with numerous described roles in fungal biology but without well-defined roles in plant biology (Mayer and Staples, 2002). miR398 targets two copper superoxide dismutases, CSD1 and CSD2, enzymes that protect the cell against radicals and whose expression patterns respond to oxidative stress (Kliebenstein et al., 1998). The most definitive example of a plant miRNA operating outside the gene regulatory circuitry controlling development is miR395. miR395 targets the ATP sulfurylases APS1, APS3 and APS4, enzymes that catalyze the first step of inorganic sulfate assimilation (Leustek, 2002). The observations that the expression of miR395 depends on sulfate concentration and that APS1 expression declines with increasing miR395 corroborate the idea that this miRNA regulates sulfate metabolism (Figure 3).

Our systematic analysis, which probably has identified most plant miRNAs with conserved and extensive complementarity to plant messages, including those that are expressed at very low levels during lab growth conditions, allows us to revisit the question of what this class of tiny regulatory RNAs is generally doing in plants.
As before (Rhoades et al., 2002), we find an overwhelming propensity for targeting messages of known or suspected plant transcription factors (63 of 83, or 76% of genes in Figure 5) and a similar propensity for targeting messages of genes with known or suspected roles in plant development (70 of 83, or 84% of genes in Figure 5). A propensity to target developmental regulators differs from what has been seen in mammals (Lewis et al., 2003). Nonetheless, the conserved targets of plant miRNAs extend beyond the regulatory circuitry of development. The discovery that miRNAs regulate genes such as ATP sulfurylases, laccases, and superoxide dismutases shows that miRNAs also have an ancient role in regulating other aspects of plant biology.

Experimental procedures

Details of the computational miRNA prediction method and sequences of primers used are available online at http://www.molecule.org/.

PCR validation of miRNAs

We used a PCR-based assay to detect expression and map the 5' ends of predicted miRNAs (Lim et al., 2003b). miRNAs were PCR amplified out of a library of small cDNAs from leaf, flower, and seedling flanked by 5' and 3' adaptor oligos (Reinhart et al., 2002). Each PCR reaction used one common primer corresponding to the 5' adaptor oligo and one specific primer antisense to the 3' portion of the predicted miRNA.

RNA purification and Northern hybridization

RNA was isolated as previously described (Vance, 1991). For developmental Northerns, 30 µg per lane of total RNA from soil-grown Columbia plants was separated by 15% polyacrylamide electrophoresis and blotted to a nylon membrane. For plants grown on media, Columbia plants were grown in long-day conditions on modified MS/agarose media, containing 0.8% Agarose-LE (USBiochem), in which the SO₄²⁻-containing salts of minimal MS media were replaced with their chloride counterparts and the media supplemented with 20 µM to 2 mM (NH₄)₂SO₄. RNA was harvested from 2-week-old plants.
For miRNA Northerns, 40 µg per lane was used in Northern blots as above. For miR393, miR394, miR396a and miR398b, end-labeled antisense DNA probes were used. For miR395a, miR397b, and miR399b, higher specific activity Starfire (Integrated DNA Technologies) probes were used. MicroRNA Northerns were hybridized and washed as previously described (Lau et al., 2001). For mRNA Northerns, 10 µg per lane was separated by agarose electrophoresis and blotted as described (Mallory et al., 2001). Probes to exon 1 of APS1 were made using the Megaprime DNA labeling system (Amersham).

5'-RACE analysis

5'-RACE was performed on poly(A)-selected RNA from Columbia inflorescences and rosette leaves using the GeneRacer Kit (Invitrogen) as described (Kasschau et al., 2003), except that nested PCR was done for each gene, with each round of PCR using one gene-specific primer and the GeneRacer 5' Nested Primer. For each gene we designed gene-specific primers that were 180-450 bp away from the predicted miRNA binding site. PCR reactions were separated by agarose gel electrophoresis, and distinct bands of the appropriate size for miRNA-mediated cleavage were purified (excised gel slices corresponded to a size range of ~100 base pairs), cloned, and sequenced.

Acknowledgements

We thank M. Axtell for the 5'-RACE library, R. Rajagopalan for the library of 18-28 nt cDNAs, and Allison Mallory and other Bartel lab members for helpful discussions. This work was supported by grants from the NIH.

Table 1.
Sensitivity of computational identification of plant miRNA loci

Family      Citation   At loci   Os loci

Newly identified families
393                    2/2       1/1
394                    2/2       1/1
395                    6/6       16/19
396                    2/2       3/3
397                    1/2       1/2
398                    3/3       2/2
399                    6/6       10/11

Previously identified conserved families
156         a          12/12     12/12
159/JAW     a,b,c      3/6       7/8
160         a          3/3       6/6
162         a,d        2/2       2/2
164         a          3/3       5/5
166         a          8/9       10/12
167         a,b,d      3/4       9/9
168         a          2/2       1/2
169         a          14/14     15/17
171         a,d        4/4       7/7
172         b          5/5       3/3

Previously identified non-conserved families
158         a          0/2       0
161         a          0/1       0
163         a          0/1       0
173         b          0/1       0

All newly identified and previously known miRNA families are tallied. The number of loci found by de novo computational prediction (Figure 1a, through step 8) is shown (numerator) as a fraction of the total found by searching for near paralogs to miRNAs with verified expression (denominator). Additional details regarding the miRNA loci are reported in Tables S2 and S3 (Arabidopsis and Oryza loci, respectively). Citations for previously identified families: a, Reinhart et al. (2002); b, Park et al. (2002); c, Mette et al. (2002); d, Llave et al. (2002b).

Table 2. Newly identified miRNA gene families in Arabidopsis

miRNA family      miRNA gene   Chr.   Arm   miRNA sequence
393 (PCR,N,R)     MIR393a      2      5'    UCCAAAGGGAUCGCAUUGAUC
                  MIR393b      3      5'    "
394 (PCR,N,R)     MIR394a      1      5'    uUCUUUGGCAUUCUGUCCACC
                  MIR394b      1      5'    "
395 (PCR,N,R)     MIR395a      1      3'    cUGAAGUGUUUGGGGGAACUC
                  MIR395b      1      3'    "
                  MIR395c      1      3'
                  MIR395d      1      3'    cUGAAGUGUUUGGGGGGACUC
                  MIR395e      1      3'    "
                  MIR395f      1      3'
396 (PCR,N,R)     MIR396a      2      5'    UUCCACAGCUUUCUUGAACUG
                  MIR396b      5      5'    UUCCACAGCUUUCUUGAACUU
397 (PCR,R)       MIR397a      4      5'    UCAUUGAGUGCAGCGUUGAUG
                  MIR397b      4      5'    UCAUUGAGUGCAUCGUUGAUG
398 (PCR,N,R)     MIR398a      2      3'    UGUGUUCUCAGGUCACCCCUU
                  MIR398b      5      3'    UGUGUUCUCAGGUCACCCCUG
                  MIR398c      5      3'    "
399 (PCR)         MIR399a      1      3'    UGCCAAAGGAGAUUUGCCCUG
                  MIR399b      1      3'    ccUGCCAAAGGAGAGUUGCCCUG
                  MIR399c      5      3'
                  MIR399d      2      3'    UGCCAAAGGAGAUUUGCCCCG
                  MIR399e      2      3'    UGCCAAAGGAGAUUUGCCUCG
                  MIR399f      2      3'    UGCCAAAGGAGAUUUGCCCGG

Newly identified miRNA families are listed with a summary of experimental validation (PCR, PCR validation of miRNA; N, Northern blot of miRNA; R, 5'-RACE of target mRNA). The chromosome of each locus is indicated (Chr.), as is the arm of the predicted stem-loop that contains the miRNA (Arm). 5' ends of miRNAs were determined from PCR of small cDNAs, and lengths of miRNAs were inferred from mobility on Northern blots. For miRNAs not detected on Northern blots (families 397 and 399), lengths of 21 nt were assumed. For miRNA families for which multiple 5' ends were detected by PCR, nucleotides present in some but not all clones are listed in lower case.

Figure legends

Figure 1. Prediction of conserved plant miRNAs. (A) Outline of the computational approach used to identify conserved plant miRNAs. See text for description. In steps 1-8, the sensitivity is reported (blue) as the fraction of miRNA loci retained with perfect matches to previously identified miRNAs (refset1). In step 9, this fraction extends to imperfect matches to previously identified miRNAs. In the later steps, the total numbers of predicted miRNA loci are also reported (red). (B,C) Predicted hairpin secondary structures of two newly identified miRNA families, 393 (B) and 394 (C), that target mRNAs of F-box proteins. Nucleotides in red comprise the sequence of the most common mature miRNA as deduced from PCR validation and Northern hybridization. Nucleotides in blue indicate additional portions of the hairpins predicted to have miRNA-encoding potential after identification of conserved 20mers in miRNA-like hairpins (Figure 1a, step 6), but before identification of conserved complementarity to mRNAs or experimental evaluation.
For all three MIR393 loci, sequences antisense to the validated miRNA were also identified as potentially miRNA-encoding, but the miRNA* segments were not.

Figure 2. The utility of incorporating evolutionary conservation when predicting plant miRNA targets. (A) Arabidopsis mRNAs with sites complementary to a set of 10 diverse miRNAs conserved between Arabidopsis and Oryza (refset2) were found and scored such that lower scores indicate fewer mismatches (see text for details). The number of mRNAs with each of the indicated scores is graphed (solid bars). Complementary sites were found and scored in the same manner for 5 cohorts of permuted miRNAs with the same dinucleotide composition as the authentic miRNAs (open bars, average number of complementary mRNAs per cohort; error bars, 2 standard deviations). (B) mRNAs complementary to 10 miRNAs were found as in (A), with the additional requirement that at least one homologous Oryza mRNA be complementary to the same miRNA (solid bars). Each conserved miRNA complementary site is counted as having either the Arabidopsis or the Oryza score, whichever is higher (i.e. less complementary). Messenger RNAs with conserved complementarity to cohorts of dinucleotide-shuffled miRNAs were found in the same manner (open bars, average number of complementary mRNAs; error bars, 2 standard deviations).

Figure 3. Expression of newly identified miRNAs. (A) Total RNA (30 µg) from seedlings (S), rosette leaves (L), flowers (F), and roots (R) was analyzed on a Northern blot, successively using radio-labeled DNA probes complementary to newly identified miRNAs. The lengths of 5'-phosphorylated radio-labeled RNA size markers (M) are indicated. As a loading control, the blot was probed for the U6 snRNA. (B) miR395 is induced with low sulfate. Total RNA (40 µg) from 2-week-old Columbia plants grown on modified MS media containing the indicated concentrations of SO₄²⁻ was analyzed by Northern blot, probing for the indicated miRNAs as in (A).
(C) APS1 mRNA decreases in low sulfate. Total RNA (10 µg) from 2-week-old plants grown on modified MS media containing the indicated concentrations of SO₄²⁻ was analyzed by Northern hybridization using randomly primed body-labeled DNA probes corresponding to exon 1 of the APS1 mRNA. Normalized ratios of APS1 mRNA to U6 spliceosomal RNA are indicated.

Figure 4. Experimental verification of predicted miRNA targets. Each top strand (black) depicts a miRNA complementary site, and each bottom strand depicts the miRNA (red). Watson-Crick pairing (vertical dashes) and G:U wobble pairing (circles) are indicated. Arrows indicate the 5' termini of mRNA fragments isolated from plants, as identified by cloned 5'-RACE products, with the frequency of clones shown. Only cloned sequences that matched the correct gene and had 5' ends within a 100 nt window centered on the miRNA complementary site are counted. The miRNA sequence shown corresponds to the most common miRNA suggested by miRNA PCR validation (Table 2). For miR394, the 5' end of a less common variant (1 out of 4 PCR clones) is indicated in lower case and corresponds to the most commonly cloned cleavage product.

Figure 5. Conserved predicted miRNA targets. All predicted miRNA targets with scores of 3.0 or less in both Arabidopsis and Oryza are listed. The score of the best scoring 20mer from any member of the miRNA family to each gene is given in parentheses. Predicted targets that have scores greater than 3.0 in either Arabidopsis or Oryza but have been validated by 5'-RACE are also listed and marked with an asterisk. Genes in red were validated as miRNA targets by 5'-RACE experiments in this work. Genes in blue were validated as miRNA targets by previous work. Additional information on these genes can be found at www.arabidopsis.org. Citations: a, Vazquez et al. (2004); b, Kasschau et al. (2003); c, Palatnik et al. (2003); d, Xie et al. (2003); e, Mallory et al. (2004); f, Tang et al. (2003); g, Emery et al. (2003); h, Vaucheret et al. (2004); i, Llave et al. (2002a); j, Aukerman and Sakai (2003); k, Chen (2003).

Jones-Rhoades and Bartel Figure 1
[Figure 1. Panel A: flowchart of the computational approach (steps 1-9). Panels B and C: predicted hairpin structures for MIR393a (Chr 2, Arabidopsis), MIR393b (Chr 3, Arabidopsis), MIR393 (Contig 4493, Oryza), MIR394a (Chr 1, Arabidopsis), MIR394b (Chr 1, Arabidopsis) and MIR394 (Contig 15318, Oryza).]

Jones-Rhoades and Bartel Figure 2
[Figure 2. Panels A and B: bar charts of the number of complementary mRNAs versus score; x-axis, Score (0 = 20 contiguous complementary nucleotides).]

Jones-Rhoades and Bartel Figure 3
[Figure 3. Panel A: Northern blots of lanes S, L, F, R and M probed for miR393, miR394, miR395, miR396, miR398 and U6. Panel B: Northern blots of plants grown in 0.02, 0.2 and 2.0 mM SO₄²⁻ probed for miR395, miR156, miR159 and U6. Panel C: APS1 and U6 Northern blots at 0.02, 0.2 and 2.0 mM SO₄²⁻; APS1/U6 ratios 0.33, 0.63 and 1.0, respectively.]

Jones-Rhoades and Bartel Figure 4
[Figure 4. Alignments of miRNAs with cleavage sites in their targets: miR393a with TIR1, At1g12820, At3g26810, At4g03190 and At3g23690; miR394a with At1g27340; miR395a with APS4; miR396a with GRL1, GRL2, GRL3, GRL7, GRL8 and GRL9; miR397a with At2g29130, At2g38080 and At5g60020; miR398a with CSD1, CSD2 and At3g15640.]

Jones-Rhoades and Bartel Figure 5

miRNA family, target protein class, and target genes (score); bracketed letters refer to the citations in the Figure 5 legend.

393
  F-box proteins: At1g12820(1), At3g26810(1), At3g62980/TIR1(1.5), At4g03190(2.5)
  bHLH transcription factor: At3g23690(2)*
394
  F-box protein: At1g27340(1)
395
  ATP sulfurylases: At3g22890/APS1(1.5), At4g14680/APS3(1.5), At5g43780/APS4(0.5)
396
  Growth Regulating Factor (GRL) transcription factors: At2g22840/GRL1(3), At2g36400/GRL3(3), At2g45480/GRL9(3), At3g52910/GRL4(3), At4g24150/GRL8(3), At4g37740/GRL2(3), At5g53660/GRL7(3)
  Rhodenase-like protein: At2g40760(2.5)
  Kinesin-like protein B: At4g27180/ATK2(3)
397
  Laccases: At2g29130(0.5), At2g38080(1), At5g60020(1)
  Beta-6 tubulin: At5g12250(3)
398
  Copper superoxide dismutases: At1g08830/CSD1(3), At2g28190/CSD2(3.5)*
  Cytochrome C oxidase subunit V: At3g15640(3)*
399
  Phosphate transporter: At3g54700(2)
156
  Squamosa-promoter Binding Protein (SBP)-like transcription factors: At1g27360/SPL11(1), At1g27370/SPL10(1)[a], At1g53160/SPL4(2), At1g69170/SPL4(1), At2g33810/SPL3(1.5), At2g42200/SPL9(1), At3g15270/SPL5(3), At3g57920(1), At5g43270/SPL2(1)[b], At5g50570(1), At5g50670(1)
159/JAW
  MYB transcription factors: At2g26950/MYB104(1.5), At2g26960/MYB81(2.5), At2g32460/MYB101(1.5), At3g11440/MYB65(1.5)[a,c], At3g60460(1.5), At4g26930/MYB97(2.5), At5g06100/MYB33(1.5)[c], At5g55020/MYB120(2)
  TCP transcription factors: At1g30210/TCP24(2.5)[c], At1g53230/TCP3(3), At2g31070/TCP10(2.5)[c], At3g15030/TCP4(2.5)[c], At4g18390/TCP2(2.5)[c]
160
  Auxin Response Factors (ARF transcription factors): At1g77850/ARF17(0.5)[b], At2g28350/ARF10(1)[b], At4g30080/ARF16(1.5)
162
  DICER-LIKE 1: At1g01040/DCL1(2)[d]
164
  NAC domain transcription factors: At1g56010/NAC1(1)[e], At3g15170/CUC1(1)[b,e], At5g07680(1.5)[e], At5g39610(2), At5g53950/CUC2()[b], At5g61430(1.5)[e]
166
  HD-Zip transcription factors: At1g30490/PHV(1.5)[f], At1g52150/ATHB-15(1.5), At2g34710/PHB(1.5)[f], At4g32880/ATHB-8(1.5), At5g60690/REV(1.5)[g]
167
  Auxin Response Factors (ARF transcription factors): At1g30330/ARF6(2), At5g37020/ARF8(2)[a]
168
  ARGONAUTE: At1g48410/AGO1(2.5)[a,h]
169
  CCAAT Binding Factor (CBF) HAP2-like transcription factors: At1g17590(1.5), At1g54160(2), At1g72830(1.5), At3g05690(1.5), At3g20910(2), At5g06510(1.5), At5g12840(1.5)
171
  SCARECROW-like transcription factors: At2g45160(0), At3g60630(0)[a,i], At4g00150/SCL6(0)
172
  APETALA2-like transcription factors: At2g28550/TOE1(1.5)[b,j], At2g39250(1), At4g36920/AP2(0.5)[b,k], At5g60120/TOE2(0.5)[b,j], At5g67180/TOE3(1.5)[b]

Ambros, V., Lee, R. C., Lavanway, A., Williams, P. T., and Jewell, D. (2003). MicroRNAs and other tiny endogenous RNAs in C. elegans. Curr Biol 13, 807-818.
Aukerman, M. J., and Sakai, H. (2003). Regulation of flowering time and floral organ identity by a MicroRNA and its APETALA2-like target genes. Plant Cell 15, 2730-2741.
Bartel, B., and Bartel, D. P. (2003). MicroRNAs: at the root of plant development? Plant Physiol 132, 709-717.
Bartel, D. P. (2004). MicroRNAs: genomics, biogenesis, mechanism, and function. Cell 116, 281-297.
Chen, X. (2004). A MicroRNA as a Translational Repressor of APETALA2 in Arabidopsis Flower Development. Science 303, 2022-2025.
Elbashir, S. M., Lendeckel, W., and Tuschl, T. (2001). RNA interference is mediated by 21- and 22-nucleotide RNAs. Genes Dev 15, 188-200.
Emery, J. F., Floyd, S. K., Alvarez, J., Eshed, Y., Hawker, N. P., Izhaki, A., Baum, S. F., and Bowman, J. L. (2003). Radial patterning of Arabidopsis shoots by class III HD-ZIP and KANADI genes. Curr Biol 13, 1768-1774.
Enright, A. J., John, B., Gaul, U., Tuschl, T., Sander, C., and Marks, D. S. (2003). MicroRNA targets in Drosophila. Genome Biol 5, R1.
Floyd, S. K., and Bowman, J. L. (2004). Gene regulation: ancient microRNA target sequences in plants. Nature 428, 485-486.
Gagne, J. M., Downes, B. P., Shiu, S. H., Durski, A. M., and Vierstra, R. D. (2002). The F-box subunit of the SCF E3 complex is encoded by a diverse superfamily of genes in Arabidopsis. Proc Natl Acad Sci U S A 99, 11519-11524.
Grad, Y., Aach, J., Hayes, G. D., Reinhart, B.
J., Church, G. M., Ruvkun, G., and Kim, J. (2003). Computational and experimental identification of C. elegans microRNAs. Mol Cell 11, 1253-1263.
Gray, W. M., Kepinski, S., Rouse, D., Leyser, O., and Estelle, M. (2001). Auxin regulates SCF(TIR1)-dependent degradation of AUX/IAA proteins. Nature 414, 271-276.
Grishok, A., Pasquinelli, A. E., Conte, D., Li, N., Parrish, S., Ha, I., Baillie, D. L., Fire, A., Ruvkun, G., and Mello, C. C. (2001). Genes and mechanisms related to RNA interference regulate expression of the small temporal RNAs that control C. elegans developmental timing. Cell 106, 23-34.
Hammond, S. M., Bernstein, E., Beach, D., and Hannon, G. J. (2000). An RNA-directed nuclease mediates post-transcriptional gene silencing in Drosophila cells. Nature 404, 293-296.
Hutvagner, G., McLachlan, J., Pasquinelli, A. E., Balint, E., Tuschl, T., and Zamore, P. D. (2001). A cellular function for the RNA-interference enzyme Dicer in the maturation of the let-7 small temporal RNA. Science 293, 834-838.
Hutvagner, G., and Zamore, P. D. (2002). A microRNA in a multiple-turnover RNAi enzyme complex. Science 297, 2056-2060.
Juarez, M. T., Kui, J. S., Thomas, J., Heller, B. A., and Timmermans, M. C. (2004). microRNA-mediated repression of rolled leaf1 specifies maize leaf polarity. Nature 428, 84-88.
Kasschau, K. D., Xie, Z., Allen, E., Llave, C., Chapman, E. J., Krizan, K. A., and Carrington, J. C. (2003). P1/HC-Pro, a viral suppressor of RNA silencing, interferes with Arabidopsis development and miRNA function. Dev Cell 4, 205-217.
Ketting, R. F., Fischer, S. E., Bernstein, E., Sijen, T., Hannon, G. J., and Plasterk, R. H. (2001). Dicer functions in RNA interference and in synthesis of small RNA involved in developmental timing in C. elegans. Genes Dev 15, 2654-2659.
Khvorova, A., Reynolds, A., and Jayasena, S. D. (2003). Functional siRNAs and miRNAs exhibit strand bias. Cell 115, 209-216.
Kidner, C. A., and Martienssen, R. A. (2003).
Macro effects of microRNAs in plants. Trends Genet 19, 13-16.
Kim, J. H., Choi, D., and Kende, H. (2003). The AtGRF family of putative transcription factors is involved in leaf and cotyledon growth in Arabidopsis. Plant J 36, 94-104.
Kliebenstein, D. J., Monde, R. A., and Last, R. L. (1998). Superoxide dismutase in Arabidopsis: an eclectic enzyme family with disparate regulation and protein localization. Plant Physiol 118, 637-650.
Lagos-Quintana, M., Rauhut, R., Lendeckel, W., and Tuschl, T. (2001). Identification of novel genes coding for small expressed RNAs. Science 294, 853-858.
Lai, E. C. (2003). microRNAs: runts of the genome assert themselves. Curr Biol 13, R925-936.
Lai, E. C., Tomancak, P., Williams, R. W., and Rubin, G. M. (2003). Computational identification of Drosophila microRNA genes. Genome Biol 4, R42.
Lappartient, A. G., Vidmar, J. J., Leustek, T., Glass, A. D., and Touraine, B. (1999). Inter-organ signaling in plants: regulation of ATP sulfurylase and sulfate transporter genes expression in roots mediated by phloem-translocated compound. Plant J 18, 89-95.
Lau, N. C., Lim, L. P., Weinstein, E. G., and Bartel, D. P. (2001). An abundant class of tiny RNAs with probable regulatory roles in Caenorhabditis elegans. Science 294, 858-862.
Lee, R. C., and Ambros, V. (2001). An extensive class of small RNAs in Caenorhabditis elegans. Science 294, 862-864.
Lee, Y., Ahn, C., Han, J., Choi, H., Kim, J., Yim, J., Lee, J., Provost, P., Radmark, O., Kim, S., and Kim, V. N. (2003). The nuclear RNase III Drosha initiates microRNA processing. Nature 425, 415-419.
Lee, Y., Jeon, K., Lee, J. T., Kim, S., and Kim, V. N. (2002). MicroRNA maturation: stepwise processing and subcellular localization. Embo J 21, 4663-4670.
Leustek, T. (2002). Sulfate Metabolism. The Arabidopsis Book, 1-16.
Lewis, B. P., Shih, I. H., Jones-Rhoades, M. W., Bartel, D. P., and Burge, C. B. (2003). Prediction of mammalian microRNA targets. Cell 115, 787-798.
Lim, L. P., Glasner, M.
E., Yekta, S., Burge, C. B., and Bartel, D. P. (2003a). Vertebrate microRNA genes. Science 299, 1540.
Lim, L. P., Lau, N. C., Weinstein, E. G., Abdelhakim, A., Yekta, S., Rhoades, M. W., Burge, C. B., and Bartel, D. P. (2003b). The microRNAs of Caenorhabditis elegans. Genes Dev 17, 991-1008.
Llave, C., Kasschau, K. D., Rector, M. A., and Carrington, J. C. (2002a). Endogenous and silencing-associated small RNAs in plants. Plant Cell 14, 1605-1619.
Llave, C., Xie, Z., Kasschau, K. D., and Carrington, J. C. (2002b). Cleavage of Scarecrow-like mRNA targets directed by a class of Arabidopsis miRNA. Science 297, 2053-2056.
Mallory, A. C., Dugas, D. V., Bartel, D. P., and Bartel, B. (2004). MicroRNA regulation of NAC-domain targets is required for proper formation and separation of adjacent embryonic, vegetative, and floral organs. Curr Biol In press.
Mallory, A. C., Ely, L., Smith, T. H., Marathe, R., Anandalakshmi, R., Fagard, M., Vaucheret, H., Pruss, G., Bowman, L., and Vance, V. B. (2001). HC-Pro suppression of transgene silencing eliminates the small RNAs but not transgene methylation or the mobile signal. Plant Cell 13, 571-583.
Maruyama-Nakashita, A., Inoue, E., Watanabe-Takahashi, A., Yamaya, T., and Takahashi, H. (2003). Transcriptome profiling of sulfur-responsive genes in Arabidopsis reveals global effects of sulfur nutrition on multiple metabolic pathways. Plant Physiol 132, 597-605.
Mayer, A. M., and Staples, R. C. (2002). Laccase: new functions for an old enzyme. Phytochemistry 60, 551-565.
McConnell, J. R., Emery, J., Eshed, Y., Bao, N., Bowman, J., and Barton, M. K. (2001). Role of PHABULOSA and PHAVOLUTA in determining radial patterning in shoots. Nature 411, 709-713.
Mette, M. F., van der Winden, J., Matzke, M., and Matzke, A. J. (2002). Short RNAs can identify new candidate transposable element families in Arabidopsis. Plant Physiol 130, 6-9.
Mourelatos, Z., Dostie, J., Paushkin, S., Sharma, A., Charroux, B., Abel, L., Rappsilber, J., Mann, M., and Dreyfuss, G. (2002). miRNPs: a novel class of ribonucleoproteins containing numerous microRNAs. Genes Dev 16, 720-728.
Palatnik, J. F., Allen, E., Wu, X., Schommer, C., Schwab, R., Carrington, J. C., and Weigel, D. (2003). Control of leaf morphogenesis by microRNAs. Nature 425, 257-263.
Park, W., Li, J., Song, R., Messing, J., and Chen, X. (2002). CARPEL FACTORY, a Dicer homolog, and HEN1, a novel protein, act in microRNA metabolism in Arabidopsis thaliana. Curr Biol 12, 1484-1495.
Reinhart, B. J., Weinstein, E. G., Rhoades, M. W., Bartel, B., and Bartel, D. P. (2002). MicroRNAs in plants. Genes Dev 16, 1616-1626.
Rhoades, M. W., Reinhart, B. J., Lim, L. P., Burge, C. B., Bartel, B., and Bartel, D. P. (2002). Prediction of plant microRNA targets. Cell 110, 513-520.
Ruegger, M., Dewey, E., Gray, W. M., Hobbie, L., Turner, J., and Estelle, M. (1998). The TIR1 protein of Arabidopsis functions in auxin response and is related to human SKP2 and yeast Grr1p. Genes Dev 12, 198-207.
Samach, A., Klenz, J. E., Kohalmi, S. E., Risseeuw, E., Haughn, G. W., and Crosby, W. L. (1999). The UNUSUAL FLORAL ORGANS gene of Arabidopsis thaliana is an F-box protein required for normal patterning and growth in the floral meristem. Plant J 20, 433-445.
Schwarz, D. S., Hutvagner, G., Du, T., Xu, Z., Aronin, N., and Zamore, P. D. (2003). Asymmetry in the assembly of the RNAi enzyme complex. Cell 115, 199-208.
Stark, A., Brennecke, J., Russell, R. B., and Cohen, S. M. (2003). Identification of Drosophila MicroRNA Targets. PLoS Biol 1, E60.
Takahashi, H., Yamazaki, M., Sasakura, N., Watanabe, A., Leustek, T., Engler, J. A., Engler, G., Van Montagu, M., and Saito, K. (1997). Regulation of sulfur assimilation in higher plants: a sulfate transporter induced in sulfate-starved roots plays a central role in Arabidopsis thaliana. Proc Natl Acad Sci U S A 94, 11102-11107.
Tang, G., Reinhart, B. J., Bartel, D. P., and Zamore, P. D. (2003). A biochemical framework for RNA silencing in plants. Genes Dev 17, 49-63.
Ulmasov, T., Hagen, G., and Guilfoyle, T. J. (1999). Dimerization and DNA binding of auxin response factors. Plant J 19, 309-319.
Vance, V. B. (1991). Replication of potato virus X RNA is altered in coinfections with potato virus Y. Virology 182, 486-494.
Vaucheret, H., Vazquez, F., Crete, P., and Bartel, D. P. (2004). The action of ARGONAUTE1 in the miRNA pathway and its regulation by the miRNA pathway are crucial for plant development. Genes Dev In Press.
Vazquez, F., Gasciolli, V., Crete, P., and Vaucheret, H. (2004). The nuclear dsRNA binding protein HYL1 is required for microRNA accumulation and plant development, but not posttranscriptional transgene silencing. Curr Biol 14, 346-351.
Vierstra, R. D. (2003). The ubiquitin/26S proteasome pathway, the complex last chapter in the life of many plant proteins. Trends Plant Sci 8, 135-142.
Wilkinson, M. D., and Haughn, G. W. (1995). UNUSUAL FLORAL ORGANS Controls Meristem Identity and Organ Primordia Fate in Arabidopsis. Plant Cell 7, 1485-1499.
Xie, Q., Frugis, G., Colgan, D., and Chua, N. H. (2000). Arabidopsis NAC1 transduces auxin signal downstream of TIR1 to promote lateral root development. Genes Dev 14, 3024-3036.
Xie, Z., Johansen, L. K., Gustafson, A. M., Kasschau, K. D., Lellis, A. D., Zilberman, D., Jacobsen, S. E., and Carrington, J. C. (2004). Genetic and Functional Diversification of Small RNA Pathways in Plants. PLoS Biol 2, E104.
Xie, Z., Kasschau, K. D., and Carrington, J. C. (2003). Negative feedback regulation of Dicer-Like1 in Arabidopsis by microRNA-guided mRNA degradation. Curr Biol 13, 784-789.
Yekta, S., Shih, I. H., and Bartel, D. P. (2004). MicroRNA-directed cleavage of HOXB8 mRNA. Science In press.

MicroRNA-mediated regulation of an F-box gene is required for embryonic, floral, and vegetative development

Matthew W. Jones-Rhoades1 and David P.
Bartel1,2

1Whitehead Institute for Biomedical Research and Department of Biology, Massachusetts Institute of Technology, 9 Cambridge Center, Cambridge, Massachusetts 02142
2Correspondence: dbartel@wi.mit.edu

Abstract

MicroRNAs are endogenous ~21 nt RNAs that function as post-transcriptional regulators in both plants and animals. miR394 and conserved miR394-complementary sites in F-box mRNAs were previously identified in a bioinformatic screen for unknown miRNAs. Here we show that miR394-mediated regulation of the F-box gene At1g27340 is required at multiple stages of Arabidopsis development. Transgenic plants expressing a miR394-resistant version of At1g27340 display a range of developmental abnormalities, including radialized and fused cotyledons, absent shoot apical meristems, curled and radialized leaves, and abortive flowers. The severity of these abnormalities correlates with the overaccumulation of At1g27340 mRNA, suggesting that an SCF(At1g27340) complex ubiquitinates an activator of class III HD-ZIP function.

Introduction

MicroRNAs (miRNAs) are endogenous ~21-nucleotide non-coding RNAs that regulate gene expression in both plants and animals (reviewed in (Bartel, 2004)). Initially expressed as single-stranded stem-loop precursor RNAs, miRNAs require the RNase III enzyme DICER-LIKE1 (DCL1), as well as HEN1, HYL1, HST, and AGO1, for proper processing and accumulation (Park et al., 2002; Reinhart et al., 2002; Boutet et al., 2003; Han et al., 2004; Vaucheret et al., 2004; Vazquez et al., 2004; Park et al., 2005). Many miRNAs isolated from Arabidopsis are conserved in Oryza (rice) and other plant species (Reinhart et al., 2002; Floyd and Bowman, 2004; Jones-Rhoades and Bartel, 2004; Sunkar and Zhu, 2004; Axtell and Bartel, 2005), suggesting that miRNAs have evolutionarily conserved roles in land plants.
Regulatory targets have been confidently predicted for most Arabidopsis miRNAs based on the high degree of complementarity between the miRNAs and their target mRNAs (Rhoades et al., 2002; Jones-Rhoades and Bartel, 2004). Like the miRNAs themselves, many of these miRNA target sites are broadly conserved in plant species (Rhoades et al., 2002; Jones-Rhoades and Bartel, 2004). Perhaps because of the extensive complementarity of plant miRNA-target duplexes, most Arabidopsis miRNAs guide the cleavage of target mRNAs (Llave et al., 2002; Kasschau et al., 2003; Tang et al., 2003).

Several lines of evidence indicate that plant miRNAs play key roles in a broad range of developmental processes. Plants with dcl1, hen1, ago1, hst, or hyl1 mutations have severe and pleiotropic developmental abnormalities (Bohmert et al., 1998; Telfer and Poethig, 1998; Lu and Fedoroff, 2000; Chen et al., 2002; Morel et al., 2002; Schauer et al., 2002) that correlate with the impairment of miRNA activity (Park et al., 2002; Reinhart et al., 2002; Boutet et al., 2003; Han et al., 2004; Vaucheret et al., 2004; Vazquez et al., 2004; Park et al., 2005), as do plants that express certain viral suppressors of RNA-mediated silencing (Mallory et al., 2002; Kasschau et al., 2003; Chapman et al., 2004; Chen et al., 2004; Dunoyer et al., 2004). The majority of confirmed and predicted evolutionarily conserved miRNA targets are mRNAs that encode transcription factors and other regulatory proteins, such as F-box proteins and components of the miRNA pathway itself (Jones-Rhoades and Bartel, 2004). Plants with impaired miRNA-mediated regulation of particular transcription factor mRNAs have been shown to have various developmental phenotypes.
For example, plants expressing miR166-resistant versions of the HD-ZIP genes PHABULOSA, PHAVOLUTA, and REVOLUTA have radialized leaves or vasculature (Emery et al., 2003; Kidner and Martienssen, 2004; Mallory et al., 2004b; Zhong and Ye, 2004), and miRNA-resistant copies of certain TCP or ARF transcription factors result in seedlings that arrest or that have extra cotyledons, respectively (Palatnik et al., 2003; Mallory et al., 2005).

A recent bioinformatic screen for conserved plant miRNAs and targets identified two miRNA families that guide the cleavage of mRNAs that encode F-box proteins (Jones-Rhoades and Bartel, 2004). F-box proteins are specificity determinants of SCF E3 ubiquitin ligases, which facilitate the transfer of ubiquitin from E2 ubiquitin-conjugating proteins to specific target proteins, thereby marking them for degradation by the 26S proteasome (reviewed in (Deshaies, 1999; Smalle and Vierstra, 2004)). In Saccharomyces cerevisiae, SCF complexes are composed of four primary subunits: Cullin1, Rbx1, and Skp1 are thought to comprise the core ubiquitin ligase activity, and an F-box protein is thought to serve as a bridge between the SCF complex and the target protein (Deshaies, 1999; Smalle and Vierstra, 2004). The ~60 amino acid N-terminal F-box domain interacts with the rest of the SCF complex, and the C-terminal portion, which is highly divergent between different F-box proteins, is thought to interact with the target protein and thus confer specificity of ubiquitination (Zheng et al., 2002; Willems et al., 2004).

The Arabidopsis genome contains nearly 700 F-box proteins (Gagne et al., 2002), several of which have been shown to be important for diverse aspects of plant biology such as hormone signaling, response to the environment, and developmental patterning.
TRANSPORT INHIBITOR RESPONSE1 (TIR1) targets AUX/IAA proteins for degradation in an auxin-dependent manner, and is needed for auxin-induced developmental processes (Ruegger et al., 1998; Gray et al., 2001). The F-box proteins EBF1/EBF2, GID2, and COI1 mediate ethylene, gibberellin, and jasmonate signaling, respectively (Xie et al., 1998; Guo and Ecker, 2003; Potuschak et al., 2003; Sasaki et al., 2003; Gagne et al., 2004). UNUSUAL FLORAL ORGANS (UFO) is required for proper floral development (Wilkinson and Haughn, 1995; Samach et al., 1999), and ORE1 regulates leaf senescence and axillary shoot growth (Woo et al., 2001; Stirnberg et al., 2002). However, the majority of Arabidopsis F-box genes have no known function. It is likely that many of these F-box proteins (or subclades of F-box proteins) each target specific proteins for ubiquitination and proteolysis.

Here we show that miR394-mediated regulation of At1g27340, an F-box gene related to UFO, is required for proper development. Seedlings expressing 5mAt1g27340, a miR394-resistant version of At1g27340, frequently arrest without forming shoot apical meristems (SAMs), often with fused and/or radialized cotyledons. 5mAt1g27340-expressing plants that do form SAMs have pleiotropic defects in vegetative and floral development, including downwardly curled leaves and abortive flowers. These developmental abnormalities correlate with the overaccumulation of At1g27340 mRNA, suggesting that they are the result of overexpression of an At1g27340-directed SCF ubiquitin ligase.

Results

At1g27340 defines a conserved class of miR394-regulated F-box genes with homology to UFO

In a phylogenetic tree of 694 F-box genes, At1g27340 falls in a subclade of five genes that contains UFO (Gagne et al., 2002). Although At1g27340 is the second best blastp hit to UFO in the Arabidopsis genome (E value 6.7 × 10^-21), the two proteins have only ~30% similarity at the amino acid level.
UFO is unlikely to be regulated by miR394; whereas miR394 can pair to At1g27340 with 19 out of 20 nucleotides, only 12 out of 20 miR394 nucleotides can pair to the corresponding section of UFO (Figure 1a). Although the similarity between At1g27340 and UFO is limited, F-box genes in other plant species are highly similar to At1g27340 and contain conserved miR394-complementary sites. Two Populus and one Oryza F-box proteins have At1g27340 as their best Arabidopsis blastp hits (Figure 1a). All three of these proteins are at least 75% similar to At1g27340 at the amino acid level, including extensive identity in the C-terminal region that is likely to specify substrate recognition, and miR394 can pair to the mRNA encoding each with 0-1 unpaired nucleotides (Figure 1a). In addition to these At1g27340-like genes in plants with sequenced genomes, numerous plant species have ESTs that (a) have At1g27340 as their best Arabidopsis blastx hit and (b) can pair to miR394 with 0-1 mismatches (Figure 1). These miR394-complementary, At1g27340-like ESTs are found in both monocots and dicots, as well as in conifers (genus Picea). This conservation implies that the divergence of At1g27340 from UFO and the regulation of At1g27340-like genes by miR394 predate the divergence of gymnosperms and angiosperms.

miR394 regulation of At1g27340 is required for normal development

In vivo miR394-directed cleavage of At1g27340 can be detected by 5' RACE (Jones-Rhoades and Bartel, 2004). In order to investigate the biological significance of miR394-mediated regulation, we constructed a mutant version of At1g27340 with reduced complementarity to miR394. This 5mAt1g27340 construct encodes the same amino acid sequence as At1g27340, but has five silent mutations within the miR394-complementary site, and contains 1.6 kb of putative promoter sequence upstream of At1g27340 (Figure 1b).
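The pairing counts quoted above (for example, 19 of 20 nucleotides) come from ungapped, antiparallel alignments of the miRNA against a candidate site. The following is a minimal sketch of that kind of count, not the authors' pipeline. It makes two assumptions that should be read as such: the miRNA is taken to be the RNA reverse complement of the 21-nt miR394 northern probe listed in Experimental Procedures (the text describes miR394 as 20 nt, so the exact ends are uncertain), and the mutated site is taken to be the 21-nt window of the second mutagenesis primer that lines up with the probe. The function names are illustrative only.

```python
# Sketch: count pair types in an ungapped miRNA:target alignment.
# Sequences come from the oligos listed in Experimental Procedures;
# treating them as the miRNA and the mutated site is an assumption.

WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def reverse_complement_rna(dna):
    """Return the RNA reverse complement of a DNA oligo (5'->3')."""
    complement = {"A": "U", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(dna.upper()))


def score_duplex(mirna, site):
    """Classify every position of an antiparallel, ungapped duplex.

    Both sequences are RNA written 5'->3' and must be the same length.
    """
    if len(mirna) != len(site):
        raise ValueError("ungapped alignment requires equal lengths")
    counts = {"watson_crick": 0, "wobble": 0, "mismatch": 0}
    # The miRNA 5' end faces the 3' end of the site, hence reversed().
    for pair in zip(mirna, reversed(site)):
        if pair in WATSON_CRICK:
            counts["watson_crick"] += 1
        elif pair in WOBBLE:
            counts["wobble"] += 1
        else:
            counts["mismatch"] += 1
    return counts


PROBE = "AGGAGGTGGACAGAATGCCAA"  # miR394 northern probe (DNA oligo)
MUTAGENESIS_PRIMER = "ACACTGTTGTGGAAGGAAGTTGATCGCATGCCGAACATATGGTGC"

mirna = reverse_complement_rna(PROBE)       # probe-complementary RNA
perfect_site = PROBE.replace("T", "U")      # fully paired reference site
# Assumed window: the 21 nt of the primer that align with the probe.
mutated_site = MUTAGENESIS_PRIMER[13:34].replace("T", "U")

print(score_duplex(mirna, perfect_site))
print(score_duplex(mirna, mutated_site))
```

Under these assumptions the reference site pairs at all 21 positions, whereas the mutated window retains 15 Watson-Crick pairs, with 2 G:U wobbles and 4 mismatches, which illustrates how a handful of silent substitutions can sharply reduce complementarity.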
We transformed Arabidopsis thaliana separately with this 5mAt1g27340 construct and with an unmutated At1g27340 control construct. Only 1 out of 91 control At1g27340 primary transformants (~1%) had any developmental abnormalities (small outgrowths from the midveins of a few cauline leaves on one plant). In contrast, 65 of 105 5mAt1g27340 primary transformants (62%) displayed various vegetative and floral phenotypes (Table 1). Most noticeably, 51 5mAt1g27340 transformants (49%) had moderately to severely downwardly curled rosette leaves (Figure 2a,b). Fifty-one 5mAt1g27340 transformants (49%) also had cauline leaf abnormalities. Most commonly, cauline leaves had a spiked outgrowth protruding from the abaxial midvein (Figure 2b,c2). In other cases, the entire cauline leaf was replaced by a radialized, spiked structure (Figure 2c3-6). In some cases, these radialized cauline leaves subtended approximately wild-type axillary inflorescences (Figure 2c6), whereas in other cases radialized cauline leaves subtended axillary inflorescences that themselves produced aberrant cauline leaves and flowers (Figure 2c3,4). The number and severity of abnormal cauline leaves generally correlated with the extent of rosette leaf curling.

5mAt1g27340 transformants also exhibited various floral abnormalities. Most of the flowers that were produced had the expected numbers of organs and were fertile, although flowers on 13 of the plants (12%) with stronger phenotypes were generally missing 1-4 petals and had reduced fertility. Twenty-seven 5mAt1g27340 transformants (26%) sporadically produced abortive flowers that consisted of only a filamentous structure in some cases (Figure 2d inset, Figure 2e), whereas in other cases flowers consisted of two sepals without any other floral organs (Figure 2e). The percentage of abortive flowers produced per plant varied from ~1% to ~40%.
In most cases, a single inflorescence would alternate between producing fertile and abortive flowers in a seemingly stochastic pattern (Figure 2d,e). In extreme cases, inflorescences of 5mAt1g27340 plants produced a proliferation of determinate filaments in place of floral buds (Figure 2e). Shoots of 5mAt1g27340-expressing plants often had a seemingly stochastic phyllotaxy of maturing siliques, with the locations of the missing siliques marked by abortive filaments or empty flowers (Figure 2d).

Approximately 10% of 5mAt1g27340 T1 transformants failed to develop a SAM and never formed any true leaves. Analysis of T2 seeds for several 5mAt1g27340 lines revealed that seedling arrest occurred in 0-55% of T2 seedlings, with the percentage of arrested seedlings correlating with the severity of the T1 phenotype, whereas the remainder of Basta-resistant seedlings did form SAMs and recapitulated the vegetative and floral abnormalities observed in their T1 parents (Table 2). The arrested seedlings displayed a range of different phenotypes (Figure 3a,b). Some seedlings had only one cotyledon (Figure 3b1,4), whereas others had two (Figure 3b2,3). In some cases the cotyledons were radialized (Figure 3b1,2), whereas in other cases seedlings had cotyledons that approached wild-type size and shape, but did not form functional SAMs (Figure 3b3,4). In some of these cases, one or two determinate, spike-like structures eventually emerged from the region where the SAM should have been (data not shown).

5mAt1g27340 plants overaccumulate At1g27340 mRNA

Many plant miRNAs guide the cleavage of target mRNAs. Because of this, mRNAs targeted by miRNAs generally overaccumulate in plants impaired in miRNA function, and the expression of a miRNA-resistant version of a miRNA target can similarly result in overaccumulation of the miRNA-resistant mRNA.
We find that this is the case with 5mAt1g27340-expressing plants; normalized At1g27340 mRNA levels are 1.7-, 2.8-, and 2.2-fold higher in leaves, inflorescences, and seedlings, respectively, in T2 5mAt1g27340 plants compared to control At1g27340 plants (Figure 3c). At1g27340 mRNA levels are highest in 5mAt1g27340 T2 seedlings that lack SAMs; these arrested seedlings accumulate At1g27340 transcripts at levels 1.8-fold higher than 5mAt1g27340 seedlings with functional SAMs and 3.9-fold higher than control At1g27340 seedlings.

Discussion

We find that expression of a miR394-resistant version of At1g27340 has broad-ranging effects on Arabidopsis development, whereas expression of an additional wild-type copy does not. These results confirm the biological relevance of the interaction between miR394 and At1g27340, and represent the first insights into the roles of miRNA-mediated regulation of F-box genes. Our finding that At1g27340 mRNA levels are increased in plants expressing 5mAt1g27340 is consistent with the idea that miR394 exerts its influence over At1g27340 primarily through guided RNA cleavage. Indeed, the extent of developmental abnormalities correlates with the level of At1g27340 mRNA, in that At1g27340 transcript levels are highest in seedlings that fail to develop shoot apical meristems.

The Arabidopsis shoot apical meristem is a small group of pluripotent cells that gives rise to all aerial tissues and organs (reviewed in (Baurle and Laux, 2003)). The proper initiation and maintenance of SAM pluripotency requires a complex interplay of gene interactions, and is critical to all stages of vegetative and floral development. SHOOTMERISTEMLESS (STM) and WUSCHEL (WUS), which encode homeodomain transcription factors, act in parallel to initiate and maintain meristem identity (Endrizzi et al., 1996; Laux et al., 1996; Long et al., 1996; Mayer et al., 1998).
The embryonic expression of STM, and hence the embryonic establishment of SAM identity, is dependent on the proper development of the cotyledons. Embryos with double homozygous mutations in the NAC domain transcription factors CUP-SHAPED COTYLEDONS1 (CUC1) and CUC2 have fused cotyledons and fail to initiate STM expression during embryogenesis (Aida et al., 1997; Aida et al., 1999). Similarly, the correct balance between the antagonistic activities of class III HD-ZIP and KANADI transcription factors is essential for proper cotyledon development and SAM formation. Seedlings that are either homozygous for loss-of-function mutations in three partially redundant HD-ZIP genes (phb/phv/rev), or that overexpress KANADI genes, have one or two radialized cotyledons and fail to initiate SAMs (Eshed et al., 2001; Kerstetter et al., 2001; Emery et al., 2003).

Our results establish that both MIR394 and At1g27340 are also important regulators of meristem identity. MIR394 is expressed highly in inflorescences (Jones-Rhoades and Bartel, 2004), and the relative increase of At1g27340 mRNA in 5mAt1g27340 plants was greatest in inflorescences. This increase in At1g27340 mRNA levels is likely associated with the overexpression of At1g27340 protein and/or the accumulation of At1g27340 protein in cells in which miR394 would block wild-type At1g27340 expression. If At1g27340 functions in SCF E3 ubiquitin ligases as do other F-box proteins, then the observed 5mAt1g27340 phenotypes are likely to be the result of increased ubiquitination and proteolysis of unknown factors targeted by the putative SCF(At1g27340) ubiquitin ligase. Because many 5mAt1g27340 seedlings have abnormal cotyledons, the target of SCF(At1g27340) is likely to be upstream of STM, which is dispensable for cotyledon development (Endrizzi et al., 1996; Long et al., 1996).
The 5mAt1g27340 seedlings with one or two fused cotyledons are reminiscent of homozygous phb/phv/rev triple mutants and KANADI overexpressors (Eshed et al., 2001; Kerstetter et al., 2001; Emery et al., 2003), suggesting that the targets of SCF(At1g27340) may be activators of HD-ZIP activity or repressors of KANADI function. Because these genes are also important for the proper initiation and patterning of lateral organs and meristems post-embryonically, the vegetative and floral phenotypes observed in 5mAt1g27340 plants might also be related to a misregulation of HD-ZIP or KANADI activities. Indeed, plants homozygous for loss-of-function alleles of multiple class III HD-ZIP genes sporadically initiate abortive flowers (Prigge et al., 2005) in a manner reminiscent of 5mAt1g27340-expressing plants.

Experimental Procedures

DNA constructs and transgenic plants

BAC clone F17L21 was digested with SpeI and NsiI to yield a 5.1 kb fragment containing At1g27340, as well as 1.6 kb of upstream sequence and 0.9 kb of downstream sequence, which was ligated into SpeI- and PstI-cut pBluescriptIISK+ (Stratagene). Site-directed mutagenesis was performed by PCR with PfuUltra polymerase and the primers GCACCATATGTTCGGCATGCGATCAACTTCCTTCCACAACAGTGT and ACACTGTTGTGGAAGGAAGTTGATCGCATGCCGAACATATGGTGC, followed by DpnI digestion. Following mutagenesis, a 2.5 kb HindIII-BamHI fragment of the original At1g27340 clone was replaced with the corresponding fragment containing the mutagenized miR394-complementary site, which was sequenced to ensure that no additional mutations had occurred during PCR. Wild-type and mutant At1g27340 5.1 kb SpeI-HindIII fragments were subcloned into the binary vector pGreenII0229, and electroporated into Agrobacterium tumefaciens strain GV3101::pMP90.
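PCR-based site-directed mutagenesis followed by DpnI digestion typically uses a pair of primers that anneal to opposite strands of the same site. As a quick sanity check on the two primer sequences given above, the sketch below tests whether they are exact reverse complements of one another. The helper names are illustrative and not part of any published protocol.

```python
# Sketch: check that two mutagenesis primers are reverse complements.
# Primer sequences are copied from the Experimental Procedures text.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the DNA reverse complement of a 5'->3' sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def are_reverse_complements(primer_a, primer_b):
    """True if primer_b is the exact reverse complement of primer_a."""
    return reverse_complement(primer_a) == primer_b.upper()


PRIMER_1 = "GCACCATATGTTCGGCATGCGATCAACTTCCTTCCACAACAGTGT"
PRIMER_2 = "ACACTGTTGTGGAAGGAAGTTGATCGCATGCCGAACATATGGTGC"

print(len(PRIMER_1), len(PRIMER_2))
print(are_reverse_complements(PRIMER_1, PRIMER_2))
```

Both primers are 45 nt long and the check returns True, so the two oligos as printed are mutually consistent.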
Arabidopsis thaliana (Columbia accession) was transformed by the floral dip method (Clough and Bent, 1998), and the collected seeds were surface sterilized and plated on Bouterage No.2 media (Duchefa Biochemie) containing 10 µg/ml Basta. Seedlings were grown under long-day conditions (20 °C, 16 hr light, 8 hr dark) for about 10 days before transfer to soil consisting of 50% promix (Premier Horticulture) and 50% redi-earth (Scotts).

RNA Isolation and Northern blot analysis

Total RNA was isolated as described (Mallory et al., 2001). For mRNA northerns, 12 µg of total RNA was size fractionated on a 1% agarose/formaldehyde gel and transferred to a nitrocellulose membrane as described (Mallory et al., 2005). A 1.4 kb portion of exon 2 of At1g27340 was PCR amplified with primers AGTCTCTAGAATGGTGTTGCCCTGTATTGAGGA and CAGTAAGCTTAAGAGGTTCCACACAACCCA, and directionally cloned into pBluescriptIISK+ (Stratagene). Following XbaI digestion, this template was used to generate At1g27340 antisense RNA probe by T7 transcription in the presence of α-32P UTP. Blots were hybridized at 68 °C in Ultrahyb buffer overnight, and washed successively with 2X SSC, 0.1% SDS (two times) and 0.1X SSC, 0.1% SDS (two times). For miRNA northerns, 30 µg of total RNA was fractionated on a 15% polyacrylamide gel, transferred to a nitrocellulose membrane, hybridized, and washed as described, using the 5' 32P-labeled DNA oligo AGGAGGTGGACAGAATGCCAA as a probe for miR394.

Scanning Electron Microscopy

Plant tissues were fixed, dehydrated, critical point dried, and coated with gold and palladium as described (Mallory et al., 2004a). Samples were imaged on a Jeol 5600LV scanning electron microscope.

Table 1.
Observed phenotypes of T1 transformant plants

construct                    At1g27340    5mAt1g27340
total                        91           105
wild type development        90 (99%)     40 (38%)
abnormal development         1 (1%)       65 (62%)
curled rosette leaves        0 (0%)       51 (49%)
spikes on cauline leaves     1 (1%)       51 (49%)
radialized cauline leaves    1 (1%)       23 (22%)
missing petals               0 (0%)       13 (12%)
abortive flowers             0 (0%)       27 (26%)
no SAM                       0 (0%)       11 (10%)

The number and percentage of T1 plants with various developmental abnormalities are indicated. See text for details.

Table 2. Observed phenotypes of 5mAt1g27340 T2 transformant plants

Line              total   Basta sensitive   no SAM   like T1   T1 phenotype
5mAt1g27340-16    36      27%               56%      18%       severe
5mAt1g27340-1     36      23%               33%      44%       strong
5mAt1g27340-23    37      30%               32%      38%       strong
5mAt1g27340-6     66      21%               30%      49%       severe
5mAt1g27340-30    45      27%               26%      47%       severe
5mAt1g27340-44    95      25%               25%      50%       severe
5mAt1g27340-33    32      25%               20%      55%       strong
5mAt1g27340-3     33      20%               9%       71%       mild
5mAt1g27340-18    43      36%               6%       58%       mild
5mAt1g27340-24    22      24%               3%       73%       slight
5mAt1g27340-27    35      23%               0%       61%       slight

The observed frequencies of T2 phenotypes for 5mAt1g27340 lines are indicated, as is the severity of developmental defects observed in the T1 parent of each line.

Figure Legends

Figure 1. At1g27340 is complementary to miR394. (A) miR394-complementary sites in F-box genes from different plant genera are depicted. Nucleotides that can form Watson-Crick base pairs with miR394 are in upper case and highlighted, whereas nucleotides that are mismatched or can form G:U wobble pairs are in lower case. For each F-box gene, the At1g27340 blastp (for Arabidopsis, Oryza, and Populus proteins) or blastx (for ESTs from other genera) E value and rank (out of all Arabidopsis proteins) are indicated. (B) The At1g27340 genomic clone used to transform Arabidopsis is depicted. Intergenic regions are shown as solid lines, UTR sequence as shaded boxes, coding sequence as open boxes, and intronic sequence as a dashed line.
The restriction sites used to isolate the genomic clone from BAC F17L21 are indicated. Within the At1g27340 coding region, the positions of the F-box domain ("F") and the miR394 complementary site ("*") are shown. The amino acid sequence, nucleotide sequence, and miR394 complementarity of the wild-type and mutated miR394 complementary sites are shown.

Figure 2. Vegetative and floral phenotypes of 5mAt1g27340 plants. (A) Three-week-old wild-type plant with broad, flat rosette leaves and T1 5mAt1g27340 plant with downwardly curled rosette leaves. (B) Close-up views of flat wild-type rosette and cauline leaves and curled T1 5mAt1g27340 rosette and cauline leaves (right). The 5mAt1g27340 leaf has a spiked outgrowth from the abaxial midvein (arrow). (C) Control T2 At1g27340 (1) and 5mAt1g27340 cauline leaves and axillary shoots (2-6). (D) Shoots of T2 control At1g27340 and 5mAt1g27340 plants. At right are close-up views of inflorescences showing the reduction in silique number in 5mAt1g27340 plants. The inset shows the presence of filaments on 5mAt1g27340 shoots where phyllotaxy suggests siliques should be. (E) Inflorescences of control At1g27340 plants containing flowers in various developmental stages, and inflorescences of 5mAt1g27340 plants containing numerous abortive filaments, a few empty flowers (arrows), and some reproductively functional flowers.

Figure 3. Seedling phenotypes of 5mAt1g27340 plants. (A) Six-day-old control T2 At1g27340 seedlings have the first pair of true leaves emerging from the shoot apical meristem. (B) Some T2 5mAt1g27340 seedlings display a variety of developmental abnormalities, including one radialized cotyledon (1), two radialized cotyledons (2), one flat cotyledon (3), and two flat cotyledons but no apparent true leaves (4). (C) At1g27340 mRNA overaccumulates in 5mAt1g27340 plants.
12 µg of total RNA from control At1g27340 and 5mAt1g27340 rosette leaves (L), inflorescences (Inf), and seedlings (Se, SAM-) was analyzed by Northern blot using a body-labeled RNA probe complementary to most of exon 2. For 5mAt1g27340, RNA was isolated separately from seedlings with (Se) and without (SAM-) evident shoot apical meristems. The levels of At1g27340 mRNA were quantified relative to the ethidium bromide staining of the 25S ribosomal RNA.

[Figure 1. Image not reproduced. Panel A: miR394 aligned with complementary sites in At1g27340 and in F-box genes or ESTs from Arabidopsis, Populus, Oryza, and other genera, each with its blast E value and rank. Panel B: map of the At1g27340 genomic clone, marked with SpeI and NsiI sites and 1.6 kb 5' and 0.9 kb 3' flanking regions, and the wild-type and 5mAt1g27340 miR394 complementary sites.]
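The quantification described in the legend of Figure 3C amounts to dividing each lane's At1g27340 band signal by the 25S rRNA signal of the same lane, optionally rescaled so that one reference lane equals 1. A minimal Python sketch of that arithmetic follows. The function name, the lane labels, and every intensity value are hypothetical placeholders chosen for illustration; they are not measurements from this work, and the actual quantification software is not specified in the text.

```python
def normalize_to_loading(signal, loading, reference=None):
    """Divide each lane's band signal by its loading-control signal.

    signal, loading: dicts mapping lane label -> intensity (arbitrary units).
    reference: optional lane label; if given, ratios are rescaled so that
    this lane equals 1.0.
    """
    ratios = {}
    for lane, value in signal.items():
        if loading[lane] <= 0:
            raise ValueError(f"loading signal for lane {lane!r} must be positive")
        ratios[lane] = value / loading[lane]
    if reference is not None:
        ref = ratios[reference]
        ratios = {lane: r / ref for lane, r in ratios.items()}
    return ratios


# Hypothetical intensities (arbitrary units), for illustration only.
mrna = {"control_Inf": 200.0, "5m_Inf": 450.0, "5m_SAM-": 900.0}
rrna_25s = {"control_Inf": 1000.0, "5m_Inf": 900.0, "5m_SAM-": 750.0}

relative = normalize_to_loading(mrna, rrna_25s, reference="control_Inf")
for lane, value in relative.items():
    print(f"{lane}: {value:.1f}")
```

Dividing by the loading control first and rescaling second keeps the two steps separate, so the raw ratios remain available if a different reference lane is preferred.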
[Figure 2. Image not reproduced. Panels compare Col, At1g27340, and 5mAt1g27340 plants.]
[Figure 3. Image not reproduced. Blot lane labels as printed: L, Inf, Se, L, Inf, Se, SAM-. At1g27340/25S values as printed: 3.3, 1.0, 2.0, 5.6, 2.8, 4.3, 7.8.]

References

Aida, M., Ishida, T., and Tasaka, M. (1999). Shoot apical meristem and cotyledon formation during Arabidopsis embryogenesis: interaction among the CUP-SHAPED COTYLEDON and SHOOT MERISTEMLESS genes. Development 126, 1563-1570.
Aida, M., Ishida, T., Fukaki, H., Fujisawa, H., and Tasaka, M. (1997). Genes involved in organ separation in Arabidopsis: an analysis of the cup-shaped cotyledon mutant. Plant Cell 9, 841-857.
Axtell, M.J., and Bartel, D.P. (2005). Antiquity of MicroRNAs and Their Targets in Land Plants. Plant Cell 17, 666-99999.
Bartel, D.P. (2004). MicroRNAs: genomics, biogenesis, mechanism, and function. Cell 116, 281-297.
Baurle, I., and Laux, T. (2003). Apical meristems: the plant's fountain of youth. Bioessays 25, 961-970.
Bohmert, K., Camus, I., Bellini, C., Bouchez, D., Caboche, M., and Benning, C. (1998). AGO1 defines a novel locus of Arabidopsis controlling leaf development. Embo J 17, 170-180.
Boutet, S., Vazquez, F., Liu, J., Beclin, C., Fagard, M., Gratias, A., Morel, J.B., Crete, P., Chen, X., and Vaucheret, H. (2003). Arabidopsis HEN1: a genetic link between endogenous miRNA controlling development and siRNA controlling transgene silencing and virus resistance. Curr Biol 13, 843-848.
Chapman, E.J., Prokhnevsky, A.I., Gopinath, K., Dolja, V.V., and Carrington, J.C. (2004). Viral RNA silencing suppressors inhibit the microRNA pathway at an intermediate step. Genes Dev 18, 1179-1186.
Chen, J., Li, W.X., Xie, D., Peng, J.R., and Ding, S.W. (2004).
Viral virulence protein suppresses RNA silencing-mediated defense but upregulates the role of microRNA in host gene expression. Plant Cell 16, 1302-1313.
Chen, X., Liu, J., Cheng, Y., and Jia, D. (2002). HEN1 functions pleiotropically in Arabidopsis development and acts in C function in the flower. Development 129, 1085-1094.
Clough, S.J., and Bent, A.F. (1998). Floral dip: a simplified method for Agrobacterium-mediated transformation of Arabidopsis thaliana. Plant J 16, 735-743.
Deshaies, R.J. (1999). SCF and Cullin/Ring H2-based ubiquitin ligases. Annu Rev Cell Dev Biol 15, 435-467.
Dunoyer, P., Lecellier, C.H., Parizotto, E.A., Himber, C., and Voinnet, O. (2004). Probing the microRNA and small interfering RNA pathways with virus-encoded suppressors of RNA silencing. Plant Cell 16, 1235-1250.
Emery, J.F., Floyd, S.K., Alvarez, J., Eshed, Y., Hawker, N.P., Izhaki, A., Baum, S.F., and Bowman, J.L. (2003). Radial patterning of Arabidopsis shoots by class III HD-ZIP and KANADI genes. Curr Biol 13, 1768-1774.
Endrizzi, K., Moussian, B., Haecker, A., Levin, J.Z., and Laux, T. (1996). The SHOOT MERISTEMLESS gene is required for maintenance of undifferentiated cells in Arabidopsis shoot and floral meristems and acts at a different regulatory level than the meristem genes WUSCHEL and ZWILLE. Plant J 10, 967-979.
Eshed, Y., Baum, S.F., Perea, J.V., and Bowman, J.L. (2001). Establishment of polarity in lateral organs of plants. Curr Biol 11, 1251-1260.
Floyd, S.K., and Bowman, J.L. (2004). Gene regulation: ancient microRNA target sequences in plants. Nature 428, 485-486.
Gagne, J.M., Downes, B.P., Shiu, S.H., Durski, A.M., and Vierstra, R.D. (2002). The F-box subunit of the SCF E3 complex is encoded by a diverse superfamily of genes in Arabidopsis. Proc Natl Acad Sci U S A 99, 11519-11524.
Gagne, J.M., Smalle, J., Gingerich, D.J., Walker, J.M., Yoo, S.D., Yanagisawa, S., and Vierstra, R.D. (2004).
Arabidopsis EIN3-binding F-box 1 and 2 form ubiquitin-protein ligases that repress ethylene action and promote growth by directing EIN3 degradation. Proc Natl Acad Sci U S A 101, 6803-6808.
Gray, W.M., Kepinski, S., Rouse, D., Leyser, O., and Estelle, M. (2001). Auxin regulates SCF(TIR1)-dependent degradation of AUX/IAA proteins. Nature 414, 271-276.
Guo, H., and Ecker, J.R. (2003). Plant responses to ethylene gas are mediated by SCF(EBF1/EBF2)-dependent proteolysis of EIN3 transcription factor. Cell 115, 667-677.
Han, M.H., Goud, S., Song, L., and Fedoroff, N. (2004). The Arabidopsis double-stranded RNA-binding protein HYL1 plays a role in microRNA-mediated gene regulation. Proc Natl Acad Sci U S A 101, 1093-1098.
Jones-Rhoades, M.W., and Bartel, D.P. (2004). Computational identification of plant microRNAs and their targets, including a stress-induced miRNA. Mol Cell 14, 787-799.
Kasschau, K.D., Xie, Z., Allen, E., Llave, C., Chapman, E.J., Krizan, K.A., and Carrington, J.C. (2003). P1/HC-Pro, a viral suppressor of RNA silencing, interferes with Arabidopsis development and miRNA function. Dev Cell 4, 205-217.
Kerstetter, R.A., Bollman, K., Taylor, R.A., Bomblies, K., and Poethig, R.S. (2001). KANADI regulates organ polarity in Arabidopsis. Nature 411, 706-709.
Kidner, C.A., and Martienssen, R.A. (2004). Spatially restricted microRNA directs leaf polarity through ARGONAUTE1. Nature 428, 81-84.
Laux, T., Mayer, K.F., Berger, J., and Jurgens, G. (1996). The WUSCHEL gene is required for shoot and floral meristem integrity in Arabidopsis. Development 122, 87-96.
Llave, C., Xie, Z., Kasschau, K.D., and Carrington, J.C. (2002). Cleavage of Scarecrow-like mRNA targets directed by a class of Arabidopsis miRNA. Science 297, 2053-2056.
Long, J.A., Moan, E.I., Medford, J.I., and Barton, M.K. (1996). A member of the KNOTTED class of homeodomain proteins encoded by the STM gene of Arabidopsis. Nature 379, 66-69.
Lu, C., and Fedoroff, N. (2000).
A mutation in the Arabidopsis HYL1 gene encoding a dsRNA binding protein affects responses to abscisic acid, auxin, and cytokinin. Plant Cell 12, 2351-2366.
Mallory, A.C., Bartel, D.P., and Bartel, B. (2005). microRNA-Directed Regulation of Arabidopsis AUXIN RESPONSE FACTOR17 Is Essential for Proper Development and Modulates Expression of Early Auxin Response Genes. Plant Cell 17.
Mallory, A.C., Dugas, D.V., Bartel, D.P., and Bartel, B. (2004a). MicroRNA regulation of NAC-domain targets is required for proper formation and separation of adjacent embryonic, vegetative, and floral organs. Curr Biol 14, 1035-1046.
Mallory, A.C., Reinhart, B.J., Bartel, D., Vance, V.B., and Bowman, L.H. (2002). A viral suppressor of RNA silencing differentially regulates the accumulation of short interfering RNAs and micro-RNAs in tobacco. Proc Natl Acad Sci U S A 99, 15228-15233.
Mallory, A.C., Reinhart, B.J., Jones-Rhoades, M.W., Tang, G., Zamore, P.D., Barton, M.K., and Bartel, D.P. (2004b). MicroRNA control of PHABULOSA in leaf development: importance of pairing to the microRNA 5' region. Embo J 23, 3356-3364.
Mallory, A.C., Ely, L., Smith, T.H., Marathe, R., Anandalakshmi, R., Fagard, M., Vaucheret, H., Pruss, G., Bowman, L., and Vance, V.B. (2001). HC-Pro suppression of transgene silencing eliminates the small RNAs but not transgene methylation or the mobile signal. Plant Cell 13, 571-583.
Mayer, K.F., Schoof, H., Haecker, A., Lenhard, M., Jurgens, G., and Laux, T. (1998). Role of WUSCHEL in regulating stem cell fate in the Arabidopsis shoot meristem. Cell 95, 805-815.
Morel, J.B., Godon, C., Mourrain, P., Beclin, C., Boutet, S., Feuerbach, F., Proux, F., and Vaucheret, H. (2002). Fertile hypomorphic ARGONAUTE (ago1) mutants impaired in post-transcriptional gene silencing and virus resistance. Plant Cell 14, 629-639.
Palatnik, J.F., Allen, E., Wu, X., Schommer, C., Schwab, R., Carrington, J.C., and Weigel, D. (2003). Control of leaf morphogenesis by microRNAs.
Nature 425, 257-263.
Park, M.Y., Wu, G., Gonzalez-Sulser, A., Vaucheret, H., and Poethig, R.S. (2005). Nuclear processing and export of microRNAs in Arabidopsis. Proc Natl Acad Sci U S A 102, 3691-3696.
Park, W., Li, J., Song, R., Messing, J., and Chen, X. (2002). CARPEL FACTORY, a Dicer homolog, and HEN1, a novel protein, act in microRNA metabolism in Arabidopsis thaliana. Curr Biol 12, 1484-1495.
Potuschak, T., Lechner, E., Parmentier, Y., Yanagisawa, S., Grava, S., Koncz, C., and Genschik, P. (2003). EIN3-dependent regulation of plant ethylene hormone signaling by two arabidopsis F box proteins: EBF1 and EBF2. Cell 115, 679-689.
Prigge, M.J., Otsuga, D., Alonso, J.M., Ecker, J.R., Drews, G.N., and Clark, S.E. (2005). Class III Homeodomain-Leucine Zipper Gene Family Members Have Overlapping, Antagonistic, and Distinct Roles in Arabidopsis Development. Plant Cell 17, 61-76.
Reinhart, B.J., Weinstein, E.G., Rhoades, M.W., Bartel, B., and Bartel, D.P. (2002). MicroRNAs in plants. Genes Dev 16, 1616-1626.
Rhoades, M.W., Reinhart, B.J., Lim, L.P., Burge, C.B., Bartel, B., and Bartel, D.P. (2002). Prediction of plant microRNA targets. Cell 110, 513-520.
Ruegger, M., Dewey, E., Gray, W.M., Hobbie, L., Turner, J., and Estelle, M. (1998). The TIR1 protein of Arabidopsis functions in auxin response and is related to human SKP2 and yeast grr1p. Genes Dev 12, 198-207.
Samach, A., Klenz, J.E., Kohalmi, S.E., Risseeuw, E., Haughn, G.W., and Crosby, W.L. (1999). The UNUSUAL FLORAL ORGANS gene of Arabidopsis thaliana is an F-box protein required for normal patterning and growth in the floral meristem. Plant J 20, 433-445.
Sasaki, A., Itoh, H., Gomi, K., Ueguchi-Tanaka, M., Ishiyama, K., Kobayashi, M., Jeong, D.H., An, G., Kitano, H., Ashikari, M., and Matsuoka, M. (2003). Accumulation of phosphorylated repressor for gibberellin signaling in an F-box mutant. Science 299, 1896-1898.
Schauer, S.E., Jacobsen, S.E., Meinke, D.W., and Ray, A. (2002).
DICER-LIKE1: blind men and elephants in Arabidopsis development. Trends Plant Sci 7, 487-491.
Smalle, J., and Vierstra, R.D. (2004). The ubiquitin 26S proteasome proteolytic pathway. Annu Rev Plant Biol 55, 555-590.
Stirnberg, P., van De Sande, K., and Leyser, H.M. (2002). MAX1 and MAX2 control shoot lateral branching in Arabidopsis. Development 129, 1131-1141.
Sunkar, R., and Zhu, J.K. (2004). Novel and stress-regulated microRNAs and other small RNAs from Arabidopsis. Plant Cell 16, 2001-2019.
Tang, G., Reinhart, B.J., Bartel, D.P., and Zamore, P.D. (2003). A biochemical framework for RNA silencing in plants. Genes Dev 17, 49-63.
Telfer, A., and Poethig, R.S. (1998). HASTY: a gene that regulates the timing of shoot maturation in Arabidopsis thaliana. Development 125, 1889-1898.
Vaucheret, H., Vazquez, F., Crete, P., and Bartel, D.P. (2004). The action of ARGONAUTE1 in the miRNA pathway and its regulation by the miRNA pathway are crucial for plant development. Genes Dev 18, 1187-1197.
Vazquez, F., Gasciolli, V., Crete, P., and Vaucheret, H. (2004). The nuclear dsRNA binding protein HYL1 is required for microRNA accumulation and plant development, but not posttranscriptional transgene silencing. Curr Biol 14, 346-351.
Wilkinson, M.D., and Haughn, G.W. (1995). UNUSUAL FLORAL ORGANS Controls Meristem Identity and Organ Primordia Fate in Arabidopsis. Plant Cell 7, 1485-1499.
Willems, A.R., Schwab, M., and Tyers, M. (2004). A hitchhiker's guide to the cullin ubiquitin ligases: SCF and its kin. Biochim Biophys Acta 1695, 133-170.
Woo, H.R., Chung, K.M., Park, J.H., Oh, S.A., Ahn, T., Hong, S.H., Jang, S.K., and Nam, H.G. (2001). ORE9, an F-box protein that regulates leaf senescence in Arabidopsis. Plant Cell 13, 1779-1790.
Xie, D.X., Feys, B.F., James, S., Nieto-Rostro, M., and Turner, J.G. (1998). COI1: an Arabidopsis gene required for jasmonate-regulated defense and fertility. Science 280, 1091-1094.
Zheng, N., Schulman, B.A., Song, L., Miller, J.J., Jeffrey, P.D., Wang, P., Chu, C., Koepp, D.M., Elledge, S.J., Pagano, M., Conaway, R.C., Conaway, J.W., Harper, J.W., and Pavletich, N.P. (2002). Structure of the Cul1-Rbx1-Skp1-F boxSkp2 SCF ubiquitin ligase complex. Nature 416, 703-709.
Zhong, R., and Ye, Z.H. (2004). Amphivasal vascular bundle 1, a gain-of-function mutation of the IFL1/REV gene, is associated with alterations in the polarity of leaves, stems and carpels. Plant Cell Physiol 45, 369-385.

Appendix 1. Conserved miRNA target sites in Arabidopsis, Oryza and Populus

The sequence and score (see Jones-Rhoades & Bartel, 2004 Molecular Cell 14(6):787-99) are listed for miRNA complementary sites within predicted miRNA targets of three plant species. Some complementary sites occur adjacent to annotated gene models, especially in Populus; this is indicated as "In annotation" (Y or N).

Columns, in order: miRNA family, target gene, target family, score, miRNA complementary sequence, species, in annotation?
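The score column can be illustrated with a short sketch. It assumes that the scheme of the cited reference charges 1 for each mismatch and 0.5 for each G:U wobble pair in an ungapped alignment; bulges and gaps are not handled here. The miR394 sequence is not given in this appendix, so the sketch infers it as the reverse complement of the Oryza site listed with score 0. The function names are hypothetical, and this is an illustration rather than the scoring code used for the table.

```python
# Watson-Crick pairs and G:U wobble pairs, written as (miRNA base, target base).
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def reverse_complement(rna):
    """Return the reverse complement of an RNA sequence (A, C, G, U only)."""
    complement = {"A": "U", "U": "A", "G": "C", "C": "G"}
    return "".join(complement[base] for base in reversed(rna))


def site_score(mirna, site):
    """Score an ungapped miRNA:site duplex (both sequences written 5'->3').

    Assumed penalties: 1 per mismatch, 0.5 per G:U wobble. Bulges are not modeled.
    """
    if len(mirna) != len(site):
        raise ValueError("ungapped scoring needs sequences of equal length")
    score = 0.0
    # The miRNA 5' end pairs with the site 3' end, so walk the site backwards.
    for m, s in zip(mirna, reversed(site)):
        if (m, s) in WATSON_CRICK:
            continue
        score += 0.5 if (m, s) in WOBBLE else 1.0
    return score


# miR394 inferred from the Oryza site that the appendix lists with score 0.
oryza_site = "GGAGGUGGACAGAAUGCCAA"        # Os01g69940
arabidopsis_site = "GGAGGUUGACAGAAUGCCAA"  # At1g27340
mir394 = reverse_complement(oryza_site)

print(mir394)
print(site_score(mir394, oryza_site), site_score(mir394, arabidopsis_site))
```

With these inputs the Oryza site scores 0 by construction, and the single mismatch in the Arabidopsis site gives a score of 1, in agreement with the values listed for the miR394 family.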
miR156 At1g27360.1 SBP 1 GUGCUCUCUCUCUUCUGUCA Arabidopsis Y
miR156 At1g27370.1 SBP 1 GUGCUCUCUCUCUUCUGUCA Arabidopsis Y
miR156 At1g53160.1 SBP 2 CUGCUCUCUCUCUUCUGUCA Arabidopsis Y
miR156 At1g69170.1 SBP 1 GUGCUCUCUCUCUUCUGUCA Arabidopsis Y
miR156 At2g33810.1 SBP 1.5 UUGCUUACUCUCUUCUGUCA Arabidopsis Y
miR156 At2g42200.1 SBP 1 GUGCUCUCUCUCUUCUGUCA Arabidopsis Y
miR156 At3g15270.1 SBP 3 CCGCUCUCUCUCUUCUGUCA Arabidopsis Y
miR156 At3g57920.1 SBP 1 GUGCUCUCUCUCUUCUGUCA Arabidopsis Y
miR156 At5g43270.1 SBP 1 GUGCUCUCUCUCUUCUGUCA Arabidopsis Y
miR156 At5g50570.1 SBP 1 GUGCUCUCUCUCUUCUGUCA Arabidopsis Y
miR156 At5g50670.1 SBP 1 GUGCUCUCUCUCUUCUGUCA Arabidopsis Y
miR156 Os01g69830 SBP 0 UGUGCUCUCUCUCUUCUGUCA Oryza Y
miR156 Os02g04680 SBP 1 AUGCUCUCUCUCUUCUGUCA Oryza Y
miR156 Os02g07780 SBP 0 GUGCUCUCUCUCUUCUGUCA Oryza Y
miR156 Os04g46580 SBP 0 GUGCUCUCUCUCUUCUGUCA Oryza Y
miR156 Os06g45310 SBP 0 GUGCUCUCUCUCUUCUGUCA Oryza Y
miR156 Os06g49010 SBP 0 GUGCUCUCUCUCUUCUGUCA Oryza Y
miR156 Os07g32170 SBP 2 AUGCUCCCUCUCUUCUGUCA Oryza Y
miR156 Os08g39890 SBP 0 UGUGCUCUCUCUCUUCUGUCA Oryza Y
miR156 Os08g41940 SBP 0 UGUGCUCUCUCUCUUCUGUCA Oryza Y
miR156 estExt_Genewise1_v1.C_1240186 SBP 1 AUGCUCUCUCUCUUCUGUCA Populus Y
miR156 estExt_Genewise1_v1.C_LG_XV2187 SBP 1 GUGCUCUCUCUCUUCUGUCA Populus Y
miR156 eugene3.001 :20942 SBP 2 GCGCUCUCUCUCUUCUGUCA Populus Y
miR156 eugene3.00160416 SBP 1 GUGCUCUCUCUCUUCUGUCA Populus Y
miR156 fgenesh4_pg.C_LG_11001303 SBP 1 GUGCUCUCUCUCUUCUGUCA Populus Y
miR156 fgenesh4_pg.C_LG_X001404 SBP 1 GUGCUCUCUCUCUCUGUCA Populus Y
miR156 grail3.0010026801 SBP 1 GUGCUCUCUCUCUUCUGUCA Populus Y
miR156 gw1.107.39.1 SBP 1.5 AUGCUCCCUCUCUUCUGUCA Populus N
miR156 gw1.129.152.1 SBP 0.5 GUGCUCGCUCUCUUCUGUCA Populus N
miR156 gw1.164.76.1 SBP 1 GUGCUCUCUCUCUUCUGUCA Populus N
miR156 gw1.40.76.1 SBP 1 GUGCUCUCUCUCUUCUGUCA Populus N
miR156 gw1.1.7783.1 SBP 1 GUGCUCUCUCUCUUCUGUCA Populus N
miR156 gw1.111.2396.1 SBP 1 GUGCUCUCUCUCUUCUGUCA Populus N
miR156 gw1.IV.3037.1 SBP
1.5 AUGCUCUCUCUCUWCUGUCA Populus N miR156 gwl.VII.548.1 SBP 2 UUGCUCUCUCUCUUCUGUCA Populus N miR156; gwl.XI.3794.1 SBP 1.5 AUGCUCCCUCUCUUCUGUCA Populus N miR159 At2g26950.1 MYB 1.5 UGGAGCUCCCUUCAUUCCAAG Arabidopsis Y miR159 At2g26960.1 MYB 2.5 UCGAGUUCCCUUCAUUCCAAU Arabidopsis Y miR159 At2g32460.1 MYB 1.5 UAGAGCUUCCUUCAAACCAAA Arabidopsis Y mrniR159 At3g11440.1 MYB 1.5 UGGAGCUCCCUUCAUUCCAA Arabidopsis Y miR159 At3g60460.1 MYB 1.5 UGGAGCUCCAUUCGAUCCAAA Arabidopsis Y miR159 At4g26930.1 MYB 2.5 AUGAGCUCUCUUCAAACCAAA Arabidopsis Y miR159 At5gO6100.1 MYB 1.5 UGGAGCUCCCUUCAUUCCAA Arabidopsis Y MYB 2 AGCAGCUCCCUUCAAACCAAA Arabidopsis Y miR159 At5g55020.1 120 miR159 Os01g59660 MYB 1 UGGAGCUCCCUUCACUCCAAG miR159 Os03g38210 MYB 2 CCGAGCUCCCUUCAAGCCAAU Oryza miR159 Os04g46390 MYB 1.5 UGGAGCUCCAUUCGAUCCAAA Oryza miR159 Os5g41 170 MYB 0.5 miR159 0s06g40330 MYB 1 miR159 Os06g46560 MYB 2.5 GCGAGCUCCCUUCGAACCAAU Oryza miR159 fgenesh4_pm. C_LG_I 11000641 MYB 2.5 UGGAGCUCUAUUCGGUCCAAA Populus miR159 fgenesh4_pm.C_scaffold_40000020 MYB 1.5 Oryza UGGAGCUCCCUUUAAUCCAAU Oryza UAGAGCUCCCUUCACUCCAAU Oryza UGGAGCUCCAUUCGAUCCAAA Populus miR159 gwl. 
1.6885.1 MYB 1 UGGAGCUCCCUUCACUCCAAU Populus miR159 gwl .1.9701.1 MYB 1 UAGAGCUCCCUUCACUCCAAU Populus miR159 gwl.111.41.1 MYB 0 UUGAGCUCCCUUCACUCCAAU Populus miR159 Atlg30210.1 TCP 2.5 AGGGGGACCCUUCAGUCCAA Arabidopsis miR159 At1g53230.1 TCP 3 AGGGGUCCCCUUCAGUCCAU Arabidopsis miR159 At2g31070.1 TCP 2.5 AGGGGUACCCUUCAGUCCAG Arabidopsis miR159 At3g15030.1 TCP 2.5 AGGGGUCCCCUUCAGUCCAG Arabidopsis miR159 At4g18390.1 TCP 2.5 AGGGGGACCCUUCAGUCCAA Arabidopsis miR159 TCP 3.5 AGGGGACCCCUUCAGUCCAGU Oryza miR159 0s03g57190 TCP 2.5 AGGGGGACCCUUCAGUCCAA miR159 Os07g05720 TCP 2.5 AGGGGGACCCUUCAGUCCAA Oryza miR159 TCP 2.5 CGGGGCACACUUCAGUCCAA Oryza miR159 eugene3.0011 0429 TCP 2.5 AGGGGGACCCUUCAGUCCAA Populus miR159 eugene3.00110631 TCP 3 AGGGGAACCCUUCAGUCCAG Populus miR159 eugene3.00121020 TCP 2.5 AGGGGGACCCUUCAGUCCAA Populus miR159 eugene3.00190830 TCP 3 AGGGGGCCCCUUCAGUCCAG miR159 eugene3.00410019 TCP 3 AGGGGAACCCUUCAGUCCAG Populus Populus miR159 grail3.0032015302 TCP 3 AGGGGACCCCUUCAGUCCAG miR159 gwl .IV.2486.1 TCP 3 miR160 At1g77850.1 ARF 0.5 miR160 At2g28350.1 ARF 1 miR160 At4g30080.1 ARF 1.5 miR160 0s02g41800 ARF 0 AGGCAUACAGGGAGCCAGGCA Oryza miR160 0s04g43910 ARF 0 AGGCAUACAGGGAGCCAGGCA Oryza miR160 Os04g59430 ARF 1 UGACAUUCAGGGAGCCAGGCA Oryza miR160 0s06g47150 ARF 0 AGGCAUACAGGGAGCCAGGCA Oryza miR160 Os10g33940 ARF 0 AGGCAUACAGGGAGCCAGGCA Oryza miR160 estExtfgenesh4pg.C_LG_V0901 ARF 0.5 miR160 0 miR160 estExt_fgenesh4_pm.C_LG_X0888 ARF estExt_fgenesh4_pm.C_LG_XVI0323ARF miR160 Os01g11550 Os12g42190 Oryza Populus AUGAGCUCCCUCCACUCAACPopulus UGGCAUGCAGGGAGCCAGGCA Arabidopsis AGGAAUACAGGGAGCCAGGCA Arabidopsis GGGUUUACAGGGAGCCAGGCA Arabidopsis UGGCAUGCAGGGAGCCAGGCA Populus AGGCAUACAGGGAGCCAGGCA Populus 0.5 UGGCAUGCAGGGAGCCAGGCA Populus eugene3.00660262 ARF 0 AGGCAUACAGGGAGCCAGGCA Populus miR160 fgenesh4_pg.C_LG_11000830 ARF 0.5 miR160 fgenesh4_pg.C_LG_X001411 ARF 0 miR160 fgenesh4_pg.C_LG_VI11000301 ARF 0 UGGCAUGCAGGGAGCCAGGCA Populus AGGCAUACAGGGAGCCAGGCA Populus 
AGGCAUACAGGGAGCCAGGCA Populus miR160 gw1.28.631.1 ARF 0.5 UGGCAUGCAGGGAGCCAGGCA Populus miR160 gw1.28.632.1 ARF 0.5 UGGCAUGCAGGGAGCCAGGCA Populus miR161 At1g06580.1 PPR 2 CCCGGAUGUAAUCACUUUCAG Arabidopsis miR161 CCCUGAUGUAUUCACUUUCAG Arabidopsis Atl g62670.1 PPR 1.5 rniR161 At1g62720.1 PPR 2.5 CCCCGAUGUAGUGACUUAUAA Arabidopsis rniR161 At1g63080.1 PPR 2 UCCAAAUGUAGUCACUUUCAA Arabidopsis rniR161 At1g63150.1 PPR 2.5 rniR161 At1g63400.1 PPR 2 CCCCAAUGUUGUUACUUUCAA Arabidopsis UCCAAAUGUAGUCACUUUCAA Arabidopsis 121 At1g64580.1 PPR miR161 At5g16640.1 miR161 1.5 CCCUGAUGUUGUCACUUUCAC Arabidopsis Y PPR 2 CCCUGAUGUAUUUACUUUCAA Arabidopsis Y miR161 At5g41170.1 PPR 1.5 ACCUGAUGUAAUCACUUUCAA Arabidopsis Y miR162 At gOl 01040.1 DCL 2 CUGGAUGCAGAGGUAUUAUCGAArabidopsis Y Os03g02970 DCL 2 CUGGAUGCAGAGGUUUUAUCG Oryza Y miR162 eugene3.00021687 DCL 2 CUGGAUGCAGAGGUCUUAUCG Populus miR163 At1g66690.1 SAMT 0.5 At1966700.1 SAMT 0.5 miR163 Atlg66720.1 SAMT 1 miR162 miR163 Arabidopsis AUCGAGUUCCAAGUCCUCUUCAA Arabidopsis AUCGAGUUCCAAGUCCUCUUCAA AUCGAGUUCCAGGUCCUCUUCAA Arabidopsis Arabidopsis AUCGAGUUCCAAGUUUUCUUCAA y y y y y yY miR163 At3g44860.1 SAMT 1.5 miR163 At3g44870.1 SAMT 1.5 miR164 At1g56010.1 NAC 1 AGCACGUACCCUGCUUCUCCA Arabidopsis miR164 At3g15170.1 NAC 1 AGCACGUGUCCUGUUCUCCA Arabidopsis miR164 At5g07680.1 NAC 1.5 miR164 At5g39610.1 NAC 2 CUCACGUGACCUGCUUCUCCG Arabidopsis miR164 At5g53950.1 NAC 1 AGCACGUGUCCUGUUUCUCCA At5g61430.1 NAC 1.5 miR164 Os02g36880 NAC 1 CGCACGUGACCUGCUUCUCCA Oryza miR164 Os04g38720 NAC 1 CGCACGUGACCUGCUUCUCCA Oryza miR164 0s06g23650 NAC 1 AGCUCGUGCCCUGCUUCUCCA Oryza miR164 Os06g46270 NAC 1 AGCAAGUGCCCUGCUUCUCCA Oryza miR164 Os08g10080 NAC 1.5 miR164 Os12g41680 NAC 1 miR164 eugene3.00150202 NAC 1.5 CCUACGUGCCCUGCUUCUCCA Populus 1.5 CCUACGUGCCCUGCUUCUCCA Populus y y N Y miR164 Arabidopsis AUCGAGUUCCAAGUUUUCUUCAA UUUACGUGCCCUGCUUCUCCA Arabidopsis Arabidopsis UCUACGUGCCCUGCUUCUCCA Arabidopsis AGCAAGUGUCCUGCUUCUCCG Oryza AGCAAGUGCCCUGCUUCUCCA Oryza 
C_LG_XI1000069 miR164 fgenesh4_pm. NAC miR164 gw1.107.10.1 NAC 1 AGCACGUGUCCUGUUUCUCCA Populus miR164 gwl.V.3536.1 NAC 1 AGCAAGUGCCCUGCUUCUCCA Populus miR164 gwl .VII1.2722.1 NAC 1 AGCAAGUGCCCUGCUUCUCCA Populus miR164 gwl .XI.3766.1 NAC 1 AGCACGUGUCCUGUUUCUCCA Populus miR166 At1g30490.1 HD-ZIP 1.5 UUGGGAUGAAGCCUGGUCCGG Arabidopsis miR166 At1g52150.1 HD-ZIP 1.5 CUGGAAUGAAGCCUGGUCCGG Arabidopsis miR166 At2g3471 0.1 HD-ZIP 1.5 UUGGGAUGAAGCCUGGUCCGG Arabidopsis miR166 At4g32880.1 HD-ZIP 1.5 CUGGGAUGAAGCCUGGUCCGG miR166 At5g60690.1 HD-ZIP 1.5 CUGGGAUGAAGCCUGGUCCGG miR166 Os03g01890 HD-ZIP 2 CUGGGAUGAAGCCUGGUCCGG miR166 Os03g43930 HD-ZIP 2 UUGGGAUGAAGCCUGGUCCGG Oryza HD-ZIP 2 CUGGGAUGAAGCCUGGUCCGG Oryza *miR166 Osl2g41860 HD-ZIP 2 UUGGGAUGAAGCCUGGUCCGG Oryza miR166 estExt_fgenesh4_pg.C_2360002 HD-ZIP 3 UUGGGAUGAAGCCUGGUCCAG Populus miR166 estExt_fgenesh4_pg.C_LG_12905 HD-ZIP 2.5 UUGGUAUGAAGCCUGGUCCGG Populus miR166 estExtfgenesh4_pg.C_LG_1110436 HD-ZIP 1.5 CUGGAAUGAAUGAAGCCUGGUCCGG Populus miR166 HD-ZIP 3 estExt_fgenesh4_pm.C_LG_V1071 2 CUGGGAUGAAGCCUGGUCCGG Populus miR166 estExt_Genewisel _vl .C_660759 HD-ZIP 1.5 miR166 fgenesh4_pg.C_LG_XVI11000250 HD-ZIP 2 CUGGGAUGAAGCCUGGUCCGG Populus miR166 C_LG_1000560 fgenesh4_pm. 
HD-ZIP 1.5 CUGGAAUGAAGCCUGGUCCGG Populus miR166 Populus gw1.6326.1.1 HD-ZIP 2 CUGGGAUGAAGCCUGGUCCGG Populus rniR166 gwl .IX.4748.1 HD-ZIP 2 CUGGGAUGAAGCCUGGUCCGG Populus rniR167 Atlg30330.1 ARF 2 GAGAUCAGGCUGGCAGCUUGU Arabidopsis rniR167 At5g37020.1 ARF 2 UAGAUCAGGCUGGCAGCUUGU Arabidopsis rniR167 Os02g06910 ARF 3 GAGAUCAGGCUGGCAGCUUGU Oryza 122 y yN y y y Arabidopsis y Arabidopsis y y Oryza Y miR166 Os109g33960 CUGGAAUGAAGCCU GGUCCGG yY Y yY yY y y y y yY Y yY yY Y y Y y y Y miR167 Os04g57610 ARF 2 UAGAUCAGGCUGGCAGCUUGU Oryza Y miR167 Os06g46410 ARF 3 GAGAUCAGGCUGGCAGCUUGU Oryza Y miR167 Os12941950 ARF 3 AAGAUCAGGCUGGCAGCUUGU Oryza Y miR167 estExt_Genewisel _vl .C_LG_110777 ARF 3 GAGAUCAGGCUGGCAGCUUGU Populus Y miR167 estExt_Genewiselvl .C_LG_XI2869 ARF 3 GAGAUCAGGCUGGCAGCUUGU Populus Y miR167 fgenesh4_pg.C_LG_1002802 ARF 3 GAGAUCAGGCUGGCAGCUUGU Populus Y miR167 fgenesh4_pg.C_scaffold_1 006000001 ARF 3 GAGAUCAGGCUGGCAGCUUGU Populus Y miR167 gw1.44.432.1 ARF 2 UAGAUCAGGCUGGCAGCUUGU Populus Y miR167 gw1.IV.3880.1 ARF 2 UAGAUAG GGCUGGCAGCUUGU Populus Y miR167 gwl .V.806.1 ARF 3 GAGAUCAGGCUGGCAGCUUGU Populus Y miR168 At1g48410.1 AGO 2.5 UUCCCGAGCUGCAUCAAGCUA Arabidopsis Y miR168 Os02g45070 AGO 0 UUCCCGAGCUGCACCAAGCCU Oryza N miR168 Os02g58490 AGO 2.5 CUCCCGAGCUGCGCCAAGCAA Oryza Y miR168 Os04g47870 AGO 0 UUCCCGAGCUGCACCAAGCCC Oryza Y miR168 Os04g52540 AGO 3 UUCGCCCGCUGCACCAAGCCG Oryza Y miR168 Os04g52550 AGO 3 UUCGCCCGCUGCACCAAGCCG Oryza Y miR168 Os06g51310 AGO 3 CUCCCGAGCUGCUCCAAGCAA Oryza Y miR168 grail3.0031006602 AGO 3 CACCCGAGCUGCACCAAGCUA Populus N miR168 grail3.0122002801 AGO 3 CACCCGAGCUGCACCAAGCUA Populus N miR169 At1g917590.1 CCAAT 1.5 AAGGGAAGUCAUCCUUGGCUG Arabidopsis Y miR169 Atlg54160.1 CCAAT 2 ACGGGAAGUCAUCCUUGGCUA Arabidopsis Y miR169 Atlg72830.1 CCAAT 1.5 AGGGGAAGUCAUCCUUGGCUA Arabidopsis Y miR169 At3g05690.1 CCAAT 1.5 AGGCAAAUCAUCUUUGGCUCA Arabidopsis Y miR169 At3g14020.1 CCAAT 2.5 UAGCCAAGGAUGACuUCCCU Arabidopsis Y miR169 At3g20910.1 CCAAT 2 
CGGCAAUUCAUUCUUGGCUUU Arabidopsis N miR169 At5g0651 0.1 CCAAT 1.5 AGGCAAAUCAUCUUUGGCUCA Arabidopsis Y miR169 At5g12840.1 CCAAT 1.5 CCGGCAAAUCAUUCUUGGCUU Arabidopsis Y miR169 Os03g07880 CCAAT 2.5 AUGGCAAAUCAUCCUUGGCUU Oryza Y miR169 Os03g29760 CCAAT 1.5 GUGGCAAUUCAUCCUUGGCUU Oryza Y miR169 Os03g44540 CCAAT 1 Oryza Y miR169 Os03g48970 CCAAT 1.5 CAGGCAAUUCAUUCUUGGCUU Oryza Y miR169 Os07g06470 CCAAT 1 miR169 Os07g41720 CCAAT 1.5 miR169 1 UAGGCAACUCAUUCUUGGCUG 1 CAGGCAAUUCAUCCUUGGCUU Populus Y miR169 Os12942400 CCAAT estExt_fgenesh4pg.C_LG_XVI110020 CCAAT eugene3.00011755 CCAAT 1.5 CAGGCAAUUCAUUCUUGGCUU Populus Y miR169 eugene3.00060980 CCAAT 1 CAGGCAAUUCAUCCUUGGCUU N miR169 eugene3.00061121 CCAAT 3 AGGGCAAGUCGUUCUUGGCUC Populus N miR169 eugene3.00091116 CCAAT 2 GCGGCAAAUCAUUCUUGGCUU Populus Y miR169 eugene3.00160615 CCAAT 2.5 AGGGCAAGUCGUUCUUGGCUC Populus N miR169 fgenesh4_pg.C_LG_IX000987 CCAAT 1.5 CAGGCAAUUCAUUCUUGGCUU Populus Y miR169 grail3.0024038301 CCAAT 2.5 UUGGCAAAUCAUUCUUGGCUU Populus N miR169 gw1..1522.1 CCAAT 2.5 GCGGCAAAUCAUUCUUGGCUU Populus N miR171 At2g45160.1 SCL 0 GAUAUUGGCGCGGCUCAAUCA Arabidopsis Y miR171 At3g60630.1 SCL 0 GAUAUUGGCGCGGCUCAAUCA Arabidopsis Y miR171 At4g00150.1 miR171 Os02g44360 SCL 0 GAUAUUGGCGCGGCUCAAUCA Arabidopsis Y SCL 0 GAUAUUGGCGCGGCGCGGCUCAAUCA Oryza Y miR171 Os02g44370 miR171 Os04g46860 SCL 0 GAUAUUGGCGCGGCUCAAUCA Oryza Y SCL 0 GAUAUUGGCGCGGCUCAAUCA Oryza Y imiR171 Os06g01620 SCL 0 GAUAUUGGCGCGGCUCAAUCA Oryza Y miR169 123 UAGGCAAAUCAUUCUUGGCUC Otyza Y GUGGCAAUUCAUCCUUGGCUU Oryza Y Oryza Y GUGGCAAUUCAUCCUUGGCUG Populus miR171 Os10g40390 SCL 0.5 miR171 estExt_Genewisel _vl .CLG_113184 SCL 0 GAUAUUGGCGCGGCUCAAUCA Populus miR171 eugene3.44860001 SCL 0 GAUAUUGGCGCGGCUCAAUCA Populus miR171 fgenesh4_pg.C_LG_11000787 SCL 1.5 miR171 gw1.127.243.1 SCL 0 miR171 gwl.40.23.1 SCL 0 GAUAUUGGCGCGGCUCAAUCA Populus miR171 gw1.57.294.1 SCL 1 GAUAUUGGAACGGCUCAACGGC UCA miR171 gw1.11.1043.1 SCL 0 GAUAUUGGCGCGGCUCAAUCA Populus miR171 gw1.111.2060.1 
SCL 0 GAUAUUGGCGCGGCUCAAUCA Populus miR171 gwl .VI1.3405.1 SCL 2.5 GAUACUGGAACGGCUCAAUCA Populus miR172 At2g28550.1 AP2 1.5 miR172 At2g39250.1 AP2 1 UUGUAGCAUCAUCAGGAUUCC Arabidopsis miR172 At3g54990.1 AP2 1 UGCAGCAUCAUCAGGAUUCC Arabidopsis miR172 At4g36920.1 AP2 0.5 CUGCAGCAUCAUCAGGAUUCU Arabidopsis miR172 At5g60120.1 AP2 0.5 AUGCAGCAUCAUCAGGAUUCU Arabidopsis miR172 At5g67180.1 AP2 1.5 UGGCAGCAUCAUCAGGAUUCU Arabidopsis miR172 Os03g60430 AP2 0.5 CUGCAGCAUCAUCAGGAUUCU Oryza miR172 0s04g55560 AP2 1 miR172 Os05g03040 AP2 0.5 CUGCAGCAUCAUCAGGAUUCU Oryza miR172 0s06g43220 AP2 0.5 CUGCAGCAUCAUCAGGAUUCC Oryza miR172 Os07g913170 AP2 0.5 CUGCAGCAUCAUCAGGAUUCU Oryza miR172 grail3.001 9003502 AP2 0.5 CUGCAGCAUCAUCAGGAUUCC Populus AP2 0.5 CUGCAGCAUCAUCAGGAUUCG Populus miR172 gw1.28.415.1 GAUAUUGGCGCGGCUCAAUUA Oryza GGUGAUAUUGG GGCGGCUCAA Populus GAUAUUGGCGCGGCUCAAUCA Populus Populus CAGCAGCAUCAUCAGGAUUCU Arabidopsis CUGCAGCAUCAUCACGAUUCC Oryza miR172 gwl .V.4061.1 AP2 0.5 miR172 gw1.VII.1637.1 AP2 1 miR172 gwl .X.2501.1 AP2 0.5 UUGCAGCAUCAUCAGGAUUCU Populus miR172 gwl .XVI.2655.1 AP2 0.5 CUGCAGCAUCAUCAGGAUUCG Populus miR393 Os08g41320 bHLH 3.5 ACCAAAAGAAUCACAUCGCCC Oryza miR393 At3g23690.1 bZIP 2 miR393 eugene3.00140963 bZIP 2.5 miR393 At1g912820.1 Fbox 1 AAACAAUGCGAUCCCUUUGGA Arabidopsis miR393 At3g2681 0.1 Fbox 1 AAACAAUGCGAUCCCUUUGGA Arabidopsis miR393 At3g62980.1 Fbox 1.5 AGACAAUGCGAUCCCUUUGGA Arabidopsis miR393 At4g03190.1 Fbox 2.5 AGACCAUGCGAUCCCUUUGGA Arabidopsis miR393 Os04g32460 Fbox 1.5 AGACAAUGCGAUCCCUUUGGA Oryza miR393 Os05g05800 Fbox 1.5 AGACAAUGCGAUCCCUUUGGA Oryza miR393 estExt_Genewisel_vl.C_880149 F-box 1 AAACAAUGCGAUCCCUUUGGA Populus miR393 eugene3.00012208 F-box 1 AAACAAUGCGAUCCCUUUGGA Populus miR393 eugene3.00110318 F-box 3 AGUCAAUGAGGUCACUUUGGA Populus miR393 eugene3.00140791 F-box 1.5 AGACAAUGCGAUCCCUUUGGA Populus miR393 eugene3.00141554 F-box 1.5 miR394 Atl 9g27340.1 Fbox 1 GGAGGUUGACAGAAUGCCAA Arabidopsis miR394 Os01g69940 Fbox 0 GGAGGUGGACAGAAUGCCAA 
Oryza miR394 estExtGenewisel_vl .C_LG_17715 F-box 1 GGAGGUUGACAGAAUGCCAA Populus miR394 fgenesh4_pm. C_LG_111000589 F-box 1 GGAGGUUGACAGAAUGCCAA Populus miR395 At3g22890.1 APS 1.5 GAGUUCCUCCAAACUCUUCAU Arabidopsis miR395 At4g14680.1 APS 1.5 GAGUUCCUCCAAACUCUUCAU Arabidopsis miR395 At5g43780.1 APS 0.5 APS 0.5 GAGUUCCUCCAAACACUUCAU Arabidopsis GAGUUCCUCCAAGCACUUCAU Oryza miR395 estExtGenewisel_vl .C_LG_VI112439APS 1.5 GAGUUCCUCCAAACUCUUCAU Populus miR395 Os03g53230 124 CUGCAGCAUCAUCAGGAUUCU Populus UUGCAGCAUCAUCAGGAUUCU Populus GGUCAGAGCGAUCCCUUUGGC Arabidopsis GAUCAGAGCGAUCCCUUUGAG Populus AGACAAUGCGAUCCCUUUGGA Populus miR395 grail3.0175000802 APS 0.5 GAGUUCCUCCAAACACUUCAU Populus miR395 At5gl 0180.1 S transporter 1.5 AAGUUCUCCCAAACACUUCAA Arabidopsis miR395 Os03g09930 S transporter 1 GAGUUCACCCAAACACUUCAG miR395 Os03g09940 S transporter 0 GAGUUCCCCCAAACACUUCAG Oryza Oryza GAGUUCCCUCAAGCACUUCAA Populus miR395 estExt_fgenesh4_pm.C_LG_110422 S Transporter 2.5 eugene3.00070572 S Transporter 1 GAGUUUUCCCAAACACUUCAA miR395 fgenesh4_pm.C_LG_V000080 S Transporter 3 UAUUUCCCCUGAACACUUCAA Populus miR396 At2g22840.1 GRF 3 UCGUUCAAGAAAGCCUGUGGAAArabidopsis miR396 At2g36400.1 GRF 3 CCGUUCAAGAAAGAAAGCCUGUGGAA Arabidopsis miR396 At2g45480.1 GRF 3 ACGUUCAAGAAAGCUUGUGGAAArabidopsis miR396 At3g52910.1 GRF 3 CCGUUCAAGAAAGCCUGUGGAAArabidopsis miR396 At4g24150.1 GRF 3 UCGUUCAAGAAAGCAUGUGGAAArabidopsis miR396 At4g37740.1 GRF 3 UCGUUCAAGAAAGCCUGUGGAAArabidopsis miR396 At5g53660.1 GRF 3 UCGUUCAAGAAAGCAUGUGGAAArabidopsis miR396 Os02g45570 GRF 3 CCGUUCAAGAAAGAAAGCCUGUGGA Oryza miR396 Os02g47280 GRF 3 CCGUUCAAGAAAGCCUGUGGA Oryza miR396 Os02g53690 GRF 3 CCGUUCAAGAAAGAAAGCCUGUGGA Oryza miR396 Os03g47140 GRF 3 CCGUUCAAGAAAGCCUGUGGA Oryza miR396 Os03g51970 GRF 3 CCGUUCAAGAAAGCAUGUGGA Oryza miR396 Os04g51190 GRF 3 CCGUUCAAGAAAGCCUGUGGA Oryza miR396 Os06g02560 GRF 3 CCGUUCAAGAAAGCCUGUGGA Oryza miR396 Os 1g35030 GRF 3 UCGUUCAAGAAAGAAAGCAUGUGGA Oryza miR396 Os12g29980 GRF 3 CCGUUCAAGAAAGCAUGUGGA 
Oryza miR396 estExt_Genewisel_v.C_290455 GRF 3 CCGUUCAAGAAAGCCUGUGGA Populus miR396 eugene3.00010995 GRF 3 GCGUUCAAGAAAGCUUGUGGA Populus miR396 eugene3.00011018 GRF 3 CCGUUCAAGAAAGAAAGCCUGUGGA Populus miR396 eugene3.00021070 GRF 3 UCGUUCAAGAAAGAAAGCCUGUGGA Populus miR396 fgenesh4_pg. C_LG_1000725 GRF 3 CCGUUCAAGAAAGCCUGUGGA Populus miR396 fgenesh4_pg.C_LG_XI1000270 GRF 3 CCGUUCAAGAAAGAAAGCAUGUGGA Populus miR396 fgenesh4_pg. C_LG_XIV000034 GRF 3 UCGUUCAAGAAAGCCUGUGGA Populus miR396 fgenesh4_pm.C_scaffold_28000142 GRF 3 CCGUUCAAGAAAGAAAGCCUGUGGA Populus miR396 gwl .XIV.854.1 GRF 3 ACGUUCAGAAAGAAAGCUGUGGA Populus miR396 At2g40760.1 Rhodenase 2.5 miR396 Os05g25780 Rhodenase 3 AAAUUUAAGAGAGCUGUUGAU Oryza miR396 gwl.XIX.1660.1 Rhodenase 3 AAGUUCAAAGGAGCUGUUGAU Populus miR397 At2g29130.1 Laccase 0.5 miR397 At2g38080.1 Laccase 1 AGUCAACGCUGCACUUAAUGA Arabidopsis miR397 At5g60020.1 Laccase 1 AAUCAAUGCUGCACUUAAUGA Arabidopsis miR397 Os1g44330 Laccase 2 CAUCAACGCUGCAGUCAACGA Oryza miR397 Os01g61160 Laccase 3 CAUCAACGCGGCACUCAACCA Oryza miR397 OsO1 g62480 Laccase 2.5 CAUCAACGCCGCGCUCAACGA Oryza miR397 Os01g62490 Laccase 0.5 CAUCAACGCUGCGCUCAAUGA Oryza miR397 Os01g63180 Laccase 1.5 CAUCAACGCUGCGCUCAACAC Oryza miR397 Os2g51440 Laccase 3 CAUCAACGCUGGACUCACCAA Oryza miR395 Populus AAGUUUAAAGGAGCUGUGGAU Arabidopsis AAUCAAUGCUGCACUCAAUGA Arabidopsis miR397 OsO3g16610 Laccase 1.5 GAUCAACGCUGCGCUCAACGA Oryza miR397 Os05g38390 imiR397 Os05g38410 Laccase 2.5 GAUCAACGCGGCGCUCAACGA Oryza Laccase 1 CAUCAACGCUGCACUCAACGA Oryza miR397 Os05g38420 Laccase 1 CAUCAACGCUGCACUCAACGA Oryza miR397 OslgO01730 Laccase 2.5 miR397 Osl 1g48060 Laccase 1 125 CAUCAACGCCGCGCUCAACAC Oryza CAUCAACGCUGCACUGAAUGA Oryza miR397 Os12g01730 Laccase 2.5 CAUCAACGCCGCGCUCAACAC Oryza miR397 Os12915530 Laccase 2.5 CAUCAACGCCGCGCUCAACAC Oryza miR397 Os12g915680 Laccase 1.5 CAUCAACGCUGCGCUCAACAC Oryza miR397 estExtfgenesh4_pg.C_LG_X1 635 Laccase 1.5 miR397 estExtfgenesh4_pm.C_LG_V10293 Laccase 1 miR397 
estExt_fgenesh4pm.C_LG_VII 10291 Laccase 1.5 CAUCAAUGCUGCACUCAAUCA Populus miR397 estExtGenewisel_v .C_LG_XV13501 Laccase 0.5 GAUCAAUGCUGCACUCAAUGA Populus miR397 eugene3.0001 0449 Laccase 0.5 miR397 eugene3.00060812 Laccase 1 AAUCAACGCUGCACUCAAUAA Populus miR397 eugene3.00091222 Laccase 1 CAUCAACGCUGCACUAAAUGA Populus miR397 eugene3.00161066 Laccase 1.5 miR397 eugene3.01070064 Laccase 1 GAUCAACGCCGCACUCAAUGA Populus miR397 eugene3.04340001 Laccase 1 AAUCAACGCUGCACUCAAUAA Populus miR397 fgenesh4_pg.C_LG_IV001314 Laccase 1.5 CAUCAAUGCUGCACUCAACGA Populus miR397 fgenesh4_pg.C_LG_IX000614 Laccase 1.5 CAUCAAUGCUGCACUCAACGA Populus miR397 fgenesh4_pg.C_LG_IX001 228 Laccase 1.5 GAUCAAUGCUGCACUCAACGA Populus miR397 fgenesh4_pg.C_LG_VI000783 Laccase 1.5 GAUCAAUGCAGCACUCAAUGA Populus miR397 fgenesh4_pg.C_LG_XVI000990 Laccase 2.5 AAUCAACGCUGCUCUCGAUAA Populus miR397 fgenesh4_pg.C_scaffold_107000055 Laccase 1 GAUCAACGCCGCACUCAAUGA Populus miR397 fgenesh4_pm.C_LG_1000649 Laccase 1 UAUCAACGCUGCACUAAAUGA Populus miR397 fgenesh4_pm.C_LG_1000891 Laccase 2 AAUCAACGCAGCACUAAAUGA Populus miR397 grail3.0023027201 Laccase 1.5 GAUCAAUGCAGCACUCAAUGA Populus miR397 gw1.4300.5.1 Laccase 1.5 GAUCAAUGCUGCACUCAACGA Populus miR397 gwl..1 184.1 Laccase 1.5 GAUCAAUGCUGCACUCAACGA Populus miR397 gw1.1.247.1 Laccase 0.5 GAUCAAUGCUGCACUCAAUGA Populus miR397 gwl.VII.3595.1 Laccase 2.5 CAUCAAUGCUGCCCUCAACGA Populus miR397 gwl .V11.21 00.1 Laccase 3 GGUCAAUUCUGCACUCAAUCA Populus miR397 gwl.XI.3910.1 Laccase 1.5 GAUCAAUGCCGCACUCAAUGA Populus miR397 gwl.XI.3915.1 Laccase 1.5 GAUCAAUGCUGCCCUCAAUGA Populus miR398 Atlg08830.1 CSD 3 AAGGGGUUUCCUGAGAUCACA Arabidopsis miR398 At2g28190.1 CSD 4 UGCGGGUGACCUGGGAAACA Arabidopsis miR398 Os03g11960 CSD 4 UGUGGGCGACCUGGGAAACA Oryza miR398 Os08g44770 CSD 4 UGCGGGUGACCUGGGAAACA Oryza miR398 fgenesh4_pm.C_scaffold_1 63000009 CSD 4 UGCGGGUGACCUGGGAAACAU Populus miR398 gwl .IX.5030.1 CSD 4 UGCGGGUGACCUGGGAAACAU Populus miR398 Atlg15640.1 CytC oxidase 3 
AAGGUGUGACCUGAGAAUCACAArabidopsis miR398 Os01g42650 CytC oxidase 4 GCGCCGCGACCUGAGAGCACA Oryza miR399 At3g54700.1 P transporter 2 CAGGCCAGCUCUUCUUUGGCU Arabidopsis miR399 Os03g04360 P transporter 3 CGGGGCAGCUCUUCUUCGGGU Oryza miR399 Os08g45000 P transporter 0.5 CAGGGCAACUCUUCUUUGGCU Oryza miR399 Os109g30770 P transporter 3 CGGGGCAGCUCUUCUUCGGGU Oryza miR399 Os109g30790 P transporter 3 CGGGGCAGCUCUUCUUCGGGU Oryza miR399 estExt_fgenesh4_pm.C_LG_V0552 P transporter 2.5 CGGGCCAGCUCUUCUUUGGCU Populus miR399 eugene3.00051302 P transporter 2.5 CGGGCCAGCUCUUCUUUGGCU Populus miR399 eugene3.186960001 P transporter 2.5 CGGGCCAGCUCUUCUUUGGCU Populus miR399 fgenesh4_pg.C_scaffold_125000020 P transporter 1.5 CAGGGCAACUCUUCUUUGGGU Populus miR399 At2g33770.1 Ub 0.5 UAGAGCAAAUCUCCUUUGGCA Arabidopsis miR399 At2g33770.1 Ub 0.5 UAGGGCAAAUCUUCUUUGGCA Arabidopsis rniR399 At2g33770.1 Ub 0.5 UAGGGCAUAUCUCCUUUGGCA Arabidopsis miR399 Ub 0.5 UCGAGCAAAUCUCCUUUGGCA Arabidopsis At2g33770.1 126 CAUCAAUGCUGCACUCAAUCA Populus AAUCAACGCUGCACUCAACGA Populus GAUCAAUGCUGCACUCAAUGA Populus GAUCAAUGCUGCACUCAACGA Populus UUGGGCAAAUCUCCUUUGGCA Arabidopsis miR399 At2g33770.1 Ub 0.5 miR399 0s05g48390 Ub 1 CCGGGCAAAUCUCCUUUGGCA Oryza miR399 Os05g48390 Ub 1 CGUGGUAAUUCUCCUUUGGCA Oryza miR399 0s05g48390 Ub 0 CUGGGCAAAUCUCCUUUGGCA Oryza miR399 Os05g48390 Ub 0 UAGGGCAAAUCUCCUUUGGCA Oryza miR399 Os05g48390 Ub 2 UCGGGCAAAUCUCCUUUGGCA Oryza miR399 Os05g48390 Ub 2 UUGGGCAAAUCUCCUUUGGCA Oryza miR399 eugene3.00040513 Ub 0.5 CAGGGCAAAUCUUCUUUGGCA Populus miR399 eugene3.00040513 Ub 1.5 UAGGGCAAAUCUCUUUUGGCU Populus miR399 eugene3.00040513 Ub 3 AAGGAAAGAUCUUCUUUGGCA Populus miR399 eugene3.00040513 Ub 0.5 UUGGGCAAAUCUCCUUUGGCA Populus miR399 eugene3.00040513 Ub 1 miR399 240047 eugene3.01 Ub 0.5 miR399 eugene3.01240047 Ub 1 miR399 eugene3.01240047 Ub 2.5 miR399 240047 eugene3.01 Ub 1 miR399 eugene3.01240047 Ub 1.5 miR403 Atl 9g31280.1 AGO2 0 GGAGUUUGUGCGUGAAUCUAA Arabidopsis miR403 gw1.200.30.1 AGO2 0 GGAGUUUGUGCGUGAAUCUAA 
Populus
miR408 At2g30210.1 Laccase 3 ACCAGUGAAGAGGCUGUGCAG Arabidopsis
miR408 At5g05390.1 Laccase 2.5 GCCGGUGAAGAGGCUGUGCAA Arabidopsis
miR408 At5g07130.1 Laccase 2.5 GCCGGUGAAGAGGCUGUGCAG Arabidopsis
miR408 Os01g61160 Laccase 2.5 GCCGGUGAAGAGGCUGUGCAA Oryza
miR408 Os03g18640 Laccase 2.5 GCUAGUGAAGAGGCUGUGCAA Oryza
miR408 eugene3.00131222 Laccase 3 ACCAGUGAAGAGGCUGUGCAG Populus
miR408 eugene3.00191007 Laccase 2.5 GCCAGUGAGGAGGCUGUGCAG Populus
miR408 gw1.VIII.2100.1 Laccase 3 UCCAGUGAAGAGGCUGUGCAA Populus
miR408 At2g02850.1 Plantacyanin 1 CCAAGGGAAGAGGCAGUGCAU Arabidopsis
miR408 Os02g49850 Plantacyanin 1 CUCGGGGAAGAGGCAGUGCAU Oryza
miR408 Os03g15340 Plantacyanin 1 CCCAGGGAAGAGGCAGUGCAG Oryza
miR408 Os6g15600 Plantacyanin 0.5 GCCGGGGAAGAGGCAGUGCAA Oryza
miR408 estExt_fgenesh4_pm.C_LG_11I1 18 Plantacyanin 1.5 GCCAGGGAAGAUGCAGUGCGA Populus
UAGGGAAAAUCUCCUUUGGCA Populus
UAGGGCAAAUCUCCUUUGGCA Populus
UUGGGCAAAUCUCCUUUGGCA Populus
AAGGGCAGAUCUUCUUUGGCA Populus
UUGGGCAAAUCUCCUUUGGCA Populus
CAGGGCAAAUCUUCUUUGGCG Populus

MicroRNAs in plants

Brenda J. Reinhart,1 Earl G. Weinstein,1 Matthew W. Rhoades,1 Bonnie Bartel,2,3 and David P. Bartel1,3

1 Whitehead Institute for Biomedical Research, and Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts 02142, USA; 2 Department of Biochemistry and Cell Biology, Rice University, Houston, Texas 77005, USA

MicroRNAs (miRNAs) are an extensive class of ~22-nucleotide noncoding RNAs thought to regulate gene expression in metazoans. We find that miRNAs are also present in plants, indicating that this class of noncoding RNA arose early in eukaryotic evolution. In this paper 16 Arabidopsis miRNAs are described, many of which have differential expression patterns in development. Eight are absolutely conserved in the rice genome. The plant miRNA loci potentially encode stem-loop precursors similar to those processed by Dicer (a ribonuclease III) in animals.
Mutation of an Arabidopsis Dicer homolog, CARPEL FACTORY, prevents the accumulation of miRNAs, showing that similar mechanisms direct miRNA processing in plants and animals. The previously described roles of CARPEL FACTORY in the development of Arabidopsis embryos, leaves, and floral meristems suggest that the miRNAs could play regulatory roles in the development of plants as well as animals.

[Key Words: miRNA; siRNA; ncRNA; Dicer; CARPEL FACTORY]

Received May 6, 2002; revised version accepted May 22, 2002.

3 Corresponding authors. E-MAIL bartel@rice.edu; FAX (713) 348-5154. E-MAIL dbartel@wi.mit.edu; FAX (617) 258-6768. Article and publication are at http://www.genesdev.org/cgi/doi/10.1101/gad.1004402.

A growing body of evidence suggests that ~22-nucleotide (nt) noncoding RNA molecules play crucial roles as regulators of gene expression in eukaryotes. The first endogenous ~22-nt RNAs to be identified were lin-4 RNA and let-7 RNA, both of which are key regulatory molecules in the pathway controlling the timing of larval development in the nematode Caenorhabditis elegans (Lee et al. 1993; Reinhart et al. 2000). When these RNAs are expressed, they pair to sites within the 3' untranslated region (UTR) of target mRNAs, triggering the translational repression of the mRNA targets (Lee et al. 1993; Wightman et al. 1993; Reinhart et al. 2000; Slack et al. 2000). The mature lin-4 and let-7 RNAs are processed from the double-stranded region of RNA precursor transcripts by Dicer, a molecule with an N-terminal helicase and tandem C-terminal ribonuclease III domains (Bernstein et al. 2001; Grishok et al. 2001; Hutvágner et al. 2001; Ketting et al. 2001). Argonaute homologs also influence the accumulation of the lin-4 and let-7 RNAs, but their biochemical roles are unclear (Grishok et al. 2001). Argonaute family members have a PAZ domain, which may allow protein-protein interaction with Dicer, as well as a Piwi domain, whose function is unknown (Cerutti et al. 2000).

The lin-4 and let-7 regulatory RNAs are now recognized as the founding members of a large class of ~22-nt noncoding RNAs termed microRNAs (miRNAs), several of which are conserved from worms to humans (Pasquinelli et al. 2000; Lagos-Quintana et al. 2001; Lau et al. 2001; Lee and Ambros 2001). RNAs are classified as miRNAs if they share the following features with lin-4 and let-7 RNAs: (1) The mature form of the RNA is a 20-nt to 24-nt species that is usually detectable on Northern blots. (2) The RNA has the potential to pair to flanking genomic sequences, placing the mature miRNA within an imperfect RNA duplex thought to be needed for its processing from a longer precursor transcript. In addition, miRNAs are typically derived from a segment of the genome that is distinct from predicted protein-coding regions. Thus far, >150 tiny RNAs that satisfy these criteria have been identified in animals (Lagos-Quintana et al. 2001, 2002; Lau et al. 2001; Lee and Ambros 2001; Mourelatos et al. 2002). The abundance of the miRNA genes, their intriguing expression patterns in different tissues or in different stages of development, and their evolutionary conservation imply that, as a class, miRNAs have broad regulatory functions in addition to the known roles of lin-4 and let-7 RNAs in the temporal control of developmental events. In support of this idea, six of the recently identified Drosophila miRNAs are complementary to 3'-UTR elements known to confer posttranscriptional regulation in this species (Lai 2002).

MicroRNAs are not the only small RNAs processed by Dicer. Dicer was originally identified as a nuclease involved in the RNA interference (RNAi) pathway of animals (Bernstein et al. 2001). This method of RNA silencing is triggered by long double-stranded RNA (dsRNA), typically introduced by injection or expression from a transgene (Fire et al. 1998).
The dsRNA trigger is cleaved by Dicer into ~22-nt RNAs (Bernstein et al. 2001). These ~22-nt RNAs, known as small interfering RNAs (siRNAs), act as guide RNAs to target homologous mRNA sequences for destruction (Hammond et al. 2000; Zamore et al. 2000; Elbashir et al. 2001). RNAs ~25 nt in length are also associated with posttranscriptional gene silencing (PTGS) in plants, and it has been suggested that a Dicer-like activity also produces these small RNAs (Hamilton and Baulcombe 1999; Matzke et al. 2001; Vance and Vaucheret 2001). RNAi, PTGS, and quelling of Neurospora are related pathways that require a conserved set of proteins (Hutvágner and Zamore 2002). For example, PTGS requires ARGONAUTE (Fagard et al. 2000), the RNA-directed RNA polymerase SDE1/SGS2, which may amplify dsRNA used as a trigger for silencing (Dalmay et al. 2000; Mourrain et al. 2000), and the RNA helicase SDE3 (Dalmay et al. 2001). Some aspects of RNA silencing may be species-specific, such as the RNA-directed DNA methylation required to maintain transgene silencing in plants (Morel et al. 2000; Bender 2001). Although RNA silencing has been proposed to have evolved as a viral defense mechanism (Vance and Vaucheret 2001), it can clearly be used by organisms for the regulation of endogenous genes. The Drosophila Argonaute family member aubergine is involved in the endogenous RNAi-like silencing of Stellate by dsRNA produced from both DNA strands of the Suppressor of Stellate locus (Aravin et al. 2001). It is possible that other animals or plants also generate endogenous siRNAs for gene regulation in development.

GENES & DEVELOPMENT 16:1616-1626 © 2002 by Cold Spring Harbor Laboratory Press ISSN 0890-9369/02 $5.00; www.genesdev.org

To further examine the roles of small RNAs in the regulation of plant gene expression, we cloned endogenous RNAs from Arabidopsis. Here we describe 16 plant RNAs that have the defining features of miRNAs. The presence of miRNAs in plants greatly expands the known phylogenetic distribution of this class of tiny noncoding RNAs and indicates that miRNAs arose early in eukaryotic evolution, before the last common ancestor of plants and animals. The presence of miRNAs in plants also suggests that the developmental defects of carpel factory (caf), a mutation in a Dicer homolog (Jacobsen et al. 1999), and mutations in ARGONAUTE family proteins (Bohmert et al. 1998; Moussian et al. 1998) could result from miRNA processing defects. In fact, we find that the accumulation of plant miRNAs is substantially reduced in the caf mutant. The ancient origin of miRNAs, together with the potential link between miRNAs and development, implies that miRNAs might have played roles during the origins and evolution of both plant and animal multicellular life.

Results

Identification of Arabidopsis miRNAs

Using methods designed to clone Dicer cleavage products, which are 20-nt to 24-nt RNAs with 5'-phosphate and 3'-hydroxyl groups (Bernstein et al. 2001; Elbashir et al. 2001; Lau et al. 2001), ~200 tiny RNAs were cloned from Arabidopsis seedlings and ~100 were cloned from flowers. Of these, 18 sequences were represented by more than one clone and were the subject of further analysis. Of these 18 RNAs, 16 had striking similarities to the miRNAs of animals and have therefore been named miR156 through miR171, with the genes designated MIR156 through MIR171 (Table 1). Six of the miRNAs represent three pairs of closely related RNA sequences differing only by one or two nucleotides. Interestingly, most of the plant miRNAs begin with a U, a trend previously observed in animal miRNAs (Lagos-Quintana et al. 2001; Lau et al. 2001).

Five of the plant miRNA sequences have a single copy in the Arabidopsis genome, whereas each of the other 11 RNA sequences corresponds to multiple (2-7) loci (Table 1), most likely because of duplications in the Arabidopsis genome (The Arabidopsis Genome Initiative 2000). As expected for miRNA loci, nearly all (37 of 40) of the genomic loci lie outside of annotated segments of the genome, and thus do not correspond to previously identified genes. The three exceptions are for a single miRNA, miR171. Furthermore, each of these 37 loci places the cloned RNA sequence in a context where it can pair with a nearby genomic segment to form a dsRNA hairpin structure resembling those thought to be required for Dicer processing of miRNAs (Fig. 1; Supplemental data available online at http://www.genesdev.org). As with metazoans, the mature miRNA can be processed from either the 5' or the 3' arm of the fold-back precursor. Nevertheless, each miRNA with multiple matches to the genome is always present on the same arm of its potential precursors, suggesting that these loci share a common ancestry (see Supplemental data available online at http://www.genesdev.org). We do not know whether all of these loci are transcriptionally active or whether some might be pseudogenes.

The sizes of the predicted Arabidopsis hairpins are more variable than those of animals. For example, Caenorhabditis elegans miRNAs tend to be cleaved from precursors ~70 nt in length, with the mature miRNA located only ~2-10 bp from the terminal loop of the stem-loop (Lau et al. 2001). Although some of the Arabidopsis precursor predictions resemble those of C. elegans (Fig. 1), others are larger, as seen for the ~190-nt predicted precursor of miR169 (Fig. 1).

In other systems, only one of the RNA strands accumulates following Dicer processing of miRNAs from the double-stranded region of the precursor, while the remainder of the precursor quickly degrades (Hutvágner et al. 2001). As a result, RNA from only one side of the miRNA precursor is typically cloned or detected on Northern blots, although on rare occasions RNA from the other side of the precursor is identified (Lau et al. 2001; Mourelatos et al. 2002), particularly if many clones are sequenced (E.G. Weinstein and D.P. Bartel, unpubl.). In contrast, Dicer processing of perfectly complementary dsRNA molecules in the RNAi pathway is thought to produce two stable overlapping ~21-nt RNA molecules that pair to each other with ~2-nt 3' overhangs (Elbashir et al. 2001; Hutvágner et al. 2001; Nykänen et al. 2001). As expected, for most (14/16) of the plant miRNAs, we cloned sequences from

Table 1. MicroRNAs cloned from Arabidopsis

Columns: miRNA gene; No. of clones; miRNA sequence; miRNA length (nt); Oryza matches; Fold-back arm; Fold-back length; Chr.; Distance to nearest gene

MIR156a 16 UGACAGAAGAGAGUGAGCAC 20-21 10 5' 82 2 3.2 kb downstream of At2g25100 (s)
MIR156b 5' 80 4 0.36 kb upstream of At4g30970 (a)
MIR156c 5' 83 4 3.2 kb downstream of At4g31875 (s)
MIR156d 5' 86 5 2.6 kb upstream of At5g10940 (s)
MIR156e 5' 96 5 1.6 kb downstream of At5g11980 (s)
MIR156f 5' 90 5 1.3 kb downstream of At5g26150 (a)
MIR157a 9 UUGACAGAAGAUAGAGAGCAC 20-2 5' 91 1 1.8 kb downstream of At1g66780 (a)
MIR157b 5' 91 1 2.7 kb downstream of At1g66790 (a)
MIR157c 5' 165 3 2.3 kb downstream of At3g18215 (a)
MIR157d 5' 173 1 1.0 kb upstream of At1g48470 (s)
MIR158 8 UCCCAAAUGUAGACAAAGCA 20 3' 64 3 0.6 kb upstream of At3g10750 (s)
MIR159 8 UUUGGAUUGAAGGGAGCUCUA 21 3' 182 1 1.9 kb upstream of At1g73690 (s)
MIR160a 4 UGCCUGGCUCCCUGUAUGCCA 21 5' 78 2 4.0 kb downstream of At2g39180 (a)
MIR160b 5' 80 4 2.4 kb upstream of At4g17790 (a)
MIR160c 5' 81 5 1.5 kb upstream of At5g46850 (a)
5' 90 1 2.6 kb downstream of At1g48270 (a)
3' 85 5 1.2 kb upstream of At5g08190 (s)
3' 88 5 1.4 kb upstream of At5g23070 (s)
3' 303 1 0.6 kb upstream of At1g66730 (s)
5' 78 2 1.1 kb upstream of At2g47590 (s)
5' 149 5 2.4 kb upstream of At5g01750 (s)
3' 101 1 1.5 kb downstream of At1g01180 (a)
3' 136 4 2.8 kb
upstream of At4g00880 (s)
MIR161 MIR162a 16 UUGAAAGUGACUACAUCGGGG 20-21 3 UCGAUAAACCUCUGCAUCCAG 21 4 - 1
MIR162b MIR163 24 UUGAAGAGGACUUGGAACUUCGAU 24
MIR164a 21 UGGAGAAGCAGGGCACGUGCA 21 2 UCGGACCAGGCUUCAUCCCCC 2
MIR164b MIR165a 20-2 MIR165b
3' 136 2 4.7 kb upstream of At2g46690 (a)
MIR166b 3' 112 3 3.5 kb upstream of At3g61900 (a)
MIR166c 3' 108 5 10 kb downstream of At5g08690 (s)
MIR166d 3' 101 5 22 kb downstream of At5g08740 (a)
MIR166e 3' 135 5 2.6 kb downstream of At5g41910 (a)
MIR166f 3' 91 5 1.1 kb downstream of At5g43600 (s)
MIR166g 3' 90 5 1.5 kb upstream of At5g63720 (s)
5' 101 3 4.7 kb upstream of At3g22890 (a)
5' 90 3 0.19 kb downstream of At3g63370 (s)
5' 104 4 2.3 kb upstream of At4g19390 (a)
5' 89 5 0.5 kb downstream of At5g45310 (s)
5' 190 3 1.9 kb downstream of At3g13400 (a)
3' 64 5 0.5 kb downstream of At5g66040 (s)
3' 92 3 0.5 kb downstream of At3g51380 (a)
- - 2 in At2g45160 SCARECROW-like (a)
- - 3 in At3g60630 SCARECROW-like (a)
- - 4 in At4g00150 SCARECROW-like6 (a)
MIR166a MIR167a 5 19 UCGGACCAGGCUUCAUUCCCC UGAAGCUGCCAGCAUGAUCUA 21 21 6 3
MIR167b 3 UCGCUUGGUGCAGGUCGGGGA 21
MIR169 3 CAGCCAAGGAUGACUUGCCGA 21
MIR170 3 UGAUUGAGCCGUGUCAAUAUC 21
MIR171 10 UGAUUGAGCCGCGCCAAUAUC 21
MIR168a MIR168b 2a 5b

Some miRNAs are represented by clones of different lengths due to heterogeneity of the RNA ends. The sequence of the most abundant clone is shown. Both miR156 and miR161 clones were found with 5' or 3' heterogeneity. MIR160b and MIR161 each had one clone of the same size but in a register shifted 5' of the sequence shown by 2 and 8 nucleotides, respectively. The number of perfect matches to the available rice genomic sequence (Oryza matches) is indicated, as is the arm of the predicted stem-loop precursor that contains the miRNA (Fold-back arm) and the minimum number of nt that would be required to form a fold-back structure bounded by the miRNA and the segment of the predicted precursor that pairs to the miRNA (Fold-back length). Oryza fold-backs have the miRNA in the same arm as their Arabidopsis homologs (Supplemental data available online at http://www.genesdev.org). Chromosomal (Chr.) positions, distance to the nearest annotated gene, and the position of the miRNA, sense (s) and antisense (a), relative to the nearest gene are noted for all matches in the Arabidopsis genome.
a One of the miR169 Oryza matches is at the end of a contig, precluding prediction of a fold-back precursor structure.
b As with Arabidopsis, only one of the miR171 Oryza matches has a predicted fold-back characteristic of miRNAs.
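Two statements in the Results can be checked directly against the sequences in Table 1: that most of the plant miRNAs begin with a U, and that six of them form three pairs differing by only one or two nucleotides. The Python sketch below is only an illustration. The name-to-sequence pairing follows the order of the sequence column and is an assumption wherever the extracted table is ambiguous, and the helper `min_mismatches` is a hypothetical function written for this example, not a method from the paper.

```python
from itertools import combinations

# Sequence of the most abundant clone for each miRNA, in Table 1 order.
# The pairing of names to sequences is assumed from the column order.
MIRNAS = {
    "miR156": "UGACAGAAGAGAGUGAGCAC",
    "miR157": "UUGACAGAAGAUAGAGAGCAC",
    "miR158": "UCCCAAAUGUAGACAAAGCA",
    "miR159": "UUUGGAUUGAAGGGAGCUCUA",
    "miR160": "UGCCUGGCUCCCUGUAUGCCA",
    "miR161": "UUGAAAGUGACUACAUCGGGG",
    "miR162": "UCGAUAAACCUCUGCAUCCAG",
    "miR163": "UUGAAGAGGACUUGGAACUUCGAU",
    "miR164": "UGGAGAAGCAGGGCACGUGCA",
    "miR165": "UCGGACCAGGCUUCAUCCCCC",
    "miR166": "UCGGACCAGGCUUCAUUCCCC",
    "miR167": "UGAAGCUGCCAGCAUGAUCUA",
    "miR168": "UCGCUUGGUGCAGGUCGGGGA",
    "miR169": "CAGCCAAGGAUGACUUGCCGA",
    "miR170": "UGAUUGAGCCGUGUCAAUAUC",
    "miR171": "UGAUUGAGCCGCGCCAAUAUC",
}


def min_mismatches(a: str, b: str) -> int:
    """Fewest mismatches over all ungapped placements of the shorter
    sequence within the longer one (no insertions or deletions)."""
    short, long_ = sorted((a, b), key=len)
    return min(
        sum(x != y for x, y in zip(short, long_[offset:]))
        for offset in range(len(long_) - len(short) + 1)
    )


# miRNAs whose first nucleotide is U.
starts_with_u = [name for name, seq in MIRNAS.items() if seq.startswith("U")]

# Pairs that differ by at most two nucleotides in the best ungapped alignment.
close_pairs = [
    (m, n)
    for m, n in combinations(MIRNAS, 2)
    if min_mismatches(MIRNAS[m], MIRNAS[n]) <= 2
]

print(f"{len(starts_with_u)} of {len(MIRNAS)} sequences begin with U")
for m, n in close_pairs:
    print(m, n, min_mismatches(MIRNAS[m], MIRNAS[n]))
```

Sliding the shorter sequence along the longer one is what lets miR156 (20 nt) and miR157 (21 nt) be compared, since their shared region is offset by one nucleotide.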
[Figure 1 structure diagrams not reproduced; labeled panels include MIR156 and MIR170.]

Figure 1. Fold-back secondary structures of Arabidopsis miRNA predicted precursors as determined by the RNAfold program. The miRNA sequences are in red. For miR156 and miR169, RNAs from the other side of the fold-back (boxed in blue) were each cloned once. The duplexes that could form between these RNAs and the miRNA from the other strand have ~2-nt 3' overhangs characteristic of Dicer cleavage (Elbashir et al. 2001).

only one arm of the fold-back precursor. For two loci, we also cloned a single 21-nt sequence from the other arm of the fold-back (Fig. 1). The disparity in cloning frequency between the two sides, 16:1 in the case of MIR156, was similar to that seen for metazoan miRNAs (E.G. Weinstein and D.P. Bartel, unpubl.). The isolation of these two sequences generated from the opposite arm of the predicted fold-back supports the existence of these stem-loops as miRNA precursors. Furthermore, the duplexes that could be formed between the sequences isolated from both sides of the stems have 2-nt 3' overhangs (Fig. 1), suggesting that they are products of a Dicer-like activity similar to that which processes the metazoan miRNAs (E.G. Weinstein and D.P. Bartel, unpubl.).

The Arabidopsis miRNAs display developmental expression differences

Northern analysis confirmed that the 16 miRNAs were stably expressed as ~21-nt RNAs (Fig. 2). All are expressed at some level in seedlings, leaves, stems, flowers, and siliques (seed pods). Whereas miR163 accumulates in all tissues, with only slightly lower levels in seedlings and siliques, other miRNAs have quite variable levels among the tissues tested. For example, miR157 is most highly expressed in seedlings, and miR171 is most highly expressed in flowers, suggesting that they might play roles in the development of these stages/organs. The size of the RNAs detected approximately matches those that were cloned. In some cases, RNAs of two sizes can be detected, reflecting the heterogeneity of the cloned sequences (Table 1). For example, a probe to miR156 detects both 20-nt and 21-nt RNAs, and the miR156 clones were of both sizes. In another case, miR167, a 21-nt RNA accumulates in all tissues except stem, where a 22-nt RNA accumulates instead. This might reflect either differential transcription of the two MIR167 genes that have differently processed precursors or tissue-specific differences in the Arabidopsis miRNA processing machinery. We have not been able to reliably detect expression of RNAs in the size range of 60-200 nt that might correspond to the stem-loop precursors cleaved by Dicer.

[Figure 2 blot images not reproduced; legible panel labels: miR156, miR157, miR158, miR163, miR165, miR167, miR169, miR170, miR171, and U6; lanes M, Se, L, St, F, Si.]

Figure 2. Developmental expression of Arabidopsis miRNAs. Total RNA from Columbia seedlings (Se), leaves (L), stems (St), flowers (F), and siliques (Si) was analyzed on Northern blots by hybridization to end-labeled DNA oligonucleotide probes complementary to the miRNA. The lengths of end-labeled RNA oligonucleotides run as a size marker (M) are noted to the left of each panel. Although miR165 and miR166 sequences and miR170 and miR171 sequences are too closely related to be reliably distinguished by hybridization probes, miR156 and miR157 should be specifically recognized (Lau et al. 2001), as reflected in their different levels of expression in seedlings and siliques. miR159 and miR164 show a similar expression profile to miR165, whereas miR160, miR162, and miR168 have similar profiles to miR158 (data not shown). The low expression level of most miRNAs in leaves and siliques might reflect a difference in the efficiency of small RNA recovery with the RNA isolation method used for these two tissues (see Materials and Methods). Blots were stripped and reprobed with an oligonucleotide probe complementary to U6 as a loading control.

Arabidopsis miRNAs are produced by CARPEL FACTORY

Although the presence of precursors in Arabidopsis was not detected on Northern blots, the potential for their production prompted us to investigate whether the ~21-nt miRNAs might be processed from a longer dsRNA by proteins homologous to those that generate metazoan miRNAs. Dicer is thought to cleave the double-stranded region of the miRNA precursors in Drosophila, C. elegans, and humans (Grishok et al. 2001; Hutvágner et al. 2001; Ketting et al. 2001; Lee and Ambros 2001). Mutations have been isolated in only one of the four Dicer homologs in Arabidopsis, CARPEL FACTORY (CAF; also named SHORT INTEGUMENT [SIN1]; GenBank accession no. AAG38019). The pleiotropic phenotypes associated with loss of CAF/SIN1 function, such as floral meristem proliferation defects, floral organ morphogenesis defects, and altered ovule development, emphasize the critical developmental role of RNAs processed by CAF (Robinson-Beers et al. 1992; Ray et al. 1996a,b; Jacobsen et al. 1999). Northern analysis showed that the expression level of the three miRNAs tested is significantly reduced in carpel factory homozygotes (Fig. 3). Although the level of miRNA precursors is increased when Dicer function is reduced in metazoans (Grishok et al. 2001; Hutvágner et al. 2001; Ketting et al. 2001; Lee and Ambros 2001), we have not detected precursor accumulation in caf mutants (Fig. 3; data not shown).

[Figure 3 blot images not reproduced; legible panel labels: miR169, miR156, miR158, and U6; lanes M, L, St, F for CAF/CAF, CAF/caf, and caf/caf.]

Figure 3. Expression of miR169 is dependent on CARPEL FACTORY. Total RNA from wild-type Landsberg erecta (CAF/CAF), heterozygous (CAF/caf), and homozygous (caf/caf) carpel factory leaves (L), stems (St), and flowers (F) was analyzed on a Northern blot. RNA size markers (M) are noted to the left. The blot probed for miR158 was stripped and reprobed with a U6 end-labeled DNA probe as a loading control.

Evolutionary conservation of Arabidopsis miRNAs in Oryza

The evolutionary conservation of miRNA sequences in different species indicates that they have important biological functions (Pasquinelli et al. 2000; Lagos-Quintana et al. 2001; Lau et al. 2001; Lee and Ambros 2001). Eight Arabidopsis miRNAs have sets of identical matches in the genome of the rice Oryza sativa L. ssp. indica (Table 1), which was estimated to have 92% functional coverage at the time of our analysis (Yu et al. 2002). With rare exceptions (noted in Table 1), these sets of Oryza homologs have adjacent sequences that could form stem-loop precursors analogous to those of Arabidopsis, with the miRNA sequence invariably on the same arm of the precursor in both species (see Supplemental data available online at http://www.genesdev.org). The Arabidopsis and Oryza sequences have drifted considerably in regions outside the miRNA sequence, but selective pressure can be seen in the segments predicted to base-pair with the miRNAs, resulting in only a few base changes in these segments and a conserved overall propensity for dsRNA formation (Fig. 4). For each set of related loci, the precursor duplexes extend beyond the length of the miRNA, but the sequence of the flanking duplex RNA is variable (see Supplemental data available online at http://www.genesdev.org). This conservation in secondary structure accompanied by variability in sequence provides added evidence that the secondary structural context of these RNAs is important, presumably for their processing from stem-loop precursors.

An miRNA complementary to three related mRNAs

In nematodes, lin-4 and let-7 RNA recognize their target mRNAs through limited base-pairing to complementary sites within the 3' UTR of their targets. The largest regions of uninterrupted complementarity are only ~8 nt (Lee et al. 1993; Wightman et al. 1993; Reinhart et al. 2000; Slack et al. 2000). Consistent with this precedent, the plant miRNA sequences do not perfectly match coding regions, with the exception of miR171, which has four matches to the genome. One locus is 0.5 kb from the nearest predicted coding region and adjacent to genomic sequence that can form a classical miRNA precursor, consistent with the idea that it is a true miRNA. Further supporting this idea is the observation that a closely related sequence, miR170, was also cloned multiple times and has all the characteristics of the other plant miRNAs. However, the other three MIR171 loci differ from those of the other miRNAs (Table 1). They are anti-sense to the coding region of three SCARECROW-like genes of the GRAS family of putative transcription factors (DiLaurenzio et al. 1996; Pysh et al. 1999). This is the first example of a convincing miRNA candidate that is also the perfect anti-sense match to a coding region. Although this miR171 sequence identity might be a coincidence, the targets of this 21-nt RNA could include these three SCARECROW-like genes. miR171 (and perhaps the related miRNA, miR170) might act like a translational regulator similar to the lin-4 and let-7 RNAs, or it might pair with these three genes for a very different type of regulatory interaction. miR171 could direct cleavage of the messages as if it were an siRNA of the RNAi pathway, or it could direct a nucleic acid modification such as the methylation of genomic DNA seen in PTGS and transcriptional gene silencing of plants. Interestingly, the five perfect matches to miR171 in Oryza also include one miRNA homolog and four anti-sense matches to SCARECROW family members. This observation raises the possibility that these SCARECROW segments might be conserved based on their function as miRNA targets in addition to their function in coding proteins.
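The anti-sense relationship described above can be made concrete with a short Python sketch. It derives the mRNA site that would pair perfectly with miR171 by reverse-complementing the miR171 sequence from Table 1, then scores miR170 against the same site. The scoring scheme (0 for a Watson-Crick pair, 0.5 for a G:U wobble, 1 for any other mispair) is an assumption chosen for illustration and is not stated in this paper; `reverse_complement` and `pairing_score` are hypothetical helper names.

```python
# Sequences as listed in Table 1.
MIR171 = "UGAUUGAGCCGCGCCAAUAUC"
MIR170 = "UGAUUGAGCCGUGUCAAUAUC"

COMPLEMENT = str.maketrans("ACGU", "UGCA")
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def reverse_complement(rna: str) -> str:
    """Return the RNA strand that pairs perfectly with `rna`, written 5' to 3'."""
    return rna.translate(COMPLEMENT)[::-1]


def pairing_score(mirna: str, site: str) -> float:
    """Score an ungapped miRNA:site duplex of equal length.

    Assumed scheme for illustration only: Watson-Crick pair = 0,
    G:U wobble = 0.5, any other mispair = 1.
    """
    if len(mirna) != len(site):
        raise ValueError("this sketch handles equal-length, ungapped duplexes only")
    score = 0.0
    # The strands are antiparallel: miRNA position i pairs with site position L-1-i.
    for base, partner in zip(mirna, reversed(site)):
        if (base, partner) in WATSON_CRICK:
            continue
        score += 0.5 if (base, partner) in WOBBLE else 1.0
    return score


# The mRNA segment that is a perfect anti-sense match to miR171.
site = reverse_complement(MIR171)
print(site)
print(pairing_score(MIR171, site))
print(pairing_score(MIR170, site))
```

Under these assumptions the two positions at which miR170 differs from miR171 both fall opposite a G in the site, so they pair as G:U wobbles. That is consistent with the suggestion that miR170 might recognize the same messages.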
Figure 4. Conservation between the Arabidopsis and Oryza predicted stem-loop precursors. (A) miR162 homologs (MIR162a and MIR162b from Arabidopsis; MIR162 from Oryza). (B) miR164 homologs (MIR164a and MIR164b from Arabidopsis; MIR164a and MIR164b from Oryza). Sequence homology is seen within the miRNA (in red), its paired sequences, and a few base pairs adjacent to the miRNA. The remainder of the sequence has drifted considerably, with the main constraint being the formation of a stem-loop structure.
Other endogenous small RNAs

The other two RNAs cloned multiple times, Seq C and Seq F in Figure 5, are not likely to be miRNAs. Expression of Seq F but not Seq C can be detected on Northern blots (data not shown). Nonetheless, neither appears to have the potential to form extended pairing with the adjoining sequence like that seen for the other 16 sequences. Interestingly, both of these sequences match single loci in the same 2.3-kb region of Chromosome 2 that is also the source of four other ~22-nt RNAs that we cloned once (Fig. 5). These RNAs are unlikely to be simply degradation products of mRNAs. Only two of these six sequences correspond to the same DNA strand as the two predicted protein-coding genes in this 2.3-kb region. Moreover, one of the single-clone RNAs (Fig. 5, Seq B) is a 2-nt-offset reverse-complement of Seq C. A duplex formed between them would have 1-nt and 2-nt 3' overhangs, reminiscent of Dicer cleavage products during RNAi (Elbashir et al. 2001). The high density of 21-nt to 22-nt RNAs cloned from this region implicates either endogenous RNAi or some other, unknown Dicer-mediated event.

Figure 5. A cluster of small RNAs derived from Chromosome 2. Arrows represent the two predicted genes in this region (At2g39680 and At2g39670), and vertical lines represent the genomic positions of the six cloned RNAs. Sequences of the RNAs are listed, with cloning frequencies in parentheses.

  A  UUCAAUAAAUAAUUGGUUCUA   (1)
  B  GAACUAGAAAAGACAUUGGAC   (1)
  C  UCCAAUGUCUUUUCUAGUUCGU  (3)
  D  AGAGUAAGAUGGAUCUUGAUAA  (1)
  E  UAUAUCCCAUUUCUACCAUCUG  (1)
  F  UCCAAGCGAAUGAUGAUACUU   (3)

Discussion

We have described 16 plant miRNAs that have the characteristic features of metazoan miRNAs. Like the miRNAs of animals, the plant miRNAs are 20-nt to 24-nt endogenous RNAs detectable on Northern blots and are derived from one arm of an apparent stem-loop precursor through the action of Dicer.
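As an aside on the Chromosome 2 cluster, the stated relationship between Seq B and Seq C (a 2-nt-offset reverse complement giving 1-nt and 2-nt 3' overhangs) can be checked directly from the sequences listed in Figure 5. The sketch below is illustrative only; the function names are hypothetical, and it considers perfect pairing without G:U wobble.

```python
# Minimal sketch: find the longest perfectly paired register between two small RNAs
# and report the resulting 3' overhangs. Function names are illustrative assumptions.
COMPLEMENT = str.maketrans("ACGU", "UGCA")

SEQ_B = "GAACUAGAAAAGACAUUGGAC"    # Figure 5, Seq B (21 nt)
SEQ_C = "UCCAAUGUCUUUUCUAGUUCGU"   # Figure 5, Seq C (22 nt)


def reverse_complement(rna):
    return rna.translate(COMPLEMENT)[::-1]


def best_perfect_duplex(a, b):
    """Pair the 5' end of a against the reverse complement of b at every offset.

    Returns the register with the most paired bases, or None if nothing pairs.
    An offset of k leaves the last k bases of b unpaired (a 3' overhang on b);
    bases of a beyond the paired stretch form a 3' overhang on a.
    """
    rc_b = reverse_complement(b)
    best = None
    for offset in range(len(rc_b)):
        paired = min(len(a), len(rc_b) - offset)
        if a[:paired] == rc_b[offset:offset + paired]:
            if best is None or paired > best["paired"]:
                best = {
                    "offset": offset,
                    "paired": paired,
                    "a_3p_overhang": len(a) - paired,
                    "b_3p_overhang": offset,
                }
    return best


print(best_perfect_duplex(SEQ_B, SEQ_C))
# {'offset': 2, 'paired': 20, 'a_3p_overhang': 1, 'b_3p_overhang': 2}
```

With the Figure 5 sequences, the best register pairs 20 bases and leaves a 1-nt 3' overhang on Seq B and a 2-nt 3' overhang on Seq C, in agreement with the text.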
As with most of the metazoan miRNAs, most plant miRNAs begin with a U, are transcribed from independent genes, and are evolutionarily conserved. The discovery that the phylogenetic distribution of miRNAs extends to plants indicates that miRNAs arose early in eukaryotic evolution and suggests that they have been shaping gene expression since the emergence of multicellular life. Although the evolution of the RNAi and PTGS pathways and their related proteins has been attributed to defense against viruses and transposons (Ketting and Plasterk 2000; Vance and Vaucheret 2001), the presence of miRNAs in plants suggests that Dicer and Argonaute proteins also have ancient roles in miRNA processing and function.

One difference between plant and animal miRNAs is the dsRNA precursor from which the mature miRNAs are cleaved. Based on the length of RNA that would be necessary to allow the miRNA to be incorporated into an RNA duplex suitable for Dicer cleavage, we predict that plant miRNA precursors can be more than three times as large as those of animals (Table 1). However, we have not detected plant precursor molecules during our Northern analysis of wild-type or caf RNA. Our method may not be sufficiently sensitive to detect very low levels of precursors. Perhaps precursor transcripts are more rapidly cleaved and turned over in Arabidopsis than in metazoans, or plant precursors might be too large or diffuse in size for Northern analysis techniques optimized for the resolution of the ~21-nt mature RNAs. For instance, plant miRNAs might be processed cotranscriptionally, directly from transient primary transcripts. This would be in contrast to metazoan miRNAs, which often appear to be processed from metastable stem-loop precursors that have been preprocessed from a primary transcript (Lau et al.
2001). Although the common role of Dicer homologs in the production of plant and animal miRNAs highlights the similarities between their mechanisms of production, there might be differences in the structure and production of precursors, cellular compartmentalization, timing of precursor processing, or types of cofactors involved in processing.

The increasing number of miRNAs being identified raises the question of what their cellular functions are. Although some might regulate translation via base-pairing to target gene 3' UTRs in a manner similar to regulation by lin-4 and let-7 RNAs, it is not clear whether all will be found to perform similar biochemical functions. One hint that miRNAs could perform other types of RNA-mediated gene regulation is our finding that miR171 could interact with the coding region of three GRAS family transcription factors through perfect complementarity rather than the limited base-pairing seen between lin-4 and let-7 and the 3' UTRs of their targets. If these genes are regulatory targets of miR171, the miRNA could act like other ~21-nt regulatory RNAs and direct mRNA degradation or epigenetic modification of the genomic sequence.

A role for the miRNAs in development of both plants and animals is suggested by the phenotypes of Dicer and Argonaute family mutants. In C. elegans, developmental defects resulting from reduction of function of dcr-1 (Dicer) and alg-1/alg-2 (Argonaute-like gene) have been attributed to the improper processing of miRNA precursors and a reduction in mature miRNA expression (Grishok et al. 2001). The mutant animals essentially reiterate stem-cell-like divisions and delay the switch to a later-stage developmental program. An intriguing parallel in Arabidopsis is that mutant alleles of caf/sin1 delay the meristem switch from vegetative to floral development (Ray et al. 1996a) and cause overproliferation of the floral meristem (Jacobsen et al.
1999), which suggests a distant link between the pathways affected by Dicer mutants in plants and animals. Mutations in two Arabidopsis Argonaute family genes also alter meristem development. The argonaute mutants disrupt axillary shoot meristem formation and leaf development (Bohmert et al. 1998), and ZWILLE/PINHEAD is required for shoot meristem maintenance and floral development (Moussian et al. 1998; Lynn et al. 1999). The existence of miRNAs in plants suggests that aberrant processing of miRNAs could be responsible for some if not all of the developmental defects in caf mutants, and it is possible that the same will be true for argonaute or zwille/pinhead mutants. However, ARGONAUTE is also required for PTGS (Fagard et al. 2000; Morel et al. 2002), and a related protein is required for RNAi in animals (Tabara et al. 1999; Hammond et al. 2001; Williams and Rubin 2002). In fact, the Drosophila Argonaute family member aubergine, a gene required for oogenesis (Schupbach and Wieschaus 1991), is involved in the endogenous RNAi-like silencing of Stellate by dsRNA produced from both DNA strands of the Suppressor of Stellate locus (Aravin et al. 2001), raising the possibility that the Arabidopsis argonaute or caf phenotypes reflect the role of these proteins in the production of endogenous siRNAs that control gene expression. Further investigation of the roles of small RNAs such as those from the Chromosome 2 cluster (Fig. 5) will address this possibility.

Finally, we suspect that other classes of Dicer- and Argonaute-dependent small RNAs are present in Arabidopsis. Noncoding RNAs continue to be discovered in a wide range of organisms, and the roles they play in the cell are only beginning to be understood (Eddy 2001). In many ways, the most interesting possibility is that no one class of RNAs can be responsible for the phenotypes of Dicer and Argonaute family mutations because organisms use such a rich variety of RNA-mediated gene regulation in their development.

Materials and methods

Plant growth and RNA isolation

Total RNA from wild-type Arabidopsis thaliana (Columbia accession) was isolated from 6-day-old seedlings grown on agar-based medium overlaid with filter paper and from flowers and stems of 4-week-old plants grown in soil using Trizol (GIBCO BRL). Total RNA was prepared from leaves and siliques using a modification of the method described in Nagy et al. (1988), in which the LiCl precipitation was replaced by ethanol precipitation. For isolation of RNA from carpel factory plants, progeny of CAF/caf heterozygous plants (in the Landsberg erecta accession) were grown on medium supplemented with 12 µg/mL kanamycin for 8 d, after which kanamycin-resistant individuals were transferred to soil and grown for an additional 24 d under continuous illumination. Plants were then scored as having (caf/caf) or lacking (CAF/caf) the carpel factory phenotype (Jacobsen et al. 1999), and RNA was prepared from leaves, stems, and flowers using a modification of the Nagy et al. (1988) method (see above). Wild-type plants (Landsberg erecta accession) were processed similarly, except that seeds were originally sown on medium lacking kanamycin.

RNA analysis

Endogenous 18-nt to 26-nt RNAs from seedlings and flowers were isolated from total RNA by 15% PAGE and cloned as described (Lau et al. 2001). The laboratory protocol is available at http://web.wi.mit.edu/bartel/pub/. For Northern analysis, 20 µg of total RNA per lane was separated on a 15% polyacrylamide gel, electroblotted to a nylon membrane, and hybridized to end-labeled anti-sense DNA probes (Lee et al. 1993).

Sequence analysis

Sequences of RNA clones were compared with the Arabidopsis genome downloaded from ftp://ncbi.nlm.nih.gov/genbank/genomes/A_thaliana/ (13-Aug-2001). Predicted secondary structures were generated using the Zuker folding algorithm and manually inspected for fold-backs with the RNA sequence in the stem as is characteristic of metazoan miRNAs (Lau et al. 2001). To identify Oryza sativa homologs, the miRNAs were compared with the rice genome sequence downloaded from the Beijing Genomics Institute Web site at http://btn.genomics.org.cn/rice (first draft) using the BLAST algorithm, and the adjoining sequences were analyzed for fold-back secondary structures as described above.

Acknowledgments

We thank the Arabidopsis Biological Resource Center at Ohio State University for seeds segregating for the caf mutation, Nelson Lau for reagents and advice in the cloning of endogenous Dicer products, Lee Lim for advice on bioinformatics, and Phil Zamore and members of the Bartel laboratories for comments on the manuscript. This research was supported in part by the Robert A. Welch Foundation (C-1309). The publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 USC section 1734 solely to indicate this fact.

References

The Arabidopsis Genome Initiative. 2000. Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408: 796-815.
Aravin, A.A., Naumova, N.M., Tulin, A.A., Rozovsky, Y.M., and Gvozdev, V.A. 2001. Double-stranded RNA-mediated silencing of genomic tandem repeats and transposable elements in Drosophila melanogaster germline. Curr. Biol. 11: 1017-1027.
Bender, J. 2001. A vicious cycle: RNA silencing and DNA methylation in plants. Cell 106: 129-132.
Bernstein, E., Caudy, A.A., Hammond, S.M., and Hannon, G.J. 2001. Role for a bidentate ribonuclease in the initiation step of RNA interference. Nature 409: 295-296.
Bohmert, K., Camus, I., Bellini, C., Bouchez, D., Caboche, M., and Benning, C. 1998.
AGO1 defines a novel locus of Arabidopsis controlling leaf development. EMBO J. 17: 170-180.
Cerutti, L., Mian, N., and Bateman, A. 2000. Domains in gene silencing and cell differentiation proteins: The novel PAZ domain and redefinition of the Piwi domain. Trends Biochem. Sci. 25: 481-482.
Dalmay, T., Hamilton, A., Rudd, S., Angell, S., and Baulcombe, D.C. 2000. An RNA-dependent RNA polymerase in Arabidopsis is required for posttranscriptional gene silencing mediated by a transgene but not by a virus. Cell 101: 543-553.
Dalmay, T., Horsefield, R., Braunstein, T.H., and Baulcombe, D.C. 2001. SDE3 encodes an RNA helicase required for posttranscriptional gene silencing in Arabidopsis. EMBO J. 20: 2069-2078.
DiLaurenzio, L., Wysocka-Diller, J., Malamy, J.E., Pysh, L., Helariutta, Y., Freshour, G., Hahn, M.G., Feldmann, K.A., and Benfey, P.N. 1996. The SCARECROW gene regulates an asymmetric cell division that is essential for generating the radial organization of the Arabidopsis root. Cell 86: 423-433.
Eddy, S.R. 2001. Non-coding RNA genes and the modern RNA world. Nat. Rev. Genet. 2: 919-929.
Elbashir, S.M., Lendeckel, W., and Tuschl, T. 2001. RNA interference is mediated by 21- and 22-nucleotide RNAs. Genes & Dev. 15: 188-200.
Fagard, M., Boutet, S., Morel, J.-B., Bellini, C., and Vaucheret, H. 2000. AGO1, QDE-2, and RDE-1 are related proteins required for post-transcriptional gene silencing in plants, quelling in fungi, and RNA interference in animals. Proc. Natl. Acad. Sci. 97: 11650-11654.
Fire, A., Xu, S., Montgomery, M.K., Kostas, S.A., Driver, S.E., and Mello, C.C. 1998. Potent and specific genetic interference by double-stranded RNA in Caenorhabditis elegans. Nature 391: 806-811.
Grishok, A., Pasquinelli, A.E., Conte, D., Li, N., Parrish, S., Ha, I., Baillie, D.L., Fire, A., Ruvkun, G., and Mello, C.C. 2001. Genes and mechanisms related to RNA interference regulate expression of the small temporal RNAs that control C. elegans developmental timing. Cell 106: 23-34.
Hamilton, A.J. and Baulcombe, D.C. 1999. A novel species of small antisense RNA in posttranscriptional gene silencing. Science 286: 950-952.
Hammond, S.M., Bernstein, E., Beach, D., and Hannon, G.J. 2000. An RNA-directed nuclease mediates posttranscriptional gene silencing in Drosophila cells. Nature 404: 293-296.
Hammond, S.M., Boettcher, S., Caudy, A.A., Kobayashi, R., and Hannon, G.J. 2001. Argonaute2, a link between genetic and biochemical analyses of RNAi. Science 293: 1146-1150.
Hutvágner, G. and Zamore, P.D. 2002. RNAi: Nature abhors a double-strand. Curr. Opin. Genet. Dev. 12: 225-232.
Hutvágner, G., McLachlan, J., Pasquinelli, A.E., Balint, E., Tuschl, T., and Zamore, P.D. 2001. A cellular function for the RNA-interference enzyme Dicer in the maturation of the let-7 small temporal RNA. Science 293: 834-838.
Jacobsen, S.E., Running, M.P., and Meyerowitz, E.M. 1999. Disruption of an RNA helicase/RNAseIII gene in Arabidopsis causes unregulated cell division in floral meristems. Development 126: 5231-5243.
Ketting, R.F. and Plasterk, R.H.A. 2000. A genetic link between co-suppression and RNA interference in C. elegans. Nature 404: 296-298.
Ketting, R.F., Fischer, S.E.J., Bernstein, E., Sijen, T., Hannon, G.J., and Plasterk, R.H.A. 2001. Dicer functions in RNA interference and in synthesis of small RNA involved in developmental timing in C. elegans. Genes & Dev. 15: 2654-2659.
Lagos-Quintana, M., Rauhut, R., Lendeckel, W., and Tuschl, T. 2001. Identification of novel genes coding for small expressed RNAs. Science 294: 853-858.
Lagos-Quintana, M., Rauhut, R., Yalcin, A., Meyer, J., Lendeckel, W., and Tuschl, T. 2002. Identification of tissue-specific microRNAs from mouse. Curr. Biol. 12: 735-739.
Lai, E.C. 2002. MicroRNAs are complementary to 3' UTR motifs that mediate negative post-transcriptional regulation. Nat. Genet. 30: 363-364.
Lau, N.C., Lim, L.P., Weinstein, E.G., and Bartel, D.P. 2001. An abundant class of tiny RNAs with probable regulatory roles in Caenorhabditis elegans. Science 294: 858-862.
Lee, R.C. and Ambros, V. 2001. An extensive class of small RNAs in Caenorhabditis elegans. Science 294: 862-864.
Lee, R.C., Feinbaum, R.L., and Ambros, V. 1993. The C. elegans heterochronic gene lin-4 encodes small RNAs with antisense complementarity to lin-14. Cell 75: 843-854.
Lynn, K., Fernandez, A., Aida, M., Sedbrook, J., Tasaka, M., Masson, P., and Barton, M.K. 1999. The PINHEAD/ZWILLE gene acts pleiotropically in Arabidopsis development and has overlapping functions with the ARGONAUTE1 gene. Development 126: 469-481.
Matzke, M.A., Matzke, A.J., Pruss, G.J., and Vance, V.B. 2001. RNA-based silencing strategies in plants. Curr. Opin. Genet. Dev. 11: 221-227.
Morel, J., Mourrain, P., Béclin, C., and Vaucheret, H. 2000. DNA methylation and chromatin structure affect transcriptional and posttranscriptional transgene silencing in Arabidopsis. Curr. Biol. 10: 1591-1594.
Morel, J.B., Godon, C., Mourrain, P., Béclin, C., Boutet, S., Feuerbach, F., Proux, F., and Vaucheret, H. 2002. Fertile hypomorphic ARGONAUTE (ago1) mutants impaired in posttranscriptional gene silencing and virus resistance. Plant Cell 14: 629-639.
Mourelatos, Z., Dostie, J., Paushkin, S., Sharma, A., Charroux, B., Abel, L., Rappsilber, J., Mann, M., and Dreyfuss, G. 2002. miRNPs: A novel class of ribonucleoproteins containing numerous microRNAs. Genes & Dev. 16: 720-728.
Mourrain, P., Béclin, C., Elmayan, T., Feuerbach, F., Godon, C., Morel, J.B., Jouette, D., Lacombe, A.M., Nikic, S., Picault, N., et al. 2000. Arabidopsis SGS2 and SGS3 genes are required for posttranscriptional gene silencing and natural virus resistance. Cell 101: 533-542.
Moussian, B., Schoof, H., Haecker, A., Jürgens, G., and Laux, T. 1998. Role of the ZWILLE gene in the regulation of central shoot meristem cell fate during Arabidopsis embryogenesis. EMBO J. 17: 1799-1809.
Nagy, F., Kay, S.A., and Chua, N.-H. 1988. Analysis of gene expression in transgenic plants. In Plant molecular biology manual (ed. S.B. Gelvin and R.A. Schilperoort), Part B4, pp. 1-29. Kluwer, Dordrecht.
Nykänen, A., Haley, B., and Zamore, P.D. 2001. ATP requirements and small interfering RNA structure in the RNA interference pathway. Cell 107: 309-321.
Pasquinelli, A.E., Reinhart, B.J., Slack, F., Martindale, M.Q., Kuroda, M., Maller, B., Srinivasan, A., Fishman, M., Hayward, D., Ball, E., et al. 2000. Conservation across animal phylogeny of the sequence and temporal regulation of the 21 nucleotide let-7 heterochronic regulatory RNA. Nature 408: 86-89.
Pysh, L.D., Wysocka-Diller, J.W., Camilleri, C., Bouchez, D., and Benfey, P.N. 1999. The GRAS gene family in Arabidopsis: Sequence characterization and basic expression analysis of the SCARECROW-LIKE genes. Plant J. 18: 111-119.
Ray, A., Lang, J.D., Golden, T., and Ray, S. 1996a. SHORT INTEGUMENT (SIN1), a gene required for ovule development in Arabidopsis, also controls flowering time. Development 122: 2631-2638.
Ray, S., Golden, T., and Ray, A. 1996b. Maternal effects of the short integument mutation on embryo development. Dev. Biol. 180: 365-369.
Reinhart, B.J., Slack, F.J., Basson, M., Bettinger, J.C., Pasquinelli, A.E., Rougvie, A.E., Horvitz, H.R., and Ruvkun, G. 2000. The 21 nucleotide let-7 RNA regulates developmental timing in Caenorhabditis elegans. Nature 403: 901-906.
Robinson-Beers, K., Pruitt, R.E., and Gasser, C.S. 1992. Ovule development in wild-type Arabidopsis and two female-sterile mutants. Plant Cell 4: 1237-1249.
Schupbach, T. and Wieschaus, E. 1991. Female sterile mutations on the second chromosome of Drosophila melanogaster II. Mutations blocking oogenesis or altering egg morphology. Genetics 129: 1119-1136.
Slack, F.J., Basson, M., Liu, Z., Ambros, V., Horvitz, H.R., and Ruvkun, G. 2000. The lin-41 RBCC gene acts in the C.
elegans heterochronic pathway between the let-7 regulatory RNA and the LIN-29 transcription factor. Mol. Cell 5: 659-669.
Tabara, H., Sarkissian, M., Kelly, W.G., Fleenor, J., Grishok, A., Timmons, L., Fire, A., and Mello, C.C. 1999. The rde-1 gene, RNA interference, and transposon silencing in C. elegans. Cell 99: 123-132.
Vance, V. and Vaucheret, H. 2001. RNA silencing in plants: Defense and counterdefense. Science 292: 2277-2280.
Wightman, B., Ha, I., and Ruvkun, G. 1993. Posttranscriptional regulation of the heterochronic gene lin-14 by lin-4 mediates temporal pattern formation in C. elegans. Cell 75: 855-862.
Williams, R.W. and Rubin, G.M. 2002. ARGONAUTE1 is required for efficient RNA interference in Drosophila embryos. Proc. Natl. Acad. Sci. 99: 6889-6894.
Yu, J., Hu, S., Wang, J., Wong, G.K., Li, S., Liu, B., Deng, Y., Dai, L., Zhou, Y., Zhang, X., et al. 2002. A draft sequence of the rice genome (Oryza sativa L. ssp. indica). Science 296: 79-92.
Zamore, P.D., Tuschl, T., Sharp, P.A., and Bartel, D.P. 2000. RNAi: Double-stranded RNA directs the ATP-dependent cleavage of mRNA at 21 to 23 nucleotide intervals. Cell 101: 25-33.

Cell, Vol. 115, 787-798, December 26, 2003, Copyright © 2003 by Cell Press

Prediction of Mammalian MicroRNA Targets

Benjamin P. Lewis,1,4 I-hung Shih,2,4 Matthew W. Jones-Rhoades,1,2 David P. Bartel,1,2,* and Christopher B. Burge1,*
1 Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139
2 Whitehead Institute for Biomedical Research, 9 Cambridge Center, Cambridge, Massachusetts 02142
*Correspondence: dbartel@wi.mit.edu (D.P.B.), cburge@mit.edu (C.B.B.)
4 These authors contributed equally to this work.

Summary

MicroRNAs (miRNAs) can play important gene regulatory roles in nematodes, insects, and plants by basepairing to mRNAs to specify posttranscriptional repression of these messages. However, the mRNAs regulated by vertebrate miRNAs are all unknown. Here we predict more than 400 regulatory target genes for the conserved vertebrate miRNAs by identifying mRNAs with conserved pairing to the 5' region of the miRNA and evaluating the number and quality of these complementary sites. Rigorous tests using shuffled miRNA controls supported a majority of these predictions, with the fraction of false positives estimated at 31% for targets identified in human, mouse, and rat and 22% for targets identified in pufferfish as well as mammals. Eleven predicted targets (out of 15 tested) were supported experimentally using a HeLa cell reporter system. The predicted regulatory targets of mammalian miRNAs were enriched for genes involved in transcriptional regulation but also encompassed an unexpectedly broad range of other functions.

Introduction

MicroRNAs are endogenous ~22 nt RNAs that can play important gene regulatory roles by pairing to the messages of protein-coding genes to specify mRNA cleavage or repression of productive translation (Lai, 2003; Bartel, 2004). The first to be discovered were the lin-4 and let-7 miRNAs, which are components of the gene regulatory network that controls the timing of C. elegans larval development (Lee et al., 1993; Wightman et al., 1993; Moss et al., 1997; Reinhart et al., 2000; Abrahante et al., 2003; Lin et al., 2003). More recently discovered miRNA functions include the control of cell proliferation, cell death, and fat metabolism in flies (Brennecke et al., 2003; Xu et al., 2003) and the control of leaf and flower development in plants (Aukerman and Sakai, 2003; Chen, 2003; Emery et al., 2003; Palatnik et al., 2003). MicroRNA genes are one of the more abundant classes of regulatory genes in animals, estimated to comprise between 0.5 and 1 percent of the predicted genes in worms, flies, and humans, raising the prospect that they could have many more regulatory functions than those uncovered to date (Lagos-Quintana et al., 2001; Lau et al., 2001; Lee and Ambros, 2001; Lai et al., 2003; Lim et al., 2003a, 2003b).

The regulatory roles of the vertebrate miRNAs in particular remain unknown. The possibility that many mammalian miRNAs play important roles during development and other processes is supported by their tissue-specific or developmental stage-specific expression patterns as well as their evolutionary conservation, which is very strong within mammals and often extends to invertebrate homologs (Pasquinelli et al., 2000; Aravin et al., 2001; Lagos-Quintana et al., 2001, 2002, 2003; Lau et al., 2001; Lee and Ambros, 2001; Ambros et al., 2003b; Dostie et al., 2003; Houbaviy et al., 2003; Krichevsky et al., 2003; Lai et al., 2003; Lim et al., 2003a, 2003b; Moss and Tang, 2003). Indeed, miR-181, one of the many miRNAs conserved among vertebrates, is preferentially expressed in the B lymphocytes of mouse bone marrow, and the ectopic expression of this miRNA in hematopoietic stem/progenitor cells modulates blood cell development such that the proportion of B lymphocytes increases (Chen et al., 2003). However, regulatory targets have not been established or even confidently predicted for any of the vertebrate miRNAs, which has slowed progress toward understanding the functions of these tiny noncoding RNAs in humans and other vertebrates.

Finding regulatory targets is much easier for the plant miRNAs. In a systematic search for the targets of 13 Arabidopsis miRNA families, 49 unique targets were found with a signal-to-noise ratio exceeding 10:1, simply by looking for Arabidopsis messages with near-perfect complementarity to the miRNAs (Rhoades et al., 2002).
Confidence in many of these predictions was bolstered by the observation that the complementarity is conserved among rice orthologs of the miRNAs and messages (Rhoades et al., 2002), and many of the 49 have since been confirmed experimentally (Llave et al., 2002; Emery et al., 2003; Kasschau et al., 2003; Tang et al., 2003). These predicted targets were greatly enriched in transcription factors involved in developmental patterning or stem cell maintenance and identity, suggesting that many plant miRNAs function during cellular differentiation to clear regulatory gene transcripts from daughter cell lineages, perhaps enabling more rapid differentiation without having to depend on regulatory genes having constitutively unstable messages (Rhoades et al., 2002).

An analogous search for near-perfect pairing between the miRNAs and messages of C. elegans and Drosophila genes did not uncover more hits than would be expected by chance (Rhoades et al., 2002). More sophisticated methods for predicting targets of insect miRNAs have recently been published (Stark et al., 2003) or submitted (Enright et al., http://genomebiology.com/2003/4/11/P8). The method of Stark et al. (2003) provides lists of candidate target genes that, when used in combination with additional biological criteria, including functional relationships shared among predicted targets of individual miRNAs, led to validation of six targets for two Drosophila miRNAs (Stark et al., 2003). The current Drosophila analyses do not include estimates of false positive rates, leaving open the question of the accuracy of these methods in cases where predicted targets of a miRNA do not have clear functional relatedness.

In the present study, we describe an approach that predicts hundreds of mammalian miRNA targets and provide computational and experimental evidence that most are authentic, allowing us to begin to explore fundamental questions about miRNA:target relationships in animals.
Pairing to the 5' portion of the miRNA, particularly nucleotides 2-8, appears to be most important for target recognition by vertebrate miRNAs. As seen previously for plant miRNAs, the predicted regulatory targets of mammalian miRNAs are enriched for genes involved in transcriptional regulation. In addition, the predicted mammalian regulatory targets encompass an unexpectedly broad range of other functions. Indeed, several lines of evidence imply that the targets identified in this initial analysis are only a fraction of the total, supporting the possibility that miRNAs regulate the expression of a large portion of the mammalian transcriptome.

[Figure 1A: predicted duplexes between human miR-26a and two target sites in the SMAD-1 3' UTR, with folding free energies of -17.0 kcal/mol and -21.8 kcal/mol, giving Z = e^(17.0/20) + e^(21.8/20) = 5.3.]

Results and Discussion

An Algorithm for Predicting Vertebrate MicroRNA Targets

To identify the targets of vertebrate miRNAs, we developed an algorithm called TargetScan (the TargetScan software is available for download at http://genes.mit.edu/targetscan), which combines thermodynamics-based modeling of RNA:RNA duplex interactions with comparative sequence analysis to predict miRNA targets conserved across multiple genomes (Figure 1).
Given an miRNA that is conserved in multiple organisms and a set of orthologous 3' UTR sequences from these organisms, TargetScan (1) searches the UTRs in the first organism for segments of perfect Watson-Crick complementarity to bases 2-8 of the miRNA, numbered from the 5' end (we refer to this 7 nt segment of the miRNA as the "miRNA seed" and to UTR heptamers with perfect Watson-Crick complementarity to the seed as "seed matches"); (2) extends each seed match with additional base pairs to the miRNA as far as possible in each direction, allowing G:U pairs, but stopping at mismatches; (3) optimizes basepairing of the remaining 3' portion of the miRNA to the 35 bases of the UTR immediately 5' of each seed match using the RNAfold program (Hofacker et al., 1994), thus extending each seed match to a longer "target site"; (4) assigns a folding free energy G to each such miRNA:target site interaction (ignoring initiation free energy) using RNAeval (Hofacker et al., 1994); (5) assigns a Z score to each UTR, defined as:

$$Z = \sum_{k=1}^{n} e^{-G_k/T}$$

where n is the number of seed matches in the UTR, $G_k$ is the free energy of the miRNA:target site interaction (kcal/mol) for the kth target site evaluated in the previous step, and T is a parameter described below (UTRs that have no seed match are assigned a Z score of 1.0); (6) sorts the UTRs in this organism by Z score and assigns a rank Ri to each; (7) repeats this process for the set of UTRs from each organism; and (8) predicts as targets those genes for which both Zi ≥ Zc and Ri ≤ Rc for an orthologous UTR sequence in each organism, where Zc and Rc are pre-chosen Z score and rank cutoffs.

Figure 1. Prediction of miRNA Targets
(A) Structures, energies, and scoring for predicted RNA duplexes involving human miR-26a and two target sites in the 3' UTR of the human SMAD-1 gene, with seeds and seed matches in red and seed extension in blue.
(B) Schematic for identification of targets conserved across mammals (upper) and targets conserved in mammals and fish (lower). The number of genes from each organism with identified orthologs in every other organism is indicated.
(C) Positions of two target sites for miR-26a (blue) in orthologous SMAD-1 3' UTR sequences from human (Hs), mouse (Mm), rat (Rn), and Fugu (Fr), with the Z score and rank of each miRNA:UTR pair, with T = 20. [Panel values as extracted: Z = 5.3, 4.8, 4.9, 5.2; rank = 45, 72, 76, 16.]

The only free parameters in this protocol are Rc and Zc, and the T parameter in the formula relating predicted free energy to Z score. The value of the T parameter influences the relative weighting of UTRs with fewer high-affinity target sites to those with larger numbers of low-affinity target sites, and in this sense is analogous to temperature. However, there is no thermodynamic meaning to the T parameter or the Z scores used in this analysis; they merely provide a convenient means of weighting and summing predicted folding free energies. Suitable values for Rc, Zc, and T were assigned by optimization over a range of reasonable values using separate training and test sets of miRNAs.

TargetScan was initially applied using two sets of miRNAs: a nonredundant pan-mammalian set of 79 miRNAs that have homologs in human, mouse, and pufferfish and identical sequence in human and mouse, but not necessarily pufferfish, and a nonredundant pan-vertebrate set of 55 miRNAs that have identical sequence in human, mouse, and pufferfish (Lagos-Quintana et al., 2001, 2002, 2003; Mourelatos et al., 2002; Dostie et al., 2003; Lim et al., 2003a). These sets, referred to as nrMamm and nrVert, respectively (Supplemental Table S1 at http://www.cell.com/cgi/content/full/115/7/787/DC1), are nonredundant in that when multiple miRNAs had identical seed heptamers, a single representative was chosen. The initial use of miRNAs that were both nonredundant and perfectly conserved among the queried species simplified the analysis of signal to noise.

Prediction of 400 Targets of Mammalian MicroRNAs at a Signal:Noise Ratio of 3.2:1

To predict mammalian miRNA targets, the nrMamm set of miRNAs was searched against orthologous human, mouse, and rat 3' UTRs derived from the Ensembl classification of orthologous genes. Using Rc = 200, Zc = 4.5, and T = 20, TargetScan identified 451 putative miRNA:target interactions (representing 400 distinct genes), an average of 5.7 targets per miRNA (Figure 2A). This number of predicted targets (the "signal") was compared to the number of targets predicted for cohorts of shuffled (i.e., randomly permuted) miRNAs (the "noise"). As described below, these shuffled sequences were carefully screened to ensure that our estimates of noise were as accurate as possible and not artifactually low. An average of only 1.8 targets were identified per shuffled miRNA sequence, for a signal:noise ratio of 3.2:1. This ratio was higher than the roughly 2:1 ratio observed for targets of the nrMamm miRNA set predicted using only the human and mouse UTRs (Figure 2A), underscoring the importance of evolutionary conservation across multiple genomes in our approach. The signal:noise ratio improved to 4.6:1 when conservation was required additionally in the fourth and most divergent species, Fugu rubripes, using the nrVert set of miRNAs (Figure 2A).

Although the signal:noise ratio improved as more genomes were included, the number of predicted targets per miRNA decreased, even though Rc and Zc were relaxed to 350 and 4.5, respectively, and the value T = 10 was used for the four-species analysis (Figure 2A). Several factors might contribute to this effect, including the increased chance that an orthologous gene will be missing from the annotations of one genome as the number of organisms is increased. For example, the number of ortholog pairs available in human-mouse, 17166, decreased to 14539 ortholog sets in human-mouse-rat and 10276 ortholog sets in human-mouse-rat-Fugu. In addition, some miRNA:target interactions might not be conserved between mammals and fish. Another likely factor is that some features used by TargetScan to achieve an acceptable signal:noise ratio might not be strictly required for miRNA regulation. For example, although most known invertebrate miRNA target sites have 7 nt Watson-Crick seed matches (or longer matches), some do not, such as lin-41, a target of the C. elegans let-7 miRNA (Lee et al., 1993; Wightman et al., 1993; Moss et al., 1997; Reinhart et al., 2000; Abrahante et al., 2003; Brennecke et al., 2003; Lin et al., 2003). Thus, increasing the number of species increases the probability that the orthologous UTR of one or more species harbors functional sites that fail to satisfy the criteria required for TargetScan detection. Nonetheless, in 115 cases involving the UTRs of 107 genes, the predicted target sites were sufficiently conserved to be detected by TargetScan in orthologous UTRs from all four vertebrates (details of these predictions are given in Supplemental Table S5 and Figure S1A on the Cell website).

It is of utmost importance in this type of bioinformatic analysis to ensure that the shuffled control sequences preserve all relevant compositional features of the authentic miRNAs. For example, when compared to the seeds of shuffled cohorts that had not been screened to control for the expected number of target sites and the expected strength of miRNA:target site interactions, the seeds of vertebrate miRNAs have approximately 1.4 times as many seed matches in vertebrate UTRs. Specifically, the seeds of vertebrate miRNAs each had an average of about 2100 perfect-complement matches in masked vertebrate UTR regions whereas random heptamers with the same base composition averaged only about 1500 matches. The high number of additional matches seen for the miRNA seed (and also for the antisense of the seed) argues strongly against the biological significance of most of these matches. Instead, these excess matches appear to be the consequence of dinucleotide composition biases shared between vertebrate miRNAs and UTRs, which must be controlled for in order to avoid artificially high estimates of TargetScan signal:noise ratios (particularly in an algorithm that looks for multiple matches). Therefore, it was important to ensure that the shuffled miRNA controls matched the corresponding miRNAs closely in all sequence properties that impact the expected number and quality of TargetScan target sites. The properties we considered were (1) the expected frequency of seed matches in the UTR dataset; (2) the expected frequency of matching to the 3' end of the miRNA; (3) the observed count of seed matches in the UTR dataset; and (4) the predicted free energy of a seed:seed match duplex. A miRNA shuffling protocol, MiRshuffle, was developed to generate randomized control sequences that possess all of these properties. For a given miRNA sequence, MiRshuffle generates a series of random permutations with the same length and base composition as the miRNA, until a shuffled sequence is found that matches the parent miRNA closely in each of the four criteria listed above.
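The seed-matching, scoring, and cutoff steps of the TargetScan procedure described in this section can be illustrated with a minimal Python sketch. This is not the TargetScan software: steps 2-4 (seed extension, RNAfold, RNAeval) are left out, and the folding free energies are simply passed in as numbers. The function names are hypothetical. The miR-26a sequence and the short SMAD-1 stretch are read off the Figure 1A duplex drawing and should be treated as an assumed transcription; the energies (-17.0 and -21.8 kcal/mol), the cutoffs (Rc = 200, Zc = 4.5, T = 20), and the 5.7 versus 1.8 targets per miRNA are the values quoted in the text.

```python
import math

# Watson-Crick partners only; G:U wobble pairs are not allowed within the seed.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def seed(mirna):
    """Bases 2-8 of the miRNA, numbered from the 5' end."""
    return mirna[1:8]


def seed_match_positions(utr, mirna):
    """0-based start positions of UTR heptamers that pair perfectly with the seed."""
    # The pairing partner on the mRNA is the reverse complement of the seed.
    site = "".join(COMPLEMENT[base] for base in reversed(seed(mirna)))
    return [i for i in range(len(utr) - len(site) + 1)
            if utr[i:i + len(site)] == site]


def z_score(free_energies, t):
    """Z = sum over target sites of exp(-G_k / T); 1.0 when there is no seed match."""
    if not free_energies:
        return 1.0
    return sum(math.exp(-g / t) for g in free_energies)


def passes_cutoffs(z_by_species, rank_by_species, z_cutoff, rank_cutoff):
    """Step 8: require Z >= Zc and rank <= Rc for the orthologous UTR in every species."""
    return (all(z >= z_cutoff for z in z_by_species)
            and all(r <= rank_cutoff for r in rank_by_species))


def signal_to_noise(targets_per_mirna, targets_per_shuffled):
    """Ratio of mean targets per authentic miRNA to mean targets per shuffled control."""
    return targets_per_mirna / targets_per_shuffled


# Assumed transcription of miR-26a (5'->3') from the Figure 1A drawing.
MIR_26A = "UUCAAGUAAUCCAGGAUAGGCU"
# Short stretch of one SMAD-1 site as it appears in the Figure 1A text.
SMAD1_FRAGMENT = "GAUAAUACUUGAC"

positions = seed_match_positions(SMAD1_FRAGMENT, MIR_26A)
z_smad1 = z_score([-17.0, -21.8], t=20)
print(seed(MIR_26A), positions, round(z_smad1, 1))
print(round(signal_to_noise(5.7, 1.8), 1))
```

Because each site contributes exp(-G/T), a larger T flattens the difference between strong and weak sites, which is the weighting role the text assigns to the T parameter.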
The MiRshuffle procedure calculated expected frequencies using a first-order Markov model of 3' UTR composition that accounts for the long-recognized impact of dinucleotide frequency biases on the counts of longer oligonucleotides (Nussinov, 1981). As an additional control, another shuffling protocol was developed, DiMiRshuffle, which preserves the precise dinucleotide composition of both the seed and the 3' end of the miRNA, as well as the seed match count and seed:seed match folding free energy. This protocol is less general than MiRshuffle in that not every oligonucleotide can be randomized while preserving exact dinucleotide composition; e.g., the only heptamer with the same dinucleotide composition as the miR-100 seed, ACCCGUA, is ACCCGUA itself. Nevertheless, it was possible to generate DiMiRshuffled controls for 47 of the 79 nrMamm miRNAs, and a signal:noise ratio of 3.5 was observed using this control in the three-mammal analysis (data not shown), comparable to the value obtained for MiRshuffled controls. Because of its wider applicability, MiRshuffle was used in all reported experiments.

Figure 2. Predicted miRNA Targets Conserved in Multiple Genomes
(A) Mean number of predicted targets per miRNA for authentic miRNAs (filled bars) and mean and standard error of number of predicted targets per shuffled sequence for four cohorts of randomized miRNAs (open bars). Genomes used for identification of targets are listed below corresponding bars. The nrMamm set of 79 miRNAs was used for human/mouse and human/mouse/rat; the nrVert set of 55 miRNAs was used for human/mouse/rat/Fugu.
(B) Mean number of targets per miRNA using the human/mouse/rat UTR set and alternative miRNA seed positions for the nrVert miRNAs (filled bars) and for cohorts of shuffled controls (open bars). Positions of seed heptamer are indicated under bars; positive numbers indicate position relative to 5' end of miRNA, negative numbers indicate positions relative to 3' end of miRNA. Note that the signal:noise for the seed at 2..8 differs slightly from that of the human/mouse/rat analysis in (A) because a different set of miRNAs was used.
(C) Conserved heptamers among paralogous human miRNAs. For each position, the number of different heptamers that are perfectly conserved across multiple miRNAs in rMamm is shown.

In summary, even when the shuffled control sequences were carefully selected to closely match the corresponding miRNAs in all sequence properties expected to influence the number and quality of target sites, these shuffled controls yielded far fewer targets than did the authentic miRNA sequences. This difference results from an increased propensity of vertebrate UTRs to contain multiple conserved regions of complementarity to authentic miRNAs. We conclude that this propensity reflects a functional relationship between the miRNAs and the identified UTRs; that is, to the extent that the signal exceeds the noise, these identified UTRs are the regulatory targets of the miRNAs. Correcting for the estimated rate of false positives, TargetScan appears to have identified an average of 5.7 - 1.8 = 3.9 true targets conserved across mammals per miRNA (Figure 2A). A number of factors limit the sensitivity of our method, including (1) the incompleteness of orthologous gene annotations; (2) the possibility that some targets do not meet our stringent seed matching, Z score, or rank criteria; (3) the possibility that some mammalian target sites lie outside the 3' UTR, as often observed for plant miRNAs (Rhoades et al., 2002); (4) the requirement that targets be conserved in the complete set of organisms; and (5) the limitation that our method does not model the simultaneous interaction of multiple miRNA species with the same UTR. Thus, the actual number of target genes regulated by each miRNA is likely to be substantially higher.

The Conserved 5' Region of Mammalian MicroRNAs Is Most Important for Target Identification

TargetScan treats the 5' and 3' ends of miRNAs differently, with perfect basepairing required for the seed at the 5' end, but no such requirement at the 3' end. The importance of complementarity to the 5' portion of invertebrate miRNAs has been suspected since the observation that complementary sites within the lin-14 mRNA have "core elements" of complementarity to the 5' segment of the lin-4 miRNA (Wightman et al., 1993) and has been corroborated with the observation that the 5' segments of numerous invertebrate miRNAs are perfectly complementary to 3' UTR elements that mediate posttranscriptional regulation or are known miRNA targets (Lai, 2002; Stark et al., 2003). Moreover, the 5' ends of related miRNAs tend to be better conserved than the 3' ends (Lim et al., 2003b), further supporting the hypothesis that these segments are most critical for mRNA recognition.

To explore this hypothesis, TargetScan was applied to predict targets of the nrVert miRNA set conserved between human, mouse, and rat using versions of the algorithm differing in the miRNA heptamer defined as the seed in step 1 (Figure 2B). Consistent with residues at the 5' end of miRNAs being most important for target recognition, the highest signal:noise ratio was observed when the seed was positioned at or near the extreme 5' end of the miRNA, with signal:noise values of 2.7, 3.4, and 1.6 observed for seeds at segments 1..7, 2..8, and 3..9, respectively, and signal:noise ratios of 1.3 or less at other seed positions. We suggest that the critical importance of pairing to segment 2..8 for target identification in silico reflects its importance for target recognition in vivo and speculate that this segment nucleates pairing between miRNAs and mRNAs.

Those seed positions that had the highest signal:noise ratios in the sliding seed analysis (Figure 2B) also had the highest degree of heptamer conservation in paralogous human miRNAs (Figure 2C). This observation strengthens the assertion that the signal seen above noise in our analysis reflects a functional relationship between the miRNAs and the identified UTRs because otherwise it would be difficult to explain why the most conserved portions of the miRNA and not other miRNA segments have the greatest propensity to match multiple conserved segments in UTRs.

The Number of Predicted Targets Is Greatest for the Most Highly Conserved MicroRNAs

The set of target genes predicted using conservation of miRNA complementarity across the three mammals was most suitable in size and quality for systematic analysis of gene function. To obtain as large a set of targets as possible, we searched our set of orthologous mammalian 3' UTRs using an expanded set of 121 conserved mammalian miRNAs (rMamm, Supplemental Table S1 on Cell website) that includes miRNAs that were excluded from the nrMamm set because they had redundant seeds, yielding a total of 854 predicted miRNA:UTR pairs conserved across human, mouse, and rat (Supplemental Figure S1B). This number of predicted targets (854) represents an 89% increase over the 451 targets predicted for the nrMamm miRNAs, even though the number of miRNAs used increased by only 53% from 79 to 121. This discrepancy prompted us to ask whether membership in a multi-miRNA gene family influenced the abundance of targets. Indeed, we found that the 27 miRNAs in nrMamm that were members of paralogous miRNA families, i.e., families with variant miRNAs that have the same seed, had an average of 8.7 predicted targets per miRNA, more than twice the average of 4.2 seen for the remaining 52 nrMamm miRNAs, although the difference in signal:noise between these two sets was not as pronounced.

When initially expanding our list of mammalian miRNAs, we found that the set of 19 mammalian miRNAs that were conserved between human and rodents but for which a Fugu homolog was not found gave an unacceptably low signal:noise ratio of 1.2:1, even though the analysis did not extend to the Fugu UTRs. Accordingly, the rMamm set was restricted to those miRNAs with recognized Fugu homologs. The higher signal seen for the more broadly conserved miRNAs can be explained by the idea that miRNAs with larger numbers of targets would be under greater selective constraint, and therefore less likely to change during the course of evolution. Thus, more broadly conserved miRNAs would be likely to have more targets and consequently a higher TargetScan signal. This observation again supports the conclusion that TargetScan is detecting authentic targets because otherwise it would be difficult to explain the observed difference in signal:noise for broadly conserved miRNAs relative to that of less broadly conserved miRNAs.

The 854 miRNA:UTR pairs represented UTRs of just 442 distinct genes because many genes were hit by multiple miRNAs. In these cases, the miRNAs were usually, but not always, from the same paralogous miRNA family, often with the same seed heptamer. In those cases where the same UTR was hit by multiple miRNAs from different families (54 genes), the target sites generally did not overlap, consistent with simultaneous binding and regulation of some target genes by combinations of miRNAs. A complete list of the 442 target genes and the corresponding miRNAs is provided (Supplemental Figure S1B and Table S2 on the Cell website). An abbreviated list appears as Table 1, where genes were chosen on the basis of high biological interest. Genes involved in transcription, signal transduction, and cell-cell signaling dominate this list, including a number of human disease genes such as the tumor suppressor gene PTEN and the protooncogenes E2F-1, N-MYC, C-KIT, FLI-1, and LIF.

Table 1. Highly Cited Predicted Targets of Mammalian miRNAs
Categories, in order: Regulation of transcription/DNA binding; Signal transduction/cell-cell signaling; Other.

Seed    | miRNAs               | Ensembl ID | Gene Name
AGUGCAA | miR-130,-130b        | 169057 | Methyl-CpG-binding protein 2 (MECP2)
GUGCAAA | miR-19a              | 169057 | "
AAAGUGC | miR-20,-106          | 101412 | Transcription factor E2F1
GAGGUAG | let-7(a-g,i),miR-98  | 100823 | DNA-(apurinic or apyrimidinic site) lyase (APEN)
GAAAUGU | miR-203              | 125347 | Interferon regulatory factor 1 (IRF-1)
ACAGUAC | miR-101              | 134323 | N-MYC protooncogene protein
GAGGUAU | miR-202              | 134323 | "
AAUCUCA | miR-216              | 065978 | Nuclease sensitive element binding protein 1 (YB-1)
UAAGGCA | miR-124a             | 163403 | Microphthalmia-associated transcription factor
GCUGGUG | miR-138              | 054598 | Forkhead box protein C1 (FKHL7)
AAAGUGC | miR-20,-106          | 103479 | Retinoblastoma-like protein 2 (RBR-2)
UCCAGUU | miR-145              | 151702 | Friend leukemia integration 1 transcription factor (FLI-1)
GCAGCAU | miR-103,-107         | 137309 | High mobility group protein HMG-I/HMG-Y (HMG-I(Y))
GGAAGAC | miR-7                | 136826 | Kruppel-like factor 4 (EZF)
UAAGGCA | miR-124a             | 168610 | Signal transducer and act. of transcription 3 (STAT3)
UGGUCCC | miR-133,-133b        | 010610 | T cell surface glycoprotein CD4 precursor
UCACAUU | miR-23a,-23b         | 107562 | Stromal cell-derived factor 1 precursor (SDF-1)
GCUACAU | miR-221,-222         | 157404 | Mast/stem cell growth factor receptor precursor (C-KIT)
GGAAUGU | miR-1,-206           | 176697 | Brain-derived neurotrophic factor precursor (BDNF)
UAAGGCA | miR-124a             | 154188 | Angiopoietin-1 precursor (ANG-1)
GGCAGUG | miR-34               | 148400 | Notch homolog protein 1 precursor (HN1)
CCCUGAG | miR-125a,-125b       | 128342 | Leukemia inhibitory factor precursor (LIF)
AGUGCAA | miR-130,-130b        | 184371 | Macrophage colony stimulating factor-1 precursor (MCSF)
UCACAGU | miR-27a              | 184371 | "
AAUACUG | miR-200b             | 008710 | Polycystin 1 precursor
GAAAUGU | miR-203              | 122641 | Inhibin beta A chain precursor (EDF)
AUUGCAC | miR-25,-92           | 065559 | Dual spec. mitogen-activated protein kinase kinase 4
GCUGGUG | miR-138              | 070886 | Ephrin type-A receptor 8 precursor (HEK3)
GUAAACA | miR-30(a-e)          | 156052 | Guanine nucleotide-binding protein G(i), alpha-2 subunit
AUUGCAC | miR-25,-92           | 156052 | "
GAGAACU | miR-146              | 175104 | TNF receptor-associated factor 6 (TRAF6)
GGCUCAG | miR-24               | 166484 | Mitogen-activated protein kinase 7 (ERK4)
GAGAUGA | miR-143              | 166484 | "
AGCUGCC | miR-22               | 166484 | "
GCAGCAU | miR-103,-107         | 141433 | Pituitary adenylate cyclase act. polypeptide precursor
GUGCAAA | miR-19a,-19b         | 171862 | Phosphatidylinositol-3,4,5-trisphos. 3-phosphatase (PTEN)
AGUGCAA | miR-130,-130b        | 130164 | Low-density lipoprotein receptor precursor (LDLR)
GGAAUGU | miR-1,-206           | 160211 | Glucose-6-phosphate 1-dehydrogenase (G6PD)
UUGGCAC | miR-96               | 101986 | Adrenoleukodystrophy protein (ALDP)
AGCACCA | miR-29b,-29c         | 168542 | Collagen alpha 1(III) chain precursor
AGCACCA | miR-29b,-29c         | 114270 | Collagen alpha 1(VII) chain precursor
AUUGCAC | miR-25,-92           | 168090 | COP9 subunit 6
AAGUGCU | miR-93               | 168090 | "
AAAGUGC | miR-20,-106          | 168090 | "
CCCUGAG | miR-125a,-125b       | 160613 | Proprotein convertase subtilisin/kexin type 7 precursor

The 442 predicted targets conserved between human, mouse and rat were ranked based on the number of references listed in the RefSeq GenBank flatfiles (11/10/03 download). The top 37 most referenced predicted targets are shown, grouped on the basis of Gene Ontology annotations. The last six digits of the Ensembl ID are shown (ENSG00000#). MicroRNAs with different seeds that target the same UTR are listed on separate lines.

Experimental Support for 11 Predicted Regulatory Targets

Reporter assays were used to test 15 predicted targets of mammalian miRNAs in HeLa cells. The 15 targets selected for these experiments all had known biological functions but resembled the complete set of predictions in other respects, e.g., there was no significant difference in the average Z score, rank, or number of target sites per mRNA between the tested targets and the complete set of predicted targets. In only one case did the tested targets of a miRNA have obvious functional relatedness (NOTCH1, a receptor for DELTA1, both predicted targets of miR-34). Three of the 15 genes, SMAD-1, BRN-3b, and Notch1, were also in the set of predicted targets conserved to Fugu. Eight genes were predicted targets of miRNAs that had been cloned from HeLa cells (Lagos-Quintana et al., 2001; Mourelatos et al., 2002), and three genes were predicted targets of miR-34, which is also expressed in HeLa cells, based on Northern analysis (data not shown). For these 11 genes, a 100 to 1200 nt 3' UTR segment that included miRNA target sites was inserted downstream of a firefly luciferase ORF, and luciferase activity was compared to that of an analogous reporter with point substitutions disrupting the target sites (as illustrated for SMAD-1, Figure 3A). Of these 11 UTRs, mutations in eight (SMAD-1, SDF-1, BRN-3b, ENX-1, N-MYC, PTEN, Delta1, and Notch1, but not HOX-A5, MECP-2, or VAMP-2) significantly enhanced expression (p < 0.001), as expected if the endogenous miRNAs in the HeLa cells were specifying the repression of reporter gene expression by pairing to the predicted target sites (Figure 3B). Significantly enhanced expression was also observed when the analogous experiment was performed using either the full-length C. elegans lin-41 3' UTR or a 124 nt segment of the UTR containing the two previously proposed let-7 miRNA target sites (Reinhart et al., 2000), indicating that at least some of the repression of lin-41 observed in C. elegans can be recapitulated by HeLa let-7 miRNA in this heterologous reporter assay (Figure 3B). For all eight predicted human targets of endogenous HeLa miRNAs that responded to mutations, the increase in expression seen when disrupting the pairing to the miRNA seed was at least as high as that seen for mutations in the let-7 target sites of lin-41 (Figure 3B).

Four tested genes (G6PD, BDNF, MCSF, and LDLR) were predicted targets of miR-1 and miR-130, two miRNAs that had not been cloned from HeLa cells and were not detected by Northern analysis. Initially, reporters containing UTR segments from these four genes were examined for response to transfected miRNAs (Doench et al., 2003) (data not shown). Of the four, G6PD, BDNF, and MCSF responded to the transfected miRNAs. To further validate these targets, we used a second assay resembling the one described for targets of miRNAs expressed in HeLa cells, except that it took advantage of HeLa cell lines ectopically expressing either human miR-1 or human miR-130. Mutations in the miRNA target sites of all three of the genes that had responded to transfected miRNAs led to significantly increased reporter output in the lines expressing the cognate miRNAs, but not in the parental lines lacking the miRNAs (Figure 3B), as expected if these genes were authentic targets of the respective miRNAs. The levels of ectopically expressed miR-1 and miR-130 were comparable to those of endogenous miRNAs, as judged by Northern blot analysis (Lim et al., 2003b). For miR-1, Northern analysis with a synthetic miR-1 standard allowed accurate quantitation, revealing an average expression of 500 miR-1 molecules per cell.

Figure 3. Experimental Support for Predicted Targets
(A) Schematic of a reporter construct used to evaluate the role of complementarity between miR-26a and the SMAD-1 3' UTR. The wild-type (WT) construct had a 106 nt fragment of the SMAD-1 UTR (green) containing two miR-26a target sites (blue) inserted within the firefly luciferase 3' UTR. The mutant construct was identical to the WT construct except that it had three point substitutions (red) disrupting pairing to each miR-26a seed.
(B) Box plots showing the luciferase activity after reporter plasmids were transfected into HeLa cells. Reporters analogous to those depicted for SMAD-1 were constructed for the indicated target genes (Supplemental Figure S2 on Cell website). The UTR fragments often had two target sites to the indicated miRNA, and both were disrupted in the mutant reporters (exceptions were SDF-1, BRN-3b, G6PD, Delta1, Notch1, and BDNF, which each had three target sites, two of which were disrupted, and N-MYC, which had one of its two miR-101 sites disrupted). Firefly luciferase activity was normalized to Renilla luciferase activity of the transfection control plasmid and then normalized to the median activity of the corresponding WT reporter. Each box represents the distribution of activity measured for each WT (blue) and mutant (red) reporter (n = 12-15; ends of the boxes define the 25th and 75th percentiles, a line indicates the median, bars define the 10th and 90th percentiles, and the number indicates the median activity of the mutant reporter). Asterisks (*) denote instances in which differences between the WT and mutant were statistically significant (p < 0.001; Mann-Whitney test). Two pairs of constructs for C. elegans lin-41, a previously known target of let-7, were tested, one with a full-length and the other with a 124 nt UTR segment (f and s, respectively). Except for miR-1 and miR-130, the miRNAs were all endogenously expressed in the HeLa cells. Reporters corresponding to predicted targets of miR-1 and miR-130 (G6PD, BDNF, and MCSF) were each examined in a HeLa cell line stably expressing the relevant miRNA (+ miR-1 or + miR-130) and the parental cell line (- miR-1 or - miR-130).

In sum, for 11 of the 15 cases tested, the sites identified by TargetScan influenced expression of an upstream ORF when expressed in the same cells as the corresponding miRNAs. Additional experiments in animals will be needed to address the particular biological consequences of these regulatory interactions, but the evolutionary conservation of the pairings suggests that they are important.
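The normalization and significance test described in the Figure 3B legend can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the luciferase readings below are made-up numbers, not data from the paper, the function names are hypothetical, and the p value uses the large-sample normal approximation to the Mann-Whitney U statistic without a tie correction, which may differ from the exact procedure the authors used.

```python
import math
from statistics import median


def firefly_over_renilla(firefly, renilla):
    """Normalize each firefly reading to the Renilla transfection control."""
    return [f / r for f, r in zip(firefly, renilla)]


def relative_to_wt_median(wt_ratios, mutant_ratios):
    """Rescale both reporters so that the WT median activity equals 1."""
    wt_median = median(wt_ratios)
    return ([v / wt_median for v in wt_ratios],
            [v / wt_median for v in mutant_ratios])


def midranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def mann_whitney(x, y):
    """Return (U, two-sided p) using the normal approximation (no tie correction)."""
    n_x, n_y = len(x), len(y)
    ranks = midranks(list(x) + list(y))
    u_x = sum(ranks[:n_x]) - n_x * (n_x + 1) / 2
    u = min(u_x, n_x * n_y - u_x)
    mean_u = n_x * n_y / 2
    sd_u = math.sqrt(n_x * n_y * (n_x + n_y + 1) / 12)
    z = (u - mean_u) / sd_u
    return u, math.erfc(abs(z) / math.sqrt(2))


# Hypothetical readings for 12 replicate transfections per reporter.
renilla = [100.0] * 12
wt_firefly = [100.0 + 5 * i for i in range(12)]
mutant_firefly = [200.0 + 5 * i for i in range(12)]

wt_rel, mutant_rel = relative_to_wt_median(
    firefly_over_renilla(wt_firefly, renilla),
    firefly_over_renilla(mutant_firefly, renilla),
)
u_stat, p_value = mann_whitney(wt_rel, mutant_rel)
print(round(median(mutant_rel), 2), u_stat, p_value < 0.001)
```

With this normalization the median of the rescaled mutant values is the kind of number reported beside each box in the figure, that is, the mutant reporter activity expressed as a multiple of the WT median.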
All four of the remaining genes might not be true targets; our statistical analysis using shuffled controls indicated that about 30% of predicted mammalian targets are likely to be false positives (Figure 2). Alternatively, some might still be authentic targets whose regulation was not detected in our assays. Regulation would be missed in cases for which cell type-specific factors were required that were not expressed in HeLa cells, or in cases for which additional mRNA elements were required but were not included in the UTR segments used in our reporters.

One limitation of the existing sequence databases that complicates the systematic identification of miRNA targets is that UTR annotations are often absent or incomplete. In order to compensate for this limitation, we had extended each annotated 3' UTR with 2 kb of 3' flanking sequence. Using extended UTRs substantially increased the number of predicted targets, with signal-to-noise ratios at least as high as they were for unextended UTRs, suggesting that extension of the annotated UTRs allows detection of many additional authentic target genes. One consequence of using this UTR-extension protocol is that for some genes, all predicted target sites will fall outside of annotated UTRs. Manual inspection of the 15 UTR regions tested in our reporter assays revealed that in all but one of these cases the tested target sites were contained within regions whose status as UTRs was supported by known ESTs and predicted polyadenylation sites, even though some of these regions are not yet annotated as human UTRs. For the single exception, the Notch1 gene, the tested target sites were all located downstream of the annotated 3' UTR of the human gene, and the end of the annotated Notch1 3' UTR was supported by a predicted polyadenylation site and alignment of multiple ESTs.
However, Notch1 might have additional 3' UTR isoforms; many human genes, perhaps as many as 50% or more of the genes in the genome, have alternative polyadenylation sites (Iseli et al., 2002). In order to investigate the potential expression of the tested Notch1 target sites, which gave a positive result in our assay for miRNA regulation (Figure 3), an RT-PCR assay was used with polyA-selected RNA from a pool of human tissues. Consistent with the possibility that these sites lie within an alternative UTR isoform of Notch1, an RT-dependent product of the correct size and sequence was observed (data not shown). The TargetScan set of predicted mammalian target genes (Supplemental Table S1B on the Cell website) undoubtedly contains other examples for which the target sites all lie outside of the UTR regions supported by available data; some of these will be false positives, but others might point to the miRNA regulation of alternative mRNA isoforms.

Human miRNAs Predominantly Are Negative Regulators of Gene Expression

The finding that a sizable fraction of the tested UTR segments were sensitive to mutations disrupting their target sites supports the assertion that most of the predicted targets are authentic. For many, the pairing outside the seed was less extensive than that previously proposed for miRNA targets (Supplemental Figures S1A and S1B). Perhaps TargetScan is identifying mRNA elements that are necessary but not sufficient for miRNA regulation. Alternatively, these elements might be sufficient, in which case their low information content raises the possibility that miRNAs modulate the utilization of a substantial fraction of the mammalian mRNAs. In none of the 15 cases tested was there evidence of miRNA-mediated activation of reporter expression; changes either were not statistically significant or were in the direction of miRNA-directed repression.
This result suggests that mammalian miRNAs are generally negative regulators of gene expression, as has been observed for the known examples in invertebrates and plants (Lai, 2003; Bartel, 2004).

Predicted Mammalian MicroRNA Targets Have Diverse Functions

To assess target gene functions, we evaluated the frequency of specific gene ontology (GO) molecular function classifications (Gene Ontology Consortium, 2001) among the predicted targets of the nrMamm miRNAs and their shuffled control sequences (Table 2). Predicted miRNA targets populated many major GO functional categories, and for each of these categories, the number of targets for the real miRNAs greatly exceeded the average for the shuffled cohorts. Therefore, despite the presence of false positives among our predictions, the data in Table 2 strongly indicate that mammalian miRNAs are involved in regulation of target genes with a wide spectrum of molecular functions.

We also compared the proportion of genes that fell in each of the GO molecular function and GO biological process categories for the predicted targets of miRNAs, for targets of shuffled control sequences, and for the initial set of orthologous genes (Table 2 and Supplemental Table S4 on Cell website). The targets of the shuffled cohorts were enriched relative to the initial set of orthologous genes in certain GO biological process categories such as development (14% versus 8%) and transcription (13% versus 9%) (Table S4) and in molecular function categories such as nucleic acid binding (21% versus 14%), DNA binding (15% versus 10%), and transcriptional regulator activity (10% versus 6%) (Table 2). The biases seen for the shuffled cohorts are likely to result primarily from the TargetScan requirement for conserved segments in the 3' UTRs of predicted targets and may reflect differences in the occurrence of 3' UTR regulatory elements in different classes of genes.
In the GO biological process classifications, the predicted regulatory targets of authentic miRNA genes were enriched in the development category but no more than the targets of shuffled controls and were substantially more enriched for genes involved in transcription (21% of miRNA targets versus 13% of shuffled targets versus 9% of the initial dataset) and regulation of transcription (21% versus 12% versus 8%) (Supplemental Table S4). In terms of the GO molecular function classifications, targets of authentic miRNAs were enriched in the categories DNA binding (20% versus 15% versus 10%), transcription regulatory activity (14% versus 10% versus 6%), and nucleotide binding (13% versus 8% versus 8%) (Table 2).

Table 2. Molecular Function Classification of Predicted miRNA Targets

GO ID        Molecular Function         miRNAs       Mean of Shuffled Cohorts   All Orthologous Genes
             None/unknown               115 (29%)    45 (37%)                   5131 (35%)
             Known function             285 (71%)    77 (63%)                   9408 (65%)
GO:0005215   Transporter activity       36 (9%)      14 (12%)                   1441 (10%)
GO:0005515   Protein binding            37 (9%)      11 (9%)                    1005 (7%)
GO:0016787   Hydrolase activity         36 (9%)      12 (9%)                    1502 (10%)
GO:0016740   Transferase activity       39 (10%)     10 (8%)                    1104 (8%)
GO:0016301     Kinase activity          29 (7%)      6 (5%)                     624 (4%)
GO:0046872   Metal ion binding          27 (7%)      5 (4%)                     952 (7%)
GO:0003676   Nucleic acid binding       101 (25%)    26 (21%)                   2072 (14%)
GO:0003677     DNA binding              80 (20%)     18 (15%)                   1431 (10%)
GO:0030528   Transcription reg. act.    56 (14%)     12 (10%)                   879 (6%)
GO:0000166   Nucleotide binding         52 (13%)     10 (8%)                    1172 (8%)
GO:0004871   Signal transducer act.     55 (14%)     12 (10%)                   1959 (13%)
GO:0004872     Receptor activity        29 (7%)      5 (4%)                     1351 (9%)

The number and percentage of genes annotated with various Gene Ontology molecular function categories are shown for targets of nrMamm miRNAs, targets of shuffled control miRNAs (mean of four cohorts), and for the initial set of orthologous human-mouse-rat genes. If GO categories have a parent-child relationship, the child is indented. Because one gene can belong to multiple GO categories, the sum of the percentages in each column is not interpretable.

The differing numbers of predicted targets in the similar-sounding categories "regulation of transcription" (GO biological process classification) and "transcription regulatory activity" (GO molecular function classification) prompted us to investigate the gene content of these two categories. Inspection of the lists of genes showed that all but two of the predicted target genes in the "transcription regulatory activity" category were also included in the larger "regulation of transcription" category, but that the latter category also contained more than two dozen additional target genes, the annotation of which generally supported a role in control of transcription. The GO process category "regulation of transcription" (Supplemental Table S4) therefore appears to provide a more complete listing of known and putative transcription factors.

The proportion of the predicted mammalian miRNA target genes involved in the GO process categories "transcription" and "regulation of transcription" was significantly higher than that seen for either shuffled targets or for the initial gene set (p < 0.001). Nonetheless, this bias was much lower in magnitude than that seen in plants: of the 49 targets predicted in a systematic search for complementarity to plant miRNAs, 69% were members of transcription factor gene families (Rhoades et al., 2002). Examples of other types of predicted mammalian targets include translational regulators (e.g., COP9 subunit 6, ERF1), regulators of mRNA stability (e.g., Hu Antigen D), structural proteins (e.g., collagen), and enzymes (e.g., G6PD). The set of predicted miRNA targets conserved across all four vertebrates (Supplemental Table S5 online) was also somewhat biased toward genes involved in transcription, but had annotated functions consistent with the broad array of biological activities seen for the larger mammalian target set. We conclude that although mammalian miRNAs are sometimes at the center of gene regulatory networks, where they regulate genes, such as transcription factors, that regulate other genes, they are more likely than plant miRNAs to be at the periphery of the regulatory networks, where they regulate genes with a variety of molecular functions.

The predicted mammalian targets also differ from the plant targets with respect to biological function. Nearly all of the transcription factors (TFs) predicted to be plant miRNA targets have known or implied roles in plant development, as do several of the other predicted plant targets (Rhoades et al., 2002). By comparison, only ~13% of predicted mammalian miRNA targets were involved in development according to the GO biological process categories (Supplemental Table S4). An important caveat to this analysis is that gene annotation and GO categories are still evolving. Nonetheless, our data suggest that mammalian miRNAs are not exclusively, or even primarily, involved in the traditional miRNA role of developmental control. Instead, we find evidence for miRNA regulation of a very broad diversity of biological processes.

Experimental Procedures

MicroRNA Datasets

Human and mouse miRNA sequences that satisfy established criteria (Ambros et al., 2003a) were downloaded from the Rfam website (http://www.sanger.ac.uk/Software/Rfam). Human miRNAs that lacked annotated mouse orthologs and mouse miRNAs that lacked annotated human orthologs were searched against the mouse and human genomes, respectively, with BLASTN (Altschul et al., 1997) and MiRscan (Lim et al., 2003a, 2003b). To identify Fugu homologs, the human miRNAs were searched against the Fugu genome using BLASTN and MiRscan, and the 121 human miRNAs with perfectly homologous miRNAs in mouse and clear homologous miRNAs in Fugu were assigned to rMamm.
For sets of human miRNAs in rMamm with identical seed heptamers, a single representative was chosen, yielding 79 human miRNAs (nrMamm). The choice was based on conservation to Fugu and C. elegans miRNAs when possible (i.e., the sequence most broadly conserved was chosen), but was otherwise essentially arbitrary (the miRNA with the lowest mir-# was generally chosen). The subset of 55 miRNAs from nrMamm that had perfect conservation to Fugu were assigned to nrVert.

3' UTR Datasets

3' UTR sequences for all human genes, and all mouse, rat, and Fugu genes associated with a human ortholog, were retrieved using EnsMart version 15.1 (http://www.ensembl.org/EnsMart). Annotated 3' UTR sequences were available for only 45% of rat genes in this set and for none of the Fugu genes. Moreover, 14% of annotated rat 3' UTR sequences were less than 50 nucleotides in length. Therefore, we extended each annotated 3' UTR with 2 kb of 3' flanking sequence. Repetitive elements were masked in these sequences using RepeatMasker (Smit, A.F.A. and Green, P., http://repeatmasker.genome.washington.edu/cgi-bin/RM2_req.pl) with repeat libraries for primates, rodents, or vertebrates, as appropriate.

Identification of miRNA Target Sites

The 3' UTR sequences were searched for antisense matches to the designated seed region of each miRNA (e.g., bases 2..8 starting from the 5' end). Our choice of a 7 nt seed was motivated by the observation that shorter seeds gave substantially lower signal:noise ratios, while longer seeds reduced the number of predicted targets at comparable signal:noise ratios. Because changing the size of the seed has a large effect on the noise as well as the signal, these observations are much more difficult to interpret in terms of potential mechanistic implications than the "sliding seed" data of Figure 2B.
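As an illustration of the seed search described above, the following minimal sketch locates perfect antisense matches to bases 2..8 of a miRNA within a UTR sequence. It is not the TargetScan code: the function names (to_rna, seed_site, find_seed_matches) are hypothetical, and the sketch assumes plain Watson-Crick pairing within the seed and omits the later filters on masked bases and on the spacing between successive seed matches.

```python
# Minimal sketch (assumed helper names, not the original implementation):
# find perfect antisense matches to a miRNA seed in a 3' UTR sequence.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def to_rna(seq):
    """Upper-case a sequence and convert DNA 'T' to RNA 'U'."""
    return seq.upper().replace("T", "U")


def seed_site(mirna, start=2, end=8):
    """Return the UTR sequence (5'->3') that pairs perfectly with
    miRNA bases start..end (1-based, inclusive)."""
    seed = to_rna(mirna)[start - 1:end]
    return "".join(COMPLEMENT[base] for base in reversed(seed))


def find_seed_matches(mirna, utr, start=2, end=8):
    """Return 0-based UTR positions of every seed match, overlaps included."""
    site = seed_site(mirna, start, end)
    utr = to_rna(utr)
    hits = []
    pos = utr.find(site)
    while pos != -1:
        hits.append(pos)
        pos = utr.find(site, pos + 1)
    return hits
```

For a miRNA whose bases 2..8 read GAGGUAG, seed_site returns CUACCUC, and find_seed_matches reports each position at which that heptamer occurs in the UTR. Masked positions (for example "N") never match, because only the four standard bases are generated for the site.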
For seeds located on the 5' portion of the miRNA, 35 nt flanking the seed match on the 5' end and 5 nt flanking the seed match on the 3' end were retrieved (a "mirror" version of this algorithm was used for 3' seeds in the experiment described in Figure 2B). Target sites in which the 35 nt flanking region contained masked bases or the seed match occurred less than 20 nt downstream of a previous seed match were discarded. Base pairing between the miRNA seed and UTR was extended with additional flanking base pairs as far as possible in both directions, allowing G:U pairs but disallowing gaps. The base-pairing pattern of the remaining 3' end (or in the case of a 3' seed, the remaining 5' end) was predicted by running RNAfold on a foldback sequence consisting of an artificial stemloop (5'-GGGCCCGGGULLLLLLACCCGGGCCC-3', where "L" is an anonymous unpaired loop character, and all other bases are paired to a complementary base on the opposite side of the stem) attached to the extended seed match. RNAfold optimization was constrained so that all base pairs found in previous steps were fixed, the structure of the artificial stem was fixed, and bases in the miRNA and UTR were allowed to pair only with bases in the UTR and miRNA, respectively. The stemloop was removed, and RNAeval was used to estimate the energy of the miRNA:UTR duplex formed by the base pairs determined in the previous steps.

Parameter Optimization

Training sets were constructed with 40 randomly chosen miRNAs from nrMamm and 27 randomly chosen miRNAs from nrVert. The remaining microRNAs were assigned to the nrMamm and nrVert reference sets. TargetScan was tested on the training sets with various parameter values: T was varied from 5 to 25 in increments of 5, Zc was varied between 1 and 10 in increments of 0.5, and Rc was varied between 50 and 1000 in increments of 50. The parameters T = 20, Zc = 4.5, Rc = 200 were found to give an optimal signal:noise of 3.4:1 for the nrMamm training set. When Rc was raised to 300 or Zc was lowered to 4, the signal:noise decreased only moderately to ~3:1.
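The parameter sweep just described can be pictured as an exhaustive search over a three-dimensional grid. The sketch below only enumerates the stated ranges of T, Zc, and Rc and picks the triple with the highest score; the scoring callable signal_to_noise is a hypothetical stand-in for running TargetScan on a training set with real and shuffled miRNAs, which is not reproduced here.

```python
from itertools import product


def frange(start, stop, step):
    """Inclusive arithmetic range; counts steps to avoid float drift."""
    n = int(round((stop - start) / step))
    return [round(start + i * step, 10) for i in range(n + 1)]


T_VALUES = list(range(5, 26, 5))        # 5, 10, ..., 25
ZC_VALUES = frange(1.0, 10.0, 0.5)      # 1.0, 1.5, ..., 10.0
RC_VALUES = list(range(50, 1001, 50))   # 50, 100, ..., 1000


def best_parameters(signal_to_noise):
    """Return the (T, Zc, Rc) triple that maximizes the supplied score.

    signal_to_noise is an assumed callable taking (T, Zc, Rc) and
    returning a number; it is a placeholder, not part of the source.
    """
    grid = product(T_VALUES, ZC_VALUES, RC_VALUES)
    return max(grid, key=lambda params: signal_to_noise(*params))
```

With these ranges the grid holds 5 x 19 x 20 = 1900 parameter combinations, so an exhaustive sweep is feasible when each evaluation is inexpensive.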
The parameters T = 10, Zc = 4.5, Rc = 350 were found to give an optimal signal:noise of 4.6:1 for the nrVert training set used with UTR sets from all four genomes. For both the nrMamm and nrVert sets, the signal:noise ratios obtained using the training sets did not differ significantly from the corresponding signal:noise ratios obtained using the reference sets, and thus results from the two sets were merged.

Generation of Randomly Permuted Sequences

For each miRNA in nrMamm, randomly permuted sequences with the same starting base, length, and base composition as the real miRNA were generated until four sequences were found that deviated from the original miRNA by less than 15% in the following properties: (1) E(SM), the 1st-order Markov probability of the seed match, (2) E(TM), the 1st-order Markov probability of the antisense of the 3' end of the miRNA (or the 5' end in the case of a 3' miRNA seed), (3) O(SM), the observed count of seed matches in the UTR dataset, and (4) the predicted folding free energy of a seed:seed match duplex. For a miRNA (or shuffled miRNA) with the initial sequence $S_1, S_2, S_3, S_4, S_5, S_6, S_7, S_8, \ldots$ and the seed designated as bases 2..8, E(SM) was equal to $P_{S_2} \cdot P_{S_2 S_3} \cdot P_{S_3 S_4} \cdot P_{S_4 S_5} \cdot P_{S_5 S_6} \cdot P_{S_6 S_7} \cdot P_{S_7 S_8}$, where $P_{S_k S_{k+1}}$ was the conditional frequency of the nucleotide $S_{k+1}$ given $S_k$ at the previous position in the set of inverse complements of the UTRs in the UTR database. E(TM) was the analogous quantity calculated for the remainder of the sequence (i.e., for bases 9, 10, 11, ... to the end of the miRNA or shuffled miRNA). O(SM) was determined directly from heptamer counts in the UTR dataset. The predicted folding free energy of a seed:seed match duplex was determined using RNAeval. The DiMirShuffle program generated shuffled controls for a given miRNA sequence by shuffling the dinucleotides of the specified miRNA seed (e.g., bases 2..8 of the miRNA).
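The first-order Markov quantities used to match shuffled controls to real miRNAs can be sketched as follows. This is an illustrative reading of the definition of E(SM), not the original code: the function names are hypothetical, the probability of the first base is assumed to be its overall frequency in the background sequences, and the background passed to first_order_model is assumed to be the set of inverse complements of the UTRs.

```python
from collections import Counter


def first_order_model(sequences):
    """Estimate base frequencies and conditional frequencies P(next | prev)
    from a collection of background sequences."""
    mono, pairs, prev_counts = Counter(), Counter(), Counter()
    for seq in sequences:
        mono.update(seq)
        for a, b in zip(seq, seq[1:]):
            pairs[(a, b)] += 1
            prev_counts[a] += 1
    total = sum(mono.values())
    base_freq = {b: n / total for b, n in mono.items()}
    cond_freq = {(a, b): n / prev_counts[a] for (a, b), n in pairs.items()}
    return base_freq, cond_freq


def markov_probability(segment, base_freq, cond_freq):
    """P(first base) times the product of P(S_{k+1} | S_k) along segment."""
    p = base_freq.get(segment[0], 0.0)
    for a, b in zip(segment, segment[1:]):
        p *= cond_freq.get((a, b), 0.0)
    return p


def within_tolerance(value, reference, tol=0.15):
    """True if value deviates from reference by less than the fraction tol."""
    return abs(value - reference) < tol * abs(reference)
```

Under these assumptions, E(SM) for a candidate control would be markov_probability(seq[1:8], base_freq, cond_freq), and a shuffled sequence would be retained only if within_tolerance holds for each of the four matched properties.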
DNA Constructs

The firefly luciferase vector was modified from pGL3 Control Vector (Promega), such that a short sequence containing multiple cloning sites (5'-AGCTCTATACGCGTCTCAAGCTTACTGCTAGCGT-3') was inserted into the XbaI site immediately downstream from the stop codon. 3' UTR segments of the target genes were amplified by PCR from human genomic DNA and inserted into the modified pGL3 vector between SacI and NheI sites. PCR with the appropriate primers also generated inserts with point substitutions in the miRNA complementary sites. Wild-type and mutant inserts were confirmed by sequencing and are listed (Supplemental Figure S2 online).

Transfections and Assays

Adherent HeLa S3 cells were grown in 10% FBS in DMEM, supplemented with glutamine in the presence of antibiotics, to 80%-90% confluency in 24-well plates. Cells were transfected with 0.4 μg of the firefly luciferase reporter vector and 0.08 μg of the control vector containing Renilla luciferase, pRL-TK (Promega), in a final volume of 0.5 ml using Lipofectamine 2000 (Invitrogen). Firefly and Renilla luciferase activities were measured consecutively using the Dual-luciferase assays (Promega) 30 hr after transfection. Each firefly plasmid was tested in 12-15 transfections (four or five independent experiments, each with three culture replicates) involving two independent plasmid preparations (six to nine transfections each). A HeLa cell line that constitutively expressed miR-1 from a pol-II promoter was created using a derivative of the retroviral vector pRevTRE (Clontech) containing a 500 bp fragment of human mir-1d gene. A HeLa S3 cell line that constitutively expressed miR-130 from the H1 pol-III promoter was constructed using a retroviral vector containing a 330 nt fragment of the human mir-130 gene and a GFP gene under the murine 3-phosphoglycerate kinase promoter, which served as an infection marker (Chen et al., 2003). Cells expressing GFP following infection were enriched to 95% purity by FACS.
Analysis of Gene Ontologies

Gene ontologies were assigned to human genes from the Ensembl database by cross-referencing Ensembl identifiers with GO identifiers using EnsMart version 15.1 (http://www.ensembl.org/EnsMart). The Gene Ontology Consortium database was retrieved from http://www.geneontology.org and function and process ontologies were compiled for all predicted target genes. In addition to the assigned categories, each gene was considered as having all more general ("parent") categories within the "Molecular Function" and "Biological Process" ontologies. In Tables 2 and S4, sets of GO categories were selected that were both broad enough to contain a significant fraction of the predicted targets and specific enough to be meaningful. Because the GO descriptions are not mutually exclusive, the sum of the percentages in these tables is not interpretable. GO categories were also used to produce the categories in Table 1. To be included in a category, a gene had to be annotated with at least one out of a set of GO categories. The sets of GO categories used were: regulation of transcription/DNA binding (GO:0003700, GO:0003713, GO:0003714, GO:0016563, or GO:0045449) and signal transduction/cell-cell signaling (GO:0004871, GO:0004872, GO:0007154, GO:0007165, GO:0007267, or GO:0008083).

Acknowledgments

We thank W.K. Johnston for technical assistance, C.-Z. Chen and L.P. Lim for helpful discussions, H.F. Lodish for use of facilities and equipment, N.C. Lau for the miR-1-expressing cell line, and G. Ruvkun for plasmids used to construct the lin-41 reporters. Supported by grants from the N.I.H. (D.P.B. and C.B.B.), the Searle Scholars Program (C.B.B.), and the Alexander and Margaret Stewart Trust (D.P.B.), and fellowships from the DOE (B.P.L.) and the Cancer Research Institute (I.S.).
Received: November 18, 2003
Revised: December 3, 2003
Accepted: December 4, 2003
Published: December 24, 2003

References

Abrahante, J.E., Daul, A.L., Li, M., Volk, M.L., Tennessen, J.M., Miller, E.A., and Rougvie, A.E. (2003). The Caenorhabditis elegans hunchback-like gene lin-57/hbl-1 controls developmental time and is regulated by microRNAs. Dev. Cell 4, 625-637.

Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W., and Lipman, D.J. (1997). Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389-3402.

Ambros, V., Bartel, B., Bartel, D.P., Burge, C.B., Carrington, J.C., Chen, X., Dreyfuss, G., Eddy, S., Griffiths-Jones, S., Matzke, M., et al. (2003a). A uniform system for microRNA annotation. RNA 9, 277-279.

Ambros, V., Lee, R.C., Lavanway, A., Williams, P.T., and Jewell, D. (2003b). MicroRNAs and other tiny endogenous RNAs in C. elegans. Curr. Biol. 13, 807-818.

Aravin, A.A., Naumova, N.M., Tulin, A.A., Rozovsky, Y.M., and Gvozdev, V.A. (2001). Double-stranded RNA-mediated silencing of genomic tandem repeats and transposable elements in Drosophila melanogaster germline. Curr. Biol. 11, 1017-1027.

Aukerman, M.J., and Sakai, H. (2003). Regulation of flowering time and floral organ identity by a MicroRNA and its APETALA2-like target genes. Plant Cell 15, 2730-2741.

Bartel, D.P. (2004). MicroRNAs: genomics, biogenesis, mechanism, and function. Cell, in press.

Brennecke, J., Hipfner, D.R., Stark, A., Russell, R.B., and Cohen, S.M. (2003). bantam encodes a developmentally regulated microRNA that controls cell proliferation and regulates the proapoptotic gene hid in Drosophila. Cell 113, 25-36.

Chen, C.-Z., Li, L., Lodish, H.F., and Bartel, D.P. (2003). MicroRNAs modulate hematopoietic lineage differentiation. Science, in press. Published online December 4, 2003. 10.1126/science.1091903.

Chen, X. (2003). A MicroRNA as a translational repressor of APETALA2 in Arabidopsis flower development. Science. Published online September 11, 2003. 10.1126/science.1088060.

Consortium, The Gene Ontology. (2001). Creating the gene ontology resource: design and implementation. Genome Res. 11, 1425-1433.

Doench, J.G., Peterson, C.P., and Sharp, P.A. (2003). siRNAs can function as miRNAs. Genes Dev. 17, 438-442.

Dostie, J., Mourelatos, Z., Yang, M., Sharma, A., and Dreyfuss, G. (2003). Numerous microRNPs in neuronal cells containing novel microRNAs. RNA 9, 631-632.

Emery, J.F., Floyd, S.K., Alvarez, J., Eshed, Y., Hawker, N.P., Izhaki, A., Baum, S.F., and Bowman, J.L. (2003). Radial patterning of Arabidopsis shoots by class III HD-ZIP and KANADI genes. Curr. Biol. 13, 1768-1774.

Hofacker, I.L., Fontana, W., Stadler, P.F., Bonhoeffer, S., Tacker, M., and Schuster, P. (1994). Fast folding and comparison of RNA secondary structures. Monatshefte fur Chemie 125, 167-188.

Houbaviy, H.B., Murray, M.F., and Sharp, P.A. (2003). Embryonic stem cell-specific MicroRNAs. Dev. Cell 5, 351-358.

Iseli, C., Stevenson, B.J., de Souza, S.J., Samaia, H.B., Camargo, A.A., Buetow, K.H., Strausberg, R.L., Simpson, A.J., Bucher, P., and Jongeneel, C.V. (2002). Long-range heterogeneity at the 3' ends of human mRNAs. Genome Res. 12, 1068-1074.

Kasschau, K.D., Xie, Z., Allen, E., Llave, C., Chapman, E.J., Krizan, K.A., and Carrington, J.C. (2003). P1/HC-Pro, a viral suppressor of RNA silencing, interferes with Arabidopsis development and miRNA function. Dev. Cell 4, 205-217.

Krichevsky, A.M., King, K.S., Donahue, C.P., Khrapko, K., and Kosik, K.S. (2003). A microRNA array reveals extensive regulation of microRNAs during brain development. RNA 9, 1274-1281.

Lagos-Quintana, M., Rauhut, R., Lendeckel, W., and Tuschl, T. (2001). Identification of novel genes coding for small expressed RNAs. Science 294, 853-858.

Lagos-Quintana, M., Rauhut, R., Yalcin, A., Meyer, J., Lendeckel, W., and Tuschl, T. (2002). Identification of tissue-specific microRNAs from mouse. Curr. Biol. 12, 735-739.

Lagos-Quintana, M., Rauhut, R., Meyer, J., Borkhardt, A., and Tuschl, T. (2003). New microRNAs from mouse and human. RNA 9, 175-179.

Lai, E.C. (2002). MicroRNAs are complementary to 3' UTR motifs that mediate negative post-transcriptional regulation. Nat. Genet. 30, 363-364.

Lai, E.C. (2003). MicroRNAs: runts of the genome assert themselves. Curr. Biol. 13, R925-R936.

Lai, E.C., Tomancak, P., Williams, R.W., and Rubin, G.M. (2003). Computational identification of Drosophila microRNA genes. Genome Biol. 4:R42, 1-20.

Lau, N.C., Lim, L.P., Weinstein, E.G., and Bartel, D.P. (2001). An abundant class of tiny RNAs with probable regulatory roles in Caenorhabditis elegans. Science 294, 858-862.

Lee, R.C., and Ambros, V. (2001). An extensive class of small RNAs in Caenorhabditis elegans. Science 294, 862-864.

Lee, R.C., Feinbaum, R.L., and Ambros, V. (1993). The C. elegans heterochronic gene lin-4 encodes small RNAs with antisense complementarity to lin-14. Cell 75, 843-854.

Lim, L.P., Glasner, M.E., Yekta, S., Burge, C.B., and Bartel, D.P. (2003a). Vertebrate microRNA genes. Science 299, 1540.

Lim, L.P., Lau, N.C., Weinstein, E.G., Abdelhakim, A., Yekta, S., Rhoades, M.W., Burge, C.B., and Bartel, D.P. (2003b). The microRNAs of Caenorhabditis elegans. Genes Dev. 17, 991-1008.

Lin, S.Y., Johnson, S.M., Abraham, M., Vella, M.C., Pasquinelli, A., Gamberi, C., Gottlieb, E., and Slack, F.J. (2003). The C. elegans hunchback homolog, hbl-1, controls temporal patterning and is a probable microRNA target. Dev. Cell 4, 639-650.

Llave, C., Xie, Z., Kasschau, K.D., and Carrington, J.C. (2002). Cleavage of scarecrow-like mRNA targets directed by a class of Arabidopsis miRNA. Science 297, 2053-2056.

Moss, E.G., Lee, R.C., and Ambros, V. (1997). The cold shock domain protein LIN-28 controls developmental timing in C. elegans and is regulated by the lin-4 RNA. Cell 88, 637-646.

Moss, E.G., and Tang, L. (2003). Conservation of the heterochronic regulator Lin-28, its developmental expression and microRNA complementary sites. Dev. Biol. 258, 432-442.

Mourelatos, Z., Dostie, J., Paushkin, S., Sharma, A., Charroux, B., Abel, L., Rappsilber, J., Mann, M., and Dreyfuss, G. (2002). miRNPs: a novel class of ribonucleoproteins containing numerous microRNAs. Genes Dev. 16, 720-728.

Nussinov, R. (1981). Nearest neighbor nucleotide patterns. Structural and biological implications. J. Biol. Chem. 256, 8458-8462.

Palatnik, J.F., Allen, E., Wu, X., Schommer, C., Schwab, R., Carrington, J.C., and Weigel, D. (2003). Control of leaf morphogenesis by microRNAs. Nature 425, 257-263. Published online August 20, 2003. 10.1038/nature01958.

Pasquinelli, A.E., Reinhart, B.J., Slack, F., Martindale, M.Q., Kuroda, M., Maller, B., Srinivasan, A., Fishman, M., Hayward, D., Ball, E., et al. (2000). Conservation across animal phylogeny of the sequence and temporal regulation of the 21 nucleotide let-7 heterochronic regulatory RNA. Nature 408, 86-89.

Reinhart, B.J., Slack, F.J., Basson, M., Bettinger, J.C., Pasquinelli, A.E., Rougvie, A.E., Horvitz, H.R., and Ruvkun, G. (2000). The 21 nucleotide let-7 RNA regulates developmental timing in Caenorhabditis elegans. Nature 403, 901-906.

Rhoades, M.W., Reinhart, B.J., Lim, L.P., Burge, C.B., Bartel, B., and Bartel, D.P. (2002). Prediction of plant microRNA targets. Cell 110, 513-520.

Stark, A., Brennecke, J., Russell, R.B., and Cohen, S.M. (2003). Identification of Drosophila microRNA targets. PLOS Biol., in press. Published online October 13, 2003. 10.1371/journal.pbio.0000060.

Tang, G., Reinhart, B.J., Bartel, D.P., and Zamore, P.D. (2003). A biochemical framework for RNA silencing in plants. Genes Dev. 17, 49-63.

Wightman, B., Ha, I., and Ruvkun, G. (1993). Posttranscriptional regulation of the heterochronic gene lin-14 by lin-4 mediates temporal pattern formation in C. elegans. Cell 75, 855-862.

Xu, P., Vernooy, S.Y., Guo, M., and Hay, B.A. (2003). The Drosophila MicroRNA Mir-14 suppresses cell death and is required for normal fat metabolism. Curr. Biol. 13, 790-795.