Examination of mammalian microRNAs by high-throughput sequencing
By
HyoJin Rosaria Chiang
B.S., Molecular Biophysics and Biochemistry and Economics (2005)
Yale University
SUBMITTED TO THE DEPARTMENT OF BIOLOGY IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY
AT THE
MASSACHUSETTS INSTITUTE OF TECHNOLOGY
June 2011
© 2011 Massachusetts Institute of Technology
All rights reserved
Signature of Author................................
Department of Biology
May 17, 2011
Certified by ...................................................
David P. Bartel
Professor of Biology
Thesis Supervisor
Accepted by.............................................
Alan D. Grossman
Professor of Biology
Chairman, Graduate Committee
Examination of mammalian microRNAs by high-throughput sequencing
By
HyoJin Rosaria Chiang
Submitted to the Department of Biology on May 17, 2011
In Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy
ABSTRACT
Small non-coding RNAs play an important role in a wide range of cellular events.
MicroRNAs (miRNAs) are an abundant class of small RNAs that post-transcriptionally
repress expression of their target genes. Since miRNA targeting is based on its sequence,
accurate and comprehensive annotation of miRNA genes is fundamental to understanding
miRNA gene regulation.
Advances in high-throughput sequencing technology have led to the discovery of novel
small RNA genes and the identification of their properties. We describe a method for
constructing small-RNA libraries for the Illumina sequencing platform that improves upon
previous efforts. Sequencing data from small-RNA libraries constructed using this
protocol can be used to profile small RNAs from a broad range of samples.
In particular, we sequenced 60 million small RNAs from mouse brain, ovary, testes,
embryonic stem cells, three embryonic stages, and whole newborns. The analysis of the
data provides a substantially revised list of confidently identified murine miRNAs, thereby
providing a more accurate picture of the general features of mammalian miRNAs and
their abundance in the genome. In addition, our results revealed new aspects of miRNA
biogenesis and modification, including tissue-specific strand preferences, sequential
Dicer cleavage of a metazoan pre-miRNA, cases of consequential 5' heterogeneity, newly
identified instances of miRNA editing, and widespread pre-miRNA uridylation
reminiscent of Lin28-like miRNA regulation.
Thesis Advisor: David P. Bartel
Title: Professor of Biology
ACKNOWLEDGEMENTS
I would like to thank my collaborators who have studied murine miRNAs with me,
especially Lori Schoenfeld, Wendy Johnston, Noah Spies, and Vincent Auyeung. I would
especially like to thank Graham Ruby for introducing me to computational biology and
Daehyun Baek for advice on both scientific and personal endeavors.
I would like to thank the members of the Bartel lab for their support and discussion,
especially Calvin Jan, Andrew Grimson, Mike Axtell, Anna Drinnenberg, Sue-Jean
Hong, Gina Lafkas, Huili Guo, and Vikram Agarwal.
I would like to thank my committee members for guiding me through my graduate career:
Phil Sharp, Richard Hynes, Chris Burge, and Nelson Lau. I would especially like to thank
my advisor Dave Bartel for his guidance and patience throughout the years even as I
sometimes struggled to find my way.
I would like to thank my classmates in the biology program for the exciting first year in
the Pit and friendships throughout the years. I would especially like to thank Leah
Okumura, Jen Leslie, Robin Stevens, and Jadyn Damon.
I would like to thank the members of the Sidney-Pacific graduate community, in
particular the past and present SPEC+ members for being my family away from the
family: Swati Mohan, Wendy Iskendarian, Michelle Sanders, Jane Kim, Robert Wang,
Matt Eddy, Ben Mares, Alex Lewis, Roger and Dottie Marks, Annette Kim, Roland
Tang, and Joshua Tang.
I would like to thank my friends who have supported me throughout the years: George
Burkhard, Jinhee Chung, Jane Huh, and Jennie Johnson.
I would like to thank my family for their support throughout the years, particularly for
believing in me and supporting my decision to study abroad. I would especially like to
thank my brother HyoSang Chiang for proofreading this thesis.
Finally, I would like to thank my fiance Nan Gu for our times together, for more years to
come, and for showing me that regardless of the path I choose in life, I do not have to
walk it alone.
Table of Contents
Abstract
Acknowledgements
Table of contents
Chapter 1    Introduction
Chapter 2    Method for construction of small RNA libraries for Illumina high-throughput sequencing platform
Chapter 3    Mammalian microRNAs: experimental evaluation of novel and previously annotated genes
Chapter 4    Future directions
Appendices   Appendix A-D
Chapter tables of contents
Chapter 1
Introduction
Discovery of microRNAs
Canonical miRNA biogenesis
Transcription of pri-miRNAs
Nuclear processing of pri-miRNAs by Microprocessor
Nuclear export of pre-miRNAs by Exportin-5
Cytoplasmic processing of pre-miRNAs by Dicer
RISC loading
Noncanonical miRNA biogenesis
MicroRNA function
Global miRNA gene discovery
Computational prediction of miRNA genes
MicroRNA gene discovery by second-generation sequencing
State of miRNA annotations
Summary
Figure legends
References
Figures
Chapter 2
Method for construction of small-RNA libraries for
Illumina high-throughput sequencing platform
Abstract
Introduction
Method
Overview
Protocol
Concluding remarks
Figure legend
References
Figure
Chapter 3
Mammalian microRNAs: experimental evaluation
of novel and previously annotated genes
Abstract
Introduction
Results
MicroRNA gene discovery
Experimental evaluation of unconfirmed microRNAs
Experimental evaluation of novel microRNAs and new
candidates
MicroRNA expression profiles
General features of mammalian microRNAs
MicroRNAs processed from both arms, with occasional
tissue-specific differences in the preferred arm
Sequential Dicer cleavage of a mirtron hairpin
5' Heterogeneity
RNA editing
Untemplated nucleotide addition
Discussion
The status of microRNA gene discovery in mammals
Unknown features required for Drosha/Dicer processing
Methods
Figure legends
Acknowledgements
References
Figures and tables
Supplemental figures and tables
Chapter 4
Future directions
MicroRNA gene annotations
MicroRNA gene discovery by sequencing
Computational prediction of miRNAs
MicroRNAs mapping to multiple loci
MicroRNA isoforms
Dicer-independent and Ago2-dependent miRNAs
Arm-switching miRNAs
De novo prediction of piRNA clusters
Acknowledgements
References
Appendices
Appendix A-D
Appendix A
Appendix B
Appendix C
Appendix D
Chapter 1
Introduction
The word "gene" was coined by Wilhelm Johannsen to describe a unit of heredity
(Johannsen 1911). Subsequently, the "one gene, one enzyme" hypothesis was proposed,
suggesting that a single gene encodes one protein (Beadle and Tatum 1941). With the
identification of DNA as a carrier of genetic material (Avery et al. 1944), the definition
of a gene evolved to a DNA segment in the genome that encodes a protein. However, the
discovery of functional non-coding RNAs, such as ribosomal RNA (rRNA) and transfer
RNA (tRNA), further broadened this definition to include genomic regions that encode
non-coding RNAs.
While small non-coding RNAs smaller than tRNA were generally thought to be
degradation fragments, the discovery of RNA interference (RNAi) has shifted this
perspective. Gene silencing by RNAi was originally identified as "cosuppression" in
plants (Napoli et al. 1990; van der Krol et al. 1990) and subsequently identified in
animals (Fire et al. 1998). When a double-stranded RNA (dsRNA) is introduced into a
cell, the dsRNA is cleaved into small RNAs of ~22 nucleotides (nts), known as small
interfering RNAs (siRNAs) (Zamore et al. 2000; Bernstein et al. 2001). The siRNA can
guide the RNAi machinery to its target transcript (Elbashir et al. 2001). Although
endogenous siRNAs (endo-siRNAs) are present in nematodes and insects (Ambros et al.
2003b; Czech et al. 2008; Ghildiyal et al. 2008), introduction of dsRNAs triggers
interferon response in the majority of mammalian cell types (Sen and Sarkar 2007).
Germline cells, however, do not activate interferon response in the presence of dsRNA
(Svoboda et al. 2000), and sequencing small RNAs from mouse oocytes and ES cells
revealed that they contain endo-siRNAs (Babiarz et al. 2008; Tam et al. 2008; Watanabe
et al. 2008).
Another class of small RNAs, known as PIWI-interacting RNAs (piRNAs), is
also present in gonadal cells (Aravin et al., 2006; Girard et al., 2006; Grivna et al., 2006;
Lau et al., 2006). One subclass of piRNAs maps to repetitive regions of the genome, and
they have been associated with suppressing transposons (Aravin et al., 2007; Brennecke
et al., 2007). The second class of piRNAs maps to non-repetitive regions, and they are
abundant in the pachytene stage of meiosis, but their roles have not yet been clarified
(Aravin et al., 2006; Girard et al., 2006; Grivna et al., 2006; Lau et al., 2006).
The most abundant and ubiquitous class of small RNAs is microRNAs
(miRNAs). MicroRNA genes give rise to ~22 nt non-coding RNAs that can post-transcriptionally regulate gene expression (Bartel 2004). These RNAs play a role in a
wide range of biological events, such as stem cell self-renewal, differentiation,
proliferation, immunity, and cancer (Huang et al. 2010). Since the first miRNA was discovered in 1993,
computational and experimental methods have been used to annotate more than 15,000
miRNA genes in miRBase, a miRNA database (Griffiths-Jones 2004; Griffiths-Jones et
al. 2006). This chapter reviews animal miRNA biogenesis, function, and discovery,
focusing on those in mammals.
Discovery of microRNAs
The first miRNA gene lin-4 was identified through a genetic screen for cell lineage
("heterochronic") aberrations in Caenorhabditis elegans (Horvitz and Sulston 1980;
Chalfie et al. 1981). Animals with loss-of-function (LOF) mutations in lin-4 contained
cells that repeated a larval developmental program similar to animals with gain-of-function (GOF) mutations in lin-14 (Chalfie et al. 1981; Ambros and Horvitz 1984).
Subsequently, it was discovered that lin-14 was required for manifestation of lin-4 LOF
mutation and that lin-4 LOF animals had higher lin-14 activity (Ambros 1989). These
results suggested that lin-4 was a negative regulator of lin-14 (Ambros 1989; Ruvkun and
Giusto 1989). When lin-14 was cloned and its two GOF mutants were analyzed, it was
revealed that the mutations mapped to the 3' untranslated region (UTR) (Ruvkun et al.
1989; Wightman et al. 1991). Together, these findings led to the hypothesis that the gene
product of lin-4 may directly bind to or activate a factor that binds to a regulatory element
in the 3' UTR of lin-14 to inhibit LIN14 protein production (Arasu et al. 1991; Wightman
et al. 1991). When lin-4 was cloned, however, its gene products turned out to be
untranslated RNA molecules of 22 and 61 nts instead of a protein (Lee et al. 1993). A
concurrent study found that the negative regulation of lin-14 by lin-4 was conserved to
Caenorhabditis briggsae (Wightman et al. 1993). The analysis of the conserved
sequences in the lin-14 3' UTRs of C. elegans and C. briggsae revealed that there are
multiple regions in the lin-14 3' UTR that are complementary to the lin-4 RNA sequence
(Wightman et al. 1993). These results favored the model in which the lin-4 RNA binds to
the 3' UTR of the lin-14 mRNA to negatively regulate LIN14 production through
inhibition of post-transcriptional processing, transport, or translation (Lee et al. 1993;
Wightman et al. 1993). Shortly thereafter, lin-28 was identified to be another gene that
was regulated through its 3' UTR by lin-4 (Moss et al. 1997).
While discovery of the lin-4 gene product and its inhibition of lin-14 and lin-28
via 3' UTR provided a novel paradigm for gene regulation by small RNAs, it was not
until 2000 that the second non-coding small RNA gene with similar properties was
discovered (Reinhart et al. 2000; Slack et al. 2000). Like lin-4, let-7 was identified
through a genetic screen for heterochronic genes in C. elegans (Reinhart et al. 2000).
When let-7 was mapped, no protein-coding genes could be predicted from the sequence.
Instead, a 21 nt RNA transcript was detected by Northern blot (Reinhart et al. 2000).
Given the precedent of lin-4 regulation of other heterochronic genes, the 3' UTRs of
heterochronic genes were examined for complementarity to the let-7 RNA sequence. One
of the predicted targets, lin-41, was experimentally shown to be regulated by let-7
(Reinhart et al. 2000; Slack et al. 2000).
Unlike lin-4, which is only present in nematodes, let-7 is widely conserved to
other animals (Pasquinelli et al. 2000). This observation led to anticipation of more
discoveries of stage-specific small endogenous RNAs that control development, and lin-4
and let-7 became the founding members of "small temporal RNAs" (stRNAs).
With the goal of identifying other stRNAs, the Ambros, Bartel, and Tuschl labs
led the efforts to clone small RNAs from C. elegans, Drosophila melanogaster, and HeLa
cells (Lagos-Quintana et al. 2001; Lau et al. 2001; Lee and Ambros 2001). The results of
these studies revealed that there were many more small RNA genes that resembled lin-4
and let-7. These genes mapped to the regions in the genome that could fold into stable
stem-loop structures, and the longer precursor of ~65 nts could be detected by Northern
blot for some of the genes. Although both lin-4 and let-7 mapped to the 5' arm of their
hairpin precursors, the newly cloned small RNAs could come from either the 5' or the 3'
arm. Like let-7, many of these miRNAs were conserved to other species, suggesting that
they could have conserved biological functions. Unlike lin-4 and let-7, however, many of
the small RNAs did not exhibit specific temporal expression, and some were expressed
only in specific cell types. Consequently, these small RNAs were renamed miRNAs.
Canonical miRNA biogenesis
MicroRNAs mature through three intermediates: primary miRNA transcript (pri-miRNA), precursor miRNA (pre-miRNA), and miRNA:miRNA* duplex (Figure 1A, top)
(Lee et al. 2002). A pri-miRNA folds into a hairpin with a ~33 base-pair (bp) stem after
transcription, and it is cleaved in the nucleus to produce a ~65 nt pre-miRNA. The resulting
hairpin is then exported to the cytoplasm and is cleaved into the ~22 nt miRNA:miRNA* duplex. The mature
miRNA is loaded into RNA-induced silencing complex (RISC) whereas the passenger
strand (miRNA*) dissociates from the complex and is degraded.
Transcription of pri-miRNAs
When the first miRNA gene products were identified, two RNA species of ~22 nts and
~65 nts were detected, but the question remained whether they were derived from a
longer transcript. A number of miRNAs mapped to introns of protein-coding genes, and
these miRNAs were thought to be processed from the pre-mRNAs of host genes
(Rodriguez et al. 2004; Baskerville and Bartel 2005). The majority of miRNAs, however,
did not overlap previously annotated genes. When the sequences of the miRNAs were
compared to the mammalian cDNA databases, the presence of expressed sequence tags
(ESTs) overlapping miRNAs suggested that the ~65 nt precursor may be processed from
an even longer primary transcript (Lagos-Quintana et al. 2001). This idea was further
supported by the fact that some of the novel genes were clustered so closely together that
they appeared to be transcribed as a single unit (Lagos-Quintana et al. 2001; Lau et al.
2001). Subsequent reverse transcription polymerase chain reaction (RT-PCR)
experiments amplifying a larger region surrounding miRNAs revealed that pre-miRNAs
were derived from a longer transcript now known as pri-miRNA (Lee et al. 2002).
Although many non-coding RNAs, such as tRNAs and U6 small nuclear RNA,
are transcribed by RNA polymerase III (pol III), RNA polymerase II (pol II) was
hypothesized to transcribe miRNA genes. The pri-miRNAs can be over a kilobase, longer
than most pol III-dependent transcripts, and they contain stretches of uridines that would
terminate pol III transcription (Lee et al. 2002). Also, the expressions of many miRNAs
are temporally or spatially restricted, which suggested pol II transcription. RNase
protection assay (RPA) and RT-PCR of pri-miRNAs from RNAs that bound to cap-binding protein eIF-4E indicated that pri-miRNAs contained the 5' cap (Cai et al. 2004;
Lee et al. 2004). Furthermore, similar experiments performed with polyadenylated RNAs
and identification of putative polyadenylation signals suggested that pri-miRNAs also
had poly(A) tails (Bracht et al. 2004; Cai et al. 2004; Lee et al. 2004). Coupled with the sensitivity of pri-miRNA transcription to α-amanitin and pol II chromatin-IP (ChIP) results
(Lee et al. 2004), these findings confirmed that most pri-miRNAs are transcribed by pol
II.
Nuclear processing of pri-miRNAs by Microprocessor
To understand how pri-miRNAs are processed into pre-miRNAs, the cleavage sites of
pri-miRNAs were determined by mapping the 5' and 3' ends of pre-miRNAs (Basyuk et
al. 2003; Lee et al. 2003). When pre-miRNAs were characterized and folded, the hairpins
had a 5' phosphate and a ~2 nt 3' overhang typical of RNase III cleavage. There are three
classes of RNase III, each class represented by Escherichia coli RNase III, eukaryotic
Drosha, and eukaryotic Dicer. Because pri-miRNAs are processed into pre-miRNAs in
the nucleus (Lee et al. 2002), the nuclear RNase III enzyme Drosha became a primary
candidate for pri-miRNA-processing machinery (Lee et al. 2003). As predicted,
immunoprecipitated Drosha complex generated pre-miRNAs from pri-miRNAs in vitro,
and inhibition of Drosha significantly repressed mature miRNA production in vivo (Lee
et al. 2003). These findings supported the notion that Drosha cleaves pri-miRNAs into
pre-miRNAs.
Drosha has two RNase III domains (RIIIDs), a double-stranded RNA binding
domain (dsRBD), and an extended N terminus which contains a proline-rich region and
arginine- and serine-rich region (Figure 1B). The tandem RIIIDs form an intramolecular
dimer which cleaves a pri-miRNA to generate a pre-miRNA hairpin with a ~2 nt 3'
overhang (Han et al. 2004). Although Drosha's dsRBD structure is similar to other RNA-binding dsRBDs (Mueller et al. 2010), it does not have significant RNA-binding activity
(Han et al. 2006).
Biochemical analysis of Drosha revealed that it existed in a complex with
DiGeorge syndrome critical region gene 8 (DGCR8) (Denli et al. 2004; Gregory et al.
2004; Han et al. 2004). DGCR8 contains two dsRBDs that are arranged with pseudo two-fold symmetry in its core as well as a WW domain that can interact with proline-rich
peptides (Figure 1B) (Sohn et al. 2007). Alone, neither Drosha nor DGCR8 can process
pri-miRNAs, but together, they can efficiently cleave pri-miRNAs to generate pre-miRNAs in vitro (Gregory et al. 2004; Han et al. 2004). The complex consisting of
Drosha and DGCR8 is called the "Microprocessor."
A pri-miRNA consists of a stem, a terminal loop, and nonstructured flanking
sequences. Although there appeared to be no consensus sequence on the flanking regions,
they have been shown to be important for efficient pri-miRNA processing both in vitro
and in vivo (Lee et al. 2003; Chen et al. 2004; Zeng and Cullen 2005; Han et al. 2006).
The stem is ~3 helical turns, and the cleavage site is ~1 helical turn (~11 bps) from the
base of the hairpin (Han et al. 2006). Although the terminal loop has been reported to be
important for pri-miRNA processing (Zeng et al. 2005), systematic mutagenesis
experiments revealed that the site of Drosha cleavage is determined by the distance from
the ssRNA-dsRNA junction (Han et al. 2006). Thus, the current model posits that
DGCR8 binds to the base of the pri-miRNA hairpin with two dsRBDs contacting two
discontinuous segments of the stem and positions Drosha such that it cuts the stem at a
distance of ~11 bps away from the base of the hairpin (Han et al. 2006; Sohn et al. 2007).
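The ruler model described above can be illustrated with a small sketch. This is a toy calculation, not the analysis performed in the cited studies: the dot-bracket input, the function names `pair_table` and `pre_mirna_bounds`, the fixed 11-bp offset, and the 2-nt stagger are all assumptions made for illustration, and bulges and internal loops in the lower stem are simply skipped when counting base pairs.

```python
def pair_table(structure):
    """Return a dict mapping each paired position to its partner (0-based)."""
    stack, pairs = [], {}
    for i, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(i)
        elif symbol == ")":
            j = stack.pop()
            pairs[i], pairs[j] = j, i
    return pairs


def pre_mirna_bounds(structure, offset_bp=11, overhang=2):
    """Predict pre-miRNA ends for a single pri-miRNA hairpin (dot-bracket).

    Counts offset_bp base pairs up from the ssRNA-dsRNA junction (the first
    paired base) and returns 0-based inclusive (start, end) indices. The 3'
    end extends `overhang` nt past the partner of the 5' end, mimicking the
    staggered RNase III cut.
    """
    pairs = pair_table(structure)
    opening = [i for i, symbol in enumerate(structure) if symbol == "("]
    if len(opening) <= offset_bp:
        raise ValueError("stem is shorter than the ruler distance")
    start = opening[offset_bp]       # first base above the ~11-bp lower stem
    end = pairs[start] + overhang    # leave a 2-nt 3' overhang
    return start, end


# Hypothetical hairpin: 5-nt flanks, 20-bp stem, 6-nt terminal loop.
toy = "." * 5 + "(" * 20 + "." * 6 + ")" * 20 + "." * 5
print(pre_mirna_bounds(toy))
```

The sketch assumes that every opening bracket belongs to one hairpin stem; a real pri-miRNA fold with additional flanking structure would need the junction to be located explicitly.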
Nuclear export of pre-miRNAs by Exportin-5
After the Microprocessor cleavage, pre-miRNAs are exported from the nucleus to the
cytoplasm. Due to lack of a consensus sequence on pre-miRNAs, the export receptor was
hypothesized to recognize a structural motif. Exportin-5 (Exp5) is a Ran-dependent
nuclear transport receptor that recognizes RNA stem and a 2 nt 3' overhang, both
structural elements of pre-miRNAs (Okada et al. 2009). Exp5 forms a complex with its
cargo in presence of GTP-bound Ran and translocates to the cytoplasm. Upon export,
GTP is hydrolyzed to GDP, and the cargo is released. In order to test whether Exp5
exports pre-miRNAs, Exp5 expression was repressed using RNA interference (RNAi).
Inhibition of Exp5 resulted in reduction of pre-miRNAs and mature miRNAs in the
cytoplasm as well as decrease in miRNA function (Yi et al. 2003; Lund et al. 2004).
Furthermore, pre-miRNA binding to and export by Exp5 were dependent on Ran-GTP,
and injection of purified Exp5 into Xenopus oocyte nuclei resulted in cytoplasmic
accumulation of pre-miRNAs but not other RNAs (Lund et al. 2004). These results
provided evidence that pre-miRNAs are exported to the cytoplasm by Exp5.
Cytoplasmic processing of pre-miRNAs by Dicer
Once pre-miRNAs are exported to the cytoplasm, they need to be processed into ~22 nt
mature miRNAs. Cytoplasmic RNase III enzyme Dicer had previously been implicated in
processing of dsRNAs into small interfering RNAs (siRNAs) (Bernstein et al. 2001). Due
to the similarities between miRNAs and siRNAs, the role of Dicer in pre-miRNA
processing was examined (Grishok et al. 2001; Hutvágner et al. 2001; Ketting et al.
2001). When the level of Dicer was reduced, pre-miRNAs accumulated while the level of
mature miRNAs decreased (Grishok et al. 2001; Hutvágner et al. 2001). Dicer also
cleaved pre-miRNAs efficiently in vitro (Hutvágner et al. 2001; Ketting et al. 2001).
Dicer consists of two RIIIDs, two dsRBDs, a DExD/H box RNA helicase domain,
and a Piwi/Argonaute/Zwille (PAZ) domain (Figure 1B). Although Dicer was initially
hypothesized to have two active dsRNA cleavage sites, mutagenesis experiments
provided a model in which the two RIIIDs form an intramolecular dimer to form a single
dsRNA processing center (Zhang et al. 2004). This model proposed that the PAZ domain
recognized the 3' overhang of a pre-miRNA left by Drosha cleavage, and the distance
between the PAZ domain and the RIIIDs dictated the site of Dicer cleavage. The
structure of Dicer confirmed this model (Macrae et al. 2006). Thus, Dicer serves as a
molecular ruler to measure a fixed distance from the site of Drosha cleavage to process
pre-miRNAs into a ~22 nt RNA duplex.
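As a rough illustration of the molecular-ruler idea, the following sketch measures a fixed length from each end of a pre-miRNA, since both ends sit at the base left by Drosha. It is a deliberate simplification under stated assumptions: the function name `dicer_products` and the fixed 22-nt ruler are hypothetical, and the sketch ignores hairpin structure and the variability in cleavage position.

```python
def dicer_products(pre_mirna, ruler=22):
    """Split a pre-miRNA into (5p strand, loop remnant, 3p strand).

    Each strand is taken as `ruler` nt measured from the corresponding end
    of the hairpin; whatever lies between is treated as the excised loop.
    """
    if len(pre_mirna) <= 2 * ruler:
        raise ValueError("hairpin too short for two ruler-length strands")
    five_p = pre_mirna[:ruler]
    three_p = pre_mirna[-ruler:]
    loop = pre_mirna[ruler:-ruler]
    return five_p, loop, three_p


# Hypothetical 60-nt hairpin written as three blocks for readability.
five_p, loop, three_p = dicer_products("A" * 22 + "C" * 16 + "G" * 22)
print(len(five_p), len(loop), len(three_p))
```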
RISC loading
After Dicer cleavage, the ~22 nt RNA duplex is loaded onto the RISC by RISC-loading
complex (RLC). The first RLC to be identified was fly Ago2-RLC, which consists of
Ago2, Dicer-2, and its dsRNA-binding partner R2D2 (Liu et al. 2003; Pham et al. 2004;
Tomari et al. 2004). Subsequently, human Ago2-RLC components were identified as
Ago2, Dicer, and TAR RNA-binding protein (TRBP) (Chendrimada et al. 2005;
Maniataki and Mourelatos 2005; Macrae et al. 2008). The association of Dicer in RLC
raised a debate on whether pre-miRNA processing and RISC-loading were coupled.
Although some studies support this model (Gregory et al. 2005; Maniataki and
Mourelatos 2005), the more widely accepted view is that the two processes are not
coupled (Murchison et al. 2005; Preall et al. 2006; Yoda et al. 2010).
Usually, only one strand (miRNA) from the miRNA:miRNA* duplex is
incorporated into the RISC while the passenger strand (miRNA*) is degraded. To
determine how the miRNA strand is chosen, RISC-capture assay and thermodynamic
profiling were performed on various RNA duplexes (Khvorova et al. 2003; Schwarz et al.
2003). In general, the species with less thermodynamically stable 5' end was incorporated
into the RISC as the mature miRNA. The degree of functional asymmetry was attributed
to the relative ease with which the 5' ends of the two strands can be unwound from the
duplex.
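The asymmetry rule can be sketched in a few lines. The cited studies profiled thermodynamic stability; the sketch below swaps in a much cruder proxy, a hydrogen-bond-style score over the four terminal base pairs, purely for illustration. The scoring table, the window of four pairs, and the function names `end_stability` and `pick_guide` are assumptions, and both strands are assumed to have equal length with 2-nt 3' overhangs.

```python
# Crude pairing scores (an assumption, not nearest-neighbor energies).
PAIR_SCORE = {
    ("G", "C"): 3, ("C", "G"): 3,
    ("A", "U"): 2, ("U", "A"): 2,
    ("G", "U"): 1, ("U", "G"): 1,
}


def end_stability(strand, partner, window=4, overhang=2):
    """Score the pairing of the first `window` nt at the 5' end of `strand`.

    Both strands are written 5'->3'. Position i of `strand` faces position
    len(partner) - 1 - overhang - i of `partner`. Mismatches score 0.
    """
    last_paired = len(partner) - 1 - overhang
    return sum(
        PAIR_SCORE.get((strand[i], partner[last_paired - i]), 0)
        for i in range(window)
    )


def pick_guide(strand_a, strand_b):
    """Return the strand whose 5' end is less stably paired, or None on a tie."""
    a = end_stability(strand_a, strand_b)
    b = end_stability(strand_b, strand_a)
    if a == b:
        return None
    return strand_a if a < b else strand_b


# Hypothetical duplex: strand_a has an A/U-rich 5' end, strand_b a G/C-rich one.
strand_a = "UAUACCAAGGUUCCAAGCGCUU"
strand_b = "GCGCUUGGAACCUUGGUAUAUU"
print(pick_guide(strand_a, strand_b) == strand_a)
```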
Noncanonical miRNA biogenesis
At least three noncanonical miRNA biogenesis pathways have been identified. The first
consists of a class of miRNAs called mirtrons that bypasses Drosha cleavage (Figure 1A,
middle upper) (Okamura et al. 2007; Ruby et al. 2007). Initially identified in D.
melanogaster and C. elegans, these miRNAs are derived from short introns of protein-coding genes, which are spliced by the spliceosome. After the excised lariat is
debranched, it folds into a pre-miRNA hairpin, which is then exported into the cytoplasm
for Dicer cleavage. Thus, mirtron pre-miRNAs are generated from pre-mRNA by the
spliceosome rather than Drosha. Because the intron needs to fold into a hairpin suitable
for Dicer processing, mirtrons generally arise from introns of length ~60 nts, the average
length of a canonical pre-miRNA. While the genomes of C. elegans and D. melanogaster
have abundance of introns with lengths similar to pre-miRNAs, many mammalian
genomes, including mouse and human, contain few such introns and thus are less
likely to evolve mirtrons (Ruby et al. 2007). As a result, although mirtrons have been
observed in mammals, they comprise a smaller fraction of the pre-miRNAs (Berezikov et
al. 2007; Babiarz et al. 2008). However, some longer introns have been observed to fold
into a hairpin with a tail at either the 5' or the 3' end, and subsequent nucleolytic
cleavage can yield a pre-miRNA-like hairpin (Figure 1A, middle lower) (Babiarz et al.
2008). This subclass of mirtrons is called tailed-mirtrons.
The second class of noncanonical miRNA is endogenous small hairpin RNAs
(shRNAs). Like exogenous shRNAs (Paddison et al. 2004), an endogenous shRNA
transcript can fold into a hairpin, but it lacks significant base-pairing beyond the pre-miRNA hairpin (Babiarz et al. 2008). The processing of endogenous shRNAs is not
dependent on DGCR8 but dependent on Dicer, which suggests that a pri-miRNA of an
endogenous shRNA may be processed into a pre-miRNA in a Microprocessor-independent manner (Babiarz et al. 2008). One possibility is that the pri-miRNA is
trimmed by nucleases into a pre-miRNA hairpin, which can then be processed by Dicer
into a mature miRNA:miRNA* duplex (Figure 1A, bottom).
A more recent observation has shown that the processing of miR-451 is dependent
on Ago2 instead of Dicer (Figure 1A, inset) (Cheloufi et al. 2010; Cifuentes et al. 2010).
The pre-miR-451 is unusual in that it only has 17 bps in its stem, too short to be a Dicer
substrate. Furthermore, the mature miR-451 spans the loop rather than being confined to
one arm of the hairpin. The dissection of miR-451 maturation process has shown that
after Drosha cleavage and nuclear export, Ago2, rather than Dicer, is responsible for the
second cleavage. Ago2 cleavage generates a 30 nt product, which is likely trimmed by
nucleases to the annotated length of 22 nts. Thus far, miR-451 is the only
known miRNA to have Dicer-independent, Ago2-dependent biogenesis.
MicroRNA function
The predominant role of miRNAs is to repress gene expression (Fabian et al. 2010)
although there have been reports of miRNAs that upregulate gene expression (Vasudevan
et al. 2007; Orom et al. 2008). When a miRNA has an extensive complementarity to a
target mRNA, such as miR-196 to its target mRNA HOXB8, the target mRNA is cleaved
by AGO2 (Yekta et al. 2004). However, most mammalian miRNAs lack such extensive
pairing to their targets. In case of imperfect pairing, the main site of target guidance is on
the nucleotides 2-7 of the miRNA, also known as the "seed" (Lewis et al. 2003; Lewis et
al. 2005; Grimson et al. 2007; Bartel 2009). Initially, the primary mode of such gene
downregulation was thought to be translational inhibition with little or no change at the
mRNA level (Wightman et al. 1993; O'Donnell et al. 2005; Zhao et al. 2005). However,
advanced proteomic surveys coupled with microarray analysis of miRNAs and their
target genes have shifted the paradigm of miRNA-mediated gene repression from
translational inhibition to mRNA destabilization (Baek et al. 2008; Selbach et al. 2008).
Polysome and ribosome profiling of comparable samples supported the idea that most
miRNA-mediated repression occurred primarily through decrease in target mRNA levels
(Hendrickson et al. 2009; Guo et al. 2010).
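Seed-based target recognition lends itself to a short sketch. The code below takes nucleotides 2-7 of a miRNA, computes the complementary site as it would appear in an mRNA, and reports where that 6-mer occurs in a 3' UTR. The function names `seed_site` and `find_seed_matches` and the example UTR are hypothetical, and the sketch covers only the 6-mer seed match, not the additional site context considered by target prediction tools.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def seed_site(mirna):
    """Return the mRNA 6-mer complementary to miRNA nucleotides 2-7."""
    seed = mirna[1:7]                       # positions 2-7, 1-based
    return seed.translate(COMPLEMENT)[::-1]  # reverse complement, 5'->3'


def find_seed_matches(mirna, utr):
    """Return 0-based start positions of seed-match sites in a 3' UTR."""
    site = seed_site(mirna)
    return [
        i for i in range(len(utr) - len(site) + 1)
        if utr[i:i + len(site)] == site
    ]


let7a = "UGAGGUAGUAGGUUGUAUAGUU"  # mature let-7a, 5'->3'
toy_utr = "AAACUACCUCAGGG"        # made-up UTR fragment for illustration
print(seed_site(let7a), find_seed_matches(let7a, toy_utr))
```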
Global miRNA gene discovery
Computational prediction of miRNA genes
Although conventional cloning and sequencing small RNAs led to discovery of hundreds
of mammalian miRNAs (Lagos-Quintana et al. 2001; Lagos-Quintana et al. 2002;
Houbaviy et al. 2003; Lagos-Quintana et al. 2003; Berezikov et al. 2006b; Landgraf et al.
2007), the number of miRNA genes identified was far from saturation due to low-throughput sequencing and constrained expression patterns. The low-throughput Sanger
sequencing allowed only the more abundant miRNAs in the sample to be sequenced.
Furthermore, only the miRNAs that were present in the sample, rather than all the
miRNAs encoded in the genome, could be identified. Thus, some miRNAs that were
expressed at a low level or in specific cell types or conditions were not identified using
this approach.
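The effect of sequencing depth on discovery can be made concrete with a back-of-the-envelope calculation. Assuming, as a simplification, that each read is an independent draw and that a miRNA species makes up a fixed fraction f of the library, the chance of seeing it at least once in N reads is 1 - (1 - f)^N. The function name `detection_probability`, the example fraction, and the figure of 1,000 Sanger clones are illustrative assumptions; the 60 million reads correspond to the dataset described in the abstract. Cloning and ligation biases are ignored.

```python
import math


def detection_probability(fraction, reads):
    """Probability of at least one read from a species at `fraction` of the library."""
    # 1 - (1 - fraction) ** reads, computed stably for small fractions.
    return -math.expm1(reads * math.log1p(-fraction))


rare = 1e-5  # hypothetical miRNA at 1 in 100,000 small RNAs
print(round(detection_probability(rare, 1_000), 4))        # small clone library
print(round(detection_probability(rare, 60_000_000), 4))   # deep sequencing
```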
Nonetheless, a number of properties characteristic of miRNAs were deduced from
the expanded list of miRNAs, and these features were used to computationally predict
miRNA genes from the genomic sequence. One feature that best distinguishes miRNAs is
the stem-loop structure. Because a miRNA matures through Drosha and Dicer cuts, it
must map to a locus that can fold into a stable hairpin of ~33 bps. Another commonly
used feature is conservation, because miRNAs frequently have biological functions that
have been conserved through evolution. Other properties include sequence, additional
structural information, thermodynamic stability, and genomic location.
The earliest miRNA gene predictions relied heavily on conservation. For
example, the MiRscan algorithm scanned the C. elegans genome for hairpin structures
that were conserved to C. briggsae (Lim et al. 2003b). MiRscan then evaluated the
filtered hairpins for secondary structure, sequence biases, and additional conservation to
determine whether the hairpins resembled known miRNAs. The study identified 35 novel
miRNA genes in C. elegans, and a subset of the predictions was tested by Northern blots
and 5' rapid amplification of cDNA ends (RACE). Subsequently, this method was
applied to vertebrate genomes to discover 38 novel human miRNAs and 14 homologs of
previously known miRNAs (Lim et al. 2003a).
Using phylogenetic shadowing, another study observed that the stem region of the
pre-miRNA was conserved while the flanking regions and the terminal loop were not
conserved (Berezikov et al. 2005). Sixteen human miRNAs and 976 candidates were
identified by first scanning the genome for a pre-miRNA-like conservation profile and
then filtering for thermodynamically favorable hairpins. Some of the candidates were
supported by Northern blots and later by sequencing data and/or RNA-primed Array-based Klenow Extension (RAKE) (Berezikov et al. 2006b).
In an alternate method, conservation of potential target genes rather than that of
hairpins was used to predict novel miRNA genes (Xie et al. 2005). First, 8-mer conserved
motifs in 3' UTR of mRNAs were identified. Hypothesizing that the discovered motifs
corresponded to locations where miRNA seed sequences bound, the sequences
complementary to the motifs were mapped to the human genome to identify loci that
could produce miRNAs with the corresponding seeds. If these sequences were conserved,
the flanking region surrounding the sequence was folded to determine whether it could
form a pre-miRNA-like hairpin. This method identified 129 novel candidates, some of
which were supported by 5' RACE.
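The motif-to-locus step of such an approach can be illustrated with a minimal Python sketch. The function names, the 8-mer, and the short "genome" below are hypothetical and are not taken from Xie et al. (2005); the sketch only shows how the sequence complementary to a conserved 3' UTR motif could be located on either strand of a genomic sequence. The conservation filter and the hairpin-folding step are not modeled.

```python
# Hypothetical illustration: all names and sequences are invented for this sketch.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_all(haystack, needle):
    """Return every 0-based start position of needle in haystack (overlaps allowed)."""
    positions = []
    start = haystack.find(needle)
    while start != -1:
        positions.append(start)
        start = haystack.find(needle, start + 1)
    return positions


def candidate_seed_loci(motif, genome):
    """Locate loci that could encode an RNA complementary to a 3' UTR motif.

    An RNA that pairs with the motif carries its reverse complement. A gene on
    the plus strand therefore shows the reverse complement in the plus-strand
    sequence, whereas a gene on the minus strand shows the motif itself.
    """
    target = reverse_complement(motif)
    hits = [(pos, "+") for pos in find_all(genome, target)]
    hits += [(pos, "-") for pos in find_all(genome, motif)]
    return sorted(hits)


motif = "GCACTTTA"  # toy conserved 8-mer, written as DNA
genome = "TT" + "TAAAGTGC" + "CCGG" + "GCACTTTA" + "AA"  # toy plus strand
print(candidate_seed_loci(motif, genome))  # [(2, '+'), (14, '-')]
```

In a real analysis each hit would then be tested for conservation, and the flanking region would be folded to ask whether it resembles a pre-miRNA hairpin.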
Although methods using conservation have predicted a plethora of novel miRNA
genes, they cannot predict nonconserved genes. A number of machine learning-based
approaches have been developed for ab initio miRNA prediction (Sewer et al. 2005; Xue
et al. 2005; Helvik et al. 2007; Jiang et al. 2007; Wang et al. 2010). These algorithms first
learn the properties characteristic of miRNAs and then build a classifier based on positive
and negative samples to determine whether a given sequence resembles a miRNA gene.
The positive samples are known miRNAs in the miRBase database; the negative samples
are usually selected from hairpins from other non-coding RNAs such as rRNA or from
mRNAs. The features that are used to describe the hairpins range from overall
thermodynamic stability to percentage of nucleotide composition in a particular region of
the hairpin. Early works demonstrated that such machine learning methods could separate
the positives from the negatives (Sewer et al. 2005; Xue et al. 2005), and many recent
efforts used a similar approach.
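The general shape of such a classifier can be sketched in Python as follows. This is only an illustration under stated assumptions: the three features, the toy sequences and dot-bracket structures, and the nearest-centroid rule are all hypothetical stand-ins chosen to keep the example self-contained, and they are not the features or learning algorithms of the cited studies (which used more capable classifiers, for example the random forest named in the title of Jiang et al. 2007). The structure is assumed to describe a single hairpin.

```python
from statistics import mean


def hairpin_features(seq, structure):
    """Summarize a sequence and its dot-bracket structure as a feature vector."""
    gc = sum(base in "GC" for base in seq) / len(seq)
    paired = sum(ch in "()" for ch in structure) / len(structure)
    if "(" in structure and ")" in structure:
        # Terminal loop: unpaired stretch between the last '(' and the first ')'.
        loop = max(structure.index(")") - structure.rindex("(") - 1, 0)
    else:
        loop = len(structure)  # no stem at all: treat everything as loop
    return (gc, paired, loop / len(structure))


def centroid(vectors):
    """Average feature vector of one class."""
    return tuple(mean(column) for column in zip(*vectors))


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train(positives, negatives):
    """Learn one centroid per class from labeled feature vectors."""
    return {"miRNA-like": centroid(positives), "other": centroid(negatives)}


def classify(model, vector):
    """Assign the label whose centroid is closest to the feature vector."""
    return min(model, key=lambda label: squared_distance(model[label], vector))


# Toy training data (invented): hairpin-like positives, poorly structured negatives.
positives = [
    hairpin_features("GCGCAUGCAUGAAAAUGCAUGCGC", "((((((((((....))))))))))"),
    hairpin_features("GGCAUCGAUUUCGAAUCGAUGCC", "(((((((((.....)))))))))"),
]
negatives = [
    hairpin_features("AUAUAUGCAUAUAUAUAUAUAUAU", "..((....)).............."),
    hairpin_features("AAUUAAUUAAUUGGAAUUAAUUAA", "........................"),
]
model = train(positives, negatives)

query = hairpin_features("GCAUGCAUGCAUUUUGCAUGCAUGC", "(((((((((((...)))))))))))")
print(classify(model, query))  # miRNA-like
```

The quality of any such model depends on the training sets: if the negatives differ from real miRNA hairpins in trivial ways, as these toy negatives do, good separation says little about performance on genomic hairpins.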
MicroRNA gene discovery by second-generation sequencing
Although computational approaches were able to identify novel miRNA genes and
candidates, these efforts were limited by an incomplete understanding of miRNA processing
and by low-quality positive and negative training sets. Furthermore, some predicted
miRNAs may not even be transcribed and thus lack biological function. Cloning and
sequencing small RNAs can bypass these problems. While the experimental method
has its own limitations, they can be ameliorated by deeper sequencing of a broad range of
samples.
The small RNA cloning protocols for miRNA discovery were pioneered by three
labs (Lagos-Quintana et al. 2001; Lau et al. 2001; Lee and Ambros 2001), but the
concepts behind them were similar: ligation of adaptors to the 3' and 5' ends of the
size-fractionated small RNAs, followed by reverse transcription (RT) and polymerase
chain reaction (PCR) amplification. The amplified constructs were concatemerized,
cloned into a plasmid, and sequenced by the Sanger method. With some adaptations, similar
approaches were used to construct libraries for high-throughput sequencing platforms (Lu
et al. 2007; Hafner et al. 2008).
Massively parallel signature sequencing (MPSS) was an early high-throughput
sequencing technology (Brenner et al. 2000). In order to sequence small RNAs, the 5'
and 3' adaptors were first ligated onto small RNAs, and RNAs were reverse-transcribed
into cDNAs and cloned into a vector with a unique identifier tag (Mineno et al. 2006).
After amplification, the cDNA library was hybridized and ligated to microbeads with
complementary tags (Brenner et al. 2000). Thus, each microbead carried 100,000 copies
of an identical sequence. To determine the sequence, the construct was cleaved by a
restriction enzyme to expose a 4 nt overhang, and encoded adaptors hybridized to the
overhang and ligated to the construct. The encoded adaptors contained a 4 nt overhang
with all possible nt combinations, a corresponding fluorescent label, and a restriction
enzyme recognition site. The microbeads were imaged to determine the sequence of the
overhang. The encoded adaptor was cleaved by the restriction enzyme, and the process
was repeated. From more than 500,000 reads obtained from mouse embryos, 61 novel
miRNA genes were identified after filtering for pre-miRNA-like hairpin structure
(Mineno et al. 2006).
While MPSS opened the door for high-throughput sequencing of miRNAs, it
appeared to have sequence biases. The next major development in sequencing technology
was 454 pyrosequencing (Margulies et al. 2005). Pyrosequencing technology takes
advantage of the fact that a pyrophosphate is released upon nucleotide incorporation.
Briefly, a DNA molecule is attached to a single bead by limiting dilution, and the
sequence is amplified on the bead within a droplet of emulsion. A single nucleotide is
washed over the beads, and a polymerase incorporates the nucleotide if it is
complementary to the template. When a nucleotide is incorporated, a pyrophosphate is
released and converted to ATP by ATP sulfurylase. In the presence of ATP, luciferase emits
light, and the signal is detected by a camera. Unincorporated nucleotides are washed
away, and the cycle is repeated with the next nucleotide. Pyrosequencing can sequence
~1.5 million reads of 300-500 nts in length, and a number of studies have utilized this
technology for miRNA discovery in mammals (Berezikov et al. 2006a; Berezikov et al.
2006b; Berezikov et al. 2007; Calabrese et al. 2007).
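The cyclic nucleotide flow described above can be sketched as an idealized simulation in Python. The function names, the flow order, and the toy template are assumptions made for illustration. The sketch also assumes a noise-free signal equal to the number of nucleotides incorporated in a flow, so that a run of identical bases yields a proportionally stronger signal; real light intensities are noisy.

```python
# Idealized sketch: names, flow order, and template are hypothetical.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def flowgram(template, flow_order="TACG", max_cycles=50):
    """Simulate one signal per nucleotide flow for a single template.

    template is written in the order in which the polymerase reads it.
    Returns a list of (nucleotide, signal) pairs, where signal is the number
    of bases incorporated during that flow (0 means no light).
    """
    to_add = [COMPLEMENT[base] for base in template]  # bases to synthesize
    pos = 0
    flows = []
    for _ in range(max_cycles):
        for nucleotide in flow_order:
            if pos == len(to_add):
                return flows  # template fully copied
            incorporated = 0
            # A homopolymer run is extended within a single flow.
            while pos < len(to_add) and to_add[pos] == nucleotide:
                incorporated += 1
                pos += 1
            flows.append((nucleotide, incorporated))
    return flows


def call_bases(flows):
    """Reconstruct the synthesized strand from the per-flow signals."""
    return "".join(nucleotide * signal for nucleotide, signal in flows)


flows = flowgram("AACGT")
print(flows[:4])          # [('T', 2), ('A', 0), ('C', 0), ('G', 1)]
print(call_bases(flows))  # TTGCA
```

Because the length of a homopolymer run must be inferred from signal strength rather than counted cycle by cycle, this design differs from the one-base-per-cycle chemistry of reversible terminators.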
Currently, the most widely used sequencing method is Illumina's reversible
terminator technology due to the number of reads it can generate per run (Seo et al.
2004). First, a cDNA library is constructed such that the small RNA sequence is flanked
by two adaptor sequences. The cDNAs are hybridized to the primers attached to the chip,
whose sequences are complementary to the adaptor sequences. The opposite strand is
synthesized by a polymerase, and the new strand, which is covalently attached to the
chip, can bend over to anneal to another primer complementary to the free end. This
process, bridge amplification, is repeated to build clusters of DNAs. One of the strands
is removed so that each cluster contains single-stranded DNA molecules with an identical
sequence. To determine the sequence of each cluster, a sequencing primer, polymerase,
and fluorescently labeled dNTPs are added to the chip. Each dNTP has a base-unique
fluorescent label and is blocked on the 3' terminus so that only one nucleotide can be
incorporated at each cycle. After imaging the chip, the terminator and fluorescent label
are photo-cleaved. The process is repeated, and the sequence of each cluster can be
determined by tracking the fluorescent label bound to the cluster at each step. Although
this technology could initially only sequence up to 32 nts, it has been improved so that it
can now generate 200 million reads of 100 nts per run. Many recent miRNA gene
discovery efforts have utilized the Illumina sequencing platform (Morin et al. 2008; Ahn et
al. 2010; Su et al. 2010).
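As a rough illustration of why read length equals the number of cycles in this chemistry, and of what the quoted throughput amounts to, consider the following Python sketch. The function names and the toy template are hypothetical, and the model is deliberately idealized: exactly one error-free incorporation per cluster per cycle.

```python
# Idealized sketch: names and the toy template are hypothetical.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def sequence_cluster(template, cycles):
    """Read one base per cycle from a cluster of identical templates.

    Each cycle adds the single blocked, labeled nucleotide that pairs with the
    next template base; the label is recorded (imaged) before the terminator
    and label are removed. The read can be no longer than the cycle count.
    """
    read = []
    for base in template[:cycles]:
        read.append(COMPLEMENT[base])  # label observed in this cycle
    return "".join(read)


def bases_per_run(reads, read_length):
    """Total nucleotides sequenced in one run."""
    return reads * read_length


template = "TCGATCGATCGATCGATCGATC"  # toy 22-nt insert
print(sequence_cluster(template, cycles=10))  # AGCTAGCTAG
print(sequence_cluster(template, cycles=32))  # full 22-nt read
print(bases_per_run(200_000_000, 100))        # 20000000000
```

At the figures quoted in the text, a run yields 2 x 10^10 nts. A ~22 nt miRNA is covered in full even by 32 cycles, so for small RNA discovery the number of reads matters more than their length.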
State of miRNA annotations
Since the establishment of miRBase, a database of miRNA annotations, the number of
annotated miRNA genes has grown explosively (Figure 2) (Griffiths-Jones 2004;
Griffiths-Jones et al. 2006). While both computational and experimental methods have
contributed to miRNA gene discovery, almost all the novel miRNA gene annotations
since 2008 have been the result of sequencing studies (Kozomara and Griffiths-Jones
2011). Although the database is continuously updated to provide the most accurate
information, even a single study can deposit a large number of false entries.
Two major factors contribute to inaccurate miRNA gene annotations. The first is
non-stringent discovery methods. For example, a study identified fly miRNA genes based
on a single read mapping to one arm of a hairpin structure (Lu et al. 2008). When more
reads from a deeper sequencing study were mapped to the "genes," most of these entries
appeared to be degradation fragments (Berezikov et al. 2010). Although the original
guidelines for miRNA annotation required only the presence of a ~22 nt RNA and a hairpin
structure (Ambros et al. 2003a), some studies have since adopted the following additional
criteria: minimum level of expression, absence of overlap to annotated transcripts,
relatively precise Drosha and Dicer cleavage sites, and presence of miRNA* species
(Ruby et al. 2006; Grimson et al. 2008; Berezikov et al. 2010; Marco et al. 2010).
The other factor contributing to inaccurate annotation is the number of reads that
can be sequenced by contemporary technology. With an abundance of reads mapping to a
genomic region, it is relatively easy to determine whether it is a miRNA gene. With
fewer reads mapping to a locus, researchers must make the decision based on a limited
amount of information. While setting a cutoff for the minimum number of reads matching the
putative miRNA can alleviate this problem, such a requirement gains specificity (how
well the method discriminates against false positives) at the cost of sensitivity (how well the
method identifies all the true positives).
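The effect of a read-count cutoff on these two measures can be made concrete with a small Python sketch. The loci, their read counts, and their true/false labels are invented for illustration; in practice the true status of a locus is not known in advance. The function assumes that the list contains at least one true and one false locus.

```python
def evaluate_cutoff(loci, min_reads):
    """Return (sensitivity, specificity) for a minimum-read cutoff.

    loci is a list of (read_count, is_true_mirna) pairs; a locus is called a
    miRNA when its read count is at least min_reads.
    """
    tp = sum(1 for reads, real in loci if real and reads >= min_reads)
    fn = sum(1 for reads, real in loci if real and reads < min_reads)
    tn = sum(1 for reads, real in loci if not real and reads < min_reads)
    fp = sum(1 for reads, real in loci if not real and reads >= min_reads)
    sensitivity = tp / (tp + fn)  # fraction of true miRNAs recovered
    specificity = tn / (tn + fp)  # fraction of false loci rejected
    return sensitivity, specificity


# Toy data (invented): four true miRNA loci and four false loci.
loci = [
    (500, True), (40, True), (3, True), (1, True),
    (12, False), (2, False), (1, False), (1, False),
]

for cutoff in (1, 5, 20):
    print(cutoff, evaluate_cutoff(loci, cutoff))
# 1 (1.0, 0.0)
# 5 (0.5, 0.75)
# 20 (0.5, 1.0)
```

In this toy set, raising the cutoff from 1 to 5 reads removes three of the four false loci but also discards the two weakly expressed true miRNAs, which is the compromise described above.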
Summary
Since the discovery of lin-4, more than 15,000 miRNA genes have been identified. These
genes play an important role in gene regulation by repressing the expression of their
target genes. Since miRNA targeting is based on its sequence, accurate and
comprehensive annotation of miRNA genes is crucial in understanding their biological
roles.
In the following chapters, the method to construct a small RNA library for the
Illumina sequencing platform (Chapter 2) and the analysis of the data derived from
mouse libraries (Chapter 3) are described. In addition to substantially revising the list of
confidently identified miRNA genes, we provided a medium-throughput method to test
questionable annotations and described the general features of murine and mammalian
miRNAs. Our analysis also revealed variations in miRNA processing with functional
consequences.
Figure Legends
Figure 1. MicroRNA biogenesis. (A) Canonical miRNA biogenesis (top), mirtron
biogenesis (middle upper), tailed-mirtron biogenesis (middle lower), endogenous shRNA
biogenesis (bottom), and Ago2-dependent, Dicer-independent biogenesis (inset). Red
strand corresponds to mature miRNA, and blue strand corresponds to miRNA*. (B)
Schematic representation of domain structures of proteins in canonical miRNA
biogenesis pathway. Figure is adapted and modified from (Nowotny and Yang 2009).
RIIID, RNase III domain; dsRBD, double-stranded RNA-binding domain; PAZ,
Piwi/Argonaute/Zwille domain.
Figure 2. Growth of miRNA gene annotations in miRBase. The data tracks the number of
mouse (green), human (red), and all miRNA gene entries (blue) from January 2004 to
September 2010, corresponding to miRBase version 3.0 to 16.0.
References
Ahn, H.W., Morin, R.D., Zhao, H., Harris, R.A., Coarfa, C., Chen, Z.-J., Milosavljevic,
A., Marra, M.A., and Rajkovic, A. 2010. MicroRNA transcriptome in the
newborn mouse ovaries determined by massive parallel sequencing. Mol Hum
Reprod 16(7): 463-471.
Ambros, V. 1989. A hierarchy of regulatory genes controls a larva-to-adult
developmental switch in C. elegans. Cell 57(1): 49-57.
Ambros, V., Bartel, B., Bartel, D., Burge, C., Carrington, J., Chen, X., Dreyfuss, G.,
Eddy, S., Griffiths-Jones, S., Marshall, M., Matzke, M., Ruvkun, G., and Tuschl,
T. 2003a. A uniform system for microRNA annotation. Rna 9(3): 277-279.
Ambros, V. and Horvitz, H.R. 1984. Heterochronic mutants of the nematode
Caenorhabditis elegans. Science 226(4673): 409-416.
Ambros, V., Lee, R.C., Lavanway, A., Williams, P.T., and Jewell, D. 2003b. MicroRNAs
and Other Tiny Endogenous RNAs in C. elegans. Current Biology 13(10): 807-818.
Arasu, P., Wightman, B., and Ruvkun, G. 1991. Temporal regulation of lin-14 by the
antagonistic action of two other heterochronic genes, lin-4 and lin-28. Genes &
Development 5(10): 1825-1833.
Avery, O.T., Macleod, C.M., and McCarty, M. 1944. Studies on the chemical nature of
the substance inducing transformation of pneumococcal types: induction of
transformation by a desoxyribonucleic acid fraction isolated from pneumococcus
type III. J Exp Med 79(2): 137-158.
Babiarz, J.E., Ruby, J.G., Wang, Y., Bartel, D.P., and Blelloch, R. 2008. Mouse ES cells
express endogenous shRNAs, siRNAs, and other Microprocessor-independent,
Dicer-dependent small RNAs. Genes & Development 22(20): 2773-2785.
Baek, D., Villén, J., Shin, C., Camargo, F.D., Gygi, S.P., and Bartel, D.P. 2008. The
impact of microRNAs on protein output. Nature 455(7209): 64-71.
Bartel, D. 2004. MicroRNAs: Genomics, biogenesis, mechanism, and function. Cell
116(2): 281-297.
Bartel, D.P. 2009. MicroRNAs: Target Recognition and Regulatory Functions. Cell
136(2): 215-233.
Baskerville, S. and Bartel, D. 2005. Microarray profiling of microRNAs reveals frequent
coexpression with neighboring miRNAs and host genes. Rna 11(3): 241-247.
Basyuk, E., Suavet, F., Doglio, A., Bordonné, R., and Bertrand, E. 2003. Human let-7
stem-loop precursors harbor features of RNase III cleavage products. Nucleic
Acids Res 31(22): 6593-6597.
Beadle, G.W. and Tatum, E.L. 1941. Genetic Control of Biochemical Reactions in
Neurospora. P Natl Acad Sci Usa 27(11): 499-506.
Berezikov, E., Chung, W.-J., Willis, J., Cuppen, E., and Lai, E.C. 2007. Mammalian
mirtron genes. Mol Cell 28(2): 328-336.
Berezikov, E., Guryev, V., van de Belt, J., Wienholds, E., Plasterk, R.H.A., and Cuppen,
E. 2005. Phylogenetic shadowing and computational identification of human
microRNA genes. Cell 120(1): 21-24.
Berezikov, E., Liu, N., Flynt, A.S., Hodges, E., Rooks, M., Hannon, G.J., and Lai, E.C.
2010. Evolutionary flux of canonical microRNAs and mirtrons in Drosophila. Nat
Genet 42(1): 6-9; author reply 9-10.
Berezikov, E., Thuemmler, F., van Laake, L.W., Kondova, I., Bontrop, R., Cuppen, E.,
and Plasterk, R.H.A. 2006a. Diversity of microRNAs in human and chimpanzee
brain. Nat Genet 38(12): 1375-1377.
Berezikov, E., van Tetering, G., Verheul, M., van de Belt, J., van Laake, L., Vos, J.,
Verloop, R., van de Wetering, M., Guryev, V., Takada, S., van Zonneveld, A.J.,
Mano, H., Plasterk, R., and Cuppen, E. 2006b. Many novel mammalian
microRNA candidates identified by extensive cloning and RAKE analysis.
Genome Res 16(10): 1289-1298.
Bernstein, E., Caudy, A.A., Hammond, S.M., and Hannon, G.J. 2001. Role for a
bidentate ribonuclease in the initiation step of RNA interference. Nature
409(6818): 363-366.
Bracht, J., Hunter, S., Eachus, R., Weeks, P., and Pasquinelli, A.E. 2004. Trans-splicing
and polyadenylation of let-7 microRNA primary transcripts. Rna 10(10): 1586-1594.
Brenner, S., Johnson, M., Bridgham, J., Golda, G., Lloyd, D.H., Johnson, D., Luo, S.,
McCurdy, S., Foy, M., Ewan, M., Roth, R., George, D., Eletr, S., Albrecht, G.,
Vermaas, E., Williams, S.R., Moon, K., Burcham, T., Pallas, M., DuBridge, R.B.,
Kirchner, J., Fearon, K., Mao, J., and Corcoran, K. 2000. Gene expression
analysis by massively parallel signature sequencing (MPSS) on microbead arrays.
Nat Biotechnol 18(6): 630-634.
Cai, X., Hagedorn, C.H., and Cullen, B.R. 2004. Human microRNAs are processed from
capped, polyadenylated transcripts that can also function as mRNAs. Rna 10(12):
1957-1966.
Calabrese, J.M., Seila, A.C., Yeo, G.W., and Sharp, P.A. 2007. RNA sequence analysis
defines Dicer's role in mouse embryonic stem cells. P Natl Acad Sci Usa 104(46):
18097-18102.
Chalfie, M., Horvitz, H.R., and Sulston, J.E. 1981. Mutations that lead to reiterations in
the cell lineages of C. elegans. Cell 24(1): 59-69.
Cheloufi, S., Dos Santos, C.O., Chong, M.M.W., and Hannon, G.J. 2010. A dicer-independent miRNA biogenesis pathway that requires Ago catalysis. Nature
465(7298): 584-589.
Chen, C., Li, L., Lodish, H., and Bartel, D. 2004. MicroRNAs modulate hematopoietic
lineage differentiation. Science 303(5654): 83-86.
Chendrimada, T.P., Gregory, R.I., Kumaraswamy, E., Norman, J., Cooch, N., Nishikura,
K., and Shiekhattar, R. 2005. TRBP recruits the Dicer complex to Ago2 for
microRNA processing and gene silencing. Nature 436(7051): 740-744.
Cifuentes, D., Xue, H., Taylor, D.W., Patnode, H., Mishima, Y., Cheloufi, S., Ma, E.,
Mane, S., Hannon, G.J., Lawson, N.D., Wolfe, S.A., and Giraldez, A.J. 2010. A
novel miRNA processing pathway independent of Dicer requires Argonaute2
catalytic activity. Science 328(5986): 1694-1698.
Czech, B., Malone, C.D., Zhou, R., Stark, A., Schlingeheyde, C., Dus, M., Perrimon, N.,
Kellis, M., Wohlschlegel, J.A., Sachidanandam, R., Hannon, G.J., and Brennecke,
J. 2008. An endogenous small interfering RNA pathway in Drosophila. Nature
453(7196): 798-802.
Denli, A.M., Tops, B.B.J., Plasterk, R.H.A., Ketting, R.F., and Hannon, G.J. 2004.
Processing of primary microRNAs by the Microprocessor complex. Nature
432(7014): 231-235.
Elbashir, S.M., Lendeckel, W., and Tuschl, T. 2001. RNA interference is mediated by 21- and 22-nucleotide RNAs. Genes & Development 15(2): 188-200.
Fabian, M.R., Sonenberg, N., and Filipowicz, W. 2010. Regulation of mRNA translation
and stability by microRNAs. Annu Rev Biochem 79: 351-379.
Fire, A., Xu, S., Montgomery, M.K., Kostas, S.A., Driver, S.E., and Mello, C.C. 1998.
Potent and specific genetic interference by double-stranded RNA in
Caenorhabditis elegans. Nature 391(6669): 806-811.
Ghildiyal, M., Seitz, H., Horwich, M.D., Li, C., Du, T., Lee, S., Xu, J., Kittler, E.L.W.,
Zapp, M.L., Weng, Z., and Zamore, P.D. 2008. Endogenous siRNAs Derived
from Transposons and mRNAs in Drosophila Somatic Cells. Science 320(5879):
1077-1081.
Gregory, R.I., Chendrimada, T.P., Cooch, N., and Shiekhattar, R. 2005. Human RISC
couples microRNA biogenesis and posttranscriptional gene silencing. Cell 123(4):
631-640.
Gregory, R.I., Yan, K.-P., Amuthan, G., Chendrimada, T., Doratotaj, B., Cooch, N., and
Shiekhattar, R. 2004. The Microprocessor complex mediates the genesis of
microRNAs. Nature 432(7014): 235-240.
Griffiths-Jones, S. 2004. The microRNA Registry. Nucleic Acids Res 32(Database issue):
D109-111.
Griffiths-Jones, S., Grocock, R.J., van Dongen, S., Bateman, A., and Enright, A.J. 2006.
miRBase: microRNA sequences, targets and gene nomenclature. Nucleic Acids
Res 34(Database issue): D140-144.
Grimson, A., Farh, K.K.-H., Johnston, W.K., Garrett-Engele, P., Lim, L.P., and Bartel,
D.P. 2007. MicroRNA targeting specificity in mammals: determinants beyond
seed pairing. Mol Cell 27(1): 91-105.
Grimson, A., Srivastava, M., Fahey, B., Woodcroft, B.J., Chiang, H.R., King, N.,
Degnan, B.M., Rokhsar, D.S., and Bartel, D.P. 2008. Early origins and evolution
of microRNAs and Piwi-interacting RNAs in animals. Nature 455(7217): 1193-1197.
Grishok, A., Pasquinelli, A.E., Conte, D., Li, N., Parrish, S., Ha, I., Baillie, D.L., Fire, A.,
Ruvkun, G., and Mello, C.C. 2001. Genes and mechanisms related to RNA
interference regulate expression of the small temporal RNAs that control C.
elegans developmental timing. Cell 106(1): 23-34.
Guo, H., Ingolia, N.T., Weissman, J.S., and Bartel, D.P. 2010. Mammalian microRNAs
predominantly act to decrease target mRNA levels. Nature 466(7308): 835-840.
Hafner, M., Landgraf, P., Ludwig, J., Rice, A., Ojo, T., Lin, C., Holoch, D., Lim, C., and
Tuschl, T. 2008. Identification of microRNAs and other small regulatory RNAs
using cDNA library sequencing. Methods 44(1): 3-12.
Han, J., Lee, Y., Yeom, K.-H., Kim, Y.-K., Jin, H., and Kim, V.N. 2004. The Drosha-DGCR8 complex in primary microRNA processing. Genes & Development
18(24): 3016-3027.
Han, J., Lee, Y., Yeom, K.-H., Nam, J.-W., Heo, I., Rhee, J.-K., Sohn, S.Y., Cho, Y.,
Zhang, B.-T., and Kim, V.N. 2006. Molecular basis for the recognition of primary
microRNAs by the Drosha-DGCR8 complex. Cell 125(5): 887-901.
Helvik, S.A., Snove, O., and Saetrom, P. 2007. Reliable prediction of Drosha processing
sites improves microRNA gene prediction. Bioinformatics 23(2): 142-149.
Hendrickson, D.G., Hogan, D.J., McCullough, H.L., Myers, J.W., Herschlag, D., Ferrell,
J.E., and Brown, P.O. 2009. Concordant Regulation of Translation and mRNA
Abundance for Hundreds of Targets of a Human microRNA. PLoS Biol 7(11):
e1000238.
Horvitz, H.R. and Sulston, J.E. 1980. Isolation and genetic characterization of cell-lineage mutants of the nematode Caenorhabditis elegans. Genetics 96(2): 435-454.
Houbaviy, H.B., Murray, M.F., and Sharp, P.A. 2003. Embryonic stem cell-specific
MicroRNAs. Developmental Cell 5(2): 351-358.
Huang, Y., Shen, X.J., Zou, Q., Wang, S.P., Tang, S.M., and Zhang, G.Z. 2010.
Biological functions of microRNAs: a review. J Physiol Biochem.
Hutvágner, G., McLachlan, J., Pasquinelli, A.E., Bálint, E., Tuschl, T., and Zamore, P.D.
2001. A cellular function for the RNA-interference enzyme Dicer in the
maturation of the let-7 small temporal RNA. Science 293(5531): 834-838.
Jiang, P., Wu, H., Wang, W., Ma, W., Sun, X., and Lu, Z. 2007. MiPred: classification of
real and pseudo microRNA precursors using random forest prediction model with
combined features. Nucleic Acids Res 35(Web Server issue): W339-344.
Johannsen, W. 1911. The genotype conception of heredity. Am Nat 45: 129-159.
Ketting, R.F., Fischer, S.E., Bernstein, E., Sijen, T., Hannon, G.J., and Plasterk, R.H.
2001. Dicer functions in RNA interference and in synthesis of small RNA
involved in developmental timing in C. elegans. Genes & Development 15(20):
2654-2659.
Khvorova, A., Reynolds, A., and Jayasena, S.D. 2003. Functional siRNAs and miRNAs
exhibit strand bias. Cell 115(2): 209-216.
Kozomara, A. and Griffiths-Jones, S. 2011. miRBase: integrating microRNA annotation
and deep-sequencing data. Nucleic Acids Res 39(Database issue): D152-157.
Lagos-Quintana, M., Rauhut, R., Lendeckel, W., and Tuschl, T. 2001. Identification of
novel genes coding for small expressed RNAs. Science 294(5543): 853-858.
Lagos-Quintana, M., Rauhut, R., Meyer, J., Borkhardt, A., and Tuschl, T. 2003. New
microRNAs from mouse and human. Rna 9(2): 175-179.
Lagos-Quintana, M., Rauhut, R., Yalcin, A., Meyer, J., Lendeckel, W., and Tuschl, T.
2002. Identification of tissue-specific microRNAs from mouse. Curr Biol 12(9):
735-739.
Landgraf, P., Rusu, M., Sheridan, R., Sewer, A., Iovino, N., Aravin, A., Pfeffer, S., Rice,
A., Kamphorst, A.O., Landthaler, M., Lin, C., Socci, N.D., Hermida, L., Fulci, V.,
Chiaretti, S., Foi, R., Schliwka, J., Fuchs, U., Novosel, A., Muller, R.-U.,
Schermer, B., Bissels, U., Inman, J., Phan, Q., Chien, M., Weir, D.B., Choksi, R.,
De Vita, G., Frezzetti, D., Trompeter, H.-I., Hornung, V., Teng, G., Hartmann, G.,
Palkovits, M., Di Lauro, R., Wernet, P., Macino, G., Rogler, C.E., Nagle, J.W.,
Ju, J., Papavasiliou, F.N., Benzing, T., Lichter, P., Tam, W., Brownstein, M.J.,
Bosio, A., Borkhardt, A., Russo, J.J., Sander, C., Zavolan, M., and Tuschl, T.
2007. A mammalian microRNA expression atlas based on small RNA library
sequencing. Cell 129(7): 1401-1414.
Lau, N., Lim, L., Weinstein, E., and Bartel, D. 2001. An abundant class of tiny RNAs
with probable regulatory roles in Caenorhabditis elegans. Science 294(5543): 858-862.
Lee, R. and Ambros, V. 2001. An extensive class of small RNAs in Caenorhabditis
elegans. Science 294(5543): 862-864.
Lee, R.C., Feinbaum, R.L., and Ambros, V. 1993. The C. elegans heterochronic gene lin-4 encodes small RNAs with antisense complementarity to lin-14. Cell 75(5): 843-854.
Lee, Y., Ahn, C., Han, J., Choi, H., Kim, J., Yim, J., Lee, J., Provost, P., Rådmark, O.,
Kim, S., and Kim, V.N. 2003. The nuclear RNase III Drosha initiates microRNA
processing. Nature 425(6956): 415-419.
Lee, Y., Jeon, K., Lee, J.-T., Kim, S., and Kim, V.N. 2002. MicroRNA maturation:
stepwise processing and subcellular localization. Embo J 21(17): 4663-4670.
Lee, Y., Kim, M., Han, J., Yeom, K.-H., Lee, S., Baek, S.H., and Kim, V.N. 2004.
MicroRNA genes are transcribed by RNA polymerase II. Embo J 23(20): 4051-4060.
Lewis, B., Burge, C., and Bartel, D. 2005. Conserved seed pairing, often flanked by
adenosines, indicates that thousands of human genes are microRNA targets. Cell
120(1): 15-20.
Lewis, B., Shih, I., Jones-Rhoades, M., Bartel, D., and Burge, C. 2003. Prediction of
mammalian microRNA targets. Cell 115(7): 787-798.
Lim, L., Glasner, M., Yekta, S., Burge, C., and Bartel, D. 2003a. Vertebrate MicroRNA
genes. Science 299(5612): 1540-1540.
Lim, L., Lau, N., Weinstein, E., Abdelhakim, A., Yekta, S., Rhoades, M., Burge, C., and
Bartel, D. 2003b. The microRNAs of Caenorhabditis elegans. Genes &
Development 17(8): 991-1008.
Liu, Q., Rand, T.A., Kalidas, S., Du, F., Kim, H.-E., Smith, D.P., and Wang, X. 2003.
R2D2, a bridge between the initiation and effector steps of the Drosophila RNAi
pathway. Science 301(5641): 1921-1925.
Lu, C., Meyers, B.C., and Green, P.J. 2007. Construction of small RNA cDNA libraries
for deep sequencing. Methods 43(2): 110-117.
Lu, J., Shen, Y., Wu, Q., Kumar, S., He, B., Shi, S., Carthew, R.W., Wang, S.M., and
Wu, C.-I. 2008. The birth and death of microRNA genes in Drosophila. Nat Genet
40(3): 351-355.
Lund, E., Guttinger, S., Calado, A., Dahlberg, J.E., and Kutay, U. 2004. Nuclear export
of microRNA precursors. Science 303(5654): 95-98.
Macrae, I.J., Ma, E., Zhou, M., Robinson, C.V., and Doudna, J.A. 2008. In vitro
reconstitution of the human RISC-loading complex. P Natl Acad Sci Usa 105(2):
512-517.
Macrae, I.J., Zhou, K., Li, F., Repic, A., Brooks, A.N., Cande, W.Z., Adams, P.D., and
Doudna, J.A. 2006. Structural basis for double-stranded RNA processing by
Dicer. Science 311(5758): 195-198.
Maniataki, E. and Mourelatos, Z. 2005. A human, ATP-independent, RISC assembly
machine fueled by pre-miRNA. Genes & Development 19(24): 2979-2990.
Marco, A., Hui, J.H.L., Ronshaugen, M., and Griffiths-Jones, S. 2010. Functional shifts
in insect microRNA evolution. Genome Biol Evol 2: 686-696.
Margulies, M., Egholm, M., Altman, W.E., Attiya, S., Bader, J.S., Bemben, L.A., Berka,
J., Braverman, M.S., Chen, Y.-J., Chen, Z., Dewell, S.B., Du, L., Fierro, J.M.,
Gomes, X.V., Godwin, B.C., He, W., Helgesen, S., Ho, C.H., Ho, C.H., Irzyk,
G.P., Jando, S.C., Alenquer, M.L.I., Jarvie, T.P., Jirage, K.B., Kim, J.-B., Knight,
J.R., Lanza, J.R., Leamon, J.H., Lefkowitz, S.M., Lei, M., Li, J., Lohman, K.L.,
Lu, H., Makhijani, V.B., McDade, K.E., McKenna, M.P., Myers, E.W.,
Nickerson, E., Nobile, J.R., Plant, R., Puc, B.P., Ronan, M.T., Roth, G.T., Sarkis,
G.J., Simons, J.F., Simpson, J.W., Srinivasan, M., Tartaro, K.R., Tomasz, A.,
Vogt, K.A., Volkmer, G.A., Wang, S.H., Wang, Y., Weiner, M.P., Yu, P., Begley,
R.F., and Rothberg, J.M. 2005. Genome sequencing in microfabricated high-density picolitre reactors. Nature 437(7057): 376-380.
Mineno, J., Okamoto, S., Ando, T., Sato, M., Chono, H., Izu, H., Takayama, M., Asada,
K., Mirochnitchenko, O., Inouye, M., and Kato, I. 2006. The expression profile of
microRNAs in mouse embryos. Nucleic Acids Res 34(6): 1765-1771.
Morin, R.D., O'Connor, M.D., Griffith, M., Kuchenbauer, F., Delaney, A., Prabhu, A.-L.,
Zhao, Y., McDonald, H., Zeng, T., Hirst, M., Eaves, C.J., and Marra, M.A. 2008.
Application of massively parallel sequencing to microRNA profiling and
discovery in human embryonic stem cells. Genome Res 18(4): 610-621.
Moss, E.G., Lee, R.C., and Ambros, V. 1997. The cold shock domain protein LIN-28
controls developmental timing in C. elegans and is regulated by the lin-4 RNA.
Cell 88(5): 637-646.
Mueller, G.A., Miller, M.T., Derose, E.F., Ghosh, M., London, R.E., and Hall, T.M.T.
2010. Solution structure of the Drosha double-stranded RNA-binding domain.
Silence 1(1): 2.
Murchison, E.P., Partridge, J.F., Tam, O.H., Cheloufi, S., and Hannon, G.J. 2005.
Characterization of Dicer-deficient murine embryonic stem cells. P Natl Acad Sci
Usa 102(34): 12135-12140.
Napoli, C., Lemieux, C., and Jorgensen, R. 1990. Introduction of a Chimeric Chalcone
Synthase Gene into Petunia Results in Reversible Co-Suppression of Homologous
Genes in trans. The Plant Cell Online 2(4): 279-289.
Nowotny, M. and Yang, W. 2009. Structural and functional modules in RNA
interference. Curr Opin Struct Biol 19(3): 286-293.
O'Donnell, K.A., Wentzel, E.A., Zeller, K.I., Dang, C.V., and Mendell, J.T. 2005. c-Myc-regulated microRNAs modulate E2F1 expression. Nature 435(7043): 839-843.
Okada, C., Yamashita, E., Lee, S.J., Shibata, S., Katahira, J., Nakagawa, A., Yoneda, Y.,
and Tsukihara, T. 2009. A high-resolution structure of the pre-microRNA nuclear
export machinery. Science 326(5957): 1275-1279.
Okamura, K., Hagen, J.W., Duan, H., Tyler, D.M., and Lai, E.C. 2007. The mirtron
pathway generates microRNA-class regulatory RNAs in Drosophila. Cell 130(1):
89-100.
Orom, U.A., Nielsen, F.C., and Lund, A.H. 2008. MicroRNA-10a binds the 5'UTR of
ribosomal protein mRNAs and enhances their translation. Mol Cell 30(4): 460-471.
Paddison, P.J., Caudy, A.A., Sachidanandam, R., and Hannon, G.J. 2004. Short Hairpin
Activated Gene Silencing in Mammalian Cells. RNA Interference, Editing, and
Modification 265: 85-100.
Pasquinelli, A.E., Reinhart, B.J., Slack, F., Martindale, M.Q., Kuroda, M.I., Maller, B.,
Hayward, D.C., Ball, E.E., Degnan, B., Muller, P., Spring, J., Srinivasan, A.,
Fishman, M., Finnerty, J., Corbo, J., Levine, M., Leahy, P., Davidson, E., and
Ruvkun, G. 2000. Conservation of the sequence and temporal expression of let-7
heterochronic regulatory RNA. Nature 408(6808): 86-89.
Pham, J.W., Pellino, J.L., Lee, Y.S., Carthew, R.W., and Sontheimer, E.J. 2004. A Dicer-2-dependent 80s complex cleaves targeted mRNAs during RNAi in Drosophila.
Cell 117(1): 83-94.
Preall, J.B., He, Z., Gorra, J.M., and Sontheimer, E.J. 2006. Short interfering RNA strand
selection is independent of dsRNA processing polarity during RNAi in
Drosophila. Curr Biol 16(5): 530-535.
Reinhart, B.J., Slack, F.J., Basson, M., Pasquinelli, A.E., Bettinger, J.C., Rougvie, A.E.,
Horvitz, H.R., and Ruvkun, G. 2000. The 21-nucleotide let-7 RNA regulates
developmental timing in Caenorhabditis elegans. Nature 403(6772): 901-906.
Rodriguez, A., Griffiths-Jones, S., Ashurst, J.L., and Bradley, A. 2004. Identification of
mammalian microRNA host genes and transcription units. Genome Res 14(10A):
1902-1910.
Ruby, J.G., Jan, C., Player, C., Axtell, M.J., Lee, W., Nusbaum, C., Ge, H., and Bartel,
D.P. 2006. Large-scale sequencing reveals 21U-RNAs and additional microRNAs
and endogenous siRNAs in C. elegans. Cell 127(6): 1193-1207.
Ruby, J.G., Jan, C.H., and Bartel, D.P. 2007. Intronic microRNA precursors that bypass
Drosha processing. Nature 448(7149): 83-86.
Ruvkun, G., Ambros, V., Coulson, A., Waterston, R., Sulston, J., and Horvitz, H.R. 1989.
Molecular genetics of the Caenorhabditis elegans heterochronic gene lin-14.
Genetics 121(3): 501-516.
Ruvkun, G. and Giusto, J. 1989. The Caenorhabditis elegans heterochronic gene lin-14
encodes a nuclear protein that forms a temporal developmental switch. Nature
338(6213): 313-319.
Schwarz, D.S., Hutvágner, G., Du, T., Xu, Z., Aronin, N., and Zamore, P.D. 2003.
Asymmetry in the assembly of the RNAi enzyme complex. Cell 115(2): 199-208.
Selbach, M., Schwanhausser, B., Thierfelder, N., Fang, Z., Khanin, R., and Rajewsky, N.
2008. Widespread changes in protein synthesis induced by microRNAs. Nature
455(7209): 58-63.
Sen, G.C. and Sarkar, S.N. 2007. The Interferon-Stimulated Genes: Targets of Direct
Signaling by Interferons, Double-Stranded RNA, and Viruses. Interferon: The
50th Anniversary 316: 233-250.
Seo, T.S., Bai, X., Ruparel, H., Li, Z., Turro, N.J., and Ju, J. 2004. Photocleavable
fluorescent nucleotides for DNA sequencing on a chip constructed by site-specific
coupling chemistry. P Natl Acad Sci Usa 101(15): 5488-5493.
Sewer, A., Paul, N., Landgraf, P., Aravin, A., Pfeffer, S., Brownstein, M.J., Tuschl, T.,
van Nimwegen, E., and Zavolan, M. 2005. Identification of clustered microRNAs
using an ab initio prediction method. BMC Bioinformatics 6: 267.
Slack, F.J., Basson, M., Liu, Z., Ambros, V., Horvitz, H.R., and Ruvkun, G. 2000. The
lin-41 RBCC gene acts in the C. elegans heterochronic pathway between the let-7
regulatory RNA and the LIN-29 transcription factor. Mol Cell 5(4): 659-669.
Sohn, S.Y., Bae, W.J., Kim, J.J., Yeom, K.-H., Kim, V.N., and Cho, Y. 2007. Crystal
structure of human DGCR8 core. Nat Struct Mol Biol 14(9): 847-853.
Su, R.-W., Lei, W., Liu, J.-L., Zhang, Z.-R., Jia, B., Feng, X.-H., Ren, G., Hu, S.-J., and
Yang, Z.-M. 2010. The integrative analysis of microRNA and mRNA expression
in mouse uterus under delayed implantation and activation. PLoS ONE 5(11):
e15513.
Svoboda, P., Stein, P., Hayashi, H., and Schultz, R.M. 2000. Selective reduction of
dormant maternal mRNAs in mouse oocytes by RNA interference. Development
127(19): 4147-4156.
Tam, O.H., Aravin, A.A., Stein, P., Girard, A., Murchison, E.P., Cheloufi, S., Hodges, E.,
Anger, M., Sachidanandam, R., Schultz, R.M., and Hannon, G.J. 2008.
Pseudogene-derived small interfering RNAs regulate gene expression in mouse
oocytes. Nature 453(7194): 534-538.
Tomari, Y., Matranga, C., Haley, B., Martinez, N., and Zamore, P.D. 2004. A protein
sensor for siRNA asymmetry. Science 306(5700): 1377-1380.
van der Krol, A.R., Mur, L.A., Beld, M., Mol, J.N.M., and Stuitje, A.R. 1990. Flavonoid
Genes in Petunia: Addition of a Limited Number of Gene Copies May Lead to a
Suppression of Gene Expression. The Plant Cell Online 2(4): 291-299.
Vasudevan, S., Tong, Y., and Steitz, J.A. 2007. Switching from repression to activation:
microRNAs can up-regulate translation. Science 318(5858): 1931-1934.
Wang, M., Song, X., Han, P., Li, W., and Jiang, B. 2010. New syntax to describe local
continuous structure-sequence information for recognizing new pre-miRNAs. J
Theor Biol 264(2): 578-584.
Watanabe, T., Totoki, Y., Toyoda, A., Kaneda, M., Kuramochi-Miyagawa, S., Obata, Y.,
Chiba, H., Kohara, Y., Kono, T., Nakano, T., Surani, M.A., Sakaki, Y., and
Sasaki, H. 2008. Endogenous siRNAs from naturally formed dsRNAs regulate
transcripts in mouse oocytes. Nature 453(7194): 539-543.
Wightman, B., Burglin, T.R., Gatto, J., Arasu, P., and Ruvkun, G. 1991. Negative
regulatory sequences in the lin-14 3'-untranslated region are necessary to generate
a temporal switch during Caenorhabditis elegans development. Genes &
Development 5(10): 1813-1824.
Wightman, B., Ha, I., and Ruvkun, G. 1993. Posttranscriptional regulation of the
heterochronic gene lin-14 by lin-4 mediates temporal pattern formation in C.
elegans. Cell 75(5): 855-862.
Xie, X., Lu, J., Kulbokas, E.J., Golub, T.R., Mootha, V., Lindblad-Toh, K., Lander, E.S.,
and Kellis, M. 2005. Systematic discovery of regulatory motifs in human
promoters and 3' UTRs by comparison of several mammals. Nature 434(7031):
338-345.
Xue, C., Li, F., He, T., Liu, G.-P., Li, Y., and Zhang, X. 2005. Classification of real and
pseudo microRNA precursors using local structure-sequence features and support
vector machine. BMC Bioinformatics 6: 310.
Yekta, S., Shih, I.-H., and Bartel, D.P. 2004. MicroRNA-directed cleavage of HOXB8
mRNA. Science 304(5670): 594-596.
Yi, R., Qin, Y., Macara, I.G., and Cullen, B.R. 2003. Exportin-5 mediates the nuclear
export of pre-microRNAs and short hairpin RNAs. Genes & Development 17(24):
3011-3016.
Yoda, M., Kawamata, T., Paroo, Z., Ye, X., Iwasaki, S., Liu, Q., and Tomari, Y. 2010.
ATP-dependent human RISC assembly pathways. Nat Struct Mol Biol 17(1): 17-23.
Zamore, P., Tuschl, T., Sharp, P., and Bartel, D. 2000. RNAi: Double-stranded RNA
directs the ATP-dependent cleavage of mRNA at 21 to 23 nucleotide intervals.
Cell 101(1): 25-33.
Zeng, Y. and Cullen, B.R. 2005. Efficient processing of primary microRNA hairpins by
Drosha requires flanking nonstructured RNA sequences. J Biol Chem 280(30):
27595-27603.
Zeng, Y., Yi, R., and Cullen, B.R. 2005. Recognition and cleavage of primary microRNA
precursors by the nuclear processing enzyme Drosha. Embo J 24(1): 138-148.
Zhang, H., Kolb, F.A., Jaskiewicz, L., Westhof, E., and Filipowicz, W. 2004. Single
processing center models for human Dicer and bacterial RNase III. Cell 118(1):
57-68.
Zhao, Y., Samal, E., and Srivastava, D. 2005. Serum response factor regulates a
muscle-specific microRNA that targets Hand2 during cardiogenesis. Nature
436(7048): 214-220.
Figure 1
[Figure: miRNA biogenesis pathways for canonical pri-miRNA, mirtron, tailed-mirtron,
and endogenous shRNA, with steps labeled DGCR8/Drosha cleavage, splicing &
debranching, Exportin-5 transport, Dicer cleavage, Ago2 cleavage, miRNA:miRNA*
duplex, and RISC loading. Domain diagrams of Drosha, DGCR8, and Dicer with
domains labeled Pro-rich, Arg/Ser-rich, RIIID, dsRBD, WW, DExD helicase, and PAZ.]
Figure 2
[Figure: plot by year, 2003-2010, with series for Mouse, Human, and All organisms
and two vertical axes (0-16000 and 0-1600).]
Chapter 2
Method for construction of small-RNA libraries for Illumina high-throughput
sequencing platform
H. Rosaria Chiang1,2, Wendy K. Johnston1,2, Lori Schoenfeld1,2, Shujun Luo3, and David
P. Bartel1,2
1Whitehead Institute for Biomedical Research, Cambridge, MA 02142, USA
2Howard Hughes Medical Institute and Department of Biology, Massachusetts Institute
of Technology, Cambridge, MA 02139, USA
3Illumina Inc., Hayward, CA 94545, USA
S.L. provided an early draft of the protocol and information on the Illumina sequencing
platform, and H.R.C. performed the experiments and revised the method. W.K.J. and L.S.
further updated the protocol. D.P.B. provided guidance throughout the project.
Abstract
Small non-coding RNAs play an important role in gene regulation. Previous efforts to
clone and sequence small RNAs have led to discoveries of novel classes of small RNAs
and to the identification of additional genes and their properties. Here the protocol for
cloning small RNAs to construct cDNA libraries ("small-RNA libraries") for the Illumina
sequencing platform is described. This method can be used for gene discovery and
profiling of small regulatory RNAs such as microRNAs (miRNAs).
Introduction
Many classes of small RNAs play a regulatory role in a wide range of cellular processes,
such as differentiation and transposon silencing. Cloning and sequencing small RNAs have
contributed to better understanding of small RNAs, such as miRNAs and Piwi-interacting
RNAs (piRNAs).
The early sequencing studies aimed to identify additional miRNA genes
(Lagos-Quintana et al. 2001; Lau et al. 2001; Lee and Ambros 2001). To construct
small-RNA libraries enriched for miRNAs and containing few degradation fragments,
one of these studies adopted a method that took advantage of the molecular
features of miRNAs: 5' phosphate and 3' hydroxyl groups (Lau et al. 2001). The
concept behind the cloning protocols was similar: adaptors were ligated to the 3' and 5'
ends of the size-fractionated RNAs, followed by reverse transcription (RT) and
polymerase chain reaction (PCR) amplification. The amplified constructs were
concatemerized, cloned into plasmids, and sequenced by the Sanger method.
While many miRNA genes were identified by Sanger sequencing (Lagos-Quintana et al. 2001; Lau et al. 2001; Lee and Ambros 2001; Lagos-Quintana et al. 2002;
Houbaviy et al. 2003; Lagos-Quintana et al. 2003; Berezikov et al. 2006b; Landgraf et al.
2007), advances in high-throughput sequencing technology have facilitated small RNA
discovery efforts. For example, sequencing studies of small RNAs that associate with Piwi
proteins led to the discovery of piRNAs (Aravin et al. 2006; Girard et al. 2006; Grivna et al.
2006; Lau et al. 2006). Furthermore, the ping-pong biogenesis model of class II piRNAs
and their role in transposon silencing were identified through analyses of sequencing data
(Brennecke et al. 2007). Similarly, sequencing studies led to identification of novel miRNA
genes (Berezikov et al. 2006a; Ruby et al. 2006; Berezikov et al. 2007; Calabrese et al.
2007; Ruby et al. 2007b; Babiarz et al. 2008) as well as a new class of miRNAs known as
mirtrons (Okamura et al. 2007; Ruby et al. 2007a). Therefore, it stands to reason that much
information can be gained through high-throughput sequencing of small RNAs.
Illumina's sequencing-by-synthesis uses a reversible terminator-based method
(Seo et al. 2004) to provide 200 million reads of 100 nt per run. Here the method for
construction of small-RNA libraries based on Lau et al. (2001) is updated for the Illumina
sequencing platform.
Method
Overview
The protocol is outlined in Figure 1. Total RNA isolated from the desired sample is
size-fractionated on a gel using radioactively labeled RNA markers. The pre-adenylated
3' adaptor, whose 3' terminus is blocked, is ligated to the small RNAs in the absence of
adenosine triphosphate (ATP) in two reactions, one using the RNA ligase mutant
Rnl2(1-249)K227Q (Hafner et al. 2008) and the other using T4 RNA ligase 1. The
3'-ligated RNAs are gel-purified, and the 5' adaptor is ligated using T4 RNA ligase 1.
The RNAs with both 5' and 3' adaptors are gel-purified and reverse-transcribed. The
RNAs are base-hydrolyzed, and the resulting cDNAs are PCR-amplified. The PCR
products are purified on a formamide gel and then submitted for Illumina sequencing.
Protocol
1. Purification of small RNA from total RNA
To isolate small RNAs from larger RNA species, such as mRNA and rRNA, as well as
their larger degradation fragments, the total RNA is size-fractionated on a urea-gel. In
order to visualize the area to cut from the gel, radioactively labeled RNA markers are
spiked into the total RNA. This method is preferable to running a control miRNA or
RNA ladder in another lane because it eliminates the possibility of contamination and
serves as an internal control. Although some of the sequenced reads may correspond to
the RNA markers, they will only represent a minute fraction of the reads, and they can
even be used to normalize the reads across multiple samples if the samples are prepared
with the same amount of RNA markers. Other RNAs with desired sequences and lengths
can be used as markers provided that they do not match the genome of the organism from
which the total RNA was isolated and that they do not affect downstream cloning or
sequencing steps.
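The marker-based normalization mentioned above can be sketched as follows. This is a minimal illustration, assuming that reads have already been trimmed of adaptor sequence and tallied per sequence; the function name, the input format, and the choice to express counts per marker read are hypothetical and are not part of the protocol.

```python
# Marker sequences from section 1.1, converted to DNA letters because
# sequencing reads report T where the RNA has U.
MARKER_18 = "AGCGUGUAGGGAUCCAAA".replace("U", "T")
MARKER_30 = "GGCAUUAACGCGGCCGCUCUACAAUAGUGA".replace("U", "T")
MARKERS = {MARKER_18, MARKER_30}


def normalize_to_markers(counts):
    """Express each non-marker read count per marker read in the same library.

    counts: dict mapping a trimmed read sequence to its raw read count
            (hypothetical input format, assumed for this sketch).
    """
    marker_total = sum(n for seq, n in counts.items() if seq in MARKERS)
    if marker_total == 0:
        raise ValueError("no marker reads found; cannot normalize")
    return {
        seq: n / marker_total
        for seq, n in counts.items()
        if seq not in MARKERS
    }


# Illustrative counts only, not data from this work.
library = {MARKER_18: 40, MARKER_30: 60, "TGAGGTAGTAGGTTGTATAGTT": 5000}
print(normalize_to_markers(library))
```

Such a comparison is only meaningful between libraries that received the same amount of marker, as noted above.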
1.1. Kinase 5' end of RNA markers with [γ-32P]ATP
Kinase the 18-mer and 30-mer RNA markers in separate reactions and keep the
markers separate. It is important to use [γ-32P]ATP with a very high specific activity so
that minimal amounts of RNA markers are spiked into the total RNA. Doing so will
result in a smaller fraction of reads reflecting the sequence of the markers.
18-mer marker RNA:
AGCGUGUAGGGAUCCAAA
30-mer marker RNA:
GGCAUUAACGCGGCCGCUCUACAAUAGUGA
Reagent                                  Amount
10 µM RNA marker                         2 µL
10X PNK buffer                           2 µL
[γ-32P]ATP (6000 Ci/mmol, 150 mCi/mL)    2 µL
dH2O                                     13 µL
PNK (10 units/µL)                        1 µL
- Incubate reaction for 1 hour at 37°C.
- (Optional: Before gel-purifying, add 5 µL H2O for a total reaction volume of 25 µL and
  spin through a MicroSpin G-25 column (GE Healthcare) to remove excess,
  unincorporated ATP.)
- Gel purification:
  o Add 2X urea loading buffer (8 M urea, 25 mM EDTA, 0.025% (w/v) each
    xylene cyanol, bromophenol blue) to each marker. Heat to 80°C for 5 min and
    run on a 15% denaturing polyacrylamide gel until the bromophenol blue dye is ~1
    inch from the bottom of the gel.
  o Dismantle gel apparatus and separate plates. Leave the gel on one of the glass
    plates. Cover the gel with clear plastic film and visualize it by exposing to a
    phosphorimager plate for ~10 sec. Develop image. Align a printed image of
    the gel under the actual gel on the glass plate. Cut out the gel pieces
    containing marker bands and put them into 1.5 mL Eppendorf tubes.
    (Optional: To assist in aligning gel to picture, use a pipette tip with a small
    amount of hot dye to prick gel at several spots. Expose gel to plate and
    develop picture, then align the dots of dye in the gel to the dots of signal on
    the picture.)
  o Elute RNA: Add 450 µL of 0.3 M NaCl to the gel slices and rotate the tubes at
    4°C overnight.
  o Precipitate RNA: Remove the supernatant and add 2.5 volumes of cold 100%
    ethanol; vortex. (Optional: Add 1 µL of GlycoBlue (Ambion) to help visualize
    RNA pellet.) Incubate at -20°C for 30 min. (Alternatively, gel-elute for 4
    hours at room temperature then precipitate for 1 hour at -20°C.)
  o Spin samples at high speed for 15 min at 4°C in a microcentrifuge. Carefully
    remove all supernatant and resuspend each pellet in 10-30 µL dH2O.
- To combine markers: Measure the activity of each marker separately and combine the
  two markers so that counts per minute (CPM) of each marker are approximately
  equal.
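Combining the markers at equal activity is a simple proportion, sketched below. The function name and the example activities are hypothetical; measured values will differ between preparations.

```python
def equal_cpm_volumes(cpm_per_ul_18, cpm_per_ul_30, target_cpm_each):
    """Volumes (in µL) of each marker stock that contribute the same CPM.

    cpm_per_ul_18, cpm_per_ul_30: measured activity of each purified marker.
    target_cpm_each: activity wanted from each marker in the combined stock.
    """
    if cpm_per_ul_18 <= 0 or cpm_per_ul_30 <= 0:
        raise ValueError("marker activities must be positive")
    return target_cpm_each / cpm_per_ul_18, target_cpm_each / cpm_per_ul_30


# Illustrative activities only: the 30-mer stock is half as active as the
# 18-mer stock, so twice the volume is needed.
vol_18, vol_30 = equal_cpm_volumes(50_000, 25_000, 100_000)
print(f"18-mer: {vol_18} µL, 30-mer: {vol_30} µL")
```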
1.2 Purify small RNA from total RNA
- Add trace amounts of the very high specific activity labeled markers to 5-30 µL total RNA.
  For example, use a Ludlum Model 3 Survey Meter to measure ~20-60K CPM of
  combined marker. To approximate this amount, pipette a small volume of combined
  marker into a pipette tip. Hold the tip very close to, but not touching, the face of the
  radiation monitor and note the number of counts. Adjust volume as necessary.
- Gel-purify as above. When cutting bands from the gel, cut the areas containing the
  labeled markers and everything in between. Resuspend precipitated RNA in at least
  10 µL dH2O.
2. 3' Adaptor ligation
If the 3' ligation step is performed under standard T4 RNA ligase reaction conditions,
RNA species with 5' phosphate and 3' hydroxyl groups, such as miRNAs, will
circularize rather than ligate to the 3' adaptors. In the presence of ATP, a nucleophilic
lysine on the ligase attacks the ATP molecule to form an adenylated ligase intermediate
(Ap-Ligase).
    Ligase + ATP ⇌ Ap-Ligase + PPi                                        (1)
The adenylated ligase then transfers the adenylate (Ap) to an RNA molecule with a 5'
phosphate, ideally to the 3' adaptor (p-Adaptor).
    Ap-Ligase + p-Adaptor ⇌ Ligase + App-Adaptor                          (2)
The ligase then joins the 5' terminus of the adenylated adaptor to the 3' terminus of a
substrate with a 3' hydroxyl, releasing adenosine monophosphate (AMP) in the process.
Since the 3' terminus of the 3' adaptor is blocked, the adenylated adaptor can only ligate
to the 3' terminus of a small RNA, such as a miRNA (p-miRNA).
    Ligase + App-Adaptor + p-miRNA → Ligase + p-miRNA-Adaptor + AMP       (3)
However, if the adenylated ligase transfers the adenylate to a miRNA, for example, the 5'
terminus of the adenylated miRNA can ligate to its own 3' terminus and circularize. The
circularized product will be eliminated during gel-purification or PCR amplification.
To circumvent this problem, two approaches were previously used. One method
dephosphorylated the RNAs using calf intestinal alkaline phosphatase (CIP) prior to 3'
ligation and rephosphorylated the ligated products before 5' ligation (Lagos-Quintana et
al. 2001; Lee and Ambros 2001). This method, however, removes the 5' phosphate that
distinguishes miRNAs from degradation fragments. An alternative method uses
pre-adenylated 3' adaptors to perform the 3' ligation in the absence of ATP (Lau et al. 2001).
Of the two methods, the protocol using pre-adenylated 3' adaptors gained
popularity due to its selective enrichment for miRNAs. However, even purified
ligases are partially adenylated, and these enzymatic reactions are reversible; i.e., the
ligase can transfer the adenylate from a pre-adenylated adaptor to itself (the reverse of
reaction 2). The adenylated ligase can then transfer the adenylate to a miRNA, which
leads to a circularized miRNA.
Using Rnl2(1-249) instead of T4 RNA ligase 1 mitigates the problem (Pfeffer et
al. 2005), as this truncated mutant of T4 RNA ligase 2 has an impaired adenylate transfer
function (Ho et al. 2004). Rnl2(1-249)K227Q was reported to perform even better than
Rnl2(1-249) (Hafner et al. 2008), as K227 was implicated as a residue crucial for
adenylate transfer activity (Ho et al. 2004). Because RNA ligases have different sequence
preferences, one 3' adaptor ligation is performed with Rnl2(1-249)K227Q and another
with T4 RNA ligase 1. The two reactions can be combined immediately before or after
the gel-purification.
2.1. Synthesize adenosine 5'-phosphorimidazolide (ImpA) (Lau et al. 2001)
- Rinse 2 beakers in acetonitrile and air dry.
- Make two mixtures:
  Mixture A:
    174 mg AMP (FW 347.2) (0.5 mmol)
    15 mL Dimethylformamide
  Mixture B:
    262 mg Triphenylphosphine (FW 262.3) (1 mmol)
    220 mg 2,2'-dipyridyldisulfide (FW 220.3) (1 mmol)
    170 mg Imidazole (FW 68.08) (2.5 mmol)
    0.90 mL Triethylamine (FW 101.2, d=0.726)
    15 mL Dimethylformamide
- Add Mixture A slowly into Mixture B while stirring until precipitates dissolve.
- Cover beaker and stir for 1-1.5 hr at room temperature.
- Make Precipitation Mixture:
    1.1 g NaClO4 (FW 122.4) (9 mmol)
    225 mL Acetone
    115 mL Anhydrous ethyl ether
- Add Mixture A+B dropwise to Precipitation Mixture.
- Remove solvent phase down to ~60 mL.
- Transfer precipitates to 50 mL conical-bottom Corex or Teflon centrifuge tubes, rinse
  with acetone, centrifuge at 5000 rpm (3000g in SS-34 rotor) for 10 min and pour off
  acetone. Repeat rinse 3 times.
- Perform a final rinse with just ether, and spin down for 20 min.
- Dry overnight in a vacuum vessel at 22.5-45°C.
- Store at -20°C.
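The molar amounts quoted in the ImpA recipe follow from mass divided by formula weight. The sketch below recomputes them from the listed masses as a consistency check. It also derives an amount for triethylamine from its listed volume and density, a value the protocol itself does not state (roughly 6.5 mmol under those figures). The helper names are hypothetical.

```python
def mmol_from_mass(mass_mg, formula_weight):
    """Millimoles of a solid reagent: mg divided by g/mol."""
    return mass_mg / formula_weight


def mmol_from_volume(volume_ml, density_g_per_ml, formula_weight):
    """Millimoles of a liquid reagent measured by volume."""
    return volume_ml * density_g_per_ml * 1000.0 / formula_weight


# (mass in mg, formula weight) as listed in the protocol.
SOLIDS = {
    "AMP": (174, 347.2),
    "triphenylphosphine": (262, 262.3),
    "2,2'-dipyridyldisulfide": (220, 220.3),
    "imidazole": (170, 68.08),
    "NaClO4": (1100, 122.4),
}

amounts_mmol = {name: mmol_from_mass(mg, fw) for name, (mg, fw) in SOLIDS.items()}
# Derived here from 0.90 mL at d = 0.726 g/mL; not stated in the protocol.
amounts_mmol["triethylamine"] = mmol_from_volume(0.90, 0.726, 101.2)

for name, mmol in amounts_mmol.items():
    print(f"{name}: {mmol:.2f} mmol")
```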
2.2. Adenylate 3' adaptor (Lau et al. 2001)
3' Adaptor:
pTCGTATGCCGTCTTCTGCTTGidT
Reagent       Stock conc.   Amount                 Final conc.
ImpA          -             9 mg in 420 µL dH2O    50 mM
MgCl2         2 M           7 µL                   25 mM
3' Adaptor    1.3 mM        80 µL                  0.2 mM
- Incubate at 50°C for 3 hrs.
- Gel purify on 20% gel.
2.3. Ligate 3' adaptor to small RNAs
Reaction 1:
Reagent                               Amount     Final
Purified 18-30 nt RNA                 2.5 µL
100 µM Pre-adenylated 3' adaptor      0.5 µL     50 pmol
10X Ligation Buffer                   1 µL       1X
dH2O                                  5.5 µL
Rnl2(1-249)K227Q (6.25 µg/µL)         0.5 µL     ~3 µg
Total reaction volume                 10 µL
Reaction 2:
Reagent                               Amount     Final
Purified 18-30 nt RNA                 2.5 µL
100 µM Pre-adenylated 3' adaptor      0.5 µL     50 pmol
10X Ligation Buffer                   1 µL       1X
dH2O                                  5 µL
T4 RNA Ligase 1 (NEB) (20 U/µL)       1 µL       20 units
Total reaction volume                 10 µL
- Incubate Reaction 1 at 22°C for 30 min; incubate Reaction 2 at 22°C for 2 hours. Stop
  reactions by adding 2X urea loading buffer.
- (Optional: Combine Reactions 1 and 2.)
- Gel-purify on 15% gel as above. Run gel until the bromophenol blue dye is close to the
  bottom; expose phosphorimager plate for ~15-30 min. (Optional: run a small amount of
  unligated material to track gel-shift.)
- Resuspend precipitated RNA from combined reactions in 10 µL dH2O.
3. 5' Adaptor ligation
The 5' ligation step enriches for RNA species with a 5' phosphate, a hallmark of RNase
III cleavage. The sequence of the 5' adaptor cannot be changed without changing the
sequencing primer as this region of the final construct anneals to the Illumina sequencing
primer.
5' Adaptor
GUUCAGAGUUCUACAGUCCGACGAUC
Reagent                          Amount     Final
Purified 3' ligation product     5 µL
100 µM 5' Adaptor                4 µL       400 pmol
10X Ligation Buffer              1.5 µL     1X
T4 RNA Ligase 1                  1 µL       20 units
4 mM ATP                         1 µL       4 nmol
dH2O                             2.5 µL
Total reaction volume            15 µL
- Incubate at 22°C for ~18 hours. Stop reaction by adding 2X urea loading buffer.
- Gel-purify on 10% gel. (Optional: Also run a small amount of 3' ligated products.)
  Run gel until the bromophenol blue dye just runs out. Expose phosphorimager plate for
  2.5 hours to overnight. Keep gel at -20°C when exposing for long periods of time to
  minimize diffusion of RNA.
- Resuspend precipitated RNA in 10 µL dH2O.
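The "Final" amounts in the ligation tables follow from stock concentration times volume, since 1 µM equals 1 pmol per µL. The sketch below recomputes the adaptor and ATP amounts for the 5' ligation and checks that the component volumes add up to the stated total. The helper names are hypothetical and serve only as a bookkeeping check.

```python
import math


def pmol(stock_uM, volume_uL):
    """Amount in pmol; 1 µM is 1 pmol per µL."""
    return stock_uM * volume_uL


def volumes_match_total(volumes_uL, total_uL):
    """True if the component volumes add up to the stated reaction volume."""
    return math.isclose(sum(volumes_uL), total_uL)


# 5' ligation reaction as listed in the protocol (volumes in µL).
five_prime_ligation = {
    "3' ligation product": 5,
    "5' adaptor": 4,
    "10X ligation buffer": 1.5,
    "T4 RNA ligase 1": 1,
    "4 mM ATP": 1,
    "dH2O": 2.5,
}

adaptor_pmol = pmol(100, 4)        # 100 µM 5' adaptor, 4 µL
atp_nmol = pmol(4000, 1) / 1000    # 4 mM = 4000 µM, 1 µL, converted to nmol

print(adaptor_pmol, atp_nmol, volumes_match_total(five_prime_ligation.values(), 15))
```

The same relation gives 50 pmol for 0.5 µL of 100 µM pre-adenylated 3' adaptor in section 2.3.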
4. Reverse-transcription (RT) and base-hydrolysis
4.1. Reverse-transcribe ligated RNAs
RT-primer/5' PCR primer
CAAGCAGAAGACGGCATA
Reagent                              Amount
Purified ligated RNA                 5 µL
100 µM RT-primer/5' PCR primer       1 µL
dH2O                                 9.6 µL
- Heat to 65°C for 10 min, spin down briefly to cool.
- Add the following in order:
  o 6.4 µL 5X first strand buffer (Invitrogen)
  o 7 µL 10X dNTPs (2 mM)
  o 3 µL 100 mM DTT
- Heat to 48°C for 3 min.
- Remove 3 µL for an RT-minus control.
- Add 1 µL of SuperScript II RT (Invitrogen) (200 U/µL) and incubate at 44°C for 1
  hour.
4.2. Base-hydrolyze RNAs
- Add 5 µL of 1 M NaOH and incubate for 10 min at 90°C.
- Neutralize the base hydrolysis reaction with 25 µL of 1 M HEPES pH 7.0 and spin
  through a MicroSpin G-25 column to desalt. Recover about 30 µL.
5. Splicing by overlap extension by PCR (SOE-PCR)
While the reverse-transcribed cDNAs have a length of ~70 nt, they need to be
extended to a length of ~92 nt for Illumina sequencing. Illumina determined the optimal
length for cluster size and for bridge amplification of the final construct on the flow cell
(pers. communication). The 3' PCR primer carries the extender sequence, and its 3' end can
anneal to the 3' end of the cDNA (Figure 1). To extend the cDNA to the final construct
length, three PCR cycles are performed with the 3' PCR primer. After the
extension, the 5' PCR primer is added for amplification.
Due to its length, the final construct is purified on a formamide gel rather than on
a urea gel to ensure that all double-stranded DNAs have denatured.
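The quoted lengths can be checked against the oligonucleotide sequences given in this protocol. The sketch below assembles the sense strand of the final construct for a given insert. It assumes that the cDNA spans the full ligated product and that the 5' phosphate and 3' idT block of the 3' adaptor contribute no nucleotides. The function names are hypothetical, and the 22-nt insert (the let-7a sequence) is used only as an illustration.

```python
# Sequences from the protocol, written as DNA (T in place of U).
FIVE_PRIME_ADAPTOR = "GUUCAGAGUUCUACAGUCCGACGAUC".replace("U", "T")
THREE_PRIME_ADAPTOR = "TCGTATGCCGTCTTCTGCTTG"  # without 5' p and 3' idT
RT_PRIMER = "CAAGCAGAAGACGGCATA"
PCR_3P_PRIMER = "AATGATACGGCGACCACCGACAGGTTCAGAGTTCTACAGTCCGA"


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def final_construct(small_rna):
    """Sense strand after SOE-PCR: extender + 5' adaptor + insert + 3' adaptor."""
    insert = small_rna.upper().replace("U", "T")
    ligated = FIVE_PRIME_ADAPTOR + insert + THREE_PRIME_ADAPTOR
    # The 3' end of the 3' PCR primer matches the start of the 5' adaptor;
    # whatever precedes that overlap is the extender.
    overlaps = [k for k in range(1, len(PCR_3P_PRIMER) + 1)
                if ligated.startswith(PCR_3P_PRIMER[-k:])]
    if not overlaps:
        raise ValueError("3' PCR primer does not overlap the 5' adaptor")
    extender = PCR_3P_PRIMER[:-max(overlaps)]
    return extender + ligated


# The RT primer anneals to the 3' end of the 3' adaptor.
assert THREE_PRIME_ADAPTOR.endswith(revcomp(RT_PRIMER))

example_insert = "UGAGGUAGUAGGUUGUAUAGUU"  # 22 nt, illustrative only
print(len(final_construct(example_insert)))
```

Under these assumptions, inserts of 18 to 30 nt (the marker sizes) give constructs of 88 to 100 nt, which fall within the 85-105 nt region excised from the formamide gel.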
RT-primer/5' PCR primer
CAAGCAGAAGACGGCATA
3' PCR primer
AATGATACGGCGACCACCGACAGGTTCAGAGTTCTACAGTCCGA
- Set up SOE-PCR. (Note: Can use less RT reaction and increase PCR cycles.):
Reagent                   RT sample                 RT-minus sample
RT reaction               10 µL                     0
RT-minus control          0                         3 µL
5X PCR Buffer             20 µL                     20 µL
2 mM dNTP (10X)           12.6 µL                   12.6 µL
150 nM 3' PCR primer      2 µL (final 0.3 pmol)     2 µL (final 0.3 pmol)
Phusion polymerase        1 µL                      1 µL
dH2O                      53.4 µL                   60.4 µL
- Perform 3 cycles of PCR to let the cDNAs extend before amplification:
    98°C   30 sec
    94°C   30 sec  \
    60°C   30 sec   } 3 cycles
    72°C   15 sec  /
    72°C   10 min
- To each sample add:
    1 µL 25 µM 5' PCR primer
    1 µL 25 µM 3' PCR primer
- Split reaction(s) into 2 x 50.5 µL.
- Perform 15-18 cycles of PCR:
    98°C   30 sec
    94°C   30 sec  \
    60°C   30 sec   } 15-18 cycles
    72°C   15 sec  /
    72°C   10 min
- Ethanol-precipitate and resuspend in 15 µL 1X formamide loading buffer (95%
  formamide, 18 mM EDTA, 0.025% (w/v) xylene cyanol, 0.025% (w/v) bromophenol
  blue, 0.025% (w/v) SDS).
- Mix 2 µL 10 bp DNA marker (1.0 µg/µL) with 13 µL 1X formamide loading buffer.
- Heat samples and DNA marker for 10 min at 85°C and gel-purify on a 90% formamide,
  8% acrylamide gel.
- Stain with SYBR Gold (Invitrogen) (1 µL/50 mL 1X TBE). Cut and elute the 85-105 nt
  gel piece. The RT-minus sample will run at ~40-50 bp.
- Ethanol-precipitate as above, but do not add glycogen during final purification. Speed
  vacuum for 30 min to remove leftover formamide.
- Resuspend in 15 µL of 10 mM Tris and submit sample for sequencing.
Concluding remarks
Sequencing data from small-RNA libraries constructed using this protocol can be used to
profile small RNAs from a broad range of samples. Variations of this protocol have been
used to make the following small-RNA libraries: C. elegans libraries across
developmental stages (Appendix A); Nematostella vectensis and Amphimedon
queenslandica libraries (Appendix B); murine heart and muscle libraries (Appendix C);
murine brain, ovary, testes, embryonic stem cell, embryonic-stage (three stages), and
whole-newborn libraries (Chapter 3); and a human brain library (Appendix D). These
datasets have contributed to the understanding of the small RNA-ome of these samples.
In particular, the analysis of the data from the mouse libraries is presented in the next
chapter.
Figure Legend
Figure 1. Flowchart for construction of small-RNA library. The details on each step are
explained in the main text under the corresponding heading.
References
Aravin, A., Gaidatzis, D., Pfeffer, S., Lagos-Quintana, M., Landgraf, P., Iovino, N.,
Morris, P., Brownstein, M.J., Kuramochi-Miyagawa, S., Nakano, T., Chien, M.,
Russo, J.J., Ju, J., Sheridan, R., Sander, C., Zavolan, M., and Tuschl, T. 2006. A
novel class of small RNAs bind to MILI protein in mouse testes. Nature
442(7099): 203-207.
Babiarz, J.E., Ruby, J.G., Wang, Y., Bartel, D.P., and Blelloch, R. 2008. Mouse ES cells
express endogenous shRNAs, siRNAs, and other Microprocessor-independent,
Dicer-dependent small RNAs. Genes & Development 22(20): 2773-2785.
Berezikov, E., Chung, W.-J., Willis, J., Cuppen, E., and Lai, E.C. 2007. Mammalian
mirtron genes. Mol Cell 28(2): 328-336.
Berezikov, E., Thuemmler, F., van Laake, L.W., Kondova, I., Bontrop, R., Cuppen, E.,
and Plasterk, R.H.A. 2006a. Diversity of microRNAs in human and chimpanzee
brain. Nat Genet 38(12): 1375-1377.
Berezikov, E., van Tetering, G., Verheul, M., van de Belt, J., van Laake, L., Vos, J.,
Verloop, R., van de Wetering, M., Guryev, V., Takada, S., van Zonneveld, A.J.,
Mano, H., Plasterk, R., and Cuppen, E. 2006b. Many novel mammalian
microRNA candidates identified by extensive cloning and RAKE analysis.
Genome Res 16(10): 1289-1298.
Brennecke, J., Aravin, A.A., Stark, A., Dus, M., Kellis, M., Sachidanandam, R., and
Hannon, G.J. 2007. Discrete small RNA-generating loci as master regulators of
transposon activity in Drosophila. Cell 128(6): 1089-1103.
Calabrese, J.M., Seila, A.C., Yeo, G.W., and Sharp, P.A. 2007. RNA sequence analysis
defines Dicer's role in mouse embryonic stem cells. P Natl Acad Sci Usa 104(46):
18097-18102.
Girard, A., Sachidanandam, R., Hannon, G.J., and Carmell, M.A. 2006. A
germline-specific class of small RNAs binds mammalian Piwi proteins. Nature
442(7099): 199-202.
Grivna, S.T., Beyret, E., Wang, Z., and Lin, H. 2006. A novel class of small RNAs in
mouse spermatogenic cells. Genes & Development 20(13): 1709-1714.
Hafner, M., Landgraf, P., Ludwig, J., Rice, A., Ojo, T., Lin, C., Holoch, D., Lim, C., and
Tuschl, T. 2008. Identification of microRNAs and other small regulatory RNAs
using cDNA library sequencing. Methods 44(1): 3-12.
Ho, C.K., Wang, L.K., Lima, C.D., and Shuman, S. 2004. Structure and mechanism of
RNA ligase. Structure 12(2): 327-339.
Houbaviy, H.B., Murray, M.F., and Sharp, P.A. 2003. Embryonic stem cell-specific
MicroRNAs. Developmental Cell 5(2): 351-358.
Lagos-Quintana, M., Rauhut, R., Lendeckel, W., and Tuschl, T. 2001. Identification of
novel genes coding for small expressed RNAs. Science 294(5543): 853-858.
Lagos-Quintana, M., Rauhut, R., Meyer, J., Borkhardt, A., and Tuschl, T. 2003. New
microRNAs from mouse and human. Rna 9(2): 175-179.
Lagos-Quintana, M., Rauhut, R., Yalcin, A., Meyer, J., Lendeckel, W., and Tuschl, T.
2002. Identification of tissue-specific microRNAs from mouse. Curr Biol 12(9):
735-739.
Landgraf, P., Rusu, M., Sheridan, R., Sewer, A., Iovino, N., Aravin, A., Pfeffer, S., Rice,
A., Kamphorst, A.O., Landthaler, M., Lin, C., Socci, N.D., Hermida, L., Fulci, V.,
Chiaretti, S., Foi, R., Schliwka, J., Fuchs, U., Novosel, A., Müller, R.-U.,
Schermer, B., Bissels, U., Inman, J., Phan, Q., Chien, M., Weir, D.B., Choksi, R.,
De Vita, G., Frezzetti, D., Trompeter, H.-I., Hornung, V., Teng, G., Hartmann, G.,
Palkovits, M., Di Lauro, R., Wernet, P., Macino, G., Rogler, C.E., Nagle, J.W.,
Ju, J., Papavasiliou, F.N., Benzing, T., Lichter, P., Tam, W., Brownstein, M.J.,
Bosio, A., Borkhardt, A., Russo, J.J., Sander, C., Zavolan, M., and Tuschl, T.
2007. A mammalian microRNA expression atlas based on small RNA library
sequencing. Cell 129(7): 1401-1414.
Lau, N., Lim, L., Weinstein, E., and Bartel, D. 2001. An abundant class of tiny RNAs
with probable regulatory roles in Caenorhabditis elegans. Science 294(5543):
858-862.
Lau, N.C., Seto, A.G., Kim, J., Kuramochi-Miyagawa, S., Nakano, T., Bartel, D.P., and
Kingston, R.E. 2006. Characterization of the piRNA complex from rat testes.
Science 313(5785): 363-367.
Lee, R. and Ambros, V. 2001. An extensive class of small RNAs in Caenorhabditis
elegans. Science 294(5543): 862-864.
Okamura, K., Hagen, J.W., Duan, H., Tyler, D.M., and Lai, E.C. 2007. The mirtron
pathway generates microRNA-class regulatory RNAs in Drosophila. Cell 130(1):
89-100.
Pfeffer, S., Lagos-Quintana, M., and Tuschl, T. 2005. Cloning of small RNA molecules.
Curr Protoc Mol Biol Chapter 26: Unit 26.24.
Ruby, J.G., Jan, C., Player, C., Axtell, M.J., Lee, W., Nusbaum, C., Ge, H., and Bartel,
D.P. 2006. Large-scale sequencing reveals 21U-RNAs and additional microRNAs
and endogenous siRNAs in C. elegans. Cell 127(6): 1193-1207.
Ruby, J.G., Jan, C.H., and Bartel, D.P. 2007a. Intronic microRNA precursors that bypass
Drosha processing. Nature 448(7149): 83-86.
Ruby, J.G., Stark, A., Johnston, W.K., Kellis, M., Bartel, D.P., and Lai, E.C. 2007b.
Evolution, biogenesis, expression, and target predictions of a substantially
expanded set of Drosophila microRNAs. Genome Res 17(12): 1850-1864.
Seo, T.S., Bai, X., Ruparel, H., Li, Z., Turro, N.J., and Ju, J. 2004. Photocleavable
fluorescent nucleotides for DNA sequencing on a chip constructed by site-specific
coupling chemistry. P Natl Acad Sci Usa 101(15): 5488-5493.
Figure 1
[Flowchart, starting from small RNA:
1. Purification of small RNAs from total RNA
   Use radioactive 18mer and 30mer as markers
2. 3' Adaptor Ligation
   Ligate pre-adenylated 3' adaptor to small RNA without ATP
   Reaction 1: Rnl2(1-249)K227Q
   Reaction 2: T4 RNA ligase 1
3. 5' Adaptor Ligation
   Ligate 5' adaptor using T4 RNA ligase 1
4. RT & Base-Hydrolysis
5. SOE-PCR
   3' PCR primer with extender
Final Construct: extender, 5' adaptor, small RNA]
Chapter 3
Mammalian microRNAs: Experimental evaluation of novel and previously
annotated genes
H. Rosaria Chiang1,2, Lori W. Schoenfeld1,2, J. Graham Ruby1,2,3, Vincent C.
Auyeung1,2,4, Noah Spies1,2, Daehyun Baek1,2, Wendy K. Johnston1,2, Carsten Russ5,
Shujun Luo6, Joshua E. Babiarz7, Robert Blelloch7, Gary P. Schroth6, Chad Nusbaum5,
David P. Bartel1,2
1Whitehead Institute for Biomedical Research, Cambridge, MA 02142, USA
2Howard Hughes Medical Institute and Department of Biology, Massachusetts Institute
of Technology, Cambridge, MA 02139, USA
3Current address: Department of Biochemistry and Biophysics, University of California
San Francisco, San Francisco, CA 94158, USA
4Harvard-MIT Division of Health Sciences and Technology, Cambridge, MA 02139,
USA
5Broad Institute of MIT and Harvard, Cambridge, MA 02141, USA
6Illumina, Inc., Hayward, CA 94545, USA
7Institute for Regeneration Medicine, Center for Reproductive Sciences, and Department
of Urology, University of California San Francisco, San Francisco, CA 94143, USA
H.R.C. performed the computational analysis, excluding the analyses of RNA editing,
performed by V.C.A.; untemplated nucleotide addition, performed by N.S.; and the
effects of miR-223 and miR-155, performed by D.B. L.W.S. performed the
transfections, and W.K.J. made the libraries for the overexpression experiments. C.R.,
S.L., G.P.S., and C.N. sequenced some of the mouse libraries. J.E.B. and R.B. supplied
the sequencing data from the small-RNA library of mouse embryonic stem cells. H.R.C.,
L.W.S., V.C.A., N.S., D.B., and D.P.B. wrote the manuscript.
Supplemental Tables 3, 5, 6, and 7 as well as Supplemental Figure 2 are provided as
electronic files on the accompanying CD-ROM. Supplemental Table 3 is best opened
with a web browser.
Published as:
Chiang, H. R., Schoenfeld, L. W., Ruby, J. G., Auyeung, V. C., Spies, N., Baek, D.,
Johnston, W. K., Russ, C., Luo, S., Babiarz, J. E., Blelloch, R., Schroth, G. P., Nusbaum,
C., and Bartel, D. P. (2010) Mammalian microRNAs: experimental evaluation of novel
and previously annotated genes. Genes Dev. 24:992-1009.
Abstract
MicroRNAs (miRNAs) are small regulatory RNAs that derive from distinctive hairpin
transcripts. To learn more about the miRNAs of mammals, we sequenced 60 million
small RNAs from mouse brain, ovary, testes, embryonic stem cells, three embryonic
stages, and whole newborns. Analysis of these sequences confirmed 398 annotated
miRNA genes and identified 108 novel miRNA genes. More than 150 previously
annotated miRNAs and hundreds of candidates failed to yield sequenced RNAs with
miRNA-like features. Ectopically expressing these previously proposed miRNA hairpins
also did not yield small RNAs, whereas ectopically expressing the confirmed and newly
identified hairpins usually did yield small RNAs with the classical miRNA features,
including dependence on the Drosha endonuclease for processing. These experiments,
which suggest that previous estimates of conserved mammalian miRNAs were inflated,
provide a substantially revised list of confidently identified murine miRNAs from which
to infer the general features of mammalian miRNAs. Our analyses also revealed new
aspects of miRNA biogenesis and modification, including tissue-specific strand
preferences, sequential Dicer cleavage of a metazoan pre-miRNA, newly identified
instances of miRNA editing, and evidence for widespread pre-miRNA uridylation
reminiscent of miRNA regulation by Lin28.
Introduction
MicroRNAs (miRNAs) are endogenous -22-nucleotide (nt) RNAs that posttranscriptionally regulate gene expression (Bartel 2004). MicroRNAs mature through
three intermediates: a primary miRNA transcript (pri-miRNA), a precursor miRNA (pre-
miRNA), and a miRNA:miRNA* duplex. RNA Polymerase II transcribes the primiRNA, which contains one or more segments that fold into an imperfect hairpin. For
canonical metazoan miRNAs, the RNase III enzyme Drosha together with its partner, the
RNA-binding protein DGCR8, recognizes the hairpin, and Drosha cleaves both strands
~11 base pairs from the base of the stem (Han et al. 2006). The cut leaves a 5' phosphate
and 2-nt 3' overhang (Lee et al. 2003). The liberated pre-miRNA hairpin is then exported
to the cytoplasm by Exportin-5 (Yi et al. 2003; Lund et al. 2004). There, the RNase III
enzyme Dicer cleaves off the loop of the pre-miRNA, ~22 nt from the Drosha cut (Lee et
al. 2003), again leaving a 5' monophosphate and 2-nt 3' overhang. The resulting
miRNA:miRNA* duplex, composed of ~22-nt strands from each arm of the original
hairpin, then associates with an Argonaute protein such that the miRNA strand is usually
the one that becomes stably incorporated while the miRNA* strand dissociates and is
degraded.
In addition to canonical miRNAs, some miRNAs mature through pathways that
bypass Drosha/DGCR8 recognition and cleavage. Members of the mirtron subclass of
pre-miRNAs are excised as intron lariats from the pri-miRNA by the spliceosome, and
following debranching, fold into Dicer substrates (Okamura et al. 2007; Ruby et al.
2007a). For some mirtrons, known as tailed mirtrons, a longer intron is excised such that
only one end of the pre-miRNA is generated by the spliceosome, whereas the other end
of the pre-miRNA matures through the Drosha-independent trimming of a 5' or 3' tail
(Ruby et al. 2007a; Babiarz et al. 2008). Members of another subclass of pre-miRNAs,
called endogenous short-hairpin RNAs (shRNAs), are suitable Dicer substrates without
preprocessing by either Drosha or the spliceosome (Babiarz et al. 2008). Other small
silencing RNAs are generated from the sequential processing of long hairpins or long
bimolecular duplexes. These small RNAs are classified as endogenous small interfering
RNAs (siRNAs) rather than miRNAs because they derive from extended duplexes that
produce many different small RNA species, whereas miRNAs derive from distinctive
hairpins that produce one or two dominant species (Bartel 2004).
The first indication of the abundance of miRNA genes came from sequencing
small RNAs from mammals, flies and worms (Lagos-Quintana et al. 2001; Lau et al.
2001; Lee and Ambros 2001). Hundreds of mammalian miRNAs have been identified by
Sanger sequencing of cloned small-RNA-derived cDNAs (Lagos-Quintana et al. 2001;
Lagos-Quintana et al. 2002; Houbaviy et al. 2003; Lagos-Quintana et al. 2003; Berezikov
et al. 2006b; Landgraf et al. 2007). Some miRNAs, however, are expressed only in a
limited number of cells or through a limited portion of development, and their rarity
makes them difficult to detect. Computational methods have been used to identify
mammalian miRNAs initially missed by sequencing, and some of these predicted
miRNAs have been evaluated experimentally, e.g., by rapid amplification of cDNA
ends (RACE) (Lim et al. 2003; Xie et al. 2005), hybridization to RNA blots (Berezikov et
al. 2005), microarrays (Bentwich et al. 2005), and RNA-primed array-based Klenow
extension (RAKE) (Berezikov et al. 2006b). Each of these experimental methods,
however, can yield false positives. Indeed, recent work in invertebrates and plants
(Rajagopalan et al. 2006; Ruby et al. 2006; Ruby et al. 2007b) has shown that the fraction
of erroneously annotated miRNAs can be quite high, depending on the quality of the
initial computational predictions. Even when miRNA genes are predicted correctly, the
resolution of the prediction is often insufficient to confidently determine the precise 5'
end of the mature miRNA. Because miRNAs repress target mRNAs by pairing to the
seed sequence, which is defined relative to the position of the miRNA 5' end, single-nucleotide resolution of 5'-end annotations is required for useful downstream analysis of
their physiological consequences (Bartel 2009).
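The dependence of the seed on the annotated 5' end can be illustrated with a minimal sketch. It assumes the seed spans nucleotides 2-8 of the mature miRNA (a common convention, not stated in the text above), and both the function name and the arm sequence are hypothetical, chosen only for illustration.

```python
def seed(mirna: str, start: int = 2, end: int = 8) -> str:
    """Return nucleotides start..end (1-based, inclusive) of a mature miRNA.

    The default 2-8 window is an assumption made for this sketch.
    """
    return mirna[start - 1:end]


# Hypothetical hairpin arm written 5' to 3'; not a real miRNA sequence.
arm = "GAUUGGUCCCCUUCAACCAGCUGUA"

# Two candidate 22-nt mature species whose 5' ends differ by one nucleotide.
isoform_a = arm[1:23]  # 5' end at arm position 2
isoform_b = arm[2:24]  # 5' end at arm position 3

print(seed(isoform_a))
print(seed(isoform_b))
```

A one-nucleotide error in the annotated 5' end moves every seed position by one register, so the two isoforms would be predicted to pair with different sets of target sites.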
Another approach for finding miRNAs and other small RNAs missed in the early
sequencing efforts is high-throughput sequencing (Lu et al. 2005). In mammals, high-throughput sequencing methods that have contributed to miRNA discovery efforts have
included massively parallel signature sequencing (MPSS) (Mineno et al. 2006), miRNA
serial analysis of gene expression (miRAGE) (Cummins et al. 2006), 454 pyrosequencing
(Berezikov et al. 2006a; Berezikov et al. 2007; Calabrese et al. 2007) and Illumina
sequencing (Babiarz et al. 2008; Kuchenbauer et al. 2008).
Here we use the Illumina sequencing-by-synthesis platform (Seo et al. 2004) for
miRNA discovery in mouse. Analyses of these reads, combined with experimental
evaluation of newly identified miRNAs as well as previous annotations, have led us to
substantially revise the set of confidently identified murine miRNAs, thereby providing a
more accurate picture of the general features of mammalian miRNAs and their abundance
in the genome. In addition, our results revealed new aspects of miRNA biogenesis and
modification, including tissue-specific strand preferences, sequential Dicer cleavage of a
metazoan pre-miRNA, rare instances of 5' heterogeneity, newly identified instances of
miRNA editing, and widespread pre-miRNA uridylation reminiscent of Lin28-like
miRNA regulation.
Results
We sequenced small-RNA libraries from three mouse tissues, brain, ovary, and testes, as
well as embryonic days 7.5 (e7.5), 9.5 (e9.5), 12.5 (e12.5) and newborn. Combining
these data with data collected similarly from mouse embryonic stem (ES) cells (Babiarz
et al. 2008) yielded 28.7 million reads between 16 and 27 nt in length that perfectly
matched the mouse genome assembly (Supplemental Table 1). Of these reads, 79.3%
mapped to miRNA hairpins, and 7.1% mapped to other annotated noncoding-RNA genes
(Supplemental Table 2). Because the sequencing protocol was selective for RNAs with
5' monophosphate and 3' hydroxyl groups, this dominance of miRNA species was
expected (Lau et al. 2001).
MicroRNA gene discovery
As when analyzing high-throughput data from invertebrates (Ruby et al. 2006; Ruby et
al. 2007b; Grimson et al. 2008), we identified miRNA genes in mouse by applying the
following criteria: 1) expression of the candidate miRNA, with a relatively uniform 5'
terminus, 2) pairing characteristics of the predicted hairpin, 3) absence of annotation
suggesting non-miRNA biogenesis, 4) absence of proximal reads suggesting that the
candidate is a degradation intermediate, and 5) presence of reads corresponding to a
miRNA* species with potential to pair to the miRNA candidate with ~2-nt 3' overhangs.
Using a low-stringency genomic search strategy that considered the first four criteria, 736
miRNA candidates were identified from the total dataset of mouse reads. Manual
inspection of these candidates, focusing on all five criteria, narrowed the list to 465
canonical miRNA genes, 377 of which were already annotated in miRBase v.14.0
(Griffiths-Jones 2004) and 88 of which were novel (Fig. 1A; Supplemental Fig. S1;
Supplemental Table 3). We also found 14 mirtrons (including ten tailed mirtrons), four of
which were already annotated, and 16 endogenous shRNAs, six of which were previously
annotated (Fig. 1B). When added to the 88 novel canonical miRNA genes, the newly
identified mirtrons and shRNAs raised the total number of novel genes to 108.
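The two-stage use of the five criteria (a low-stringency search on the first four, then inspection on all five) might be sketched as follows. This is only an illustration: the field names, the numeric thresholds, and the accepted overhang range are assumptions introduced here, not values taken from the study, and the manual inspection step is reduced to a simple rule.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical thresholds; the study does not state numeric cutoffs here.
MIN_READS = 5
MIN_DOMINANT_5P_FRACTION = 0.8


@dataclass
class Candidate:
    name: str
    reads: int                         # reads assigned to the candidate miRNA
    dominant_5p_fraction: float        # share of reads with the most common 5' end
    hairpin_pairs: bool                # predicted fold shows miRNA-like pairing
    non_mirna_annotation: bool         # overlaps e.g. an rRNA or tRNA annotation
    degradation_like_reads: bool       # proximal reads suggest a degradation product
    star_reads: int                    # reads matching the predicted miRNA*
    star_overhang_nt: Optional[int]    # 3' overhang of the miRNA:miRNA* pairing


def passes_search(c: Candidate) -> bool:
    """Criteria 1-4: expression with a uniform 5' end, hairpin pairing,
    no conflicting annotation, no sign of being a degradation intermediate."""
    return (
        c.reads >= MIN_READS
        and c.dominant_5p_fraction >= MIN_DOMINANT_5P_FRACTION
        and c.hairpin_pairs
        and not c.non_mirna_annotation
        and not c.degradation_like_reads
    )


def passes_inspection(c: Candidate) -> bool:
    """All five criteria: criteria 1-4 plus miRNA* reads with a ~2-nt 3' overhang."""
    has_star = c.star_reads > 0 and c.star_overhang_nt in (1, 2, 3)
    return passes_search(c) and has_star


candidates = [
    Candidate("cand-1", 120, 0.97, True, False, False, 9, 2),
    Candidate("cand-2", 40, 0.91, True, False, False, 0, None),   # no miRNA* read
    Candidate("cand-3", 300, 0.35, True, False, True, 4, 2),      # heterogeneous 5' ends
]

searched = [c.name for c in candidates if passes_search(c)]
confident = [c.name for c in candidates if passes_inspection(c)]
print(searched, confident)
```

In this toy set, the second candidate survives the search but not the final step, which mirrors the class of hairpins that were set aside for lacking a miRNA* read.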
Of these 108 genes, 36 appeared to be close paralogs of previously annotated
miRNA genes (most of which were paralogs of mir-466, mir-467, or mir-669), producing
miRNA reads that were identical to the previously annotated miRNAs, creating
ambiguity as to which loci contributed to the sequenced reads. Most of these close
paralogs (35/36) as well as 14 other novel loci were clustered with annotated miRNAs.
The 72 novel genes with reads distinguishable from those of previously identified genes
were expressed at lower levels than the previously annotated genes (median read
counts, 27 and 8206, respectively), and compared to previously annotated miRNAs, a
higher fraction of these novel miRNAs were located within introns of annotated [RefSeq
(Pruitt et al. 2005)] mRNAs (47% and 26%, respectively).
Experimental evaluation of unconfirmed miRNAs
Of 564 miRBase-annotated miRNA genes that map to the mm8 genome assembly, 157
annotated miRNAs did not pass the filters for miRNA candidates (Fig. 1A, B;
Supplemental Fig. S1; Supplemental Table 4). Of these 157, 26 mapped to annotated
rRNA and tRNA loci, 52 had no reads mapping to them, and another 72 had some reads
but in numbers deemed insufficient for confident annotation. The remaining seven either
had reads with very heterogeneous 5' ends, which suggested non-specific degradation of a
non-pri-miRNA transcript (mir-464, mir-1937a, and mir-1937b), had many reads that
mapped well into the loop of the putative hairpin, which were inconsistent with Dicer
processing (mir-451, mir-469, mir-805), or did not give a predicted fold with the requisite
pairing involving the candidate and predicted miRNA* (mir-484) (Supplemental Fig. S2).
For five of these seven, we have no reason to suspect that they might be authentic
miRNA genes. Among the remaining two, mir-484 might be regarded as a miRNA
candidate because manual refolding was able to generate a hairpin with the requisite
pairing, but even so, this candidate lacked reads for the predicted miRNA*. miR-451 is a
noncanonical miRNA generated from an unusual hairpin without production of a
miRNA:miRNA* duplex (S. Cheloufi and G. Hannon, personal communication). We do
not suspect that any other annotated miRNA genes failed to pass our filters for the same
reason as mir-451.
An additional 20 annotated miRNA hairpins were in our set of candidates but
failed the manual inspection because they lacked predicted miRNA* reads even after
allowing for alternate hairpin structures. Hundreds of candidates from other miRNA
discovery efforts (Xie et al. 2005; Berezikov et al. 2006b) also failed to pass the filters,
usually because no reads mapped to them.
One of the annotated miRNA genes missing from our datasets was mir-220,
which had been predicted computationally using MiRscan as a miRNA gene candidate
conserved in human, mouse and fish, and was supported experimentally using RACE
analysis of zebrafish small RNAs (Lim et al. 2003). In contrast, the other 37 miRNAs
newly annotated by Lim et al. (2003) were among our 387 confirmed miRNAs. The
absence of mir-220 in our datasets might have reflected either very low expression in the
sequenced samples or inaccuracy of its annotation. Similarly, mir-207, annotated in a
contemporaneous study that cloned novel miRNAs from mouse tissues, was missing from
our dataset, but another 27 miRNAs annotated from that study were confirmed (Lagos-Quintana et al. 2003).
To evaluate whether the missing annotated miRNAs and candidates represented
authentic miRNAs, we developed a moderate-throughput assay to examine if their
respective hairpins can be processed as miRNAs in cultured cells (Fig. 2A). If these
putative miRNAs were missing from our datasets because they were not expressed in the
sequenced tissues or stages, we reasoned that they would probably be detected in cells
ectopically expressing their respective hairpins, because most authentic miRNAs are
correctly processed from heterologous transcripts that include the full hairpin flanked by
~100 nucleotides of genomic sequence on each side of the hairpin (Chen et al. 2004;
Voorhoeve et al. 2006). Alternatively, if these putative miRNAs were missing because
they were not authentic miRNAs and therefore lacked the features needed for Drosha and
Dicer processing, they would not be sequenced from cells ectopically expressing their
hairpins. To evaluate many hairpins simultaneously, we transfected pools of hairpin-expressing constructs into HEK293T cells and isolated small RNAs for high-throughput
sequencing.
The performance of 26 positive controls, chosen from canonical human/mouse
miRNAs confirmed by our sequencing from mouse, illustrated the value of the assay.
For all but one of these controls, miRNA and miRNA* reads were more abundant in the
cells ectopically expressing the hairpin than in the cells without the hairpin constructs
(Fig. 2B-D; Supplemental Figs. S3, S4). For example, both hsa-miR-193b and mmu-miR-137 (from human and mouse, respectively) were >10-fold over-expressed (Fig. 2B).
The positive controls included genes of tissue-specific miRNAs, including mir-122
(liver), mir-133 (muscle), mir-223 (neutrophil) and several neuron-specific miRNAs,
with the idea that hairpins of tissue-specific miRNAs might require tissue-specific factors
for their processing and therefore might be sensitive to the potential absence of such
factors in HEK293T cells. Differences were observed, ranging from ~100 to 10,000
reads above the control transfection (Fig. 2C; hsa-mir-214 and hsa-mir-9-1, respectively),
consistent with the idea that factors absent in HEK293T cells might play a role in
processing of some miRNAs. Alternatively, some miRNA hairpins might be processed
less efficiently in all cell types, perhaps because our vectors might not present the
hairpins in an optimal context for processing. Perhaps hsa-mir-192, the control gene that
did not over-express in our assay, lacked crucial processing determinants needed in all
cells. In either scenario, the very high sensitivity of high-throughput sequencing enabled
miRNAs to be observed from most of the less efficiently processed hairpins.
From the 52 annotated mouse miRNAs that our study did not sequence, 17
miRNAs, including mir-220 and mir-207, were tested in the ectopic-expression assay.
One, mir-698, generated a single read corresponding to the annotated miRNA, and the
rest failed to generate any reads representing the annotated miRNA (Fig. 2D). From the
72 annotated miRNAs that we could not identify due to an insufficient number of reads, 28
were tested, and only four of these were found to be over-expressed (Fig. 2D). The
difficulty in over-expressing a canonical control miRNA (hsa-miR-192) illustrates that
our ectopic-expression assay cannot be used to prove conclusively that a particular
hairpin does not represent an authentic miRNA gene. However, the inability to over-express every one of the 17 unsequenced miRNAs as well as most of the 28 insufficiently
sequenced miRNAs strongly indicated that, overall, these annotations have been faulty
and that our failure to detect previously annotated miRNAs in mouse samples was not
merely due to inadequate sequencing coverage.
We also tested ten of the 20 annotated miRNA genes that we had identified as
candidates but did not confidently classify as miRNA genes because the predicted
miRNA* species was not sequenced. Four of seven genes without a miRNA* read and
one of three genes with substantially offset miRNA* reads produced the predicted
miRNA* species in our ectopic-expression assay (Fig. 2D). mir-184 and mir-489, which
both tested positive in this assay, are conserved. mir-184 is conserved throughout
mammals, and mir-489 is conserved to chicken, although the miRNA seed, which is
highly conserved in mammals and chicken, differs in mouse and rat. Thus, these two
genes, as well as mir-875, which is a broadly conserved gene without a miRNA* read,
were added to our set of confidently identified miRNA genes. Also added were mir-290,
mir-291a, mir-291b, mir-292, mir-293, mir-294, and mir-295, which were missing in the
genome assembly (mm8) used in our analysis because they fall in the region of the
genome that is difficult to assemble. Including these 10 genes, plus mir-451, brings the
total number of confidently identified miRNA genes to 506.
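Because the gene counts above are introduced in several places, a short tally may help confirm that they are mutually consistent. The sketch below uses only numbers stated in the text; the variable names are illustrative.

```python
# Counts from sequencing and manual inspection.
canonical_annotated, canonical_novel = 377, 88
mirtrons_total, mirtrons_annotated = 14, 4
shrnas_total, shrnas_annotated = 16, 6

# Novel genes: novel canonical miRNAs plus novel mirtrons and shRNAs.
novel = (
    canonical_novel
    + (mirtrons_total - mirtrons_annotated)
    + (shrnas_total - shrnas_annotated)
)

# Previously annotated genes confirmed directly by sequencing.
confirmed_by_sequencing = canonical_annotated + mirtrons_annotated + shrnas_annotated

# Genes added afterwards: mir-184, mir-489, mir-875 (3), the seven genes
# mir-290 through mir-295 missing from mm8 (7), and mir-451 (1).
added_later = 3 + 7 + 1

confirmed = confirmed_by_sequencing + added_later
total = confirmed + novel

print(novel, confirmed_by_sequencing, confirmed, total)
```

The tally reproduces the 108 novel genes, the 387 annotated genes confirmed by sequencing, the 398 confirmed genes given in the abstract, and the total of 506.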
Our sets of confirmed and novel murine miRNAs also provided the opportunity to
evaluate results of other computational efforts to find miRNAs conserved among
mammals. One set of studies predicted miRNAs based on phylogenetic conservation and
then tested these and additional murine-specific hairpins using RAKE and cloning
(Berezikov et al. 2005; Berezikov et al. 2006b). Among the 322 candidates supported by
these experiments, 11 were in our sets of miRNAs (two in our confirmed set and nine in
our novel set), and another nine did not satisfy our annotation criteria but had at least one
read consistent with the predictions. Another study started with MiRscan predictions
conserved in four mammals and filtered these predictions for potential seed pairing to
conserved motifs in 3' UTRs (Xie et al. 2005). Of their 144 final candidates, 45 were
paralogs of miRNAs already published at the time of prediction. Of the remaining 99
candidates, 27 were in our sets of miRNAs (26 in our confirmed set and one in our novel
set), and one did not satisfy our annotation criteria but had three reads consistent with the
miRNA* of the predicted miRNA. However, only four of the 27 confirmed miRNA
genes (4% of the 99 novel predictions) gave rise to the mature miRNA with the predicted
seed, suggesting that filtering MiRscan predictions for potential seed pairing provided
little, if any, added benefit. This conclusion concurs with a recent analysis of miRNA
targeting: miRNAs that are not conserved beyond mammals do not have enough
preferentially conserved sites to place these sites as among the most conserved UTR
motifs (Friedman et al. 2009). Therefore, it stands to reason that preferentially conserved
UTR motifs would provide little value for predicting such miRNAs.
To investigate whether the computational candidates might have been missed
because of low expression in tissues and stages from which we sequenced, we included
representatives from each study in our ectopic-expression assay. We randomly selected
12 Xie et al. candidates and eight Berezikov et al. 2006 candidates that our study did not
sequence, as well as four human candidates from the Berezikov et al. 2005 set whose
mouse orthologs were not sequenced. None generated reads representing the candidate
miRNAs (Fig. 2C, D). Taken together, our results raise new questions regarding the
authenticity of these candidates and suggest that previous extrapolation from these
candidates, which had suggested that mammals have a surprisingly high number of
conserved miRNA genes (as many as 1,000) (Berezikov et al. 2005) should be revised
accordingly.
Experimental evaluation of novel miRNAs and new candidates
We also used the ectopic-expression assay to evaluate novel miRNAs identified from our
sequencing. Of the 25 evaluated hairpins, 18 (72%) generated a significant number of
miRNA-like reads in HEK293T cells, indicating that most, although perhaps not all, of
our 108 novel annotations represented authentic miRNAs (Fig. 3; Supplemental Figs. S5,
S6). These 25 hairpins were arbitrarily selected for evaluation, except for a preference
for rare miRNAs, i.e., those that had fewer than ten mature miRNA reads. The rare
miRNAs and the higher-abundance miRNAs performed similarly (5/7 and 11/14
positives, respectively).
To evaluate Drosha- and Dicer-dependence of the over-expressed hairpins, the
experiment was repeated with and without a plasmid encoding a dominant-negative allele
of either Drosha or Dicer (Han et al. 2009) (Fig. 3A). All but two canonical miRNA
controls and most of the novel canonical miRNAs (16/17) responded to TNdrosha coexpression (Fig. 3B; Supplemental Fig. S7). Fewer responded to TNdicer, suggesting
that this construct was less disruptive of normal miRNA processing (Supplemental Fig.
S7).
The tested hairpins included several noncanonical miRNA precursors. The level
of mmu-miR-1224, an annotated mirtronic miRNA (Berezikov et al. 2007), increased in
the presence of TNdrosha, as expected if this pre-miRNA had more access to Exportin-5 and
Dicer when the canonical pre-miRNAs were reduced (Grimm et al. 2006). Although
mmu-miR-1839, an annotated shRNA (Babiarz et al. 2008), did not over-express, mmu-miR-344e and mmu-miR-344f, novel shRNAs, did over-express from our vector, and as
expected for shRNAs, their biogenesis was Drosha-independent (Fig. 3B; Supplemental
Figs. S5-7). Repeating the ectopic-expression assay in Dicer-knockout and control cells
confirmed that mmu-miR-344e biogenesis was Dicer-dependent (data not shown).
We also evaluated our candidates that had not satisfied our criteria for confident
annotation as miRNAs, usually because they lacked reads representing the predicted
miRNA*. We tested three sets of these candidates. One set represented our candidates
that lacked predicted miRNA* reads yet based on small-RNA sequencing results from
wild-type and mutant ES cells (Babiarz et al. 2008) appeared DGCR8- and Dicer-dependent. Another set represented candidates that appeared conserved in syntenic
regions of other mammalian genomes, and the third set was selected at random from
among the remaining candidates. All but one of the 28 tested candidates failed to
generate miRNA-like reads, and the processing of the candidate that did generate
miRNA-like reads in HEK293T cells was not dependent on Dicer, based on its presence
in Dicer-knockout ES cells (Babiarz et al. 2008).
The results evaluating the novel miRNAs and candidates illustrated the
importance of requiring a convincing miRNA* read as a criterion for confident miRNA
annotation. Five previously annotated miRNAs that were initially rejected due to lack of
a convincing miRNA* read had tested positive in our over-expression assay (Fig. 2D),
which indicated that this criterion was too stringent for some of the previously annotated
genes. However, the results for the newly identified miRNAs and candidates showed that
the presence of a convincing miRNA* read was the primary criterion that distinguished
the novel canonical miRNAs (most of which tested positive) from the remaining
candidates (nearly all of which tested negative). By requiring a convincing miRNA* read
in addition to the other four annotation criteria, our approach accurately distinguished
miRNA reads from the millions of other small-RNA reads generated by high-throughput
sequencing, with relatively few false positives among the novel annotations and few false
negatives among the rejected candidates.
MicroRNA expression profiles
To compare expression levels of each miRNA in different sequenced samples, we
constructed relative miRNA expression profiles (Fig. 4; Supplemental Table 5), and to
compare the relative expression of various miRNAs with each other, we generated a table
of overall miRNA abundance (Supplemental Table 5). Most miRNAs had substantially
stronger expression in some tissues or stages than in others, in agreement with previous
observations (Wienholds et al. 2005). We expect that strong tissue- or stage-specific
expression preferences inferred from our limited sample set will be revised as more
tissues and stages are surveyed.
General features of mammalian miRNAs
Our analyses of high-throughput sequencing data and subsequent experimental evaluation
reshaped the set of known murine miRNAs, setting aside 173 questionable annotations
and adding 108 novel miRNA genes to bring the total number of confidently identified
murine genes to 506. A majority (60%) of the 506 genes appeared conserved in other
mammals (Supplemental Fig. S1; Supplemental Table 6). However, only 15 of the 108
novel miRNA genes were conserved in other mammals, suggesting that the number of
nonconserved miRNA genes will soon surpass that of conserved ones as high-throughput
sequencing is applied more deeply and more broadly.
Five novel miRNAs (mir-3065, mir-3071, mir-3074-1, mir-3074-2, and mir-3111)
mapped to the antisense strand of previously annotated miRNAs (mir-338, mir-136, mir-24-1, mir-24-2, and mir-374, respectively), which when added to the previously
identified mir-1-2/mir-1-2-as pair brings the total number of sense/antisense miRNA
pairs to six. In addition, the mir-486 hairpin has a palindromic sequence, which resulted
in the same reads mapping to both the sense (mir-486) and antisense (mir-3107) hairpins.
Analysis of the antisense loci of all 498 miRNA genes identified six additional loci that
gave rise to some antisense reads resembling miRNAs (antisense loci of mir-21, mir-126,
mir-150, mir-337, mir-434, mir-3073). As more high-throughput data are acquired, these
as well as other antisense loci are likely to be annotated as miRNA genes. However,
<0.00002 of our miRNA reads corresponded to miRNAs from antisense loci (excluding
the reads mapping ambiguously to mir-486/mir-3107), raising the possibility that none of
the murine antisense miRNAs have a function comparable to that of miR-iab-as in flies
(Bender 2008; Stark et al. 2008; Tyler et al. 2008).
Our substantially revised set of miRNA genes provided the opportunity to speak
to the general features of 475 canonical miRNAs in mouse, with the properties of the 295
conserved genes applying also to the conserved genes of humans and other mammals
(Table 1). Most canonical miRNA genes (61%) were clustered in the genome, falling
within 50 kb of another miRNA gene, on the same genomic strand. Even when
excluding the four known megaclusters (Calabrese et al. 2007), which are on
chromosomes 2, 12 (two clusters), and X (with 69, 35, 16, and 18 genes, respectively), a
sizable fraction of the remaining genes (153/337) were in clusters of 2-7 genes. As
observed in humans (Baskerville and Bartel 2005), miRNAs from these loci within 50 kb
of each other tended to have correlated expression, consistent with their processing from
polycistronic pri-miRNA transcripts (Supplemental Fig. S8). In a scenario of one
transcript per cluster, the 475 canonical miRNA genes would derive from 245
transcription units. In addition, many miRNA hairpins mapped to introns. Just over a
third (38%) of the hairpins fell within introns of annotated mRNAs. Several lines of
evidence, including coexpression correlations, chromatin marks, and directed
experiments, indicate that miRNAs can be processed from introns (Baskerville and Bartel
2005; Kim and Kim 2007; Marson et al. 2008). In this scenario, as many as 107 (44%) of
the 245 transcription units could double as pre-mRNAs. Other hairpins were found
within transcripts that lacked other annotated functions, falling either within introns or
exons, or in transcripts without evidence of splicing.
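The cluster definition used above (genes within 50 kb of another miRNA gene on the same genomic strand) can be sketched as single-linkage grouping along each chromosome strand. This is a simplified illustration rather than the procedure used in the study: each gene is reduced to a single hypothetical coordinate, and the gene names and positions are invented.

```python
from itertools import groupby


def cluster_genes(genes, max_gap=50_000):
    """Group genes by chromosome and strand, chaining neighbors <= max_gap apart.

    genes: iterable of (name, chrom, strand, position) tuples.
    Returns a list of clusters, each a list of gene tuples.
    """
    ordered = sorted(genes, key=lambda g: (g[1], g[2], g[3]))
    clusters = []
    for _, group in groupby(ordered, key=lambda g: (g[1], g[2])):
        current = []
        for gene in group:
            # Start a new cluster when the gap to the previous gene is too large.
            if current and gene[3] - current[-1][3] > max_gap:
                clusters.append(current)
                current = []
            current.append(gene)
        clusters.append(current)
    return clusters


# Invented toy annotations.
genes = [
    ("a", "chr1", "+", 1_000),
    ("b", "chr1", "+", 30_000),
    ("c", "chr1", "+", 120_000),
    ("d", "chr1", "-", 31_000),
    ("e", "chr2", "+", 500),
]

clusters = cluster_genes(genes)
transcription_units = len(clusters)  # one transcript per cluster or singleton
clustered_genes = sum(len(c) for c in clusters if len(c) > 1)
print(transcription_units, clustered_genes)
```

Under the one-transcript-per-cluster scenario, the number of transcription units is simply the number of groups, counting unclustered genes as groups of one.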
MicroRNA hairpins are generally thought to each give rise to a single dominant
mature guide RNA. This was usually the case for the murine miRNAs, although as in
other species this result relied on grouping together as a single functional species all the
isoforms that share the same 5' terminus. This grouping is justified based on the current
understanding of miRNA target recognition, which stipulates that heterogeneity often
observed at miRNA 3' termini should have no effect on miRNA target recognition (Bartel
2009). Most mature miRNA reads (97%) were 20-24 nt in length, with 20mer, 21mer,
22mer, 23mer, and 24mer comprising 5%, 19%, 47%, 21% and 4% of the reads,
respectively (Supplemental Fig. S9). Although a single dominant mature species appears
to be the most frequent outcome of miRNA biogenesis, some miRNA hairpins give rise
to two or more species that each could function to target different sets of mRNAs. This
expanded targeting potential arises from multiple mechanisms, including utilization of
both strands of the miRNA:miRNA* duplex with similar frequency, 5' heterogeneity,
sequential Dicer cleavage, and RNA editing. Addition of untemplated nucleotides to the
3' termini of the miRNAs can also occur, and although not thought to change targeting
specificity, these changes could indicate posttranscriptional regulation of miRNA
stability. Occurrence of each of these phenomena is described below.
MicroRNAs from both arms, with occasional tissue-specific differences in the preferred
arm
Most canonical miRNA genes produced one dominant mature miRNA species, either
from the 5' or from the 3' arm of the pre-miRNA hairpin, with an overall tendency to
derive from the 5' arm (Table 1), as reported for previously annotated human miRNAs
(Hu et al. 2009). Some, however, yielded a similar number of reads from both arms,
suggesting that the two species enter the silencing complex with similar frequencies. For
these genes, mature species from the 5' and 3' arms were annotated using the -5p and -3p
suffixes, as is conventional in such cases (Griffiths-Jones 2004). Discrimination favoring
one arm over the other was less pronounced for both the nonconserved miRNAs and the
less highly expressed miRNAs (Fig. 5A), although for the miRNAs with very few reads
this trend was likely enhanced by our requirement for a miRNA* read. Overall, the
discrimination was high, with the species from the less dominant arm comprising 4.1% of
the reads that map to a miRNA or miRNA*. For the ten most abundant miRNAs
(sampling just the most abundant member in cases of repetitive miRNAs), discrimination
was even higher, with the less dominant arm comprising only 1.3% of the reads.
Nevertheless, the miRNA* species of these more highly expressed miRNAs were
sequenced at a median frequency 13-fold greater than that of the median non-conserved
miRNA, suggesting that a search for biological function for these miRNA* species might
be at least as fruitful as that for the poorly expressed non-conserved miRNAs.
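The arm-discrimination figure quoted above is the share of reads contributed by the less dominant arm. A minimal sketch of that calculation follows, assuming reads are pooled across genes before the fraction is taken (the pooling is an assumption of this sketch); the gene names and counts are invented.

```python
def minor_arm_fraction(arm_counts):
    """Fraction of pooled reads that come from each gene's less dominant arm.

    arm_counts: dict mapping gene name -> (reads from 5' arm, reads from 3' arm).
    """
    minor = sum(min(five_p, three_p) for five_p, three_p in arm_counts.values())
    total = sum(five_p + three_p for five_p, three_p in arm_counts.values())
    return minor / total


# Invented read counts for two hypothetical genes.
toy_counts = {
    "gene-x": (960, 40),   # 5' arm dominant
    "gene-y": (10, 990),   # 3' arm dominant
}

print(f"{minor_arm_fraction(toy_counts):.3f}")
```

A value near 0.5 would indicate that both arms are used with similar frequency, whereas a value near zero indicates strong discrimination.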
If the mature miRNA accumulated preferentially from one arm of the pre-miRNA
hairpin, the preferred arm generally remained consistent across the various libraries. For
a few miRNAs, however, the preferred arms switched between samples (Fig. 5B), as
reported previously using PCR-based miRNA quantification (Ro et al. 2007). For
example, miR-142-5p was sequenced more frequently in ovary, testes and brain, and
miR-142-3p was sequenced more frequently in embryonic and newborn samples. These
results imply a developmental switch in targeting preferences. A similar arm-switching
phenomenon has been reported for a sponge miRNA (Grimson et al. 2008) and was
observed for 20 other non-repetitive mouse miRNA genes (Fig. 5B).
Sequential Dicer cleavage of a mirtron hairpin
In plants, a few pri-miRNA hairpins with long, continuous RNA duplexes are cleaved
sequentially by Dicer to generate two adjacent miRNA:miRNA* duplexes (Kurihara and
Watanabe 2004; Rajagopalan et al. 2006). Those precursors bear little resemblance to the
shorter, imperfectly base-paired hairpins of metazoan miRNA genes. In mice, similar
precursors are found in the form of hairpin siRNA (hp-siRNA) precursors, but their
expression appears to be limited to germ-line tissues and totipotent ES cells, which lack a
robust interferon response to intracellular dsRNA (Babiarz et al. 2008; Tam et al. 2008;
Watanabe et al. 2008). However, we detected two miRNA:miRNA* duplexes deriving
from the mmu-mir-3102 pre-miRNA hairpin, an apparent mirtron as evidenced by reads
mapping to both boundaries of an intron (Fig. 5C; Supplemental Table 3). After splicing
and debranching, the excised intron was predicted to fold into a 131-nt pre-miRNA
hairpin, substantially longer than the average pre-miRNA length of 61 nt (calculated
from the set of confirmed miRNAs). Reads from this locus suggested that Dicer cleaved
this pre-miRNA twice, with the first cut generating the outer miRNA:miRNA* duplex
and the second cut generating the inner miRNA:miRNA* duplex (Fig. 5C). The inner
miRNA (miR-3102.2-3p) was among a set of proposed miRNA candidates (Berezikov et
al. 2006b), but the most frequently sequenced species from this hairpin was the outer
miRNA (miR-3102.1, Fig. 5C). Of the five genomes examined, the extended mir-3102
hairpin with both the inner and outer miRNAs appeared conserved only in rat, although
the orthologous loci in cow, dog, and human also could fold into shorter hairpins, with
miR-3102.1 potentially conserved in cow.
We suspect that it is more than a coincidence that the single metazoan example of
a sequentially diced miRNA is initially processed by the spliceosome rather than by
Drosha. One way to explain this observation is that DGCR8/Drosha interacts directly
with the loop of pri-miRNA stem-loops when recognizing its substrates (Zeng et al.
2005) and that the lack of sequentially diced Drosha-dependent miRNA hairpins in
animals reflects the limited reach of this complex.
5' Heterogeneity
Most conserved miRNAs had very precise 5' processing, with alternative 5' isoforms
comprising only 8% of all miRNA reads (Fig. 6A, B). These results, analogous to those
observed in worms and flies (Ruby et al. 2006; Ruby et al. 2007b), are consistent with the
idea that selective pressure to avoid off-targeting acts to optimize precision of the
cleavage event that produces the 5' terminus of the dominant species so as to prevent a
consequential number of molecules with seed sequences in the wrong register.
Moreover, 5' termini of conserved miRNAs were more precise than those of miRNA*
reads (4% and 12% offset reads, respectively, excluding those that produce comparable
numbers of small RNAs from each arm). For cases in which Dicer produced the 5'
terminus of the miRNA, the Dicer cut appeared somewhat more precise than the Drosha
cut (5% offset reads for miRNAs on the 3' arm, compared to 7% offset reads for miRNA*
on the 5' arm), hinting that features of the pre-miRNA structure may supplement the
distance from the Drosha cut as determinants of Dicer cleavage specificity (Ruby et al.
2006; Ruby et al. 2007b).
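As a rough illustration of the "offset reads" percentages quoted above, the following minimal sketch computes the fraction of reads whose 5' end differs from the dominant 5' terminus of a miRNA. The function name and the input format (one genomic 5' coordinate per read) are assumptions made for illustration; the exact bookkeeping used in the study is not specified in this section.

```python
from collections import Counter


def offset_fraction(five_prime_positions):
    """Fraction of reads whose 5' end is not the dominant 5' terminus.

    five_prime_positions: one 5' coordinate per read (hypothetical input format).
    """
    counts = Counter(five_prime_positions)
    if not counts:
        raise ValueError("no reads supplied")
    total_reads = sum(counts.values())
    dominant_reads = max(counts.values())
    # Reads at any other 5' position count as offset reads.
    return (total_reads - dominant_reads) / total_reads
```

For example, 92 reads starting at the dominant position and 8 reads starting elsewhere give an offset fraction of 0.08.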
A few miRNAs had less uniform 5' termini (Fig. 6A, B). For some miRNAs, 5'
heterogeneity has been previously documented (Ruby et al. 2007b; Stark et al. 2007;
Azuma-Mukai et al. 2008; Wu et al. 2009), the most prominent example being hsa-miR-124, a conserved neuronal miRNA for which the 5'-shifted isoform was initially
annotated as the miRNA and eventually replaced by the more prominent isoform
following more extensive sequencing (Lagos-Quintana et al. 2002; Landgraf et al. 2007).
Another prominent miRNA with unusually diverse 5' termini was miR-133a. This
conserved miRNA, which is highly expressed in heart and muscle, had a second
dominant isoform (miR-133a.2), which was shifted one nucleotide downstream from the
annotated miRNA (miR-133a.1) (Fig. 6C; Supplemental Table 3). To test whether this
heterogeneity might be explained by differential processing of the two mir-133a
paralogous hairpins, as observed for the two Drosophila mir-2 hairpins (Ruby et al.
2007b), we tested the two mir-133a hairpins in our ectopic-expression assay. Although
mir-133a-1 was somewhat more prone to produce the miR-133a.2 isoform, both hairpins
produced a substantial amount of both isoforms (Fig. 6C).
To investigate the functional consequences of miRNA 5' heterogeneity, we
examined published array data showing the responses of mRNAs after deleting either
mir-223, a miRNA with substantial heterogeneity, or mir-155, a miRNA with little
heterogeneity. miR-223 is highly expressed in neutrophils, and analysis of small-RNA
sequences from isolated neutrophils (Baek et al. 2008) was consistent with our
sequencing results (Supplemental Table 3) in showing 5' heterogeneity, with 81% of the
reads mapping to the 5' end of the major isoform miRNA and 12% mapping to the 5' end
of a second isoform that was shifted by one nucleotide in the 3' direction (Fig. 6D). As
expected, mRNAs with canonical 7-8-mer sites (Bartel 2009) matching the seed of the
major isoform were significantly derepressed in the mir-223 deletion mutant [p < 10-1,
Kolmogorov-Smirnov (K-S) test, comparing to no-site distribution]. mRNAs with
canonical sites matching the minor isoform also showed a significant tendency to be
derepressed, albeit to a lesser degree (Fig. 6D; p = 0.0022, 0.013, and 1.7 × 10^-4, for
8mer, 7mer-m8, and 7-8mers combined, respectively). This result could not be attributed
to the overlap between sites matching the major and minor isoforms because all mRNAs
with a 6mer seed match to the major isoform (ACUGAC) were excluded, and additional
analyses ruled out participation of the "shifted 6mer" match (Friedman et al. 2009) to the
major isoform (AACUGA; Supplemental Figure S10A). Analogous analysis of miR-155
yielded strong evidence for function of the major isoform (Rodriguez et al. 2007) but no
sign of function for the minor isoform, which comprised very few (1%) of our miR-155
reads (Fig. 6E; Supplemental Table 3).
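The relationship between a one-nucleotide 5' shift and the seed match can be sketched as follows. The helper names are hypothetical, and the 8-nt miR-223 prefix is an assumption: nucleotides 2-8 are inferred from the two 6mer matches quoted in the text (ACUGAC and AACUGA), and the first nucleotide is assumed. The sketch shows that the 6mer seed match of the minor isoform coincides with the "shifted 6mer" match to the major isoform, which is why that possibility had to be ruled out separately.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Reverse complement of an RNA string (A/U/G/C only)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def seed_match_6mer(mirna):
    """mRNA site that pairs with miRNA nucleotides 2-7 (the 6mer seed match)."""
    return reverse_complement(mirna[1:7])


# Assumed 5' prefix of the major miR-223 isoform (see lead-in).
major_prefix = "UGUCAGUU"
# The minor isoform starts one nucleotide further in the 3' direction.
minor_prefix = major_prefix[1:]

print(seed_match_6mer(major_prefix))  # ACUGAC
print(seed_match_6mer(minor_prefix))  # AACUGA
```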
Taken together, our results show that some miRNAs have alternative 5' miRNA
isoforms that are expressed at levels sufficient to direct the repression of a distinct set of
endogenous targets and thereby broaden the regulatory impact of the miRNA genes.
Therefore, we suggest that rather than choosing one isoform over the other for annotation
as the authentic miRNA, more of these alternative isoforms should be annotated, with the
expectation that for some highly expressed miRNAs, more than one 5' isoform
contributes to miRNA function.
RNA editing
RNA editing in which adenosine is deaminated and thereby converted to inosine (I) has
been reported for some miRNA precursors (Blow et al. 2006; Landgraf et al. 2007;
Kawahara et al. 2008). Because I pairs with C, such edits could change miRNA target
recognition. Reasoning that the mammalian adenosine deaminases (ADARs) responsible
for A-to-I editing are primarily expressed in the brain, we searched for sequencing reads
from brain that did not match the genome and had as their closest match a mature miRNA
or miRNA*. After filtering for mismatches occurring more than 2 nt from the 3' end, a
step taken to avoid considering instances of untemplated 3'-terminal addition, only 4% of
the reads had a single mismatch to the genome (Supplemental Fig. S11A). Moreover,
the fraction of sequences with A-to-G changes (indicative of A-to-I editing) was only
0.61%, a fraction resembling that of other mismatches (Supplemental Fig. S11A). This
fraction was also similar to that of the A-to-G changes in our synthetic internal standards
used for preparing the sequencing libraries. These results indicate that mature edited
miRNAs are very rare and difficult to distinguish above the background level of
sequencing errors. The low frequency of editing in mature miRNAs was consistent with
the findings that edited processed miRNAs are more than fourfold less common in mouse
relative to humans (Landgraf et al. 2007) and are less common than edited miRNA
precursors (Kawahara et al. 2008). The latter observation might be due to rapid
degradation or impaired processing, which has been shown for miR-142 (Yang et al.
2006) and miR-151 (Kawahara et al. 2007a).
Although editing did not appear to be a widespread phenomenon among all
mature miRNAs, editing at specific sites might still be important for a few individual
miRNAs. To investigate this possibility, mismatch fractions were calculated as the
fraction of reads bearing a particular mismatch over all reads covering that genomic
position. For each library, a change was considered significant if the fraction exceeded
5% and at least ten reads contained the mismatch. Additional filters designed to remove
sequencing errors, alignment artifacts and instances of untemplated nucleotide addition
preferentially retained A-to-G changes while removing nearly all other events
(Supplemental Fig. S11B). Sixteen A-to-G events passed the filters and subsequent
manual examination, all of which occurred only in the brain library (Table 2). Five of
these inferred editing sites were also observed in a low-throughput sequencing effort in
human brain samples (Kawahara et al. 2008), indicating that editing of some miRNAs is
conserved between mammals. Consistent with that study, eight of 16 editing sites
occurred in a UAG motif. A separate examination of read alignments with up to three
mismatches showed that the vast majority of edited reads were edited at one position,
suggesting either that editing of multiple sites in the same RNA molecule is rare, or that
multiply-edited RNAs are more rapidly degraded.
A-to-I editing of a seed nucleotide would dramatically affect targeting. In
addition to editing in the miR-376 cluster described previously (Kawahara et al. 2007b;
Kawahara et al. 2008), we found another eight miRNAs that are edited within the seed of
either the miRNA or the star strand. A-to-I editing could also affect miRNA loading and
thereby indirectly affect targeting. Indeed, the editing of miR-540 might help explain
why the 5' arm is more abundant in the brain than in other tissues, although editing is too
infrequent to fully explain the switch in strand bias. Altering Drosha and Dicer
processing could also indirectly affect targeting. Analysis of 5' ends showed that seven
of 16 instances of editing were associated with a statistically significant (p <0.05) shift in
the 5' nucleotide, presumably due to changes in the Drosha and Dicer cleavage site
(Supplemental Fig. S11D).
Untemplated nucleotide addition
Much more prevalent than editing of internal nucleotides was addition of untemplated
nucleotides to miRNA 3' termini. As previously reported for miRNAs in mammals
(Landgraf et al. 2007) and also observed for those of worms and flies (Ruby et al. 2006;
Ruby et al. 2007b), nucleotides most frequently added to murine miRNAs were U and A
(Fig. 7A). Addition of C or G was no higher than background, as estimated by
monitoring apparent addition to tRNA fragments (Fig. 7A). Possible sources of the
background rate could be sequencing error, transcription error, or a low level of
biological nucleotide addition. Some miRNAs were much more frequently extended than
others (Supplemental Table 7). One very frequently extended miRNA was miR-143, for
which the extended reads outnumbered the non-extended ones (196,565 compared to
114,980 reads, respectively).
For extension by U, RNAs from the pre-miRNA 3' arm were three times more
frequently extended than were those from the 5' arm (Fig. 7A; Fig. 7B, p = 2.3 × 10^-4, K-S
test). This preference, not observed for A extension (Fig. 7A, C), suggests that much of
the U extension occurs to the pre-miRNA, prior to Dicer cleavage, a state in which the
3' arm but not the 5' arm would be available for extension (Fig. 7D). TUT4-catalyzed
poly(U) addition to the let-7 pre-miRNA, which is specified by Lin28, plays an important
role in posttranscriptional repression of let-7 expression (Heo et al. 2008; Hagan et al.
2009; Heo et al. 2009). Our analyses indicating untemplated U extension to many other
pre-miRNAs hint that this type of regulation may not be limited to let-7 but that
analogous pathways, presumably using mediators other than Lin28, act to regulate the
expression of other murine miRNAs.
Discussion
The status of miRNA gene discovery in mammals
Our current study sets aside nearly a third (173/564) of the miRBase v. 14.0 gene
annotations for lack of convincing evidence that these produce authentic miRNAs. It also
adds another 108 novel miRNA loci, raising the question of how many more authentic
loci remain undiscovered. This question is difficult to answer. Ever since the recognition
that the poorly conserved miRNAs are also the ones expressed at lower levels in
mammals and thus are the most difficult to detect by both computational and
experimental methods, we have known that it is impossible to provide a meaningful
estimate of the number of mammalian miRNA genes remaining to be discovered (Bartel
2004). The broadly conserved miRNAs are another matter. Only three of the 88 novel
canonical miRNAs had recognizable orthologs sequenced in chicken, lizard, frog, or fish,
and these three were antisense to previously annotated broadly conserved miRNA genes.
Therefore, apart from miRNAs expressed at very low levels from the antisense strand of
known genes, we suspect that the list of broadly conserved miRNA gene families is
nearing completion. The current set of murine miRNA genes includes 192 genes that fall
into 89 broadly conserved miRNA gene families (Supplemental Table 6).
Another 107 miRNA gene families appeared conserved in other mammals
(Supplemental Table 6). These were represented by 120 murine genes, including 14
novel genes that were conserved in other mammals. Of these novel genes, 11 were
founding members of novel conserved gene families. Some of these were identified with
only 11 reads, indicating that additional pan-mammalian gene families remain to be
found, although we have no evidence supporting the idea that the number of conserved
gene families will rise to the very high levels suggested by some earlier computational
studies (Xie et al. 2005; Berezikov et al. 2006b). For now, we can say that mammals
have at least 196 conserved miRNA gene families represented in mouse by at least 312
pre-miRNA hairpins (303 canonical and nine noncanonical hairpins) produced from at
least 194 unique transcription units.
Because a single miRNA hairpin can produce multiple functional isoforms,
generated by either 5' processing heterogeneity or utilization of both arms of the miRNA
duplex, a single conserved hairpin can produce more than one conserved miRNA
isoform. Because the different isoforms have different seed sequences, they fall into
different families of mature miRNAs. Thus, the number of conserved families of
miRNAs (i.e., mature guide RNAs) will exceed the number of conserved families of
genes (i.e., hairpins). Perhaps the best known example of a hairpin with two broadly
conserved isoforms is mir-9, for which conserved miRNAs from both arms of the hairpin
are readily detected by using in situ hybridization in both zebrafish and marine annelids
(Wienholds et al. 2005; Christodoulou et al. 2010). Numerous conserved genes produce
more than one miRNA isoform (Fig. 5A, 6A), but for most of these we do not yet know
whether production of the alternative isoform is conserved in other species. High-throughput sequencing from other species will help identify many additional conserved
isoforms. We anticipate that the discovery of multiple conserved isoforms will contribute
much more to the future growth in the list of broadly conserved miRNA families than
will the discovery of new conserved genes.
As expected, the conserved miRNAs tended to be expressed at much higher levels
than were the nonconserved ones, with the median read frequency of conserved miRNAs
44-fold greater than that of the nonconserved miRNAs (Fig. 4A, 5B). Therefore, even if
many nonconserved miRNA genes remained to be found, these would add little to the
number of annotated miRNA molecules in a given cell or tissue, and presumably even
less to the impact of miRNAs on gene expression (Bartel 2009). Indeed, even more
pressing than the question of how many poorly conserved miRNAs remain undetected is
the question of whether any of the known poorly conserved miRNAs have any
consequential function in the animal.
Most of these poorly conserved miRNAs could have derived from transcripts that
fortuitously acquired hairpin regions with features needed for some Drosha/Dicer
processing. In this scenario, most of these newly emergent miRNAs will be lost during
the course of evolution before ever acquiring the expression levels needed to have a
targeting function sufficient for their selective retention in the genome. Consistent with
the hypothesis that most of these miRNAs play inconsequential regulatory roles, these
miRNAs generally accumulated to much lower levels in our ectopic-expression assay
(Fig. 3B, median read frequencies of 58 and 844 for nonconserved and conserved
miRNAs, respectively), and they displayed weaker specificity for one arm of the hairpin
(Fig. 5A), as would be expected if there was no advantage for the cell to efficiently utilize
their respective hairpins. Nonetheless, some were efficiently processed, and at least a
few poorly conserved miRNAs probably have acquired consequential species-specific
functions. Although none have known functions, such hairpins are worthy of annotation
as miRNA loci (just as protein-coding genes can be annotated before the protein is known
to be functional), and as a class these newly emergent miRNAs could provide an
important evolutionary substrate for the emergence of new regulatory activities.
The major challenge for miRNA gene discovery stems from the difficulty in
proving that a nonconserved, poorly expressed candidate is an authentic miRNA,
combined with the even greater difficulty in proving that a questionable candidate is not
an authentic miRNA. This challenge has become all the more acute as miRNA discovery
has reached the point at which nearly all of the novel candidates are both nonconserved
and poorly expressed. Our approach of testing pools of candidates in an ectopic-expression assay provides useful data for evaluating miRNA authenticity. However, our
approach cannot provide conclusive proof for or against the authenticity of a proposed
candidate, leaving open the possibility that some of the nonconserved, poorly expressed
candidates that we classify as "confidently identified miRNAs" are false positives. When
considering the limitations of the current tools for miRNA gene identification, this
possibility cannot be avoided. Therefore, if any nonconserved, poorly expressed
miRNAs are annotated as miRNAs, the resulting list of miRNAs will have to be
somewhat fuzzy, with an expectation that some of the annotated genes will not be
authentic miRNAs. This expectation should not be viewed as advocating the
indiscriminate annotation of all candidates as miRNAs. Our proposal is that miRNA
gene-discovery efforts should annotate as miRNAs only those novel candidates that are
both found in high-throughput sequencing libraries and pass a set of criteria that is
sufficiently stringent such that a majority of the novel canonical miRNAs are cleanly
processed in a Drosha-dependent manner when using the ectopic-expression assay.
Although implementing this proposal would not prevent all false-positives from entering
the databases, it would preserve a higher quality set of miRNAs while eliminating few
authentic annotations. Those wanting to take additional measures to avoid false-positives
could focus only on the subset of miRNAs that both meet these criteria and are conserved
in other species.
Unknown features required for Drosha/Dicer processing.
Before learning the results of our experiments, we wondered whether any ectopically
over-expressed hairpin of suitable length would be processed as if it were a miRNA, a
result that would have rendered our assay too permissive to be of value. In this scenario,
most of the specificity that distinguished authentic miRNA genes from other regions of
the genome with potential to produce transcripts that fold into seemingly miRNA-like
hairpins would have been a function of whether or not the regions were transcribed. This
scenario was not realized, however, and our assay turned out to be informative, which
illustrates how much of Drosha/Dicer substrate recognition still remains unknown. Many
of the previously proposed miRNA hairpins that had no reads in our mouse samples were
indistinguishable from authentic miRNA hairpins with regard to the known determinants
for Drosha/Dicer recognition, yet none of these unconfirmed hairpins produced miRNA
and miRNA* molecules in our very sensitive assay (Fig. 2C, D). These results showing
that major processing specificity determinants still remain undiscovered point to the
importance of finding these determinants, efforts that, if successful, will mark the
next substantive advance in accurately predicting and annotating metazoan miRNAs.
Methods
Library preparation
Total RNA samples from mouse ovary, testes, and brain were purchased from Ambion,
and total RNA from mouse e7.5, e9.5, e12.5 and newborn were obtained from the Chess
lab. The small RNA cDNA libraries were made as described (Grimson et al. 2008),
except that the 3' adaptor used in the ligation was the 5'-adenylated oligonucleotide
pTCGTATGCCGTCTTCTGCTTGidT. For a detailed protocol, see
http://web.wi.mit.edu/bartel/pub/protocols.html.
MicroRNA discovery
The reads with inserts of 16-27 nt were processed as described (Babiarz et al. 2008). The
miRNA candidates were identified using reads matching genomic regions that were not
very highly repetitive (reads with <500 genomic matches). Reads from all datasets were
combined and grouped by their 5' terminal loci, requiring that each candidate 5' locus
pass five criteria listed in the text. 1) To pass the expression criterion, a candidate
required >10 normalized reads. 2) To address the hairpin requirement, the secondary
structure of the candidate was evaluated by selecting for each 5' terminal locus the most
abundant sequence and extending its 5' end by 2 nt to define the range of one strand of
the potential miRNA/miRNA* duplex. Three genomic windows were extracted with the
5' end extended an additional 10 nt and the 3' end extended either 50, 100, or 150 nt.
Three more windows were extracted extending the 3' end by 10 nt and the 5' end another
50, 100, or 150 nt. The secondary structure of each of the six windows was predicted
using RNAfold (Hofacker et al. 1994), and the number of hairpin base pairs (denoted
using bracket notation) involving the 5'-extended miRNA candidate was calculated as the
absolute value of [(# 5'-facing brackets) minus (# 3'-facing brackets)]. A candidate with a
minimum of 16 base pairs using at least one of the six genomic windows satisfied the
hairpin criteria. 3) The candidates with non-miRNA biogenesis were found by mapping
to annotated non-coding RNA loci (rRNA, tRNA, snRNA, srpRNA). 4) The candidates
likely produced by degradation were defined as those failing the 5' homogeneity
requirement. A candidate satisfied the 5' homogeneity requirement if at least half the
reads within 30 nt of the candidate locus were present within 2 nt of the candidate locus
and if the candidate locus comprised at least half the reads within 2 nt of the candidate
locus or if there was only one other locus within 30 nt of the candidate locus that had
more than half of the reads mapping to the candidate locus. 5) Manual inspection of
reads mapped to predicted secondary structures identified candidates accompanied by
potential miRNA* reads. For ten previously annotated miRNAs and seven novel
miRNAs, a suitable miRNA* read was found only after considering alternative hairpin
folds predicted to be suboptimal using mfold (Mathews et al. 1999; Zuker 2003).
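The hairpin criterion (criterion 2) can be sketched as follows. This is a minimal illustration, not the code used in the study: the function names, the 0-based half-open plus-strand coordinates, and the assumption that each dot-bracket structure has already been predicted (for example with RNAfold) are all choices made here. Clipping of windows at chromosome ends is omitted.

```python
def candidate_windows(start, end):
    """Six genomic windows around a plus-strand candidate.

    start, end: 0-based, half-open coordinates of the most abundant read
    (assumed convention). Returns (window_start, window_end) pairs.
    """
    strand_start = start - 2  # extend the 5' end by 2 nt
    windows = []
    # 5' end extended an additional 10 nt; 3' end extended 50, 100, or 150 nt.
    for ext in (50, 100, 150):
        windows.append((strand_start - 10, end + ext))
    # 3' end extended 10 nt; 5' end extended another 50, 100, or 150 nt.
    for ext in (50, 100, 150):
        windows.append((strand_start - ext, end + 10))
    return windows


def hairpin_base_pairs(structure, lo, hi):
    """Net bracket count over structure[lo:hi] in dot-bracket notation.

    Brackets facing opposite directions within the candidate cancel out,
    so only pairs formed with the opposite strand of the hairpin remain.
    """
    segment = structure[lo:hi]
    return abs(segment.count("(") - segment.count(")"))


def passes_hairpin_criterion(folded_windows, minimum=16):
    """folded_windows: iterable of (structure, lo, hi), one per window."""
    return any(
        hairpin_base_pairs(structure, lo, hi) >= minimum
        for structure, lo, hi in folded_windows
    )
```

Taking the absolute value makes the count independent of which arm of the hairpin the candidate lies on.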
For the analysis of mir-290, mir-291a, mir-291b, mir-292, mir-293, mir-294, and
mir-295, which are not present in mm8 genome assembly, we mapped all reads to mm9
genome assembly corresponding to the region (chr7(+): 3218627-3220842).
For conservation analysis, a candidate was considered broadly conserved if the
hairpin structure and the seed sequence were conserved to chicken, fish, frog, or lizard
(galGal3, danRer5, xenTro2, and anoCar1, respectively) in the UCSC whole-genome
alignments (Kuhn et al. 2009). To identify a candidate conserved in mammals, we
looked at 12 additional genomes (bosTau3, canFam2, cavPor2, equCab1, hg18, loxAfr1,
monDom4, ornAna1, panTro2, ponAbe2, rheMac2, and rn4) and calculated the branch-length score from a phylogenetic tree trained on mouse 3' UTR data (Friedman et al.
2009), using the cutoff score of 0.7. A gene was considered to be in a conserved miRNA
gene family if the hairpin produced a miRNA with a seed matching that of a conserved
miRNA (Supplemental Table 6).
Ectopic over-expression assays
To generate expression constructs, pre-miRNA hairpins and the surrounding regions were
amplified from human genomic DNA (NCI-BL2126) or from mouse BL6 genomic DNA
using Pfu Ultra II polymerase (Stratagene) and primers with Gateway (Invitrogen)-compatible ends designed to anneal ~100 nt upstream and downstream from the miRNA
hairpins. PCR products were inserted into Gateway vector pDONR221 and subsequently
into pcDNA3.2/V5-DEST, and the resulting plasmids were transformed into DH5-α cells.
Positive clones were selected by colony PCR and sequenced. Clones that did not have a
mutation within pre-miRNA hairpins were selected. Plasmid DNA from the confirmed
expression clones was purified for transfection using the Plasmid Mini Kit (Qiagen). For
each standard assay, plasmids for up to ten hairpin expression constructs were mixed in
equal amounts to create seven or eight pools of ~1.4 µg DNA each, with each pool
including 1-3 positive-control hairpins.
HEK293T cells were cultured in DMEM supplemented with 10% FBS and plated
in 12-well plates ~24 hours prior to transfection to reach ~80-90% confluency. Each
well of cells was transfected with one pool of DNA using Lipofectamine 2000
(Invitrogen). For the standard assays, 145-200 ng of pMaxGFP (Amaxa) was
cotransfected with each pool to enable transfection efficiency to be confirmed by GFP
expression. Control wells (no hairpin plasmid) were transfected only with 145 ng
pMaxGFP. For the Drosha/Dicer-dependency assays, 7-8 hairpin constructs were
combined to create six pools of ~400 ng each. Each pool was mixed with 1.2 µg of the
pCK-Drosha-FLAG(TN) (TNdrosha), pCK-FLAG-Dicer(TN) (TNdicer), or pCK-dsRed.T4 (control vector, constructed by replacing the Drosha coding sequence of
TNdrosha with dsRed coding sequence) and used to transfect one well of HEK293T cells
as above. Control wells were transfected with 1.2 µg of either TNdrosha, TNdicer, or
control vector. For the dependency assays, each transfection was performed in duplicate
wells. Cells from all assays were harvested 39-48 hours after transfection. Cells from
each treatment were combined, total RNA was extracted using TriReagent (Ambion), and
small-RNA libraries were prepared for Illumina sequencing.
The reads were processed as above, and RNA species were matched to the
transfected hairpins. In the standard assay, reads were normalized by the median of the
30 most frequently sequenced endogenous miRNAs. For assays testing Drosha/Dicer-dependency, reads were normalized based on the number of reads corresponding to an
18-nt internal standard that had been spiked into equivalent amounts of total RNA prior
to beginning library preparation. Reads matching the transfected hairpins were grouped
by their 5' termini (5' terminal locus). The locus with the largest number of reads was
considered the 5' terminal locus of the mature miRNA produced by the hairpin, and
similarly, the most dominant 5' locus on the opposite arm was considered the miRNA*.
The normalized miRNA and miRNA* read numbers were summed to calculate the
expression level.
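The read bookkeeping described in this paragraph can be sketched as follows. This is an illustration under stated assumptions rather than the pipeline used in the study: the function names, the `(arm, five_prime_pos, count)` input format, and the reading of "normalized by the median of the 30 most frequently sequenced endogenous miRNAs" as division by that median are all hypothetical.

```python
from collections import Counter
from statistics import median


def normalization_factor(endogenous_counts, top_n=30):
    """Median read count of the top_n most frequently sequenced endogenous miRNAs."""
    top = sorted(endogenous_counts, reverse=True)[:top_n]
    return median(top)


def dominant_loci(reads):
    """Pick the miRNA and miRNA* 5' terminal loci for one transfected hairpin.

    reads: iterable of (arm, five_prime_pos, normalized_count), with arm
    given as "5p" or "3p" (assumed labels). Requires at least one read.
    Returns (mirna_locus, star_locus, expression_level).
    """
    totals = Counter()
    for arm, pos, count in reads:
        totals[(arm, pos)] += count
    # The locus with the largest number of reads defines the mature miRNA.
    mirna_locus = max(totals, key=totals.get)
    # The most dominant 5' locus on the opposite arm defines the miRNA*.
    opposite = {k: v for k, v in totals.items() if k[0] != mirna_locus[0]}
    star_locus = max(opposite, key=opposite.get) if opposite else None
    star_reads = totals[star_locus] if star_locus is not None else 0
    return mirna_locus, star_locus, totals[mirna_locus] + star_reads
```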
If an overexpressed hairpin generated mature miRNA with the dominant 5'
terminal locus corresponding to the expected locus and at least one read corresponding to
the miRNA* with a ~2-nt 3' overhang, it was considered expressed. A hairpin was
classified as over-expressed if there were at least three-fold more reads in the hairpin
transfection than in the control transfection, after adding pseudocounts of five to both. A
hairpin was classified as Drosha- or Dicer-dependent if the knockdown was at least
threefold.
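The over-expression call described above reduces to a ratio test with pseudocounts. A minimal sketch follows; the function name and parameter names are assumptions, and the inputs are taken to be normalized read counts.

```python
def is_overexpressed(hairpin_reads, control_reads, pseudocount=5, min_fold=3.0):
    """True if the hairpin transfection yields at least min_fold more reads
    than the control transfection after adding the pseudocount to both."""
    fold = (hairpin_reads + pseudocount) / (control_reads + pseudocount)
    return fold >= min_fold
```

With a pseudocount of five, a hairpin with no reads in the control needs at least ten reads in the hairpin transfection to be called over-expressed, which guards against calls based on a handful of reads.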
Identification of arm-switching miRNAs
To determine the read numbers from the 5' and the 3' arm, reads from each sample were
grouped based on their 5' termini, and the read numbers were tallied for those
corresponding to the miRNA or miRNA* 5' terminus. Only samples with >5 reads on
either arm were considered. The fold enrichment was calculated as the ratio of 5' and 3'
arm reads after adding pseudocounts of one.
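A minimal sketch of the fold-enrichment calculation follows. The function name is hypothetical, and "reads on either arm" is interpreted here as requiring more than five reads on at least one arm; that interpretation is an assumption.

```python
def arm_fold_enrichment(five_prime_reads, three_prime_reads,
                        min_reads=5, pseudocount=1):
    """Ratio of 5'-arm to 3'-arm reads for one sample, with pseudocounts.

    Returns None when neither arm has more than min_reads reads
    (assumed reading of the sample filter).
    """
    if max(five_prime_reads, three_prime_reads) <= min_reads:
        return None
    return (five_prime_reads + pseudocount) / (three_prime_reads + pseudocount)
```

Values above 1 indicate that the 5' arm dominates in the sample, and values below 1 indicate that the 3' arm dominates; comparing the values across samples identifies hairpins that switch arms.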
RNA editing analysis
Sequencing libraries from individual tissues were combined and mapped to the genome
using the Bowtie alignment tool (Langmead et al. 2009). The alignments were filtered
for sequences that uniquely align to the genome, contain at most one mismatch to the
genome, and have 5' ends that map to within one nucleotide of an annotated miRNA or
miRNA* 5' end. The 12 possible mismatch types were then quantified at each position
covered by the filtered reads. For example, to screen for A-to-G mismatches indicative
of A-to-I editing sites, the editing fraction was calculated as the number of reads
containing an A-to-G mismatch at a particular position, divided by the number of filtered
reads covering that position. Sites were considered editing candidates if the editing
fraction was greater than 5%, had at least ten A-to-G mismatch reads, and did not occur
in the last two nucleotides of the corresponding miRNA or miRNA*. Candidate editing
sites were then manually examined and discarded if an alternative explanation was more
parsimonious. For example, the only non-brain editing candidate mapped to let-7c-1 but
was most likely due to a handful of let-7b reads containing untemplated nucleotide
additions that fortuitously matched the let-7c-1 locus. Consistent with this explanation,
the putatively edited reads were unusually long and at unusually low abundance.
Candidate editing sites were also checked in the Perlegen SNP database (Frazer et al.
2007) and dbSNP; no editing candidates corresponded to known SNPs.
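The automated part of the candidate screen (before manual examination and SNP checks) can be sketched as follows. The record layout and names are hypothetical; positions are assumed to be 0-based offsets within the annotated miRNA or miRNA*.

```python
def editing_fraction(mismatch_reads, covering_reads):
    """Reads with an A-to-G mismatch divided by filtered reads covering the site."""
    return mismatch_reads / covering_reads


def is_editing_candidate(pos, length, mismatch_reads, covering_reads,
                         min_fraction=0.05, min_reads=10):
    """Apply the three automated filters to a single position.

    pos: 0-based position within the miRNA or miRNA* (assumed convention).
    length: length of that miRNA or miRNA*.
    """
    if pos >= length - 2:
        # Exclude the last two nucleotides (possible untemplated addition).
        return False
    if mismatch_reads < min_reads:
        return False
    return editing_fraction(mismatch_reads, covering_reads) > min_fraction
```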
Untemplated nucleotide analysis
To examine untemplated nucleotide addition, non-genome-mapping reads were filtered
for those that match miRNA or miRNA* sequences but also include a non-genomic
poly(N) at the 3' end. The untemplated nt addition rate was calculated as the ratio of
reads with the untemplated nt to the sum of the reads with and without the untemplated
nt. After excluding miRNAs that map to multiple loci and any miRNAs or miRNA*s
with a genomic T at the position immediately 3' of the annotated sequence, there were
343 miRNA/miRNA* species with untemplated U on the 5' arm and 318 on the 3' arm.
Similarly, there were 287 species with untemplated A on the 5' arm and 324 on the
3' arm. The background tRNA untemplated U addition rate was calculated similarly. A
two-sided Kolmogorov-Smirnov test was used to assess significant differences in
distributions.
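The addition rate defined above can be sketched as follows. The function names are hypothetical. The miR-143 read counts are those quoted in the Results and are used here only as an illustration; they may pool all extension types.

```python
def untemplated_addition_rate(extended_reads, unextended_reads):
    """Reads with the untemplated nt divided by reads with plus reads without it."""
    total = extended_reads + unextended_reads
    if total == 0:
        raise ValueError("no reads for this species")
    return extended_reads / total


def can_score_u_addition(next_genomic_base):
    """Exclude species with a genomic T immediately 3' of the annotated
    sequence, because an added U would then be indistinguishable from a
    templated nucleotide."""
    return next_genomic_base.upper() != "T"


# miR-143 counts quoted in the Results: extended vs. non-extended reads.
rate = untemplated_addition_rate(196_565, 114_980)
print(f"{rate:.2f}")  # 0.63
```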
Figure legends
Figure 1. Mouse miRNAs and candidates identified by high-throughput sequencing. (A)
Overlap between previously annotated miRNA hairpins (miRBase v. 14.0; green),
miRNA candidates identified in the current study, and the subset of these candidates that
met our criteria for classification as confidently identified canonical miRNAs (red). (B)
Overlap between previously annotated mirtrons and shRNAs and the mirtrons and
shRNAs supported by our study, colored as in A.
Figure 2. Experimental evaluation of annotated miRNAs and previously proposed
candidates. (A) Schematic of the expression vector transfected into HEK293T cells. (B)
Examples of the standard ectopic-expression assay, transfecting plasmids indicated in the
key. Reads from the control transfection (no hairpin plasmid) were from endogenous
expression in HEK293T cells. (C) Assay results for annotated human miRNAs and
published candidates. Bars are colored as in B; asterisks indicate detectable overexpression (>1 read from both the anticipated miRNA and miRNA*, with miRNA and
miRNA* combined expressed more than threefold over endogenous levels). (D) Assay
results for unconfirmed annotated mouse miRNAs and published candidates. Mouse
controls were selected from miRNAs that were sequenced from our mouse samples. Bars
are colored as in B; detectable overexpression is indicated (asterisks). Shown are the
results compiled from two experiments (Supplemental Figs. S3, S4).
Figure 3. Experimental evaluation of novel miRNAs and candidates. (A) Examples of
assays evaluating Drosha- and Dicer-dependence, transfecting plasmids indicated in the
key. (B) Assay results for control miRNAs, novel miRNAs, and miRNA candidates.
Bars are colored as in A; detectable overexpression (black asterisks), overexpression
attempted but not detected (black minus), detectable Drosha-dependence (orange
asterisks), and Drosha-dependence assayed but not detected (orange minus) are all
indicated. Shown are the results compiled from three experiments (Supplemental Figs.
S5-7).
Figure 4. MicroRNA relative expression profiles. Profiles of mature miRNAs were
constructed as described (Ruby et al. 2007b). The relative contribution of each miRNA
from each sample and the sum of the normalized reads of all samples are provided
(Supplemental Table 5).
Figure 5. Reads from both arms of a hairpin, and sequential reads from the same arm.
(A) Fraction and abundance of miRNA reads from each miRNA hairpin. To calculate the
fraction, the miRNA reads were divided by the total number of miRNA and miRNA*
reads, considering on each arm only the major 5' terminus. The dashed lines indicate the
median fraction of miRNA reads and the median number of miRNA reads for conserved
(red) and nonconserved (blue) miRNAs. (B) Switching of the dominant arm in different
samples. For each sample, the fold enrichment of miRNA reads produced from the 5'
arm over those produced from the 3' arm and vice versa was calculated. Shown are
results for non-repetitive miRNAs that switch dominant arms, with at least a fivefold
differential between two samples. The samples are color-coded (key), and an asterisk
indicates samples with statistically significant enrichment of miRNAs produced from one
arm over the other (p < 0.05, Chi-squared test). (C) Sequential Dicer cleavage. Predicted
secondary structure of mmu-mir-3102 pre-miRNA (Hofacker et al. 1994).
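The calculations behind panels A and B can be sketched in Python as follows. This is an illustration under stated assumptions, not the analysis code: the function names are hypothetical, the pseudocount in the fold-enrichment helper is an added convenience, and the chi-squared helper tests the two arm counts against an assumed expected 5'-arm fraction because the legend does not state the null hypothesis that was used.

```python
import math


def mirna_fraction(mirna_reads, star_reads):
    """Fraction of miRNA reads among miRNA + miRNA* reads (panel A)."""
    total = mirna_reads + star_reads
    if total == 0:
        raise ValueError("no reads from either arm")
    return mirna_reads / total


def arm_fold_enrichment(reads_5p, reads_3p, pseudocount=1.0):
    """Return (dominant_arm, fold) for one sample (panel B).

    The pseudocount is an assumption that keeps the ratio finite
    when one arm has no reads.
    """
    a = reads_5p + pseudocount
    b = reads_3p + pseudocount
    return ("5p", a / b) if a >= b else ("3p", b / a)


def arm_chi2(reads_5p, reads_3p, expected_5p_fraction=0.5):
    """Chi-squared goodness-of-fit test (1 degree of freedom).

    Returns (statistic, p_value). The expected fraction is a
    hypothetical null; the source does not specify it.
    """
    total = reads_5p + reads_3p
    if total == 0 or not 0 < expected_5p_fraction < 1:
        raise ValueError("need reads and an expected fraction in (0, 1)")
    exp_5p = total * expected_5p_fraction
    exp_3p = total - exp_5p
    stat = ((reads_5p - exp_5p) ** 2 / exp_5p
            + (reads_3p - exp_3p) ** 2 / exp_3p)
    # Survival function of chi-squared with 1 df: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value
```

A hairpin would count as switching its dominant arm when `arm_fold_enrichment` returns "5p" in one sample and "3p" in another, with the differential between the two samples at or above the fivefold threshold given in the legend.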
Figure 6. MicroRNAs with 5' heterogeneity. (A) The distribution of conserved (red) and
nonconserved (blue) miRNAs with reads ±5 nt offset at their 5' terminus. (B) The
fraction of offset reads and abundance of reads for each miRNA hairpin, colored as in A.
The dashed lines indicate the median level of reads for conserved (red) and nonconserved
(blue) miRNAs. (C) 5' Heterogeneity of miR-133a. Data from mouse heart (Rao et al.
2009) and newborn are mapped to the mmu-mir-133a-1 hairpin (top), and data from the
ectopic-expression assay are mapped to the indicated transfected hairpin. The lines
indicate miR-133a.1 (dark blue) and miR-133a.2 (light blue), and red nucleotides indicate
those that differ between mmu-mir-133a-1 and mmu-mir-133a-2. (D) Effect of losing
miR-223 on messages with 3'UTR sites for miR-223 major and minor isoforms. Small-RNA sequencing data from mouse neutrophils (Baek et al. 2008) were mapped to the
mir-223 hairpin (top) as in C. For each set of messages with the indicated 3'UTR site for
miR-223 (major isoform sites, bottom left; minor isoform sites, bottom right), the fraction
that changed at least to the degree indicated following loss of miR-223 is plotted, using
data published for neutrophils differentiated in vivo (Baek et al. 2008). (E) Effect of
losing miR-155 on messages with 3'UTR sites for miR-155 major and minor isoforms,
plotted as in D using published data from T cells (Rodriguez et al. 2007). Sequencing
data from our study are mapped to the mir-155 hairpin (top) as in C. The mRNAs with
8mer and 7mer-A1 sites for the minor isoform were excluded from the analysis because
these sites overlapped with 7mer-m8 sites for the major isoform.
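The site types named in panels D and E follow from the 5' end of each isoform, which is why a 1-nt shift in the 5' terminus changes the set of target sites. The Python sketch below derives the four site types using the standard definitions (6mer: match to miRNA positions 2-7; 7mer-m8: match to positions 2-8; 7mer-A1: 6mer followed by an A; 8mer: 7mer-m8 followed by an A). The function names and the short example sequence, the 5' region of miR-155, are illustrative assumptions.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna):
    """Reverse complement of an RNA sequence (Watson-Crick pairs only)."""
    return rna.translate(COMPLEMENT)[::-1]


def seed_sites(mirna):
    """Return the four canonical 3'UTR site types for a miRNA isoform.

    mirna: sequence written 5' to 3'; only the first 8 nt are used.
    """
    six = reverse_complement(mirna[1:7])     # pairs to positions 2-7
    seven = reverse_complement(mirna[1:8])   # pairs to positions 2-8
    return {
        "6mer": six,
        "7mer-m8": seven,
        "7mer-A1": six + "A",    # A across from miRNA position 1
        "8mer": seven + "A",
    }


major = seed_sites("UUAAUGCUAAU")       # assumed 5' region of miR-155
minor = seed_sites("UUAAUGCUAAU"[1:])   # isoform starting 1 nt downstream
```

With these inputs the 7mer-A1 site of the minor isoform (AGCAUUA) is identical to the 7mer-m8 site of the major isoform, which illustrates the overlap that led to the exclusion described in the legend.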
Figure 7. Untemplated nucleotide addition. (A) Untemplated nucleotide addition rate for
miRNA and miRNA* reads from the indicated arm. Rates for each miRNA are provided
(Supplemental Table 6). As a control, tRNA degradation fragments were analyzed
similarly. (B) Distribution of rates for untemplated U addition to RNAs from the 5' arm
(blue) and from the 3' arm (red). (C) Distribution of rates for untemplated A addition to
RNAs from the 5' arm (blue) and from the 3' arm (red). (D) Schematic of the biogenesis
stage in which U could be added to the RNA of only one arm (pre-miRNA, left), and the
stage in which U could be added to the RNA of either arm (mature miRNA and miRNA*,
right).
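An untemplated addition rate of the kind reported in panel A can be sketched in Python as the fraction of reads whose 3' end carries a nucleotide that does not match the genomic hairpin sequence. This is a simplified illustration, not the pipeline used in the study: the function names are hypothetical, reads are assumed to be pre-mapped to a known 5' start, and only tails made of a single nucleotide type are counted.

```python
from collections import Counter


def untemplated_tail(read, hairpin, start):
    """Return the 3' part of `read` that stops matching `hairpin`.

    `start` is the 0-based hairpin position where the read's 5' end maps.
    An empty string means the whole read is genome-templated.
    """
    template = hairpin[start:start + len(read)]
    matched = 0
    while matched < len(template) and read[matched] == template[matched]:
        matched += 1
    return read[matched:]


def addition_rates(read_counts, hairpin, start):
    """Rate of untemplated 3' addition for each nucleotide.

    read_counts: {read_sequence: number_of_reads}, all mapping at `start`.
    Only homopolymeric tails are counted (an assumption of this sketch).
    """
    tails = Counter()
    total = 0
    for read, n in read_counts.items():
        total += n
        tail = untemplated_tail(read, hairpin, start)
        if tail and len(set(tail)) == 1:
            tails[tail[0]] += n
    if total == 0:
        return {}
    return {nt: tails[nt] / total for nt in "ACGU"}


hairpin = "GGACUGACUGACCAUGG"   # hypothetical hairpin fragment
reads = {"ACUGACUG": 80, "ACUGACUGU": 15, "ACUGACUGA": 5}
rates = addition_rates(reads, hairpin, start=2)
```

In this toy example the terminal A of the third read matches the next genomic nucleotide, so it is treated as templated and only the U addition contributes to the rates.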
Acknowledgements
We thank N. Lau and A. Chess for embryonic and newborn total RNA, R. Friedman for
calculating branch-length scores for the analysis of conservation, A. Marson and N.
Hannett for technical advice, and V. N. Kim for TNdrosha and TNdicer plasmids.
Supported by a grant from the NIH (GM067031) to D. B.
Accession numbers
All small RNA reads are available at the GEO database with accession number
GSE20384.
References
Azuma-Mukai, A., Oguri, H., Mituyama, T., Qian, Z.R., Asai, K., Siomi, H., and Siomi,
M.C. 2008. Characterization of endogenous human Argonautes and their miRNA
partners in RNA silencing. Proc Natl Acad Sci USA 105(23): 7964-7969.
Babiarz, J.E., Ruby, J.G., Wang, Y.M., Bartel, D.P., and Blelloch, R. 2008. Mouse ES
cells express endogenous shRNAs, siRNAs, and other Microprocessor-independent,
Dicer-dependent small RNAs. Genes & Development 22(20): 2773-2785.
Baek, D., Villen, J., Shin, C., Camargo, F.D., Gygi, S.P., and Bartel, D.P. 2008. The
impact of microRNAs on protein output. Nature 455(7209): 64-71.
Bartel, D.P. 2004. MicroRNAs: Genomics, biogenesis, mechanism, and function. Cell
116(2): 281-297.
Bartel, D.P. 2009. MicroRNAs: Target Recognition and Regulatory Functions. Cell 136(2):
215-233.
Baskerville, S. and Bartel, D.P. 2005. Microarray profiling of microRNAs reveals
frequent coexpression with neighboring miRNAs and host genes. RNA 11(3):
241-247.
Bender, W. 2008. MicroRNAs in the Drosophila bithorax complex. Genes &
Development 22(1): 14-19.
Bentwich, I., Avniel, A., Karov, Y., Aharonov, R., Gilad, S., Barad, O., Barzilai, A.,
Einat, P., Einav, U., Meiri, E., Sharon, E., Spector, Y., and Bentwich, Z. 2005.
Identification of hundreds of conserved and nonconserved human microRNAs.
Nature Genetics 37(7): 766-770.
Berezikov, E., Chung, W.J., Willis, J., Cuppen, E., and Lai, E.C. 2007. Mammalian
mirtron genes. Molecular Cell 28(2): 328-336.
Berezikov, E., Guryev, V., van de Belt, J., Wienholds, E., Plasterk, R.H.A., and Cuppen,
E. 2005. Phylogenetic shadowing and computational identification of human
microRNA genes. Cell 120(1): 21-24.
Berezikov, E., Thuemmler, F., van Laake, L.W., Kondova, I., Bontrop, R., Cuppen, E.,
and Plasterk, R.H.A. 2006a. Diversity of microRNAs in human and chimpanzee
brain. Nature Genet 38(12): 1375-1377.
Berezikov, E., van Tetering, G., Verheul, M., van de Belt, J., van Laake, L., Vos, J.,
Verloop, R., van de Wetering, M., Guryev, V., Takada, S., van Zonneveld, A.J.,
Mano, H., Plasterk, R., and Cuppen, E. 2006b. Many novel mammalian
microRNA candidates identified by extensive cloning and RAKE analysis.
Genome Research 16(10): 1289-1298.
Blow, M.J., Grocock, R.J., van Dongen, S., Enright, A.J., Dicks, E., Futreal, P.A.,
Wooster, R., and Stratton, M.R. 2006. RNA editing of human microRNAs.
Genome Biol 7(4): R27.
Calabrese, J.M., Seila, A.C., Yeo, G.W., and Sharp, P.A. 2007. RNA sequence analysis
defines Dicer's role in mouse embryonic stem cells. Proceedings of the National
Academy of Sciences of the United States of America 104(46): 18097-18102.
Chen, C.-Z., Li, L., Lodish, H.F., and Bartel, D.P. 2004. MicroRNAs Modulate
Hematopoietic Lineage Differentiation. Science 303(5654): 83-86.
Christodoulou, F., Raible, F., Tomer, R., Simakov, O., Trachana, K., Klaus, S., Snyman,
H., Hannon, G.J., Bork, P., and Arendt, D. 2010. Ancient animal microRNAs and
the evolution of tissue identity. Nature 463(7284): 1084-1088.
Cummins, J.M., He, Y.P., Leary, R.J., Pagliarini, R., Diaz, L.A., Sjoblom, T., Barad, O.,
Bentwich, Z., Szafranska, A.E., Labourier, E., Raymond, C.K., Roberts, B.S.,
Juhl, H., Kinzler, K.W., Vogelstein, B., and Velculescu, V.E. 2006. The
colorectal microRNAome. Proceedings of the National Academy of Sciences of
the United States of America 103(10): 3687-3692.
Frazer, K.A., Eskin, E., Kang, H.M., Bogue, M.A., Hinds, D.A., Beilharz, E.J., Gupta,
R.V., Montgomery, J., Morenzoni, M.M., Nilsen, G.B., Pethiyagoda, C.L., Stuve,
L.L., Johnson, F.M., Daly, M.J., Wade, C.M., and Cox, D.R. 2007. A sequence-based
variation map of 8.27 million SNPs in inbred mouse strains. Nature
448(7157): 1050-1053.
Friedman, R.C., Farh, K.K.H., Burge, C.B., and Bartel, D.P. 2009. Most mammalian
mRNAs are conserved targets of microRNAs. Genome Res 19(1): 92-105.
Griffiths-Jones, S. 2004. The microRNA Registry. Nucleic Acids Research 32: D109-D111.
Grimm, D., Streetz, K.L., Jopling, C.L., Storm, T.A., Pandey, K., Davis, C.R., Marion,
P., Salazar, F., and Kay, M.A. 2006. Fatality in mice due to oversaturation of
cellular microRNA/short hairpin RNA pathways. Nature 441(7092): 537-541.
Grimson, A., Srivastava, M., Fahey, B., Woodcroft, B.J., Chiang, H.R., King, N.,
Degnan, B.M., Rokhsar, D.S., and Bartel, D.P. 2008. Early origins and evolution
of microRNAs and Piwi-interacting RNAs in animals. Nature 455(7217): 1193-1197.
Hagan, J.P., Piskounova, E., and Gregory, R.I. 2009. Lin28 recruits the TUTase Zcchc11
to inhibit let-7 maturation in mouse embryonic stem cells. Nat Struct Mol Biol
16(10): 1021-1025.
Han, J., Pedersen, J.S., Kwon, S.C., Belair, C.D., Kim, Y.-K., Yeom, K.-H., Yang, W.-Y.,
Haussler, D., Blelloch, R., and Kim, V.N. 2009. Posttranscriptional
Crossregulation between Drosha and DGCR8. Cell 136(1): 75-84.
Han, J.J., Lee, Y., Yeom, K.H., Nam, J.W., Heo, I., Rhee, J.K., Sohn, S.Y., Cho, Y.J.,
Zhang, B.T., and Kim, V.N. 2006. Molecular basis for the recognition of primary
microRNAs by the Drosha-DGCR8 complex. Cell 125(5): 887-901.
Heo, I., Joo, C., Cho, J., Ha, M., Han, J.J., and Kim, V.N. 2008. Lin28 Mediates the
Terminal Uridylation of let-7 Precursor MicroRNA. Molecular Cell 32(2): 276-284.
Heo, I., Joo, C., Kim, Y.-K., Ha, M., Yoon, M.-J., Cho, J., Yeom, K.-H., Han, J., and
Kim, V.N. 2009. TUT4 in Concert with Lin28 Suppresses MicroRNA Biogenesis
through Pre-MicroRNA Uridylation. Cell 138(4): 696-708.
Hofacker, I.L., Fontana, W., Stadler, P.F., Bonhoeffer, L.S., Tacker, M., and Schuster, P.
1994. Fast folding and comparison of RNA secondary
structures. Monatshefte für Chemie 125(2): 167-188.
Houbaviy, H.B., Murray, M.F., and Sharp, P.A. 2003. Embryonic stem cell-specific
MicroRNAs. Developmental Cell 5(2): 351-358.
Hu, H., Yan, Z., Xu, Y., Hu, H., Menzel, C., Zhou, Y., Chen, W., and Khaitovich, P.
2009. Sequence features associated with microRNA strand selection in humans
and flies. BMC Genomics 10(1): 413.
Kawahara, Y., Megraw, M., Kreider, E., Iizasa, H., Valente, L., Hatzigeorgiou, A.G., and
Nishikura, K. 2008. Frequency and fate of microRNA editing in human brain.
Nucleic Acids Res 36(16): 5270-5280.
Kawahara, Y., Zinshteyn, B., Chendrimada, T.P., Shiekhattar, R., and Nishikura, K.
2007a. RNA editing of the microRNA-151 precursor blocks cleavage by the
Dicer-TRBP complex. EMBO Rep 8(8): 763-769.
Kawahara, Y., Zinshteyn, B., Sethupathy, P., Iizasa, H., Hatzigeorgiou, A.G., and
Nishikura, K. 2007b. Redirection of silencing targets by adenosine-to-inosine
editing of miRNAs. Science 315(5815): 1137-1140.
Kim, Y.-K. and Kim, V.N. 2007. Processing of intronic microRNAs. EMBO J 26(3):
775-783.
Kuchenbauer, F., Morin, R.D., Argiropoulos, B., Petriv, O.I., Griffith, M., Heuser, M.,
Yung, E., Piper, J., Delaney, A., Prabhu, A.L., Zhao, Y.J., McDonald, H., Zeng,
T., Hirst, M., Hansen, C.L., Marra, M.A., and Humphries, R.K. 2008. In-depth
characterization of the microRNA transcriptome in a leukemia progression model.
Genome Res 18(11): 1787-1797.
Kuhn, R.M., Karolchik, D., Zweig, A.S., Wang, T., Smith, K.E., Rosenbloom, K.R.,
Rhead, B., Raney, B.J., Pohl, A., Pheasant, M., Meyer, L., Hsu, F., Hinrichs, A.S.,
Harte, R.A., Giardine, B., Fujita, P., Diekhans, M., Dreszer, T., Clawson, H.,
Barber, G.P., Haussler, D., and Kent, W.J. 2009. The UCSC Genome Browser
Database: update 2009. Nucl Acids Res 37(suppl 1): D755-761.
Kurihara, Y. and Watanabe, Y. 2004. Arabidopsis micro-RNA biogenesis through Dicer-like 1
protein functions. Proceedings of the National Academy of Sciences of the
United States of America 101(34): 12753-12758.
Lagos-Quintana, M., Rauhut, R., Lendeckel, W., and Tuschl, T. 2001. Identification of
novel genes coding for small expressed RNAs. Science 294(5543): 853-858.
Lagos-Quintana, M., Rauhut, R., Meyer, J., Borkhardt, A., and Tuschl, T. 2003. New
microRNAs from mouse and human. RNA-a Publication of the RNA Society 9(2):
175-179.
Lagos-Quintana, M., Rauhut, R., Yalcin, A., Meyer, J., Lendeckel, W., and Tuschl, T.
2002. Identification of tissue-specific microRNAs from mouse. Current Biology
12(9): 735-739.
Landgraf, P., Rusu, M., Sheridan, R., Sewer, A., Iovino, N., Aravin, A., Pfeffer, S., Rice,
A., Kamphorst, A.O., Landthaler, M., Lin, C., Socci, N.D., Hermida, L., Fulci, V.,
Chiaretti, S., Foa, R., Schliwka, J., Fuchs, U., Novosel, A., Muller, R.U.,
Schermer, B., Bissels, U., Inman, J., Phan, Q., Chien, M.C., Weir, D.B., Choksi,
R., De Vita, G., Frezzetti, D., Trompeter, H.I., Hornung, V., Teng, G., Hartmann,
G., Palkovits, M., Di Lauro, R., Wernet, P., Macino, G., Rogler, C.E., Nagle,
J.W., Ju, J.Y., Papavasiliou, F.N., Benzing, T., Lichter, P., Tam, W., Brownstein,
M.J., Bosio, A., Borkhardt, A., Russo, J.J., Sander, C., Zavolan, M., and Tuschl,
T. 2007. A mammalian microRNA expression atlas based on small RNA library
sequencing. Cell 129(7): 1401-1414.
Langmead, B., Trapnell, C., Pop, M., and Salzberg, S. 2009. Ultrafast and memory-efficient
alignment of short DNA sequences to the human genome. Genome
Biology 10(3): R25.
Lau, N.C., Lim, L.P., Weinstein, E.G., and Bartel, D.P. 2001. An abundant class of tiny
RNAs with probable regulatory roles in Caenorhabditis elegans. Science
294(5543): 858-862.
Lee, R.C. and Ambros, V. 2001. An Extensive Class of Small RNAs in Caenorhabditis
elegans. Science 294(5543): 862-864.
Lee, Y., Ahn, C., Han, J.J., Choi, H., Kim, J., Yim, J., Lee, J., Provost, P., Radmark, O.,
Kim, S., and Kim, V.N. 2003. The nuclear RNase III Drosha initiates microRNA
processing. Nature 425(6956): 415-419.
Lim, L.P., Glasner, M.E., Yekta, S., Burge, C.B., and Bartel, D.P. 2003. Vertebrate
MicroRNA genes. Science 299(5612): 1540-1540.
Lu, C., Tej, S.S., Luo, S., Haudenschild, C.D., Meyers, B.C., and Green, P.J. 2005.
Elucidation of the Small RNA Component of the Transcriptome. Science
309(5740): 1567-1569.
Lund, E., Guttinger, S., Calado, A., Dahlberg, J.E., and Kutay, U. 2004. Nuclear export
of microRNA precursors. Science 303(5654): 95-98.
Marson, A., Levine, S.S., Cole, M.F., Frampton, G.M., Brambrink, T., Johnstone, S.,
Guenther, M.G., Johnston, W.K., Wernig, M., Newman, J., Calabrese, J.M.,
Dennis, L.M., Volkert, T.L., Gupta, S., Love, J., Hannett, N., Sharp, P.A., Bartel,
D.P., Jaenisch, R., and Young, R.A. 2008. Connecting microRNA Genes to the
Core Transcriptional Regulatory Circuitry of Embryonic Stem Cells. Cell 134(3):
521-533.
Mathews, D.H., Sabina, J., Zuker, M., and Turner, D.H. 1999. Expanded sequence
dependence of thermodynamic parameters improves prediction of RNA secondary
structure. Journal of Molecular Biology 288(5): 911-940.
Mineno, J., Okamoto, S., Ando, T., Sato, M., Chono, H., Izu, H., Takayama, M., Asada,
K., Mirochnitchenko, O., Inouye, M., and Kato, I. 2006. The expression profile of
microRNAs in mouse embryos. Nucleic Acids Research 34(6): 1765-1771.
Okamura, K., Hagen, J.W., Duan, H., Tyler, D.M., and Lai, E.C. 2007. The mirtron
pathway generates microRNA-class regulatory RNAs in Drosophila. Cell 130(1):
89-100.
Pruitt, K.D., Tatusova, T., and Maglott, D.R. 2005. NCBI Reference Sequence (RefSeq):
a curated non-redundant sequence database of genomes, transcripts and proteins.
Nucl Acids Res 33(suppl_1): D501-504.
Rajagopalan, R., Vaucheret, H., Trejo, J., and Bartel, D.P. 2006. A diverse and
evolutionarily fluid set of microRNAs in Arabidopsis thaliana. Genes &
Development 20: 3407-3425.
Rao, P.K., Toyama, Y., Chiang, H.R., Gupta, S., Bauer, M., Medvid, R., Reinhardt, F.,
Liao, R., Krieger, M., Jaenisch, R., Lodish, H.F., and Blelloch, R. 2009. Loss of
Cardiac microRNA-Mediated Regulation Leads to Dilated Cardiomyopathy and
Heart Failure. Circ Res 105(6): 585-594.
Ro, S., Park, C., Young, D., Sanders, K.M., and Yan, W. 2007. Tissue-dependent paired
expression of miRNAs. Nucleic Acids Res 35(17): 5944-5953.
Rodriguez, A., Vigorito, E., Clare, S., Warren, M.V., Couttet, P., Soond, D.R., van
Dongen, S., Grocock, R.J., Das, P.P., Miska, E.A., Vetrie, D., Okkenhaug, K.,
Enright, A.J., Dougan, G., Turner, M., and Bradley, A. 2007. Requirement of
bic/microRNA-155 for Normal Immune Function. Science 316(5824): 608-611.
Ruby, J.G., Jan, C., Player, C., Axtell, M.J., Lee, W., Nusbaum, C., Ge, H., and Bartel,
D.P. 2006. Large-scale sequencing reveals 21U-RNAs and additional microRNAs
and endogenous siRNAs in C. elegans. Cell 127(6): 1193-1207.
Ruby, J.G., Jan, C.H., and Bartel, D.P. 2007a. Intronic microRNA precursors that bypass
Drosha processing. Nature 448(7149): 83-86.
Ruby, J.G., Stark, A., Johnston, W.K., Kellis, M., Bartel, D.P., and Lai, E.C. 2007b.
Evolution, biogenesis, expression, and target predictions of a substantially
expanded set of Drosophila microRNAs. Genome Research 17(12): 1850-1864.
Seo, T.S., Bai, X.P., Ruparel, H., Li, Z.M., Turro, N.J., and Ju, J.Y. 2004. Photocleavable
fluorescent nucleotides for DNA sequencing on a chip constructed by site-specific
coupling chemistry. Proceedings of the National Academy of Sciences of the
United States of America 101(15): 5488-5493.
Stark, A., Bushati, N., Jan, C.H., Kheradpour, P., Hodges, E., Brennecke, J., Bartel, D.P.,
Cohen, S.M., and Kellis, M. 2008. A single Hox locus in Drosophila produces
functional microRNAs from opposite DNA strands. Genes & Development 22(1):
8-13.
Stark, A., Kheradpour, P., Parts, L., Brennecke, J., Hodges, E., Hannon, G.J., and Kellis,
M. 2007. Systematic discovery and characterization of fly microRNAs using 12
Drosophila genomes. Genome Res 17(12): 1865-1879.
Tam, O.H., Aravin, A.A., Stein, P., Girard, A., Murchison, E.P., Cheloufi, S., Hodges, E.,
Anger, M., Sachidanandam, R., Schultz, R.M., and Hannon, G.J. 2008.
Pseudogene-derived small interfering RNAs regulate gene expression in mouse
oocytes. Nature 453(7194): 534-538.
Tyler, D.M., Okamura, K., Chung, W.-J., Hagen, J.W., Berezikov, E., Hannon, G.J., and
Lai, E.C. 2008. Functionally distinct regulatory RNAs generated by bidirectional
transcription and processing of microRNA loci. Genes & Development 22(1): 26-36.
Voorhoeve, P.M., le Sage, C., Schrier, M., Gillis, A.J.M., Stoop, H., Nagel, R., Liu, Y.-P.,
van Duijse, J., Drost, J., Griekspoor, A., Zlotorynski, E., Yabuta, N., De Vita,
G., Nojima, H., Looijenga, L.H.J., and Agami, R. 2006. A Genetic Screen
Implicates miRNA-372 and miRNA-373 As Oncogenes in Testicular Germ Cell
Tumors. Cell 124(6): 1169-1181.
Watanabe, T., Totoki, Y., Toyoda, A., Kaneda, M., Kuramochi-Miyagawa, S., Obata, Y.,
Chiba, H., Kohara, Y., Kono, T., Nakano, T., Surani, M.A., Sakaki, Y., and
Sasaki, H. 2008. Endogenous siRNAs from naturally formed dsRNAs regulate
transcripts in mouse oocytes. Nature 453(7194): 539-543.
Wienholds, E., Kloosterman, W.P., Miska, E., Alvarez-Saavedra, E., Berezikov, E., de
Bruijn, E., Horvitz, H.R., Kauppinen, S., and Plasterk, R.H.A. 2005. MicroRNA
Expression in Zebrafish Embryonic Development. Science 309(5732): 310-311.
Wu, H., Ye, C., Ramirez, D., and Manjunath, N. 2009. Alternative Processing of Primary
microRNA Transcripts by Drosha Generates 5' End Variation of Mature
microRNA. PLoS ONE 4(10): e7566.
Xie, X.H., Lu, J., Kulbokas, E.J., Golub, T.R., Mootha, V., Lindblad-Toh, K., Lander,
E.S., and Kellis, M. 2005. Systematic discovery of regulatory motifs in human
promoters and 3' UTRs by comparison of several mammals. Nature 434(7031):
338-345.
Yang, W., Chendrimada, T.P., Wang, Q., Higuchi, M., Seeburg, P.H., Shiekhattar, R.,
and Nishikura, K. 2006. Modulation of microRNA processing and expression
through RNA editing by ADAR deaminases. Nat Struct Mol Biol 13(1): 13-21.
Yi, R., Qin, Y., Macara, I.G., and Cullen, B.R. 2003. Exportin-5 mediates the nuclear
export of pre-microRNAs and short hairpin RNAs. Genes & Development 17(24):
3011-3016.
Zeng, Y., Yi, R., and Cullen, B.R. 2005. Recognition and cleavage of primary microRNA
precursors by the nuclear processing enzyme Drosha. EMBO J 24(1): 138-148.
Zuker, M. 2003. Mfold web server for nucleic acid folding and hybridization prediction.
Nucl Acids Res 31(13): 3406-3415.
Figure 1
[Image not recoverable from extraction. Surviving labels: miRNA candidates (706); confidently identified canonical miRNAs (465); miRBase 14.0; mirtrons (14); shRNAs (16).]
Figure 2
[Image not recoverable from extraction. Panel A labels: genomic DNA; CMV promoter; poly(A) signal; candidate hairpin; Pol II expression vector. Panels B-D: bar charts of reads (log scale) for each transfected hairpin.]
[Figures 3 and 4: images not recoverable from extraction.]
Figure 5
[Image not recoverable from extraction. Surviving labels follow.
(A) Axis: miRNA reads (log scale). Labeled hairpins in the scatter plots include mir-126, mir-296, mir-219-2, mir-376b, mir-409, mir-140, mir-199a, and mir-142.
(B) Axis: fold enrichment, shown separately for the 3' arm and the 5' arm. Samples: ES, e7.5, e9.5, e12.5, newborn, brain, testes, ovary. Hairpins: mir-142, mir-154, mir-181c, mir-214, mir-292, mir-296, mir-337, mir-350, mir-384, mir-455, mir-485, mir-493, mir-505, mir-539, mir-540, mir-544, mir-664, mir-673, mir-674, mir-700, mir-1193.
(C) mmu-mir-3102 hairpin, annotated with miR-3102.2-5p (35 reads), miR-3102.1* (1 read), miR-3102.1 (820 reads), and miR-3102.2-3p (30 reads).]
Figure 6
[Image not recoverable from extraction. Surviving labels follow.
(A) Key: conserved; nonconserved.
(B) Axes: fraction of offset reads (%); miRNA reads (log scale). Labeled miRNAs include miR-223, miR-101a, and miR-101b.
(C) Panels labeled mir-133a-1 transfection and mir-133a-2 transfection.
(D) 81% miR-223 major isoform, 12% minor isoform. Major isoform seed GUCAGUU: 8mer (AACUGACA), 7mer-m8 (AACUGAC), 7mer-A1 (ACUGACA), 6mer (ACUGAC), no site. Minor isoform seed UCAGUUU: 8mer (AAACUGAA), 7mer-m8 (AAACUGA), 7mer-A1 (AACUGAA), 6mer (AACUGA), no site. x-axis: fold change (log2).
(E) 94% miR-155 major isoform, 1% minor isoform. Major isoform seed UAAUGCU: 8mer (AGCAUUAA), 7mer-m8 (AGCAUUA), 7mer-A1 (GCAUUAA), 6mer (GCAUUA), no site. Minor isoform seed AAUGCUA: 8mer (UAGCAUUA), 7mer-m8 (UAGCAUU), 7mer-A1 (AGCAUUA), 6mer (AGCAUU), no site. x-axis: fold change (log2).]
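Figure 7
[Image not recoverable from extraction. Surviving labels follow.
(A) Table of untemplated addition rates, given as rate ± error (n), with rows A, C, G, T and columns 5' arm, 3' arm, tRNA. Values in extraction order (row and column assignment not recoverable): 7.3% ± 1.2 (343); 0.2% ± 0.1 (348); 4.5% ± 1.1 (318); 0.2% ± 0.1 (318); 0.9% ± 0.4 (186); 0.3% ± 0.1 (186); 0.2% ± 0.0 (288); 6.5% ± 1.4 (287); 0.2% ± 0.1 (336); 19.9% ± 5.5 (324); 0.5% ± 0.1 (186); 2.8% ± 0.6 (186).
(B, C) Distribution plots for the 5' arm and 3' arm; x-axes: untemplated U addition rate and untemplated A addition rate. One panel is annotated p = 0.30 (KS test).
(D) Schematic with 5' and 3' + (U)n labels.]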
Table 1. Properties of Canonical miRNAs
                              Total   Conserved   Nonconserved
Hairpins                        475         295            180
in clusters                     291         163            128
in small clusters               153         129             24
in large clusters               138          34            104
not in clusters                 184         132             52
in introns (same strand)        180          77            103
opposite introns                 22          18              4
not in introns                  273         200             73
with miRNA from 5' arm          202         137             65
with miRNA from 3' arm          141         102             39
with miRNAs from both arms      132          56             76
Table 2. Inferred A-to-I Editing Sites in miRNAs
miRNA           Position   Fraction edited
miR-219-2-3p          15             0.064
miR-337-3p            10             0.062
miR-376a*              4             0.297
miR-376b-3p            6             0.501
miR-376c               6             0.311
miR-378               16             0.087
miR-379*               5             0.095
miR-381                4             0.125
miR-411-5p             5             0.239
miR-421               14             0.054
miR-467d               3             0.094
miR-497                2             0.104
miR-497*              20             0.699
miR-540*               3             0.080
miR-1251               6             0.431
miR-3099               7             0.209
Supplemental Figure 1
[Tree diagram; image not recoverable from extraction. Surviving categories, with counts as printed (total, conserved):
Undetermined annotated miRNAs (157)
  Not sequenced (52)
  Not enough reads (72)
  Failed other filters (33)
Total candidates (736)
  Annotated miRNAs (407)
    Confirmed miRNAs (387)
      DGCR8- & Dicer-dependent (290, 226)
      DGCR8-dependent (2, 2)
      Dicer-dependent (7, 3)
      Not strongly dependent (3, 3)
      Cannot determine (85, 49)
    miRNA candidates not confidently confirmed (20)
  New candidates (329)
    Group whose label was lost in extraction (108 by subtraction)
      DGCR8- & Dicer-dependent (37, 0)
      Dicer-dependent (1, 0)
      Not strongly dependent (1, 0)
      Cannot determine (69, 15)
    miRNA candidates (221)
      DGCR8- & Dicer-dependent (45, 0)
      Dicer-dependent (5, 0)
      Not strongly dependent (42, 8)
      Cannot determine (129, 9)]
Supplemental Figure 1. Mouse miRNA candidates identified by Illumina sequencing. MicroRNAs that are
annotated in miRBase v.14.0 are boxed in green. The miRNA hairpin loci were further categorized by DGCR8- and
Dicer-dependency using sequencing data from wild-type and mutant ES cells (Babiarz et al. 2008). The number in
parentheses is the total number of loci in the category. If followed by another number, the second number is the
number of conserved loci. A candidate was considered DGCR8- and Dicer-dependent using criteria of a previous study
(Babiarz et al. 2008), except that predicted hairpin loci replaced the 100-nt windows, with the read cutoffs scaled to
the hairpin lengths.
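The scaling of read cutoffs to hairpin length can be illustrated with a one-line Python helper. This sketch assumes simple linear scaling relative to the 100-nt window; the function name is hypothetical, and the cutoff value in the usage note is a placeholder, since the actual thresholds are those of Babiarz et al. (2008) and are not reproduced here.

```python
def scaled_read_cutoff(window_cutoff, hairpin_length, window_length=100):
    """Scale a read cutoff defined for a fixed window to a hairpin.

    Assumes linear scaling with length (an assumption of this sketch).
    """
    if hairpin_length <= 0 or window_length <= 0:
        raise ValueError("lengths must be positive")
    return window_cutoff * hairpin_length / window_length
```

For example, a hypothetical cutoff of 10 reads per 100-nt window would become 8 reads for an 80-nt hairpin.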
Supplemental Figure 3
[Bar chart; image not recoverable from extraction. Axis: Reads (log scale). Key: no hairpin plasmid; hairpin plasmid. Group labels: Human miRNA controls; Lim 2003; Berezikov 2005; Mouse miRNA controls; Not sequenced; Not enough reads; Berezikov 2006b; Xie 2005.
Hairpins assayed: hsa-mir-124-1, hsa-mir-125a, hsa-mir-128-1, hsa-mir-142, hsa-mir-150, hsa-mir-192, hsa-mir-193b, hsa-mir-205, hsa-mir-214, hsa-mir-455, hsa-mir-483, hsa-mir-499, hsa-mir-888, hsa-mir-9-1, hsa-mir-220a, cand141, cand142, cand181, cand316, mmu-mir-122, mmu-mir-133a-1, mmu-mir-137, mmu-mir-138-1, mmu-mir-139, mmu-mir-153, mmu-mir-208a, mmu-mir-216a, mmu-mir-217, mmu-mir-223, mmu-mir-224, mmu-mir-375, mmu-mir-105, mmu-mir-207, mmu-mir-220, mmu-mir-327, mmu-mir-343, mmu-mir-453, mmu-mir-568, mmu-mir-654, mmu-mir-678, mmu-mir-680-3, mmu-mir-687, mmu-mir-697, mmu-mir-698, mmu-mir-717, mmu-mir-719, mmu-mir-761, mmu-mir-882, mmu-mir-682, mmu-mir-690, mmu-mir-707, mmu-mir-763, mmu-mirc-niob-MM 28, mmu-mirc-niob-MM 57, mmu-mirc-niob-MM 76, mmu-mirc-niob-MM 155, mmu-mirc-niob-MM 185, mmu-mirc-niob-MM 227, mmu-mirc-niob-MM 290, mmu-mirc-niob-MM 298, MIR90, MIR103, MIR146, MIR165, MIR170, MIR174, MIR181, MIR192, MIR213, MIR223, MIR237.]
Supplemental Figure S3. Ectopic-expression assay evaluating unconfirmed annotated miRNAs and predicted
miRNAs. Either GFP (red) or miRNA hairpins (blue) were expressed in HEK293T. Asterisk indicates positive results.
Results of this experiment were compiled with those of Supplemental Figure S4 to produce Figure 2C and D.
Supplemental Figure 4
[Bar chart; image not recoverable from extraction. Axis: Reads (log scale). Key: no hairpin plasmid; hairpin plasmid. Group labels: Human miRNA control; Mouse miRNA controls; Not enough reads; No miRNA*; Incorrect miRNA*.
Hairpins assayed: hsa-mir-193b, mmu-mir-122, mmu-mir-133a-1, mmu-mir-137, mmu-mir-138-1, mmu-mir-139, mmu-mir-153, mmu-mir-208a, mmu-mir-216a, mmu-mir-217, mmu-mir-223, mmu-mir-224, mmu-mir-375, mmu-mir-599, mmu-mir-669i, mmu-mir-684-1, mmu-mir-684-2, mmu-mir-685, mmu-mir-688, mmu-mir-690, mmu-mir-693, mmu-mir-704, mmu-mir-705, mmu-mir-707, mmu-mir-763, mmu-mir-1187, mmu-mir-1192, mmu-mir-1894, mmu-mir-1903, mmu-mir-1904, mmu-mir-1907, mmu-mir-1927, mmu-mir-1929, mmu-mir-1936, mmu-mir-1937c, mmu-mir-1940, mmu-mir-1959, mmu-mir-1960, mmu-mir-1966, mmu-mir-1970, mmu-mir-184, mmu-mir-297a-6, mmu-mir-466f-4, mmu-mir-489, mmu-mir-1191, mmu-mir-1953, mmu-mir-1969, mmu-mir-449c, mmu-mir-677, mmu-mir-1944.]
Supplemental Figure S4. Ectopic-expression assay evaluating unconfirmed annotated miRNAs. Either GFP (red) or
miRNA hairpins (blue) were expressed in HEK293T. Asterisk indicates positive results. Results of this experiment were
compiled with those of Supplemental Figure S3 to produce Figure 2C and D.
Supplemental Figure 5
[Bar chart; image not recoverable from extraction. Axis: Reads (log scale). Key: hairpin plasmid; no hairpin plasmid. Group labels: Human miRNA controls; Lim 2003; Berezikov 2005; Mouse miRNA controls; Novel miRNAs; Novel shRNAs; DGCR8- & DCR-dependent candidates; Other candidate.
Hairpins assayed: hsa-mir-124-1, hsa-mir-125a, hsa-mir-128-1, hsa-mir-142, hsa-mir-150, hsa-mir-192, hsa-mir-193b, hsa-mir-205, hsa-mir-214, hsa-mir-455, hsa-mir-483, hsa-mir-499, hsa-mir-888, hsa-mir-9-1, hsa-mir-220a, cand141, cand142, cand181, cand316, mmu-mir-122, mmu-mir-133a-1, mmu-mir-137, mmu-mir-138-1, mmu-mir-139, mmu-mir-153, mmu-mir-208a, mmu-mir-216a, mmu-mir-217, mmu-mir-223, mmu-mir-224, mmu-mir-375, mmu-mir-1941, mmu-mir-1964, mmu-mir-1968, mmu-mir-1912, mmu-mir-3061, mmu-mir-3072, mmu-mir-3073, mmu-mir-3075, mmu-mir-3081, mmu-mir-3089, mmu-mir-3090, mmu-mir-3093, mmu-mir-3095, mmu-mir-3108, mmu-mir-3109, mmu-mir-3110, mmu-mir-344f, mmu-mir-3104, noStar-014, noStar-033, noStar-043, noStar-073, noStar-080, noStar-087, noStar-117, noStar-135, noStar-150, noStar-154, noStar-166, wrongStar-016, noStar-149.]
Supplemental Figure S5. Ectopic-expression assay evaluating predicted miRNAs, novel miRNAs, and miRNA candidates.
Either GFP (red) or miRNA hairpins (blue) were expressed in HEK293T. Asterisk indicates positive results. Results of
this experiment were compiled with those of Supplemental Figures S6 and S7 to produce Figure 3B.
Supplemental Figure 6
[Bar chart; image not recoverable from extraction. Axis: Reads (log scale). Key: no hairpin plasmid; hairpin plasmid. Group labels: Human miRNA controls; Lim 2003; Berezikov 2005; Mouse miRNA controls; Noncanonical controls; Early miRBase; Novel miRNA; Rare novel miRNAs; Novel shRNAs; Conserved candidates; Other candidates.
Hairpins assayed: hsa-mir-124-1, hsa-mir-125a, hsa-mir-128-1, hsa-mir-142, hsa-mir-150, hsa-mir-192, hsa-mir-193b, hsa-mir-205, hsa-mir-214, hsa-mir-455, hsa-mir-483, hsa-mir-499, hsa-mir-888, hsa-mir-9-1, hsa-mir-220a, cand141, cand142, cand181, cand316, mmu-mir-122, mmu-mir-133a-1, mmu-mir-137, mmu-mir-138-1, mmu-mir-139, mmu-mir-153, mmu-mir-208a, mmu-mir-216a, mmu-mir-217, mmu-mir-223, mmu-mir-224, mmu-mir-375, mmu-mir-1188, mmu-mir-1197, mmu-mir-1933, mmu-mir-1947, mmu-mir-1224, mmu-mir-1839, mmu-mir-509, mmu-mir-3059, mmu-mir-3063, mmu-mir-3065, mmu-mir-3067, mmu-mir-3079, mmu-mir-3086, mmu-mir-3091, mmu-mir-3100, mmu-mir-3112, mmu-mir-344e, mmu-mir-3111, noStar-046, noStar-148, wrongStar-017, noStar-020, noStar-034, noStar-054, noStar-056, noStar-068, noStar-093, noStar-122, noStar-126, noStar-160, wrongStar-002, wrongStar-007, wrongStar-009.]
Supplemental Figure S6. Ectopic-expression assay evaluating novel miRNAs, miRNA candidates, predicted miRNAs,
and an unconfirmed annotated miRNA (mmu-mir-509). Either GFP (red) or miRNA hairpins (blue) were expressed in
HEK293T. Asterisk indicates positive results. Results of this experiment were compiled with those of Supplemental Figures
S5 and S7 to produce Figure 3B.
118
[Supplemental Figure 7 not reproduced: bar chart of reads (log scale, <=1 to 10,000) per hairpin. Six conditions: hairpin plasmid or no hairpin plasmid, each with no TNdrosha/TNdicer plasmid, with TNdrosha plasmid, or with TNdicer plasmid. Hairpin groups: human miRNA control; Lim 2003; mouse miRNA controls; noncanonical controls; early miRBase; novel miRNAs; novel rare miRNAs; novel shRNAs; candidates.]
Supplemental Figure S7. Drosha/Dicer-dependent biogenesis of novel miRNAs. The selected hairpins were transfected
into HEK293T with a control vector (blue), TNdrosha (red), or TNdicer (green). Similar transfections using the control vector
instead of the hairpins are shown in light blue, orange, and light green, respectively. Results of this experiment were compiled
with those of Supplemental Figures S5 and S6 to produce Figure 3B.
[Supplemental Figure 8 not reproduced: scatter plot of expression correlation (vertical axis, -1 to 0.2) against genomic distance (horizontal axis, log scale, 10 to 10^9 nt) for clustered and not-clustered miRNA pairs.]
Supplemental Figure S8. Correlation of expression and genomic distance. The correlation of expression with clustering
was calculated as previously (Baskerville and Bartel 2005), except that miRNAs that mapped to the same pre-mRNA
transcript were considered clustered regardless of genomic distance. The clustered miRNAs (red) were more correlated than
non-clustered miRNAs (blue). Some miRNA pairs more than 50,000 nt apart were categorized as clustered with each other
due to joint proximity to intervening miRNAs, and their correlated expression supported this clustering method. Other
miRNAs that are within 50,000 nt of each other were not considered clustered because one mapped within a pre-mRNA,
whereas the other one did not; none of these three pairs of miRNAs was correlated in expression. Correlated expression
observed for many miRNAs located ~130,000 nt apart was likely due to co-expression of two megaclusters on chr12.
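The distance-based part of the clustering rule in this caption can be sketched as single-linkage grouping along a chromosome. This is a minimal illustration only: the function name `cluster_by_distance` and the coordinates are hypothetical, and the additional rule based on shared pre-mRNA transcripts is not modeled.

```python
def cluster_by_distance(positions, max_gap=50_000):
    """Group coordinates so that neighbors within max_gap share a cluster."""
    clusters = []
    for pos in sorted(positions):
        # Extend the current cluster if this locus is close to the previous one.
        if clusters and pos - clusters[-1][-1] <= max_gap:
            clusters[-1].append(pos)
        else:
            clusters.append([pos])
    return clusters


# Hypothetical miRNA coordinates (nt) on a single chromosome.
loci = [100_000, 140_000, 185_000, 400_000]
clusters = cluster_by_distance(loci)
# The loci at 100,000 and 185,000 are 85,000 nt apart but end up in the
# same cluster because of the intervening locus at 140,000.
```

Under this scheme two miRNAs can be farther apart than the threshold and still be grouped, which mirrors the "joint proximity to intervening miRNAs" case described above.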
[Supplemental Figure 9 not reproduced: two histograms of miRNA length for conserved and nonconserved miRNAs. One panel has a vertical axis up to 12,000,000 with lengths 16 to 27 nt; the other has a vertical axis up to 350 with lengths 18 to 26 nt.]
Supplemental Figure S9. The distribution of lengths of conserved (red) and nonconserved (blue) mature miRNAs. (A)
Size distribution plotted in terms of number of normalized reads. (B) Size distribution plotted in terms of the dominant read
length for each miRNA.
[Supplemental Figure 10 not reproduced: cumulative distribution plots (vertical axis 0 to 1.00) of mRNA fold change (log2, horizontal axis -0.75 to 0.5) for mRNAs with 8mer, 7mer-m8, 7mer-A1, 6mer, or no site. Panel titles: UCAGUUC, UCAGUUG, UCAGUUA, AAUGCUG, AAUGCUU, AAUGCUC.]
Supplemental Figure S10. Controls to ensure that observed mRNA derepression attributed to the minor isoform was not
due to overlap of its sites with offset 6mer sites of the major isoform. (A) Lack of statistically significant derepression by the
three control motifs that differed from the miR-223 minor site by a single nt at position 8. (B) Same as in A except for the
miR-155 minor site. The mRNAs with 8mer and 7mer-A1 sites for the minor isoform were excluded from the analysis because
these sites overlapped with 7mer-m8 sites for the major isoform.
[Supplemental Figure 11 not reproduced. (A) Mismatch types by position from the 5' end of the read for brain miRNA-matching sequences and spiked-in sequence controls. (B) Mismatch events remaining after successive filters (thresholds: fraction edited >5%, edited reads >10). (C) Read alignment for a brain miRNA locus at chr12(+): 110965025 - 110965112 (sequences with at least 5 reads shown). (D) 5'-end read distributions for miR-337 3p arm (star strand), miR-411 5p arm (star strand), and miR-376a 5p arm (star strand).]
Supplemental Figure S11. RNA editing. (A) An overview of mismatches from the sequences indicated. In the two spiked-in
synthetic RNAs of known sequence, mismatches were distributed throughout the length of the sequence, with no preference for
A-to-G mismatches. In miRNA-mapping small RNA sequences from brain, mismatches were concentrated in the last 2 nt of the
read, probably due to cellular terminal-transferase activity. (B) Loss of most mismatch events after applying filters expected to
distinguish editing events from background. Mismatch events were considered significant if a position had at least 10 reads
corresponding to a particular mismatch, and these reads accounted for at least 5% of reads covering that position. As successive
filters were applied to the genome-mapping reads, the number of significant A-to-G mismatch events remained relatively
unaffected, whereas nearly all other mismatch events were eliminated. In particular, C-to-T mismatches were mostly eliminated,
indicating that C-to-U RNA editing does not occur to any significant degree in miRNAs. A-to-G mismatch events that passed all
filters were considered editing candidates and manually examined to see if other plausible models could explain the mismatches.
(C) A display of most abundant perfectly-matching and single-mismatch reads from the mmu-mir-381 locus illustrates that
inferred A-to-I editing accounts for essentially all mismatches at the edited position, and the great majority of all mismatched
reads mapping to the miRNA or miRNA*. An analogous pattern was found for all 16 miRNAs that passed filters and manual
validation. (D) Editing of a miRNA or miRNA* was associated with significantly altered 5' end specificity. In the cases of
mmu-mir-337 and mmu-mir-411, edited reads had more homogeneous 5' ends than unedited reads.
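The significance rule stated in part (B) of this caption (at least 10 reads with a given mismatch, accounting for at least 5% of reads covering the position) can be written as a small predicate. The sketch below is illustrative only: the function name and the example counts are hypothetical, and the later filtering steps are not represented.

```python
def is_significant_mismatch(mismatch_reads, covering_reads,
                            min_reads=10, min_fraction=0.05):
    """Return True if a mismatch passes both thresholds from the caption."""
    if covering_reads <= 0:
        return False
    fraction = mismatch_reads / covering_reads
    return mismatch_reads >= min_reads and fraction >= min_fraction


# Hypothetical counts: (reads with the mismatch, reads covering the position).
examples = {
    "enough reads, high fraction": (40, 300),
    "enough reads, low fraction": (12, 1000),
    "high fraction, too few reads": (4, 20),
}
calls = {name: is_significant_mismatch(*counts)
         for name, counts in examples.items()}
```

Only the first hypothetical case passes: the second fails the 5% fraction threshold and the third fails the 10-read threshold.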
Supplemental Table 1. Summary of high-throughput sequencing.

Sample      Raw reads     With linker seq   Genome match (16-27 nt)
Ovary         641,583         416,374           259,684
Testes      5,427,076       2,308,332         1,614,777
Brain      13,024,478      10,513,006         6,984,353
Newborn    21,967,488      16,763,972        11,045,939
e12.5       3,936,146       3,467,324         2,457,730
e9.5        5,586,229       4,104,135         2,544,507
e7.5        5,762,821       4,816,695         2,705,251
ES          3,737,635       3,061,072         1,057,274
Total      60,083,456      45,450,910        28,669,515
Supplementary Table 2.
Small RNA compositions.
Non-coding RNA (ncRNA) refers to any reads that map to annotated rRNA, tRl
loci. Small-interfering RNA (siRNA) and mRNA exon reads refer to reads that n
the sense strands of annotated refSeq mRNAs, respectively.

Sample     miRNA            ncRNA           siRNA         mRNA exon
Ovary         180,069.19       47,827.49        944.82      1,316.74
Testes        180,547.41       57,455.81      2,442.20     12,939.83
Brain       6,261,981.23      240,935.49    154,737.47      7,559.71
Newborn     9,440,674.90      625,004.67    679,948.77     40,821.58
e12.5       2,070,477.89      199,596.51     40,199.63      4,483.84
e9.5        2,072,408.35      273,737.99      9,670.94     10,720.03
e7.5        2,052,457.81      367,164.12      4,752.57     11,128.34
ES            468,326.86      235,034.54     17,592.83      6,790.23
Total      22,726,943.64    2,046,756.63    910,289.23     95,760.30
%                  79.27            7.14          3.18          0.33
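The percentage row appears to correspond to each class total divided by the total number of genome-matching reads reported in Supplemental Table 1 (28,669,515). That denominator is an inference from the numbers rather than something the table states, so the short check below should be read as a consistency test, not as the authors' procedure.

```python
# Totals copied from Supplemental Tables 1 and 2.
genome_matching_total = 28_669_515
class_totals = {
    "miRNA": 22_726_943.64,
    "ncRNA": 2_046_756.63,
    "siRNA": 910_289.23,
    "mRNA exon": 95_760.30,
}

# Percentage of genome-matching reads assigned to each class.
percentages = {
    name: round(100 * total / genome_matching_total, 2)
    for name, total in class_totals.items()
}
```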
Supplemental Table 4. Previously annotated miRNA hairpins that did not pass our criteria for consideration as miRNAs.

Hairpin           Status
mmu-mir-1937a     heterogeneous 5'
mmu-mir-1937b     heterogeneous 5'
mmu-mir-464       heterogeneous 5'
mmu-mir-1944      incorrect miRNA
mmu-mir-1949      incorrect miRNA*
mmu-mir-449c      incorrect miRNA*
mmu-mir-677       incorrect miRNA*
mmu-mir-702       incorrect miRNA
mmu-mir-1190      no miRNA*
mmu-mir-1191      no miRNA*
mmu-mir-184       no miRNA*
mmu-mir-1953      no miRNA*
mmu-mir-1965      no miRNA
mmu-mir-1969      no miRNA*
mmu-mir-297a-1    no miRNA*
mmu-mir-297a-2    no miRNA*
mmu-mir-297a-6    no miRNA*
mmu-mir-466f-4    no miRNA*
mmu-mir-468       no miRNA*
mmu-mir-489       no miRNA*
mmu-mir-574       no miRNA*
mmu-mir-720       no miRNA*
mmu-mir-875       no miRNA*
mmu-mir-1186      not enough reads
mmu-mir-1187      not enough reads
mmu-mir-1192      not enough reads
mmu-mir-1195      not enough reads
mmu-mir-1196      not enough reads
mmu-mir-1274a     not enough reads
mmu-mir-1892      not enough reads
mmu-mir-1893      not enough reads
mmu-mir-1894      not enough reads
mmu-mir-1900      not enough reads
mmu-mir-1902      not enough reads
mmu-mir-1903      not enough reads
mmu-mir-1904      not enough reads
mmu-mir-1906      not enough reads
mmu-mir-1907      not enough reads
mmu-mir-1927      not enough reads
mmu-mir-1929      not enough reads
mmu-mir-1932      not enough reads
mmu-mir-1935      not enough reads
mmu-mir-1936      not enough reads
mmu-mir-1937c     not enough reads
mmu-mir-1938      not enough reads
mmu-mir-1940      not enough reads
mmu-mir-1945      not enough reads
mmu-mir-1946b     not enough reads
mmu-mir-1948      not enough reads
mmu-mir-1950      not enough reads
mmu-mir-1951      not enough reads
mmu-mir-1954      not enough reads
mmu-mir-1956      not enough reads
mmu-mir-1957      not enough reads
mmu-mir-1958      not enough reads
mmu-mir-1959      not enough reads
mmu-mir-1960      not enough reads
mmu-mir-1962      not enough reads
mmu-mir-1963      not enough reads
mmu-mir-1966      not enough reads
mmu-mir-1970      not enough reads
mmu-mir-2137      not enough reads
mmu-mir-2139      not enough reads
mmu-mir-449b      not enough reads
mmu-mir-466g      not enough reads
mmu-mir-466i      not enough reads
mmu-mir-466j      not enough reads
mmu-mir-467f      not enough reads
mmu-mir-467h      not enough reads
mmu-mir-546       not enough reads
mmu-mir-599       not enough reads
mmu-mir-669g      not enough reads
mmu-mir-669i      not enough reads
mmu-mir-669j      not enough reads
mmu-mir-669n      not enough reads
mmu-mir-680-1     not enough reads
mmu-mir-680-2     not enough reads
mmu-mir-682       not enough reads
mmu-mir-684-1     not enough reads
mmu-mir-684-2     not enough reads
mmu-mir-685       not enough reads
mmu-mir-686       not enough reads
mmu-mir-688       not enough reads
mmu-mir-690       not enough reads
mmu-mir-692-1     not enough reads
mmu-mir-692-2     not enough reads
mmu-mir-693       not enough reads
mmu-mir-694       not enough reads
mmu-mir-703       not enough reads
mmu-mir-704       not enough reads
mmu-mir-705       not enough reads
mmu-mir-707       not enough reads
mmu-mir-713       not enough reads
mmu-mir-715       not enough reads
mmu-mir-763       not enough reads
mmu-mir-105       not sequenced
mmu-mir-1895      not sequenced
mmu-mir-1896      not sequenced
mmu-mir-1897      not sequenced
mmu-mir-1898      not sequenced
mmu-mir-1899      not sequenced
mmu-mir-1901      not sequenced
mmu-mir-1905      not sequenced
mmu-mir-1928      not sequenced
mmu-mir-1931      not sequenced
mmu-mir-1939      not sequenced
mmu-mir-1942      not sequenced
mmu-mir-1946a     not sequenced
mmu-mir-1952      not sequenced
mmu-mir-1961      not sequenced
mmu-mir-1967      not sequenced
mmu-mir-1971      not sequenced
mmu-mir-207       not sequenced
mmu-mir-2136      not sequenced
mmu-mir-220       not sequenced
mmu-mir-327       not sequenced
mmu-mir-343       not sequenced
mmu-mir-432       not sequenced
mmu-mir-453       not sequenced
mmu-mir-467g      not sequenced
mmu-mir-509       not sequenced
mmu-mir-568       not sequenced
mmu-mir-654       not sequenced
mmu-mir-678       not sequenced
mmu-mir-680-3     not sequenced
mmu-mir-681       not sequenced
mmu-mir-683-1     not sequenced
mmu-mir-683-2     not sequenced
mmu-mir-687       not sequenced
mmu-mir-691       not sequenced
mmu-mir-695       not sequenced
mmu-mir-697       not sequenced
mmu-mir-698       not sequenced
mmu-mir-706       not sequenced
mmu-mir-709       not sequenced
mmu-mir-710       not sequenced
mmu-mir-711       not sequenced
mmu-mir-717       not sequenced
mmu-mir-718       not sequenced
mmu-mir-719       not sequenced
mmu-mir-721       not sequenced
mmu-mir-759       not sequenced
mmu-mir-761       not sequenced
mmu-mir-762       not sequenced
mmu-mir-767       not sequenced
mmu-mir-804       not sequenced
mmu-mir-882       not sequenced
mmu-mir-2132      overlaps rRNA
mmu-mir-2133-1    overlaps rRNA
mmu-mir-2133-2    overlaps rRNA
mmu-mir-2134-1    overlaps rRNA
mmu-mir-2134-2    overlaps rRNA
mmu-mir-2134-3    overlaps rRNA
mmu-mir-2134-4    overlaps rRNA
mmu-mir-2135-1    overlaps rRNA
mmu-mir-2135-2    overlaps rRNA
mmu-mir-2135-3    overlaps rRNA
mmu-mir-2135-4    overlaps rRNA
mmu-mir-2135-5    overlaps rRNA
mmu-mir-2138      overlaps rRNA
mmu-mir-2140      overlaps rRNA
mmu-mir-2141      overlaps rRNA
mmu-mir-2142      overlaps rRNA
mmu-mir-2143-1    overlaps rRNA
mmu-mir-2143-2    overlaps rRNA
mmu-mir-2143-3    overlaps rRNA
mmu-mir-2144      overlaps rRNA
mmu-mir-2145-1    overlaps rRNA
mmu-mir-2145-2    overlaps rRNA
mmu-mir-2146      overlaps rRNA
mmu-mir-689-1     overlaps rRNA
mmu-mir-689-2     overlaps rRNA
mmu-mir-1983      overlaps tRNA
mmu-mir-451       noncanonical miRNA; many reads that mapped well into the loop of the putative hairpin
mmu-mir-469       many reads that mapped well into the loop of the putative hairpin
mmu-mir-484       did not give a predicted fold with the requisite pairing involving the candidate and predicted miR[...]
mmu-mir-805       many reads that mapped well into the loop of the putative hairpin
Chapter 4
Future Directions
MicroRNAs (miRNAs) play an important role in gene regulation by posttranscriptionally repressing expression of their target genes (Bartel 2004). The key
determinant of miRNA targeting is the seed sequence, corresponding to nucleotides 2-7
of the mature miRNA (Lewis et al. 2003; Lewis et al. 2005; Bartel 2009). Hence,
accurate annotation of mature miRNA species as well as authentic miRNA genes is
fundamental to understanding miRNA gene regulation. However, the genomic study of
murine miRNA genes, described in the previous chapter, suggests that many previous
annotations are questionable. In addition, the study identified novel miRNA genes and
variations in the miRNA biogenesis pathway that resulted in multiple miRNA isoforms.
This chapter addresses three areas for future studies.
First, the quality of miRNA annotations should be examined. With advances in
sequencing technology, a large number of novel miRNA genes have been identified. The
next step is to review the quality of the database so that it can provide the most accurate
and comprehensive list of miRNA genes.
In addition, deeper sequencing of small RNAs revealed interesting processing
variations at each step of the miRNA biogenesis pathway. Many of these phenomena
resulted in the production and/or RISC-loading of RNA species with different seed
sequences, which led to targeting and inhibition of different sets of mRNAs. Further
work on discovery of additional miRNAs that undergo similar processing variations and
identification of their biogenesis mechanisms will be informative in understanding
miRNA gene regulation.
Lastly, high-throughput sequencing technology has opened the door for
integrative approaches to studying small RNAs on the genomic scale. Closer inspection
of small-RNA sequencing data coupled with those from interactome or transcriptome
studies may advance the understanding of biogenesis and/or function of small RNAs.
MicroRNA gene annotations
As the official archive of miRNA genes, miRBase should be a source of accurate
information. Although stringent discovery methods can better distinguish authentic
miRNAs from degradation fragments, incomplete understanding of miRNA biogenesis
hinders the establishment of comprehensive guidelines for miRNA gene discovery. While
the major proteins and hairpin features required for miRNA processing have been
identified, other yet unidentified features appear to also affect miRNA biogenesis. Since
the guidelines for miRNA discovery are rooted in the knowledge of how miRNAs are
processed, better understanding of miRNA biogenesis will lead to improved miRNA gene
identification methods.
Despite the best efforts, however, some false annotations will likely continue to
exist in miRBase. False entries are occasionally expunged from the database, but short of
additional reads that suggest nonspecific degradation, it is difficult to prove that an entry
is an incorrect annotation rather than a miRNA that is only produced under very specific
conditions.
Previously, users could gauge the confidence with which miRNA genes are
annotated using sequencing data obtained from the Gene Expression Omnibus (GEO). The
processing of raw data, however, is laborious and may have discouraged users from
utilizing this resource. To facilitate the process, miRBase has begun to incorporate
sequencing data into the entries (Kozomara and Griffiths-Jones 2011). While users who
look at individual entries will benefit most from this change, other users are more
interested in the list of all miRNA genes in the genome. To address these distinct needs,
miRBase could be separated into two databases, one of confidently identified miRNA
genes and another of candidates. In this scenario, a novel hairpin could first be registered
as a candidate and then be moved into the confidently identified list with additional
confirmation. Under such a system, users can decide whether the more accurate or the
more comprehensive list of miRNAs is appropriate for their studies. A similar division
of miRBase into multiple parts has recently been proposed (Kozomara and
Griffiths-Jones 2011).
MicroRNA gene discovery by sequencing
The miRNA gene discovery efforts strive to be not only accurate but also comprehensive.
While most of the conserved miRNAs appear to be identified, additional miRNA genes
will be discovered through deeper sequencing of a broader range of samples. These novel
miRNAs may correspond to biologically relevant genes that are only expressed in
specific cell types or under specific conditions. Some may correspond to low-abundance
RNA transcripts that happen to fold into hairpins and "accidentally" fall into the miRNA
biogenesis pathway but have not acquired any conserved biological function.
Nonetheless, even such miRNA genes can affect cell function if produced in sufficient
quantity, much like transfected short hairpin RNAs (shRNAs) or small interfering RNAs
(siRNAs).
In order to identify additional novel miRNA genes in erythrocytes, small RNA
sequencing data from murine erythrocytes at three different stages of maturation were
analyzed. The three stages corresponded to burst-forming unit erythrocyte (BFU-E),
colony-forming unit erythrocyte (CFU-E), and terminally differentiated cells identified
by the Ter119 antibody. The analysis identified 12 novel miRNAs in mouse. Further work
on changes in the miRNA-ome through erythrocyte maturation may provide insight into the
role of miRNAs in the process. If stage-specific processing variation is observed in these
samples, erythrocytes may become an attractive platform to experimentally investigate
the mechanisms and biological functions of such processing variations.
In addition to mouse sequencing data, small-RNA sequencing data from the
human brain was examined (Appendix D), and 35 novel human miRNA gene candidates
were identified. Although the human brain data has been informative in miRNA
discovery, analysis of only one tissue sample is insufficient to construct a list of
questionable human miRNA annotations. Sequencing data from additional samples will
help to better portray the state of human miRNA gene annotations.
Computational prediction of miRNAs
While small RNA sequencing studies have identified many miRNA genes, these
approaches can only identify those that are expressed above a certain level in the
sequenced sample. Thus, a comprehensive coverage of all miRNA genes in an organism
remains difficult to achieve through sequencing.
Alternatively, machine learning-based approaches can be used to predict all
potential miRNA genes encoded in the genome. These entries can then be submitted as
candidate genes waiting for experimental confirmation. In addition to identifying all
possible hairpins that can be processed as miRNAs, these methods can also help discover
additional features that affect miRNA processing. Understanding such features would not
only help establish a more definitive guideline for miRNA discovery but also provide
information on how to design artificial hairpins that can be more efficiently processed
into mature miRNAs.
A machine-learning algorithm learns the properties characteristic of miRNAs
(features) from a training set of known miRNAs (positives) and other pseudo-miRNA
hairpins that do not produce mature miRNAs (negatives). It then builds a classifier that
can predict whether a given sequence is an authentic miRNA hairpin.
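The train-then-classify workflow described above can be illustrated with a deliberately tiny nearest-centroid classifier. Everything specific here is an assumption made for brevity: the two features (GC fraction and scaled length), the made-up sequences, and the choice of nearest-centroid itself, which is a stand-in and is not meant to represent the cited programs or biologically meaningful features.

```python
def features(hairpin):
    """Two toy features: GC fraction and length scaled by 100."""
    gc = sum(hairpin.count(base) for base in "GC") / len(hairpin)
    return (gc, len(hairpin) / 100)


def centroid(vectors):
    """Component-wise mean of equal-length feature vectors."""
    return tuple(sum(column) / len(vectors) for column in zip(*vectors))


def train(positives, negatives):
    """Learn one centroid per class from labeled hairpin sequences."""
    return {
        True: centroid([features(h) for h in positives]),
        False: centroid([features(h) for h in negatives]),
    }


def predict(model, hairpin):
    """Assign the class whose centroid is nearest in feature space."""
    x = features(hairpin)

    def squared_distance(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))

    return squared_distance(model[True]) < squared_distance(model[False])


# Made-up training sequences (not real hairpins).
positives = ["GCGCAUGCGC" * 6, "GGCCAUGGCC" * 6]
negatives = ["AUAUGCAUAU" * 6, "AAUUGCAAUU" * 6]
model = train(positives, negatives)

call_a = predict(model, "GCGGAUCCGC" * 6)
call_b = predict(model, "AUUUGAAUAU" * 6)
```

A realistic predictor would replace `features` with sequence and secondary-structure descriptors and would be evaluated on hairpins held out from training.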
To examine if there is room for improvement in previous studies utilizing
machine learning-based approaches, their performances were tested on a set of confirmed
miRNAs (positives) and pseudo-miRNA hairpins that failed the ectopic overexpression
assay (negatives) (Sewer et al. 2005; Helvik et al. 2007; Jiang et al. 2007). Since most of
the recently discovered miRNA genes are nonconserved, it is likely that most of the
conserved miRNA genes have already been found. Accordingly, none of the tested
methods used conservation as a feature to describe a hairpin property. The results
demonstrated that the sensitivity and especially the specificity of these methods were
lower than the reported value. A better training set and/or additional features may
improve the accuracy of these prediction programs.
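For reference, sensitivity and specificity follow directly from the counts of correct and incorrect calls on the positive and negative test hairpins. The sketch below uses hypothetical labels, not results from this study.

```python
def sensitivity_specificity(true_labels, predicted_labels):
    """Compute (sensitivity, specificity) from parallel lists of booleans.

    Assumes at least one positive and one negative in true_labels.
    """
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t and p)          # true positives
    fn = sum(1 for t, p in pairs if t and not p)      # false negatives
    tn = sum(1 for t, p in pairs if not t and not p)  # true negatives
    fp = sum(1 for t, p in pairs if not t and p)      # false positives
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical labels: True = authentic miRNA hairpin.
truth = [True, True, True, True, False, False, False, False]
predicted = [True, True, True, False, True, True, False, False]
sens, spec = sensitivity_specificity(truth, predicted)
```

In this made-up case three of four positives are recovered (sensitivity 0.75) while half of the negatives are miscalled as miRNAs (specificity 0.5).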
To determine if a better training set can improve the predictions, the programs
can be re-trained using a new training set. Previous methods used the then-current
miRBase entries as the positives and other non-coding RNA or mRNA hairpins as the
negatives. Since the previous training set contained many false entries, it stands to reason
that a better training set will improve prediction accuracy. The new training set will
consist of confirmed miRNAs (positives) and unconfirmed miRNAs that did not have any
reads mapping to them (negatives). The hairpins with high sequence similarities will be
represented by a single hairpin so that the characteristics of any particular hairpin family
are not overrepresented in the training set. Also, the hairpins used to test previous
methods will be removed from the training set so that they can be used to determine if the
re-training has improved the predictions.
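The two preparation steps described above (setting aside the test hairpins and collapsing highly similar hairpins to one representative) can be sketched as follows. The similarity measure (the ratio from Python's `difflib.SequenceMatcher`), the 0.9 threshold, and all names and sequences are assumptions chosen for illustration; an alignment-based identity measure would be a more typical choice in practice.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Similarity ratio in [0, 1] based on difflib's matching blocks."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def build_training_set(hairpins, held_out, threshold=0.9):
    """Drop held-out test hairpins, then keep one representative per family."""
    representatives = []
    for name, seq in hairpins.items():
        if name in held_out:
            continue  # reserved for evaluating the re-trained classifier
        # Keep a hairpin only if it is not too similar to one already kept.
        if all(similarity(seq, kept) < threshold for _, kept in representatives):
            representatives.append((name, seq))
    return [name for name, _ in representatives]


# Made-up sequences; "hp-a1" and "hp-a2" differ at a single position.
hairpins = {
    "hp-a1": "GCGCAUGCGCUUAGGCAUCGAUCG",
    "hp-a2": "GCGCAUGCGCUUAGGCAUCGAUCA",
    "hp-b": "AUUUAGGAUAUAUCCGAUUAUAGC",
    "hp-test": "CCGGAUAUCCGGAUAUCCGGAUAU",
}
training = build_training_set(hairpins, held_out={"hp-test"})
```

The greedy pass keeps the first member of each similar group it encounters, so the result depends on input order; that is acceptable for a sketch but worth controlling in a real analysis.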
Although the re-trained algorithms are expected to predict miRNA genes with
higher accuracy, it is unlikely that all of the features that distinguish genuine miRNAs
from pseudo-miRNAs have been identified. Most previous works selected elements of
sequence and secondary structure as features. A new classifier can be built using the most
informative features from the re-trained programs as well as additional features that
describe the flanking regions and the tertiary structure. If any of these features contribute
to a more accurate identification of miRNA hairpins, their biological significance can be
explored by a series of hairpin mutagenesis experiments.
MicroRNAs mapping to multiple loci
Many miRNAs map to multiple loci in the genome, most likely due to gene duplication.
When a sequence maps to multiple loci, the read numbers are distributed equally to the
loci as though all loci have contributed equally to the production of the sequenced reads.
Therefore, it appears as though multiple loci have produced equal amounts of identical
miRNA species. In reality, at least one of the loci must generate the miRNAs, but not all
the loci may be expressed and/or processed. Even if transcripts from all the loci are
processed as miRNA hairpins, they may not produce identical mature miRNAs, as
observed for mouse mir-133 and fly mir-2 (Ruby et al. 2007).
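The equal-distribution convention described in this paragraph can be made concrete with a short sketch. The sequence names, locus names, and counts are hypothetical.

```python
from collections import defaultdict


def distribute_reads(read_counts, alignments):
    """Split each sequence's read count equally among the loci it maps to."""
    locus_totals = defaultdict(float)
    for sequence, count in read_counts.items():
        loci = alignments[sequence]
        for locus in loci:
            locus_totals[locus] += count / len(loci)
    return dict(locus_totals)


# Hypothetical data: one sequence maps to two paralogous loci, another to one.
read_counts = {"seqA": 900, "seqB": 50}
alignments = {"seqA": ["locus-1", "locus-2"], "seqB": ["locus-2"]}
totals = distribute_reads(read_counts, alignments)
```

Here `locus-1` is credited with 450 reads even if it is never transcribed, which is the ambiguity the paragraph points out. Equal splitting also means per-locus counts need not be integers.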
Ectopic overexpression assay of miRNA genes that appear to produce identical
mature miRNAs can identify gene products from each locus. While the sequence
similarities may make it difficult to clone the hairpins, the information gained from
differential processing of highly related loci would be valuable. First, the information
may help identify additional features that contribute to miRNA processing. Since most of
the sequence and thus the secondary structure of the hairpins would be identical, it may
be easier to narrow down the elements responsible for differential miRNA processing.
Furthermore, identification of loci responsible for miRNA production may affect
experimental design of miRNA functional studies. For example, if transcripts from only
one of the multiple loci can be processed into mature miRNAs, it may be sufficient to
knock out the gene at just the one locus rather than at all loci. Lastly, if the loci that were
previously thought to generate identical mature miRNAs actually produced miRNAs with
different seeds, each locus would target different mRNAs and thus have distinct
biological functions.
MicroRNA isoforms
While most miRNA genes give rise to one mature miRNA species, some genes produce
multiple miRNA isoforms with different 5' ends. Although miRNA 5' heterogeneity had
previously been observed (Ruby et al. 2007; Stark et al. 2007; Azuma-Mukai et al. 2008;
Wu et al. 2009), it was attributed to erroneous Drosha cleavage. However, the
functional study of miRNA isoforms concluded that both isoforms could repress
transcripts with corresponding seeds when they are produced in sufficient quantity.
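Because the seed is defined by position relative to the 5' end (nucleotides 2-7, as stated at the start of this chapter), a one-nucleotide shift in the 5' end yields a different seed. The sketch below uses a made-up sequence, not a real miRNA, and the function name `seed` is illustrative.

```python
def seed(mature_mirna):
    """Return nucleotides 2-7 (1-based) of a mature miRNA sequence."""
    return mature_mirna[1:7]


# Made-up hairpin-arm fragment; the two isoforms differ only in where
# their 5' ends begin (offset by one nucleotide).
arm = "GAUCGGAUUCAGCUAGGCUAACGU"
major_isoform = arm[1:23]
minor_isoform = arm[0:22]

major_seed = seed(major_isoform)
minor_seed = seed(minor_isoform)
```

The two seeds overlap by five nucleotides but are not identical, so the isoforms would be expected to pair with different sets of target sites.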
A point of interest is the identification of the feature that distinguishes the pri-miRNAs that generate isoforms from those that only generate a single dominant species.
While the presence of a sequence or structural motif in the two groups of pri-miRNAs
can be examined, there is also the possibility that additional factors are needed for
heterogeneous processing. If pri-miRNAs that produce miRNA isoforms in vivo can also
produce isoforms in an in vitro reaction with purified Microprocessor, then it can be
concluded that all the features that encode for the isoform production are present on the
pri-miRNA.
Also, a number of conserved miRNA genes produce multiple miRNA isoforms,
but it has not yet been examined whether the heterogeneous 5' processing is conserved in
other species. Conservation of isoform production can be confirmed with small RNA
sequencing data from other species.
Dicer-independent and AGO2-dependent miRNAs
MiR-451 is the only known miRNA to be generated by the noncanonical pathway
through AGO2 cleavage rather than Dicer cleavage (Cheloufi et al. 2010; Cifuentes et al.
2010). To identify other miRNAs generated through AGO2 cleavage, mouse sequencing
data were re-scanned for shorter hairpins with reads mapping through the terminal loop.
Although several candidates were identified, none showed a significant difference in
expression in the AGO2 knockout mouse livers compared to the wild type (Cheloufi et al.
2010). Thus, the biogenesis of these candidates appeared to be AGO2-independent. In a
different approach, each chromosome of the mouse genome was scanned using a 100-nt
window, and the reads from the wild-type and the AGO2 knockout mouse livers were
mapped to each window to determine the loci with AGO2-dependent reads. Although no
other Dicer-independent and AGO2-dependent miRNAs were identified using these
methods, sequencing additional samples from AGO2 knockout mice may help identify
other miR-451-like hairpins.
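The window-based scan described above can be sketched as follows. This is a minimal illustration, not the procedure actually used: it assumes non-overlapping 100-nt windows, reads given as (chromosome, 5' position) pairs, normalization to reads per million, and hypothetical thresholds (`min_wt_rpm`, `min_fold`, `pseudo`) for calling a window AGO2-dependent.

```python
from collections import Counter

WINDOW = 100  # window size in nt, as stated in the text

def window_counts(reads, window=WINDOW):
    """Count reads per non-overlapping window, keyed by (chrom, window_index)."""
    counts = Counter()
    for chrom, pos in reads:
        counts[(chrom, pos // window)] += 1
    return counts

def ago2_dependent_windows(wt_reads, ko_reads, min_wt_rpm=5.0,
                           min_fold=10.0, pseudo=0.5, window=WINDOW):
    """Return (chrom, window_start, fold) for windows depleted in the knockout.

    min_wt_rpm, min_fold, and pseudo are hypothetical thresholds (assumptions).
    """
    wt = window_counts(wt_reads, window)
    ko = window_counts(ko_reads, window)
    wt_scale = 1e6 / max(len(wt_reads), 1)  # reads-per-million scaling
    ko_scale = 1e6 / max(len(ko_reads), 1)
    hits = []
    for key, n in wt.items():
        wt_rpm = n * wt_scale
        ko_rpm = ko.get(key, 0) * ko_scale
        fold = (wt_rpm + pseudo) / (ko_rpm + pseudo)
        if wt_rpm >= min_wt_rpm and fold >= min_fold:
            hits.append((key[0], key[1] * window, fold))
    return sorted(hits)
```

Windows passing the filter would still need to be checked for a short hairpin structure before being considered miR-451-like candidates.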
Arm-switching miRNAs
The arm-switching miRNAs produce mature miRNAs from the 5' or the 3' arm
depending on the cell-type or developmental stage (Ro et al. 2007; Grimson et al. 2008).
However, the mechanism of this selection has not yet been explored. First, cell lines
where arm-switching can be observed need to be identified. To this end, the arm
preference of miRNAs in each cell line should be observed by sequencing or by
quantitative Northern blot. Once two cell lines with different arm preferences are
identified, the RISC-loading complex (RLC) can be immunoprecipitated using an
antibody against one of its components. The other proteins that are pulled down with the
complex can then be analyzed by mass spectrometry. The results from the two cell lines
can be compared to determine which proteins were uniquely present in one cell line. To
determine if these candidates affect strand selection, each can be ectopically expressed in
the cell line where it is normally absent. An alternative method is to reconstitute the RLC in
vitro with the candidate proteins and examine whether the strand preference changes.
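For the sequencing-based screen of cell lines, arm preference can be summarized as the fraction of mature reads that derive from the 5' arm. The following sketch is an assumption-laden illustration: the input dictionaries, the minimum read count, and the minimum shift in 5' arm fraction are all hypothetical choices rather than criteria from this work.

```python
def arm_preference(reads_5p, reads_3p):
    """Fraction of mature reads from the 5' arm; None when there are no reads."""
    total = reads_5p + reads_3p
    return reads_5p / total if total else None

def arm_switching_candidates(sample_a, sample_b, min_reads=50, min_shift=0.5):
    """List (name, shift) for miRNAs whose 5' arm fraction differs between samples.

    sample_a, sample_b: dict mapping miRNA name -> (reads_5p, reads_3p).
    min_reads and min_shift are hypothetical thresholds (assumptions).
    """
    out = []
    for name in sorted(set(sample_a) & set(sample_b)):
        a5, a3 = sample_a[name]
        b5, b3 = sample_b[name]
        # Skip hairpins with too few reads in either sample to estimate a fraction.
        if min(a5 + a3, b5 + b3) < max(min_reads, 1):
            continue
        shift = arm_preference(a5, a3) - arm_preference(b5, b3)
        if abs(shift) >= min_shift:
            out.append((name, shift))
    return out
```

A pair of cell lines in which the same hairpin appears with a large shift would be a natural starting point for the immunoprecipitation comparison.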
De novo prediction of piRNA clusters
Piwi-interacting RNAs (piRNAs) are a class of ~26-30 nt small RNAs in germ cells that
have been implicated in transposon silencing (Malone and Hannon 2009; Lau 2010). In
2006, efforts to identify RNA binding partners of Piwi proteins led to the discovery of
piRNAs (Aravin et al. 2006; Girard et al. 2006; Grivna et al. 2006; Lau et al. 2006). By
definition, the most accurate method to identify piRNAs is through sequencing small
RNAs that co-purify with Piwi proteins. Although this approach has led to identification
of ~140 piRNA clusters (Aravin et al. 2006; Girard et al. 2006; Grivna et al. 2006; Lau et
al. 2006), it is more burdensome than directly cloning small RNAs from total RNA. A
computational method that can identify piRNA clusters from small-RNA sequencing data
would be a beneficial tool for detecting piRNA production with minimal effort.
Using the existing sequencing data gathered from RNAs that interact with
individual members of the Piwi protein family, an algorithm can be trained to identify
features of distinct classes of piRNAs. Such features would include known properties of
piRNAs such as length, density of reads, nucleotide composition, and genomic location. If the
trained features can adequately describe piRNAs, the algorithm should be able to identify
piRNA clusters de novo from small-RNA sequencing data. This method would not only
examine the known piRNA clusters but also detect piRNA-like reads from loci that have
not been previously implicated in piRNA production.
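The feature-extraction step that would precede any training can be sketched as below. This is only an illustration of the kind of per-window summary the paragraph describes: the window size, the feature names, and the use of 5' U frequency as the nucleotide-composition feature are assumptions, and no particular learning algorithm is implied.

```python
def window_features(reads, window_start, window_size=10_000):
    """Summarize reads in one genomic window as candidate piRNA-cluster features.

    reads: list of (position, sequence) pairs from one chromosome.
    window_size: hypothetical window length in nt (an assumption).
    """
    in_win = [seq for pos, seq in reads
              if window_start <= pos < window_start + window_size]
    n = len(in_win)
    if n == 0:
        return {"start": window_start, "density": 0.0,
                "frac_26_30": 0.0, "frac_5p_U": 0.0}
    return {
        "start": window_start,                                  # genomic location
        "density": n / window_size,                             # reads per nt
        "frac_26_30": sum(26 <= len(s) <= 30 for s in in_win) / n,   # length
        "frac_5p_U": sum(s[:1] in ("U", "T") for s in in_win) / n,   # 5' nucleotide
    }
```

Feature vectors computed this way from Piwi-immunoprecipitation libraries could serve as positive examples, with vectors from windows dominated by other small-RNA classes as negatives.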
Acknowledgements
I would like to thank B. Wong for the small RNA sequencing data of erythrocytes and V.
Agarwal for the collaborative work on computational prediction of miRNAs. I would also
like to thank D. Baek for discussions on arm-switching miRNAs and miRNA isoforms,
and J. G. Ruby for help in looking at piRNAs in the mouse testes small-RNA library.
References
Aravin, A., Gaidatzis, D., Pfeffer, S., Lagos-Quintana, M., Landgraf, P., Iovino, N.,
Morris, P., Brownstein, M.J., Kuramochi-Miyagawa, S., Nakano, T., Chien, M.,
Russo, J.J., Ju, J., Sheridan, R., Sander, C., Zavolan, M., and Tuschl, T. 2006. A
novel class of small RNAs bind to MILI protein in mouse testes. Nature
442(7099): 203-207.
Azuma-Mukai, A., Oguri, H., Mituyama, T., Qian, Z.R., Asai, K., Siomi, H., and Siomi,
M.C. 2008. Characterization of endogenous human Argonautes and their miRNA
partners in RNA silencing. Proc Natl Acad Sci USA 105(23): 7964-7969.
Bartel, D. 2004. MicroRNAs: Genomics, biogenesis, mechanism, and function. Cell
116(2): 281-297.
Bartel, D.P. 2009. MicroRNAs: Target Recognition and Regulatory Functions. Cell
136(2): 215-233.
Cheloufi, S., Dos Santos, C.O., Chong, M.M.W., and Hannon, G.J. 2010. A dicer-independent
miRNA biogenesis pathway that requires Ago catalysis. Nature
465(7298): 584-589.
Cifuentes, D., Xue, H., Taylor, D.W., Patnode, H., Mishima, Y., Cheloufi, S., Ma, E.,
Mane, S., Hannon, G.J., Lawson, N.D., Wolfe, S.A., and Giraldez, A.J. 2010. A
novel miRNA processing pathway independent of Dicer requires Argonaute2
catalytic activity. Science 328(5986): 1694-1698.
Girard, A., Sachidanandam, R., Hannon, G.J., and Carmell, M.A. 2006. A germline-specific
class of small RNAs binds mammalian Piwi proteins. Nature 442(7099):
199-202.
Grimson, A., Farh, K.K.-H., Johnston, W.K., Garrett-Engele, P., Lim, L.P., and Bartel,
D.P. 2007. MicroRNA targeting specificity in mammals: determinants beyond
seed pairing. Mol Cell 27(1): 91-105.
Grimson, A., Srivastava, M., Fahey, B., Woodcroft, B.J., Chiang, H.R., King, N.,
Degnan, B.M., Rokhsar, D.S., and Bartel, D.P. 2008. Early origins and evolution
of microRNAs and Piwi-interacting RNAs in animals. Nature 455(7217): 1193-1197.
Grivna, S.T., Beyret, E., Wang, Z., and Lin, H. 2006. A novel class of small RNAs in
mouse spermatogenic cells. Genes & Development 20(13): 1709-1714.
Helvik, S.A., Snove, O., and Saetrom, P. 2007. Reliable prediction of Drosha processing
sites improves microRNA gene prediction. Bioinformatics 23(2): 142-149.
Jiang, P., Wu, H., Wang, W., Ma, W., Sun, X., and Lu, Z. 2007. MiPred: classification of
real and pseudo microRNA precursors using random forest prediction model with
combined features. Nucleic Acids Res 35(Web Server issue): W339-344.
Kozomara, A. and Griffiths-Jones, S. 2011. miRBase: integrating microRNA annotation
and deep-sequencing data. Nucleic Acids Res 39(Database issue): D152-157.
Lau, N.C. 2010. Small RNAs in the animal gonad: guarding genomes and guiding
development. Int J Biochem Cell Biol 42(8): 1334-1347.
Lau, N.C., Seto, A.G., Kim, J., Kuramochi-Miyagawa, S., Nakano, T., Bartel, D.P., and
Kingston, R.E. 2006. Characterization of the piRNA complex from rat testes.
Science 313(5785): 363-367.
Lewis, B., Burge, C., and Bartel, D. 2005. Conserved seed pairing, often flanked by
adenosines, indicates that thousands of human genes are microRNA targets. Cell
120(1): 15-20.
Lewis, B., Shih, I., Jones-Rhoades, M., Bartel, D., and Burge, C. 2003. Prediction of
mammalian microRNA targets. Cell 115(7): 787-798.
Malone, C.D. and Hannon, G.J. 2009. Small RNAs as guardians of the genome. Cell
136(4): 656-668.
Ro, S., Park, C., Young, D., Sanders, K.M., and Yan, W. 2007. Tissue-dependent paired
expression of miRNAs. Nucleic Acids Res 35(17): 5944-5953.
Ruby, J.G., Stark, A., Johnston, W.K., Kellis, M., Bartel, D.P., and Lai, E.C. 2007.
Evolution, biogenesis, expression, and target predictions of a substantially
expanded set of Drosophila microRNAs. Genome Res 17(12): 1850-1864.
Sewer, A., Paul, N., Landgraf, P., Aravin, A., Pfeffer, S., Brownstein, M.J., Tuschl, T.,
van Nimwegen, E., and Zavolan, M. 2005. Identification of clustered microRNAs
using an ab initio prediction method. BMC Bioinformatics 6: 267.
Stark, A., Lin, M.F., Kheradpour, P., Pedersen, J.S., Parts, L., Carlson, J.W., Crosby,
M.A., Rasmussen, M.D., Roy, S., Deoras, A.N., Ruby, J.G., Brennecke, J.,
Hodges, E., Hinrichs, A.S., Caspi, A., Park, S.-W., Han, M.V., Maeder, M.L.,
Polansky, B.J., Robson, B.E., Aerts, S., van Helden, J., Hassan, B., Gilbert, D.G.,
Eastman, D.A., Rice, M., Weir, M., Hahn, M.W., Park, Y., Dewey, C.N., Pachter,
L., Kent, W.J., Haussler, D., Lai, E.C., Bartel, D.P., Hannon, G.J., Kaufman, T.C.,
Eisen, M.B., Clark, A.G., Smith, D., Celniker, S.E., Gelbart, W.M., and Kellis, M.
2007. Discovery of functional elements in 12 Drosophila genomes using
evolutionary signatures. Nature 450(7167): 219-232.
Wu, H., Ye, C., Ramirez, D., and Manjunath, N. 2009. Alternative processing of primary
microRNA transcripts by Drosha generates 5' end variation of mature microRNA.
PLoS ONE 4(10): e7566.
Appendices
Appendix A
Appendix A has been previously published as:
Batista, P.J., Ruby, J.G., Claycomb, J.M., Chiang, R., Fahlgren, N., Kasschau,
K.D., Chaves, D.A., Gu, W., Vasale, J.J., Duan, S., Conte, D., Luo, S., Schroth,
G.P., Carrington, J.C., Bartel, D.P., and Mello, C.C. 2008. PRG-1 and 21U-RNAs
interact to form the piRNA complex required for fertility in C. elegans. Mol Cell
31(1): 67-78. © 2008 Elsevier Inc.
Appendix B
Appendix B has been previously published as:
Grimson, A., Srivastava, M., Fahey, B., Woodcroft, B.J., Chiang, H.R., King, N.,
Degnan, B.M., Rokhsar, D.S., and Bartel, D.P. 2008. Early origins and evolution
of microRNAs and Piwi-interacting RNAs in animals. Nature 455(7217): 1193-1197.
© 2008 Macmillan Publishers Limited.
Appendix C
Appendix C has been previously published as:
Rao, P.K., Toyama, Y., Chiang, H.R., Gupta, S., Bauer, M., Medvid, R.,
Reinhardt, F., Liao, R., Krieger, M., Jaenisch, R., Lodish, H.F., and Blelloch, R.
2009. Loss of cardiac microRNA-mediated regulation leads to dilated
cardiomyopathy and heart failure. Circ Res 105(6): 585-594. © 2009 American
Heart Association, Inc.
Appendix D
Appendix D has been previously published as:
Shin, C., Nam, J.-W., Farh, K.K.-H., Chiang, H.R., Shkumatava, A., and Bartel,
D.P. 2010. Expanding the microRNA targeting code: functional sites with
centered pairing. Mol Cell 38(6): 789-802. © 2010 Elsevier Inc.
Molecular Cell
PRG-1 and 21U-RNAs Interact to Form the piRNA
Complex Required for Fertility in C. elegans
Pedro J. Batista,1,5,10 J. Graham Ruby,2,3,6,10 Julie M. Claycomb,1 Rosaria Chiang,2,3,6
Noah Fahlgren,7,8 Kristin D. Kasschau,7,8 Daniel A. Chaves,1 Weifeng Gu,1 Jessica J. Vasale,1
Shenghua Duan,1 Darryl Conte, Jr.,1 Shujun Luo,9 Gary P. Schroth,9 James C. Carrington,7,8
David P. Bartel,2,3,6,* and Craig C. Mello1,4,*
1 Program in Molecular Medicine, University of Massachusetts Medical School, Worcester, MA 01605, USA
2 Howard Hughes Medical Institute
3 Department of Biology
Massachusetts Institute of Technology, Cambridge, MA 02139, USA
4 Howard Hughes Medical Institute, Worcester, MA 01605, USA
5 Gulbenkian PhD Programme in Biomedicine, Rua da Quinta Grande, 6, 2780-156, Oeiras, Portugal
6 Whitehead Institute for Biomedical Research, 9 Cambridge Center, Cambridge, MA 02142, USA
7 Center for Gene Research and Biotechnology
8 Department of Botany and Plant Pathology
Oregon State University, Corvallis, OR 97331, USA
9 Illumina, Inc., Hayward, CA 94545, USA
10 These authors contributed equally to this work
*Correspondence: dbartel@wi.mit.edu (D.P.B.), craig.mello@umassmed.edu (C.C.M.)
DOI 10.1016/j.molcel.2008.06.002
SUMMARY
In metazoans, Piwi-related Argonaute proteins have
been linked to germline maintenance, and to a class
of germline-enriched small RNAs termed piRNAs.
Here we show that an abundant class of 21 nucleotide small RNAs (21U-RNAs) are expressed in the
C. elegans germline, interact with the C. elegans
Piwi family member PRG-1, and depend on PRG-1
activity for their accumulation. The PRG-1 protein is
expressed throughout development and localizes
to nuage-like structures called P granules. Although
21U-RNA loci share a conserved upstream sequence
motif, the mature 21U-RNAs are not conserved and,
with few exceptions, fail to exhibit complementarity
or evidence for direct regulation of other expressed
sequences. Our findings demonstrate that 21U-RNAs
are the piRNAs of C. elegans and link this class
of small RNAs and their associated Piwi Argonaute to
the maintenance of temperature-dependent fertility.
INTRODUCTION
Diverse organisms utilize sequence-specific gene regulatory
pathways that share features with RNA interference (RNAi).
The effector complex in all RNAi-related pathways consists of
a single-stranded small RNA, and a member of the AGO protein
family, which binds small-RNA termini, leaving internal nucleotides accessible for
base-pairing interactions with target sequences. In canonical RNAi pathways,
double-stranded RNA (dsRNA) is processed by members of the Dicer family of
multifunctional ribonucleases into 21-24 nucleotide (nt) short interfering RNAs (siRNAs)
that interact with and guide AGO proteins to complementary target sequences in the cell
(reviewed in Hutvagner and Simard, 2007).
Most animals have an additional AGO subfamily called Piwi.
C. elegans has two Piwi-related genes (named prg-1 and prg-2) that, like Piwi family
members from a number of animal species, have been implicated in germline maintenance
and fertility (reviewed in Klattenhoff and Theurkauf, 2008). Two classes of
Piwi-interacting RNAs (piRNAs) have been identified, including
(1) repeat-associated piRNAs (originally annotated as rasiRNAs)
that appear to target transposons, and (2) a second, more mysterious class of piRNAs
with no known targets (Lin, 2007). The latter class of piRNAs is extremely abundant in
small-RNA fractions isolated from pachytene-stage mouse spermatocytes: over
80,000 distinct species are derived from large genomic clusters
of up to 200 kb (Aravin et al., 2006; Grivna et al., 2006; Girard
et al., 2006; Lau et al., 2006). These clusters exhibit a marked
strand asymmetry, as though the piRNAs within a region are all
processed from one large transcript or two divergent transcripts.
Studies in C. elegans have identified several classes of endogenously expressed small
RNAs (Ambros et al., 2003; Ruby et al., 2006). However, which, if any, of these
represent piRNAs has yet to be determined. One class of small RNAs, termed 21U-RNAs,
shares several characteristics with the piRNAs of flies and mammals, including an
overwhelming bias for a 5' uracil, a 5' monophosphate, and a 3' end that is modified
and resistant to periodate degradation (Ruby et al., 2006; Ohara et al., 2007; Saito
et al., 2007; Horwich et al., 2007; Kirino and Mourelatos, 2007). However, 21U-RNAs are
shorter than piRNAs in flies and mammals, and their genomic organization is very
different, with 21U-RNAs deriving from what appear to be thousands of individual,
autonomously expressed loci broadly scattered in two large regions of
one chromosome.
Here we show that 21U-RNAs are expressed in the germline
and that their accumulation depends on the wild-type activity
of PRG-1. We show that PRG-1 localizes to germline P granules
and that 21U-RNAs coimmunoprecipitate with PRG-1 from
worm lysates. Our analysis identifies many additional 21U-RNAs, bringing the total
number of 21U-RNA loci to 15,722, and confirms the expression of many 21U-RNA loci
previously predicted based only on the presence of an upstream sequence
motif. Like the abundant pachytene piRNAs found in mammals,
21U-RNAs encode remarkable sequence diversity and yet lack
obvious targets. Although we identify one example of a transposon-directed 21U-RNA, our
findings suggest that piRNA complexes of worms, charged with the remarkable sequence
diversity encoded by 21U-RNAs, are likely to provide other essential
germline functions.
Molecular Cell 31, 67-78, July 11, 2008 ©2008 Elsevier Inc.
Molecular Cell
21U-RNAs Are C. elegans piRNAs
[Figure 1, panels A, B, and D: graphic not recoverable from the extracted text. Legible labels: core motif CTGTTTCA; A/T rich; miRNAs; endogenous siRNAs; 21U-RNAs; axes "21U-RNA upstream motif score" and "Reads per locus".]
RESULTS
Identification of Over 15,000 Unique 21U-RNA Species
in C. elegans
We used Solexa sequencing technology (Seo et al., 2004) to generate 29,112,356 small-RNA
cDNA reads that perfectly matched the C. elegans genome. Among these we identified 971,981
reads from 15,458 unique loci with properties similar to previously defined 21U-RNA loci
(Ruby et al., 2006). These reads matched 95.1% of the 5454 previously sequenced 21U-RNAs
and 78.3% of the 10,644 previously predicted 21U-RNAs
(Ruby et al., 2006) and brought the total number of unique experimentally confirmed
21U-RNA loci to 15,722.
A common characteristic of 21U-RNA loci is the presence of
an upstream sequence motif (Figure 1A; Ruby et al., 2006). As
previously observed, RNA species 21 nt in length could be separated into two distinct
sets based on the motif scores of their genomic loci (Figure 1B). Species with a high
motif score also tended to exhibit the other essential features, including 21 nt
length and 5'-U nucleotide, that together define the 21U-RNA
class (see Figures S1A-S1C available online).
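The separation of 21-nt species by motif score can be illustrated with a small filter. This is a sketch under stated assumptions: the upstream motif score is taken as a precomputed input (its calculation is not reproduced here), the cutoff of 7 is the one given in the Figure 1 legend, and treating that cutoff as inclusive is an assumption. The function and field names are hypothetical.

```python
MOTIF_SCORE_CUTOFF = 7  # cutoff stated in the Figure 1B legend

def is_21u_candidate(seq, motif_score, cutoff=MOTIF_SCORE_CUTOFF):
    """True for a 21-nt species whose locus motif score meets the cutoff.

    Treating the cutoff as inclusive is an assumption.
    """
    return len(seq) == 21 and motif_score >= cutoff

def summarize_21u(species):
    """Summarize candidates from a list of (sequence, motif_score, read_count)."""
    kept = [(s, n) for s, score, n in species if is_21u_candidate(s, score)]
    reads = sum(n for _, n in kept)
    u_reads = sum(n for s, n in kept if s[:1] in ("U", "T"))
    return {
        "loci": len(kept),
        "reads": reads,
        # Share of candidate reads that begin with a 5' U (T in cDNA sequence).
        "frac_5p_U": u_reads / reads if reads else 0.0,
    }
```

Reporting the 5' U fraction separately, rather than filtering on it, makes it possible to check that high-scoring species also tend to show the 5'-U feature, as described above.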
21U-RNAs with strong upstream motif matches were concentrated in two broad regions along
chromosome IV (Figure 1C; Ruby et al., 2006). Supporting the potential importance of this
motif in 21U-RNA biogenesis, the motif score strongly correlated
with the magnitude of 21U-RNA expression, as indicated by the
number of sequenced reads in our data sets (Figure 1D). Despite
the presence of many high-scoring 21U-RNA motifs in orthologous regions of the
C. briggsae genome, the 21U-RNA sequences themselves were not conserved. Even in rare
cases in which the core of the upstream motif was perfectly aligned to
a high-scoring motif within a syntenic region of the C. briggsae
genome (Blanchette et al., 2004), the sequence of the consequent 21U-RNA was essentially
nonconserved (Figure 1E). Only approximately 6% of the 21U-RNA loci and/or motifs
were unambiguously aligned within syntenic regions in C. briggsae. In these few cases,
this was often due to overlap with annotated coding exons, which rarely contain 21U-RNAs
(Figure S1D). The only portion of the 21U-RNA flanking regions with elevated
conservation frequencies above background was the 8 nt core
of the upstream motif (Figure S1E).
Figure 1. 21U-RNAs Can Be Distinguished from Other RNA Species
by Their Lengths and Upstream Motif Matches
(A) A schematic representation of the 21U-RNA upstream motif as described
previously (Ruby et al., 2006).
(B) The number of 21 nt RNA reads (blue) or unique loci (pink) corresponding to
each upstream motif score (rounded to the nearest unit). A score cut-off of 7
(orange) defined the 21U-RNA population.
(C) The distribution of 21U-RNA reads across chromosome IV. Normalized
read counts were summed for each nonoverlapping 100 kb bin (blue).
(D) Correlation between the upstream motif score and the magnitude of 21U-RNA
expression. For each three-bit bin of motif scores, the number of reads
was determined for every experimentally identified 21U-RNA locus. The median
read number is plotted, and the 25th and 75th percentiles are indicated
(error bars), as is the number of loci in each bin.
(E) Two 21U-RNA loci whose core upstream motifs are aligned (Blanchette
et al., 2004). The core motif (green) and 21U-RNA loci (pink) are highlighted.
The C. briggsae 21U-RNA was annotated based on the highest-scoring 5'
end corresponding to the conserved core motif. The number of reads from
C. elegans is indicated, as is the motif score for each 21U-RNA ortholog.
21U-RNAs Are Expressed in the C. elegans Germline
The developmental dynamics of 21U-RNA expression were
examined by northern blot analysis using probes specific for
21U-RNA-1 and 21U-RNA-3442. Both small RNAs were expressed at low levels from the L1 to
L3 stage, began to accumulate to high levels during the L4 stage, and reached maximal
expression in the young adult and gravid adult stages (Figure 2A).
This pattern of expression correlated with the proliferation of
the germline and was consistent with a germline origin. Both
RNAs were expressed at approximately equal levels in male- or female-enriched populations
(Figure 2B) but were absent in RNA samples prepared from germline-deficient glp-4(bn2) and
eft-3(q145) mutant populations (Figure 2B). Finally, both small
RNAs were present in embryos (Figure 2A), which may reflect
maternal and/or paternal loading.
High-throughput sequencing indicated that the developmental expression profile for the
entire class of 21U-RNAs was indistinguishable from that of 21U-RNA-1 and 21U-RNA-3442
(Figure 2C). The number of sequenced reads for each 21U-RNA species increased
dramatically in late larval and adult stages. Furthermore, the number of reads was
reduced (130-fold), from 5.8% to just 0.04% of total reads, in animals lacking
a germline (Figure 2C). Adult hermaphrodites switch to an exclusively female mode of
gametogenesis and store only 200 to 300 mature sperm. The relative abundance of various
individual 21U-RNA species was comparable between male and adult hermaphrodite
populations, suggesting that very similar 21U-RNA populations are present in germlines
undergoing oogenesis and spermatogenesis.
Figure 2. 21U-RNAs Are Expressed in the C. elegans Germline
(A) RNA isolated from synchronized wild-type populations at the indicated
developmental stages analyzed on a northern blot, successively probing for
two 21U-RNAs, a miRNA, or a loading control (the SL1 precursor).
(B) RNA isolated from wild-type worms, compared to that obtained from mutant
strains glp-4(bn2) and eft-3(q145), which lack a germline; fog-2(q71),
a male-only population; and fem-1(hc17), which lack sperm, analyzed as in (A).
(C) The expression profile for the bulk population of 21U-RNAs as determined
by large-scale sequencing. Plotted for each library is the percent of reads that
represented 21U-RNAs. Some libraries were prepared for sequencing starting
with Rnl2(1-249) ligase (light blue), and others were prepared starting with T4
RNA ligase 1 (dark blue; see Experimental Procedures).
PRG-1 Is Expressed in the Germline
and Required for 21U-RNA Accumulation
To examine whether the accumulation of 21U-RNA-1 and 21U-RNA-3442 was dependent on known
components of the RNAi machinery, we systematically examined RNA prepared from mutant
strains lacking specific components of the RNAi pathway.
The accumulation of 21U-RNAs did not require the wild-type activities of any of the
previously described RNAi pathway components, including DCR-1 (Figure 3A, left, and
Figure S2).
To determine if accumulation of 21U-RNAs is dependent on
any AGO proteins, we also analyzed mutant strains representing
all of the C. elegans AGO family members, including several multiple
mutant strains. Only prg-1 mutants lacked 21U-RNA-1 and
21U-RNA-3442 (Figure 3A, right, and data not shown). Strains
mutant for prg-2, a nearly identical homolog of prg-1, did not
exhibit defects in 21U-RNA expression (Figure 3A, right). We
observed no defects in miRNA expression. However, we did
note two 21U-RNAs that appear to have been misannotated as
miRNAs (see Supplemental Results). Moreover, prg-1 mutants
exhibited a wild-type RNAi response to foreign dsRNA (data
not shown). These findings suggested that prg-1 was defective
specifically in the 21U-RNA pathway.
Consistent with the genetic requirement of prg-1 for 21U-RNA
accumulation, the stage-specific expression of PRG-1 protein
was coincident with that of 21U-RNA-1 and 21U-RNA-3442.
PRG-1 levels were reduced in L1/L2 and L2/L3 worms when
compared with L4 worms, as well as young and gravid adults
(Figure 3B). As observed for 21U-RNAs, we could also detect the
PRG-1 protein in embryo extracts, and we were unable to detect
PRG-1 in the glp-4(bn2) mutant strain, suggesting that this protein
is expressed in the germline. PRG-1 was also present in protein
extracts from both female- and male-enriched populations. Curiously,
the expression of prg-1 was reduced in wild-type worms
cultured at 25°C (Figure 3B). Analysis of the expression of the
prg-1/prg-2 mRNA by real-time PCR revealed an expression pattern
similar to that observed for the PRG-1 protein. The only exception
observed was in the embryonic stage (Figure 3B). Although we
could detect a high level of the PRG-1 protein in embryos, the
mRNA was almost undetectable, supporting the idea that PRG-1
complexes in embryos are parentally derived.
In wild-type worms, we observed a striking localization of
PRG-1 in the cytoplasm and in prominent cytoplasmic structures
in germ cells at nearly all stages of germline development. In both
hermaphrodites and males, PRG-1 formed perinuclear foci in
both the mitotic and meiotic zones of the germline (Figures 3C
and 3D). In mature oocytes the staining persisted, but PRG-1
foci lost their perinuclear association and became dispersed in
the cytoplasm (Figure 3C and data not shown). In males, all
PRG-1 staining disappeared abruptly as spermatids matured
(Figure 3D). The pattern of PRG-1 localization, including its
localization during embryogenesis (Figures 3E and 3F), resembled
that of P granules, which are components of the C. elegans
germline cytoplasm, or nuage (Strome and Wood, 1982; Strome,
2005). Indeed, the localization of PRG-1 perfectly overlapped,
throughout development, the localization of the previously described
P granule component, PGL-1 (Kawasaki et al., 1998;
Figure 3G; and data not shown).
21U-RNAs Depend on and Interact Physically
with PRG-1
To determine whether PRG-1 is required more broadly for 21U-RNA
accumulation, we performed high-throughput sequencing
analysis on small-RNA populations prepared from prg-1 mutant
animals and from wild-type animals reared at 20°C. For wild-type
animals approximately 11% of the 1,789,450 genome-matching
reads corresponded to the 21U-RNAs, whereas for prg-1 mutant
animals less than 0.05% of the 1,774,442 genome-matching
reads corresponded to 21U-RNAs (Figure 4A). This dramatic reduction in 21U-RNAs
resembled that observed in animals lacking a germline altogether (Figure 4B). However,
prg-1 animals maintained at 20°C were fertile and exhibited nearly wild-type
levels of another class of germline-enriched small RNAs, the endogenous siRNAs
(Figure 4C). These findings indicate that prg-1
is required for the accumulation of the entire 21U-RNA class of
small RNAs.
To examine whether the 21U-RNAs physically interact with
PRG-1, we immunoprecipitated the PRG-1 protein complex
along with associated RNA. Both 21U-RNA-1 and 21U-RNA-3442 coprecipitated with the
PRG-1 immune complex but not with precipitates recovered using preimmune serum
(Figure 4D). Small-RNA species that did not require PRG-1 activity for accumulation,
such as miR-66, were not detected in PRG-1 immunoprecipitates (Figure 4D). In contrast,
we found that ALG-1/ALG-2 AGO-associated immune complex contained
miR-66 but not 21U-RNA-1 or 21U-RNA-3442 (Figure 4D).
Biochemical analysis of small RNAs recovered in the PRG-1 IP
complex demonstrated a strong bias for small RNAs with 5' U
(>91%) compared to the total input population, which was enriched for 5' G (>70%;
Figure 4E). Similarly, deep sequencing of small-RNA libraries prepared from the IP
sample demonstrated a dramatic enrichment for 21 nt RNAs with 5' U in the
PRG-1 complex (Figure 4F). In addition, 21mers with high-scoring motif matches were
dramatically enriched in the IP sample (Figure 4G) and mapped comprehensively across
the previously described 21U-RNA clusters on chromosome IV (Figure 4H). No
other RNA species was significantly enriched in the PRG-1 IP.
The above observations suggest that PRG-1 specifically binds
21U-RNAs to form a complex important for germline function
and fertility.
Figure 3. PRG-1 Protein Is Expressed in
the Germline and Required for 21U-RNA
Accumulation
(A) Northern blot analysis of 21U-RNA-1, 21U-RNA-3442, and miR-66 expression in wild-type
and the indicated homozygous strains. The double
mutant was prg-1(tm872); prg-2(tm1094). The SL1
precursor served as a loading control.
(B) The PRG-1 developmental expression profile.
Protein lysates generated from wild-type populations at distinct developmental stages
were analyzed using a western blot (top left), as were protein lysates from wild-type
worms and from the mutant strains examined in Figure 2B (top right).
Tubulin served as a loading control. Expression
of prg-1/prg-2 mRNA was analyzed by quantitative real-time PCR, using actin (act-3)
mRNA as the normalization standard (bottom panel).
(C-F) PRG-1 immunofluorescence (red) and DNA
DAPI staining (blue) in dissected gonad arms
from an adult hermaphrodite (C) and male (D),
a two-cell embryo (E), and a four-cell embryo (F).
In (C) and (D) the mitotic (MPZ) and meiotic zones
(transition zone plus pachytene) are indicated, as
are the proximal zones containing oocytes and
sperm (respectively).
(G) Dual immunofluorescence analysis of three
oocytes in the proximal arm of a wild-type hermaphrodite gonad stained for PRG-1 and PGL-1
as indicated. Yellow represents overlap in the
merged image (bottom panel).
prg-1 Mutants Exhibit a Broad Spectrum
of Germline Defects
A previous study demonstrated that RNAi targeting both prg-1
and prg-2 leads to reduced fertility (Cox et al., 1998). Our examination of the
phenotypic contributions of recently identified
probable null alleles revealed that most, if not all, of the germline
defects result from the absence of prg-1. For example, prg-2
mutants exhibited wild-type brood sizes at both 20°C and
25°C (Figure 5A) as well as normal numbers of morphologically
wild-type germ cells (compare Figures 5B and 5C). In contrast,
prg-1 mutants exhibited dramatically reduced fertility at both
temperatures (Figure 5A). Consistent with this phenotype, two
different prg-1 mutant strains and a prg-1; prg-2 double mutant
strain all exhibited a significant reduction in the total number of
germ nuclei populating the adult gonad (Figures 5D-5F). The
numbers of germ nuclei were reduced in each zone but were
most dramatically reduced in the mitotic zone in these mutants.
The reduction in germ cell numbers was observed at all temperatures, and thus does not
by itself explain the sterility of prg-1
mutants at 25°C.
[Figure 4: graphic not recoverable from the extracted text. Legible labels: endo-siRNAs; 21U-RNAs; 21UR-1; 21UR-3442; miR-66; α-PRG-1; α-GFP (GFP::ALG-1/2); Input; IP; 5' A, 5' U, 5' G, 5' C; axes "Length (nt)" and "21U-RNA upstream motif score".]
Figure 4. PRG-1 Interacts with and Is Required for the Accumulation of All 21U-RNAs
(A) The percentage of 21 nt RNA reads from wild-type young adults (blue) and prg-1(tm872) young adults (pink) corresponding to each upstream motif score (rounded to the nearest unit). A score cutoff of 7 (orange) defined the 21U-RNA population.
(B) Severe depletion of 21U-RNAs in glp-4(bn2) and prg-1(tm872) mutant worms. Plotted for each library is the fraction of reads corresponding to 21U-RNAs, with bars colored as in Figure 2C.
(C) Severe depletion of endogenous siRNAs in glp-4(bn2) but not prg-1(tm872) mutant worms. Plotted for each library is the fraction of reads with 5' G nucleotides and complete antisense overlap with coding exons (Ambros et al., 2003; Ruby et al., 2006), with bars colored as in Figure 2C.
(D) Immunoprecipitation (IP) analysis of small RNAs in PRG-1 and GFP::ALG-1/2 complexes. Immunoprecipitations were performed on lysates prepared from an otherwise wild-type transgenic strain carrying GFP-tagged ALG-1 and ALG-2. The top panels show a northern blot successively probed for the indicated small RNAs. The lower panels show western blots probed as indicated.
(E) Biochemical analysis of the first nucleotide of the small-RNA population that coimmunoprecipitated with the PRG-1 protein (IP). Bars show where the single nucleotides migrate in this thin-layer-chromatography system.
(F) The length and 5' nucleotide distribution of reads from the input (top) and PRG-1 co-IP (bottom) libraries. To prevent underrepresentation of endogenous siRNAs, which usually begin with a 5' triphosphate, these libraries were constructed using a protocol that does not require a 5' monophosphate.
(G) The percentage of 21 nt RNA reads from the input (blue) and PRG-1 co-IP (red) libraries at each upstream motif score, plotted as in (A).
(H) The mapping of 21U-RNA reads from the PRG-1 co-IP library (red) versus the young adult wild-type library prepared starting with T4 RNA ligase 1 (see Experimental Procedures; blue). Reads were classified as 21U-RNAs by their motif scores, and normalized read counts were summed for each nonoverlapping 100 kb bin.
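The binning described for panel (H) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the record layout, the function name `bin_21u_reads`, the reads-per-million normalization, and the treatment of the cutoff as "score greater than or equal to 7" are all assumptions made for the example.

```python
from collections import defaultdict

BIN_SIZE = 100_000   # nonoverlapping 100 kb bins, as in panel (H)
SCORE_CUTOFF = 7     # motif-score cutoff from panel (A); using ">=" is an assumption


def bin_21u_reads(reads, library_size, bin_size=BIN_SIZE, cutoff=SCORE_CUTOFF):
    """Sum normalized 21U-RNA read counts per genomic bin.

    reads: iterable of (chromosome, position, motif_score, read_count) tuples
           (a hypothetical record layout).
    library_size: total reads in the library; counts are scaled to reads per
           million, which is an assumed normalization.
    Returns {(chromosome, bin_start): reads_per_million}.
    """
    bins = defaultdict(float)
    for chrom, pos, score, count in reads:
        if score < cutoff:
            continue  # read is not classified as a 21U-RNA
        bin_start = (pos // bin_size) * bin_size
        bins[(chrom, bin_start)] += count * 1e6 / library_size
    return dict(bins)


# Made-up reads on chromosome IV, for illustration only.
example = [
    ("IV", 5_020_000, 12.3, 40),
    ("IV", 5_090_000, 8.1, 10),
    ("IV", 5_150_000, 3.0, 500),  # below the cutoff, so ignored
    ("IV", 5_150_500, 9.9, 5),
]
binned = bin_21u_reads(example, library_size=1_000_000)
```

Comparing two libraries then amounts to calling the function once per library and plotting the two per-bin totals against each other.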
Although prg-1 mutants exhibit temperature-dependent sterility, they do not appear to encode thermo-labile products.
Rather, both alleles examined in this study are likely to represent
null mutations (Yigit et al., 2006; Cuppen et al., 2007; Figure S3A).
As expected for null mutants, the PRG-1 protein was either absent or truncated in these mutant strains at all temperatures
(Figure S3B). Furthermore, the 21U-RNA depletion associated
with prg-1 mutants was observed at all temperatures examined,
including the semipermissive temperatures of 15°C and 20°C.
These findings suggest that, in addition to their role in maintaining proper germ-cell numbers at all temperatures, PRG-1/21U-RNA complexes may function at higher temperatures to facilitate
an otherwise temperature-dependent germline process required
for normal fertility.
Temperature-shift experiments demonstrated that the temperature-sensitive period of prg-1 mutants occurs during the
adult stage. The fertility of animals shifted down from 25°C as
young adults was substantially rescued, to an average of 40
Molecular Cell 31, 67-78, July 11, 2008 @2008 Elsevier Inc. 71
Figure 5. PRG-1 Exhibits a Broad Spectrum of Germline Defects
(A) Brood size analysis of prg-1 and prg-2 mutant strains. The brood size of "n" individual animals for each strain was determined at 20°C and 25°C. Left and right
lines represent the highest and lowest values, respectively. Left and right ends of each box represent the 75th and 25th percentile, respectively; the diamond
represents the average brood size; and the vertical line inside the box represents the median value.
(B-F) DAPI staining of excised gonads from wild-type, prg-1, and prg-2 strains (as indicated). Gonadal zones are indicated as in Figure 3.
progeny (n = 10). Conversely, maintaining animals at 15°C during the L1 to adult stage, when the germline is proliferating most rapidly, did not significantly rescue the fertility defect. These results suggest that the germ cells produced in prg-1 null mutant animals (that entirely lack PRG-1 protein expression) are deficient in a process important for their functionality at elevated temperature.
To examine the relative contribution of defects in sperm versus oocytes to the reduced fertility of prg-1 mutants, mutant hermaphrodites raised at 25°C were mated to wild-type males. The temperature-dependent sterility of prg-1 was partially rescued, as the average number of prg-1 progeny produced by animals reared at 25°C was 3 (n = 10), but this number increased to 19 (n = 10) when prg-1 mutants were mated with wild-type males. These findings suggest that the fertility defects of prg-1 hermaphrodites stem, in part, from defects in the production and/or functionality of both the male and female gametes.
In summary, prg-1 mutants exhibit dramatically reduced germ-cell numbers at all temperatures, and the gametes produced are markedly more sensitive to temperature than are those of wild-type animals. For example, at 25°C wild-type animals produce ~200 progeny, about two-thirds of the brood size observed at 20°C, while prg-1 mutants produce an average brood size of only 3 progeny at 25°C, less than one-tenth the brood size of 40 observed at 20°C. This reduction in brood size at higher temperature correlates with a reduction in the number of embryos observed, consistent with the idea that ovulation or fertilization are impaired at higher temperature.
prg-1 Mutants Exhibit Surprisingly Subtle Changes in Gene Expression
On chromosome IV hundreds of protein-encoding genes are interspersed with intergenic and intronic 21U-RNA loci over genomic regions that are millions of base pairs in length. Therefore, tiling arrays were used to profile changes in gene expression to determine whether the absence of 21U-RNAs in prg-1 mutants might cause significant perturbations of gene expression either on this autosome or elsewhere. We found that prg-1 and wild-type animals have broadly similar patterns of gene expression. Notably, genes located near 21U-RNA loci, including genes located within and around the major clusters of 21U-RNA loci on chromosome IV, were not significantly altered in their expression (Figure 6A). Among 88 groups of developmentally coregulated genes, also referred to as gene "mountains" (Kim et al., 2001), 66 were essentially unchanged between the wild-type and prg-1 strains (Figure 6B). Among the 16 mountains with decreased expression in prg-1 mutants were several mountains with germline functions such as cell division and oogenesis. Among the six mountains with increased expression was one containing spermatogenesis-related genes.
In C. elegans a large class of RdRP-derived endogenous siRNAs (endo-siRNAs) target transposons and repetitive sequences as well as numerous protein-encoding genes (Ambros et al., 2003; Ruby et al., 2006; W.G. and D.C., unpublished data). Although PRG-1 does not appear to interact directly with small RNAs of this type (Figure 6C and Tables S2 and S3), we wondered whether 21U-RNAs might be linked, perhaps indirectly, to changes in the patterns of endo-siRNA expression. In many
instances, changes in endo-siRNA levels correlated inversely with changes in gene expression from the corresponding interval (Figure 6D and Table S4). However, the regions with significant changes in endo-siRNA levels were not correlated with regions containing 21U-RNAs or sequences with extended sequence similarity to 21U-RNAs.
One curious exception to this finding was the transposon Tc3, within which resides a single 21U-RNA. Found in all 22 Tc3 genomic loci, 21U-RNA-15703 overlaps the 3' inverted repeat (IR) downstream of, and in the same orientation as, the transposase gene (Figure 6E). This sequence was identified three times among two million reads in our small-RNA library prepared from the PRG-1 immune complex, an apparent enrichment when compared to only 12 reads in over thirty million from the remaining non-IP-associated data set. Examination of the endo-siRNA profile across a representative Tc3 element revealed two types of endo-siRNA reads. The first were antisense to the transposase gene and were unaffected in prg-1(tm872) mutants (Figure 6F). The second were directed, with a marked strand asymmetry, toward the Tc3 IR regions and were severely depleted in prg-1(tm872) mutants (Figure 6F). Neither the IR-directed nor the transposase-directed siRNAs exhibited coimmunoprecipitation with PRG-1 (Figure 6G). Although the numbers of endo-siRNAs targeting the transposase gene were not significantly reduced in prg-1, we nevertheless observed a 3- to 4-fold upregulation of the Tc3 transposase mRNA (Figure 6H). Upregulation of the transposon mRNA, as well as a greater than 100-fold increase in Tc3 transposition frequency, were also observed for two different prg-1 mutant alleles in a parallel study (Das et al., 2008 [this issue of Molecular Cell]; see Discussion).
DISCUSSION
AGO-protein/small-RNA complexes mediate biological activities that fall into the two broad categories of genomic surveillance and gene regulation. Several studies suggest that a metazoan-specific branch of the AGO family, called the Piwi AGOs, have become specialized to provide surveillance functions required for germline maintenance in animals (reviewed in Aravin et al., 2007). C. elegans contains one of the largest and best studied families of AGO proteins. Yet, beyond a general requirement for fertility (Yigit et al., 2006), the function of C. elegans Piwi-related AGOs and the nature of their small-RNA cofactors had not been explored. We have shown that PRG-1, a Piwi subfamily AGO, interacts with 21U-RNAs, which are encoded by over 15 thousand genomic loci broadly clustered in two regions of chromosome IV. These findings link this unusual class of small RNAs to an RNAi-related pathway and suggest that PRG-1 and 21U-RNAs form an RNP complex required for proper germline development. The sequence repertoire of 21U-RNAs appears to be more diverse than expected by chance, and, with the exception of Tc3 discussed below, obvious sequence-specific targets for 21U-RNAs are not found in the C. elegans genome.
piRNAs in Worms, Flies, and Mammals
Piwi AGOs bind small RNAs (piRNAs) with the following characteristics: a Dicer-independent biogenesis, a 5' end with a monophosphate and a strong bias for Uracil, and a 3' end that is modified and resistant to periodate degradation (reviewed in Klattenhoff and Theurkauf, 2008). The C. elegans 21U-RNAs share these characteristics but also exhibit several other unique properties (Ruby et al., 2006). Perhaps the most remarkable distinction is that 21U-RNAs originate from thousands of loci that frequently share a common upstream motif and are clustered in two large regions of one autosome. Within these two large regions of two million and four million base pairs, respectively, the 21U-RNA loci are interspersed on both strands and rarely overlap with each other, repeat elements, or coding regions. Instead they localize to introns and intergenic regions within these chromosomal regions at an average density of one 21U-RNA locus every 200-300 bp.
In other organisms, piRNAs lack discernible upstream motifs and are often found in much smaller clusters dispersed on all chromosomes. In flies a subgroup of piRNAs, originally termed repeat-associated siRNAs (rasiRNAs), are derived primarily from within repeats and transposons and appear to target transposons for silencing (Brennecke et al., 2007; Gunawardane et al., 2007; Saito et al., 2006). Furthermore, unlike 21U-RNAs, repeat-associated piRNAs derived from opposite strands frequently overlap.
In mammals, two types of piRNA clusters have been identified based on their temporal expression during spermatogenesis. Similar to Drosophila rasiRNAs, piRNAs expressed prior to meiotic pachytene in mice are derived from repeat- and transposon-rich clusters. These rasi-like piRNAs interact with the MILI AGO, which is expressed in the same developmental stages (Aravin et al., 2007). During pachytene a second type of piRNA becomes abundant, which is derived from clusters that differ from both 21U-RNA clusters and rasiRNA clusters. These pachytene piRNA clusters span tens of thousands of bases, the length of a typical pre-mRNA transcript. Within these clusters the piRNAs exhibit remarkable strand bias, as though all the piRNAs within a region are processed from a single RNA polymerase II transcript or from two divergent transcripts (Aravin et al., 2006; Girard et al., 2006; Grivna et al., 2006; Lau et al., 2006). In contrast, neighboring 21U-RNA loci, even those within the same intron of an annotated gene, appear to have autonomous biogenesis, each with their own 5' motif and deriving from the opposite strand about as often as from the same strand.
Despite these striking differences, mammalian pachytene piRNAs are similar to 21U-RNAs in one very intriguing way. Both types of small RNA encode tremendous sequence diversity and yet seem to lack obvious targets. In general, 21U-RNAs do not match repeat sequences or protein coding genes with a frequency any higher than that expected by chance.
Piwi-AGO Complexes Exhibit a Conserved Localization in Germline Nuage
We have shown that the PRG-1 protein localizes to the germline nuage, called P granules, in C. elegans. In other animals, Piwi AGOs show similar localization. In both Drosophila (AGO3 and Aubergine) and zebrafish (Ziwi), Piwi proteins localize to perinuclear nuage structures (Brennecke et al., 2007; Houwing et al., 2007). A third Piwi protein from Drosophila, Piwi itself, exhibits a more complex distribution, localizing to the nuclei of both germ cells and somatic cells (Brennecke et al., 2007; Cox
[Figure 6 graphic not reproduced. Recoverable labels: log2 probe enrichment in prg-1(tm872) vs. WT; probe sets All, Exonic, 21U-rich, and 21U-poor; gene mountains including Rb complex, DNA synthesis, Oocyte-enriched, Cyclin, Topoisomerase, Histone, Mitosis, DNA repair, Germ line-enriched, Hermaphrodite-enriched, Meiosis, Chromatin, Programmed cell death, and Sperm-enriched; 66 mountains; read-frequency ratios from 0:10 to 10:0 (Input vs. IP); Tc3 schematic with inverted repeats and the Tc3A transposase gene; wild-type and prg-1(tm872) siRNA libraries; input and PRG-1 IP libraries; fold change in Tc3A transposase mRNA for wild type, prg-1(pk2298), and prg-1(tm872).]
Figure 6. prg-1 Mutants Exhibit Surprisingly Subtle Changes in Gene Expression
(A) Gene expression was not preferentially affected in the 21U-rich portions of the C. elegans genome. For each of the indicated probe sets, median values are shown with error bars indicating 25th and 75th percentiles and "n" indicating the number of probes.
(B) The overall expression of some gene mountains was significantly altered in the prg-1(tm872) mutant. All probes overlapping the exons of all genes from each mountain (Kim et al., 2001) were considered, and median log-fold changes were plotted as in (A) for those mountains changing by ≥0.4 log2 units.
(C) 21U-RNA depletion in the prg-1(tm872) mutant and enrichment in the PRG-1 co-IP. The x axis indicates the ratio of read frequencies between the input versus PRG-1 co-IP libraries described in Figures 4F-4H. The y axis indicates the ratio of antisense read frequencies between the wild-type and prg-1(tm872) mutant siRNA-enriched libraries (made using a protocol that does not require a 5' monophosphate and therefore captures endogenous siRNAs beginning with a 5' triphosphate). Each blue dot indicates the antisense read count for one gene whose wild-type siRNA-enriched read count is ≥500. Each red dot indicates the read count for a 21U-RNA species with ≥200 reads from the young adult wild-type library prepared starting with T4 RNA ligase 1 (see Experimental Procedures) and at least one read between the two libraries of each plot axis.
Figure 7. Models for 21U-RNA Function
(A) Regulation of Tc3 inverted repeats by PRG-1/21U-RNA-15703.
(B) Regulation of germline transcripts by imperfect base pairing.
[Figure 7 graphic not reproduced. Most labels are too garbled to recover; legible ones include the Tc3A transposase gene, the inverted repeats (IR), PRG-1/21U-RNA-15703, and a secondary WAGO.]
et al., 2000). In mice, the localization of Miwi and Mili has been
analyzed, and, although their expression peaks at different
times, both are cytoplasmic proteins present in developing spermatids but absent in mature sperm (Deng and Lin, 2002; Kuramochi-Miyagawa et al., 2004).
A striking feature of PRG-1 localization was its presence in P
granules throughout development. In germline stem cells and
developing gametes of C. elegans, P granules are localized in
a perinuclear pattern and are often found in apposition to nuclear
pores (Pitt et al., 2000). They are thought to function in the sorting
and storage of messages involved in gametogenesis and in subsequent parentally programmed zygotic development (Strome,
2005). In the fertilized egg and early embryo, the P granules dissociate from the nuclear periphery and are distributed in the
cytoplasm. In the male germline, P granules are present in dividing stem cells as well as meiotic spermatocytes but rapidly disappear as the spermatids mature. Finally, similar to other organisms
where piRNA expression correlates tightly with the expression of
their Piwi-class AGO binding partners (Aravin et al., 2006; Girard
et al., 2006; Houwing et al., 2007), the expression of 21U-RNAs
closely correlated with the expression of PRG-1.
A Potential Role for 21U-RNAs in Tc3 Silencing
In C. elegans, members of an expanded worm-specific AGO
clade (the WAGOs) are required for the majority of transposon
silencing and appear to function with RdRP-derived siRNAs
(Tijsterman et al., 2002). Surprisingly, the
silencing of a single transposon family,
Tc3, appears to depend on both WAGO
family members (Vastenhouw et al.,
2003) and on prg-1 (Das et al., 2008).
We found a single 21U-RNA, 21U-RNA-15703, that mapped to Tc3. This 21U-RNA appeared enriched among small RNAs recovered from the PRG-1 immune complex but was located downstream of the transposase 3' UTR in the sense orientation and thus could not directly silence the transposase mRNA. Interestingly, 21U-RNA-15703 was located just upstream of a series of siRNAs associated with the Tc3 inverted repeats (IR). The production of IR-associated siRNAs depended on prg-1 but also required the activities of two RdRPs and of an AGO in the WAGO clade (data not shown).
The production of the PRG-1-dependent IR-associated siRNAs could be explained by a two-step model similar to one previously described for RDE-1-directed silencing in C. elegans (Yigit et al., 2006; Sijen et al., 2007; Pak and Fire, 2007). If a PRG-1 complex containing 21U-RNA-15703 were to cleave a target RNA that extended into Tc3 from the downstream genomic region (Figure 7A), it could create a template for the RdRP-dependent synthesis of the secondary IR-associated siRNAs. How the loss of these IR-associated siRNAs might lead to activation of Tc3 in prg-1 mutants remains unclear. Perhaps their loss leads to alterations in chromatin structure in the IRs or to changes in the expression of IR-associated regulatory transcripts. Such changes could explain the 3- to 4-fold increase in transposase mRNA levels observed by qRT-PCR and might also render the IR genomic regions more accessible for transposase-directed endonucleolytic cleavage. The notion that PRG-1 may serve as an upstream AGO capable of triggering secondary siRNA production has implications for how other 21U-RNAs may function and could explain how loss of an exceptionally low-abundance 21U-RNA could cause the 100-fold increase in transposition of Tc3 (Das et al., 2008).
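To make the "apparent enrichment" of 21U-RNA-15703 concrete, the read counts quoted in the Results (three reads among two million in the PRG-1 IP library, versus 12 among over thirty million in the remaining data) can be converted to reads per million. The sketch below is illustrative arithmetic only: the library sizes are the rounded figures from the text, the function name is hypothetical, and with so few reads the ratio carries large sampling uncertainty.

```python
def reads_per_million(count, library_size):
    """Scale a raw read count to reads per million (RPM)."""
    return count * 1e6 / library_size


# Rounded values taken from the text; "over thirty million" is treated as
# exactly 30 million, so the fold value is only an approximation.
ip_rpm = reads_per_million(3, 2_000_000)        # PRG-1 IP library
other_rpm = reads_per_million(12, 30_000_000)   # remaining non-IP data

fold_enrichment = ip_rpm / other_rpm
print(f"IP: {ip_rpm:.2f} RPM, non-IP: {other_rpm:.2f} RPM, "
      f"fold: {fold_enrichment:.2f}")
```

Under these assumptions the sequence is present at about 1.5 RPM in the IP library and about 0.4 RPM elsewhere, a difference of roughly fourfold, which is consistent with describing it as both enriched and exceptionally low in abundance.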
Figure 6 (legend continued)
(D) Changes to mRNAs compared to their corresponding siRNAs in the prg-1(tm872) mutants. Each point indicates a gene with ≥10 array probes and >500 antisense reads from the WT siRNA-enriched library overlapping annotated exons. The x axis is as in (A). The y axis is as in (C).
(E) A schematic view of a full-length Tc3 transposon showing the inverted repeats (gray) and Tc3A transposase gene (red). The position of 21U-RNA-15703 is indicated with a red asterisk.
(F) Density of reads mapping to the sense (blue) and antisense (orange) strands of the Tc3 element from (E). Reads per 50 nt window are plotted for the wild-type (top) and prg-1(tm872) mutant (bottom) siRNA-enriched libraries. Read counts are not normalized to the number of genomic matches. Dashed gray lines indicate 0.002% of each library.
(G) Density of reads mapping to the sense (blue) and antisense (orange) strands of the Tc3 element from (E). Reads per 50 nt window are shown from the input (top) and PRG-1 co-IP (bottom) libraries. Read counts are not normalized to the number of genomic matches. Dashed gray lines indicate 0.002% of each library.
(H) Expression of the Tc3A mRNA. Primers recognizing Tc3A mRNA were used in quantitative RT-PCR on mRNA generated from worms with the indicated genotypes, using actin (act-3) mRNA as the normalization standard.
A Conserved Function for piRNA Complexes
in Maintaining Pluripotency
Despite differences in their expression and the types of clusters
from which they derive, our findings suggest that the overwhelming majority of 21U-RNAs and the abundant pachytene piRNAs
of mammals share some intriguing similarities. Perhaps most notably, they share the confounding feature that, with few exceptions, they lack recognizable targets upon which they might specifically act. Although a number of genes exhibit changes in
expression in prg-1 mutants, these changes could easily reflect
alterations that arise indirectly. A parallel study has suggested
that spermatogenesis-related gene expression is downregulated in prg-1 mutant males (Wang and Reinke, 2008). Conversely, our studies revealed an apparent upregulation of several
spermatogenesis-related genes in prg-1 mutant hermaphrodites. However, in these instances, unlike the Tc3 example, there
is no direct evidence linking specific 21U-RNAs to the regulated
genes; therefore, it seems probable that these apparent discrepancies reflect indirect consequences of developmental defects
and changes in germ-cell number that occur in the prg-1 mutant
gonads. Overall, our analyses suggest that there is no correlation
between genes whose expression is altered in prg-1 mutants
and the proximity of those genes to 21U-RNA loci.
One possible model to explain this paradox is to imagine that
PRG-1/21U-RNA complexes may base-pair imperfectly with targets. A precedent for this already exists with animal miRNAs and
most of their targets, for which pairing to miRNA seed nucleotides 2-8 is often sufficient for target recognition (Grimson
et al., 2007). However, if similar partial matches were sufficient
for piRNA-mediated regulation, then the entire transcriptome
could potentially be placed under 21U-RNA-directed regulation.
Perhaps 21U-RNAs act collectively, through partial sequence
matches, to negatively regulate gene expression broadly. For example, germline-expressed mRNAs recognized by 21U-RNA/
PRG-1 complexes could be stored in the cytoplasm (perhaps
within P granules) until a secondary factor releases repression
(Figure 7B). Such a mechanism would require the maintenance
of sequence diversity within the 21U-RNA family as a whole,
rather than conservation of specific 21U-RNA sequences.
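The miRNA precedent mentioned above, recognition through pairing to nucleotides 2-8, can be illustrated with a short Python sketch that finds perfect matches to that seed region in a target RNA. It is meant only to show how short such a match is (7 nt, expected by chance about once per 4^7 = 16,384 positions in random sequence with equal base frequencies). The function names and the example sequences are illustrative assumptions, and real target prediction considers much more than a seed match.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the reverse complement of an RNA string (A, C, G, U only)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def seed_sites(small_rna, target):
    """Return 0-based start positions in `target` (written 5' to 3') that are
    perfectly complementary to nucleotides 2-8 of `small_rna` (also 5' to 3').
    """
    seed = small_rna[1:8]            # nucleotides 2-8 in 1-based numbering
    site = reverse_complement(seed)  # the sequence a paired target must carry
    width = len(site)
    return [i for i in range(len(target) - width + 1)
            if target[i:i + width] == site]


# Arbitrary example sequences, not taken from the study.
small_rna = "UGAGGUAGUAGGUUGUAUAGUU"
target = "AAACUACCUCAAA"
positions = seed_sites(small_rna, target)
```

Because a 7 nt match recurs so often by chance, a family of thousands of distinct small RNAs acting through such partial matches could in principle contact a large share of the transcriptome, which is the point the paragraph above makes.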
Out of more than 15,000 different 21U-RNAs encoded in
C. elegans, only one transposon-directed 21 U-RNA was identified, strongly suggesting that transposon silencing is not the
only function mediated by this ancient metazoan-specific group
of AGOs. It is interesting to note that many mammals, including
humans, have, at great apparent cost to their fitness (Werdelin
and Nilsonne, 1999), derived morphological adaptations that
place the male germline external to the body cavity. Perhaps
this adaptation is necessary to facilitate the same temperature-sensitive process in gametogenesis that is also facilitated in
part by PRG-1.
EXPERIMENTAL PROCEDURES
Worm Strains
The Bristol strain N2 was used as the standard wild-type strain. Alleles used in
this study are listed below, grouped by chromosome: LGI: glp-4(bn2), prg-1(tm872), prg-1(pk2298), rde-3(ne3364), ego-1(om71), rrf-1(ok589), rrf-2(pk2040); LGII: rrf-3(pk1426); LGIII: dcr-1(ok247), rde-4(ne299), mut-7(ne311), eft-3(q145), qC1[nes(myo2::avr-15, rol-6, unc-22(RNAi))]; LGIV:
fem-1(hc17), prg-2(ok1328), prg-2(tm1094); LGV: fog-2(q71). AGO deletions
described in Yigit et al. (2006) were also assayed for levels of 21U-RNA-1
and 21U-RNA-3442. C. elegans culture and genetics were as described in
Brenner (1974).
Antibody Generation
Anaspec generated and purified the PRG-1 antibody in rabbits using the following peptides: RGSGSNNSGGKDQKYL and RQQGQSKTGSSGQPQKC.
Biochemistry and Molecular Biology
Protein and RNA purifications were performed as described in Hutvagner et al.
(2004) and Duchaine et al. (2006), respectively. Antibodies used in this study
are as follows: (1) monoclonal antibody anti-AFP 3E6 (Qbiogene), (2) an affinity-purified polyclonal anti-PRG-1 antibody, (3) HRP-conjugated secondary
antibody (Jackson Immunoresearch), (4) anti-tubulin (Accurate Chemical).
Northern blot analysis was performed as in Duchaine et al. (2006). A more detailed description can be found in the Supplemental Experimental Procedures.
Quantitative Real-Time PCR
Real-time PCR was performed using SuperScript II Reverse Transcriptase (Invitrogen) and Applied Biosystems SYBR Green PCR Master mix according to
the supplier's instructions. Primer sequences are available upon request.
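The text does not state how fold changes were computed from the real-time PCR data. One widely used approach for reference-normalized data, such as the act-3-normalized Tc3A measurements in Figure 6H, is the comparative Ct (2^-ΔΔCt) calculation. It is sketched here purely as an assumption about how such values could be derived. The function name and all Ct values are hypothetical, and the formula assumes roughly 100% amplification efficiency for both primer pairs.

```python
def fold_change_ddct(ct_target_mut, ct_ref_mut, ct_target_wt, ct_ref_wt):
    """Relative expression (mutant vs. wild type) by the 2^-ddCt method.

    ct_target_*: threshold cycle for the gene of interest (e.g., Tc3A).
    ct_ref_*:    threshold cycle for the reference gene (e.g., act-3).
    """
    delta_mut = ct_target_mut - ct_ref_mut   # normalize mutant to reference
    delta_wt = ct_target_wt - ct_ref_wt      # normalize wild type to reference
    return 2.0 ** -(delta_mut - delta_wt)


# Hypothetical Ct values chosen only to illustrate the arithmetic.
fold = fold_change_ddct(ct_target_mut=22.0, ct_ref_mut=18.0,
                        ct_target_wt=24.0, ct_ref_wt=18.0)
```

With these made-up inputs the target crosses threshold two cycles earlier in the mutant after normalization, which corresponds to a fourfold increase.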
Immunostaining and Microscopy
Gonads were prepared for indirect immunofluorescence as in Pasierbek et al.
(2001) and incubated with primary antibody (K76 [Kawasaki et al., 1998] and
the anti-PRG-1 antibodies described above) overnight at 4°C. Cy-3 anti-mouse IgM, and FITC or TRITC anti-rabbit secondary antibodies (Jackson Immunoresearch), were used to detect K76 anti-PGL-1 and anti-PRG-1, respectively. Slides were mounted in Vectashield (Vector Labs) with DAPI. All images
were collected using a Hamamatsu Orca-ER digital camera mounted on
a Zeiss Axioplan 2 microscope and with Openlab software.
Small-RNA Cloning
Small endogenous C. elegans RNAs from embryos, five distinct larval stages
(L1, L2, L3, L4, and dauer), mixed-stage animals, young adults from glp-4(bn2), prg-1(tm872), fog-2(q71) mutant backgrounds, and wild-type control
worms were prepared for sequencing using a protocol derived from Lau
et al. (2001). Libraries generated from wild-type and prg-1(tm872) were constructed as described by W.G. and D.C. (unpublished data). To generate
small-RNA libraries from PRG-1 immunocomplexes, PRG-1 IPs were performed on 70 mg of total wild-type protein as described in Duchaine et al.
(2006). For comparison, total RNA was extracted from a fraction of worms
equivalent to that used for the PRG-1 IPs. These small-RNA libraries were constructed using a method that does not require a 5' monophosphate (Ambros
et al., 2003). PCR products generated for all the samples described above
were sequenced on a Solexa sequencing platform (Illumina, Inc.) (Seo et al.,
2004). Detailed description of the cloning protocols, as well as data analysis,
can be found in the Supplemental Experimental Procedures.
Biochemical Analysis of 5' Nucleotide
Small RNAs in the 18-26 nt range, obtained from total RNA and the RNA fraction that coimmunoprecipitated with PRG-1, were gel purified, treated with
Calf Intestinal Alkaline Phosphatase (NEB) in the presence of 1 U of Super RNase Inhibitor (Ambion), and labeled at the 5' end with T4 Polynucleotide Kinase
in the presence of γATP. The 5' end-labeled RNAs were gel purified and incubated with nuclease P1 (USBiological). Samples were spotted on a TLC plate
developed with 0.5 M lithium chloride.
Tiling Microarray Procedures
Total RNA was extracted as described above and prepared using the RiboPure total RNA isolation kit (Ambion). Labeling reactions were performed following the manufacturer's protocols with the GeneChip WT Double-Stranded
cDNA Synthesis Kit (Affymetrix), GeneChip Sample Cleanup Module (Affymetrix), and the GeneChip WT Double Stranded DNA Terminal Labeling Kit (Affymetrix). Array hybridization to GeneChip C. elegans Tiling 1.0R chips was done
using standard Affymetrix protocols and reagents. Signal values for each array
probe were calculated using Affymetrix Tiling Analysis Software 1.1.2 (bandwidth: 30; intensities: PM/MM) with three replicates of prg-1(tm872) experimental data sets and three wild-type control data sets. Probe overlap with annotations
was assessed using the Affymetrix-provided ce4 coordinate, which indicates
the genomic position matching the center of the array probe.
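The probe-to-annotation assignment described above, in which a probe counts as overlapping an annotation when its center coordinate falls inside it, and the median summaries shown in Figures 6A and 6B can be sketched as follows. This is a simplified illustration under stated assumptions: the data layout, function names, inclusive interval ends, and the use of Python's `statistics.quantiles` with the "inclusive" method are choices made for the example, not details taken from the paper.

```python
from statistics import median, quantiles


def probes_in_intervals(probes, intervals):
    """Return log2 fold changes of probes whose center lies in any interval.

    probes:    list of (center_position, log2_fold_change) pairs.
    intervals: list of (start, end) annotation coordinates, inclusive
               (for example, the exons of one gene set).
    """
    return [lfc for center, lfc in probes
            if any(start <= center <= end for start, end in intervals)]


def summarize(values):
    """Median with 25th and 75th percentiles, as plotted in Figure 6A."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return {"n": len(values), "median": median(values), "q1": q1, "q3": q3}


# Made-up probes and exon coordinates, for illustration only.
probes = [(150, -0.2), (220, -0.1), (450, 0.0), (480, 0.1),
          (900, 0.3), (1200, 2.0)]
exons = [(100, 500), (850, 950)]

exonic = probes_in_intervals(probes, exons)
summary = summarize(exonic)
```

A linear scan over intervals is adequate for a small example; for a whole-genome tiling array an interval index or sorted-coordinate search would be the more practical choice.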
ACCESSION NUMBERS
All RNA sequences extracted from Illumina reads as described were deposited
in the Gene Expression Omnibus with the following accession number:
GSE11738. Included under this accession number are the following data
sets: developmental time-course/mixed stage, 5' monophosphate-dependent; prg-1(tm872) and fog-2(q71) mutant analysis, 5' monophosphate-dependent; prg-1(tm872) mutant analysis, 5' monophosphate-independent;
and the PRG-1 co-IP analysis. 21 U-RNA sequences are provided as a supplemental Fasta-formatted text file (Table S1). Tools for scoring 21 U-RNA loci
trained using data from Ruby et al. (2006) and applied here are available for
anonymous download at http://web.wi.mit.edu/bartel/pub/.
SUPPLEMENTAL DATA
The Supplemental Data include Supplemental Results, Supplemental Experimental Procedures, three figures, and four tables and can be found with this
article online at http://www.molecule.org/cgi/content/full/31/1/67/DC1/.
ACKNOWLEDGMENTS
We thank our labmates for many helpful discussions and comments on the
manuscript; Fan Zhang for her early efforts on this project; Eric Miska for sharing unpublished data; and R. Ketting, the CGC, and the C. elegans Gene
Knockout Consortium for providing strains. P.J.B. is supported by a predoctoral fellowship from Fundação para Ciência e Tecnologia (SFRH/BD/11803/
2003), Portugal. D.A.C. is supported by a predoctoral fellowship from Fundação para Ciência e Tecnologia (SFRH/BD/17629/2004/H6BM). J.M.C. is an
HHMI fellow of the LSRF. C.C.M. and D.P.B. are Howard Hughes Medical Institute Investigators. This work was funded in part by the National Institutes of
Health (GM58800 and GM67031).
Received: December 21, 2007
Revised: June 3, 2008
Accepted: June 9, 2008
Published online: June 19, 2008
REFERENCES
Ambros, V., Lee, R.C., Lavanway, A., Williams, P.T., and Jewell, D. (2003).
MicroRNAs and other tiny endogenous RNAs in C. elegans. Curr. Biol. 13,
807-818.
Aravin, A., Gaidatzis, D., Pfeffer, S., Lagos-Quintana, M., Landgraf, P., Iovino,
N., Morris, P., Brownstein, M.J., Kuramochi-Miyagawa, S., Nakano, T., et al.
(2006). A novel class of small RNAs bind to MILI protein in mouse testes.
Nature 442, 203-207.
Aravin, A.A., Sachidanandam, R., Girard, A., Fejes-Toth, K., and Hannon, G.J.
(2007). Developmentally regulated piRNA clusters implicate MILI in transposon
control. Science 316, 744-747.
Blanchette, M., Kent, W.J., Riemer, C., Elnitski, L., Smit, A.F., Roskin, K.M.,
Baertsch, R., Rosenbloom, K., Clawson, H., Green, E.D., et al. (2004). Aligning
multiple genomic sequences with the threaded blockset aligner. Genome Res.
14, 708-715.
Brennecke, J., Aravin, A.A., Stark, A., Dus, M., Kellis, M., Sachidanandam, R.,
and Hannon, G.J. (2007). Discrete small RNA-generating loci as master regulators of transposon activity in Drosophila. Cell 128, 1089-1103.
Brenner, S. (1974). The genetics of Caenorhabditis elegans. Genetics 77,
71-94.
Cox, D.N., Chao, A., Baker, J., Chang, L., Qiao, D., and Lin, H. (1998). A novel
class of evolutionarily conserved genes defined by piwi are essential for stem
cell self-renewal. Genes Dev. 12, 3715-3727.
Cox, D.N., Chao, A., and Lin, H. (2000). piwi encodes a nucleoplasmic factor
whose activity modulates the number and division rate of germline stem cells.
Development 127, 503-514.
Cuppen, E., Gort, E., Hazendonk, E., Mudde, J., van de Belt, J., Nijman, I.J.,
Guryev, V., and Plasterk, R.H. (2007). Efficient target-selected mutagenesis
in Caenorhabditis elegans: toward a knockout for every gene. Genome Res.
17, 649-658.
Das, P.P., Bagijn, M.P., Goldstein, L.D., Woolford, J.R., Lehrbach, N.J., Sapetschnig, A., Buhecha, H.R., Gilchrist, M.J., Howe, K.L., Stark, R., et al.
(2008). Piwi and piRNAs act upstream of an endogenous siRNA pathway to
suppress Tc3 transposon mobility in the Caenorhabditis elegans germline.
Mol. Cell 31, this issue, 79-90.
Deng, W., and Lin, H. (2002). miwi, a murine homolog of piwi, encodes a cytoplasmic protein essential for spermatogenesis. Dev. Cell 2, 819-830.
Duchaine, T.F., Wohlschlegel, J.A., Kennedy, S., Bei, Y., Conte, D.J., Pang, K.,
Brownell, D.R., Harding, S., Mitani, S., Ruvkun, G., Yates, J.R., III, and Mello,
C.C. (2006). Functional proteomics reveals the biochemical niche of C. elegans
DCR-1 in multiple small-RNA-mediated pathways. Cell 124, 343-354.
Girard, A., Sachidanandam, R., Hannon, G.J., and Carmell, M.A. (2006). A
germline-specific class of small RNAs binds mammalian Piwi proteins. Nature
442, 199-202.
Grimson, A., Farh, K.K., Johnston, W.K., Garrett-Engele, P., Lim, L.P., and
Bartel, D.P. (2007). MicroRNA targeting specificity in mammals: determinants
beyond seed pairing. Mol. Cell 27, 91-105.
Grivna, S.T., Beyret, E., Wang, Z., and Lin, H. (2006). A novel class of small
RNAs in mouse spermatogenic cells. Genes Dev. 20, 1709-1714.
Gunawardane, L.S., Saito, K., Nishida, K.M., Miyoshi, K., Kawamura, Y., Nagami, T., Siomi, H., and Siomi, M.C. (2007). A slicer-mediated mechanism
for repeat-associated siRNA 5' end formation in Drosophila. Science 315,
1587-1590.
Horwich, M.D., Li, C., Matranga, C., Vagin, V., Farley, G., Wang, P., and
Zamore, P.D. (2007). The Drosophila RNA methyltransferase, DmHen1, modifies germline piRNAs and single-stranded siRNAs in RISC. Curr. Biol. 17,
1265-1272.
Houwing, S., Kamminga, L.M., Berezikov, E., Cronembold, D., Girard, A., van
den Elst, H., Filippov, D.V., Blaser, H., Raz, E., Moens, C.B., et al. (2007). A role
for Piwi and piRNAs in germ cell maintenance and transposon silencing in
Zebrafish. Cell 129, 69-82.
Hutvagner, G., and Simard, M.J. (2007). Argonaute proteins: key players in
RNA silencing. Nat. Rev. Mol. Cell Biol. 9, 22-32.
Hutvagner, G., Simard, M.J., Mello, C.C., and Zamore, P.D. (2004). Sequence-specific inhibition of small RNA function. PLoS Biol. 2, E98. 10.1371/journal.pbio.0020098.
Kawasaki, I., Shim, Y.H., Kirchner, J., Kaminker, J., Wood, W.B., and Strome,
S. (1998). PGL-1, a predicted RNA-binding component of germ granules, is
essential for fertility in C. elegans. Cell 94, 635-645.
Kim, S.K., Lund, J., Kiraly, M., Duke, K., Jiang, M., Stuart, J.M., Eizinger, A.,
Wylie, B.N., and Davidson, G.S. (2001). A gene expression map for Caenorhabditis elegans. Science 293, 2087-2092.
Kirino, Y., and Mourelatos, Z. (2007). The mouse homolog of HEN1 is a potential methylase for Piwi-interacting RNAs. RNA 13, 1397-1401.
Klattenhoff, C., and Theurkauf, W. (2008). Biogenesis and germline functions
of piRNAs. Development 135, 3-9.
Kuramochi-Miyagawa, S., Kimura, T., Ijiri, T.W., Isobe, T., Asada, N., Fujita, Y.,
Ikawa, M., Iwai, N., Okabe, M., Deng, W., et al. (2004). Mili, a mammalian member of piwi family gene, is essential for spermatogenesis. Development 131,
839-849.
Lau, N.C., Lim, L.P., Weinstein, E.G., and Bartel, D.P. (2001). An abundant
class of tiny RNAs with probable regulatory roles in Caenorhabditis elegans.
Science 294, 858-862.
Molecular Cell 31, 67-78, July 11, 2008 @2008 Elsevier Inc. 77
Lau, N.C., Seto, A.G., Kim, J., Kuramochi-Miyagawa, S., Nakano, T., Bartel,
D.P., and Kingston, R.E. (2006). Characterization of the piRNA complex from
rat testes. Science 313, 363-367.
Lin, H. (2007). piRNAs in the germ line. Science 316, 397.
Ohara, T., Sakaguchi, Y., Suzuki, T., Ueda, H., Miyauchi, K., and Suzuki, T.
(2007). The 3' termini of mouse Piwi-interacting RNAs are 2'-O-methylated.
Nat. Struct. Mol. Biol. 14, 349-350.
Pak, J., and Fire, A. (2007). Distinct populations of primary and secondary
effectors during RNAi in C. elegans. Science 315, 241-244.
Pasierbek, P., Jantsch, M., Melcher, M., Schleiffer, A., Schweizer, D., and
Loidl, J. (2001). A Caenorhabditis elegans cohesion protein with functions in
meiotic chromosome pairing and disjunction. Genes Dev. 15, 1349-1360.
Pitt, J.N., Schisa, J.A., and Priess, J.R. (2000). P granules in the germ cells of
Caenorhabditis elegans adults are associated with clusters of nuclear pores
and contain RNA. Dev. Biol. 219, 315-333.
Ruby, J.G., Jan, C., Player, C., Axtell, M.J., Lee, W., Nusbaum, C., Ge, H., and
Bartel, D.P. (2006). Large-scale sequencing reveals 21U-RNAs and additional
microRNAs and endogenous siRNAs in C. elegans. Cell 127, 1193-1207.
Saito, K., Nishida, K.M., Mori, T., Kawamura, Y., Miyoshi, K., Nagami, T.,
Siomi, H., and Siomi, M.C. (2006). Specific association of Piwi with rasiRNAs
derived from retrotransposon and heterochromatic regions in the Drosophila
genome. Genes Dev. 20, 2214-2222.
Saito, K., Sakaguchi, Y., Suzuki, T., Suzuki, T., Siomi, H., and Siomi, M.C.
(2007). Pimet, the Drosophila homolog of HEN1, mediates 2'-O-methylation
of Piwi-interacting RNAs at their 3' ends. Genes Dev. 21, 1603-1608.
Seo, T.S., Bai, X., Ruparel, H., Li, Z., Turro, N.J., and Ju, J. (2004). Photocleavable fluorescent nucleotides for DNA sequencing on a chip constructed
by site-specific coupling chemistry. Proc. Natl. Acad. Sci. USA 101, 5488-5493.
Sijen, T., Steiner, F.A., Thijssen, K.L., and Plasterk, R.H. (2007). Secondary
siRNAs result from unprimed RNA synthesis and form a distinct class. Science
315, 244-247.
Strome, S. (2005). Specification of the germ line. In WormBook, The C. elegans
Research Community, ed. 10.1895/wormbook.1.9.1, http://www.wormbook.org.
Strome, S., and Wood, W.B. (1982). Immunofluorescence visualization of
germ-line-specific cytoplasmic granules in embryos, larvae, and adults of
Caenorhabditis elegans. Proc. Natl. Acad. Sci. USA 79, 1558-1562.
Tijsterman, M., Okihara, K.L., Thijssen, K., and Plasterk, R.H. (2002). PPW-1,
a PAZ/PIWI protein required for efficient germline RNAi, is defective in a natural
isolate of C. elegans. Curr. Biol. 12, 1535-1540.
Vastenhouw, N.L., Fischer, S.E., Robert, V.J., Thijssen, K.L., Fraser, A.G., Kamath, R.S., Ahringer, J., and Plasterk, R.H. (2003). A genome-wide screen
identifies 27 genes involved in transposon silencing in C. elegans. Curr. Biol.
13, 1311-1316.
Wang, G., and Reinke, V. (2008). A C. elegans Piwi, PRG-1, regulates 21U-RNAs during spermatogenesis. Curr. Biol. 18, in press. Published online May
22, 2008. 10.1016/j.cub.2008.05.009.
Werdelin, L., and Nilsonne, A. (1999). The evolution of the scrotum and testicular descent in mammals: a phylogenetic view. J. Theor. Biol. 196, 61-72.
Yigit, E., Batista, P.J., Bei, Y., Pang, K.M., Chen, C.C., Tolia, N.H., Joshua-Tor,
L., Mitani, S., Simard, M.J., and Mello, C.C. (2006). Analysis of the C. elegans
Argonaute family reveals that distinct Argonautes act sequentially during RNAi.
Cell 127, 747-757.
Nature Vol 455 | 30 October 2008 | doi:10.1038/nature07415
ARTICLES
Early origins and evolution of microRNAs
and Piwi-interacting RNAs in animals
Andrew Grimson1,2, Mansi Srivastava4, Bryony Fahey3, Ben J. Woodcroft3, H. Rosaria Chiang1,2, Nicole King4,
Bernard M. Degnan3, Daniel S. Rokhsar4,5 & David P. Bartel1,2
In bilaterian animals, such as humans, flies and worms, hundreds of microRNAs (miRNAs), some conserved throughout
bilaterian evolution, collectively regulate a substantial fraction of the transcriptome. In addition to miRNAs, other bilaterian
small RNAs, known as Piwi-interacting RNAs (piRNAs), protect the genome from transposons. Here we identify small RNAs
from animal phyla that diverged before the emergence of the Bilateria. The cnidarian Nematostella vectensis (starlet sea
anemone), a close relative to the Bilateria, possesses an extensive repertoire of miRNA genes, two classes of piRNAs and a
complement of proteins specific to small-RNA biology comparable to that of humans. The poriferan Amphimedon
queenslandica (sponge), one of the simplest animals and a distant relative of the Bilateria, also possesses miRNAs, both
classes of piRNAs and a full complement of the small-RNA machinery. Animal miRNA evolution seems to have been
relatively dynamic, with precursor sizes and mature miRNA sequences differing greatly between poriferans, cnidarians and
bilaterians. Nonetheless, miRNAs and piRNAs have been available as classes of riboregulators to shape gene expression
throughout the evolution and radiation of animal phyla.
The RNA interference (RNAi) pathway, which processes long
double-stranded RNA into small interfering RNAs and uses them
to mediate gene silencing, is present in diverse eukaryotes, presumably with a role in transposon silencing or viral defence since early in
eukaryotic evolution1. Building on this basal pathway, which
includes the Dicer endonuclease and the argonaute (Ago) effector
protein, some eukaryotic lineages have acquired additional pathways, each using unique classes of small RNAs to guide silencing.
MicroRNAs, ~21-24-nucleotide RNAs that derive from distinctive
hairpin precursors, pair to messenger RNAs to direct their post-transcriptional repression2. More than one-third of human genes
are under selective pressure to maintain pairing to miRNAs, implying
that these riboregulators influence the expression of much of the
transcriptome. Piwi-interacting RNAs are longer, ~25-30 nucleotides, with incompletely characterized biogenic pathways. In mammals and flies, piRNA expression is restricted to the germ line, where
they have crucial roles in transposon defence, although one class of
mammalian piRNAs, highly expressed at the pachytene stage of
sperm development, has unknown function.
The plant and algal miRNAs have gene structure, biogenesis and
targeting properties distinct from those of animals. These differences, considered together with the absence of miRNAs in fungi and
all other intervening lineages examined, have led to the conclusion
that miRNAs of animals and plants had independent origins. Of the
many miRNAs reported in Bilateria (Fig. 1), ~30 appear to have been
present in ancestral bilaterians; however, none have been reported
in the earliest branching animal lineages, leading to the hypothesis
that bilaterian complexity might, in part, be due to miRNA-mediated
regulation. Likewise, piRNAs have not been reported outside
Bilateria, raising the question of whether a rich small-RNA biology
is characteristic of more complex animals, or whether these small
RNAs might have emerged earlier in metazoan evolution.
Diverse microRNAs of the starlet sea anemone
Eumetazoa includes the Bilateria as well as the Cnidaria. Among
sequenced genomes Cnidaria is represented by the starlet sea anemone,
Nematostella vectensis. To explore whether cnidarians have miRNAs,
we sequenced complementary DNA libraries generated from 18-30-nucleotide RNAs isolated from Nematostella. High-throughput
sequencing yielded 2.9 million reads perfectly matching the
Nematostella genome (Fig. 2a). To identify miRNAs, we considered
properties that have proved useful for distinguishing bilaterian
miRNAs from other types of small RNAs represented in sequencing
data. The first criterion was the presence of reads mapping to an
inferred RNA hairpin with pairing characteristics of known miRNA
hairpins. The second was the presence of reads from both arms of the
Homo sapiens (human, 677 miRNAs)
Mus musculus (mouse, 491 miRNAs)
Drosophila melanogaster (fly, 147 miRNAs)
Caenorhabditis elegans (nematode, 154 miRNAs)
Schmidtea mediterranea (planarian, 61 miRNAs)
Nematostella vectensis (sea anemone)
Trichoplax adhaerens (placozoan)
Amphimedon queenslandica (sponge)
Monosiga brevicollis (choanoflagellate)
Schizosaccharomyces pombe (yeast, 0 miRNAs)
Saccharomyces cerevisiae (yeast, 0 miRNAs)
Neurospora crassa (fungus, 0 miRNAs)
Arabidopsis thaliana (flowering plant, 199 miRNAs)
Physcomitrella patens (moss, 263 miRNAs)
Chlamydomonas reinhardtii (green alga, 72 miRNAs)
Figure 1 | Phylogenetic distribution of annotated miRNAs. Cladogram of
selected eukaryotes, with organisms investigated in this study indicated in
red. Branching order of Bilateria is according to ref. 28 and the references
therein, and that of basal Metazoa is according to ref. 17 (Supplementary
Discussion). Annotated miRNA tallies are from miRBase (v10.1).
1Whitehead Institute for Biomedical Research, 9 Cambridge Center, Cambridge, Massachusetts 02142, USA. 2Howard Hughes Medical Institute, Department of Biology,
Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA. 3School of Integrative Biology, University of Queensland, Brisbane 4072, Australia. 4Department of
Molecular and Cell Biology and Center for Integrative Genomics, University of California at Berkeley, Berkeley, California 94720, USA. 5Department of Energy, Joint Genome Institute,
Walnut Creek, California 94598, USA.
©2008 Macmillan Publishers Limited. All rights reserved
hairpin that, when paired to each other, formed a duplex with
2-nucleotide 3' overhangs. This duplex corresponds to an intermediate of miRNA biogenesis in which the miRNA and opposing segment
of the hairpin, called the miRNA*, are excised from the hairpin
through successive action of Drosha and Dicer RNase III endonucleases2. The third criterion was homogeneity of the miRNA 5' terminus. Because pairing to miRNA nucleotides 2-8 is crucial for target
recognition, reads matching bilaterian miRNAs display less length
variability at their 5' termini than at their 3' termini.
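The third criterion can be illustrated with a minimal Python sketch that measures how homogeneous the 5' terminus is among reads from one hairpin arm. The function name, the (start, end, count) input layout and the toy read counts are hypothetical; they are not taken from the study's pipeline, and no particular acceptance threshold is implied.

```python
from collections import Counter

def five_prime_homogeneity(reads):
    """Return (dominant 5' start, fraction of reads sharing that start).

    reads: iterable of (start, end, count) tuples for one hairpin arm,
    in hairpin coordinates (hypothetical input layout).
    """
    starts = Counter()
    for start, _end, count in reads:
        starts[start] += count
    total = sum(starts.values())
    if total == 0:
        raise ValueError("no reads supplied")
    dominant_start, dominant_count = starts.most_common(1)[0]
    return dominant_start, dominant_count / total

# Toy example with made-up counts: most reads share the 5' end at
# position 10 but differ at their 3' ends.
toy_reads = [(10, 31, 80), (10, 32, 15), (10, 30, 3), (11, 32, 2)]
start, fraction = five_prime_homogeneity(toy_reads)
print(start, round(fraction, 2))
```

A candidate whose reads scatter across several 5' starts would yield a low fraction and would fit the criterion poorly.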
As exemplified by mir-2024d (Fig. 2b, c), 40 distinct Nematostella
loci met these criteria (Fig. 2d and Supplementary Data 1; identical
hairpins were not counted because they might have arisen from
genome-assembly artefacts). Additional features, not used as selection criteria, resembled those of bilaterian miRNAs2, thereby increasing confidence in our annotations. For example, the loci usually
mapped between annotated protein-coding genes (31 loci) or within
introns in an orientation suitable for processing from the pre-mRNA
(8 loci). The Nematostella miRNAs also had a tight length distribution (centring on 22 nucleotides, Fig. 2d), and five groups of
miRNAs (corresponding to 13 miRNAs) mapped near to each other
in an orientation suitable for production from the same primary
transcript (Supplementary Data 1), as occurs in bilaterians2. With
the exception of two miRNA pairs (miR-2024a,b and miR-2024f,g),
the Nematostella miRNAs had unique sequences at nucleotides 2-8,
suggesting notable diversity of miRNA targeting in this simple animal.
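Because targeting is defined primarily by miRNA nucleotides 2-8, the heptanucleotide sites that a miRNA is expected to recognize follow directly from its sequence. The sketch below derives them for miR-100 family sequences as they appear in the alignment of Fig. 2e. The site labels 7mer-m8 (match to nucleotides 2-8) and 7mer-A1 (match to nucleotides 2-7 followed by an A) are standard nomenclature that the text itself does not use, and all function names are hypothetical.

```python
# Watson-Crick complement for RNA (G:U wobble pairs are ignored in this sketch).
COMPLEMENT = str.maketrans("ACGU", "UGCA")

def reverse_complement(rna):
    return rna.translate(COMPLEMENT)[::-1]

def seed(mirna):
    """Nucleotides 2-8 of the miRNA (1-based numbering from the 5' end)."""
    return mirna[1:8]

def heptamer_sites(mirna):
    """Return the (7mer-m8, 7mer-A1) mRNA sites, written 5' to 3'."""
    site_m8 = reverse_complement(mirna[1:8])        # pairs with nt 2-8
    site_a1 = reverse_complement(mirna[1:7]) + "A"  # pairs with nt 2-7, then an A
    return site_m8, site_a1

def shared_seeds(mirnas):
    """Group miRNA names by seed and keep seeds shared by more than one miRNA."""
    groups = {}
    for name, seq in mirnas.items():
        groups.setdefault(seed(seq), []).append(name)
    return {s: names for s, names in groups.items() if len(names) > 1}

# Sequences as listed in Fig. 2e.
MIR100 = {
    "N. vectensis miR-100": "ACCCGUAGAUCCGAACUUGUGG",
    "H. sapiens miR-100": "AACCCGUAGAUCCGAACUUGUG",
    "H. sapiens miR-99a": "AACCCGUAGAUCCGAUCUUGUG",
}

for name, seq in MIR100.items():
    print(name, seed(seq), heptamer_sites(seq))
print(shared_seeds(MIR100))
```

A one-nucleotide shift of the 5' terminus changes every position of the seed, which is why two otherwise similar miRNAs can have entirely different predicted sites.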
Previous studies that explored the possibility that cnidarians might
have miRNAs searched for Nematostella homologues of the ~30
miRNA families broadly conserved within the Bilateria by probing
RNA blots and examining candidate hairpin sequences. These
studies reported the possible presence of miR-10, miR-33 and miR-100 family members in Nematostella. None of our reads matched the
proposed miR-10, miR-33 or miR-100 homologues, and none
matched the proposed hairpin precursors of miR-10 or miR-33.
Such discrepancies were not unexpected, because detection of distantly related miRNAs by hybridization is prone to false-positives,
and many genomic sequences can fold into hairpins. However, one
of the newly identified miRNAs arose from the hairpin of the reported
miR-100 homologue. The actual miRNA was offset by one nucleotide
compared to bilaterian miR-100 family members (Fig. 2e). Because
miRNA targeting is defined primarily by nucleotides 2-8, this offset is
expected to alter target recognition substantially, with the
Nematostella version primarily recognizing mRNAs containing
CUACGGG and UACGGGA heptanucleotide sites and the bilaterian
versions recognizing mRNAs with two different sites, UACGGGU and
ACGGGUA (ref. 3).
Despite this wholesale shift in their predicted targeting, the
Nematostella and bilaterian versions of miR-100 had similarity
throughout the RNA, suggesting common origins (Fig. 2e). This
result confidently extended the inferred origin of metazoan
miRNAs back to at least the last common ancestor of these eumetazoans. Systematic comparison to annotated miRNAs did not reveal
any additional Nematostella miRNAs with similarity exceeding that
of shuffled control sequences (Supplementary Fig. 1). Although the
short length of miRNAs may cause sequence divergence to obscure
common ancestry, it is noteworthy that only one of the 40
Nematostella miRNAs appeared homologous to extant bilaterian
miRNAs, and even this one seemed to have profoundly different
targeting properties.
[Figure 2, panels a-d: length distribution of reads (x axis, length (nt), 15-30; series, 5'-nt identity); reads mapping to the mir-2024d hairpin; predicted structure of the mir-2024d hairpin with miR-2024d and miR-2024d* marked; and a table of the 40 Nematostella miRNAs (miR-100, miR-2022, miR-2023, miR-2024a-g, miR-2025 to miR-2031, miR-2032a,b, miR-2033 to miR-2039, miR-2040a,b, miR-2041 to miR-2043, miR-2044a,b and miR-2045 to miR-2051) with columns miRNA, sequence, hairpin reads, miRNA reads and miRNA* reads.]
Panel e:
N. vectensis miR-100      -ACCCGUAGAUCCGAACUUGUGG
H. sapiens miR-100        AACCCGUAGAUCCGAACUUGUG
X. tropicalis miR-100     AACCCGUAGAUCCGAACUUGUG
D. rerio miR-100          AACCCGUAGAUCCGAACUUGUG
D. melanogaster miR-100   AACCCGUAAAUCCGAACUUGUG
H. sapiens miR-99a        AACCCGUAGAUCCGAUCUUGUG
H. sapiens miR-99b        CACCCGUAGAACCGACCUUGCG
X. tropicalis miR-99      AACCCGUAGAUCCGAUCUUGUG
D. rerio miR-99           AACCCGUAGAUCCGAUCUUGUG
Figure 2 | The miRNAs of N. vectensis. a, Length
distribution of genome-matching sequencing
reads representing small RNAs, plotted by 5'-nucleotide (nt) identity. Matches to ribosomal
DNA were omitted. b, Sequencing reads
matching the mir-2024d hairpin. The sequence of
the mir-2024d hairpin is depicted above the
bracket-notation of its predicted secondary
structure. The sequenced small RNAs mapping to
the hairpin are aligned below, with the number of
reads shown on the left, and the designated
miRNA and miRNA* species coloured red and
blue, respectively. Analogous information is
provided for the other newly identified miRNAs
(Supplementary Data 1). c, Predicted secondary
structure of the mir-2024d hairpin, indicating the
miRNA and miRNA* species. d, The 40
Nematostella miRNAs. MicroRNA read counts
include those sharing the dominant 5' terminus
but possessing variable 3' termini. Occasionally
the only sequenced miRNA* species
corresponded to a variant miRNA species rather
than the major species (counts in brackets).
e, Alignment of miR-100 homologues (Danio
rerio, D. rerio; Xenopus tropicalis, X. tropicalis).
MicroRNAs near the base of the metazoan tree
To determine whether miRNAs might be present in more deeply
branching lineages, we generated 2.5 million genome-matching reads
from the small RNAs of the demosponge A. queenslandica, a poriferan thought to represent the earliest diverging extant animal lineage6 (Figs 1 and 3a). Eight miRNA genes were identified in
Amphimedon adult and embryo samples (Fig. 3b and Supplementary Data 2), exemplified by mir-2018 (Fig. 3c). Six mapped
between annotated protein-coding genes; two fell within introns.
As is typical for bilaterian miRNAs2 and is also found in
Nematostella (Fig. 2d), reads from one arm of the hairpin usually
greatly exceeded those from the other arm, enabling unambiguous
annotation of the miRNA and miRNA* (Fig. 3b). However, the
number of reads from the two arms of the mir-2015 hairpin did
not differ substantially, suggesting that each might have similar propensities to enter the silencing complex and target mRNAs.
Moreover, the species from the 3' arm (miR-2015-3p) dominated
in adult tissue, whereas the one from the 5' arm (miR-2015-5p)
dominated in embryonic tissue (Fig. 3d), supporting the notion that
this single hairpin produces two distinct miRNAs, and implying an
intriguing, developmentally controlled differential loading into the
silencing complex.
In Amphimedon, pre-miRNA hairpins were larger than most of
those of other metazoans (Fig. 3e). The Nematostella pre-miRNAs
(including mir-100) fell at the other end of the spectrum, with a
median length less than that of bilaterian pre-miRNAs (Fig. 3e).
None of the Amphimedon miRNAs shared significant similarity with
any previously described miRNAs (Supplementary Fig. 1), or with
the miRNAs found in Nematostella. This observation, combined with
their unusually large pre-miRNA hairpins, raised the possibility of an
origin independent from that of eumetazoan miRNAs. Arguing
against this possibility, we found Amphimedon homologues of
Drosha and Pasha proteins (Table 1), which recognize the miRNA
primary transcript and cleave it to liberate the pre-miRNA hairpin.
Homologues of these proteins appeared to be absent in all lineages
outside the Metazoa, indicating a single origin for these processing
factors early in metazoan evolution and implying a single origin for
their miRNA substrates.
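The comparison of pre-miRNA hairpin sizes between species (Fig. 3e) relies on a Wilcoxon rank-sum test. As a rough illustration of that test, the sketch below computes the rank-sum (Mann-Whitney U) statistic and a two-sided P value from the normal approximation, without tie or continuity correction. The function name and the two lists of hairpin lengths are hypothetical toy values, not data from the study, and the normal approximation is crude for samples this small.

```python
import math

def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test via the normal approximation.

    Returns (U statistic for sample x, approximate two-sided P value).
    Ties receive average ranks; no tie or continuity correction is applied.
    """
    pooled = sorted((v, g) for g, sample in enumerate((x, y)) for v in sample)
    n = len(pooled)
    rank_sum_x = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    n1, n2 = len(x), len(y)
    u = rank_sum_x - n1 * (n1 + 1) / 2
    mean = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))
    return u, p

# Hypothetical pre-miRNA lengths (nt) for two species, for illustration only.
long_hairpins = [90, 95, 100, 110, 120]
short_hairpins = [55, 60, 62, 65, 70, 75]
print(rank_sum_test(long_hairpins, short_hairpins))
```

For real data, an established statistics library with exact or tie-corrected P values would be preferable to this hand-rolled version.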
A third animal lineage branching basal to the Bilateria is Placozoa,
represented by the sequenced species Trichoplax adhaerens7.
Although earlier analyses of mitochondrial genes suggested that
Trichoplax diverged before Amphimedon, genomic data indicate that
Trichoplax had a common ancestor with cnidarians and bilaterians
more recently than with Amphimedon (Fig. 1 and Supplementary
Discussion). Our study of Trichoplax small RNAs failed to find
miRNAs, despite acquiring many more reads than required to
identify miRNAs in all other animals and plants examined
(Supplementary Figs 2 and 3). Thus, despite the formal possibility
that Trichoplax miRNAs are expressed at levels so low that we failed to
detect them, we favour the hypothesis that all miRNA genes have
been lost in this lineage. Trichoplax is thought to have derived from a
more complex ancestor, having lost, for example, the hedgehog and
Notch signalling pathways7. Supporting our hypothesis, no Pasha
homologue was found in the Trichoplax genome, although we did
find the core RNAi proteins, argonaute and Dicer, suggesting the
production and use of small interfering RNAs (Table 1). Drosha,
which partners with Pasha during miRNA biogenesis, was found
also but might be required in the absence of miRNAs for ribosomal
RNA maturation9. Of the proteins involved in canonical miRNA
biogenesis, Pasha is the one without known functions outside the
miRNA pathway, and it was the one that appeared to have been
discarded, together with all miRNAs, from the Trichoplax genome
(Table 1).
We also sequenced small RNAs from the single-celled organism
Monosiga brevicollis (Supplementary Fig. 2), which represents the
closest known outgroup to the Metazoa20. We failed to detect any
plausible miRNAs, a result consistent with our subsequent finding
that Monosiga seems to lack all genes specific to small-RNA biology
(Table 1). The absence of Dicer and argonaute seemed to be derived
rather than ancestral, as the common ancestor of Monosiga and
metazoans possessed these core RNAi proteins (Table 1). The possibility that the absence of miRNAs in Monosiga might likewise be
derived prevented us from setting an early bound on the origin of
metazoan miRNAs.
In summary, miRNAs appear to have been available to shape gene
expression since at least very early in animal evolution. Nonetheless,
the numbers identified in simpler animals (8 unique miRNAs in
Amphimedon and 40 in Nematostella) were lower than those reported
in more complex animals (Fig. 1). Although miRNAs expressed only
under specific conditions or at restricted developmental stages were
possibly missed in these and other animals, our results are consistent
with the idea that increased organismal complexity in Metazoa correlates with the number of miRNAs and presumably with the number
of miRNA-mediated regulatory interactions.
Piwi-interacting RNAs in deeply branching animals
We next turned to the possibility that piRNAs also might have early
origins. Piwi proteins, the effectors of bilaterian piRNA pathways, are
found in diverse eukaryotic lineages (although not in plants or fungi,
Table 1), implying their presence in early eukaryotes. In cases characterized, however, the small RNAs associated with non-metazoan
Piwi proteins resemble siRNAs more than bilaterian piRNAs (deriving, for example, from Dicer-catalysed cleavage of long double-stranded RNA), raising the question of when piRNAs of the types
found in Bilateria might have emerged. The genomes of both
Amphimedon and Nematostella, but not that of Trichoplax, encode
Piwi proteins (Table 1) and express many ~27-nucleotide RNAs with
a 5'-terminal uridine (5'-U) (Figs 2a and 3a)-features reminiscent of
[Figure 3, panels a and c-e: length distribution of reads (x axis, length (nt); series, 5'-nt identity); predicted structure of the mir-2018 hairpin with miR-2018 and miR-2018* marked; fold enrichment of each miRNA in embryo versus adult samples; and cumulative distributions of pre-miRNA size (nt) for N. vectensis, H. sapiens, D. melanogaster, C. elegans, A. queenslandica and A. thaliana.]
Panel b:
miRNA        Sequence                Hairpin reads  miRNA reads  miRNA* reads
miR-2014     UGCCAAACAAGUCCGAUCUACA  17,843         17,043       178
miR-2015-3p  ACCUCUCCAUCAUGCAUGACA   2,703          1,086        [illegible]
miR-2015-5p  UCAUGUAUUGUGGAGGGGAGA   2,657          2,063        [illegible]
miR-2016     UAGAUUGGGCUUGGUCGGCAGA  37,606         36,675       7
miR-2017     UACCUGUGCACCUGUGUGCCCA  1,725          1,531        93
miR-2018     UGUCGGAGCCGGAGGUUCCGGA  1,529          1,309        107
miR-2019     AAAGUGAUCGGGUUGCCGUCUG  11,574         10,483       416
miR-2020     UGGGUAGUGUGUCUUUUCGGA   13,936         13,700       5
miR-2021     UGGUGGUCGGUGUUUCGUGGA   7,642          7,537        25
Figure 3 | The miRNAs of Amphimedon
queenslandica. a, Length distribution of genome-matching sequencing reads representing small
RNAs, plotted by 5'-nucleotide identity. Matches
to ribosomal DNA were omitted. b, The
Amphimedon miRNAs, shown as in Fig. 2d.
Information analogous to that of Fig. 2b is
provided for these miRNAs (Supplementary Data
2). c, Predicted secondary structure of the mir-2018 hairpin. d, Relative expression of
Amphimedon miRNAs, as indicated by
sequencing frequency from adult and embryo
samples. e, Cumulative distributions of pre-miRNA
lengths from miRNA transcripts of the
species indicated. Amphimedon pre-miRNAs
were significantly larger than those from any
other animal species examined (P < 10-,
Wilcoxon rank-sum test), whereas those from
Nematostella were significantly smaller
(P < 10-).
Table 1 | The small-RNA machinery of representative eukaryotes
Species                  Ago  Piwi  Dicer  Drosha  Pasha  Hen1
Homo sapiens             4    4     1      1       1      1
Drosophila melanogaster  2    3     2      1       1      1
Caenorhabditis elegans*  5    3     1      1       1      1
Nematostella vectensis†
3
2
5
1
1
1
1
2
3
01
1
Trichoplax adhaerens†
Amphimedon queenslandica†
Monosiga brevicollis
Saccharomyces cerevisiae
Schizosaccharomyces pombe||
Arabidopsis thaliana
Physcomitrella patens
Chlamydomonas reinhardtii
3
4
ot
1
0§
1
01
0
0
0
0
1
4
5
0
0
0
0
0
0
Ot
2
ot
ot
ot
2
1
3
0
0
1
01
01
01
0$
1
10
6
Ot
Ot
0t
Ot
2
* Omitted is a nematode-specific clade of proteins related to the Ago and Piwi protein families
but distinct from both.
† Protein sequences are listed in Supplementary Data 3.
‡ Inferred loss based on presence in earlier-diverging lineages.
§ Inferred loss based on presence in earlier-diverging lineages when assuming that Amphimedon
diverged before Trichoplax (Supplementary Discussion).
|| Ago and Dicer, but not Piwi, Drosha, Pasha or Hen1, were also identified in each of the
additional fungal species examined (Aspergillus nidulans, Neurospora crassa and Sclerotinia
sclerotiorum).
piRNAs in vertebrates and flies. Moreover, 45% of Nematostella 5'-U
27-30-nucleotide RNAs originated from only 89 genomic loci
(together comprising 0.4% of the genome), the largest of which was
62 kilobases, and essentially all of these small RNAs derived from one
strand of each locus (Fig. 4a and Supplementary Table 3). In these
respects the genomic loci producing a large fraction of the
Nematostella reads closely resembled the loci producing bilaterian
piRNAs, particularly the pachytene piRNAs. We observed a similar
clustering of genomic matches of Amphimedon 5'-U 24-30-nucleotide
RNAs, although the loci were smaller and accounted for fewer reads
(10% of the reads originating from 73 loci comprising 0.2% of the
genome, Supplementary Table 4).
Another characteristic of piRNAs is that they undergo Hen1-mediated methylation of their terminal 2' oxygen (ref. 22). To test for this
modification, we treated RNA from Nematostella and Amphimedon
with periodate and then re-sequenced from both treated and untreated
samples (Supplementary Fig. 4). Piwi-interacting RNAs and other
RNAs modified at their 2' oxygen remain unchanged with this treatment and are sequenced, whereas those with an unmodified 2',3' cis-diol are oxidized, which renders them refractory to sequencing. In
contrast to the Amphimedon miRNAs and many of the Nematostella
miRNAs (Supplementary Tables 1 and 2), reads corresponding to the
candidate piRNA clusters in both Nematostella and Amphimedon were
not reduced after treatment (Supplementary Tables 3 and 4), indicating that their terminal 2',3' cis-diol was modified. This modification, considered together with their other features characteristic of
vertebrate and fly piRNAs, including the length of 25-30 nucleotides,
the 5'-U bias, and the single-stranded, clustered organization of their
genomic matches, provided evidence that these small RNAs represented piRNAs of Nematostella and Amphimedon.
The piRNAs were the type of small RNAs most abundantly
sequenced in Nematostella and Amphimedon (Figs 2a and 3a, and
Supplementary Discussion). A similar phenomenon is observed in
mammalian testes, in which the pachytene piRNAs greatly outnumber
the miRNAs and initially obscured detection of a second class of
mammalian piRNAs, which resemble the most abundant Drosophila
piRNAs with respect to both their biogenesis and their apparent role in
suppressing transposon activity2 . Most of the Nematostella and
Amphimedon genomic loci with clustered piRNA matches resembled
the first class of piRNAs, in that they tended to fall outside of annotated genes (P < 10^-3, Wilcoxon rank-sum test) and spawned piRNAs
predominately from only one DNA strand (>99% and 96% from one
strand, Nematostella and Amphimedon, respectively). To determine
whether the second class of piRNAs might also exist in deeply branching lineages, we analysed the sequences from periodate-treated samples, focusing on the minority that matched annotated protein-coding
genes (Fig. 4b). As expected for class II piRNAs, these piRNAs did not
have such a strong tendency to match only one strand of the DNA
(62% and 64% antisense for Nematostella and Amphimedon, respectively). Moreover, among the predicted coding regions with the most
matches to the piRNAs, a significant fraction (18 of 50 in Nematostella,
P < 10^-3; 12 of 40 in Amphimedon, P = 0.03; Supplementary Tables 5
and 6) were homologous to transposases.
Having found small RNAs resembling bilaterian class II piRNAs
we looked for evidence that they were generated through the same
feed-forward biogenic pathway (refs 4, 5). In this pathway, primary piRNAs
from transcripts antisense to transposable elements pair to transposon messages and direct their cleavage. This cleavage defines the 5'
termini of secondary piRNAs generated from the transposon message, and these secondary piRNAs pair to piRNA transcripts, directing
cleavage and thereby defining the 5' termini of additional piRNAs
resembling the primary piRNAs. Because the primary piRNAs typically begin with a 5'-U and direct cleavage at the nucleotide that
pairs to position 10, the secondary piRNAs typically have an A at
[Figure 4 graphics omitted. Panel a: normalized reads along Scaffold 328 (50-140 kb), with a higher-resolution inset. Panel b: small RNAs mapped to an annotated pre-mRNA (geneID: 200314). Panel c: nucleotide identity by position for sense and antisense reads, Nematostella and Amphimedon.]
Figure 4 | The piRNAs of basal metazoans. a, Distribution of reads
matching a Nematostella piRNA locus. Plotted is the number of matching
reads with 5' nucleotide falling within each 100-nucleotide window (main
graph) or at each nucleotide (higher-resolution inset) spanning the genomic
region. Bars above and below the x axis indicate matches to the indicated
strand, with black bars indicating reads with a 5'-U and red bars indicating
the sum of all other reads. For reads also matching other genomic loci,
counts were normalized by total genome matches. Other annotated piRNA
loci are presented in Supplementary Tables 3 and 4. kb, kilobases. b, An
annotated pre-mRNA corresponding to numerous small RNAs resistant to
periodate treatment. Annotated coding segments (open boxes) and intron
segments (black line) are indicated. The gene was homologous to
endonuclease/reverse transcriptases of other genomes and presumed to be a
transposase. Small RNAs with unique 5' ends are represented by coloured
bars above or below the transcript (sense and antisense, respectively), with
colours indicating the read numbers (normalized to account for the number
of transcriptome matches). Small RNAs matching splice junctions (observed
only for sense reads) are represented by discontinuous bars, linked by
dashed lines. Other Nematostella and Amphimedon coding regions matching
candidate piRNAs are listed in Supplementary Tables 5 and 6. c, Nucleotide
composition of periodate-resistant small RNAs matching the indicated
strand of Nematostella or Amphimedon annotated coding regions.
position 10. Examination of all 27-30-nucleotide periodate-resistant
reads antisense to Nematostella coding regions revealed a propensity
for a 5'-U, characteristic of primary piRNAs (Fig. 4c). The sense-strand piRNAs lacked this 5'-U bias and instead displayed a propensity for an A at position 10 (Fig. 4c and Supplementary Fig. 5).
Moreover, sense and antisense reads that paired to each other tended
to have 10 base pairs formed between their 5' ends (Supplementary
Fig. 6). For the 24-30-nucleotide periodate-resistant reads from
Amphimedon, the same hallmark features of the back-and-forth, or
ping-pong, amplification cycle for piRNA biogenesis (refs 4, 5) were
observed (Fig. 4c and Supplementary Fig. 6). We conclude that the
two classes of piRNAs found previously in mammals and flies have
existed since the origin of metazoans: the class I piRNAs, represented
by the mammalian pachytene piRNAs, which have unknown function during germline development, and the class II piRNAs, which
use the ping-pong cleavage and amplification cascade to quiet
expression of certain genes, particularly those of transposons.
Indeed, the sequence-based transposon silencing by piRNAs, which
by virtue of the feed-forward amplification process focuses on the
most active transposon species, might be one of the principal drivers
of transposon diversity in animals.
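
The hallmarks of the ping-pong cycle described above (a 5'-U on primary piRNAs, an A at position 10 on secondary piRNAs, and a 10-base-pair overlap between paired 5' ends) can be tallied with a few lines of code. The following is a minimal sketch, not the analysis used in the paper; the function names and the input layout (lists of read sequences and of 5'-end coordinates) are assumptions made for illustration.

```python
from collections import Counter

def position_composition(seqs, position):
    """Fraction of each nucleotide at a 1-based position across reads."""
    counts = Counter(s[position - 1] for s in seqs if len(s) >= position)
    total = sum(counts.values())
    return {nt: n / total for nt, n in counts.items()} if total else {}

def five_prime_overlaps(plus_starts, minus_starts, max_overlap=30):
    """Histogram of overlap lengths between 5' ends of opposite-strand reads.

    plus_starts:  5'-end coordinates of plus-strand reads (leftmost base).
    minus_starts: 5'-end coordinates of minus-strand reads (rightmost base).
    An overlap of k means the two 5' ends are separated so that k bases pair.
    """
    minus = Counter(minus_starts)
    hist = Counter()
    for p in plus_starts:
        for k in range(1, max_overlap + 1):
            # A minus-strand read whose 5' end sits k-1 bases to the right
            # of this plus-strand 5' end overlaps it by exactly k bases.
            hist[k] += minus.get(p + k - 1, 0)
    return hist
```

A peak at k = 10 in the overlap histogram, together with a 5'-U bias on one strand and an A bias at position 10 on the other, is the pattern the text attributes to ping-pong amplification.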
Taken together, our results indicate that miRNAs and piRNAs, as
classes of small riboregulators, have been present since the dawn of
animal life, and indeed might have helped to usher in the era of
multicellular animal life. However, metazoan miRNA evolution
seems to have been very dynamic: all miRNAs have been lost in
Trichoplax, and the pre-miRNAs of Porifera, Cnidaria and Bilateria
have assumed distinct sizes. In addition, no miRNAs have recognizable conservation between poriferans, cnidarians and bilaterians,
with only one of the Nematostella miRNAs displaying recognizable
homology to bilaterian miRNAs, either because it is the only homologue of extant bilaterian miRNAs or because divergence has
obscured common ancestry of other miRNAs. The wholesale shifts
in miRNA function implied by this plasticity are congruent with the
report that, although thousands of miRNA-target interactions have
been maintained within each of the nematode, fly and vertebrate
lineages, very few appear to be conserved throughout all three
lineages (ref. 26). The plasticity of miRNA sequences over long timescales
helps to explain why the rich small-RNA biology in basal organisms
had escaped detection for so long.
METHODS SUMMARY
The M. brevicollis library was constructed as described and sequenced by 454
Life Sciences. All other libraries (Supplementary Table 7) were constructed using
an analogous method and sequenced on the Illumina platform.
Full Methods and any associated references are available in the online version of
the paper at www.nature.com/nature.
Received 5 June; accepted 12 September 2008.
Published online 1 October 2008.
1. Cerutti, H. & Casas-Mollano, J. A. On the origin and functions of RNA-mediated silencing: from protists to man. Curr. Genet. 50, 81-99 (2006).
2. Bartel, D. P. MicroRNAs: genomics, biogenesis, mechanism, and function. Cell 116, 281-297 (2004).
3. Lewis, B. P., Burge, C. B. & Bartel, D. P. Conserved seed pairing, often flanked by adenosines, indicates that thousands of human genes are microRNA targets. Cell 120, 15-20 (2005).
4. Brennecke, J. et al. Discrete small RNA-generating loci as master regulators of transposon activity in Drosophila. Cell 128, 1089-1103 (2007).
5. Aravin, A. A., Hannon, G. J. & Brennecke, J. The Piwi-piRNA pathway provides an adaptive defense in the transposon arms race. Science 318, 761-764 (2007).
6. Jones-Rhoades, M. W., Bartel, D. P. & Bartel, B. MicroRNAs and their regulatory roles in plants. Annu. Rev. Plant Biol. 57, 19-53 (2006).
7. Molnar, A., Schwach, F., Studholme, D. J., Thuenemann, E. C. & Baulcombe, D. C. miRNAs control gene expression in the single-cell alga Chlamydomonas reinhardtii. Nature 447, 1126-1129 (2007).
8. Zhao, T. et al. A complex system of small RNAs in the unicellular green alga Chlamydomonas reinhardtii. Genes Dev. 21, 1190-1203 (2007).
9. Pasquinelli, A. E. et al. Conservation of the sequence and temporal expression of let-7 heterochronic regulatory RNA. Nature 408, 86-89 (2000).
10. Hertel, J. et al. The expansion of the metazoan microRNA repertoire. BMC Genomics 7, 25 (2006).
11. Sempere, L. F., Cole, C. N., McPeek, M. A. & Peterson, K. J. The phylogenetic distribution of metazoan microRNAs: insights into evolutionary complexity and constraint. J. Exp. Zool. 306, 575-588 (2006).
12. Prochnik, S. E., Rokhsar, D. S. & Aboobaker, A. A. Evidence for a microRNA expansion in the bilaterian ancestor. Dev. Genes Evol. 217, 73-77 (2007).
13. Putnam, N. H. et al. Sea anemone genome reveals ancestral eumetazoan gene repertoire and genomic organization. Science 317, 86-94 (2007).
14. Ruby, J. G. et al. Large-scale sequencing reveals 21U-RNAs and additional microRNAs and endogenous siRNAs in C. elegans. Cell 127, 1193-1207 (2006).
15. Ruby, J. G. et al. Evolution, biogenesis, expression, and target predictions of a substantially expanded set of Drosophila microRNAs. Genome Res. 17, 1850-1864 (2007).
16. Larroux, C. et al. Genesis and expansion of metazoan transcription factor gene classes. Mol. Biol. Evol. 25, 980-996 (2008).
17. Srivastava, M. et al. The Trichoplax genome and the nature of placozoans. Nature 454, 955-960 (2008).
18. Lee, Y., Han, J., Yeom, K. H., Jin, H. & Kim, V. N. Drosha in primary microRNA processing. Cold Spring Harb. Symp. Quant. Biol. 71, 51-57 (2006).
19. Fukuda, T. et al. DEAD-box RNA helicase subunits of the Drosha complex are required for processing of rRNA and a subset of microRNAs. Nature Cell Biol. 9, 604-611 (2007).
20. King, N. et al. The genome of the choanoflagellate Monosiga brevicollis and the origin of metazoans. Nature 451, 783-788 (2008).
21. Yao, M.-C. & Chao, J.-L. RNA-guided DNA deletion in Tetrahymena: an RNAi-based mechanism for programmed genome rearrangements. Annu. Rev. Genet. 39, 537-559 (2005).
22. Horwich, M. D. et al. The Drosophila RNA methyltransferase, DmHen1, modifies germline piRNAs and single-stranded siRNAs in RISC. Curr. Biol. 17, 1265-1272 (2007).
23. Seitz, H., Ghildiyal, M. & Zamore, P. D. Argonaute loading improves the 5' precision of both microRNAs and their miRNA* strands in flies. Curr. Biol. 18, 147-151 (2008).
24. Aravin, A. A., Sachidanandam, R., Girard, A., Fejes-Toth, K. & Hannon, G. J. Developmentally regulated piRNA clusters implicate MILI in transposon control. Science 316, 744-747 (2007).
25. Gunawardane, L. S. et al. A slicer-mediated mechanism for repeat-associated siRNA 5' end formation in Drosophila. Science 315, 1587-1590 (2007).
26. Chen, K. & Rajewsky, N. Deep conservation of microRNA-target relationships and 3'UTR motifs in vertebrates, flies, and nematodes. Cold Spring Harb. Symp. Quant. Biol. 71, 149-156 (2006).
27. Yigit, E. et al. Analysis of the C. elegans Argonaute family reveals that distinct Argonautes act sequentially during RNAi. Cell 127, 747-757 (2006).
28. Bourlat, S. J., Nielsen, C., Economou, A. D. & Telford, M. J. Testing the new animal phylogeny: a phylum level molecular analysis of the animal kingdom. Mol. Phylogenet. Evol. 49, 23-31 (2008).
29. Griffiths-Jones, S., Saini, H. K., van Dongen, S. & Enright, A. J. miRBase: tools for microRNA genomics. Nucleic Acids Res. 36, D154-D158 (2008).
Supplementary Information is linked to the online version of the paper at
www.nature.com/nature.
Acknowledgements We thank M. Abedin and E. Begovic for preparing the
Monosiga and Trichoplax samples, respectively, W. Johnston for technical
assistance, and J. Grenier, C. Mayr, C. Jan and N. Lau for discussions. This work was
supported by an NIH postdoctoral fellowship (A.G.), and by grants from the NIH
(D.P.B.), Richard Melmon (M.S., N.K. and D.S.R.), the Center for Integrative
Genomics (M.S. and D.S.R.), the Gordon and Betty Moore Foundation (N.K.) and
the Australian Research Council (B.F., B.J.W. and B.M.D.). D.P.B. is an investigator
of the Howard Hughes Medical Institute.
Author Contributions A.G. constructed the libraries using procedures developed
by H.R.C., and analysed the sequencing reads and protein homology. M.S., B.F.,
B.J.W., N.K., B.M.D. and D.S.R. provided samples for RNA extraction. A.G. and
D.P.B. designed the study and prepared the manuscript, with input from other
authors.
Author Information RNA sequencing data were deposited in the Gene Expression
Omnibus (http://www.ncbi.nlm.nih.gov/geo/) under accession number
GSE12578. Reprints and permissions information is available at www.nature.com/
reprints. Correspondence and requests for materials should be addressed to D.P.B.
(dbartel@wi.mit.edu).
doi:10.1038/nature07415
METHODS
Small RNA sequencing. Samples of N. vectensis (mixed developmental stages,
including adult), A. queenslandica (adult tissue, stored in RNAlater, Ambion)
and M. brevicollis were ground under liquid nitrogen, and then RNA was
extracted with Trizol (Invitrogen). RNA from T. adhaerens (mixed developmental stages, including adult) and A. queenslandica (mixed embryos, from
cleavage stage to the larval stage (ref. 30), stored in RNAlater) was extracted directly
with Trizol. The M. brevicollis library was constructed as described and
sequenced by 454 Life Sciences. All other libraries (Supplementary Table 7) were
sequenced on the Illumina platform, and prepared as follows. The 18-30-nucleotide RNAs were purified from total RNA (typically 5 µg) using denaturing
polyacrylamide-urea gels. Before purification, trace amounts of 5'-32P-labelled
RNA size markers (AGCGUGUAGGGAUCCAAA and
GGCAUUAACGCGGCCGCUCUACAAUAGUGA) were mixed with the total RNA and used to monitor this purification and subsequent ligations and purifications. The gel-purified
RNA was ligated to pre-adenylated adaptor DNA (AppTCGTATGCCGTCTTCTGCTTG-[3'-3' linkage]-T) using T4 RNA ligase (10 units ligase, GE
Healthcare, 10 µl reaction, 50 pmol adaptor, ATP-free ligase buffer (ref. 31), for 2 h at
21-23 °C). Gel-purified ligation products were ligated to a 5'-adaptor RNA
(GUUCAGAGUUCUACAGUCCGACGAUC), again using T4 RNA ligase (as
above, except with 20 units ligase, 15 µl reaction supplemented with 4 nmol
ATP, 400 pmol adaptor, for 18 h at room temperature). Gel-purified ligation
products were reverse-transcribed (SuperScript II, Invitrogen, 30 µl reaction
with the reverse transcription primer CAAGCAGAAGACGGCATA) and then
RNA was base-hydrolysed with addition of 5 µl of 1 M NaOH and incubation at
90 °C for 10 min, followed by neutralization with addition of 25 µl of 1 M HEPES,
pH 7.0, and desalting (Microspin G-25 column, Amersham). The resulting
cDNA library was amplified with the RT primer and PCR primer
(AATGATACGGCGACCACCGACAGGTTCAGAGTTCTACAGTCCGA) for a
sufficient number of cycles (typically ~20) to detect (SYBR Gold, Invitrogen) a
clear band in a 90% formamide, 8% acrylamide gel, used for purification. Gel-purified amplicon (85-105 nucleotides) from each library was subjected to
Illumina sequencing. The adaptor and primer sequences enabled cluster generation on the Illumina machine and placed a binding site for the sequencing primer
(CGACAGGTTCAGAGTTCTACAGTCCGACGATC) adjacent to the sequence
of the small RNA. Periodate-treated libraries were generated identically, except
total RNA was first subjected to β-elimination (ref. 32). Mock-treated libraries omitting
periodate were constructed in parallel.
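
Because the sequencing primer sits directly upstream of the small-RNA insert, each read begins with the small RNA and runs into the 3' adaptor. The sketch below illustrates one simple way such reads could be trimmed; it is an assumption for illustration (exact matching only, a hypothetical `min_match` cutoff), not the procedure used in the paper. It also checks that the reverse transcription primer quoted above is the reverse complement of the last 18 nucleotides of the 3' adaptor.

```python
ADAPTOR_3P = "TCGTATGCCGTCTTCTGCTTG"  # 3' adaptor DNA quoted in the text
RT_PRIMER = "CAAGCAGAAGACGGCATA"      # reverse transcription primer quoted in the text

def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq))

def trim_3p_adaptor(read, adaptor=ADAPTOR_3P, min_match=6):
    """Return the insert preceding the 3' adaptor, or None if none is found.

    Exact matching only (an assumption); min_match is a hypothetical cutoff
    for how much adaptor must be visible at the end of a short read.
    """
    idx = read.find(adaptor)
    if idx != -1:
        return read[:idx]
    # The read may end partway through the adaptor: look for the longest
    # adaptor prefix that forms the 3' end of the read.
    for n in range(min(len(adaptor), len(read)) - 1, min_match - 1, -1):
        if read.endswith(adaptor[:n]):
            return read[:-n]
    return None
```

For example, a 36-nucleotide read made of a 22-nucleotide insert followed by the first 14 nucleotides of the adaptor is trimmed back to the insert, while a read with no recognizable adaptor returns `None` and would be discarded.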
MicroRNA identification and analysis. The N. vectensis, T. adhaerens and M.
brevicollis genomes and predicted gene sets (refs 13, 17, 20) were downloaded from JGI
(http://jgi.doe.gov); the A. queenslandica genome was a preliminary assembly.
After removing the adaptor sequences, reads were collapsed to a non-redundant
set and matched to the appropriate genome. Genome matches were clustered if
neighbouring matches fell within either 50 nucleotides (Amphimedon,
Nematostella) or 500 nucleotides (Amphimedon) of each other. The increased
size of the clustering window used for the Amphimedon analysis (500 nucleotides) was necessary because the 50-nucleotide window was insufficient to
identify all Amphimedon miRNAs, owing to the increased size of their pre-miRNAs (Fig. 3e). No additional miRNAs were identified in Nematostella when
using a 500-nucleotide window. Sequences of clusters containing 17-25-nucleotide reads cloned at least twice were folded with RNAfold (ref. 33). If the most frequently
sequenced species was located on one arm of a predicted hairpin and the region
of the hairpin corresponding to that sequence contained 16 base pairs, the
candidate locus was examined manually for characteristics of known miRNAs,
using criteria described in the main text. Before comparing between adult and
embryonic libraries (Fig. 3d), counts corresponding to each mature miRNA
from each library were first normalized by the total number of genome-matching
reads in that library.
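
The collapsing, clustering and normalization steps in this paragraph can be sketched as follows. This is an illustrative outline under stated assumptions: the function names are hypothetical, genome matches are reduced to single coordinates on one strand of one scaffold, and "within 50 nucleotides" is read as a gap of at most 50 between neighbouring matches.

```python
from collections import Counter

def collapse_reads(reads):
    """Collapse reads to a non-redundant set, keeping the count of each."""
    return Counter(reads)

def cluster_positions(positions, window=50):
    """Group match coordinates; a gap larger than `window` starts a new cluster."""
    clusters = []
    for pos in sorted(positions):
        if clusters and pos - clusters[-1][-1] <= window:
            clusters[-1].append(pos)
        else:
            clusters.append([pos])
    return clusters

def normalize_by_library(counts, genome_matching_reads):
    """Divide each miRNA count by the library's total genome-matching reads."""
    return {name: n / genome_matching_reads for name, n in counts.items()}
```

Calling `cluster_positions` with `window=500` corresponds to the wider window that the text describes for Amphimedon.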
To detect possible homology between previously known miRNAs and either
Nematostella or Amphimedon miRNAs, we searched miRBase (version 10.1) for
miRNAs similar to our new miRNAs. Because miRNA conservation is most
pronounced within the miRNA 5' region (ref. 34), we first identified any known and
new miRNAs that shared a hexanucleotide within their first eight nucleotides,
allowing two-nucleotide offsets. Because of the limited length of the search
sequence, and the large number of miRNAs in miRBase, most Nematostella or
Amphimedon miRNAs shared a hexanucleotide with miRBase miRNAs. For all
such cases, we then searched for extended similarity between the pairs of
miRNAs. With the exception of the miR-100 relationship, no more than chance
similarity was observed (Supplementary Fig. 1). However, we cannot rule out the
possibility that additional homologous relationships are present but undetectable. Because miRNAs are shorter than most other genetically encoded molecules, sequence divergence can more easily obscure homologous relationships,
and although they resist changes in the seed region, which is crucial for target
recognition, divergence in this 5' region can be accelerated with the processes of
sub- and neo-functionalization.
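
The first-pass screen described above (a shared hexanucleotide within the first eight nucleotides, allowing two-nucleotide offsets) can be illustrated with a short sketch. It assumes that the offsets correspond to the three hexamers starting at positions 1, 2 and 3; the function names are hypothetical, and the extended-similarity step that follows in the text is not modelled.

```python
def leading_hexamers(mirna, region=8, k=6):
    """All k-mers contained in the first `region` nucleotides of a miRNA."""
    head = mirna[:region]
    return {head[i:i + k] for i in range(len(head) - k + 1)}

def share_hexamer(mirna_a, mirna_b):
    """True if the two miRNAs share a hexamer within their first eight nucleotides."""
    return bool(leading_hexamers(mirna_a) & leading_hexamers(mirna_b))
```

Because each miRNA contributes only three hexamers and the database is large, chance matches are frequent, which is why the text treats this screen only as a filter before looking for extended similarity.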
Piwi-interacting RNA identification and analysis. Nematostella 27-30-nucleotide RNAs and Amphimedon 24-30-nucleotide RNAs were mapped to their
respective genome, and at each matching locus counts were normalized, dividing
by the number of genome matches for the sequenced RNA. Regions with both a
high number of match-normalized reads (Nematostella: >1,000 per 10 kilobases;
Amphimedon: >100 per 5 kilobases) and a high diversity of read sequences
(Nematostella: >500 different sequences per 10 kilobases; Amphimedon: >50
different sequences per 5 kilobases) were identified; following the periodate
experiment we further evaluated these regions, which led to the removal of four
Amphimedon regions that had far fewer reads in the periodate-treated libraries.
The remaining regions are listed in Supplementary Tables 3 (Nematostella) and 4
(Amphimedon), which report the proportion of 5'-U match-normalized reads to
each strand and the ratio of match-normalized read counts in periodate-treated
compared to mock-treated libraries, after normalization for the number of genome-matching reads in each library. The number of predicted transcripts (refs 13, 16)
overlapping genomic piRNA clusters (Supplementary Tables 3 and 4) was calculated and compared to the number overlapping 1,000 random sets equal in size
and number to the piRNA clusters. Inferred protein sequences from predicted
transcripts matching the greatest number of periodate-resistant, match-normalized reads were compared to annotated protein sequences using BLAST.
Transcripts that were significantly similar to annotated transposons, or protein
domains implicated as transposases (for example reverse transcriptases) were
considered to encode transposases. A random selection of 100 predicted transcripts was searched similarly to ascertain significance (Nematostella: 3 out of
100; Amphimedon: 6 out of 100). When mapping to annotated protein-coding
regions (Fig. 4b), reads with both sense and antisense matches were distributed
to both the sense and antisense tallies after weighting by the proportion of their
sense and antisense matches.
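
The match normalization and density thresholds used to nominate piRNA loci can be sketched as below. This is a simplified illustration, not the paper's implementation: it assumes fixed, non-overlapping windows on a single scaffold, and the function names and input layout are hypothetical.

```python
from collections import defaultdict

def window_densities(hits, n_matches, window=10_000):
    """Match-normalized read counts and sequence diversity per genomic window.

    hits:      iterable of (read_id, position, read_count) genome matches.
    n_matches: dict mapping read_id to its total number of genome matches.
    """
    weight = defaultdict(float)
    distinct = defaultdict(set)
    for read_id, pos, count in hits:
        w = pos // window
        # Each locus receives the read count divided by the number of
        # places in the genome that the read matches.
        weight[w] += count / n_matches[read_id]
        distinct[w].add(read_id)
    return dict(weight), {w: len(ids) for w, ids in distinct.items()}

def candidate_windows(weight, diversity, min_reads=1000, min_distinct=500):
    """Windows exceeding both the read-density and the diversity thresholds."""
    return sorted(w for w in weight
                  if weight[w] > min_reads and diversity.get(w, 0) > min_distinct)
```

The defaults mirror the Nematostella thresholds quoted in the text (>1,000 reads and >500 different sequences per 10 kilobases); the Amphimedon values would be `window=5_000`, `min_reads=100` and `min_distinct=50`.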
Cataloguing of the small RNA machinery. To identify homologues of components of the small RNA machinery, all established family members from H.
sapiens, D. melanogaster, C. elegans, S. pombe and A. thaliana were used as
BLAST query sequences against all annotated protein sequences of each species
in Table 1. The top-ranking hits resulting from these initial searches were used
reciprocally as query sequences against all annotated protein sequences of H.
sapiens, D. melanogaster, C. elegans, S. pombe and A. thaliana. If the top-ranking
hits of such reciprocal queries corresponded to an established family member,
the query sequence was considered to be a candidate homologue. The domain
structure of each candidate sequence was then evaluated (ref. 35), and candidates lacking the diagnostic domains were discarded. The diagnostic domains used were a
Paz and a Piwi domain (for Ago and Piwi family members), two RNase III
domains (Dicer and Drosha), a double-stranded RNA-binding domain
(Pasha) and a methylase domain (Hen1).
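
The reciprocal-search logic described above can be expressed compactly once the top BLAST hits are in hand. The sketch below assumes the searches have already been run and summarized as dictionaries; the function name, the dictionary layout and the protein identifiers in the usage example are hypothetical, and the domain check is left as a separate step.

```python
def reciprocal_candidates(forward_top_hit, reverse_top_hit, established):
    """Proteins whose reciprocal top hit is an established family member.

    forward_top_hit: established query protein -> top hit in the target species
                     (None if there was no hit).
    reverse_top_hit: target-species protein -> its top hit among the
                     reference proteomes.
    established:     set of established family members.
    """
    candidates = set()
    for query, hit in forward_top_hit.items():
        if hit is not None and reverse_top_hit.get(hit) in established:
            candidates.add(hit)
    return candidates

# Usage with made-up identifiers:
forward = {"HsAGO2": "Nv_0001", "HsDICER1": "Nv_0002", "HsDROSHA": None}
reverse = {"Nv_0001": "HsAGO1", "Nv_0002": "HsOTHER"}
family = {"HsAGO1", "HsAGO2", "HsDICER1", "HsDROSHA"}
print(reciprocal_candidates(forward, reverse, family))
```

Only the first hypothetical protein survives, because its reciprocal top hit falls within the established family; candidates passing this step would then be screened for the diagnostic domains listed in the text.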
30. Adamska, M. et al. Wnt and TGF-β expression in the sponge Amphimedon queenslandica and the origin of metazoan embryonic patterning. PLoS ONE 2, e1031 (2007).
31. England, T. E., Gumport, R. I. & Uhlenbeck, O. C. Dinucleoside pyrophosphate are substrates for T4-induced RNA ligase. Proc. Natl Acad. Sci. USA 74, 4839-4842 (1977).
32. Kemper, B. Inactivation of parathyroid hormone mRNA by treatment with periodate and aniline. Nature 262, 321-323 (1976).
33. Hofacker, I. L. Fast folding and comparison of RNA secondary structures. Monatsh. Chem. 125, 167-188 (1994).
34. Lim, L. P. et al. The microRNAs of Caenorhabditis elegans. Genes Dev. 17, 991-1008 (2003).
35. Marchler-Bauer, A. et al. CDD: a conserved domain database for interactive domain family analysis. Nucleic Acids Res. 35, D237-D240 (2007).
Loss of Cardiac microRNA-Mediated Regulation Leads to
Dilated Cardiomyopathy and Heart Failure
Prakash K. Rao, Yumiko Toyama, H. Rosaria Chiang, Sumeet Gupta, Michael Bauer,
Rostislav Medvid, Ferenc Reinhardt, Ronglih Liao, Monty Krieger, Rudolf Jaenisch,
Harvey F. Lodish, Robert Blelloch
Rationale: Heart failure is a deadly and devastating disease that places immense costs on an aging society. To
develop therapies aimed at rescuing the failing heart, it is important to understand the molecular mechanisms
underlying cardiomyocyte structure and function.
Objective: microRNAs are important regulators of gene expression, and we sought to define the global contributions
made by microRNAs toward maintaining cardiomyocyte integrity.
Methods and Results: First, we performed deep sequencing analysis to catalog the miRNA population in the adult
heart. Second, we genetically deleted, in cardiac myocytes, an essential component of the machinery that is
required to generate miRNAs. Deep sequencing of miRNAs from the heart revealed the enrichment of a small
number of microRNAs with one, miR-1, accounting for 40% of all microRNAs. Cardiomyocyte-specific deletion
of dgcr8, a gene required for microRNA biogenesis, revealed a fully penetrant phenotype that begins with left
ventricular malfunction progressing to a dilated cardiomyopathy and premature lethality.
Conclusions: These observations reveal a critical role for microRNAs in maintaining cardiac function in mature
cardiomyocytes and raise the possibility that only a handful of microRNAs may ultimately be responsible for the
dramatic cardiac phenotype seen in the absence of dgcr8. (Circ Res. 2009;105:585-594.)
Key Words: cardiac disease • cardiac failure • cardiomyocytes • myocardium • microRNA
A subset of endogenous small noncoding RNAs,1 known as
microRNAs (miRNAs or miRs), are
~22 nucleotides long and modulate gene expression by targeting mRNAs
for posttranscriptional repression. There are nearly 500 and
800 microRNAs in mice and humans, respectively2 (http://microrna.sanger.ac.uk). In animals, repression is achieved
through imperfect base-pairing between the microRNA and
its target mRNA. Although there are certain rare instances in
which microRNAs have been reported to upregulate target
gene expression,3,4 repression is the most well-documented
direct effect. The target mRNA is rendered labile through
mechanisms involving deadenylation/decapping, translational
repression, or both. Target specificity is largely governed by
the highly conserved seed region (nucleotides 2 to 8) of the
miRNA.5 Various target prediction programs have relied on
this fact, and an estimated 30% of the mRNAs are susceptible
to miRNA-mediated regulation.6 Although this number is
likely an overestimate, as it does not take into account the
requirement for coexpression of miRNAs and mRNAs, it puts
into perspective the enormous regulatory potential possessed
by microRNAs. Not surprisingly, a number of studies have
revealed the importance of the microRNA pathway as a
whole, whereas others have pinpointed specific roles for
individual microRNAs in various tissues.7-18
Although mature microRNAs are only ~22 nucleotides in
length, they are generated from longer precursors whose
length distribution is similar to that of a mRNA. Indeed, the
primary transcripts (pri-miRNAs) are transcribed by RNA
Polymerase II, capped, polyadenylated, and regulated by
transcription factors like protein-coding mRNAs.19-21 Unlike
mRNAs, miRNAs, because of their stem-loop structure, are
cleaved within the nucleus by a Drosha/Dgcr8-containing complex into ~60- to 80-bp precursor miRNAs (pre-miRNAs).22-24
The precursor miRNAs are transported out of the nucleus by
Exportin-5 25 and subsequently processed by a cytoplasmic
RNase III, Dicer,26 which also resides in a multiprotein
complex. Because the Piwi, Argonaute, and Zwille (PAZ)
domain of Dicer recognizes the 2-nucleotide 3'OH overhang
Original received May 5, 2009; revision received July 30, 2009; accepted August 3, 2009.
From the Whitehead Institute for Biomedical Research (P.K.R., H.R.C., S.G., F.R., R.J., H.F.L.) and the Department of Biology (Y.T., M.K., R.J.,
H.F.L.), Massachusetts Institute of Technology, Cambridge; the Division of Cardiology (M.B., R.L.), Brigham and Women's Hospital, Harvard Medical
School, Boston, Mass; and the Eli and Edythe Broad Center for Regeneration Medicine and Stem Cell Biology, Department of Urology (R.M., R.B.),
University of California, San Francisco.
Correspondence to Harvey F. Lodish, PhD, Whitehead Institute for Biomedical Research, 9 Cambridge Center, Cambridge, MA 02142. E-mail
lodish@wi.mit.edu and Robert Blelloch, MD, PhD, The Eli and Edythe Broad Center of Regeneration Medicine and Stem Cell Research, Center for
Reproductive Sciences, and Department of Urology, University of California, San Francisco, San Francisco, CA. E-mail blellochr@stemcell.ucsf.edu
© 2009 American Heart Association, Inc.
Circulation Research is available at http://circres.ahajournals.org
DOI: 10.1161/CIRCRESAHA.109.200451
Circulation Research
September 11, 2009
Non-standard Abbreviations and Acronyms
Dgcr8    DiGeorge syndrome critical region 8
PGC1a    PPARg coactivator-1a
PGC1b    PPARg coactivator-1b
Myh6     myosin heavy chain 6
Myh7     myosin heavy chain 7
KO       knockout
generated by Drosha/Dgcr8, it is believed that the nuclear
Drosha/Dgcr8 cleavage is required for Dicer-mediated cytoplasmic cleavage of pre-miRNAs. Exceptions to this general
dependence on Drosha/Dgcr8 occur in miRtrons and endogenous short hairpin RNAs,27-29 and in these rare cases other
nucleases generate the necessary ends for subsequent Dicer
recognition and cleavage. Importantly, spatial segregation of
Drosha/Dgcr8 and Dicer substrates allows for the two cleavage events to occur in a sequential manner.
We sought to uncover the regulatory potential of miRNAs
in the heart by using 2 complementary approaches. First we
catalog the known miRNA population of murine adult heart
using deep (Solexa/Illumina) sequencing of a small RNA
library. Secondly, we disrupt microRNA regulation by deleting dgcr8 and hence canonical microRNA biogenesis. We
chose to focus on mature muscle tissue to establish the
importance of microRNA function in the maintenance (as
opposed to the development) of cardiac tissue. Mice lacking
dgcr8 in muscle tissue die prematurely with signs of heart
failure and dilated cardiomyopathy. Identification of the
depleted microRNAs in dgcr8-deficient hearts led to the
refined list of microRNA targets that may collectively play an
important role in the development of the pathological state.
Thus, the importance of the microRNA regulation in maintaining cardiomyocyte function is revealed by the fatal
outcome associated with lack of dgcr8 in cardiomyocytes.
Methods
Details are included in the supplemental materials (available online
at http://circres.ahajournals.org). Briefly, a library of small RNAs
was generated and sequenced using the Illumina platform.30 For the
generation of conditional dgcr8 knockout, floxed dgcr8 mice were
crossed with Muscle Creatine Kinase (MCK)-Cre mice31; mutant
mice were genotyped using tail DNA by a PCR-based approach.
Age- and sex-matched mutant (2lox/2lox; Cre positive) and control
(2lox/+; Cre positive) mice were analyzed pathologically; physiological studies were performed using telemetry and echocardiography. Molecular analyses were carried out using total RNA isolated
from the heart; Northern blots were used to detect depletion of
miR-1, miR-133, and miR-208. Array-based methods were employed to assess global loss of microRNAs.
Results
Deep Sequencing of microRNAs From
Heart Tissue
High throughput deep sequencing produces quantitative data
with an extensive dynamic range, thereby enabling detailed
insight into the relative levels of different microRNA in a
particular tissue. Therefore, to gain such insight into the
microRNA profile of the adult heart, we isolated small RNAs
(16 to 24 nucleotides) from 6- to 8-week-old male and female
hearts, built tagged cDNA libraries and sequenced the libraries on an Illumina Genome Analyzer, producing over 7 million
reads from each sample. As has been reported previously,
miR-1 and miR-133a were highly abundant (Figure 1A); however, the relative abundance of miR-1 reads was quite
striking. miR-1 accounted for nearly 40% of all known
microRNA reads. Also noteworthy is the fact that other
microRNAs, including miR-29a, miR-26a, and let-7 family members, were more abundant than miR-133a. MicroRNAs from
noncardiomyocytes (miR-29a and miR-29c from fibroblasts,32 miR-126 from endothelial cells9,33) also contributed
to the library as expected because the different cell types in
the heart were not separated. Within the cardiac-specific
miR-208 subsets, 50 to 100 times more reads were obtained
for miR-208a (encoded within an intron of myosin heavy
chain 6 [Myh6]) when compared to miR-208b (encoded
within an intron of Myh7), consistent with the relative
overexpression of Myh6 compared to Myh7 in adult mice.
miR-22 was highly expressed and showed gender-based
differences in expression levels. Although sexually dimorphic
gene expression patterns in somatic tissues have been
established,34 follow-up experiments will need to be carried
out to confirm sex-based differences in miRNA expression in
the heart. Reads from the miR-378 hairpin were also high;
miR-378/378* (miR-378* is the same as miR-422b) is
encoded within an intron of the PPARγ coactivator-1β
(PGC1β) gene. Because PPARγ coactivator-1α (PGC1α) and
PGC1β regulate mitochondrial biogenesis and the heart is a
mitochondria-rich organ, the high expression levels of miR-378/miR-378* probably reflect the high endogenous levels
of PGC1β transcription. Because the ability of a miRNA to
repress target gene expression is largely dependent on the 5'
end of the miRNA, multiple miRNAs with identical 5' ends
are expected to function in a similar manner. This seed
identity is the basis by which microRNAs are grouped into
families. Therefore, we tabulated all the microRNA reads
within individual families (as defined by TargetScan 4.1;
www.targetscan.org; Figure 1B). By this analysis, the miR-1/206 family still emerged as the most dominant microRNA
family (the reads from miR-206 were insignificant). Because
a considerable number of reads were obtained individually
from members of the let-7/miR-98 and miR-30a-5p families,
these families were, respectively, the second and third most
abundant microRNA families in the heart.
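The family tabulation described above can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the read counts are hypothetical, and the small `family_of` table stands in for the TargetScan family assignments that a real analysis would load from file.

```python
from collections import defaultdict

# Hypothetical read counts for illustration only; not data from this study.
read_counts = {
    "miR-1": 4000, "miR-206": 2,
    "let-7a": 600, "let-7b": 500,
    "miR-30a": 400, "miR-30e": 300,
    "miR-133a": 350,
}

# Hypothetical stand-in for a TargetScan family table (miRNA -> family).
family_of = {
    "miR-1": "miR-1/206", "miR-206": "miR-1/206",
    "let-7a": "let-7/miR-98", "let-7b": "let-7/miR-98",
    "miR-30a": "miR-30a-5p", "miR-30e": "miR-30a-5p",
    "miR-133a": "miR-133",
}

def family_percentages(counts, families):
    """Sum reads within each family and express each sum as a percentage of all reads."""
    total = sum(counts.values())
    summed = defaultdict(int)
    for name, n in counts.items():
        summed[families[name]] += n
    return {fam: 100.0 * n / total for fam, n in summed.items()}

family_pct = family_percentages(read_counts, family_of)
# Families ordered from most to least abundant.
ranked = sorted(family_pct, key=family_pct.get, reverse=True)
```

Summing within families before ranking means that a family whose members are individually modest (such as let-7) can outrank a single abundant microRNA.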
Muscle-Specific Dgcr8 Knockout
The importance of the microRNA pathway during development has been largely inferred from studies in which Dicer
has been deleted.18,35-39 As dicer has roles outside of the
canonical miRNA pathway, we sought to block microRNA
maturation (and therefore microRNA-mediated regulation)
using another component of the microRNA biogenesis pathway, namely Dgcr8. Dgcr8 deletion in embryonic stem cells
has revealed that it is essential for microRNA biogenesis and
implicates microRNAs in regulating efficient ES-cell differentiation.40 Using MCK-Cre mice31 and a conditional floxed
allele of dgcr8, we generated mice with a muscle-specific
Rao et al
microRNAs in Adult Heart Maintenance
Figure 1. microRNA abundance in the
murine adult heart. A, The top 20 known
microRNAs (in terms of normalized read
number) from the male (dark bars) and
the female (clear bars) heart small RNA
libraries were converted to percentage
terms and plotted. Because the rank
order differs slightly between the male and
female libraries, the total number of
microRNAs plotted is greater than 20.
Note the abundance of miR-1 reads
relative to other known microRNAs. B,
Reads from microRNAs belonging to the same family (as defined by TargetScan;
www.targetscan.org) were summed and plotted together. This analysis reveals that
aside from miR-1, the let-7 and miR-30
families are among the ones that are
highly abundant in the heart.
deletion of the dgcr8 gene. Endogenous MCK expression
reportedly peaks around birth and declines to 40% of peak
levels by day 10.31 This Cre line was deliberately chosen to
match our interest in specifically disrupting microRNA biogenesis in mature differentiated muscle, as this allowed us to
determine the importance of the microRNA pathway in
muscle homeostasis. Genotyping analysis showed that although mutant (2lox/2lox; Cre positive) mice were slightly
underrepresented at the time of genotyping, most mutant mice
survived to at least 12 days after birth. We did not observe
any pathology on 4-chamber sections (H & E stained) at 2
weeks of age. At 3 weeks of age, we detected fibrosis in the
ventricular wall in all mice examined, and loss of ventricular
function (as revealed by transthoracic echocardiography; see
below). Subsequently, all mutant mice died before 2 months
of age and the median survival was 31 days (Figure 2A and
2B). At end stage, the hearts of mutant mice showed marked
decreases in the thickness of the left and right ventricular
walls. Therefore, the development of the pathology is quite
rapid and highly penetrant. This demonstrates the stringent
requirement for a threshold level of microRNAs below which
heart function rapidly deteriorates.
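The comparison of observed and expected genotype numbers can be sketched as below. This is an illustration under stated assumptions, not the authors' calculation: the counts are the Cre-positive n values printed in the Figure 2 legend, and treating them as genotyping counts is an assumption. Because the two loci are taken to be independent, the dgcr8 genotypes among Cre-positive offspring of a 2lox/+ × 2lox/+ cross are expected in a 1:2:1 ratio regardless of Cre zygosity.

```python
from fractions import Fraction

# Cre-positive counts as printed in the Figure 2 legend; treating them as
# genotyping counts is an assumption made for this illustration.
observed = {"2lox/2lox": 42, "2lox/+": 144, "+/+": 76}

# Mendelian 1:2:1 expectation for a 2lox/+ x 2lox/+ cross.
mendelian = {
    "2lox/2lox": Fraction(1, 4),
    "2lox/+": Fraction(1, 2),
    "+/+": Fraction(1, 4),
}

def expected_counts(obs, ratios):
    """Scale the Mendelian ratios to the total number of animals observed."""
    total = sum(obs.values())
    return {genotype: float(total * r) for genotype, r in ratios.items()}

expected = expected_counts(observed, mendelian)
# A ratio below 1 means the genotype is underrepresented relative to expectation.
obs_over_exp = {g: observed[g] / expected[g] for g in observed}
```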
To determine the extent of microRNA depletion in the
heart, we performed Northern blot and quantitative RT-PCR
analyses to quantify cardiomyocyte-specific microRNAs
(Figure 2C) with RNA derived from the heart tissue of mutant
and control (2lox/+; Cre positive) mice. At the time of
sacrifice (when mutant mice were moribund), Northern blot
analysis showed that 3 cardiac-enriched mature microRNAs
(miR-1, miR-133a, and miR-208) were dramatically depleted,
but not completely absent, in mutant heart tissue. Quantification of the Northern blots revealed that depending on the
microRNA, the mature forms were depleted 10- to 60-fold
(Figure 2C, bottom). Their precursor miRNAs (the ~60-bp
product of Drosha/Dgcr8 cleavage) were detectable in the
control lanes and absent in the mutant lanes (Figure 2D;
Circulation Research
September 11, 2009
[Figure 2 panel labels. Genotype legend: 2lox/2lox; Cre positive (n=42), 2lox/+; Cre positive (n=144), +/+; Cre positive (n=76). Survival plot x axis: Days. Northern blots: miR-1, miR-133, and miR-208 (top) with U6 (bottom), control and mutant lanes; pre and mature bands marked for miR-208. Quantification: miR-1, P=0.0148; miR-208, P=0.0691.]
Figure 2. Lethality and microRNA expression in muscle-specific dgcr8 KO mice. A, Actual numbers of Cre-positive mice (and the expected numbers, based on
Punnett square analysis for 2 independent loci) obtained
from matings between 2lox/+; Cre-positive mice are
shown. Expected numbers are based on the assumption that Cre transgene is heterozygous, although this is
not known. B, Postnatal lethality of muscle-specific
dgcr8 KO mice. Survival curves for Cre-positive mice
are shown and reveal the lethality when dgcr8 is excised
in muscle tissue. Moribund "hunched over" mice that
had to be euthanized because of animal care committee
specifications were considered dead for survival analysis. Survival curves were plotted using a built-in module
in Prism software. C, miR-1, miR-133, and miR-208
expression was determined using Northern blots from total
RNA derived from heart tissue. Tissues from 3 mutant
(2lox/2lox; Cre-positive) and 3 sex- and age-matched control (2lox/+; Cre-positive) siblings were lysed for total RNA
isolation; ages were 29 days (males), 29 days and 38 days
(females). The same blots were reprobed for U6 (bottom
part of each set) to normalize for differences in loading.
Quantified miR/U6 ratios are plotted below for miR-1, miR-133, and miR-208, and the indicated probability values were obtained using 2-sample
(unequal variance) 1-tailed t-tests. D, A larger region of the miR-208 Northern blot shown in Figure 2C(iii) reveals the absence of pre-miR-208 in total
RNA derived from the mutant heart.
shown for miR-208). The complete loss of the short-lived
~60-bp precursor, but not mature miRNA, favors the argument that the residual amount of mature miRNA detected is
attributable to its long half-life, rather than an incomplete
excision of dgcr8 in these tissues.
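The Figure 2 legend states that the miR/U6 ratios were compared with 2-sample, unequal-variance, 1-tailed t-tests. A minimal sketch of the underlying statistic (Welch's t with Welch-Satterthwaite degrees of freedom) is given below. The ratios are hypothetical placeholders, not the measured values, and the 1-tailed P value itself would come from the t distribution in a statistics package.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Return (t, df) for a 2-sample unequal-variance (Welch) t-test."""
    # Squared standard error of each sample mean.
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Hypothetical miR/U6 ratios (3 mutant, 3 control); not values from this study.
mutant = [0.05, 0.08, 0.11]
control = [0.90, 1.00, 1.25]
t_stat, dof = welch_t(mutant, control)
```

With only 3 animals per group the degrees of freedom are small, which is one reason a large fold depletion can still yield a modest P value. If SciPy is available, `scipy.stats.ttest_ind(mutant, control, equal_var=False, alternative="less")` performs the same test and returns the 1-tailed P value.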
The hearts of the mutant animals exhibited a variety of
abnormalities that suggest cardiac dysfunction was responsible for their premature death. Preliminary ECG analysis
revealed dramatic drops in the heart rate of mutant mice along
with an increased PQ interval and QRS width (all at end
stage) indicative of a cardiac conduction defect (supplemental
Figure IV). Histopathologic analysis revealed that the hearts
obtained from end-stage mutant mice were considerably
enlarged with notable thinning of the ventricular walls (Figure 3A and 3B; note end-stage mutant hearts). Fibrosis was
also evident (Figure 3C), an early and consistent pathological
finding as it is observed in all mice at about 3 weeks of age
(at which time there was no histopathologically obvious
defect in the thickness of the ventricular wall; Figure 3B and
3C). Quantitative RT-PCR analysis of cardiomyocyte-specific microRNAs was also carried out at end stage and at
2 weeks after birth. Precipitous decreases in miR-1, miR-133a, and miR-208 levels were detected in 2-week-old mice
(Figure 3D), and this preceded any pathophysiological
changes that we observed.
To assess left ventricular function, we performed echocardiography. We conducted these studies at 2 time points: 3
weeks and 4 weeks after birth as histopathologic analysis
showed a dramatic progression between these 2 time points
from mild fibrosis with otherwise no overt ventricular/wall
defects (at 3 weeks) to extensive dilation (at 4 weeks).
Accordingly, measurement of fractional shortening (FS) revealed that the mutant mice had dramatically reduced ventricular function at 4 weeks (supplemental Figure VI), although wall thickness was not significantly different. This
finding was not surprising considering the clear histopathologic defects at this time point. The expectation at 3 weeks
(Figure 4B) was more ambiguous because we noted fibrosis
at this point but did not see an obvious defect in wall
thickness or ventricular volume in tissue sections. However,
echocardiography at 3 weeks after birth revealed that ventricular function (as assessed by FS readings) was decreased in
mutant mice, and the trend toward increased ventricular
volume was already evident (see numbers for EDD at 3 weeks
in the table in Figure 4B).
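For reference, fractional shortening is conventionally derived from the two M-mode diameters as FS (%) = (EDD − ESD) / EDD × 100. The sketch below applies that standard definition; the diameters are hypothetical values chosen for illustration and are not taken from Figure 4B.

```python
def fractional_shortening(edd_mm, esd_mm):
    """FS (%) = (EDD - ESD) / EDD * 100, with both diameters in the same units."""
    if edd_mm <= 0:
        raise ValueError("end-diastolic diameter must be positive")
    return 100.0 * (edd_mm - esd_mm) / edd_mm

# Hypothetical diameters in millimetres; not measurements from this study.
fs_control = fractional_shortening(edd_mm=3.0, esd_mm=1.5)
fs_dilated = fractional_shortening(edd_mm=4.0, esd_mm=3.0)
```

A dilated ventricle that contracts poorly shows both a larger EDD and a smaller FS, which is the pattern described for the mutant mice.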
Given the defects in ventricular function, one plausible
explanation is that the myofibrillar apparatus was disorganized to the extent that contraction was ineffective. Such
disarray has been noted in mice bearing a cardiac-specific
loss of function allele of dicer.13 Ultrastructural analysis
(supplemental Figure V) revealed mild myofibrillar disarray
mostly related to misalignment of the contractile apparatus.
To determine whether pathology-associated cardiac markers
Figure 3. A, Intact excised hearts from 30-day-old
mutant (2lox/2lox; Cre-positive; left) and control
(2lox/+; Cre-positive; right) female sibling mice. B,
Representative long axis sections at different
stages (as indicated) from mutant and control sex-matched sibling mice were stained with H & E (for
end stage: LV indicates left ventricle; RV, right
ventricle) or Masson Trichrome (for d14/15 and d21/22).
Bar=500 µm. The stage at which the mice
were euthanized to reveal end-stage pathology
was variable and defined by the health status of
the mice and is d34 in this panel. C, High magnification (20×) view of Masson Trichrome stained
sections from 3-week and end-stage (d43 in this
panel) mutant and control female sibling mice;
interstitial blue staining (bright green arrows)
is indicative of fibrotic collagen deposits.
Bars=50 µm. D, RT-PCR assay to detect mature
miR-1, miR-133a, and miR-208 levels showing that
the decline is already evident at 2 weeks after birth
and continues until the mice
are moribund ("end stage"). The ratio of mutant to
control is shown on the y axis, and the pairs chosen for evaluation were age- and sex-matched.
are expressed and fetal genes are activated, we performed
real-time RT-PCR analysis. Nppa and Nppb were expressed
at higher levels in mutant heart (Figure 5). Myh7, a fetal
myosin whose reexpression in adulthood is associated with
heart failure, was also expressed at higher levels in the mutant
Figure 4. Trans-thoracic echocardiography. A, Representative
short axis B and M mode images for both mutant (left) and control (right) mice at 3 weeks of age showing dilation in the mutant
mice. B, Summary of echocardiographic data 3 weeks and 4
weeks after birth showing progressive dilation and reduction in
ventricular function (see numbers for EDD and FS, respectively)
between 3 and 4 weeks in mutant mice. WT indicates wall
thickness; EDD, end-diastolic diameter; ESD, end-systolic diameter; FS, fractional shortening; HR, heart rate. *P<0.05 vs
2lox/+; Cre pos; †P<0.05 vs 3 weeks.
hearts. Myh6, the normal adult cardiac myosin, within which
miR-208 is encoded, was expressed at similar levels in
control and mutant hearts; thus the decrease seen in mature
miR-208 is not attributable to differences in the regulation of
the host gene. These molecular assays complement the
pathological and echocardiographic observations and are
consistent with a diagnosis of dilated cardiomyopathy.
Next, we isolated RNA from mutant and control hearts to
examine the expression of marker genes expressed in striated
muscle (Figure 5). Cardiac markers were uniformly low in the
mutant heart. In contrast, fast skeletal muscle markers were
uniformly upregulated. One of the 3 slow skeletal markers
(Tnni1) was also upregulated, whereas 2 (Tnnt1 and Tnnc1)
were not; intriguingly, Tnni1 also has a miR-133 binding site
in its 3' untranslated region, and part of its upregulation may
be attributable to the loss of miR-133. The upregulation of
skeletal muscle genes has been previously noted in other miR
knockout mice.8,10 As misexpression of skeletal muscle isoforms in the heart can lead to impaired cardiac function,41 at
least part of the observed pathology may be attributed to
increased expression of fast skeletal muscle transcripts at the
expense of cardiac genes.
An array-based profiling approach was carried out to compare
relative levels of mature microRNAs in RNA derived from the
hearts of mutant and control mice. MicroRNAs that are less
abundant in the mutant heart when compared to the control
heart are likely to be those enriched in cardiomyocytes (as
dgcr8 is knocked out only in cardiomyocytes). Hence this
analysis allows us to indirectly detect cardiomyocyte-enriched microRNAs. As expected, we detected precipitous
declines in the levels of cardiomyocyte-specific miR-1, miR-133, miR-208 and miR-499 in the mutant hearts (supplemental Figure VII, compare to Figure 2C). Others that were
decreased by greater than 2-fold, and therefore likely to be
enriched within the cardiomyocytes, include miR-378/miR-378* (aka miR-422b), miR-22, miR-486, miR-30e*, miR-149, miR-709, miR-345, and members of the miR-30a-5p
family (supplemental Figure VII).
Figure 5. Gene expression patterns in dgcr8 KO heart.
Total RNA was obtained from end-stage (mutant) and sex-
and age-matched control mice. A minimum of 7 pairs
was used for the analysis of
relative expression levels of
the indicated genes. Box and
whisker plots are represented
for each gene. Expression is
depicted as a ratio of mutant
over control, with each being
first normalized to GAPDH to
account for differences in total
amount of RNA used. Lack of
a difference should manifest
itself as having a ratio of 1.0
(dotted line). Although it is
appreciated that some genes
may fall into 2 or more categories during the embryonic and
postnatal development, for
convenience they are grouped
into a single category (as indicated) that is representative of
adult gene expression.
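The Figure 5 legend describes expression as a ratio of mutant over control, with each sample first normalized to GAPDH. One common way to compute such a ratio from real-time RT-PCR threshold cycles is the 2^(−ΔΔCt) calculation sketched below. Whether this exact calculation was used here is an assumption; the sketch also assumes perfect amplification efficiency, and the Ct values are hypothetical.

```python
def relative_expression(ct_gene_mut, ct_gapdh_mut, ct_gene_ctl, ct_gapdh_ctl):
    """Mutant/control expression ratio after GAPDH normalization (2^-ddCt).

    Assumes the amount of product doubles every cycle (100% efficiency).
    """
    d_ct_mut = ct_gene_mut - ct_gapdh_mut   # normalize mutant to GAPDH
    d_ct_ctl = ct_gene_ctl - ct_gapdh_ctl   # normalize control to GAPDH
    return 2.0 ** (-(d_ct_mut - d_ct_ctl))

# Hypothetical Ct values; not measurements from this study.
ratio_up = relative_expression(22.0, 18.0, 25.0, 18.0)
ratio_same = relative_expression(24.0, 18.0, 24.0, 18.0)
```

A gene with no difference between mutant and control gives a ratio of 1.0, which corresponds to the dotted line in the figure.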
To uncover the scope of regulation that is disrupted by the
loss of the microRNA pathway, we carried out an in silico
analysis. We chose 10 microRNAs that were downregulated
the most and used TargetScan to obtain a target list of mRNAs
with conserved miRNA binding sites. Next, we extracted a
published dataset42 that had compiled the list of genes that are
expressed in the human heart. The intersection of these 2 lists
(supplemental Table I) provided us with a list of genes whose
expression could be upregulated in the hearts of mutant mice.
This analysis suggests that approximately 14% (1140/7896)
of the genes expressed in the heart could be potentially
upregulated because of the loss of these 10 microRNAs that
we determined to be cardiomyocyte-enriched. Included
among this list of targets are genes that are involved in GPCR
signaling (endothelin receptors), calcium signaling (Calcineurin subunits), smooth muscle contraction (Mylk), and
calcification (Runx2). Thus it is likely that the complex
phenotype is at least in part attributable to the misexpression
of a subset of these genes.
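The in silico step amounts to intersecting two gene lists and expressing the overlap as a fraction of heart-expressed genes. The sketch below illustrates this under stated assumptions: the gene sets are tiny placeholders (Mylk and Runx2 are named in the text; GeneA through GeneC are hypothetical), while the counts 1140 and 7896 are the ones reported above.

```python
def candidate_targets(predicted_targets, heart_expressed):
    """Genes that are both predicted miRNA targets and expressed in heart."""
    return set(predicted_targets) & set(heart_expressed)

# Placeholder gene sets for illustration; GeneA-GeneC are hypothetical names.
predicted = {"Mylk", "Runx2", "GeneA", "GeneB"}
expressed = {"Mylk", "Runx2", "GeneB", "GeneC"}
overlap = candidate_targets(predicted, expressed)

# Fraction reported in the text: 1140 candidate targets among 7896
# heart-expressed genes.
percent_candidates = 100.0 * 1140 / 7896
```

The reported counts give about 14.4%, consistent with the "approximately 14%" stated in the text.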
Discussion
miR-1 and miR-133a
Although many studies utilizing intertissue comparisons can
attest to the abundance of miR-1 within muscle tissue, our
study, by focusing on the intratissue abundance, has revealed
a wide disparity between the levels of miR-1 and all the other
microRNAs. This is especially significant when we compare
miR-1 to miR-133, which are coregulated43 (albeit differentially spliced44) microRNAs. These results suggest that mechanisms other than transcription (eg, processing or stability)
can dramatically alter steady state levels of mature miRNAs.
Recent evidence from other labs has shown such posttranscriptional regulation for let-7 and miR-21.45-48 A second
testament to the relative abundance of miR-1 is evident when
comparing the levels of mature miR-1 and miR-208a. Because miR-208a is resident within a cardiac myosin, its levels
should be representative of a highly transcribed miRNA. The
fact that miR-1 levels far exceed that of miR-208a provides
further indirect evidence for posttranscriptional mechanisms
governing microRNA stability. Lastly, as we are sampling a
multitude of cell types in the heart, it is possible that within
cardiomyocytes, the percentage of miR-1 in relation to other
microRNAs is even higher, and further studies using purified
cardiomyocytes will be needed to verify this possibility. The
very high levels of miR-1 suggest it plays a central role in
sustaining heart muscle function; indeed, previous analysis of
a miR-1-2 knockout mouse8 confirmed the importance of
miR-1 dosage in maintaining proper cardiac function. The
homozygous loss of miR-1-2, 1 of 2 miR-1 loci, causes
multiple defects in heart function.8 We await the generation
of appropriate conditional knockout mice lacking both miR-1-1 and miR-1-2 to ascertain its singular importance in
adult mouse myocardium. However, recent findings evaluating the mir-133a knockout mice clearly define its importance
in cardiac development. Taken together, miR-1 and miR-133
appear to be attractive candidates for rescuing the phenotype
associated with Dgcr8 loss.
Our sequencing revealed that the miRBase annotation for
miR-133a is offset by one nucleotide at the 5' end. The
mature miRNA (which is the one that has the most common
5' end and is read most often) is UUGGUCCCCUUCAACCAGCUGU; the miRBase annotated miR-133a is UUUGGUCCCCUUCAACCAGCUG. Given the importance of the
5' end in determining target repression, this also changes the
putative targets that may be repressed by miR-133a. We did
obtain a significant number of counts for the miR-133 species
annotated in miRBase; however, based on our criteria for
miRNA classification, we denote the species with one less U
as the mature miRNA. Other independent sequencing data
(HR Chiang, unpublished data, 2009) confirm this 5' heterogeneity of mature miR-133a.
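The consequence of the one-nucleotide offset can be made concrete by extracting the seed from each sequence. The sketch below uses the two sequences quoted above and assumes the common definition of the seed as nucleotides 2 to 8, with the corresponding target site taken as the reverse complement of the seed; the function names are illustrative.

```python
def seed(mirna, start=2, end=8):
    """Return the seed, here taken as nucleotides 2-8 (1-based, inclusive)."""
    return mirna[start - 1:end]

def seed_match(mirna):
    """mRNA site (5'->3') complementary to the seed: its reverse complement."""
    complement = {"A": "U", "U": "A", "G": "C", "C": "G"}
    return "".join(complement[nt] for nt in reversed(seed(mirna)))

# Sequences as given in the text.
sequenced = "UUGGUCCCCUUCAACCAGCUGU"   # dominant species in the reads
annotated = "UUUGGUCCCCUUCAACCAGCUG"   # miRBase annotation at the time

seeds = (seed(sequenced), seed(annotated))
sites = (seed_match(sequenced), seed_match(annotated))
```

The two species share most of their sequence, but their seeds (UGGUCCC versus UUGGUCC) and therefore their predicted target sites differ, which is why the choice of 5' end changes the predicted target set.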
We also performed microarrays to determine the fold
change in microRNA levels in mutant versus wild-type
hearts. This strategy is particularly powerful as it enables the
specific identification of microRNAs that are present in the
cells expressing the cre transgene. Of note, the ranking of
genes that were most dramatically reduced in the mutant
hearts as determined by microRNA microarrays did not
directly match with the relative amounts of individual microRNAs uncovered in the sequencing data. This difference is
likely the result of a number of reasons. First, the sequencing
data represent the miRNAs in all the cell types of the heart,
rather than only the cells expressing the transgene. Second, the absolute
level of any microRNA will not necessarily correlate with the
fold decrease after Dgcr8 loss, as different miRNAs in the
cardiomyocytes will almost certainly have different half-lives. Third, we isolated RNA from the hearts of slightly
different aged mice for sequencing (6 to 8 weeks) versus the
microarrays (4 to 5 weeks). Hence age-related differences
may partly explain the differences between the array and the
sequencing data sets.
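The second point can be illustrated with a simple decay model. The sketch below assumes that production of new mature miRNA stops completely at the time of Dgcr8 loss and that the existing pool then decays with first-order kinetics; both are simplifying assumptions, and the half-lives are hypothetical.

```python
def residual_fraction(days_elapsed, half_life_days):
    """Fraction of the starting pool left after first-order decay."""
    return 0.5 ** (days_elapsed / half_life_days)

def fold_depletion(days_elapsed, half_life_days):
    """Fold decrease relative to the starting level."""
    return 1.0 / residual_fraction(days_elapsed, half_life_days)

# Two hypothetical miRNAs sampled 30 days after biogenesis stops.
fold_short_lived = fold_depletion(days_elapsed=30, half_life_days=5)
fold_long_lived = fold_depletion(days_elapsed=30, half_life_days=10)
```

In this model the fold decrease depends only on the half-life and the elapsed time, not on the starting abundance, so an abundant but stable miRNA can show a smaller fold change than a rarer, short-lived one.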
Phenotype of Muscle-Specific Dgcr8
Knockout Mice
We have used a loss of function allele of dgcr8 to uncover the
importance of the microRNA pathway in cardiac integrity.
The phenotypic outcome is similar to the cardiac-specific
dicer-deficient mice,18 and this similarity in phenotypes has
also been shown in mice bearing conditional alleles of dgcr8
and dicer in the skin.49 However, dgcr8-deficient mice have
an advantage over dicer-deficient mice, as the former can potentially be rescued by a Dicer-substrate short hairpin RNA
designed to produce mature miRNAs. Hence it will be
possible to define, in a fairly straightforward manner, the
"minimal microRNA" requirements for different cell types
derived from these mice. This approach has been successfully
used to reveal microRNAs important in the cell cycle regulation of murine ES cells,50 and such an approach should be
feasible in other cell types too.
The results from the muscle-specific dgcr8 knockout mice
demonstrate the essential role of microRNA regulation in
cardiac function. Although we have not identified the root
cause of dilated cardiomyopathy and HF in our mice, our data
clearly demonstrate a role for the microRNA pathway in
proper functioning of the heart. Changes in ventricular
diameters were further visible in transthoracic echocardiography. This resulted in a significant decrease in mutant
ventricular function as assessed by fractional shortening at 4
weeks of age compared to control littermates. Furthermore,
transthoracic echocardiography revealed functional deterioration was already present at 3 weeks of age with mutant mice
showing markedly reduced fractional shortening. This functional deterioration preceded dilation seen at 4 weeks of age
(Figure 4). Heart rates and wall thickness were not significantly different in mutant mice (both at 3 weeks and at 4
weeks) supporting our histological observations that ruled out
a hypertrophic phase prior to dilation. Ventricular walls from
mutant mice did, however, exhibit a thinning trend that was
detectable by echocardiography at 4 weeks (Figure 4B) and
was very obvious histologically in end-stage mice (Figure
3B). Changes in microRNA levels have been noted to be
secondary consequences of a stressed heart. 51 We demonstrate that cardiomyocyte-specific microRNA levels are depleted before the occurrence of pathophysiological changes
(Figure 3D). These data are consistent with the microRNA
loss being causative and representing a primary event in the
emergence of the phenotype we have described.
In comparison to the other single-microRNA knockouts,8,10,52
the dgcr8 knockout exhibits a much more severe
and penetrant phenotype. This is to be expected as a number
of microRNAs are affected. Using a candidate gene approach
and incorporating results from previously published work
with the miR-208−/− mice,10 we interrogated and detected the
upregulation of several fast skeletal muscle genes in the heart.
As cardiac muscle is more akin to a "slow" muscle, the
aberrant activation of fast skeletal genes could be pathological. Even though myofibrillar proteins are homologous, each
striated muscle tissue has evolved to meet its particular needs
and previous studies have demonstrated that cardiac-specific
overexpression of skeletal muscle specific protein can cause
loss of cardiac function.41 Clearly, a widespread increase in
skeletal gene expression, suggested by our candidate gene
analysis (Figure 5), could contribute to the loss of cardiac
function. Another aspect of pathological remodeling is the
reestablishment of a fetal gene program in failing cardiomyocytes. Clearly this is also a consequence of Dgcr8 loss as
exemplified by an increase in Myh7, a fetal myosin.
Comparison With Other microRNA Deficient Mice
In comparing our phenotype to the heart-specific Dicer-deficient mice (using the α-MHC promoter-driven Cre),13 we
note the following important differences. We always detect
fibrosis (Figure 3C) and see marked increases in MYH7
expression (Figure 5) in mutant mice. These could be attributable to differences in the timing of Cre-mediated excision,
implying that loss of microRNA regulation at different times
leads to different phenotypic outcomes. We also note that the
recent report describing the knockout of dicer in the adult
myocardium18 shows a broadly similar, but not an identical
phenotype, to the dgcr8 knockout (KO) mice described
herein. Importantly, we did not detect any hypertrophy during
routine pathological staining, and the pathology of our mutant
mice is more consistent with dilated cardiomyopathy that
eventually leads to a phenotype that resembles human heart
failure. Because the timing of the cre-mediated excision can
cause different phenotypic outcomes, 18 direct comparison of
the two dicer KO studies with this dgcr8 KO (using different
promoters driving Cre) is complicated; nonetheless all these
studies point to the clear importance for the microRNA
pathway in cardiac function. A recent report from the Olson
group reported the phenotype of a complete knockout of
miR-133a.52 Pathologically, the dgcr8 muscle KO mice are
very similar to the fraction of 133a-1−/−; 133a-2−/− (133a dKO)
mice that survive to 2 or 4 months of age with fibrosis and
ventricular wall thinning as common features. Similar to the
133a dKO mice, we did not detect any gross hypertrophy
before the advent of dilation; however, in contrast to the 133a
dKO mice, we detect only mild myofibrillar disarray in our
ultrastructural analysis. In both the 133a dKO mice and the
cardiac-specific dicer KO mice, deletion of the cognate
genomic region occurs early and hence may have a more
profound effect on the arrangement of myofibrils. When
dgcr8 expression (and therefore microRNA-mediated regulation) is perturbed after the establishment of the myofibrillar
array, the requirement for an intact microRNA pathway may
be less stringent (as is the case here). Nonetheless, overall
pathological similarity suggests that one reason for the
myopathy seen in the dgcr8 mice could be its lack of mature
miR-133a. Fast skeletal gene expression has also been noted
in the hearts of miR-208−/− mice and this particular phenotypic characteristic could be a consequence of low levels of
miR-208. However, miR-208−/− mice do not lose cardiac
function, and additional microRNAs have to be implicated in
describing the complete phenotype associated with loss of
Dgcr8 in the heart.
Cardiac Heterochrony as a Possible Mechanism
Underlying the Dramatic Phenotype
An equally feasible and alternative (albeit speculative) explanation for the sustained expression of fetal gene markers in
adulthood is that the lack of the microRNA pathway leads to
an arrest in the development of the heart, such that late
embryonic/prenatal gene expression patterns are maintained
in the adult. Indeed, such heterochronic events in C elegans
were instrumental in the identification of the founding members of microRNAs,53-55 and recent reports have confirmed
that this heterochronic pathway is conserved.56-58 Two candidate genes that we have examined are normally repressed in
adult tissues but continue to be expressed when Dgcr8 is
absent: Myh7 (a fetal myosin) and Tnni1 (a slow skeletal
muscle-specific troponin-complex subunit). Under normal
physiological conditions, Tnni1 levels are downregulated in
the heart after birth,59 but it continues to be expressed at
relatively higher levels in the dgcr8 KO heart. Importantly,
Tnni1, as opposed to Myh7, is not expressed at higher levels in
a pathological state, 59 and therefore its overexpression cannot
be attributed to the cardiac reprogramming that occurs secondary to a failing heart. This observation suggests that the
lack of the dgcr8 gene can cause a heterochronic phenotype
(which in turn is incompatible with adult heart function). In
addition, our in silico analysis (supplemental Table I) suggests that a large number of genes that are expressed in the
heart are susceptible to microRNA-mediated regulation.
Comparison of temporal mRNA expression profiles in mutant
mice and wild-type mice will aid in the identification of the
primary targets of the microRNA pathway as well as provide
evidence for the existence of cardiac heterochrony in the
mutant mice.
In summary, we have, through complementary approaches,
ascertained the importance of the microRNA pathway in
maintaining cardiomyocyte function. We have identified high
abundance microRNAs in the heart by performing intratissue
comparisons and the preeminent position of miR-1 within
muscle tissue has been quantitatively established. In addition,
the requirement of the microRNA pathway in cardiac muscle
maintenance has been unequivocally established (as its absence is lethal). Based on the results described, we suggest 2
distinct but related mechanisms to explain the drastic loss of
cardiac function. The first one implicates fast skeletal muscle
gene expression as a plausible causative factor in loss of
cardiac function. Second, the loss of microRNA function may be
causing cardiac heterochrony, which ultimately leads to heart
failure. In addition, our data from the deep sequencing
suggest that loss of a few microRNAs, including miR-1 and
miR-133a, may ultimately be responsible for the dramatic loss
of function seen in Dgcr8 deficient cardiomyocytes.
Acknowledgments
We thank Ron Kahn (Joslin Diabetes Center) for the MCK-Cre line
and members of the Lodish and Bartel laboratories for their insightful
comments. We are indebted to Carsten Russ and the Broad Sequencing Platform for carrying out Illumina/Solexa sequencing runs at the
Broad Institute.
Sources of Funding
This work was supported by the following grants: PKR (Muscular
Dystrophy Association-3882), RB (NIH K08 NS48118-01, NIH
R01 NS057221, Stem Cell Research Foundation and the Pew
Scholars Program in the Biomedical Sciences); HFL (NIH-R01
DK068348-04 and a SPARC grant from the Broad Institute); MK
(NIH-HL52212); HFL & MK (NIH/NHLBI P01 - HL066105); RJ
(NIH R01-CA087869, NIH R37-CA084198, and NIH R01-HD0445022); RL (NIH R01s, HL071775, HL088533, HL090884,
and HL093148); MB (Max Kade Foundation, Austria); HRC (NIH
R01 GM067031 to David Bartel).
Disclosures
None.
References
1. Bartel DP. MicroRNAs: genomics, biogenesis, mechanism, and function.
Cell. 2004;116:281-297.
2. Griffiths-Jones S, Saini HK, van Dongen S, Enright AJ. miRBase: tools
for microRNA genomics. Nucleic Acids Res. 2008;36:D154-8.
3. Vasudevan S, Tong Y, Steitz JA. Switching from repression to activation:
microRNAs can up-regulate translation. Science. 2007;318:1931-1934.
4. Orom UA, Nielsen FC, Lund AH. MicroRNA-10a binds the 5'UTR of
ribosomal protein mRNAs and enhances their translation. Mol Cell.
2008;30:460-471.
5. Lewis BP, Shih IH, Jones-Rhoades MW, Bartel DP, Burge CB. Prediction
of mammalian microRNA targets. Cell. 2003;115:787-798.
6. Lewis BP, Burge CB, Bartel DP. Conserved seed pairing, often flanked
by adenosines, indicates that thousands of human genes are microRNA
targets. Cell. 2005;120:15-20.
7. Xiao C, Calado DP, Galler G, Thai TH, Patterson HC, Wang J, Rajewsky
N, Bender TP, Rajewsky K. MiR-150 controls B cell differentiation by
targeting the transcription factor c-Myb. Cell. 2007;131:146-159.
8. Zhao Y, Ransom JF, Li A, Vedantham V, von Drehle M, Muth AN,
Tsuchihashi T, McManus MT, Schwartz RJ, Srivastava D. Dysregulation
of cardiogenesis, cardiac conduction, and cell cycle in mice lacking
miRNA-1-2. Cell. 2007;129:303-317.
9. Wang S, Aurora AB, Johnson BA, Qi X, McAnally J, Hill JA, Richardson
JA, Bassel-Duby R, Olson EN. The endothelial-specific microRNA
miR-126 governs vascular integrity and angiogenesis. Dev Cell. 2008;15:
261-271.
10. van Rooij E, Sutherland LB, Qi X, Richardson JA, Hill J, Olson EN.
Control of stress-dependent cardiac growth and gene expression by a
microRNA. Science. 2007;316:575-579.
11. Thai TH, Calado DP, Casola S, Ansel KM, Xiao C, Xue Y, Murphy A,
Frendewey D, Valenzuela D, Kutok JL, Schmidt-Supprian M, Rajewsky
N, Yancopoulos G, Rao A, Rajewsky K. Regulation of the germinal
center response by microRNA-155. Science. 2007;316:604-608.
12. Yi R, O'Carroll D, Pasolli HA, Zhang Z, Dietrich FS, Tarakhovsky A,
Fuchs E. Morphogenesis in skin is governed by discrete sets of differentially expressed microRNAs. Nat Genet. 2006;38:356-362.
13. Chen JF, Murchison EP, Tang R, Callis TE, Tatsuguchi M, Deng Z, Rojas
M, Hammond SM, Schneider MD, Selzman CH, Meissner G, Patterson
C, Hannon GJ, Wang DZ. Targeted deletion of Dicer in the heart leads to
dilated cardiomyopathy and heart failure. Proc Natl Acad Sci U SA.
2008;105:2111-2116.
14. Finnegan EJ, Margis R, Waterhouse PM. Posttranscriptional gene
silencing is not compromised in the Arabidopsis CARPEL FACTORY
(DICER-LIKEl) mutant, a homolog of Dicer-1 from Drosophila. Curr
Biol. 2003;13:236-240.
15. Cuellar TL, Davis TH, Nelson PT, Loeb GB, Harfe BD, Ullian E,
McManus MT. Dicer loss in striatal neurons produces behavioral and
neuroanatomical phenotypes in the absence of neurodegeneration. Proc
Natl Acad Sci U S A. 2008;105:5614-5619.
16. Kobayashi T, Lu J, Cobb BS, Rodda SJ, McMahon AP, Schipani E,
Merkenschlager M, Kronenberg HM. Dicer-dependent pathways regulate
chondrocyte proliferation and differentiation. Proc Nat Acad Sci U S A.
2008;105:1949-1954.
17. Koralov SB, Muljo SA, Galler GR, Krek A, Chakraborty T, Kanellopoulou C, Jensen K, Cobb BS, Merkenschlager M, Rajewsky N,
Rajewsky K. Dicer ablation affects antibody diversity and cell survival in
the B lymphocyte lineage. Cell. 2008;132:860-874.
18. da Costa Martins PA, Bourajjaj M, Gladka M, Kortland M, van Oort RJ,
Pinto YM, Molkentin JD, De Windt LJ. Conditional dicer gene deletion
in the postnatal myocardium provokes spontaneous cardiac remodeling.
Circulation.2008;118:1567-1576.
microRNAs in Adult Heart Maintenance
593
19. Rao M. Conserved and divergent paths that regulate self-renewal in
mouse and human embryonic stem cells. Dev Biol. 2004;275:269-286.
20. Cai X, Hagedorn CH, Cullen BR. Human microRNAs are processed from
capped, polyadenylated transcripts that can also function as mRNAs.
RNA. 2004;10:1957-1966.
21. Lee Y, Kim M, Han J, Yeom KH, Lee S, Baek SH, Kim VN. MicroRNA
genes are transcribed by RNA polymerase II. EMBO J. 2004;23:
4051-4060.
22. Lee Y, Ahn C, Han J, Choi H, Kim J, Yim J, Lee J, Provost P, Radmark
0, Kim S, Kim VN. The nuclear RNase Ill Drosha initiates microRNA
processing. Nature. 2003;425:415-419.
23. Denli AM, Tops BB, Plasterk RH, Ketting RF, Hannon GJ. Processing of
primary microRNAs by the Microprocessor complex. Nature. 2004;432:
231-235.
24. Gregory RI, Yan KP, Amuthan G, Chendrimada T, Doratotaj B, Cooch N,
Shiekhattar R. The Microprocessor complex mediates the genesis of
microRNAs. Nature. 2004;432:235-240.
25. Yi R, Qin Y, Macara IG, Cullen BR. Exportin-5 mediates the nuclear
export of pre-microRNAs and short hairpin RNAs. Genes Dev. 2003;17:
3011-3016.
26. Bernstein E, Caudy AA, Hammond SM, Hannon GJ. Role for a bidentate
ribonuclease in the initiation step of RNA interference. Nature. 2001;
409:363-366.
27. Babiarz JE, Ruby JG, Wang Y, Bartel DP, Blelloch R. Mouse ES cells
express endogenous shRNAs, siRNAs, and other Microprocessorindependent, Dicer-dependent small RNAs. Genes Dev. 2008;22:
2773-2785.
28. Okamura K, Hagen JW, Duan H, Tyler DM, Lai EC. The mirtron
pathway generates microRNA-class regulatory RNAs in Drosophila. Cell.
2007;130:89-100.
29. Ruby JG, Jan CH, Bartel DP. Intronic microRNA precursors that bypass
Drosha processing. Nature. 2007;448:83-86.
30. Grimson A, Srivastava M, Fahey B, Woodcroft BJ, Chiang HR, King N,
Degnan BM, Rokhsar DS, Bartel DP. Early origins and evolution of
microRNAs and Piwi-interacting RNAs in animals. Nature. 2008;455:
1193-1197.
31. Bruning JC, Michael MD, Winnay JN, Hayashi T, Horsch D, Accili D,
Goodyear LJ, Kahn CR. A muscle-specific insulin receptor knockout
exhibits features of the metabolic syndrome of NIDDM without altering
glucose tolerance. Mol Cell. 1998;2:559-569.
32. van Rooij E, Sutherland LB, Thatcher JE, DiMaio JM, Naseem RH,
Marshall WS, Hill JA, Olson EN. Dysregulation of microRNAs after
myocardial infarction reveals a role of miR-29 in cardiac fibrosis. Proc
NatI Acad Sci USA. 2008;105:13027-13032.
33. Fish JE, Santoro MM, Morton SU, Yu S, Yeh RF, Wythe JD, Ivey KN,
Bruneau BG, Stainier DY, Srivastava D. miR-126 regulates angiogenic
signaling and vascular integrity. Dev Cell. 2008;15:272-284.
34. Rinn JL, Snyder M. Sexual dimorphism in mammalian gene expression.
Trends Genet. 2005;21:298-305.
35. Bernstein E, Kim SY, Carmell MA, Murchison EP, Alcorn H, Li MZ,
Mills AA, Elledge SJ, Anderson KV, Hannon GJ. Dicer is essential for
mouse development. Nat Genet. 2003;35:215-217.
36. Harfe BD, McManus MT, Mansfield JH, Hornstein E, Tabin CJ. The
RNaselI enzyme Dicer is required for morphogenesis but not patterning
of the vertebrate limb. Proc Natl Acad Sci U S A. 2005;102:
10898-10903.
37. Schaefer A, O'Carroll D, Tan CL, Hillman D, Sugimori M, Llinas R,
Greengard P. Cerebellar neurodegeneration in the absence of
microRNAs. J Exp Med. 2007;204:1553-1558.
38. Chen JF, Mandel EM, Thomson JM, Wu Q, Callis TE, Hammond SM,
Conlon FL, Wang DZ. The role of microRNA-1 and microRNA-133 in
skeletal muscle proliferation and differentiation. Nat Genet. 2006;38:
228-233.
39. O'Rourke JR, Georges SA, Seay HR, Tapscott SJ, McManus MT,
Goldhamer DJ, Swanson MS, Harfe BD. Essential role for Dicer during
skeletal muscle development. Dev Biol. 2007;311:359-368.
40. Wang Y, Medvid R, Melton C, Jaenisch R, Blelloch R. DGCR8 is
essential for microRNA biogenesis and silencing of embryonic stem cell
self-renewal. Nat Genet. 2007;39:380-385.
41. Huang QQ, Feng HZ, Liu J, Du J, Stull LB, Moravec CS, Huang X, Jin
JP. Co-expression of skeletal and cardiac troponin T decreases mouse
cardiac function. Am J Physiol Cell Physiol. 2008;294:C213-C22.
42. Hannenhalli S, Putt ME, Gilmore JM, Wang J, Parmacek MS, Epstein JA,
Morrisey EE, Margulies KB, Cappola TP. Transcriptional genomics asso-
594
43.
44.
45.
46.
47.
48.
49.
50.
51.
Circulation Research
September 11, 2009
ciates FOX transcription factors with human heart failure. Circulation.
2006;114:1269-1276.
Rao PK, Kumar RM, Farkhondeh M, Baskerville S, Lodish HF.
Myogenic factors that regulate expression of muscle-specific
microRNAs. Proc Natl Acad Sci U S A. 2006;103:8721-8726.
Liu N, Williams AH, Kim Y, McAnally J, Bezprozvannaya S, Sutherland
LB, Richardson JA, Bassel-Duby R, Olson EN. An intragenic MEF2dependent enhancer directs muscle-specific expression of microRNAs 1
and 133. Proc Natl Acad Sci U S A. 2007;104:20844-20849.
Davis BN, Hilyard AC, Lagna G, Hata A. SMAD proteins control
DROSHA-mediated microRNA maturation. Nature. 2008;454:56-61.
Heo I, Joo C, Cho J, Ha M, Han J, Kim VN. Lin28 mediates the terminal
uridylation of let-7 precursor MicroRNA. Mol Cell. 2008;32:276-284.
Newman MA, Thomson JM, Hammond SM. Lin-28 interaction with the
Let-7 precursor loop mediates regulated microRNA processing. RNA.
2008;14:1539-1549.
Viswanathan SR, Daley GQ, Gregory RI. Selective blockade of
microRNA processing by Lin28. Science. 2008;320:97-100.
Yi R, Pasolli HA, Landthaler M, Hafner M, Ojo T, Sheridan R, Sander C,
O'Carroll D, Stoffel M, Tuschl T, Fuchs E. DGCR8-dependent
microRNA biogenesis is essential for skin development. ProcNatl Acad
Sci USA. 2008;106:498-502.
Wang Y, Baskerville S, Shenoy A, Babiarz JE, Baehner L, Blelloch R.
Embryonic stem cell-specific microRNAs regulate the G1-S transition
and promote rapid proliferation. Nat Genet. 2008;40:1478-1483.
Ikeda S, Kong SW, Lu J, Bisping E, Zhang H, Allen PD, Golub TR,
Pieske B, Pu WT. Altered microRNA expression in human heart disease.
Physiol Genomics. 2007;31:367-373.
52. Liu N, Bezprozvannaya S, Williams AH, Qi X, Richardson JA,
Bassel-Duby R, Olson EN. microRNA-133a regulates cardiomyocyte
proliferation and suppresses smooth muscle gene expression in the heart.
Genes Dev. 2008;22:3242-3254.
53. Lee RC, Feinbaum RL, Ambros V. The C. elegans heterochronic gene
lin-4 encodes small RNAs with antisense complementarity to lin-14. Cell.
1993;75:843-854.
54. Pasquinelli AE, Reinhart BJ, Slack F, Martindale MQ, Kuroda MI, Maller
B, Hayward DC, Ball EE, Degnan B, Muller P, Spring J, Srinivasan A,
Fishman M, Finnerty J, Corbo J, Levine M, Leahy P, Davidson E,
Ruvkun G. Conservation of the sequence and temporal expression of let-7
heterochronic regulatory RNA. Nature. 2000;408:86-89.
55. Reinhart BJ, Slack FJ, Basson M, Pasquinelli AE, Bettinger JC, Rougvie
AE, Horvitz HR, Ruvkun G. The 21-nucleotide let-7 RNA regulates
developmental timing in Caenorhabditis elegans. Nature. 2000;403:
901-906.
56. Decembrini S, Andreazzoli M, Barsacchi G, Cremisi F. Dicer inactivation
causes heterochronic retinogenesis in Xenopus laevis. Int J Dev Biol.
2008;52:1099-1103.
57. Caygill EE, Johnston LA. Temporal regulation of metamorphic processes
in Drosophila by the let-7 and miR- 125 heterochronic microRNAs. Curr
Biol. 2008;18:943-950.
58. Sokol NS, Xu P, Jan YN, Ambros V. Drosophila let-7 microRNA is
required for remodeling of the neuromusculature during metamorphosis.
Genes Dev. 2008;22:1591-1596.
59. Huang X, Lee KJ, Riedel B, Zhang C, Lemanski LF, Walker JW. Thyroid
hormone regulates slow skeletal troponin I gene inactivation in cardiac
troponin I null mouse hearts. J Mol Cell Cardiol. 2000;32:2221-2228.
Molecular Cell
Expanding the MicroRNA Targeting Code:
Functional Sites with Centered Pairing
Chanseok Shin,1,2,3,5,6 Jin-Wu Nam,1,2,3,6 Kyle Kai-How Farh,1,2,4 H. Rosaria Chiang,1,2,3 Alena Shkumatava,1,2,3
and David P. Bartel1,2,3,*
1Whitehead Institute for Biomedical Research, Cambridge, MA 02142, USA
2Howard Hughes Medical Institute
3Department of Biology
4Division of Health Sciences and Technology
Massachusetts Institute of Technology, Cambridge, MA 02139, USA
5Department of Agricultural Biotechnology, Seoul National University, Seoul, 151-921, Republic of Korea
6These authors contributed equally to this work
*Correspondence: dbartel@wi.mit.edu
DOI 10.1016/j.molcel.2010.06.005
SUMMARY
Most metazoan microRNA (miRNA) target sites have
perfect pairing to the seed region, located near the
miRNA 5' end. Although pairing to the 3' region
sometimes supplements seed matches or compensates for mismatches, pairing to the central region
has been known to function only at rare sites that
impart Argonaute-catalyzed mRNA cleavage. Here,
we present "centered sites," a class of miRNA
target sites that lack both perfect seed pairing and
3'-compensatory pairing and instead have 11-12
contiguous Watson-Crick pairs to the center of the
miRNA. Although centered sites can impart mRNA
cleavage in vitro (in elevated Mg2+), in cells they
repress protein output without consequential Argonaute-catalyzed cleavage. Our study also identified
extensively paired sites that are cleavage substrates
in cultured cells and human brain. This expanded
repertoire of cleavage targets and the identification
of the centered site type help explain why central
regions of many miRNAs are evolutionarily
conserved.
INTRODUCTION
MicroRNAs (miRNAs) are a class of ~22 nucleotide (nt) RNAs
that direct the posttranscriptional repression of protein-coding
genes (Bartel, 2004). After processing from hairpin precursors,
miRNAs are loaded into Argonaute-containing silencing
complexes, which downregulate mRNA targets through two
distinct modes, either Argonaute-catalyzed cleavage or a second
mode that involves mRNA destabilization and translational
repression, at least in part through poly(A) shortening (Filipowicz
et al., 2008).
Argonaute-catalyzed cleavage of the target strand occurs in
the context of extensive base pairing, at the linkage joining the
mRNA nucleotides that pair to miRNA positions 10 and 11 (Elbashir et al., 2001b; Hutvágner and Zamore, 2002; Yekta et al.,
2004). In mammals, this slicing activity is catalyzed by Argonaute2 (AGO2), which leaves a 3' hydroxyl on the 5' cleavage
fragment and a 5' monophosphate on the other fragment (Liu
et al., 2004; Meister et al., 2004; Schwarz et al., 2004). Unlike
miRNAs in plants, very few examples of miRNA-dependent
cleavage targets have been reported in mammals (Yekta et al.,
2004; Davis et al., 2005; Jones-Rhoades et al., 2006). Nonetheless, artificially designed small interfering RNAs (siRNAs) that
silence target genes through this mechanism are widely used
reagents, illustrating that in principle, the cleavage mode of
repression can function in many contexts and for many targets
(Elbashir et al., 2001a).
Sites that confer slicing-independent destabilization and
translational repression typically pair to the 5' region of the
miRNA, centering on miRNA nt 2-7, known as the miRNA seed
(Lewis et al., 2005; Bartel, 2009). Introducing an siRNA/miRNA
or deleting an endogenous miRNA leads to modest yet detectable changes in the output of hundreds of genes containing
seed sites in their 3' UTRs (Krützfeldt et al., 2005; Lim et al.,
2005; Giraldez et al., 2006; Grimson et al., 2007; Rodriguez
et al., 2007; Baek et al., 2008; Selbach et al., 2008). Moreover,
most mammalian protein-coding genes are under selection to
maintain pairing to the seed of one or more miRNAs, and thousands of genes have also evolved to specifically avoid pairing
to the seeds of preferentially coexpressed miRNAs (Farh et al.,
2005; Lewis et al., 2005; Stark et al., 2005; Friedman et al.,
2009). These observations illustrate both the broad scope of
seed-type regulation and the widespread influence of this targeting mode on mRNA evolution.
Pairing to the 3' region of the miRNA can supplement seed
pairing to enhance target recognition, or it can even compensate
for a mismatch to the seed; such sites are known as "3'-supplementary sites" and "3'-compensatory sites," respectively
(Bartel, 2009). However, pairing to the 3' region appears to be
consequential for relatively few sites (<10%) (Bartel, 2009). In
principle, pairing to the central region of the miRNA could also
supplement pairing to the other regions of the miRNA, but a
role for such pairing has been demonstrated only for sites that
Molecular Cell 38, 789-802, June 25, 2010 @2010 Elsevier Inc. 789
[Figure 1, panels A-G. (A) Conservation plotted against the 5' position of the 4-mer, for the miRNA and the opposite hairpin arm. (B) Site types: 8-mer seed-matched site, centered site, and cleavage site (HOXB8 mRNA 3' UTR 5'-CCCAACAACAUGAAACUGCCUA-3' paired to miR-196a 3'-GGGUUGUUGUACUUUGAUGGAU-5'). (C and D) Expression fold change (log2) plotted against the 5' position of the 11-mer, for 11-mer sites and controls. (E and F) Cumulative distributions of expression fold change (log2). (G) Wild-type and mutant centered sites in PLXNA1 and RAPTOR (miR-124 targets) and in VAMP1 and ZNF586 (miR-1 targets).]
Figure 1. Centered Sites Regulate Both mRNA Accumulation and Protein Output
(A) Conservation of 4 nt segments of mammalian miRNAs. As schematized in Figure S1A, segments from the mature miRNA (orange) were compared with
opposing segments from the other arm of the hairpin (gray). At each position of the mouse miRNAs, the number of segments perfectly conserved in the
whole-genome alignments of the other 29 species (Blanchette et al., 2004) is plotted.
(B) Spectrum of functional miRNA target sites.
(C) Response of HeLa mRNAs with contiguous perfect pairing to 11-mer sites starting at the indicated positions of 78 transfected mi/siRNAs. Plotted are mean
fold changes for mRNAs with 3' UTR sites to the cognate mi/siRNAs (gray) and the cohorts of chimeric miRNA-like control sequences (white, error bars indicate
standard deviation for ten cohorts). Also plotted is the mean expression change for mRNAs with a shifted 6-mer 3' UTR site (purple), which includes all mRNAs
with 11-mer sites starting at position 3. To assess statistical significance, the distribution of log2-fold changes for genes with sites was compared with the
distribution of log2-fold changes for genes without sites (*p < 0.05; **p < 0.01; K-S test).
(D) In vivo response of mRNAs with 11-mer sites to endogenous miRNAs after loss of all miRNAs. Plotted is the fold change of zebrafish mRNAs with 11-mer sites
to 21 endogenous miRNAs depleted in the dicer mutant. Otherwise, as in (C).
mediate Ago-catalyzed cleavage and not for sites that mediate
destabilization and translation repression.
Here, we describe "centered sites," a class of miRNA target
sites that lack both perfect seed pairing and 3'-compensatory
pairing and instead have 11-12 contiguous Watson-Crick pairs
to miRNA nt 4-15. In the process of characterizing these sites,
we found that Mg2+ concentration profoundly influences both
the specificity and efficacy of miRNA-directed cleavage, and
we performed whole-transcriptome analyses that substantially
add to the number of known instances in which metazoan
miRNAs direct mRNA cleavage.
RESULTS
A Class of miRNA Target Site
The most highly conserved region of metazoan miRNAs is the 5'
region containing the seed (Lewis et al., 2003; Lim et al., 2003),
which is the region most important for recognizing most targets.
The next most highly conserved region spans nt 13-16, which is
the region most important for 3'-supplementary and 3'-compensatory
pairing (Grimson et al., 2007). Despite being less
conserved than other miRNA regions, we noted that the central
region of vertebrate miRNAs is significantly more conserved
than is the opposite arm of the pre-miRNA hairpin (Figures 1A
and S1A). Because both arms participate equally in the pairing
required to form the pre-miRNA hairpin, preferential conservation
of the miRNA observed in this region suggested that these
central nucleotides play a role beyond that of miRNA biogenesis.
One such role would be to aid in target recognition, but among
the previously characterized targeting modes, the central region
is known to function only for cleavage sites, which seemed too
rare to provide the additional selective pressure for conserving
nucleotide identity at the miRNA central regions. Therefore, we
searched for another type of site that might explain this preferential
conservation.
Examination of array data investigating the response of
mRNAs after transfecting 11 miRNAs into HeLa cells (Lim
et al., 2005; Grimson et al., 2007) revealed a type of site that
was associated with mRNA downregulation (Figure S1B). This
site type, which we call the "centered site," was characterized
by at least 11 nt of contiguous Watson-Crick base pairing to
the center region of the miRNA at either nt 4-14 or 5-15, without
substantial pairing to either the 5' or the 3' ends of the miRNA.
Because of the location and extent of their base pairing, these
sites occupy a position intermediate between seed sites and
the extensively complementary cleavage sites (Figure 1B).
Because these sites are relatively rare, pooling of data from
multiple miRNA transfections was initially required to achieve
statistical significance for determining their efficacy. In order to
systematically analyze these sites, we therefore compiled additional
array data from HeLa experiments with similarly transfected
miRNAs and siRNAs (Birmingham et al., 2006; Jackson
et al., 2006a, 2006b; Schwarz et al., 2006; Anderson et al.,
2008). To ensure that the transfected mi/siRNAs were loaded
and active within the silencing complex, the pooled data sets
were restricted to those of the 78 HeLa experiments for which
the canonical 8-mer 3' UTR site to the transfected mi/siRNA
was associated with downregulated mRNAs with high statistical
significance (p < 0.0001, K-S test) (Table S1). Testing matches
that did not include the canonical seed match to miRNA positions
2-7 showed that perfect 11-mer matches starting at miRNA
positions 3, 4, and 5 were each significantly associated with
repression (Figure 1C), whereas perfect 10-mer matches,
perfect 9-mer matches, and near-perfect 11-mer matches (those
with single mismatches or wobbles) were not significantly associated
with repression (Figures S1C and S1D).
The efficacy of centered sites matching ectopically introduced
miRNAs and siRNAs raised the possibility that such sites might
also mediate endogenous miRNA targeting. Array results examining
the effects of miRNA loss in zebrafish embryos lacking
Dicer provided data on a sufficient number of messages with
centered sites to enable a systematic analysis of targeting interactions
in vivo. miRNAs present at 24 hr postfertilization, the
developmental stage used for mRNA analysis, were identified
by high-throughput sequencing (Table S2). Sites were considered
for 21 of these miRNAs for which the canonical 8-mer 3'
UTR sites were significantly associated with mRNAs derepressed
in the dicer mutant (p < 0.01) (Table S2). As observed
for the ectopic interactions, perfect 11-mer matches starting at
miRNA positions 3, 4, and 5 were each associated with significant
repression, although efficacy of sites starting at position 3
was mostly attributable to overlap with the "shifted 6-mer"
seed match (Figure 1D), which comprises pairing to nt 3-8 (Friedman
et al., 2009). Perfect 10-mer matches, perfect 9-mer
matches, and near-perfect 11-mer matches generally were not
associated with significant repression, although for a few of the
numerous possibilities examined, marginal significance was
observed (Figures S1E and S1F).
When considering both ectopic and endogenous interactions,
contiguous Watson-Crick 3' UTR pairing to the central region of
the miRNA, at either nt 4-14 or nt 5-15, was unique among the
tested possibilities in that it was both consistently associated
with mRNA repression and not attributable to overlap with previously
described site types. A previous array study had reported
a handful of siRNA off-targets with similarly long stretches of
contiguous Watson-Crick base pairing, but these sites were
(E) Reduced levels of HeLa messages with either centered sites or canonical sites to 78 transfected mi/siRNAs. Shown is analysis of microarray data, plotting
cumulative changes of mRNAs with single 3' UTR sites of the indicated types. Canonical sites (8-mer, 7m8, 7A1, and 6-mer) had either 8, 7, or 6 nt matches
centered on the miRNA seed (Bartel, 2009); centered sites had 11 contiguous Watson-Crick pairs to miRNA positions 4-14 or 5-15; control sites were centered
sites to the chimeric miRNA-like control sequences (combining results for all ten cohorts).
(F) Efficacy of endogenous centered sites in vivo. Shown is analysis of microarray data, plotting cumulative changes of zebrafish mRNAs with single 3' UTR sites
to 21 miRNAs depleted in the dicer mutant. Otherwise, as in (E).
(G) miRNA-mediated repression at centered sites. Shown is the fold repression of luciferase reporter genes fused to 3' UTR fragments of the indicated genes
with the indicated sites or mutant sites. Plotted are the geometric means, normalized to the geometric means observed for reporters with mutant sites.
Error bars represent the second largest and the second smallest values among 12 replicates (from four independent experiments). Statistical significance is
indicated (**, p value < 0.001, Wilcoxon rank-sum test). See also Figure S1 and Tables S1-S4.
Figure 2. miRNA-Directed Cleavage at Centered Sites
(A) Centered sites for miR-21 or let-7g within 3' UTR fragments of the indicated mRNAs tested in cleavage assays. K89, KIAA1189; FL9, FLJ40919; NFI, NFIA (sequences provided in Supplemental Information).
K89 5'-GAUAAAAAUUCAGUCUGAUAACCUCAAA-3'
FL9 5'-GUUGGCCCACUAGUCUGAUAAGAAGUCU-3'
GSTM3 5'-AGAAGUUUUUCAGUCUGAUAACUAUUGA-3'
miR-21 3'-AGUUGUAGUCAGACUAUUCGAU-5'
NFI 5'-CGAAAAUGGCAAACUACUACUACUACU-3'
let-7g 3'-UGACAUGUUUGAUGAUGGAGU-5'
(B) Cleavage of target fragments directed by endogenous miRNAs in vitro. 5'-Cap-radiolabeled mRNA fragments were incubated in HeLa cytoplasmic S100 extract for the indicated time and analyzed on denaturing gels. As a control, fragments modified to be fully complementary to the cognate miRNA, designated as antisense (as) substrates, were tested and analyzed in parallel. Whereas most fragments were cleaved predominantly at the expected site, NFI was cleaved at two positions (*, Figure S2), as is sometimes observed in vitro (Martinez and Tuschl, 2004).
(C) miR-21-directed cleavage of GSTM3 mRNA in HeLa cells. 5'-RACE using primers specific for GSTM3 mRNA was performed on mRNA isolated from HeLa cells treated with two siRNAs targeting XRN1. Seven of nine sequenced clones mapped to the position expected for miR-21-directed cleavage. The other two clones mapped 52 nt downstream.
GSTM3 5'-AAGUUUUUCAGUCUGAUAACUAUUGAUAUAAUUUCCA-3'
miR-21 3'-AGUUGUAGUCAGACUAUUCGAU-5'
offset further toward the 3' end of the miRNA, at nt 6-16 or 7-17
(Jackson et al., 2003), a region not significantly associated with
targeting when examined using our larger data sets. Effective
miRNA target-prediction algorithms rely heavily on perfect pairing to the seed region and thus miss this additional class of
targets (Bartel, 2009).
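The site definition used above (11 contiguous Watson-Crick pairs to miRNA nt 4-14 or 5-15, without a perfect seed match) can be illustrated with a minimal sketch. The function names and the simplified seed check are assumptions made for illustration and are not the authors' pipeline; the sequences are the miR-21 and GSTM3 fragments shown in Figure 2A.

```python
# Minimal sketch (hypothetical helper names): scan a 3' UTR for centered sites.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def revcomp(rna):
    """Reverse complement of an RNA string written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))

def find_centered_sites(utr, mirna, site_len=11, starts=(4, 5)):
    """Return sorted (utr_index, mirna_start) pairs for centered-site matches.

    A match is a perfect Watson-Crick 11-mer to miRNA nt 4-14 or 5-15.
    Assumed simplification: a hit is dropped when the nucleotides just 3'
    of it also pair to miRNA nt 2..(start-1), because the seed (nt 2-7)
    would then be fully paired.
    """
    hits = []
    for first in starts:
        site = revcomp(mirna[first - 1:first - 1 + site_len])
        seed_flank = revcomp(mirna[1:first - 1])  # pairs to nt 2..first-1
        idx = utr.find(site)
        while idx != -1:
            flank = utr[idx + site_len:idx + site_len + len(seed_flank)]
            if flank != seed_flank:
                hits.append((idx, first))
            idx = utr.find(site, idx + 1)
    return sorted(hits)

# Figure 2A writes miR-21 3'->5'; reverse it to obtain the 5'->3' sequence.
MIR21 = "AGUUGUAGUCAGACUAUUCGAU"[::-1]
GSTM3 = "AGAAGUUUUUCAGUCUGAUAACUAUUGA"

print(find_centered_sites(GSTM3, MIR21))
```

For the GSTM3 fragment this reports a single site pairing to miR-21 nt 5-15, whereas a fully complementary target is excluded because its seed is also paired. A real analysis would also need to handle 3'-compensatory pairing, which this sketch ignores.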
The transfected mi/siRNAs had an average of 11 and a median
of eight centered sites in 3' UTRs of human mRNAs. About
one-quarter of the mRNAs with a centered site lacked conventional seed sites to the transfected RNA and were sufficiently
expressed in HeLa such that changes could be accurately
measured on the arrays. Analysis of cumulative distributions of
log-fold changes indicated that >20% of these mRNAs responded to the transfected mi/siRNAs in a manner attributable
to the site, with a lower bound for site efficacy, resembling that
of canonical 7-mer sites (Figure 1E). Likewise, >30% of the
endogenous centered sites analyzed appeared to mediate
repression in zebrafish embryos (Figure 1F).
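The comparison of cumulative distributions described above can be sketched with a two-sample Kolmogorov-Smirnov statistic, the largest vertical gap between the two empirical cumulative distributions. Treating that gap as a lower bound on the fraction of responsive mRNAs is an assumption about how such estimates are commonly made, not a statement of the authors' exact procedure. The fold-change values below are invented for illustration, and no p value is computed.

```python
from bisect import bisect_right

def ks_statistic(sample_a, sample_b):
    """Largest absolute gap between the empirical CDFs of two samples."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    gap = 0.0
    for value in sorted(set(a) | set(b)):
        cdf_a = bisect_right(a, value) / len(a)  # fraction of a <= value
        cdf_b = bisect_right(b, value) / len(b)  # fraction of b <= value
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap

# Hypothetical log2 fold changes, not data from the study.
with_site = [-1.0, -0.8, -0.5, 0.1]
without_site = [-0.1, 0.0, 0.2, 0.3]

print(ks_statistic(with_site, without_site))
```

A significance estimate would normally come from a statistics library rather than from this statistic alone.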
To examine whether centered sites also function in other
animals, we analyzed mRNA array data sets monitoring the
impact of knocking down proteins required for Drosophila
miRNA biogenesis (Kadener et al., 2009). Following either Drosha or Dicer1 knockdown in Drosophila S2 cells, messages
with 3' UTR centered sites matching the endogenous miRNAs
had a significant propensity to be derepressed (Figures S1G
and S1H, p = 0.00045 and 0.027, respectively, for Drosha and
Dicer1 knockdown data sets; Tables S3 and S4).
To confirm that centered sites can be directly targeted by
miRNAs, luciferase reporter constructs and their mutant counterparts with disrupted pairing were prepared and tested in
both HeLa cells and S2 cells (Figures 1G and S1I). For three of
the four UTR fragments tested, the sites reduced protein output
in a manner that depended on the presence of both the wild-type
site and the cognate miRNA. Taken together, the reporter and
microarray results suggested that the centered site is a miRNA
target site capable of downregulation comparable with that
observed for single 7 nt seed sites. Although they are much
less abundant than both seed-matched sites and sites with
3'-supplementary pairing, centered sites are present in numbers
similar to 3'-compensatory sites and could help explain the
preferential conservation observed in the central region of
most miRNAs.
miRNA-Directed Cleavage Detected at Centered Sites
Because of pairing to the central region of the miRNA, centered
sites might be subject to AGO2-dependent cleavage similar to
that occurring for known cleavage sites of plants and animals,
which are more extensively paired (Yekta et al., 2004; Davis
et al., 2005; Jones-Rhoades et al., 2006). To test this possibility,
we employed an in vitro cleavage assay using S100 extract
prepared from HeLa cells (Martinez and Tuschl, 2004; Shin,
2008), focusing on mRNA fragments containing centered sites
for miR-21 or let-7g miRNA, which are abundant in HeLa cells
(Figure 2A and Table S10). Cleavage was observed at the position expected for AGO2-catalyzed cleavage of the centered sites
(Figure 2B).
To examine whether cleavage was also occurring in the cells,
we tested for miR-21-directed cleavage of GSTM3 mRNA
(moderately expressed in HeLa cells) using RNA ligase-mediated
rapid amplification of cDNA 5' ends (5'-RACE). By directly
cloning and sequencing the 5' end of the 3' cleavage product,
this assay can be used to validate miRNA-directed cleavage
(Llave et al., 2002; Yekta et al., 2004). To increase the sensitivity
of the assay, XRN1, the 5'→3' exonuclease responsible for
degrading the 3' cleavage product (Souret et al., 2004; Orban
and Izaurralde, 2005), was knocked down (Alemán et al., 2007).
5'-RACE fragments within ~50 bp of the expected cleavage site
were cloned for sequencing. The 5' ends for seven of nine
sequenced clones precisely matched that expected for cleavage
at the centered site in the cell (Figure 2C). These results indicated
that for an endogenous mRNA targeted at a centered site by an
endogenous miRNA, at least some transcripts underwent
AGO2-catalyzed cleavage in the cell.
Pairing Requirements for Cleavage Are Sensitive to
Mg2+ Concentration
To understand the specificity of cleavage at centered sites,
miR-21 recognition of the K89 mRNA fragment (Figure 2) was
examined further. The K89 RNA sequence, which was perfectly
complementary to positions 5-16 of miR-21, was systematically
mutated at each nucleotide corresponding to miR-21 positions
1-16, substituting an A:C mismatch or a G:U wobble for each
Watson-Crick match and substituting a Watson-Crick match
for each of the two mismatches (Figure 3A). When using
5.8 mM Mg2+, as in Figure 2B, or 2.2 mM Mg2+, both of which
were within the ~2-6 mM range used previously to study
in vitro cleavage (Martinez and Tuschl, 2004; Gregory et al.,
2005; Maniataki and Mourelatos, 2005; Miyoshi et al., 2005;
Rand et al., 2005; Ameres et al., 2007; Wang et al., 2009a,
2009b), cleavage was retained after changing positions outside
of the centered site and was reduced after changing most positions within the centered site, although wobble pairs were tolerated at positions 6 and 8 (Figure 3B, top two panels).
Mg2+ is essential for the in vitro cleavage reaction (Schwarz
et al., 2004) but also has a striking effect on the relative stabilities
of matched and mismatched RNA duplexes (Serra et al., 2002).
Indeed, lowering the Mg2+ concentration increases the fidelity
of RNA 2'-O-methylation, another reaction specified by Watson-Crick pairing between small guide RNAs and their targets
(Appel and Maxwell, 2007). We found that lowering Mg2+ gave
maximal target RNA cleavage specificity and efficacy for
substrates that were extensively paired to miR-21, whereas
higher Mg2+ was optimal for more weakly pairing substrates
(Figure 3B). For example, the cleavage of K89-21as RNA, which
is fully paired to the miRNA, was the most efficient at 0.3 mM
Mg2+, whereas cleavage of the wild-type K89 substrate containing the centered site with only 12 contiguous pairs was undetectable at 0.3 mM Mg2+ and most efficient at 5.8 mM Mg2+; K89-m4GC, which had an intermediate number of contiguous pairs,
had an intermediate Mg2+ optimum (Figures 3B and 3C).
The free Mg2+ level in the cytoplasm of various cells and
tissues is less than 1 mM (Günther, 2006), a concentration at
which we found that efficient cleavage required pairing more
extensive than that of typical centered sites (Figure 3B). Nonetheless, some cleavage at the centered site was detected at
physiological Mg2+ concentrations (0.75 mM Mg2+; Figure 3B),
which explained why the 5'-RACE assay yielded fragments diagnostic of miR-21-directed cleavage in the cell (Figure 2C).
Whole-Transcriptome Analysis of miRNA-Directed
Cleavage
The poor efficacy of cleavage at the centered site at physiological Mg2+ concentration called into question whether miRNA-directed cleavage plays a consequential role during repression
mediated by centered sites and suggested that most repression
at centered sites might resemble the destabilization and translational repression observed for most seed-matched targets.
To better characterize the scope of miRNA-directed cleavage
in mammals and to examine the extent to which cleavage at
centered sites is relevant to target gene regulation in vivo, we
applied degradome sequencing to mammalian cells. Degradome sequencing generates short sequence tags representing
the 5' ends of uncapped mRNA fragments found in the cell
(Addo-Quaye et al., 2008; German et al., 2008). Although these
fragments are predominantly 5'→3' exonuclease degradation
intermediates, they also include 3' fragments of Argonaute-catalyzed mRNA cleavage in sufficient numbers to enable empirical
detection of endogenous cleavage targets of plant miRNAs
and siRNAs (Addo-Quaye et al., 2008; German et al., 2008).
Inspired by this success in plants and the ability to detect miR-21-directed cleavage by 5'-RACE, we applied the method to
HeLa cells following XRN1 knockdown by RNAi (Figure S3A).
Sequencing yielded 14,323,668 tags mapping to the human
genome, with a diversity of 2,069,190 unique tag sequences.
Of the total tags, 61.2% came from protein-coding genes
and represented 36,806 out of 46,319 ENSEMBL mRNAs (Figure 4A). The tags showed a relatively uniform distribution across
the mRNAs, with a very strong peak at the 5' terminus (Figure 4B).
About 30% of tags were not classified because they did not map
to mature annotated RNAs (Figure 4A). Many of these were from
introns and processing fragments from pri-miRNAs, mitochondrial tRNAs, ribosomal RNAs, and snRNAs, illustrating how
unstable 3' products of endonucleases can be detected in mammalian cells by using degradome sequencing (Tables S5 and S6).
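
The tag-distribution profile described above (per the Figure 4B legend, each mRNA is split along its length into 100 bins, tag 5' ends are tallied per bin, and the tallies are summed over all mRNAs) can be sketched as follows. This is only an illustration under stated assumptions, not the authors' pipeline: the function names and the 0-based coordinate convention are hypothetical.

```python
def bin_tag_positions(tag_starts, mrna_length, n_bins=100):
    """Tally tag 5' ends into n_bins equal-width bins along one mRNA.

    tag_starts: 0-based positions of tag 5' ends on the mature mRNA
                (an assumed convention, not taken from the paper).
    """
    counts = [0] * n_bins
    for pos in tag_starts:
        if not 0 <= pos < mrna_length:
            raise ValueError(f"tag position {pos} lies outside the mRNA")
        # Integer arithmetic maps position 0 to bin 0 and the final
        # nucleotide to the last bin.
        counts[pos * n_bins // mrna_length] += 1
    return counts


def aggregate_bins(per_mrna, n_bins=100):
    """Sum per-mRNA bin tallies; per_mrna yields (tag_starts, mrna_length)."""
    total = [0] * n_bins
    for tag_starts, mrna_length in per_mrna:
        for i, c in enumerate(bin_tag_positions(tag_starts, mrna_length, n_bins)):
            total[i] += c
    return total
```

Because positions are rescaled by transcript length, mRNAs of different lengths contribute to the same relative-position axis.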
To determine if miRNA centered sites were associated with
cleavage at the expected position within the mRNA 3' UTR, we
searched for centered matches to 50 distinct, conserved
miRNAs most highly expressed in HeLa cells and tabulated the
frequency of degradome tags corresponding to mRNA cleavage
at the tenth position of these sites. Tags corresponding to
cleavage at the expected position were found much more
frequently for authentic miRNA:site pairs than for negative-control pairs (Figure 4C). However, when we excluded miR-196a, miR-151, and miR-28, which target several extensively
paired sites, the signal above background was greatly reduced,
suggesting that most centered sites lacked the complementarity
required for robust miRNA-directed cleavage (Figure 4C). The
abundance of degradome tags mapping to the expected
cleavage sites of the siRNAs targeting XRN1 illustrated that the
method can identify tags diagnostic of AGO2-catalyzed
cleavage in human cells (Figure S3). These results supported
those from the in vitro cleavage assays (Figure 3B) in suggesting
that under physiological Mg2+ conditions, the mRNA downregulation mediated by centered sites is usually accompanied by very
little AGO2-catalyzed cleavage.
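
The tabulation of tags relative to the expected cleavage position can be sketched as below. The sketch assumes ungapped pairing, 0-based mRNA coordinates that increase 5' to 3', and that a diagnostic tag begins at the mRNA nucleotide across from miRNA position 10 (as stated in the Figure 6A legend). The function names are hypothetical, and the example index comes from the K89 fragment, in which miR-21 position 1 lies across from index 24.

```python
from collections import Counter


def expected_tag_start(pos1_index):
    """Index of the mRNA nucleotide across from miRNA position 10.

    pos1_index: 0-based mRNA index across from miRNA position 1.
    Higher miRNA positions pair with more 5' (lower-index) mRNA
    nucleotides, so position 10 lies 9 nt upstream of position 1.
    """
    return pos1_index - 9


def offset_histogram(tag_starts, pos1_index, window=30):
    """Count tags by offset from the expected position.

    Offset 0 is the expected cleavage position; negative values are
    upstream and positive values downstream, as in Figure 4C.
    """
    expected = expected_tag_start(pos1_index)
    hist = Counter()
    for start in tag_starts:
        offset = start - expected
        if -window <= offset <= window:
            hist[offset] += 1
    return dict(hist)
```

A peak at offset 0 that stands above neighboring offsets (and above randomized tag positions) is the signature sought in this kind of analysis.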
Genome-wide Search for miRNA:site Duplexes
with High Complementarity
Our observation of significant cleavage at the small subset of
centered sites with unusually extensive complementarity to the
miRNA indicated that miRNA-directed cleavage at extensively
paired sites was more frequent in animals than had been
appreciated. This insight prompted a systematic examination of
mammalian sites with extensive miRNA complementarity of the
type that would mediate cleavage in plants but might not have
fulfilled our criteria for classification as centered sites because
they either had perfect seed pairing or lacked 11 contiguous
pairs within positions 4-15.

Figure 3. Pairing Requirements for Cleavage at a Centered Site and the Influence of Mg2+ Concentration
[Figure image not reproduced. Panel B conditions: 5.8, 2.2, 0.75, 0.5, and 0.3 mM MgCl2. Panel C series: K89 (5-16 match), K89-m4GC (2-16 match), and K89-21as (1-21 match); x axis, [Mg2+] (mM).]
(A) Sequences used to examine pairing requirements for cleavage. Sequences were derivatives of the K89 3' UTR fragment, a miR-21 target. K89-21as, fully
complementary version of K89; K89-m1, matched to position 1 of miR-21; K89-mm2, A:C mismatch at position 2; K89-mm3GU, G:U wobble pairing at position 3.
(B) The influence of Mg2+ on cleavage specificity and efficacy in vitro. Reactions were performed as in Figure 2B, using the substrates depicted in (A), with the
indicated Mg2+ concentrations. Quantification of the fraction cleaved is plotted on the right.
(C) Plot of the effect of Mg2+ on cleavage efficacy for the model centered site (K89), a more extensively paired site (K89-m4GC), and a fully paired site (K89-21as).

Figure 4. Rare miRNA-Directed Cleavage Detected by Degradome
Sequencing
[Figure image not reproduced. Panel A categories include mRNA, antisense, MT RNAs, pseudogene, rRNA, transposon, other ncRNA, and unclassified. Panel B x axis: relative position on mRNAs. Panel C x axis: relative cleavage sites (-30 to 30).]
(A) Mapping of HeLa degradome sequencing tags to the transcriptome.
Antisense corresponds to tags mapping antisense to annotated mRNAs,
noncoding RNAs (ncRNAs), pseudogenes, and mitochondrial (MT) RNAs.
Coverage indicates the fraction of the 46,319 annotated mRNAs that were represented by at least one tag. Unclassified tags mapped primarily to introns and
3' flanking regions of ncRNAs (Table S5).
(B) The distribution of degradome tags along the length of the mature mRNAs.
mRNAs were split along their length into 100 bins, and tag 5' ends were tallied
for each bin. Shown are the aggregate tallies for all mRNAs.
(C) Search for evidence of cleavage at centered sites. Plotted are the numbers
of interactions with evidence for cleavage at the expected position (0) or at
positions either upstream (negative values) or downstream (positive values).
Interactions were counted if a centered site matching a conserved miRNA
expressed in HeLa had a tag supporting cleavage at the indicated position
(blue). Analysis excluding miR-196a, miR-28, and miR-151-5p is also shown
(red). As a control, the analysis was repeated using ten cohorts of artificial
tags, generated by randomly positioning tags on mRNAs (gray; error bars,
standard deviation). See also Tables S5 and S6.
To search for potential cleavage sites in mammals, we used
a scoring rubric similar to those that successfully identify miRNA
target sites in plants (Figure 5A) (Jones-Rhoades and Bartel,
2004; Allen et al., 2005). The search yielded 106 predicted
miRNA:site duplexes scoring ≤2.0 (Figure 5B), including 47 in
annotated ORFs, 16 in 5' UTRs, and 43 in 3' UTRs (Table S7).
At the mid-to-higher penalty scores, sites were no more abundant than expected by chance, but at scores <3.0, sites were
at least 1.5-fold enriched compared to the control sets of
chimeric miRNAs constructed so as to preserve the seeds as
well as the overall dinucleotide and trinucleotide compositions
of authentic miRNAs (Figure 5C). Repeating the analyses with
annotated murine miRNAs yielded analogous results (Figures
S4C-S4E and Table S8).
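
The scoring rubric given in the Figure 5A legend can be sketched as follows. This is a simplified illustration rather than the authors' implementation: it assumes an ungapped alignment (insertions and deletions, which the rubric also penalizes, are not handled), RNA alphabets in upper case, and a 21-nt site whose 3'-most nucleotide lies across from miRNA nt 1. The function and constant names are hypothetical.

```python
MIR21 = "UAGCUUAUCAGACUGAUGUUGA"  # miR-21, written 5'->3'

_COMPLEMENT = str.maketrans("ACGU", "UGCA")
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}
CORE = range(2, 14)  # miRNA nt 2-13


def reverse_complement(rna):
    """Reverse complement of an RNA string over the alphabet ACGU."""
    return rna.translate(_COMPLEMENT)[::-1]


def alignment_penalty(mirna, site):
    """Penalty score for ungapped pairing of `site` to miRNA nt 2-21.

    site: 21-nt mRNA segment, 5'->3', whose last nucleotide lies
          across from miRNA nt 1 (an assumed input convention).
    """
    if len(mirna) < 21 or len(site) != 21:
        raise ValueError("expected a miRNA of >=21 nt and a 21-nt site")
    # 1 point if the site lacks an A across from miRNA nt 1.
    penalty = 0.0 if site[-1] == "A" else 1.0
    for pos in range(2, 22):  # miRNA nt 2-21
        pair = (mirna[pos - 1], site[21 - pos])
        in_core = pos in CORE
        if pair in WATSON_CRICK:
            continue
        if pair in WOBBLE:
            penalty += 1.0 if in_core else 0.5  # G:U wobble
        else:
            penalty += 2.0 if in_core else 1.0  # mismatch
    return penalty
```

For instance, `alignment_penalty(MIR21, reverse_complement(MIR21[:21]))` scores a fully complementary site; lower scores indicate more extensive pairing.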
The higher abundance of extensive matches to miRNAs
compared to that of controls might indicate biological function.
However, eukaryotic genomes, complex tapestries containing
remnants of innumerable duplications and repetitive elements,
are far from random, and thus this abundance might simply be
a consequence of the miRNAs and sites sharing common
ancestry. To distinguish between these possibilities, we examined the conservation of orthologous sites in five mammalian
species, as assessed using a conservation-alignment (CA) score
(Figure 5D). When applied to sites for distinct miRNAs conserved
throughout mammals, 17 miRNA:site duplexes had CA scores
≤3.0 (Figure 5E), most of which were unlikely to be conserved
by chance (Figure 5F). Four of the 17 top-scoring sites were
miR-151-5p targets (Table S9).
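
The conservation-alignment (CA) score, as described in the Figure 5D legend, takes the alignment penalty scores of the human site and its orthologs in mouse, rat, dog, horse, and pig and assigns the second highest (second worst) of the six. A minimal sketch follows; the function name and the example scores are hypothetical, and how a missing ortholog is scored is not specified here.

```python
def ca_score(penalty_scores):
    """Second-highest (second-worst) penalty among the six species.

    penalty_scores: six alignment penalty scores (human plus five
    orthologous sites); lower penalties mean better pairing.
    """
    if len(penalty_scores) != 6:
        raise ValueError("expected scores for exactly six species")
    return sorted(penalty_scores, reverse=True)[1]


# Illustrative values only: one poorly paired ortholog is tolerated,
# because the single worst score is ignored.
example = [1.5, 2.0, 2.0, 3.0, 9.0, 2.5]
print(ca_score(example))
```

Taking the second-worst score means a site can still be called conserved when one species has a poor alignment, but not when two do.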
Cleavage at Highly Complementary Predicted Duplexes
Having found evidence that the most extensively paired sites
were more abundant and more conserved than expected by
chance, we returned to the degradome sequencing data to
search for evidence that these sites were cleaved in the cell.
Because the degradome sequencing data included intermediates of normal mRNA decay, steps were taken to distinguish
AGO2 cleavage products from other decay intermediates.
To do this, we considered the tag possession ratio (TPR), which
represented the proportion of predicted miRNA:site duplexes
that were represented by tags at the expected cleavage site
(Figure 6A). When focusing on the miRNAs and mRNAs
expressed in HeLa, miRNA:site duplexes with alignment penalty
scores ≤2.5 possessed significantly more cleavage tags at the
expected cleavage site than did control duplexes (Fisher's exact
test, p = 1.1 × 10^-4) (Figure 6B and Table S11). Even after
excluding tags mapping to multiple loci, this TPR difference
remained both substantial and significant (p = 2.6 × 10^-4)
(Figure 6C and Table S11). miRNA-directed cleavage in Arabidopsis sometimes occurs at ±1 nt from the expected cleavage
site (Addo-Quaye et al., 2008). When applying a window of ±1 nt,
there was no improvement in the TPR of expressed miRNA:site
pairs (Figure S5A and Table S11). As an added control, we
repeated the analysis for miRNAs that were not expressed in
HeLa cells and found that these miRNAs performed similarly to
the chimeric miRNA controls (p = 1.0) and significantly worse
than the miRNAs expressed in HeLa cells (p = 5.3 × 10^-5).
These results strongly indicated that for miRNA:site pairs with
favorable alignment scores (≤2.5), most tags at the expected
cleavage site did not arise from background 5'→3' degradation
but instead were the consequence of miRNA-directed mRNA
cleavage.
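
The TPR comparison can be sketched as below. The TPR definition follows the Figure 6A legend (duplexes with a tag divided by all duplexes). The Fisher's exact test shown is a one-sided version computed from the hypergeometric distribution; whether the original analysis was one- or two-sided is not stated here, so that choice is an assumption, as are the function names. No counts from the study are reproduced.

```python
from math import comb


def tag_possession_ratio(with_tag, without_tag):
    """Fraction of predicted miRNA:site duplexes that have a tag."""
    total = with_tag + without_tag
    if total == 0:
        raise ValueError("no duplexes to evaluate")
    return with_tag / total


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows: authentic vs. control duplexes; columns: with vs. without a
    tag. Returns the probability, with all margins fixed, of seeing
    at least `a` tagged duplexes in the authentic row.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denominator = comb(n, row1)
    numerator = 0
    for k in range(a, min(row1, col1) + 1):
        numerator += comb(col1, k) * comb(n - col1, row1 - k)
    return numerator / denominator
```

Here `a` and `b` would be the authentic duplexes with and without a tag at the expected site, and `c` and `d` the corresponding control counts.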
miRNA-Directed Cleavage in HeLa Cells
and Human Brain
Using an alignment penalty score of 2.5, a threshold at which the
cumulative TPR difference between signal and background was
most significant in HeLa data (Table S11), we found eight
miRNA-directed cleavage targets with tags precisely at the expected cleavage site (Table 1 and Figure S5B). All eight cleavage
sites were in 3' UTRs, and half were conserved in other mammals
(Table 1 and Figure S5B). Four of the pairs involved miR-151-5p
(Figures 6E-6G and Table 1). miR-196a and its cleavage target
HOXB8 are both known to be moderately expressed in HeLa
cells (Lim et al., 2005), and as expected, HOXB8 was among
the eight (Figure 6H).

Figure 5. Enrichment and Conservation of miRNA:site Duplexes with Extensive Complementarity
[Figure image not reproduced. Axes in the score panels: alignment penalty score and CA score.]
(A) Illustration of the alignment penalty score, used to judge the quality of pairing to miRNAs. Pairing to miRNA nt 2-21 was considered, assigning a 2-point penalty
for each mismatch or alignment insertion/deletion (indel) within the miRNA core (nt 2-13) (Mallory et al., 2004), a 1-point penalty for each mismatch or insertion/
deletion outside the core or each G:U wobble within the core, and a 0.5-point penalty for each G:U wobble outside the core. An additional 1-point penalty was
assigned to sites lacking an A across from miRNA nt 1 (Lewis et al., 2005).
(B) Distribution of scores for potential miRNA:site duplexes with at least seven consecutive base pairs and 13 base pairs in total. Sites were considered for the
620 distinct human miRNA/miRNA*s annotated in miRBase, version 11.0, excluding four miRNAs that paired to multiple repeat loci, skewing the distribution to
the left (Figures S4A and S4B).
(C) Analysis of site enrichment. To estimate the signal-to-background ratio for each score bin, the number of miRNA:site duplexes was compared with the number
of miRNA:site duplexes found when using chimeric control miRNAs (error bars, standard deviation for ten chimeric miRNA cohort sets).
(D) Illustration of the conservation alignment (CA) score, used to identify conserved miRNA:site pairs. Alignment penalty scores were considered for human sites
aligned in orthologous genomic regions of mouse, rat, dog, horse, and pig, with the second highest (second worst) among the six assigned as the CA score.
(E) Distribution of CA scores for miRNA:site duplexes. Sites were considered for 165 distinct miRNAs conserved in mammals.
(F) Analysis of preferential conservation of extensively paired sites. To estimate the signal-to-background ratio for each score bin, the fraction of miRNA:site duplexes
that were conserved was compared with the fraction of analogous duplexes that were conserved when using chimeric control miRNAs, accounting for the lower
abundance of matches to control sequences (error bars, standard deviation for ten chimeric miRNA cohort sets). See also Figure S4 and Tables S7-S9.

To extend our results beyond cells in culture, we performed
degradome sequencing using poly(A)-selected RNA from whole
human brain. Sequencing yielded 9,240,114 reads mapping to
the human genome, with a diversity of 2,360,502 unique tag
sequences. miRNAs expressed in human brain tissues were
found by small-RNA sequencing (Table S12). As in HeLa cells,
we found a statistical association between the miRNA:site pairs
and cleavage tags for miRNAs and mRNAs expressed in brain
(Figures 6D and S5D). For pairs with alignment score ≤3.0, the
TPR was significantly higher than for that of the controls
(p = 0.008 and p = 0.030, nonexpressed and chimeric controls,
respectively) (Table S11). Statistical significance was retained
when also including tags mapping 1 nt downstream of the
expected cleavage site as diagnostic of cleavage (p = 0.011
and p = 0.013, nonexpressed and chimeric controls, respectively) (Table S11), perhaps because some 5'→3' trimming
occurred in the animal, where we could not knock down XRN1
activity. Eleven sites with scores ≤3.0 had tags suggestive of
miRNA-directed cleavage (Table S13) at the expected position
(Table 1), and two had tags suggestive of cleavage at position -1
(Figure S5E). Three of the 13 matched miR-151-5p and included
N4BP1, which was also identified in HeLa cells. FRS2, a
proposed target of miR-182, was also identified in HeLa cells.
Four of the miRNA:site pairs identified in brain appeared
conserved in other mammals (CA ≤3.0) (Table 1 and
Figure S5E).

Figure 6. Enrichment of Degradome Tags at Sites of Expected miRNA-Directed mRNA Cleavage
[Figure image not reproduced. Score panels: x axis, alignment penalty score. mRNA panels (including N4BP1 vs miR-151-5p and LYPD3 vs miR-151-5p): x axis, position on mRNA.]
(A) The tag possession ratio (TPR), used to search for evidence of miRNA-directed cleavage. At each alignment penalty score, the number of miRNA:site duplexes
that had at least one tag with its 5'-terminal nucleotide mapping to the expected site of cleavage (across from miRNA position 10) was tallied, as was the number
of duplexes that lacked a tag indicative of cleavage. The TPR was calculated as the number of duplexes with a tag divided by the sum of all duplexes.
(B) Enrichment of tags at the expected sites. TPR values for miRNAs expressed in HeLa (blue) (Table S10) were compared to values for miRNAs not expressed in
HeLa (red) and values for ten cohorts of chimeric control miRNAs (gray; error bar, the standard deviation).
(C) Enrichment of tags at expected site after excluding tags mapping to multiple loci.
(D) Enrichment of tags at the expected sites, after omitting tags mapping to multiple loci, in human brain. TPR values for miRNAs expressed in human brain
(blue line) (Table S12) were compared to values for miRNAs not expressed in human brain (red) and values for ten cohorts of chimeric control miRNAs (gray; error
bar, the standard deviation).
(E) The mir-151 hairpin and its cleavage targets. Schematic depicts the mir-151 hairpin, the positions of the two ancestral L2 LINE repeats that gave rise to the
hairpin, and the region of high conservation in sequenced mammals (Rhead et al., 2010). Once processed from the hairpin, the mature miR-151-5p pairs to and
directs cleavage of mRNAs in HeLa cells.
(F-H) The distribution of degradome tags along the length of the indicated mRNAs (omitting tags mapping to >10 genomic loci). The red peaks are cleavage tags
corresponding to cleavage at the site expected for the indicated miRNA. Shown are results from HeLa cells. Similar graphs are provided for the other mRNAs with
evidence of miRNA-directed cleavage in HeLa cells (Figure S5B) and in human brain (Figure S5E).
See also Figure S5 and Table S4.

DISCUSSION
Centered Sites
We present centered sites as a type of miRNA target site.
Centered sites contain at least 11 contiguous nucleotides that
pair to a miRNA at positions 4-14 or 5-15, a pairing pattern
distinct from that of most 3'-compensatory sites and seed sites.
However, because a centered site might include additional
nucleotide pairing on either side and a 3'-compensatory site
might have additional pairing extending into the miRNA central
region, there is potential overlap between a few extended
centered sites and a few 3'-compensatory sites. Similarly,
a seed site might include 3'-supplementary pairing extending
into the miRNA central region, which creates potential overlap
between a few extended centered sites and a few 3'-supplementary sites. However, such overlap with previously known site
types is very rare. For example, a search of annotated human
3' UTRs revealed that for most human miRNAs, no seed-matched sites extend into centered sites, i.e., most human
miRNAs have no 3' UTR match with contiguous Watson-Crick
pairing to nt 2-14. Furthermore, conservation analysis and array
data show that seed-type targets prefer to acquire supplemental
pairing at positions 13-16 rather than extending pairing through
nt 9-12 (Grimson et al., 2007).

Table 1. mRNAs with Degradome Tags at the Expected Cleavage Site

miRNA        miRNA Reads   mRNA     Location of Site   Score   Cleavage Tags   Degradome Fraction (%)   Conserved
HeLa Cell
miR-196a     3682          HOXB8    3' UTR             1       1               50.0                     No(a)
miR-545*     34            IGFBP4   3' UTR, LINE L2    2       1               0.7                      No
miR-28-5p    1829          LYPD3    3' UTR, LINE L2    2.5     0.75            3.4                      No
miR-151-5p   2526          N4BP1    3' UTR, LINE L2    2       55              14.1                     Yes
miR-151-5p   2526          MPL      3' UTR, LINE L2    1.5     0.5             6.9                      No
miR-151-5p   2526          LYPD3    3' UTR, LINE L2    0       4               17.9                     No
miR-151-5p   2526          ATPAF1   3' UTR, LINE L2    1       1               1.4                      Yes
miR-182      624           FRS2     3' UTR             2.5     1               0.7                      Yes
Human Brain
miR-28-5p    2544          MDGA1    3' UTR, LINE L2    3       1               0.1                      No
miR-151-5p   33,007        MDGA1    3' UTR, LINE L2    3       9               0.9                      No
miR-151-5p   33,007        N4BP1    3' UTR, LINE L2    2       6               2.1                      Yes
miR-873      2033          MAN2C1   ORF                2.5     1               0.2                      Yes
miR-330-5p   744           FAM62C   ORF                3       2               1.3                      No
miR-95       523           EGLN3    3' UTR             2.5     1               1.2                      No
miR-182      115           FRS2     3' UTR             2.5     2               1.4                      Yes
miR-877      41            SPTBN1   ORF                3       2               0.1                      No
miR-185      1014          PMVK     5' UTR             3       2               0.8                      No
miR-383      593           DCTN4    ORF                2       1               0.4                      Yes
miR-598      8131          EFTUD2   ORF                2.5     1               0.3                      No

Listed are miRNA:site pairs with alignment penalty scores (Score) for which the TPR (counting only tags mapping precisely to the expected cleavage
site) significantly exceeded background (≤2.5 in HeLa cells and ≤3.0 in brain) (Table S11). The expression of each miRNA is indicated by its miRNA
reads in a high-throughput sequencing experiment (Tables S10 and S12). Cleavage tags were those tags mapping precisely to the expected site of
cleavage and were normalized by the number of times they mapped to the genome. For each mRNA, the fraction of degradome tags that were
cleavage tags is indicated (Degradome fraction). Sites with CA ≤3.0 are categorized as conserved (Figures S5B and S5E).
(a) Although miR-196a:HOXB8 was not classified as conserved using our CA score because the site is missing in pig and horse, this pairing is known to
be conserved in more distant lineages, including frog and fish (Yekta et al., 2004).
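
The pairing pattern that defines a centered site (at least 11 contiguous nucleotides pairing to miRNA positions 4-14 or 5-15) can be sketched as a simple motif search. This is an illustration only: it checks Watson-Crick complementarity to the two 11-nt windows and does not apply any further criteria, such as excluding sites that also have perfect seed pairing. The function names are hypothetical; the sequences are miR-21 and the K89 fragment shown in Figure 3A.

```python
MIR21 = "UAGCUUAUCAGACUGAUGUUGA"      # miR-21, 5'->3'
K89 = "GAUAAAAAUUCAGUCUGAUAACCUCAAA"  # K89 fragment, 5'->3'

_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna):
    """Reverse complement of an RNA string over the alphabet ACGU."""
    return rna.translate(_COMPLEMENT)[::-1]


def find_centered_matches(mirna, utr):
    """Return (utr_index, window) for each 11-nt centered match.

    utr_index is the 0-based start of the match in `utr`; window is
    "4-14" or "5-15", the miRNA positions that are paired.
    """
    hits = []
    for first, last in ((4, 14), (5, 15)):
        # The mRNA motif is the reverse complement of the miRNA window.
        motif = reverse_complement(mirna[first - 1:last])
        start = utr.find(motif)
        while start != -1:
            hits.append((start, f"{first}-{last}"))
            start = utr.find(motif, start + 1)
    return sorted(hits)


print(find_centered_matches(MIR21, K89))  # [(10, '5-15')]
```

The K89 fragment matches through the 5-15 window only, consistent with its description as complementary to miR-21 positions 5-16 with a mismatch at position 4.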
The reason that centered sites had not been described previously can be explained by their relatively low abundance, which
resembles that of 3'-compensatory sites and is far lower than
that of seed-matched sites. Although no more effective than
7 nt seed-matched sites, centered sites are 4 nt longer, leading
to an informational complexity ~250-fold (~4^4-fold) greater than
that of 7 nt sites and a correspondingly increased difficulty for
their emergence and retention during evolution. The rarity of
centered sites hampers statistical assessment of whether they
are subject to evolutionary conservation. Nonetheless, the
conserved miRNAs of mammals each match an average of 13
centered sites in human 3' UTRs (Figure S1J), and based on
our zebrafish analyses, we estimate that on average about two
sites per miRNA both reside in messages coexpressed with
the miRNA and mediate repression. The presence of even
a few beneficial interactions (species-specific or more broadly
conserved) for a subset of the miRNAs could impart at least
intermittent pressure to preserve the miRNA sequence, thereby
explaining the preferential conservation observed in the central
region of vertebrate miRNAs (Figure 1A). Moreover, centered
sites resemble 3'-compensatory sites in providing a mechanism
by which different members of the same miRNA seed family can
repress distinct targets (Bartel, 2009).
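
The fold difference in informational complexity quoted above follows from the four-letter nucleotide alphabet: each additional required position multiplies the number of possible sequences by four. A small arithmetic sketch, with a hypothetical function name and assuming equally likely nucleotides:

```python
def fold_complexity(longer_nt, shorter_nt, alphabet_size=4):
    """How many times rarer a longer exact match is than a shorter one,
    assuming independent, equally likely nucleotides."""
    return alphabet_size ** (longer_nt - shorter_nt)


# An 11-nt centered site versus a 7-nt seed-matched site.
print(fold_complexity(11, 7))  # 256
```

The value 4^4 = 256 is the "~250-fold" figure; real sequence composition is not uniform, so this is an order-of-magnitude guide.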
Why would centered sites require so much more contiguous
pairing than that required by seed sites? When bound by the
Argonaute protein within the silencing complex, the seed is
thought to be preorganized to favor Watson-Crick pairing to
the mRNA (Bartel, 2004). In the current version of this seed-nucleation model, pairing cannot propagate to the center of
a miRNA without a substantial conformational change in which
the original contacts between Argonaute and the miRNA central
and 3' regions are disrupted (Bartel, 2009). Disrupting these
contacts offsets some binding energy gained in forming the
central pairs, causing contiguous pairing adjacent to the seed
to contribute less affinity than might have otherwise been
expected. This lower contribution of pairing to the central
region, combined with the higher contribution achieved by the
preorganized seed, would explain why so much more pairing is
needed for centered sites to achieve the same outcome as
7 nt seed sites.
Mg2+ Effect on Cleavage Specificity and Efficiency
Our results shed light on the biochemistry of RNAi. We suggest
that at 37°C, in the low Mg2+ concentrations present in the cell,
only the extensively paired sites can be bound with the stability
and conformation that favors mRNA cleavage, and that after
cleavage, the products are not so tightly bound so as to slow
multiple turnover. In higher Mg2+, however, less extensively
paired sites achieve the stability and conformation needed for
cleavage, and product release is more apt to slow turnover.
This model explains the reduction of both specificity and
efficiency at extensively paired sites observed in high Mg2+
concentrations. Under these conditions, less extensively paired
sites are more readily cleaved-hence, the reduced specificity.
The more extensively paired sites, on the other hand, undergo
slower product release and gain little benefit from this more
permissive binding-and-cleavage regime. Indeed, any benefit
gained is more than offset by the tighter binding of the miRNA
to less extensively paired sites, which causes the total cellular
RNA present in extracts used for cleavage reactions to more
effectively inhibit utilization of the labeled substrates-hence,
the reduced efficiency. The free cytoplasmic Mg2+ concentration
in most cells and tissues is <1 mM (Günther, 2006), suggesting
that cleavage specificity is very high in vivo.
Our results explain previous observations regarding the
effects of adding phosphate-containing compounds to in vitro
cleavage reactions. Many diverse phosphate compounds,
including inorganic monophosphate, stimulate the multiple-turnover cleavage by the mammalian silencing complex (Gregory et al., 2005). We suggest that these phosphate compounds
titrate the free Mg2+, which in turn increases product turnover
through decreased RNA duplex stability.
Endogenous miRNA-Directed mRNA Cleavage in Human
We find that miRNA-directed cleavage of mammalian mRNAs,
although even more rare than repression at centered sites,
occurs more frequently than previously appreciated. Two
endogenous cleavage targets had been reported in mammals,
HOXB8 and RTL1 (Yekta et al., 2004; Davis et al., 2005).
We substantially add to this list, with evidence for cleavage of
seven additional targets in HeLa cells and cleavage of 13 in
human brain, two of which overlapped with HeLa targets. This
small overlap, largely attributed to differential expression of the
miRNAs or mRNAs in the two samples (Tables 1, S10, S12,
and S13), suggests that as more tissues are examined, more
cleavage targets will be found.
The fraction of degradome sequencing tags that provided
evidence of miRNA-directed cleavage was generally higher in
the HeLa analysis than in the brain analysis (Table 1 and Figures
S5B and S5E). In the brain, this fraction of cleavage tags was sufficiently low so as to suggest that some might represent degradation intermediates not indicative of miRNA-directed cleavage.
Whether a smaller fraction of brain messages are cleaved,
however, is unclear. The brain analysis lacked the benefit of the
XRN1-endonuclease knockdown, designed to stabilize the transient
3' cleavage product so that it could be more readily detected
over the background of metastable mRNA-decay intermediates.
Moreover, whole brain has many cell types, with the possibility
that differential expression of a miRNA and its cleavage targets
might decrease the signal relative to background. Nonetheless,
for most cleavage targets in HeLa and for some in brain, degradome profiles resembled those of plant targets with validated biological relevance (Figures 6F-6H, S5B, and S5E) (Addo-Quaye
et al., 2008; German et al., 2008), strongly supporting the hypothesis that the miRNA-directed cleavage pathway is an important
degradation pathway for those mRNAs.
In both brain and HeLa cells, several cleavage targets
identified were targets of miR-151-5p. This miRNA derives
from a hairpin that has homology to the L2 subclass of repeat
elements known as long interspersed nuclear elements (LINEs).
L2 LINE elements are remnants of a non-LTR retrotransposon
activity present in the common ancestor of mammals. They
make up over 3% of the human genome (Kamal et al., 2006).
Indeed, the miR-151 hairpin is derived from a tail-to-tail arrangement of two L2 fragments (Figure 6E). Hence, miR-151-5p
derived from L2(+) is strongly complementary to several target
sites derived from L2(-) repeats. Analogous tail-to-tail arrangements of short (S)INE fragments produce transcripts with longer
hairpins that are processed in mouse ESCs into endogenous
siRNAs (Babiarz et al., 2008). However, miR-151-5p and miR-151-3p are typical miRNAs, in that (1) their accumulation
depends on both Drosha/DGCR8 and Dicer endonucleases
(Babiarz et al., 2008), (2) they pair to each other with 2 nt 3' overhangs, (3) they are the two dominant products accumulating
from the hairpin (Figure S5C), and (4) their hairpin has a conservation pattern typical of other conserved miRNAs (Figure 6E).
Two other miRNAs that direct cleavage in HeLa, miR-28-5p and
miR-545*, are also L2 repeat-derived miRNAs. The notion that
these miRNAs and their targets ultimately derived from the
same ancestral elements is reminiscent of the origin of some plant
miRNAs, which derive from duplicated fragments of their cleavage
targets (Allen et al., 2004; Rajagopalan et al., 2006). In mammals,
however, the miRNAs and target sites evolved in parallel from the
common ancestor, rather than one from the other. Moreover, in
mammals, common ancestry between the miRNAs and their
targets can be detected for older, conserved miRNAs, such as
miR-151 and miR-28, whereas in plants, common ancestry has
been detected only for younger, nonconserved miRNAs.
The observation that many of the cleaved mRNAs were the
targets of repeat-derived miRNAs can be explained by the fact
that repeat-derived miRNAs are more likely to encounter extensively complementary matches, since repeat-element remnants
are found within many mRNAs. Over the course of evolution,
repeat-derived miRNAs presumably had access to a wide variety
of cleavage targets, providing the opportunity for some favorable
regulatory interactions to emerge and be retained as conserved
cleavage interactions. Thus, the repeat-derived miRNAs and
their cleavage targets provide yet another avenue for repetitive
elements to shape the regulation of cellular genes.
Concluding Remarks
The discovery of centered sites raises the question of how many
additional site types remain to be found. On the one hand,
Molecular Cell 38, 789-802, June 25, 2010 @2010 Elsevier Inc. 799
transcriptome/proteome changes observed after introducing or
deleting a miRNA can all be explained by direct interactions
between the miRNA and messages with the five known site types
(seed sites, 3'-supplementary seed sites, 3'-compensatory sites,
centered sites, and cleavage sites) combined with indirect
effects, as changes in the primary targets influence expression
of secondary targets. On the other hand, detailed experimental
follow-up on mRNAs that respond to the miRNA despite lacking
any of these established site types seems to indicate that some
of them should not be dismissed as secondary targets but might
instead be direct targets (Lal et al., 2009). However, the pairing
schemes proposed thus far for these unusual interactions have
not been defined sufficiently to provide predictive utility. That is,
in contrast to centered sites and the previously known site types,
these pairing schemes lack the specificity required to distinguish
other responsive messages with similar pairing from background.
Hence, experiments like that shown in Figures 1C-1F cannot
distinguish responsive messages that satisfy these unusual pairing schemes from nonresponsive messages that do not. Perhaps
unknown factors binding to neighboring UTR elements help
achieve interaction specificity differently for each individual
mRNA in a manner too idiosyncratic to be generalized into site
types. Alternatively, future insights into miRNA targeting might
identify commonalities in these unusual interactions, which could
form the basis of novel site types with predictive value.
Molecular Cell
MicroRNA Centered Sites
sequences (miRBase 11.0) were aligned and classified into groups whose
members differed from each other at ≤5 positions. The miRNA with the lowest
miRBase annotation number was selected as the representative from each
group. For distinct mRNAs, the mRNA isoform with the longest 3' UTR (or, if
all 3' UTRs were of the same length, a randomly chosen isoform) was selected
from a previously filtered set of RefFlat and H-INV annotations (Baek et al.,
2008).
CA Score
To search for orthologous sites, we used 165 distinct miRNAs conserved
among mammals and a six-way genome alignment (human, mouse, rat,
dog, horse, and pig) from the UCSC Genome Browser (hg18, http://genome.ucsc.edu/)
(Rhead et al., 2010). Alignment penalty scores were determined,
and the second-worst rather than the worst score was selected as the CA
score to accommodate some genome-alignment errors, incomplete genome
sequences, and species-specific losses.
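The selection rule above can be sketched in a few lines. This is only an illustration, not the scoring code used in the study: the function name `ca_score` is hypothetical, the input is assumed to be one alignment penalty per orthologous species, and larger penalties are assumed to mean poorer conservation of the site.

```python
def ca_score(penalties):
    """Return the second-worst alignment penalty for one site.

    Assumption: `penalties` holds one score per orthologous species,
    and a larger value means a worse (less conserved) alignment.
    """
    if len(penalties) < 2:
        raise ValueError("need scores from at least two species")
    # Sort from worst to best and skip the single worst score, so that
    # one bad alignment or one lineage-specific loss is tolerated.
    ranked = sorted(penalties, reverse=True)
    return ranked[1]


# Toy penalties for five species (illustrative values only).
example = ca_score([0.0, 1.0, 7.0, 2.0, 3.0])
```

Taking the second-worst value keeps a site from being penalized by a single outlier species while still requiring agreement among the remaining genomes.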
Generation of miRNA-like Control Sequences
To generate controls with the same seed composition and same trinucleotide
composition as authentic miRNAs, chimeric miRNA sequences were created
by reciprocally recombining, using the link between nt 10 and 11 as the crossover breakpoint, two miRNAs randomly chosen (without replacement) from
miRNA pairs with the same dinucleotide at positions 10 and 11, considering
only our set of distinct miRNAs. Ten chimeric miRNA cohorts were generated
to estimate the signal-to-background ratios.
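The recombination scheme described above can be illustrated with a short sketch. It is not the authors' code: the function names, the toy sequences, and the decision to drop any miRNA left without a partner are assumptions made for the example. Each run with a different random seed would correspond to one chimeric cohort.

```python
import random
from collections import Counter, defaultdict


def make_chimeras(mirnas, rng):
    """Reciprocally recombine miRNAs between nt 10 and nt 11.

    Partners are drawn without replacement from miRNAs that share the
    same dinucleotide at positions 10 and 11 (1-based). An miRNA left
    without a partner is dropped (an assumption of this sketch).
    """
    groups = defaultdict(list)
    for seq in mirnas:
        groups[seq[9:11]].append(seq)  # nt 10-11 in 0-based slicing
    chimeras = []
    for members in groups.values():
        pool = list(members)
        rng.shuffle(pool)
        for a, b in zip(pool[0::2], pool[1::2]):
            chimeras.append(a[:10] + b[10:])  # 5' half of a, 3' half of b
            chimeras.append(b[:10] + a[10:])  # 5' half of b, 3' half of a
    return chimeras


def trinucleotides(seqs):
    """Count all overlapping trinucleotides in a list of sequences."""
    return Counter(s[i:i + 3] for s in seqs for i in range(len(s) - 2))


# Toy sequences (hypothetical, not real miRNAs); a and b share "AG" at nt 10-11.
a = "ACGUACGUAAGUUUUCCCCGGG"
b = "UUGGCCAAUAGACACACACACA"
c = "GGGGGGGGGCUGGGGGGGGGGG"
cohort = make_chimeras([a, b, c], random.Random(0))
```

Because both parents carry the same dinucleotide at the junction, every trinucleotide spanning the breakpoint already exists in one of the parents, so the pooled trinucleotide composition of the two chimeras equals that of the two parents, and each chimera keeps an authentic seed from its 5' parent.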
ACCESSION NUMBERS
High-throughput raw sequence reads and processed reads are available at
the NCBI GEO (accession number GSE22068).
EXPERIMENTAL PROCEDURES
A detailed description of all materials and methods used can be found in the
Supplemental Information.
Microarray and Molecular Analyses
Array analyses were as in Grimson et al. (2007). Luciferase reporter constructs
were prepared as in Grimson et al. (2007), and assays were performed as in
Farh et al. (2005). In vitro cleavage reactions were essentially as in Haley
and Zamore (2004) and Shin (2008). Uncapped 5' ends of GSTM3 mRNA
degradation products were identified using the 5'-RACE kit (Invitrogen; Carlsbad, CA), as in Jones-Rhoades and Bartel (2004), starting with cells in which
XRN1 mRNA was knocked down more than 90%, as confirmed by RT-PCR
(Alemán et al., 2007). Degradome libraries were constructed essentially as in
Addo-Quaye et al. (2008). Small-RNA libraries were prepared for Illumina
sequencing as described (Grimson et al., 2008).
Analysis of miRNA Conservation
Out of 223 miRNA genomic loci producing 197 mouse miRNAs conserved in
other mammals (Friedman et al., 2009), 203 miRNA loci producing miRNAs
with 5' ends validated from a large-scale profiling of mouse miRNAs (Chiang
et al., 2010) were used in the analysis of Figure 1A.
Processing of Degradome Tags
After removing linker sequences and tags shorter than 20 nt, degradome tags
were mapped to RNAs annotated in ENSEMBL (http://www.ensembl.org/),
requiring a perfect match. To find "multiple loci tags" and tags that did not map
to annotated RNAs, filtered tags were mapped to the human genome (UCSC
Genome Browser, hg18, http://genome.ucsc.edu/). When determining TPRs,
filtered tags were mapped to a curated set of distinct mRNAs (Baek et al.,
2008). Expressed mRNAs were those represented by at least one degradome
tag.
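The filtering and perfect-match mapping steps can be sketched as follows. This is a simplified illustration, not the pipeline used in the study: the function names, the linker sequence, and the toy transcript are hypothetical, the linker is assumed to sit at the 3' end of each read, and a real analysis would use a dedicated short-read aligner rather than string search.

```python
MIN_TAG_LENGTH = 20  # tags shorter than 20 nt are discarded


def strip_linker(read, linker):
    """Trim the read at the first occurrence of the linker, if present."""
    idx = read.find(linker)
    return read[:idx] if idx != -1 else read


def map_tag(tag, transcripts):
    """Return every (transcript name, 0-based start) where tag matches exactly."""
    hits = []
    for name, seq in transcripts.items():
        start = seq.find(tag)
        while start != -1:
            hits.append((name, start))
            start = seq.find(tag, start + 1)
    return hits


def process_reads(reads, linker, transcripts):
    """Strip linkers, drop short tags, and map the rest with perfect matches."""
    mapped = {}
    for read in reads:
        tag = strip_linker(read, linker)
        if len(tag) < MIN_TAG_LENGTH:
            continue
        mapped[tag] = map_tag(tag, transcripts)
    return mapped


# Toy data (hypothetical sequences).
LINKER = "TTTTGGGG"
TAG = "GATCGGAAGTCCATGCAATG"  # 20 nt
transcripts = {"tx1": "AAAAA" + TAG + "CCCCC"}
result = process_reads([TAG + LINKER, "ACGTACGT" + LINKER, "T" * 25],
                       LINKER, transcripts)
```

In this sketch, a tag with an empty hit list is a candidate for the genome-mapping step, a tag with more than one hit would be a "multiple loci tag", and any transcript with at least one hit counts as expressed.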
miRNA:site Duplexes
When searching for miRNA:site duplexes, distinct mRNAs and miRNAs were
selected to avoid over-counting predicted duplexes involving miRNA families
or mRNA isoforms. To select distinct miRNAs, all human miRNAs and miRNA
SUPPLEMENTAL INFORMATION
Supplemental Information includes Supplemental Experimental Procedures,
Supplemental References, five figures, and 13 tables and can be found with
this article online at doi:10.1016/j.molcel.2010.06.005.
ACKNOWLEDGMENTS
We thank Andrew Grimson, Michael Axtell, Daehyun Baek, and Alexander
Subtelny for helpful discussions; Shujun Luo and Gary Schroth for Illumina
sequencing of the small-RNA library from brain; and the Whitehead Genome
Technology Core for the remaining Illumina sequencing. This work was supported by a Damon Runyon postdoctoral fellowship (C.S.) and a grant from
the NIH. D.P.B. is a Howard Hughes Medical Institute Investigator.
Received: December 22, 2009
Revised: April 27, 2010
Accepted: June 3, 2010
Published: June 24, 2010
REFERENCES
Addo-Quaye, C., Eshoo, T.W., Bartel, D.P., and Axtell, M.J. (2008). Endogenous siRNA and miRNA targets identified by sequencing of the Arabidopsis
degradome. Curr. Biol. 18, 758-762.
Alemán, L.M., Doench, J., and Sharp, P.A. (2007). Comparison of siRNA-induced off-target RNA and protein effects. RNA 13, 385-395.
Allen, E., Xie, Z., Gustafson, A.M., Sung, G.H., Spatafora, J.W., and Carrington,
J.C. (2004). Evolution of microRNA genes by inverted duplication of target
gene sequences in Arabidopsis thaliana. Nat. Genet. 36, 1282-1290.
Allen, E., Xie, Z., Gustafson, A.M., and Carrington, J.C. (2005). microRNA-directed phasing during trans-acting siRNA biogenesis in plants. Cell 121,
207-221.
Ameres, S.L., Martinez, J., and Schroeder, R. (2007). Molecular basis for target
RNA recognition and cleavage by human RISC. Cell 130, 101-112.
Anderson, E.M., Birmingham, A., Baskerville, S., Reynolds, A., Maksimova, E.,
Leake, D., Fedorov, Y., Karpilow, J., and Khvorova, A. (2008). Experimental
validation of the importance of seed complement frequency to siRNA specificity. RNA 14, 853-861.
Appel, C.D., and Maxwell, E.S. (2007). Structural features of the guide:
target RNA duplex required for archaeal box C/D sRNA-guided nucleotide
2'-O-methylation. RNA 13, 899-911.
Babiarz, J.E., Ruby, J.G., Wang, Y., Bartel, D.P., and Blelloch, R. (2008).
Mouse ES cells express endogenous shRNAs, siRNAs, and other
Microprocessor-independent, Dicer-dependent small RNAs. Genes Dev. 22,
2773-2785.
Baek, D., Villen, J., Shin, C., Camargo, F.D., Gygi, S.P., and Bartel, D.P. (2008).
The impact of microRNAs on protein output. Nature 455, 64-71.
Bartel, D.P. (2004). MicroRNAs: genomics, biogenesis, mechanism, and
function. Cell 116, 281-297.
Bartel, D.P. (2009). MicroRNAs: target recognition and regulatory functions.
Cell 136, 215-233.
Birmingham, A., Anderson, E.M., Reynolds, A., Ilsley-Tyree, D., Leake, D.,
Fedorov, Y., Baskerville, S., Maksimova, E., Robinson, K., Karpilow, J., et al.
(2006). 3' UTR seed matches, but not overall identity, are associated with
RNAi off-targets. Nat. Methods 3, 199-204.
Blanchette, M., Kent, W.J., Riemer, C., Elnitski, L., Smit, A.F., Roskin, K.M.,
Baertsch, R., Rosenbloom, K., Clawson, H., Green, E.D., et al. (2004). Aligning
multiple genomic sequences with the threaded blockset aligner. Genome Res.
14, 708-715.
Chiang, H.R., Schoenfeld, L.W., Ruby, J.G., Auyeung, V.C., Spies, N., Baek,
D., Johnston, W.K., Russ, C., Luo, S., Babiarz, J.E., et al. (2010). Mammalian
microRNAs: experimental evaluation of novel and previously annotated genes.
Genes Dev. 24, 992-1009.
Davis, E., Caiment, F., Tordoir, X., Cavaillé, J., Ferguson-Smith, A., Cockett,
N., Georges, M., and Charlier, C. (2005). RNAi-mediated allelic trans-interaction at the imprinted Rtl1/Peg11 locus. Curr. Biol. 15, 743-749.
Elbashir, S.M., Harborth, J., Lendeckel, W., Yalcin, A., Weber, K., and
Tuschl, T. (2001a). Duplexes of 21-nucleotide RNAs mediate RNA interference
in cultured mammalian cells. Nature 411, 494-498.
Elbashir, S.M., Lendeckel, W., and Tuschl, T. (2001b). RNA interference is
mediated by 21- and 22-nucleotide RNAs. Genes Dev. 15, 188-200.
Farh, K.K., Grimson, A., Jan, C., Lewis, B.P., Johnston, W.K., Lim, L.P., Burge,
C.B., and Bartel, D.P. (2005). The widespread impact of mammalian MicroRNAs on mRNA repression and evolution. Science 310, 1817-1821.
Filipowicz, W., Bhattacharyya, S.N., and Sonenberg, N. (2008). Mechanisms of
post-transcriptional regulation by microRNAs: are the answers in sight? Nat.
Rev. Genet. 9, 102-114.
Friedman, R.C., Farh, K.K., Burge, C.B., and Bartel, D.P. (2009). Most mammalian mRNAs are conserved targets of microRNAs. Genome Res. 19, 92-105.
German, M.A., Pillay, M., Jeong, D.H., Hetawal, A., Luo, S., Janardhanan, P.,
Kannan, V., Rymarquis, L.A., Nobuta, K., German, R., et al. (2008). Global identification of microRNA-target RNA pairs by parallel analysis of RNA ends.
Nat. Biotechnol. 26, 941-946.
Giraldez, A.J., Mishima, Y., Rihel, J., Grocock, R.J., Van Dongen, S., Inoue, K.,
Enright, A.J., and Schier, A.F. (2006). Zebrafish MiR-430 promotes deadenylation and clearance of maternal mRNAs. Science 312, 75-79.
Gregory, R.I., Chendrimada, T.P., Cooch, N., and Shiekhattar, R. (2005).
Human RISC couples microRNA biogenesis and posttranscriptional gene
silencing. Cell 123, 631-640.
Grimson, A., Farh, K.K., Johnston, W.K., Garrett-Engele, P., Lim, L.P., and
Bartel, D.P. (2007). MicroRNA targeting specificity in mammals: determinants
beyond seed pairing. Mol. Cell 27, 91-105.
Grimson, A., Srivastava, M., Fahey, B., Woodcroft, B.J., Chiang, H.R., King, N.,
Degnan, B.M., Rokhsar, D.S., and Bartel, D.P. (2008). Early origins and
evolution of microRNAs and Piwi-interacting RNAs in animals. Nature 455,
1193-1197.
Günther, T. (2006). Concentration, compartmentation and metabolic function
of intracellular free Mg2+. Magnes. Res. 19, 225-236.
Haley, B., and Zamore, P.D. (2004). Kinetic analysis of the RNAi enzyme
complex. Nat. Struct. Mol. Biol. 11, 599-606.
Hutvágner, G., and Zamore, P.D. (2002). A microRNA in a multiple-turnover
RNAi enzyme complex. Science 297, 2056-2060.
Jackson, A.L., Bartz, S.R., Schelter, J., Kobayashi, S.V., Burchard, J., Mao, M.,
Li, B., Cavet, G., and Linsley, P.S. (2003). Expression profiling reveals
off-target gene regulation by RNAi. Nat. Biotechnol. 21, 635-637.
Jackson, A.L., Burchard, J., Leake, D., Reynolds, A., Schelter, J., Guo, J.,
Johnson, J.M., Lim, L., Karpilow, J., Nichols, K., et al. (2006a). Position-specific chemical modification of siRNAs reduces "off-target" transcript
silencing. RNA 12, 1197-1205.
Jackson, A.L., Burchard, J., Schelter, J., Chau, B.N., Cleary, M., Lim, L., and
Linsley, P.S. (2006b). Widespread siRNA "off-target" transcript silencing
mediated by seed region sequence complementarity. RNA 12, 1179-1187.
Jones-Rhoades, M.W., and Bartel, D.P. (2004). Computational identification of
plant microRNAs and their targets, including a stress-induced miRNA. Mol. Cell
14, 787-799.
Jones-Rhoades, M.W., Bartel, D.P., and Bartel, B. (2006). MicroRNAs and
their regulatory roles in plants. Annu. Rev. Plant Biol. 57, 19-53.
Kadener, S., Rodriguez, J., Abruzzi, K.C., Khodor, Y.L., Sugino, K., Marr, M.T.,
2nd, Nelson, S., and Rosbash, M. (2009). Genome-wide identification of
targets of the drosha-pasha/DGCR8 complex. RNA 15, 537-545.
Kamal, M., Xie, X., and Lander, E.S. (2006). A large family of ancient repeat
elements in the human genome is under strong selection. Proc. Natl. Acad.
Sci. USA 103, 2740-2745.
Krützfeldt, J., Rajewsky, N., Braich, R., Rajeev, K.G., Tuschl, T., Manoharan,
M., and Stoffel, M. (2005). Silencing of microRNAs in vivo with 'antagomirs'.
Nature 438, 685-689.
Lal, A., Navarro, F., Maher, C.A., Maliszewski, L.E., Yan, N., O'Day, E.,
Chowdhury, D., Dykxhoorn, D.M., Tsai, P., Hofmann, O., et al. (2009).
miR-24 Inhibits cell proliferation by targeting E2F2, MYC, and other cell-cycle
genes via binding to "seedless" 3'UTR microRNA recognition elements. Mol.
Cell 35, 610-625.
Lewis, B.P., Shih, I.H., Jones-Rhoades, M.W., Bartel, D.P., and Burge, C.B.
(2003). Prediction of mammalian microRNA targets. Cell 115, 787-798.
Lewis, B.P., Burge, C.B., and Bartel, D.P. (2005). Conserved seed pairing,
often flanked by adenosines, indicates that thousands of human genes are
microRNA targets. Cell 120, 15-20.
Lim, L.P., Lau, N.C., Weinstein, E.G., Abdelhakim, A., Yekta, S., Rhoades,
M.W., Burge, C.B., and Bartel, D.P. (2003). The microRNAs of Caenorhabditis
elegans. Genes Dev. 17, 991-1008.
Lim, L.P., Lau, N.C., Garrett-Engele, P., Grimson, A., Schelter, J.M., Castle, J.,
Bartel, D.P., Linsley, P.S., and Johnson, J.M. (2005). Microarray analysis
shows that some microRNAs downregulate large numbers of target mRNAs.
Nature 433, 769-773.
Liu, J., Carmell, M.A., Rivas, F.V., Marsden, C.G., Thomson, J.M., Song, J.J.,
Hammond, S.M., Joshua-Tor, L., and Hannon, G.J. (2004). Argonaute2 is the
catalytic engine of mammalian RNAi. Science 305, 1437-1441.
Llave, C., Xie, Z., Kasschau, K.D., and Carrington, J.C. (2002). Cleavage of
Scarecrow-like mRNA targets directed by a class of Arabidopsis miRNA.
Science 297, 2053-2056.
Mallory, A.C., Reinhart, B.J., Jones-Rhoades, M.W., Tang, G., Zamore, P.D.,
Barton, M.K., and Bartel, D.P. (2004). MicroRNA control of PHABULOSA in
leaf development: importance of pairing to the microRNA 5' region. EMBO J.
23, 3356-3364.
Maniataki, E., and Mourelatos, Z. (2005). A human, ATP-independent, RISC
assembly machine fueled by pre-miRNA. Genes Dev. 19, 2979-2990.
Martinez, J., and Tuschl, T. (2004). RISC is a 5' phosphomonoester-producing
RNA endonuclease. Genes Dev. 18, 975-980.
Meister, G., Landthaler, M., Patkaniowska, A., Dorsett, Y., Teng, G., and
Tuschl, T. (2004). Human Argonaute2 mediates RNA cleavage targeted by
miRNAs and siRNAs. Mol. Cell 15, 185-197.
Miyoshi, K., Tsukumo, H., Nagami, T., Siomi, H., and Siomi, M.C. (2005). Slicer
function of Drosophila Argonautes and its involvement in RISC formation.
Genes Dev. 19, 2837-2848.
Orban, T.I., and Izaurralde, E. (2005). Decay of mRNAs targeted by RISC
requires XRN1, the Ski complex, and the exosome. RNA 11, 459-469.
Rajagopalan, R., Vaucheret, H., Trejo, J., and Bartel, D.P. (2006). A diverse and
evolutionarily fluid set of microRNAs in Arabidopsis thaliana. Genes Dev. 20,
3407-3425.
Rand, T.A., Petersen, S., Du, F., and Wang, X. (2005). Argonaute2 cleaves the
anti-guide strand of siRNA during RISC activation. Cell 123, 621-629.
Rhead, B., Karolchik, D., Kuhn, R.M., Hinrichs, A.S., Zweig, A.S., Fujita, P.A.,
Diekhans, M., Smith, K.E., Rosenbloom, K.R., Raney, B.J., et al. (2010). The
UCSC Genome Browser database: update 2010. Nucleic Acids Res. 38,
D613-D619. Published online November 11, 2009. 10.1093/nar/gkp939.
Rodriguez, A., Vigorito, E., Clare, S., Warren, M.V., Couttet, P., Soond, D.R.,
van Dongen, S., Grocock, R.J., Das, P.P., Miska, E.A., et al. (2007).
Requirement of bic/microRNA-155 for normal immune function. Science
316, 608-611.
Schwarz, D.S., Tomari, Y., and Zamore, P.D. (2004). The RNA-induced
silencing complex is a Mg2+-dependent endonuclease. Curr. Biol. 14,
787-791.
Schwarz, D.S., Ding, H., Kennington, L., Moore, J.T., Schelter, J., Burchard, J.,
Linsley, P.S., Aronin, N., Xu, Z., and Zamore, P.D. (2006). Designing siRNA that
distinguish between genes that differ by a single nucleotide. PLoS Genet. 2,
e140.
Selbach, M., Schwanhäusser, B., Thierfelder, N., Fang, Z., Khanin, R., and
Rajewsky, N. (2008). Widespread changes in protein synthesis induced by
microRNAs. Nature 455, 58-63.
Serra, M.J., Baird, J.D., Dale, T., Fey, B.L., Retatagos, K., and Westhof, E.
(2002). Effects of magnesium ions on the stabilization of RNA oligomers of
defined structures. RNA 8, 307-323.
Shin, C. (2008). Cleavage of the star strand facilitates assembly of some
microRNAs into Ago2-containing silencing complexes in mammals. Mol. Cells
26, 308-313.
Souret, F.F., Kastenmayer, J.P., and Green, P.J. (2004). AtXRN4 degrades
mRNA in Arabidopsis and its substrates include selected miRNA targets.
Mol. Cell 15, 173-183.
Stark, A., Brennecke, J., Bushati, N., Russell, R.B., and Cohen, S.M. (2005).
Animal MicroRNAs confer robustness to gene expression and have a significant impact on 3'UTR evolution. Cell 123, 1133-1146.
Wang, B., Li, S., Qi, H.H., Chowdhury, D., Shi, Y., and Novina, C.D. (2009a).
Distinct passenger strand and mRNA cleavage activities of human Argonaute
proteins. Nat. Struct. Mol. Biol. 16, 1259-1266.
Wang, Y., Juranek, S., Li, H., Sheng, G., Wardle, G.S., Tuschl, T., and Patel,
D.J. (2009b). Nucleation, propagation and cleavage of target RNAs in Ago
silencing complexes. Nature 461, 754-761.
Yekta, S., Shih, I.H., and Bartel, D.P. (2004). MicroRNA-directed cleavage of
HOXB8 mRNA. Science 304, 594-596.