Dicer deletion and short RNA expression analysis in mouse embryonic stem cells
by
Joseph Mauro Calabrese
B.S. Chemistry, Biochemistry, and Molecular Biology (2001)
University of Wisconsin-Madison
Submitted to the Department of Biology
In Partial Fulfillment of the Requirements for the Degree of
Doctor in Philosophy in Biology
at the
Massachusetts Institute of Technology
February 2008
© Massachusetts Institute of Technology
All rights reserved.
Signature of Author:
Department of Biology
December 18, 2007
Certified by:
Phillip A. Sharp
Professor of Biology
Thesis Supervisor
Accepted by:
Steven P. Bell
Professor of Biology
Chair, Biology Graduate Committee
Dicer deletion and short RNA expression analysis in mouse embryonic stem cells
By
Joseph Mauro Calabrese
Submitted to the Department of Biology in Partial Fulfillment of the Requirements for
the Degree of Doctor in Philosophy in Biology.
ABSTRACT
RNA interference (RNAi) manages many aspects of eukaryotic gene expression
through sequence-specific interactions with RNA. Short RNAs, 20-30 nucleotides in
length, guide the various effector proteins of RNAi to silence fully or partially
complementary targets. The sequencing of endogenously expressed short RNA species
coupled with genetic studies in various experimental organisms has revealed a role for
RNAi in the silencing of protein-coding genes and repetitive elements in genomes. In
mammals, it is unknown to what extent RNAi is involved in silencing processes other
than the modulation of protein-coding gene expression, which is achieved through a class
of short RNAs termed microRNAs (miRNAs).
The work in this thesis quantitatively describes the short RNAs expressed in
mouse embryonic stem (ES) cells. ES cell lines are derived from the pre-implantation
blastocyst and can be cultured in vitro for extended periods while still maintaining
pluripotency. It was demonstrated that approximately 130,000 5' phosphorylated short
RNA molecules are present in a single ES cell. 10% of these short RNAs represent nonrandom fragments of larger, abundant non-coding RNA species, and have no known
function. Low abundance short RNAs were discovered that cluster bidirectionally around
the transcription start sites of protein-coding genes. These RNAs associate with features
of active transcription, and may be evidence of widespread bidirectional initiation and
pausing of RNA polymerase II in ES cells.
There are on the order of 300 different miRNA species expressed in ES cells,
comprising 85% of the total pool of 130,000 5' phosphorylated short RNAs. Based on
experiments correlating miRNA abundance to target repression, only about 30 of these
miRNAs are expected to carry significant ES cell regulatory capacity. ES cells lacking
all miRNAs do not significantly change their morphology or gene expression patterns,
but do show a significant drop in growth rate compared to controls, suggesting that a
major function of ES cell miRNAs may be to govern cell division. A detailed
comparison of short RNAs expressed in ES cells with and without the ribonuclease Dicer
strongly suggests that miRNAs are the sole regulatory molecules that function through
the RNAi pathway in ES cells. Considering previous work showing that repeating
elements are frequently under Dicer-dependent repression, this observation raises the
possibility that mammalian miRNAs may in certain contexts function to silence repeating
genomic elements in addition to protein-coding genes.
ACKNOWLEDGEMENTS
To Phil, for training me to become a research scientist. I have learned so much in the lab
it is impossible to recount the details in this space. Thanks for all of your help and
advice, and for providing me with so many opportunities to discover.
To my committee members, Dave Bartel and Rudolf Jaenisch for sage advice and
expanding the way we thought about diverse problems.
To all of my co-workers in the Sharp lab over the years, so many of you have been
teachers and role models, in addition to friends. Thanks for everything, especially the
Muddy Charles trips. Sharp lab rules.
To my friends who have made my life outside of lab exciting, thanks. Many highlights
come to mind, including: the CCR retreats, Harvard Law parties, firing Ann and Keara,
taquito bombs, the syrup race, grilled pecan sandies, flip cup, uneven pool tables, road
trips, good beadings, physical challenges on Boston Commons, Department-funded
recruitment events at Pleasant Place, four 4th of Julys, and other barbecue-centered
events, too numerous to mention.
To Nicole, for all of your support and for providing so many pleasant distractions during
times of stress.
To my parents, for all that you have done. Too few people in this world have had the
opportunities and support that you have given me. Without your guidance,
encouragement, and love things would be very different.
Discovery consists of seeing what everyone else has
seen and thinking what no one else has thought.
- Albert Szent-Györgyi
TABLE OF CONTENTS
Abstract
Chapter 1 RNA interference and the biology of mouse embryonic stem cells
    Animal miRNAs
    miRNA biogenesis
    miRNA-mediated silencing mechanisms
    miRNA function
    C. elegans antisense siRNAs
    RNAi-mediated transcriptional silencing in S. pombe
    RNAi-mediated viral and transcriptional silencing in plants
    RNAi-mediated silencing of transposons and transgenes in C. elegans
    RNAi-mediated transposon control in D. melanogaster
    Mammalian RNAi and repetitive elements
    Mouse embryonic stem cells
Chapter 2 Characterization of the short RNAs bound by the P19 suppressor of RNA silencing in mouse ES cells
Chapter 3 RNA sequence analysis defines Dicer's role in mouse ES cells
Chapter 3 Appendix
Chapter 4 Short RNAs in the sense and anti-sense orientation from transcription initiation sites in mouse ES cells
Chapter 5 Examining miRNA function in mouse ES cells
Conclusions and future directions
Chapter 1
RNA interference and the biology of mouse embryonic
stem cells
Introduction
To manage their gene expression programs, organisms employ many distinct
mechanisms. In one set of regulatory mechanisms, termed RNA interference (RNAi),
short RNAs 20-30 nucleotides (nt) in length, guide multi-protein complexes to suppress
functions of complementary nucleic acid targets. Present in many single-celled, and
likely all multi-celled eukaryotes, the processes of RNAi regulate protein-coding gene
expression, initiate and maintain transcriptional silencing of specific genomic loci, and
maintain genomic integrity and immunity via the silencing of transposable elements. In
Chapter 1, the various mechanisms of RNAi-mediated silencing are described, focusing
heavily on studies conducted in mammals, though also touching on aspects of RNAi in
many experimental organisms. Additionally, a basic introduction to the biology of mouse
embryonic stem (ES) cells is included to provide appropriate background to the research
described in this thesis.
RNAi in the control of protein-coding gene expression in animals
Animal microRNAs
RNAi is a master regulator of protein-coding gene expression, predominantly
through a class of ~22 nt long non-coding RNA genes termed microRNAs (miRNAs).
miRNAs are sequence-specific guide molecules for protein complexes that prevent
productive translation and destabilize mRNAs. miRNAs appear to be ubiquitously
expressed in all multi-cellular eukaryotes, and have recently been identified in the
unicellular eukaryote, Chlamydomonas reinhardtii (Molnar et al. 2007; Zhao et al.
2007a). Their roles are diverse, and in many cases, essential. From a growing set of
genetic, biochemical, and computational analyses, it appears that many miRNAs control
cell-fate specification and have pleiotropic effects on cellular environments, similar to
cell-type-specific transcription factors (Kloosterman and Plasterk 2006). miRNAs have
critical regulatory roles in plants as well as animals; however, significant differences
exist between plant and animal miRNA biosynthesis and function. In the text below,
only animal miRNAs are discussed.
miRNA biogenesis
miRNAs are transcribed by RNA Pol II as long primary transcripts, termed pri-miRNAs, that are capped, poly-adenylated, and frequently poly-cistronic (Cai et al. 2004;
Lee et al. 2004; Rodriguez et al. 2004). Many miRNAs are located in defined intergenic
transcriptional units (Saini et al. 2007); others are located in introns and are likely co-expressed with host genes from single promoters (Baskerville and Bartel 2005). Pri-miRNAs are processed in the nucleus by the Drosha-DGCR8 heterodimer to generate
~70 nt long pre-miRNA hairpins with characteristic 5' phosphates and 3' 2 nt overhangs
(Lee et al. 2003; Zeng et al. 2005; Han et al. 2006). Pre-miRNAs are then exported into
the cytoplasm by Exportin-5 and Ran-GTP (Ying et al. 2003).
After nuclear export, the pre-miRNA hairpin is processed by the cytoplasmic
enzyme Dicer to generate a ~22 base pair RNA duplex consisting of the mature miRNA
paired to its complement, termed the miRNA* (Grishok et al. 2001; Hutvagner et al.
2001; Ketting et al. 2001). This duplex is likely short-lived, as miRNA* levels are up to
100 fold lower than levels of corresponding miRNAs (Ruby et al. 2006). Overexpression
of an RNA-duplex binding protein in mammalian cells fails to capture miRNA-miRNA*
duplexes, consistent with their proposed short life span and suggesting that these
duplexes are bound by protein components in the cytoplasm (described in Chapter 2).
After Dicer processing, the mature single-stranded miRNA is then displaced from the
miRNA* and incorporated into an active silencing complex.
The Argonaute proteins bind miRNAs in the core of the multi-subunit RNA-induced silencing complex (RISC), the protein complex that mediates RNAi-based
silencing (Liu et al. 2004). Mammals have eight Argonaute paralogues, divided equally
between the Ago and Piwi subfamilies (Carmell et al. 2002). Ago subfamily members
are thought to strictly associate with miRNAs, while at least two Piwi subfamily
members associate with a separate class of RNAs, termed piRNAs (Liu et al. 2004;
O'Donnell and Boeke 2007). Of the 4 Ago proteins, only Ago2 is capable of cleaving
target transcripts that are perfectly complementary to bound miRNAs (Liu et al. 2004).
Transcript cleavage by Ago2 is multi-turnover and occurs on the target RNA directly
across from the 10th nucleotide measuring from the 5' end of the miRNA (Hutvagner and
Zamore 2002; Martinez and Tuschl 2004). Ago2 does not explicitly depend on this
cleavage activity to function, as expression of a cleavage-deficient Ago2 mutant protein
is able to fully rescue the phenotypic defects of a mouse Ago2 hematopoietic knockout
(Tang et al. 2007). The other Ago proteins lack significant cleavage activity and likely
function mainly to prevent translation of target mRNAs (Pillai et al. 2004).
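The cleavage geometry described above can be made concrete with a small indexing sketch. Assuming a perfectly complementary target site, the Python fragment below locates the target nucleotide that sits directly across from guide nucleotide 10, counted from the guide 5' end. The function names, the let-7a guide sequence, and the flanking bases are illustrative assumptions rather than material from this thesis.

```python
# Illustrative sketch only: sequences and names are assumptions.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna):
    """Return the reverse complement of an RNA string (A, C, G, U only)."""
    return rna.translate(COMPLEMENT)[::-1]


def target_index_opposite(guide, target, guide_pos=10):
    """Return the 0-based target index paired with guide nucleotide
    `guide_pos` (1-based, counted from the guide 5' end).

    Only a perfectly complementary site is considered; None is returned
    if the target contains no such site.
    """
    site = reverse_complement(guide)
    start = target.find(site)
    if start == -1:
        return None
    # Guide position p pairs with site index len(guide) - p,
    # because the two strands run antiparallel.
    return start + len(guide) - guide_pos


# Assumed example: let-7a as the guide, with two arbitrary flanking bases
# on each side of a perfectly complementary site.
GUIDE = "UGAGGUAGUAGGUUGUAUAGUU"
TARGET = "GG" + reverse_complement(GUIDE) + "AA"
IDX = target_index_opposite(GUIDE, TARGET)
```

Here `TARGET[IDX]` is the base paired with the tenth guide nucleotide. The sketch says nothing about whether a given Argonaute is catalytically competent.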
Many proteins associate with the Argonautes either as RISC loading or accessory
factors. In HEK 293T cells, the double-stranded RNA (dsRNA) binding protein TRBP
associates with Dicer and Ago2 and is likely required for proper loading of miRNAs into
the RISC (Chendrimada et al. 2005). Biochemical studies mainly conducted in cells from
Homo sapiens and Drosophila melanogaster have shown many other proteins associate
with RISC as accessory factors, including: the fragile-X-mental-retardation protein
(FMRP), tudor staphylococcal nuclease (TSN), the vasa intronic gene (VIG), Mov10,
eIF6, and Gemin3 and Gemin4 (Meister et al. 2005; Sontheimer 2005; Chendrimada et al.
2007). The function of many of these proteins in miRNA-mediated silencing, and
whether they consistently associate with Ago and the RISC in multiple cell types,
remains unclear.
Loading of single-strand mammalian miRNAs into the RISC is thought to depend
on the difference in thermodynamic end stabilities between the two ends of the
miRNA/miRNA* duplex. Analysis of functional short-interfering RNAs (siRNAs) and
both vertebrate and invertebrate miRNAs has shown that the short RNA whose 5'
terminus is located at the end of the duplex that is least thermodynamically stable is
preferentially incorporated into the RISC (Khvorova et al. 2003; Schwarz et al. 2003).
This difference in thermodynamic stability can in many cases accurately predict which
strand of the pre-miRNA hairpin will be the miRNA and which will be preferentially
degraded as the miRNA*; however, exceptions exist where differences in thermodynamic
stability alone are insufficient to predict miRNA duplex strand choice (Khvorova et al.
2003; Schwarz et al. 2003).
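The asymmetry rule described above can be illustrated with a deliberately crude sketch. The cited studies compared thermodynamic end stabilities; the Python fragment below instead uses a hydrogen-bond count over the four terminal base pairs as a stand-in, which is an assumption made for simplicity and not the published method. The duplex is assumed to have 2 nt 3' overhangs on both strands, and the sequences and function names are hypothetical.

```python
# Crude proxy (assumption): hydrogen bonds per base pair, not free energies.
HBONDS = {
    ("A", "U"): 2, ("U", "A"): 2,
    ("G", "C"): 3, ("C", "G"): 3,
    ("G", "U"): 2, ("U", "G"): 2,  # wobble pair
}
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def end_score(strand, partner, overhang=2, window=4):
    """Sum hydrogen bonds over the first `window` base pairs at the 5' end
    of `strand`. Both strands are written 5'->3'; unpaired or mismatched
    positions contribute 0."""
    last_paired = len(partner) - 1 - overhang
    return sum(
        HBONDS.get((strand[i], partner[last_paired - i]), 0)
        for i in range(window)
    )


def predict_loaded_strand(a, b):
    """Return the strand whose 5' end sits at the less stable duplex end,
    or None when the two ends score equally."""
    score_a, score_b = end_score(a, b), end_score(b, a)
    if score_a == score_b:
        return None
    return a if score_a < score_b else b


# Hypothetical 21 nt duplex: A/U-rich at the 5' end of GUIDE,
# G/C-rich at the other end of the paired region.
GUIDE = "UAAUACGAUCGAUGCGCGCUU"
PASSENGER = GUIDE[:19].translate(COMPLEMENT)[::-1] + "UU"
```

With these toy sequences the 5' end of `GUIDE` scores lower than the 5' end of `PASSENGER`, so `predict_loaded_strand` returns `GUIDE`. As noted above, real duplexes include exceptions that no end-stability score captures.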
In addition to miRNA duplex end stabilities, base pairing between the body of the
miRNA and miRNA* can affect how miRNAs are loaded into different RISCs. In D.
melanogaster, miRNA duplexes with perfect complementarity across from the site of
Ago2 cleavage are preferentially incorporated into Ago2-containing RISCs, while those
with bulges in the would-be cleavage region (positions 9 through 11 measuring from the
miRNA 5' end) are preferentially incorporated into Ago1-containing RISCs (Forstemann
et al. 2007; Tomari et al. 2007). In mammals, this differential incorporation of miRNAs
into RISCs does not appear to occur, as immunoprecipitation of different Argonautes
followed by miRNA microarray analysis shows that Ago1, 2, and 3 bind all miRNAs
equally well (Liu et al. 2004).
Dynamic changes in miRNA levels have been observed along developmental axes
and changes in physiological state (Cheng et al. 2007; Neilson et al. 2007; Xu et al.
2007), suggesting the existence of active processes for the regulation of mature miRNA
levels. Nevertheless, some mature miRNAs appear to be very stable in non-dividing cells
(Song et al. 2003). Also, levels of mature miRNAs are often uncoupled from levels of
pre- and pri-miRNAs, suggesting that miRNA processing itself is a regulated process
(Obernosterer et al. 2006; Thomson et al. 2006).
miRNA-mediated silencing mechanisms
The mechanisms of miRNA-mediated gene silencing processes appear to be
diverse. The founding miRNA, Caenorhabditis elegans lin-4, was observed to prevent
translation of its target mRNA, lin-14, through an interaction that required partially
complementary sequences in lin-14's 3' untranslated region (UTR). This translational
repression did not significantly change lin-14 mRNA levels or the location of lin-14
mRNA in a polysome sedimentation profile (Lee et al. 1993; Wightman et al. 1993;
Olsen and Ambros 1999). A large body of subsequent work shows that translational
repression of mRNAs via partially complementary sequence interaction is the
predominant mechanism of miRNA-mediated gene silencing in animals (Bartel 2004).
However, at least one animal miRNA, miR-196, functions by cleaving perfectly
complementary target transcripts (Yekta et al. 2004). Moreover, it has been observed that
many miRNAs can destabilize target mRNAs, likely by causing relocation of mRNAs to
cytoplasmic processing bodies (P-bodies) (Bagga et al. 2005; Lim et al. 2005).
Sequence information in 3' UTRs predominantly dictates the type and extent of
miRNA-mediated mRNA repression. Comparative genomics studies have shown that
miRNA target sites on mRNAs are most conserved over bases 2-8 measuring from the 5'
end of the miRNA, termed the "seed" region of the miRNA (Lewis et al. 2003; Lewis et
al. 2005). Experiments testing the repressive capability of both artificial and natural
miRNAs are in agreement with this, having demonstrated that perfect base pairing
between the 5' end of the miRNA and the 3' UTR is a strong determinant of repression
(Doench and Sharp 2004; Brennecke et al. 2005). Additionally, these experiments
showed that extensive pairing to the 3' end of the miRNA can compensate for weak 5'
pairing, and that miRNA sites in close proximity synergize with each other,
demonstrating that a single UTR may be subject to regulation from many miRNAs
(Doench et al. 2003; Doench and Sharp 2004; Brennecke et al. 2005). More recently, it
has been shown that target site accessibility, local A/U content, target-site-proximal
conservation, and location of the target site relative to the stop codon, are all additional
determinants of miRNA-mediated repression (Grimson et al. 2007; Kertesz et al. 2007;
Nielsen et al. 2007).
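As a minimal sketch of the seed rule described above, the following Python fragment scans a 3' UTR for perfect matches to bases 2-8 of a miRNA. The let-7a sequence and the toy UTR are illustrative assumptions, not data from this thesis, and the function names are hypothetical. The scan ignores the other determinants mentioned above (3' compensatory pairing, site accessibility, local A/U content, position relative to the stop codon), so it should not be read as a target prediction method.

```python
# Illustrative sketch only: sequences and names are assumptions.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def seed(mirna, start=2, end=8):
    """Return miRNA nucleotides start..end (1-based, inclusive),
    counted from the 5' end."""
    return mirna[start - 1:end]


def reverse_complement(rna):
    """Return the reverse complement of an RNA string (A, C, G, U only)."""
    return rna.translate(COMPLEMENT)[::-1]


def find_seed_sites(mirna, utr):
    """Return 0-based start positions in `utr` of every perfect match
    to the reverse complement of the miRNA seed."""
    site = reverse_complement(seed(mirna))
    positions = []
    i = utr.find(site)
    while i != -1:
        positions.append(i)
        i = utr.find(site, i + 1)  # step by 1 so overlapping sites are kept
    return positions


# Assumed example: let-7a and a made-up UTR carrying two seed matches.
LET7A = "UGAGGUAGUAGGUUGUAUAGUU"
UTR = "AAACUACCUCAGGGCUACCUCUU"
SITES = find_seed_sites(LET7A, UTR)
```

Two closely spaced sites, as in this toy UTR, correspond to the situation in which neighboring miRNA sites were reported to synergize.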
Though it is accepted that the majority of animal miRNAs function by preventing
productive translation of their target mRNAs, the apparent mechanisms of translational
inhibition by miRNAs vary depending on the experimental system used by researchers.
Studies using various in vitro cell extracts or in vitro transcribed mRNAs have shown
miRNAs inhibit translational initiation in a manner dependent on a 7-methyl-guanine
(m7G) cap structure and a poly-A tail (Humphreys et al. 2005; Pillai et al. 2005; Wang et
al. 2006a; Mathonnet et al. 2007; Wakiyama et al. 2007). Argonaute proteins have an m7G
cap binding domain that is similar to that of the cap binding protein eIF4E, and this
domain is required for Ago-mediated translational repression of mRNAs (Kiriakidou et
al. 2007). Furthermore, addition of recombinant eIF4E to extracts interferes with
miRNA-mediated translational inhibition (Mathonnet et al. 2007). Together, these studies
support a model by which miRNA-guided RISCs bind m7G cap structures to prevent
translational initiation.
Apparently at odds with these findings are a number of studies analyzing cells
that suggest miRNAs inhibit translation at a step post-initiation (Olsen and Ambros 1999;
Seggerson et al. 2002; Maroney et al. 2006; Nottrott et al. 2006; Petersen et al. 2006). In
these works, miRNAs and mRNAs actively repressed by miRNAs co-sediment with
polyribosomes in sucrose gradients, and this co-sedimentation can be disrupted by
puromycin, suggesting it depends on actively translating ribosomes (Maroney et al. 2006;
Nottrott et al. 2006; Petersen et al. 2006). Further, miRNAs inhibited translation of a
reporter gene driven by the cricket paralysis virus IRES, which allows loading of
elongation-competent 80S ribosomes on mRNAs without the requirement for canonical
initiation factors and initiator tRNAs, again suggesting that miRNAs repress translation
post-initiation (Petersen et al. 2006). One potential explanation for these discrepancies
could be that miRNAs may inhibit translation both pre- and post-initiation, but certain
experimental conditions, such as those that utilize in vitro transcribed mRNAs, are
differentially sensitive to these two modes of inhibition.
miRNAs can also induce mRNA destabilization by targeting mRNAs for
deadenylation, and potentially decapping. Zebrafish miR-430 promotes clearance of
maternal mRNAs at the onset of zygotic transcription via deadenylation, and the miRNA
let-7 promotes translation-independent deadenylation of a reporter mRNA in vitro
(Giraldez et al. 2006; Wakiyama et al. 2007). Separate experiments show that Argonaute
proteins associate with P-bodies and decapping enzyme in tissue culture cells, indirectly
linking miRNA-mediated repression to decapping (Jakymiw et al. 2005; Liu et al. 2005;
Pillai et al. 2005; Sen and Blau 2005). Further, decay products have been detected of
miRNA-targeted mRNAs that are consistent with the 5' to 3' exonucleolytic degradation
mediated by Xrn1p, the nuclease that destroys uncapped mRNAs (Bagga et al. 2005).
miRNA function
The current miRNA database has annotations for 533 human and 442 mouse
miRNAs (Griffiths-Jones 2004). Given the large number and apparent ubiquitous
expression of miRNAs in animals, the potential for miRNA-mediated gene regulation is
large. The founding miRNAs were identified in forward genetic screens via their
phenotypic influence on genetic pathways (Chalfie et al. 1981; Reinhart et al. 2000);
however, the small amount of sequence complementarity needed for miRNA-mediated
repression suggests that most miRNAs affect many cellular pathways rather than one
specifically.
miRNAs tune translation from expressed mRNAs to define protein output of
targeted genes. In one specific example, reduction of atrophin levels by Drosophila miR-8 is required for normal central nervous system function. Importantly, further reduction
or overexpression of atrophin in otherwise wild-type flies results in a mutant phenotype,
indicating that miR-8 reduces atrophin expression to a level appropriate for normal
function (Karres et al. 2007). miR-150 expression in differentiating B cells represents
another example of miRNA-mediated tuning of protein expression (Xiao et al. 2007).
miR-150 targets the transcription factor c-Myb during lymphocyte development. Similar
to the situation described for atrophin and miR-8 in Drosophila, the relief of miR-150-mediated c-Myb repression or a reduction of c-Myb protein levels both result in the
impairment of B cell development (Xiao et al. 2007).
Many miRNAs are reciprocally expressed with target mRNAs, suggesting that in
certain cases, miRNAs function as master regulators of cell-type specific transcriptional
and translational output (Farh et al. 2005; Stark et al. 2005). Genes that are highly
expressed in specific tissues have evolved to avoid targeting by abundant tissue-specific
miRNAs, whereas genes that are conserved targets of tissue specific miRNAs are
frequently expressed at low levels, or in tissues adjacent to the tissue specific miRNA, such that gene expression boundaries and cell identity appear to be maintained by
miRNA expression (Farh et al. 2005; Stark et al. 2005). Additionally, genes that need to
be ubiquitously expressed, such as ribosomal protein genes, tend to have short UTRs that
avoid miRNA targeting completely (Stark et al. 2005). Consistent with a role for
miRNAs in the restriction of tissue identity, miRNA levels are generally down-regulated
in tumors, and impaired miRNA processing enhances tumorigenesis, a process in which
diverse collections of rapidly evolving cells need to adopt multiple cellular identities (Lu
et al. 2005; Kumar et al. 2007).
Genetic knockouts of specific mouse miRNAs reveal a striking intolerance for
loss of tissue-specific miRNA expression. The most poignant example of this thus far is
knockout of miR-1-2, which is expressed specifically in muscle cells. Approximately
50% of miR-1-2 knockout mice die from severe cardiac dysfunction at or before
weaning, indicating a critical role for miR-1-2 in the heart (Zhao et al. 2007b). Also,
deletions of the lymphoid specific miRNAs miR-150 and -155 result in severe defects in
B-cell and T-cell differentiation, respectively (Rodriguez et al. 2007; Thai et al. 2007;
Xiao et al. 2007).
Deletion of Dicer from mouse tissues results in catastrophic abnormalities in all
cases examined, again suggesting that miRNA function is critical in many tissues;
however, because Dicer is required for the biogenesis of several other regulatory RNAs
in non-mammalian organisms, the phenotypic consequences of Dicer loss may not be
solely due to loss of miRNA expression and must be carefully interpreted. Dicer
knockout mice die at the earliest stage examined, E7.5, and oocyte-specific Dicer
deletion results in arrest at oocyte meiosis I, showing the necessity of Dicer activity at the
earliest stages in mouse development (Bernstein et al. 2003; Murchison et al. 2007; Tang
et al. 2007). Tissue-specific deletions of Dicer in the limb, lung, immune system, heart,
and epidermis all show catastrophic mutant phenotypes, consistent with a requirement for
Dicer function in these tissues (Harfe et al. 2005; Andl et al. 2006; Harris et al. 2006; Yi
et al. 2006; Zhao et al. 2007b).
To conclude, there is a role for miRNA-mediated gene regulation in a large
number of biological processes. Not discussed here are documented roles for miRNAs in
a range of biology, including apoptosis, metabolism, cell division, metastasis, local
translation at synapses, and management of circadian rhythms (Kloosterman and Plasterk
2006; Cheng et al. 2007; Wu et al. 2007; Xu et al. 2007). Additionally, it is possible that
miRNAs have a generalized role in regulating gene expression during stress (Leung and
Sharp 2007). Recent observations show that Ago2 and the RISC component FXR1 are
curiously required for the up-regulation of TNFα protein in human cells after serum
starvation (Vasudevan and Steitz 2007); whether or not this up-regulation is miRNA-dependent is currently unclear.
C. elegans antisense siRNAs
In the nematode C. elegans, the expression of short RNAs antisense to protein-coding mRNAs is thought to modulate protein-coding gene expression in a manner
separate from miRNAs, likely by guiding direct mRNA cleavage (Ambros et al. 2003).
Several proteins are implicated in the biogenesis of these endogenous siRNAs, including
Dicer, an RNA-dependent RNA polymerase (RdRP), an RNA helicase, an RNAse D
homologue, a nucleotidyltransferase, and the conserved RNA phosphatase Pir-1
(Duchaine et al. 2006; Lee et al. 2006a; Sijen et al. 2007). The mechanistic details of C.
elegans siRNA biogenesis remain unclear. Potentially, target mRNAs serve as templates
for an RdRP to generate double-stranded RNA species with 5' tri- or di-phosphates.
Presumably, these phosphates need to be removed before Dicer processing, as pir-1
phosphatase mutants accumulate long RNAs anti-sense to target transcripts (Duchaine et
al. 2006). Dicer processing then likely generates primary siRNAs that are low in
abundance and serve as guides to initiate a second round of RdRP synthesis, this time
resulting in abundant short 21-27 nt siRNAs that likely function to silence
complementary mRNAs (Ruby et al. 2006; Sijen et al. 2007).
C. elegans siRNAs have 5' tri- or di-phosphates, different from the 5' monophosphates of miRNAs. Unlike miRNAs, most C. elegans siRNAs do not serve as
substrates for T4 RNA ligase in vitro, which requires a 5' mono-phosphate; however, they
do show a shift in mobility after treatment with alkaline phosphatase, indicating the
presence of at least one terminal 5' phosphate (Pak and Fire 2007; Sijen et al. 2007). C.
elegans siRNAs also serve as substrates for in vitro capping reactions that require 5' tri- or di-phosphates, and exhibit gel mobility patterns that mimic the mobility of synthetic 5'
tri- and di-phosphorylated RNAs, strongly suggesting they are marked with 5' tri- or di-phosphates (Ruby et al. 2006; Pak and Fire 2007; Sijen et al. 2007).
The 5' end modification of C. elegans siRNAs is noteworthy because it greatly
reduces endogenous siRNA sequencing frequency in short cDNA libraries that have been
prepared by selecting for the canonical 5' and 3' end modifications of animal miRNAs, 5'
monophosphates and 3' hydroxyls. Ruby and colleagues, selecting for short RNAs with
5' monophosphates and 3' hydroxyls, found miRNAs to be 100-fold more abundant than
anti-sense siRNAs in mixed-stage C. elegans (Ruby et al. 2006). In contrast, using a
cDNA library preparation method that was independent of 5' modification, Ambros and
colleagues found miRNAs and siRNAs to be approximately equal in abundance (Ambros
et al. 2003). It is currently unclear whether other organisms express endogenous short
RNAs with similarly modified 5' termini; however, these studies set the precedent for
short RNA species eluding discovery because of end modifications incompatible with
cDNA library preparation methods.
RNAi in the control of heterochromatin and transposable elements
RNAi has a conserved role in the silencing of transposable elements and the
establishment of heterochromatin at repetitive loci in eukaryotic genomes. These RNAi-based silencing processes are diverse and understood in varying detail, as discussed in the
text below. There are many cases in which RNAi prevents the replication of exogenous
RNA viruses on a post-transcriptional level. Also, at least in plants, RNAi
transcriptionally silences exogenous viruses that have integrated into the genome. In a
related set of silencing mechanisms, RNAi prevents the spread of endogenous
transposable elements on both transcriptional and post-transcriptional levels. These types
of endogenous transposable elements represent a large portion of many eukaryotic
genomes, and usually express one or several proteins that function in concert with
cellular machinery to replicate. In certain cases, formation of heterochromatin around
these elements is a direct consequence of the protective role of RNAi. In many cases,
repeats are silenced not as an act of genomic defense but instead as a means to
coordinately regulate nuclear domains for maintenance of genome structure or in
response to developmental cues, suggesting exaptation of this defense pathway. Notably,
despite extensive conservation of RNAi components in eukaryotes, it is unclear to what
extent RNAi mediates the silencing of repetitive elements and the formation of
heterochromatin in mammals.
RNAi-mediated transcriptional silencing in Schizosaccharomyces pombe
RNAi-mediated transcriptional silencing is best understood in S. pombe, which
has only one member from each of three major gene families involved in RNAi.
Targeted deletion of the sole Dicer (dcr1), Argonaute (ago1), or RdRP (rdp1) de-silences
centromeric repeats and results in defects in mitotic chromosome segregation and
telomeric clustering, indicating a role for RNAi-mediated transcriptional silencing in
genomic integrity and high-order nuclear structure (Volpe et al. 2002; Hall et al. 2003;
Sugiyama et al. 2005). RNAi is also needed for heterochromatin establishment at the
repetitive mating-type locus, which is a 20kb region harboring a copy of the centromeric
repeat cenH flanked by inverted repeats that serve as boundary elements to
heterochromatin formation (Hall et al. 2002; Jia et al. 2004).
A detailed mechanistic model has emerged for RNAi-mediated establishment and
maintenance of heterochromatin in S. pombe (Colmenares et al. 2007). Nascent
transcripts generated by RNA Polymerase II (RNA Pol II) at heterochromatic loci are
bound by complementary short RNAs carried in the RNA-induced transcriptional silencing, or RITS, complex (Buhler et al. 2006; Irvine et al. 2006). The trimeric RITS
complex, composed of Ago1 bound to a heterochromatic siRNA, the chromodomain-containing protein Chp1, and a protein of unknown function, Tas3 (Verdel et al. 2004),
then recruits an RdRP-containing complex (the RDRC) to the heterochromatic locus in a
manner that requires both the catalytic cleavage activity of Ago1 and the histone3-lysine-9 (H3K9) binding activity of Chp1 (Motamedi et al. 2004; Noma et al. 2004; Irvine et al.
2006). Heterochromatin formation and siRNA production both require the H3K9
methyltransferase Clr4, highlighting the importance of Chp1's interaction with chromatin
(Noma et al. 2004). Once tethered to the nascent transcript, the RDRC creates a double-stranded RNA that is processed by Dicer to generate more heterochromatic siRNAs and
start the cycle anew (Colmenares et al. 2007). Interestingly, a centromeric repeat
exogenously introduced into euchromatin is sufficient to induce RNAi-mediated
heterochromatin formation, suggesting either a sequence specificity to the RNAi-mediated induction of heterochromatin, or that siRNA-loaded RITS can act in trans to
silence dispersed repeats (Hall et al. 2002).
RNAi-mediated viral and transcriptional silencing in plants
The RNAi pathway has a significant role in plant antiviral immunity. Many plant
viruses encode single- or double-stranded RNA genomes that are recognized by RNAi
machinery as deleterious and serve as substrates for the generation and amplification of
targeting siRNAs. In this process, the viral genome is converted into double-stranded
RNA via an RdRP and subsequently converted into siRNAs by a plant Dicer, after which
the viral siRNAs are incorporated into a viral-targeting RISC that likely contains plant
Ago1 at its core (Xie and Guo 2006; Zhang et al. 2006). An interesting aspect of this
silencing mechanism is that viral siRNAs are not only present in the infected cells, but
spread throughout the plant and protect distal portions of the plant from subsequent viral
infections (Hamilton et al. 2002). Accordingly, plants with mutations in various Dicers
(dcl2 and dcl4) and RdRPs (rdr1 and rdr6) are highly susceptible to local and systemic
viral infection (Mourrain et al. 2000; Deleris et al. 2006). Emphasizing its protective
importance, potentially all plant RNA viruses have evolved to express proteins that are
potent inhibitors of the plant RNAi antiviral response (Voinnet 2005).
The P19 protein from the Tombus family of viruses is the best-characterized viral
inhibitor of RNAi (Scholthof 2006). P19 functions by binding and sequestering viral-targeting siRNAs that in their free form would be incorporated into a plant RISC. X-ray
crystallographic and biochemical studies show that head-to-head dimers of P19 bind
siRNA duplexes with low nanomolar affinity (Vargason et al. 2003; Ye et al. 2003;
Lakatos et al. 2004). Many viral inhibitors of RNAi are proposed to function similarly
(Voinnet 2005); still others inhibit different steps of the RNAi pathway, as demonstrated
by the 2b protein from the Cucumber mosaic virus, which inhibits plant AGO1 (Zhang et
al. 2006).
RNAi also plays a significant role in the establishment of heterochromatin
surrounding repetitive elements in plant genomes. One branch of this pathway mediates
the formation of heterochromatin around invading viruses and exogenously introduced
transgenes that have integrated into plant genomes, and has genetic requirements similar
to those described above for post-transcriptional silencing of viruses, namely dcl2, rdr1,
and rdr6 (Dalmay et al. 2000; Fagard et al. 2000; Mourrain et al. 2000). The other branch
serves to silence endogenous repetitive elements, and requires separate RNAi paralogues
to function. Certain classes of transposons and ribosomal DNA loci are de-silenced in
mutant strains of Dicer (dcl3), Argonaute (ago4), and RdRP (rdr2). This de-silencing is
accompanied by disappearance of short RNAs corresponding to the repeats, and a
reduction in H3K9 and DNA methylation at the repetitive loci (Lippman et al. 2003;
Zilberman et al. 2003; Chan et al. 2004; Xie et al. 2004). Synthesis and subsequent
loading of heterochromatic siRNAs into silencing complexes also depends on RNA
Polymerase IV, a DNA-dependent RNA polymerase that localizes with Ago4 in nuclear
Cajal bodies (Onodera et al. 2005; Li et al. 2006; Zhang et al. 2007).
RNAi-mediated silencing of transposons and transgenes in C. elegans
RNAi is required for the suppression of transposon replication and the
transcriptional silencing of transgene arrays in the C. elegans germline and soma.
Genetic screens first uncovered a role for RNAi in these processes, showing that the
RNaseD homologue mut-7 and the Argonaute-like protein ppw-2 are required for
germline suppression of transposition in C. elegans (Ketting et al. 1999; Vastenhouw et
al. 2003). Subsequent work showed that many genes required for the suppression of
transposition are also required for co-suppression, the process by which high-copy
transgenes can induce the silencing of related endogenous genes in trans (Ketting and
Plasterk 2000). Candidate-based RNAi screens uncovered additional genes required for
germline co-suppression, including many chromatin modifiers, suggesting that co-suppression is at least partly a transcriptional silencing process (Robert et al. 2005).
Transcriptional silencing of transgene arrays in the C. elegans soma requires a different
set of RNAi paralogues, including Dicer (dcr-1), the double-stranded RNA binding
protein rde-4, the Argonaute protein rde-1, and the RdRP rrf-1 (Grishok et al. 2005).
RNAi-mediated transcriptional silencing, transposon control, and viral defense in D.
melanogaster
Transcriptional and post-transcriptional control of endogenous repeats by RNAi
D. melanogaster RNAi pathway components are required for the transcriptional
and post-transcriptional control of repetitive elements. Post-transcriptional silencing of
the tandemly repeated Stellate genes and other classes of retrotransposons requires at
least one of two genes from the Piwi-subfamily of Argonaute proteins, Aubergine and
Piwi, and two DExH-box helicases also involved in RNAi, Spindle-E and Armitage
(Aravin et al. 2001; Vagin et al. 2006). The silencing of transgene arrays on the
transcriptional and post-transcriptional levels has similar genetic requirements (Pal-Bhadra et al. 2002; Pal-Bhadra et al. 2004). Moreover, piwi, aubergine, and spindle-E
mutant flies show genome-wide defects in heterochromatin, including reduction in H3K9
methylation levels and delocalization of the heterochromatin binding protein, HP1 (Pal-Bhadra et al. 2004). Curiously, a repetitive locus associated with the telomere on the
right arm of D. melanogaster chromosome 3, the 3R-TAS locus, is constitutively
euchromatic; however, in piwi mutants 3R-TAS becomes heterochromatinized despite a
genome-wide decrease in H3K9 marks, suggesting that Piwi may also inhibit the spread
of heterochromatin in addition to nucleating its formation (Yin and Lin 2007).
RNAi components are also required for the nuclear clustering of Polycomb genes
in D. melanogaster (Grimaud et al. 2006). Using transgenic flies carrying multiple
copies of the Polycomb response element Fab-7 integrated at different genomic locations,
Grimaud and colleagues showed that Piwi, Ago1, and Dicer-2 (Dcr-2) frequently colocalize with nuclear clusters of Fab-7 (Grimaud et al. 2006). Further, piwi, ago1, and
dcr2 mutant flies still maintained silencing but could no longer organize Fab-7 repeats
into punctate nuclear foci, indicating a role for RNAi in the high-order nuclear
organization but not silencing of Polycomb response elements (Grimaud et al. 2006).
Sequence analysis of short RNAs associated with Piwi subfamily proteins (or
piRNAs, for piwi-interacting RNAs) reveals several interesting characteristics
(Brennecke et al. 2007; Gunawardane et al. 2007). D. melanogaster piRNAs are on
average longer than miRNAs, 23-27 compared to ~22 nt. Though the majority are
complementary to highly repetitive elements, analysis of those with unique genomic
locations reveals that piRNAs are generated from large, discretely located clusters that
harbor diverse classes of transposons (Brennecke et al. 2007). Strikingly, piRNAs
associated with different Piwi subfamily members show characteristic first and tenth
nucleotide biases, such that the large majority of Piwi- and Aubergine-associated RNAs
begin with 'U' residues, while Ago3-associated RNAs frequently contain 'A' residues at
their tenth nucleotide (Brennecke et al. 2007; Gunawardane et al. 2007). Even more
surprising is the staggered overlap of piRNAs associated with different Piwi-subfamily
members. It was observed that a large number of Ago3 piRNAs were complementary to
the first 10 bases of piRNAs associated with Aubergine and Piwi (Brennecke et al. 2007;
Gunawardane et al. 2007). Together with the observations that (1) the first and tenth
nucleotides of Aubergine/Piwi- and Ago3-associated RNAs are complementary ('U' and
'A', respectively), (2) Piwi proteins are capable of cleaving complementary transcripts,
and (3) piRNAs appear to exist in the absence of Dicer, the data suggest a model in which
piRNAs are predominantly generated by other piRNA-containing protein complexes
rather than Dicer cleavage of long, double-stranded RNA (Vagin et al. 2006; Brennecke
et al. 2007; Gunawardane et al. 2007). Considering this model, one major puzzle is how
this putative piRNA-induced biosynthetic loop is initiated.
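
The sequence signature summarized above (a 5' 'U' on Aubergine/Piwi-associated RNAs, an 'A' at the tenth position of Ago3-associated RNAs, and complementarity across the first 10 bases) can be expressed as a simple check on a pair of reads. The following is a minimal sketch in Python, assuming reads are supplied as RNA strings written 5' to 3'. The function names and the example sequences are hypothetical illustrations, not data or software from the cited studies.

```python
# Illustrative sketch only; function names and sequences are hypothetical.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA string written 5' to 3'."""
    return rna.translate(COMPLEMENT)[::-1]


def has_ten_nt_overlap(aub_piwi_read: str, ago3_read: str, overlap: int = 10) -> bool:
    """True if the first `overlap` bases of the two reads are reverse complements."""
    if len(aub_piwi_read) < overlap or len(ago3_read) < overlap:
        return False
    return reverse_complement(aub_piwi_read[:overlap]) == ago3_read[:overlap]


def has_nucleotide_bias(aub_piwi_read: str, ago3_read: str) -> bool:
    """True if the Aub/Piwi read starts with U and the Ago3 read has A at position 10."""
    return aub_piwi_read[:1] == "U" and ago3_read[9:10] == "A"


aub_read = "UGACCGUAGCAAGGCUUAGCCAUG"   # made-up 24-nt read
ago3_read = "GCUACGGUCACCGGAUUACGGAUC"  # made-up 24-nt read

print(has_ten_nt_overlap(aub_read, ago3_read))
print(has_nucleotide_bias(aub_read, ago3_read))
```

Note that the two checks are linked: if the first 10 bases pair perfectly and the Aubergine/Piwi-associated read begins with 'U', the tenth base of the Ago3-associated read is necessarily 'A'.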
RNAi-mediated antiviral immunity in D. melanogaster
D. melanogaster antiviral immunity depends on the RNAi pathway, analogous to
the antiviral role of RNAi in plants. Mutations in Dicer-2, the Dicer-2 binding protein
R2D2, or Ago2, render flies hypersusceptible to infection and lethality by various
exogenous viruses (Galiana-Arnoux et al. 2006; van Rij et al. 2006; Wang et al. 2006b).
At least two Drosophila viruses encode proteins, necessary for successful infection, that
function by potently suppressing Dicer-2/R2D2 processing of double-stranded RNA,
further evidence that D. melanogaster RNAi participates in antiviral immunity (Li et al.
2002; Galiana-Arnoux et al. 2006; van Rij et al. 2006).
Mammalian RNAi and repetitive elements
As described in the preceding sections of Chapter 1, repetitive regions of many
eukaryotic genomes are frequently maintained in a silent state via the RNAi pathway.
Though mammalian repeating elements are frequently associated with heterochromatin
(Thurman et al. 2007), the extent to which mammalian RNAi is involved in the
heterochromatinization of repetitive elements is currently unclear. Mammals do encode
Piwi subfamily proteins that share characteristics with their Drosophila homologues,
potentially indicating the existence of a germline RNAi pathway to repress the expression
of mammalian repetitive elements. Also, studies examining aspects of early mouse
development potentially implicate the RNAi pathway in the silencing of mammalian
repeats outside of the germline.
Mammalian piRNAs
Like their Drosophila homologues, mammalian Piwi subfamily proteins
associate with germ cell-specific RNAs termed piRNAs. These RNAs are 29-30 nt long
and a large majority have 5' 'U' residues (Aravin et al. 2006; Girard et al. 2006; Lau et al.
2006). Unlike Drosophila piRNAs, mammalian piRNAs are generally not repetitive and
frequently map uniquely to the genome (Aravin et al. 2006; Girard et al. 2006; Lau et al.
2006). piRNAs are produced from genomic clusters spanning approximately 20-100 kb,
and exhibit a striking strand bias within these clusters, such that sequences associated
with a particular Piwi protein almost never overlap in polarity (Aravin et al. 2006; Girard
et al. 2006; Lau et al. 2006). This polarity bias is consistent with a mammalian piRNA
biosynthetic pathway similar to that proposed in Drosophila, where different Piwi
paralogues associate with partially complementary sequences and appear to
synergistically synthesize piRNAs (Brennecke et al. 2007; Gunawardane et al. 2007).
piRNA sequences are not conserved between mouse, rat, and human; however, syntenic
genomic regions give rise to piRNA clusters in these three organisms, suggesting that
genomic location rather than sequence may be important for piRNA function (Aravin et
al. 2006; Girard et al. 2006; Lau et al. 2006).
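
The strand bias described above can be quantified, for a single cluster, as the fraction of mapped reads that fall on the more heavily represented strand. The following is a minimal sketch in Python, assuming each mapped read has already been reduced to a strand label of "+" or "-". The function name and the example counts are hypothetical and are not taken from the cited data sets.

```python
# Illustrative sketch only; the function name and example counts are hypothetical.
from collections import Counter
from typing import Iterable


def dominant_strand_fraction(strands: Iterable[str]) -> float:
    """Fraction of stranded reads that map to the more abundant strand."""
    counts = Counter(strands)
    plus, minus = counts["+"], counts["-"]
    total = plus + minus
    if total == 0:
        raise ValueError("no stranded reads supplied")
    return max(plus, minus) / total


# Made-up cluster: 19 reads on the plus strand, 1 read on the minus strand.
cluster_strands = ["+"] * 19 + ["-"]
print(dominant_strand_fraction(cluster_strands))
```

A value near 1.0 indicates that nearly all reads in the cluster share one polarity, whereas a value near 0.5 indicates no strand preference.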
Ablation in mice shows a role for Piwi proteins in spermatogenesis and implicates
them in the germline silencing of repetitive elements. There are four Piwi proteins in mice:
Miwi, Miwi2, Mili, and PiwiL3. PiwiL3 has not been studied in any context. Mice
lacking Miwi, Mili, or Miwi2 show male-specific infertility and various defects in
spermatogenesis (Deng and Lin 2002; Kuramochi-Miyagawa et al. 2004; Carmell et al.
2007). Additionally, Mili and Miwi2 null mice show reduced DNA methylation at
repetitive elements, implicating these genes in a germline silencing pathway that
methylates repetitive elements (Aravin et al. 2007; Carmell et al. 2007). While consistent
with Piwi function in Drosophila, the proposed role of Miwi2 and Mili in the silencing of
repeats is at odds with sequence data showing that Mili associates predominantly with
short RNAs that are not repetitive. It would seem possible, however, that the minority of
Mili-associated piRNAs that are repetitive could guide silencing in trans. Alternatively,
the observed DNA methylation defects in the germlines of Mili and Miwi2 null mice may
be an indirect consequence of Piwi protein loss.
Proposed roles for Dicer in the silencing of endogenous repeats
Studies of cells deleted or hypomorphic for Dicer indicate a potential role for
mammalian RNAi in endogenous repeat silencing outside of the germline. DNA derived
from the long interspersed nuclear element, or LINE, makes up approximately 20% of the
mouse and human genomes (Lander et al. 2001; Waterston et al. 2002). Full-length LINE
repeats are ~6 kb in length and encode a reverse transcriptase and chaperone protein that
replicate the LINE genome after it has been transcribed by cellular RNA polymerase II.
Human cells treated with siRNA to reduce Dicer levels show a mild increase in frequency
of LINE retrotransposition, suggesting that RNAi may repress LINE replication (Yang
and Kazazian 2006). Knockdown of Dicer via RNAi in one-cell mouse embryos results
in a ~50% increase in steady-state levels of different classes of endogenous long terminal
repeat (LTR) retrotransposons, which have a different genome structure but replicate via
mechanisms similar to those of LINE repeats (Svoboda et al. 2004). LTR elements are also
abundant repeats, representing approximately 10% of the human and mouse genomes
(Lander et al. 2001; Waterston et al. 2002). Similar to observations from early mouse
embryos, Dicer knockout mouse oocytes accumulate retrotransposon RNA, and display
elevated levels of transcripts containing specific classes of repeats (Murchison et al.
2007). Finally, Dicer knockout mouse ES cell lines have been reported to show increases
in steady-state levels of centromeric repeat transcripts (Kanellopoulou et al. 2005;
Murchison et al. 2005). All of these studies implicate Dicer in the control of repetitive
elements in mammalian genomes, but it is currently unclear whether or not these effects
are direct. Importantly, it has not yet been shown that mammalian Dicer generates the
putative repeat-derived siRNAs that have been proposed to mediate the above-mentioned
repressive effects.
A brief introduction to mouse ES cells
RNAi is essential for normal function in all mammalian tissue types examined, yet
in the majority of these cases, the essential regulatory functions performed by RNAi are
unclear. The functions of specific animal miRNAs have been difficult to determine, not
only because they require so little sequence complementarity to influence target gene
expression, but because they likely each affect several target genes, making it difficult to
computationally determine functionally relevant targets within specific tissues. RNAi
also functions through several different types of silencing molecules, affecting gene
expression in diverse ways. In mammals, it is unclear how RNAi influences gene
expression other than through miRNA-mediated silencing pathways. For example, it
seems likely that mammalian piRNAs mediate silencing processes separate from
miRNAs, but what these silencing processes are is unclear. The work described in this
thesis focuses on defining the roles of RNAi in mouse ES cells. ES cells have a number
of interesting properties relevant to both medicine and the biology of the early embryo;
thus an understanding of RNAi-mediated gene regulation in ES cells will likely have
broad applications.
ES cells are cultured derivatives of the pre-implantation inner cell mass (ICM) of
the blastocyst. The ICM is composed of the progenitor cells that will eventually give rise
to a fully developed embryo (Niwa 2007). At the developmental stage from which ES
cells are derived, the ICM is in an undifferentiated, epigenetically plastic state; genome-wide DNA methylation levels are a fraction of what they will be in differentiated cells
(Kafri et al. 1992; Rougier et al. 1998), and in female ICMs, paternal X chromosomes are
re-activated to allow random inactivation in the epiblast (Mak et al. 2004).
ES cells retain several characteristics of the ICM from which they are derived;
most notably, they exhibit a large degree of epigenetic plasticity and are pluripotent. ES
cells can survive with two active X chromosomes, and in the complete absence of DNA
methylation, highlighting their epigenetic plasticity (Rastan and Robertson 1985; Lei et
al. 1996; Okano et al. 1999). A direct consequence of this plasticity is ES cell pluripotency,
defined as the ability of ES cells to give rise to all tissues in a fully developed embryo
(Beddington and Robertson 1989). ES cell pluripotency can be maintained for extended
periods in culture, and under appropriate conditions, ES cells can be differentiated into a
number of cell types in vitro, raising the possibility that human ES cells may someday be
used as tissue sources in regenerative therapies (Pera and Trounson 2004; Keller 2005).
A number of factors have been implicated in the maintenance of ES cell
pluripotency. The cytokine LIF (for leukemia inhibitory factor) activates a STAT3-dependent transcriptional program that is important for maintenance of pluripotency
(Smith et al. 1988; Williams et al. 1988; Niwa et al. 1998). Also, Smad-dependent
induction of the Id (for inhibitor of differentiation) genes by BMP4 is critical for
maintenance of ES cell pluripotency (Ying et al. 2003). Together, the LIF and BMP4
signaling molecules are sufficient for prolonged cell culture maintenance of ES cell
pluripotency in the absence of serum (Ying et al. 2003).
The transcription factors Oct4, Sox2, and Nanog are additional requirements for
the maintenance of ES cell pluripotency (Nichols et al. 1998; Chambers et al. 2003;
Mitsui et al. 2003; Masui et al. 2007). These three transcription factors frequently colocalize at the promoters of their target genes (Boyer et al. 2005). Oct4/Sox2/Nanog
bound genes can be broadly grouped into two classes: genes that are transcriptionally
active and likely contribute to ES cell identity, and genes that are transcriptionally silent.
The silent class of Oct4/Sox2/Nanog-bound genes are also bound by the Polycomb
complex, and are highly enriched in developmental regulator genes whose ES cell
expression would likely lead to differentiation (Boyer et al. 2006; Lee et al. 2006b).
RNAi also has a role in the maintenance of ES cell identity. Unlike many
differentiated cell types, ES cells can survive deletion of Dicer; however, despite
appearing morphologically normal and expressing wild-type levels of the pluripotency
markers Oct4 and Nanog, Dicer null ES cells are no longer pluripotent (Kanellopoulou et
al. 2005). Given the demonstrated role of miRNAs as essential regulators of cell-fate
specification, this loss of pluripotency is likely not completely due to a change in ES cell
state. Rather, it may partly be due to the inability of putative differentiated precursors to
fully differentiate without the presence of additional non-ES cell miRNAs. Nevertheless,
ES cells express a set of miRNAs specific to early developmental lineages that are likely
important for ES cell identity (Houbaviy et al. 2003; Houbaviy et al. 2005; Tang et al.
2007). Dicer null ES cells also display proliferation defects (Murchison et al. 2005) (see
also Chapter 5 of this thesis), again consistent with a cell autonomous role for RNAi in
the maintenance of ES cell identity.
References
Ambros, V., Lee, R.C., Lavanway, A., Williams, P.T., and Jewell, D. 2003. MicroRNAs
and other tiny endogenous RNAs in C. elegans. Curr Biol 13(10): 807-818.
Andl, T., Murchison, E.P., Liu, F., Zhang, Y., Yunta-Gonzalez, M., Tobias, J.W., Andl,
C.D., Seykora, J.T., Hannon, G.J., and Millar, S.E. 2006. The miRNA-processing
enzyme dicer is essential for the morphogenesis and maintenance of hair follicles.
Curr Biol 16(10): 1041-1049.
Aravin, A., Gaidatzis, D., Pfeffer, S., Lagos-Quintana, M., Landgraf, P., Iovino, N.,
Morris, P., Brownstein, M.J., Kuramochi-Miyagawa, S., Nakano, T. et al. 2006. A
novel class of small RNAs bind to MILI protein in mouse testes. Nature
442(7099): 203-207.
Aravin, A.A., Naumova, N.M., Tulin, A.V., Vagin, V.V., Rozovsky, Y.M., and Gvozdev,
V.A. 2001. Double-stranded RNA-mediated silencing of genomic tandem repeats
and transposable elements in the D. melanogaster germline. Curr Biol 11(13):
1017-1027.
Aravin, A.A., Sachidanandam, R., Girard, A., Fejes-Toth, K., and Hannon, G.J. 2007.
Developmentally regulated piRNA clusters implicate MILI in transposon control.
Science 316(5825): 744-747.
Bagga, S., Bracht, J., Hunter, S., Massirer, K., Holtz, J., Eachus, R., and Pasquinelli, A.E.
2005. Regulation by let-7 and lin-4 miRNAs results in target mRNA degradation.
Cell 122(4): 553-563.
Bartel, D.P. 2004. MicroRNAs: genomics, biogenesis, mechanism, and function. Cell
116(2): 281-297.
Baskerville, S. and Bartel, D.P. 2005. Microarray profiling of microRNAs reveals
frequent coexpression with neighboring miRNAs and host genes. Rna 11(3): 241-247.
Beddington, R.S. and Robertson, E.J. 1989. An assessment of the developmental
potential of embryonic stem cells in the midgestation mouse embryo.
Development 105(4): 733-737.
Bernstein, E., Kim, S.Y., Carmell, M.A., Murchison, E.P., Alcorn, H., Li, M.Z., Mills,
A.A., Elledge, S.J., Anderson, K.V., and Hannon, G.J. 2003. Dicer is essential for
mouse development. Nat Genet 35(3): 215-217.
Boyer, L.A., Lee, T.I., Cole, M.F., Johnstone, S.E., Levine, S.S., Zucker, J.P., Guenther,
M.G., Kumar, R.M., Murray, H.L., Jenner, R.G. et al. 2005. Core transcriptional
regulatory circuitry in human embryonic stem cells. Cell 122(6): 947-956.
Boyer, L.A., Plath, K., Zeitlinger, J., Brambrink, T., Medeiros, L.A., Lee, T.I., Levine,
S.S., Wernig, M., Tajonar, A., Ray, M.K. et al. 2006. Polycomb complexes
repress developmental regulators in murine embryonic stem cells. Nature
441(7091): 349-353.
Brennecke, J., Aravin, A.A., Stark, A., Dus, M., Kellis, M., Sachidanandam, R., and
Hannon, G.J. 2007. Discrete Small RNA-Generating Loci as Master Regulators of
Transposon Activity in Drosophila. Cell.
Brennecke, J., Stark, A., Russell, R.B., and Cohen, S.M. 2005. Principles of microRNA-target recognition. PLoS Biol 3(3): e85.
Buhler, M., Verdel, A., and Moazed, D. 2006. Tethering RITS to a nascent transcript
initiates RNAi- and heterochromatin-dependent gene silencing. Cell 125(5): 873-886.
Cai, X., Hagedorn, C.H., and Cullen, B.R. 2004. Human microRNAs are processed from
capped, polyadenylated transcripts that can also function as mRNAs. Rna 10(12):
1957-1966.
Carmell, M.A., Girard, A., van de Kant, H.J., Bourc'his, D., Bestor, T.H., de Rooij, D.G.,
and Hannon, G.J. 2007. MIWI2 is essential for spermatogenesis and repression of
transposons in the mouse male germline. Dev Cell 12(4): 503-514.
Carmell, M.A., Xuan, Z., Zhang, M.Q., and Hannon, G.J. 2002. The Argonaute family:
tentacles that reach into RNAi, developmental control, stem cell maintenance, and
tumorigenesis. Genes Dev 16(21): 2733-2742.
Chalfie, M., Horvitz, H.R., and Sulston, J.E. 1981. Mutations that lead to reiterations in
the cell lineages of C. elegans. Cell 24(1): 59-69.
Chambers, I., Colby, D., Robertson, M., Nichols, J., Lee, S., Tweedie, S., and Smith, A.
2003. Functional expression cloning of Nanog, a pluripotency sustaining factor in
embryonic stem cells. Cell 113(5): 643-655.
Chan, S.W., Zilberman, D., Xie, Z., Johansen, L.K., Carrington, J.C., and Jacobsen, S.E.
2004. RNA silencing genes control de novo DNA methylation. Science
303(5662): 1336.
Chendrimada, T.P., Finn, K.J., Ji, X., Baillat, D., Gregory, R.I., Liebhaber, S.A.,
Pasquinelli, A.E., and Shiekhattar, R. 2007. MicroRNA silencing through RISC
recruitment of eIF6. Nature 447(7146): 823-828.
Chendrimada, T.P., Gregory, R.I., Kumaraswamy, E., Norman, J., Cooch, N., Nishikura,
K., and Shiekhattar, R. 2005. TRBP recruits the Dicer complex to Ago2 for
microRNA processing and gene silencing. Nature 436(7051): 740-744.
Cheng, H.Y., Papp, J.W., Varlamova, O., Dziema, H., Russell, B., Curfman, J.P.,
Nakazawa, T., Shimizu, K., Okamura, H., Impey, S. et al. 2007. microRNA
modulation of circadian-clock period and entrainment. Neuron 54(5): 813-829.
Colmenares, S.U., Buker, S.M., Buhler, M., Dlakic, M., and Moazed, D. 2007. Coupling
of double-stranded RNA synthesis and siRNA generation in fission yeast RNAi.
Mol Cell 27(3): 449-461.
Dalmay, T., Hamilton, A., Rudd, S., Angell, S., and Baulcombe, D.C. 2000. An RNA-dependent RNA polymerase gene in Arabidopsis is required for
posttranscriptional gene silencing mediated by a transgene but not by a virus. Cell
101(5): 543-553.
Deleris, A., Gallego-Bartolome, J., Bao, J., Kasschau, K.D., Carrington, J.C., and
Voinnet, O. 2006. Hierarchical action and inhibition of plant Dicer-like proteins
in antiviral defense. Science 313(5783): 68-71.
Deng, W. and Lin, H. 2002. miwi, a murine homolog of piwi, encodes a cytoplasmic
protein essential for spermatogenesis. Dev Cell 2(6): 819-830.
Doench, J.G., Petersen, C.P., and Sharp, P.A. 2003. siRNAs can function as miRNAs.
Genes Dev 17(4): 438-442.
Doench, J.G. and Sharp, P.A. 2004. Specificity of microRNA target selection in
translational repression. Genes Dev 18(5): 504-511.
Duchaine, T.F., Wohlschlegel, J.A., Kennedy, S., Bei, Y., Conte, D., Jr., Pang, K.,
Brownell, D.R., Harding, S., Mitani, S., Ruvkun, G. et al. 2006. Functional
proteomics reveals the biochemical niche of C. elegans DCR-1 in multiple small-RNA-mediated pathways. Cell 124(2): 343-354.
Fagard, M., Boutet, S., Morel, J.B., Bellini, C., and Vaucheret, H. 2000. AGO1, QDE-2,
and RDE-1 are related proteins required for post-transcriptional gene silencing in
plants, quelling in fungi, and RNA interference in animals. Proc Natl Acad Sci
USA 97(21): 11650-11654.
Farh, K.K., Grimson, A., Jan, C., Lewis, B.P., Johnston, W.K., Lim, L.P., Burge, C.B.,
and Bartel, D.P. 2005. The widespread impact of mammalian MicroRNAs on
mRNA repression and evolution. Science 310(5755): 1817-1821.
Forstemann, K., Horwich, M.D., Wee, L., Tomari, Y., and Zamore, P.D. 2007.
Drosophila microRNAs are sorted into functionally distinct argonaute complexes
after production by dicer-1. Cell 130(2): 287-297.
Galiana-Arnoux, D., Dostert, C., Schneemann, A., Hoffmann, J.A., and Imler, J.L. 2006.
Essential function in vivo for Dicer-2 in host defense against RNA viruses in
drosophila. Nat Immunol 7(6): 590-597.
Giraldez, A.J., Mishima, Y., Rihel, J., Grocock, R.J., Van Dongen, S., Inoue, K., Enright,
A.J., and Schier, A.F. 2006. Zebrafish MiR-430 promotes deadenylation and
clearance of maternal mRNAs. Science 312(5770): 75-79.
Girard, A., Sachidanandam, R., Hannon, G.J., and Carmell, M.A. 2006. A germline-specific class of small RNAs binds mammalian Piwi proteins. Nature 442(7099):
199-202.
Griffiths-Jones, S. 2004. The microRNA Registry. Nucleic Acids Res 32(Database issue):
D109-111.
Grimaud, C., Bantignies, F., Pal-Bhadra, M., Ghana, P., Bhadra, U., and Cavalli, G.
2006. RNAi components are required for nuclear clustering of Polycomb group
response elements. Cell 124(5): 957-971.
Grimson, A., Farh, K.K., Johnston, W.K., Garrett-Engele, P., Lim, L.P., and Bartel, D.P.
2007. MicroRNA targeting specificity in mammals: determinants beyond seed
pairing. Mol Cell 27(1): 91-105.
Grishok, A., Pasquinelli, A.E., Conte, D., Li, N., Parrish, S., Ha, I., Baillie, D.L., Fire, A.,
Ruvkun, G., and Mello, C.C. 2001. Genes and mechanisms related to RNA
interference regulate expression of the small temporal RNAs that control C.
elegans developmental timing. Cell 106(1): 23-34.
Grishok, A., Sinskey, J.L., and Sharp, P.A. 2005. Transcriptional silencing of a transgene
by RNAi in the soma of C. elegans. Genes Dev 19(6): 683-696.
Gunawardane, L.S., Saito, K., Nishida, K.M., Miyoshi, K., Kawamura, Y., Nagami, T.,
Siomi, H., and Siomi, M.C. 2007. A slicer-mediated mechanism for repeat-associated siRNA 5' end formation in Drosophila. Science 315(5818): 1587-1590.
Hall, I.M., Noma, K., and Grewal, S.I.S. 2003. RNA interference machinery regulates
chromosome dynamics during mitosis and meiosis in fission yeast. Proc Natl Acad
Sci USA 100(1): 193-198.
Hall, I.M., Shankaranarayana, G.D., Noma, K., Ayoub, N., Cohen, A., and Grewal, S.I.
2002. Establishment and maintenance of a heterochromatin domain. Science
297(5590): 2232-2237.
Hamilton, A., Voinnet, O., Chappell, L., and Baulcombe, D. 2002. Two classes of short
interfering RNA in RNA silencing. Embo J 21(17): 4671-4679.
Han, J., Lee, Y., Yeom, K.H., Nam, J.W., Heo, I., Rhee, J.K., Sohn, S.Y., Cho, Y.,
Zhang, B.T., and Kim, V.N. 2006. Molecular basis for the recognition of primary
microRNAs by the Drosha-DGCR8 complex. Cell 125(5): 887-901.
Harfe, B.D., McManus, M.T., Mansfield, J.H., Hornstein, E., and Tabin, C.J. 2005. The
RNaseIII enzyme Dicer is required for morphogenesis but not patterning of the
vertebrate limb. Proc Natl Acad Sci USA 102(31): 10898-10903.
Harris, K.S., Zhang, Z., McManus, M.T., Harfe, B.D., and Sun, X. 2006. Dicer function
is essential for lung epithelium morphogenesis. Proc Natl Acad Sci USA 103(7):
2208-2213.
Houbaviy, H.B., Dennis, L., Jaenisch, R., and Sharp, P.A. 2005. Characterization of a
highly variable eutherian microRNA gene. Rna 11(8): 1245-1257.
Houbaviy, H.B., Murray, M.F., and Sharp, P.A. 2003. Embryonic stem cell-specific
MicroRNAs. Dev Cell 5(2): 351-358.
Humphreys, D.T., Westman, B.J., Martin, D.I., and Preiss, T. 2005. MicroRNAs control
translation initiation by inhibiting eukaryotic initiation factor 4E/cap and poly(A)
tail function. Proc Natl Acad Sci USA 102(47): 16961-16966.
Hutvagner, G., McLachlan, J., Pasquinelli, A.E., Balint, E., Tuschl, T., and Zamore, P.D.
2001. A cellular function for the RNA-interference enzyme Dicer in the
maturation of the let-7 small temporal RNA. Science 293(5531): 834-838.
Hutvagner, G. and Zamore, P.D. 2002. A microRNA in a multiple-turnover RNAi
enzyme complex. Science 297(5589): 2056-2060.
Irvine, D.V., Zaratiegui, M., Tolia, N.H., Goto, D.B., Chitwood, D.H., Vaughn, M.W.,
Joshua-Tor, L., and Martienssen, R.A. 2006. Argonaute slicing is required for
heterochromatic silencing and spreading. Science 313(5790): 1134-1137.
Jakymiw, A., Lian, S., Eystathioy, T., Li, S., Satoh, M., Hamel, J.C., Fritzler, M.J., and
Chan, E.K. 2005. Disruption of GW bodies impairs mammalian RNA
interference. Nat Cell Biol 7(12): 1267-1274.
Jia, S., Noma, K., and Grewal, S.I. 2004. RNAi-independent heterochromatin nucleation
by the stress-activated ATF/CREB family proteins. Science 304(5679): 1971-1976.
Kafri, T., Ariel, M., Brandeis, M., Shemer, R., Urven, L., McCarrey, J., Cedar, H., and
Razin, A. 1992. Developmental pattern of gene-specific DNA methylation in the
mouse embryo and germ line. Genes Dev 6(5): 705-714.
Kanellopoulou, C., Muljo, S.A., Kung, A.L., Ganesan, S., Drapkin, R., Jenuwein, T.,
Livingston, D.M., and Rajewsky, K. 2005. Dicer-deficient mouse embryonic stem
cells are defective in differentiation and centromeric silencing. Genes Dev 19(4):
489-501.
Karres, J.S., Hilgers, V., Carrera, I., Treisman, J., and Cohen, S.M. 2007. The conserved
microRNA miR-8 tunes atrophin levels to prevent neurodegeneration in
Drosophila. Cell 131(1): 136-145.
Keller, G. 2005. Embryonic stem cell differentiation: emergence of a new era in biology
and medicine. Genes Dev 19(10): 1129-1155.
Kertesz, M., Iovino, N., Unnerstall, U., Gaul, U., and Segal, E. 2007. The role of site
accessibility in microRNA target recognition. Nat Genet 39(10): 1278-1284.
Ketting, R.F., Fischer, S.E., Bernstein, E., Sijen, T., Hannon, G.J., and Plasterk, R.H.
2001. Dicer functions in RNA interference and in synthesis of small RNA
involved in developmental timing in C. elegans. Genes Dev 15(20): 2654-2659.
Ketting, R.F., Haverkamp, T.H., van Luenen, H.G., and Plasterk, R.H. 1999. Mut-7 of C.
elegans, required for transposon silencing and RNA interference, is a homolog of
Werner syndrome helicase and RNaseD. Cell 99(2): 133-141.
Ketting, R.F. and Plasterk, R.H. 2000. A genetic link between co-suppression and RNA
interference in C. elegans. Nature 404(6775): 296-298.
Khvorova, A., Reynolds, A., and Jayasena, S.D. 2003. Functional siRNAs and miRNAs
exhibit strand bias. Cell 115(2): 209-216.
Kiriakidou, M., Tan, G.S., Lamprinaki, S., De Planell-Saguer, M., Nelson, P.T., and
Mourelatos, Z. 2007. An mRNA m7G cap binding-like motif within human Ago2
represses translation. Cell 129(6): 1141-1151.
Kloosterman, W.P. and Plasterk, R.H. 2006. The diverse functions of microRNAs in
animal development and disease. Dev Cell 11(4): 441-450.
Kumar, M.S., Lu, J., Mercer, K.L., Golub, T.R., and Jacks, T. 2007. Impaired microRNA
processing enhances cellular transformation and tumorigenesis. Nat Genet 39(5):
673-677.
Kuramochi-Miyagawa, S., Kimura, T., Ijiri, T.W., Isobe, T., Asada, N., Fujita, Y., Ikawa,
M., Iwai, N., Okabe, M., Deng, W. et al. 2004. Mili, a mammalian member of
piwi family gene, is essential for spermatogenesis. Development 131(4): 839-849.
Lakatos, L., Szittya, G., Silhavy, D., and Burgyan, J. 2004. Molecular mechanism of
RNA silencing suppression mediated by p19 protein of tombusviruses. Embo J
23(4): 876-884.
Lander, E.S., Linton, L.M., Birren, B., Nusbaum, C., Zody, M.C., Baldwin, J., Devon, K.,
Dewar, K., Doyle, M., FitzHugh, W. et al. 2001. Initial sequencing and analysis of
the human genome. Nature 409(6822): 860-921.
Lau, N.C., Seto, A.G., Kim, J., Kuramochi-Miyagawa, S., Nakano, T., Bartel, D.P., and
Kingston, R.E. 2006. Characterization of the piRNA complex from rat testes.
Science 313(5785): 363-367.
Lee, R.C., Feinbaum, R.L., and Ambros, V. 1993. The C. elegans heterochronic gene
lin-4 encodes small RNAs with antisense complementarity to lin-14. Cell 75(5):
843-854.
Lee, R.C., Hammell, C.M., and Ambros, V. 2006a. Interacting endogenous and
exogenous RNAi pathways in Caenorhabditis elegans. Rna 12(4): 589-597.
Lee, T.I., Jenner, R.G., Boyer, L.A., Guenther, M.G., Levine, S.S., Kumar, R.M.,
Chevalier, B., Johnstone, S.E., Cole, M.F., Isono, K. et al. 2006b. Control of
developmental regulators by Polycomb in human embryonic stem cells. Cell
125(2): 301-313.
Lee, Y., Ahn, C., Han, J., Choi, H., Kim, J., Yim, J., Lee, J., Provost, P., Radmark, O.,
Kim, S. et al. 2003. The nuclear RNase III Drosha initiates microRNA
processing. Nature 425(6956): 415-419.
Lee, Y., Kim, M., Han, J., Yeom, K.H., Lee, S., Baek, S.H., and Kim, V.N. 2004.
MicroRNA genes are transcribed by RNA polymerase II. Embo J 23(20): 4051-4060.
Lei, H., Oh, S.P., Okano, M., Juttermann, R., Goss, K.A., Jaenisch, R., and Li, E. 1996.
De novo DNA cytosine methyltransferase activities in mouse embryonic stem
cells. Development 122(10): 3195-3205.
Leung, A.K. and Sharp, P.A. 2007. microRNAs: a safeguard against turmoil? Cell
130(4): 581-585.
Lewis, B.P., Burge, C.B., and Bartel, D.P. 2005. Conserved seed pairing, often flanked
by adenosines, indicates that thousands of human genes are microRNA targets.
Cell 120(1): 15-20.
Lewis, B.P., Shih, I.H., Jones-Rhoades, M.W., Bartel, D.P., and Burge, C.B. 2003.
Prediction of mammalian microRNA targets. Cell 115(7): 787-798.
Li, C.F., Pontes, O., El-Shami, M., Henderson, I.R., Bernatavichute, Y.V., Chan, S.W.,
Lagrange, T., Pikaard, C.S., and Jacobsen, S.E. 2006. An ARGONAUTE4-containing
nuclear processing center colocalized with Cajal bodies in Arabidopsis
thaliana. Cell 126(1): 93-106.
Li, H., Li, W.X., and Ding, S.W. 2002. Induction and suppression of RNA silencing by
an animal virus. Science 296(5571): 1319-1321.
Lim, L.P., Lau, N.C., Garrett-Engele, P., Grimson, A., Schelter, J.M., Castle, J., Bartel,
D.P., Linsley, P.S., and Johnson, J.M. 2005. Microarray analysis shows that some
microRNAs downregulate large numbers of target mRNAs. Nature 433(7027):
769-773.
Lippman, Z., May, B., Yordan, C., Singer, T., and Martienssen, R. 2003. Distinct
Mechanisms Determine Transposon Inheritance and Methylation via Small
Interfering RNA and Histone Modification. PLoS Biol 1(3): E67.
Liu, J., Carmell, M.A., Rivas, F.V., Marsden, C.G., Thomson, J.M., Song, J.J.,
Hammond, S.M., Joshua-Tor, L., and Hannon, G.J. 2004. Argonaute2 is the
catalytic engine of mammalian RNAi. Science 305(5689): 1437-1441.
Liu, J., Rivas, F.V., Wohlschlegel, J., Yates, J.R., 3rd, Parker, R., and Hannon, G.J. 2005.
A role for the P-body component GW182 in microRNA function. Nat Cell Biol
7(12): 1261-1266.
Lu, J., Getz, G., Miska, E.A., Alvarez-Saavedra, E., Lamb, J., Peck, D., Sweet-Cordero,
A., Ebert, B.L., Mak, R.H., Ferrando, A.A. et al. 2005. MicroRNA expression
profiles classify human cancers. Nature 435(7043): 834-838.
Mak, W., Nesterova, T.B., de Napoles, M., Appanah, R., Yamanaka, S., Otte, A.P., and
Brockdorff, N. 2004. Reactivation of the paternal X chromosome in early mouse
embryos. Science 303(5658): 666-669.
Maroney, P.A., Yu, Y., Fisher, J., and Nilsen, T.W. 2006. Evidence that microRNAs are
associated with translating messenger RNAs in human cells. Nat Struct Mol Biol
13(12): 1102-1107.
Martinez, J. and Tuschl, T. 2004. RISC is a 5' phosphomonoester-producing RNA
endonuclease. Genes Dev 18(9): 975-980.
Masui, S., Nakatake, Y., Toyooka, Y., Shimosato, D., Yagi, R., Takahashi, K., Okochi,
H., Okuda, A., Matoba, R., Sharov, A.A. et al. 2007. Pluripotency governed by
Sox2 via regulation of Oct3/4 expression in mouse embryonic stem cells. Nat Cell
Biol 9(6): 625-635.
Mathonnet, G., Fabian, M.R., Svitkin, Y.V., Parsyan, A., Huck, L., Murata, T., Biffo, S.,
Merrick, W.C., Darzynkiewicz, E., Pillai, R.S. et al. 2007. MicroRNA inhibition
of translation initiation in vitro by targeting the cap-binding complex eIF4F.
Science 317(5845): 1764-1767.
Meister, G., Landthaler, M., Peters, L., Chen, P.Y., Urlaub, H., Luhrmann, R., and
Tuschl, T. 2005. Identification of novel argonaute-associated proteins. Curr Biol
15(23): 2149-2155.
Mitsui, K., Tokuzawa, Y., Itoh, H., Segawa, K., Murakami, M., Takahashi, K.,
Maruyama, M., Maeda, M., and Yamanaka, S. 2003. The homeoprotein Nanog is
required for maintenance of pluripotency in mouse epiblast and ES cells. Cell
113(5): 631-642.
Molnar, A., Schwach, F., Studholme, D.J., Thuenemann, E.C., and Baulcombe, D.C.
2007. miRNAs control gene expression in the single-cell alga Chlamydomonas
reinhardtii. Nature 447(7148): 1126-1129.
Motamedi, M.R., Verdel, A., Colmenares, S.U., Gerber, S.A., Gygi, S.P., and Moazed, D.
2004. Two RNAi complexes, RITS and RDRC, physically interact and localize to
noncoding centromeric RNAs. Cell 119(6): 789-802.
Mourrain, P., Beclin, C., Elmayan, T., Feuerbach, F., Godon, C., Morel, J.B., Jouette, D.,
Lacombe, A.M., Nikic, S., Picault, N. et al. 2000. Arabidopsis SGS2 and SGS3
genes are required for posttranscriptional gene silencing and natural virus
resistance. Cell 101(5): 533-542.
Murchison, E.P., Partridge, J.F., Tam, O.H., Cheloufi, S., and Hannon, G.J. 2005.
Characterization of Dicer-deficient murine embryonic stem cells. Proc Natl Acad
Sci USA 102(34): 12135-12140.
Murchison, E.P., Stein, P., Xuan, Z., Pan, H., Zhang, M.Q., Schultz, R.M., and Hannon,
G.J. 2007. Critical roles for Dicer in the female germline. Genes Dev 21(6): 682-693.
Neilson, J.R., Zheng, G.X., Burge, C.B., and Sharp, P.A. 2007. Dynamic regulation of
miRNA expression in ordered stages of cellular development. Genes Dev 21(5):
578-589.
Nichols, J., Zevnik, B., Anastassiadis, K., Niwa, H., Klewe-Nebenius, D., Chambers, I.,
Scholer, H., and Smith, A. 1998. Formation of pluripotent stem cells in the
mammalian embryo depends on the POU transcription factor Oct4. Cell 95(3):
379-391.
Nielsen, C.B., Shomron, N., Sandberg, R., Hornstein, E., Kitzman, J., and Burge, C.B.
2007. Determinants of targeting by endogenous and exogenous microRNAs and
siRNAs. Rna 13(11): 1894-1910.
Niwa, H. 2007. How is pluripotency determined and maintained? Development 134(4):
635-646.
Niwa, H., Burdon, T., Chambers, I., and Smith, A. 1998. Self-renewal of pluripotent
embryonic stem cells is mediated via activation of STAT3. Genes Dev 12(13):
2048-2060.
Noma, K., Sugiyama, T., Cam, H., Verdel, A., Zofall, M., Jia, S., Moazed, D., and
Grewal, S.I. 2004. RITS acts in cis to promote RNA interference-mediated
transcriptional and post-transcriptional silencing. Nat Genet 36(11): 1174-1180.
Nottrott, S., Simard, M.J., and Richter, J.D. 2006. Human let-7a miRNA blocks protein
production on actively translating polyribosomes. Nat Struct Mol Biol 13(12):
1108-1114.
O'Donnell, K.A. and Boeke, J.D. 2007. Mighty Piwis defend the germline against
genome intruders. Cell 129(1): 37-44.
Obernosterer, G., Leuschner, P.J., Alenius, M., and Martinez, J. 2006. Post-transcriptional
regulation of microRNA expression. Rna 12(7): 1161-1167.
Okano, M., Bell, D.W., Haber, D.A., and Li, E. 1999. DNA methyltransferases Dnmt3a
and Dnmt3b are essential for de novo methylation and mammalian development.
Cell 99(3): 247-257.
Olsen, P.H. and Ambros, V. 1999. The lin-4 regulatory RNA controls developmental
timing in Caenorhabditis elegans by blocking LIN-14 protein synthesis after the
initiation of translation. Dev Biol 216(2): 671-680.
Onodera, Y., Haag, J.R., Ream, T., Nunes, P.C., Pontes, O., and Pikaard, C.S. 2005. Plant
nuclear RNA polymerase IV mediates siRNA and DNA methylation-dependent
heterochromatin formation. Cell 120(5): 613-622.
Pak, J. and Fire, A. 2007. Distinct populations of primary and secondary effectors during
RNAi in C. elegans. Science 315(5809): 241-244.
Pal-Bhadra, M., Bhadra, U., and Birchler, J.A. 2002. RNAi related mechanisms affect
both transcriptional and posttranscriptional transgene silencing in Drosophila. Mol
Cell 9(2): 315-327.
Pal-Bhadra, M., Leibovitch, B.A., Gandhi, S.G., Rao, M., Bhadra, U., Birchler, J.A., and
Elgin, S.C. 2004. Heterochromatic silencing and HP1 localization in Drosophila
are dependent on the RNAi machinery. Science 303(5658): 669-672.
Pera, M.F. and Trounson, A.O. 2004. Human embryonic stem cells: prospects for
development. Development 131(22): 5515-5525.
Petersen, C.P., Bordeleau, M.E., Pelletier, J., and Sharp, P.A. 2006. Short RNAs repress
translation after initiation in mammalian cells. Mol Cell 21(4): 533-542.
Pillai, R.S., Artus, C.G., and Filipowicz, W. 2004. Tethering of human Ago proteins to
mRNA mimics the miRNA-mediated repression of protein synthesis. Rna 10(10):
1518-1525.
Pillai, R.S., Bhattacharyya, S.N., Artus, C.G., Zoller, T., Cougot, N., Basyuk, E.,
Bertrand, E., and Filipowicz, W. 2005. Inhibition of translational initiation by
Let-7 MicroRNA in human cells. Science 309(5740): 1573-1576.
Rastan, S. and Robertson, E.J. 1985. X-chromosome deletions in embryo-derived (EK)
cell lines associated with lack of X-chromosome inactivation. J Embryol Exp
Morphol 90: 379-388.
Reinhart, B.J., Slack, F.J., Basson, M., Pasquinelli, A.E., Bettinger, J.C., Rougvie, A.E.,
Horvitz, H.R., and Ruvkun, G. 2000. The 21-nucleotide let-7 RNA regulates
developmental timing in Caenorhabditis elegans. Nature 403(6772): 901-906.
Robert, V.J., Sijen, T., van Wolfswinkel, J., and Plasterk, R.H. 2005. Chromatin and
RNAi factors protect the C. elegans germline against repetitive sequences. Genes
Dev 19(7): 782-787.
Rodriguez, A., Griffiths-Jones, S., Ashurst, J.L., and Bradley, A. 2004. Identification of
mammalian microRNA host genes and transcription units. Genome Res 14(10A):
1902-1910.
Rodriguez, A., Vigorito, E., Clare, S., Warren, M.V., Couttet, P., Soond, D.R., van
Dongen, S., Grocock, R.J., Das, P.P., Miska, E.A. et al. 2007. Requirement of
bic/microRNA-155 for normal immune function. Science 316(5824): 608-611.
Rougier, N., Bourc'his, D., Gomes, D.M., Niveleau, A., Plachot, M., Paldi, A., and
Viegas-Pequignot, E. 1998. Chromosome methylation patterns during mammalian
preimplantation development. Genes Dev 12(14): 2108-2113.
Ruby, J.G., Jan, C., Player, C., Axtell, M.J., Lee, W., Nusbaum, C., Ge, H., and Bartel,
D.P. 2006. Large-scale sequencing reveals 21U-RNAs and additional microRNAs
and endogenous siRNAs in C. elegans. Cell 127(6): 1193-1207.
Saini, H.K., Griffiths-Jones, S., and Enright, A.J. 2007. Genomic analysis of human
microRNA transcripts. Proc Natl Acad Sci USA.
Scholthof, H.B. 2006. The Tombusvirus-encoded P19: from irrelevance to elegance. Nat
Rev Microbiol.
Schwarz, D.S., Hutvagner, G., Du, T., Xu, Z., Aronin, N., and Zamore, P.D. 2003.
Asymmetry in the assembly of the RNAi enzyme complex. Cell 115(2): 199-208.
Seggerson, K., Tang, L., and Moss, E.G. 2002. Two genetic circuits repress the
Caenorhabditis elegans heterochronic gene lin-28 after translation initiation. Dev
Biol 243(2): 215-225.
Sen, G.L. and Blau, H.M. 2005. Argonaute 2/RISC resides in sites of mammalian mRNA
decay known as cytoplasmic bodies. Nat Cell Biol 7(6): 633-636.
Sijen, T., Steiner, F.A., Thijssen, K.L., and Plasterk, R.H. 2007. Secondary siRNAs result
from unprimed RNA synthesis and form a distinct class. Science 315(5809): 244-247.
Smith, A.G., Heath, J.K., Donaldson, D.D., Wong, G.G., Moreau, J., Stahl, M., and
Rogers, D. 1988. Inhibition of pluripotential embryonic stem cell differentiation
by purified polypeptides. Nature 336(6200): 688-690.
Song, E., Lee, S.K., Dykxhoorn, D.M., Novina, C., Zhang, D., Crawford, K., Cerny, J.,
Sharp, P.A., Lieberman, J., Manjunath, N. et al. 2003. Sustained small interfering
RNA-mediated human immunodeficiency virus type 1 inhibition in primary
macrophages. J Virol 77(13): 7174-7181.
Sontheimer, E.J. 2005. Assembly and function of RNA silencing complexes. Nat Rev
Mol Cell Biol 6(2): 127-138.
Stark, A., Brennecke, J., Bushati, N., Russell, R.B., and Cohen, S.M. 2005. Animal
MicroRNAs confer robustness to gene expression and have a significant impact
on 3'UTR evolution. Cell 123(6): 1133-1146.
Sugiyama, T., Cam, H., Verdel, A., Moazed, D., and Grewal, S.I. 2005. RNA-dependent
RNA polymerase is an essential component of a self-enforcing loop coupling
heterochromatin assembly to siRNA production. Proc Natl Acad Sci USA
102(1): 152-157.
Svoboda, P., Stein, P., Anger, M., Bernstein, E., Hannon, G.J., and Schultz, R.M. 2004.
RNAi and expression of retrotransposons MuERV-L and IAP in preimplantation
mouse embryos. Dev Biol 269(1): 276-285.
Tang, F., Kaneda, M., O'Carroll, D., Hajkova, P., Barton, S.C., Sun, Y.A., Lee, C.,
Tarakhovsky, A., Lao, K., and Surani, M.A. 2007. Maternal microRNAs are
essential for mouse zygotic development. Genes Dev 21(6): 644-648.
Thai, T.H., Calado, D.P., Casola, S., Ansel, K.M., Xiao, C., Xue, Y., Murphy, A.,
Frendewey, D., Valenzuela, D., Kutok, J.L. et al. 2007. Regulation of the
germinal center response by microRNA-155. Science 316(5824): 604-608.
Thomson, J.M., Newman, M., Parker, J.S., Morin-Kensicki, E.M., Wright, T., and
Hammond, S.M. 2006. Extensive post-transcriptional regulation of microRNAs
and its implications for cancer. Genes Dev 20(16): 2202-2207.
Thurman, R.E., Day, N., Noble, W.S., and Stamatoyannopoulos, J.A. 2007. Identification
of higher-order functional domains in the human ENCODE regions. Genome Res
17(6): 917-927.
Tomari, Y., Du, T., and Zamore, P.D. 2007. Sorting of Drosophila small silencing RNAs.
Cell 130(2): 299-308.
Vagin, V.V., Sigova, A., Li, C., Seitz, H., Gvozdev, V., and Zamore, P.D. 2006. A
distinct small RNA pathway silences selfish genetic elements in the germline.
Science 313(5785): 320-324.
van Rij, R.P., Saleh, M.C., Berry, B., Foo, C., Houk, A., Antoniewski, C., and Andino, R.
2006. The RNA silencing endonuclease Argonaute 2 mediates specific antiviral
immunity in Drosophila melanogaster. Genes Dev 20(21): 2985-2995.
Vargason, J.M., Szittya, G., Burgyan, J., and Tanaka Hall, T.M. 2003. Size selective
recognition of siRNA by an RNA silencing suppressor. Cell 115(7): 799-811.
Vastenhouw, N.L., Fischer, S.E., Robert, V.J., Thijssen, K.L., Fraser, A.G., Kamath,
R.S., Ahringer, J., and Plasterk, R.H. 2003. A genome-wide screen identifies 27
genes involved in transposon silencing in C. elegans. Curr Biol 13(15): 1311-1316.
Vasudevan, S. and Steitz, J.A. 2007. AU-rich-element-mediated upregulation of
translation by FXR1 and Argonaute 2. Cell 128(6): 1105-1118.
Verdel, A., Jia, S., Gerber, S., Sugiyama, T., Gygi, S., Grewal, S.I., and Moazed, D.
2004. RNAi-mediated targeting of heterochromatin by the RITS complex. Science
303(5658): 672-676.
Voinnet, O. 2005. Induction and suppression of RNA silencing: insights from viral
infections. Nat Rev Genet 6(3): 206-220.
Volpe, T.A., Kidner, C., Hall, I.M., Teng, G., Grewal, S.I., and Martienssen, R.A. 2002.
Regulation of heterochromatic silencing and histone H3 lysine-9 methylation by
RNAi. Science 297(5588): 1833-1837.
Wakiyama, M., Takimoto, K., Ohara, O., and Yokoyama, S. 2007. Let-7 microRNA-mediated
mRNA deadenylation and translational repression in a mammalian cell-free
system. Genes Dev 21(15): 1857-1862.
Wang, B., Love, T.M., Call, M.E., Doench, J.G., and Novina, C.D. 2006a. Recapitulation
of short RNA-directed translational gene silencing in vitro. Mol Cell 22(4): 553-560.
Wang, X.H., Aliyari, R., Li, W.X., Li, H.W., Kim, K., Carthew, R., Atkinson, P., and
Ding, S.W. 2006b. RNA interference directs innate immunity against viruses in
adult Drosophila. Science 312(5772): 452-454.
Waterston, R.H., Lindblad-Toh, K., Birney, E., Rogers, J., Abril, J.F., Agarwal, P.,
Agarwala, R., Ainscough, R., Alexandersson, M., An, P. et al. 2002. Initial
sequencing and comparative analysis of the mouse genome. Nature 420(6915):
520-562.
Wightman, B., Ha, I., and Ruvkun, G. 1993. Posttranscriptional regulation of the
heterochronic gene lin-14 by lin-4 mediates temporal pattern formation in C.
elegans. Cell 75(5): 855-862.
Williams, R.L., Hilton, D.J., Pease, S., Willson, T.A., Stewart, C.L., Gearing, D.P.,
Wagner, E.F., Metcalf, D., Nicola, N.A., and Gough, N.M. 1988. Myeloid
leukaemia inhibitory factor maintains the developmental potential of embryonic
stem cells. Nature 336(6200): 684-687.
Wu, H., Neilson, J.R., Kumar, P., Manocha, M., Shankar, P., Sharp, P.A., and Manjunath,
N. 2007. miRNA Profiling of Naive, Effector and Memory CD8 T Cells. PLoS
ONE 2(10): e1020.
Xiao, C., Calado, D.P., Galler, G., Thai, T.H., Patterson, H.C., Wang, J., Rajewsky, N.,
Bender, T.P., and Rajewsky, K. 2007. MiR-150 controls B cell differentiation by
targeting the transcription factor c-Myb. Cell 131(1): 146-159.
Xie, Q. and Guo, H.S. 2006. Systemic antiviral silencing in plants. Virus Res 118(1-2): 1-6.
Xie, Z., Johansen, L.K., Gustafson, A.M., Kasschau, K.D., Lellis, A.D., Zilberman, D.,
Jacobsen, S.E., and Carrington, J.C. 2004. Genetic and functional diversification
of small RNA pathways in plants. PLoS Biol 2(5): E104.
Xu, S., Witmer, P.D., Lumayag, S., Kovacs, B., and Valle, D. 2007. MicroRNA
(miRNA) transcriptome of mouse retina and identification of a sensory organ-specific
miRNA cluster. J Biol Chem 282(34): 25053-25066.
Yang, N. and Kazazian, H.H., Jr. 2006. L1 retrotransposition is suppressed by
endogenously encoded small interfering RNAs in human cultured cells. Nat Struct
Mol Biol 13(9): 763-771.
Ye, K., Malinina, L., and Patel, D.J. 2003. Recognition of small interfering RNA by a
viral suppressor of RNA silencing. Nature 426(6968): 874-878.
Yekta, S., Shih, I.H., and Bartel, D.P. 2004. MicroRNA-directed cleavage of HOXB8
mRNA. Science 304(5670): 594-596.
Yi, R., O'Carroll, D., Pasolli, H.A., Zhang, Z., Dietrich, F.S., Tarakhovsky, A., and
Fuchs, E. 2006. Morphogenesis in skin is governed by discrete sets of
differentially expressed microRNAs. Nat Genet 38(3): 356-362.
Yin, H. and Lin, H. 2007. An epigenetic activation role of Piwi and a Piwi-associated
piRNA in Drosophila melanogaster. Nature.
Ying, Q.L., Nichols, J., Chambers, I., and Smith, A. 2003. BMP induction of Id proteins
suppresses differentiation and sustains embryonic stem cell self-renewal in
collaboration with STAT3. Cell 115(3): 281-292.
Zeng, Y., Yi, R., and Cullen, B.R. 2005. Recognition and cleavage of primary microRNA
precursors by the nuclear processing enzyme Drosha. Embo J 24(1): 138-148.
Zhang, X., Henderson, I.R., Lu, C., Green, P.J., and Jacobsen, S.E. 2007. Role of RNA
polymerase IV in plant small RNA metabolism. Proc Natl Acad Sci USA
104(11): 4536-4541.
Zhang, X., Yuan, Y.R., Pei, Y., Lin, S.S., Tuschl, T., Patel, D.J., and Chua, N.H. 2006.
Cucumber mosaic virus-encoded 2b suppressor inhibits Arabidopsis Argonaute 1
cleavage activity to counter plant defense. Genes Dev 20(23): 3255-3268.
Zhao, T., Li, G., Mi, S., Li, S., Hannon, G.J., Wang, X.J., and Qi, Y. 2007a. A complex
system of small RNAs in the unicellular green alga Chlamydomonas reinhardtii.
Genes Dev 21(10): 1190-1203.
Zhao, Y., Ransom, J.F., Li, A., Vedantham, V., von Drehle, M., Muth, A.N., Tsuchihashi,
T., McManus, M.T., Schwartz, R.J., and Srivastava, D. 2007b. Dysregulation of
cardiogenesis, cardiac conduction, and cell cycle in mice lacking miRNA-1-2.
Cell 129(2): 303-317.
Zilberman, D., Cao, X., and Jacobsen, S.E. 2003. ARGONAUTE4 control of locus-specific
siRNA accumulation and DNA and histone methylation. Science
299(5607): 716-719.
Chapter 2
Characterization of the short RNAs bound by the P19
suppressor of RNA silencing in mouse embryonic stem
cells
This chapter has been presented in the context of its contemporary science, and originally
appeared in RNA 12:2092-2102.
Abstract
Studies of mammalian RNA interference (RNAi) have focused largely on the
actions of microRNAs; however, in other organisms, endogenous short-interfering RNAs
(siRNAs) are involved in silencing processes. To date, similar molecules have been
difficult to characterize in mammalian cells. P19 is a plant suppressor of RNA silencing
that binds with high affinity to siRNAs. Here, the short RNAs bound by P19 in mouse
embryonic stem (ES) cells have been characterized. We show that P19 selectively
immunoprecipitates endogenous short RNAs from ES cells. Cloning of
immunoprecipitated RNA reveals a strong selection for short RNAs that are exact
matches to ribosomal RNA (rRNA), with particular short rRNA species highly enriched
in P19 immunoprecipitates. Complementary strands to the enriched rRNAs were not
cloned, which was surprising because P19 was previously thought to bind only siRNAs. We show
that P19 binds tightly to a non-canonical dsRNA substrate comprised of a short RNA
annealed to a much longer partner, such that the double-stranded region between the two is
19 base pairs long. Binding to similar endogenous species might explain the association
of P19 with short rRNAs in ES cells. Finally, we show that the P19-enriched rRNAs are
not involved in canonical RNAi, as they exist in the absence of Dicer and do not function
as post-transcriptional gene silencers. Our results support the previous observation that
endogenous siRNAs are not abundant molecules in mouse ES cells.
Introduction
Short RNAs play a central role in eukaryotic biology by regulating gene
expression through a process called RNA interference (Novina and Sharp 2004). cDNA
libraries sampling the short RNA population in mammalian cells reveal predominantly
the products of a conserved class of non-coding RNA genes called microRNAs
(miRNAs). Mature miRNAs are processed from longer RNA precursors through
sequential cleavage by the RNase III enzymes Drosha and Dicer, generating ~22
nucleotide long species with defined 5' and 3' ends. It is thought that mammalian
miRNAs primarily exert their influence on gene expression post-transcriptionally by
binding with imperfect complementarity to the 3' UTR of their target mRNAs. An
accumulating body of evidence suggests miRNAs are key regulators of both
developmental transitions and cell-type specification (Ambros 2004; Bartel 2004;
Alvarez-Garcia and Miska 2005; Farh et al. 2005; Stark et al. 2005). miRNAs are also
misregulated in human cancers, potentially indicating a causal role in tumorigenesis (He
et al. 2005; Lu et al. 2005; Voorhoeve et al. 2006).
Almost all of the known functions of endogenous RNAi in mammals are attributed to
miRNAs; in other eukaryotes this is not the case. The cloning of short RNAs from S. pombe,
Arabidopsis, C. elegans, and Drosophila has identified endogenous short-interfering RNAs
(siRNAs) that are encoded by high copy sequences in the genome, such as transposons and
retrotransposons. Except in Drosophila, these repeat-associated siRNAs (rasiRNAs) are thought to
be processed by Dicer from long, repeat-derived double-stranded RNA (dsRNA). In S. pombe,
Arabidopsis, and C. elegans, this dsRNA precursor is generated in part by the action of an
RNA-dependent RNA polymerase, while in Drosophila rasiRNA biogenesis is less clear. rasiRNAs map
predominantly to heterochromatic regions of the genome, and are proposed to be the guides for an
siRNA-mediated transcriptional repression complex that nucleates heterochromatin (Ambros et al.
2003; Baulcombe 2004; Lippman and Martienssen 2004; Sontheimer and Carthew 2005; Lee et al.
2006; Vagin et al. 2006).
Mouse embryonic stem (ES) cells are pluripotent cells derived from the inner cell
mass of the blastocyst in the midst of the epigenetic reprogramming that occurs during
early development (Jaenisch 1997; Li 2002). ES cells are capable of executing well-studied
epigenetic silencing processes, such as X-inactivation and silencing of Moloney
leukemia viruses (Stewart et al. 1982; Cherry et al. 2000; Plath et al. 2002). These
processes are similar to examples of rasiRNA-mediated silencing in other organisms.
However, previous cloning efforts from ES cells did not identify repeat-associated
siRNAs, potentially because they are too low in abundance compared to miRNAs and
other short RNAs (Houbaviy et al. 2003). Developing methods to facilitate the
identification of low abundance RNAi-specific short RNAs will therefore be necessary to
more completely understand the function of RNAi in mammals.
In this work, we express epitope-tagged versions of the P19 suppressor of RNA
silencing in ES cells in an attempt to identify endogenous siRNAs. The P19 protein is
expressed by the Tombusvirus as a counter-defense against the plant RNAi-pathway that
degrades RNA viruses, and functions by specifically binding and sequestering siRNAs
that would otherwise mediate viral RNA destruction (Scholthof 2006). Biochemical and
structural studies show that P19 binds with high affinity and specificity to siRNA-like
molecules that are 5' phosphorylated with double stranded RNA segments 19 base pairs
long. This affinity dramatically decreases if the double stranded RNA segment is either
shorter or longer than 19 base pairs. P19 also has no measurable affinity for single-stranded
RNA or double-stranded DNA (Vargason et al. 2003; Ye et al. 2003; Lakatos et
al. 2004). P19 transgenic plants display developmental defects associated with loss of
miRNA function, and immunoprecipitations (IPs) of P19 from these plants enrich for
miRNA duplexes, showing that P19 inhibits not only virally-induced gene silencing but
also other endogenous RNAi processes (Silhavy et al. 2002; Chapman et al. 2004;
Dunoyer et al. 2004). Together, these observations indicate P19 may be a useful tool to
identify endogenous siRNAs present in mammalian cells. Indeed, P19 has previously
been shown to inhibit RNAi in human cells, suggesting that its functions carry over to
mammals (Dunoyer et al. 2004; Lecellier et al. 2005).
Results
Epitope-tagged P19 binds short RNAs from ES cells.
Putative rasiRNAs involved in heterochromatin formation might be localized
specifically to the nucleus. For this reason, we constructed two vectors for mammalian
P19 expression; both were tagged at the C-terminus with a V5-6xHis epitope, and one
contained two tandem SV40 nuclear localization sequences, increasing its nuclear
concentration (hereafter referred to as P19V5 and P19NLS, respectively; Figure 1A).
Immunofluorescence of transiently transfected ES cells shows P19V5 is present in both
the cytoplasm and nucleus, while P19NLS is localized almost exclusively to the nucleus
(Figure 1B).
To test if P19 bound endogenous short RNAs from ES cells, we extracted the
nucleic acids that co-immunoprecipitated with our epitope-tagged constructs and
radioactively labeled the 3' end of the bound RNA with 5'-32P cytidine 3',5'-bis(phosphate).
To recover both cytoplasmic and nuclear siRNA molecules, P19V5 was
immunoprecipitated from whole cell extracts (WCE). Western blots of the WCE
detected both cytoplasmic and nuclear markers, indicating the presence of proteins from
both sub-cellular compartments (Figure 1C). Comparing the P19V5-bound RNA to the
RNA from the WCE supernatant shows a strong enrichment for short RNAs ~20
nucleotides long in the P19V5 immunoprecipitate (Figure 1D, lane 7 vs. 8). This
enrichment is P19V5-dependent, as there is no detectable RNA immunoprecipitated from
cells transfected with GFP (Figure 1D, lane 2).
To better enrich for putative low-abundance nuclear siRNAs, we performed an
immunoprecipitation of P19NLS from a nuclear extract made under non-denaturing
conditions. Western blots comparing the intensities of GAPDH and Cyclin T1 between
the cytoplasmic (CE) and nuclear (NE) fractions of the extract show a decrease in
GAPDH and an increase in Cyclin T1 in the nuclear as compared to the cytoplasmic
extract, indicating a ~3-fold nuclear enrichment (Figure 1C). Immunoprecipitation of
P19NLS from the NE enriches for short RNAs identical in length to those
immunoprecipitated from the CE of the P19NLS-transfected cells, as well as the WCE of
P19V5-transfected cells (Figure 1D, lanes 4, 6, and 8).
We cloned the short RNAs bound by the P19 constructs in ES cell extracts using a
procedure that selects for 18-26 nucleotide long RNAs that have 5' phosphates and 2' or
3' hydroxyls (Lagos-Quintana et al. 2001; Lau et al. 2001; J.R. Neilson and P.A.S.,
manuscript in preparation). As controls, we also cloned short RNA from the supernatants
of the P19-transfected cell extracts. In each case, roughly 300 independent clones were
analyzed (Table 1). Sequences were initially annotated as known non-coding RNAs
(ncRNAs) if they had an exact match to any annotated non-protein coding RNA in the
Rfam and NONCODE RNA databases, including: rRNAs, tRNAs, snRNAs, snoRNAs,
and miRNAs, as well as ncRNAs involved in imprinting and other processes (Griffiths-Jones
et al. 2003; Griffiths-Jones et al. 2005). Those clones that had no matches in either
database were deemed novel, and analyzed against the mouse genome using the UCSC
genome browser to identify overlapping genomic features (Kent et al. 2002; Karolchik et
al. 2003). Sequences with no exact match to the genome were re-annotated as known
ncRNAs or novel if they had at least 90% identity with a sequence in either set. Tables 1
and 2 are summaries of the cloning data and short RNA annotation.
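
The tiered annotation described above can be illustrated with a minimal Python sketch. It is only one possible reading of the procedure, not the pipeline used in this work: the toy sequences, the function names, and the use of ungapped window identity for the 90% rule are all assumptions made for illustration, and real searches against Rfam, NONCODE, or the mouse genome would use dedicated alignment tools.

```python
def best_ungapped_identity(read, reference):
    """Best fraction of matching bases between `read` and any equal-length
    window of `reference` (no gaps allowed; an illustrative simplification)."""
    n = len(read)
    if n == 0 or len(reference) < n:
        return 0.0
    best = 0.0
    for start in range(len(reference) - n + 1):
        window = reference[start:start + n]
        matches = sum(1 for x, y in zip(read, window) if x == y)
        best = max(best, matches / n)
    return best


def annotate(read, known_ncrnas, genome_fragments, min_identity=0.90):
    """Classify a cloned read in tiers: exact ncRNA match, exact genome match,
    then near matches at or above `min_identity` (hypothetical helper)."""
    # Tier 1: exact match to an annotated non-coding RNA.
    for name, seq in known_ncrnas.items():
        if read in seq:
            return ("known ncRNA", name)
    # Tier 2: no ncRNA match, but an exact match to the genome -> novel.
    if any(read in frag for frag in genome_fragments):
        return ("novel", None)
    # Tier 3: no exact match anywhere; fall back to the identity threshold.
    for name, seq in known_ncrnas.items():
        if best_ungapped_identity(read, seq) >= min_identity:
            return ("known ncRNA", name)
    if any(best_ungapped_identity(read, frag) >= min_identity
           for frag in genome_fragments):
        return ("novel", None)
    return ("unannotated", None)


# Toy reference sets (made up for this example, not real database entries).
KNOWN = {"toy_rRNA": "GGCCGGGCGCGACCCGCTCCGGGGACAGTGCC"}
GENOME = ["ATATATATTTGACCATTGACCAGGATTACA"]

if __name__ == "__main__":
    for read in ("GGGCGCGACCCGCTCCGGGG", "TTGACCATTGACCAGGAT"):
        print(read, annotate(read, KNOWN, GENOME))
```

Checking the exact-match tiers before the identity threshold mirrors the order given in the text, so a read that matches a known ncRNA exactly is never reclassified by a looser comparison.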
The majority of all sequences cloned were known ncRNAs, regardless of the
RNA starting material (Tables 1 and 2). Short RNAs immunoprecipitated by P19 were
on average 2 nucleotides shorter than those cloned from control supernatants (22 vs. 20
nucleotides for the WCE supernatant vs. IP and 23 vs. 21 nucleotides for the NE
supernatant vs. IP, p <0.0001 for both; Table 1), consistent with previous observations
that P19 has a decreased affinity for double-stranded RNA both shorter and longer than
19 base pairs (Vargason et al. 2003; Ye et al. 2003). Moreover, the GC content of the
P19-bound RNA was significantly higher than control RNA (53% vs. 75% for the WCE
supernatant vs. IP, and 49% vs. 67% for the NE supernatant vs. IP; p <0.0001 for both; Table 1). This
large average difference in GC-content was observed for both known ncRNAs and novel
RNAs, indicating an overall preference of P19 for GC-rich RNA (Table 1).
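
As a hedged illustration of the two summary statistics compared above, the following Python sketch computes mean length and mean GC content for a set of cloned reads. The function names and the choice to average GC content per read (rather than pooling all bases) are assumptions; the statistical test behind the reported p-values is not specified here and is not reproduced.

```python
def gc_fraction(seq):
    """Fraction of G or C bases in a single read (DNA or RNA alphabet)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in seq) / len(seq)


def summarize(reads):
    """Mean length and mean per-read GC fraction for a list of reads."""
    if not reads:
        raise ValueError("no reads to summarize")
    lengths = [len(r) for r in reads]
    return {
        "n": len(reads),
        "mean_length": sum(lengths) / len(lengths),
        "mean_gc": sum(gc_fraction(r) for r in reads) / len(reads),
    }


if __name__ == "__main__":
    # Made-up reads standing in for a supernatant and an IP library.
    toy_supernatant = ["UGAGGUAGUAGGUUGUAUAGUU", "UAGCUUAUCAGACUGAUGUUGA"]
    toy_ip = ["GGCCGGGCGCGACCCGCUCC", "GCGGGCCGCCCCCGGAGCCC"]
    print("supernatant", summarize(toy_supernatant))
    print("IP", summarize(toy_ip))
```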
A proportion of the novel sequences, between 2-3% of all sequences cloned from
each population, were exact matches to known repetitive elements catalogued by
Repeatmasker (Table 2, Supplementary Data). On average, these short RNAs were
slightly shorter than the average miRNA cloned in this study (20.9 vs. 22.3 nucleotides
for repeat-associated RNAs vs. miRNAs; p<0.0001). Similar repeat-derived short RNAs
were not identified in previous ES cell cloning efforts, most likely due to limitations in
the depth of short RNAs sequenced (Houbaviy et al. 2003).
Strikingly, P19 immunoprecipitation selected against miRNAs and enriched for
short RNAs matching mature ribosomal RNA (rRNA) species (Table 2, Supplementary
Data). 48.9% and 57.9% of the short RNAs cloned from the WCE and NE supernatants
mapped to annotated miRNA hairpins (miRNAs and miRNA*s), compared to only 7.5%
and 17.3% of those cloned from the immunoprecipitates. Conversely, 69.4% and 51.0%
of all sequences cloned from the WCE and NE immunoprecipitates were short rRNAs,
compared to 29.4% and 22.1% of sequences cloned from the WCE and NE supernatants,
respectively (Table 2).
Interestingly, the immunoprecipitates also lacked the natural complementary
sequence of the miRNA duplex, the miRNA* strand (Table 2). The ratio of miRNA to
miRNA* strands in the supernatants was roughly 24:1, and about the same level in P19
immunoprecipitates, indicating there was no selection for miRNA duplexes by P19.
This was surprising given that P19 binds tightly to double-stranded RNA with almost no
affinity for single stranded RNA substrates, and associates with miRNA duplexes in
plants (Silhavy et al. 2002; Vargason et al. 2003; Ye et al. 2003; Chapman et al. 2004;
Dunoyer et al. 2004; Lakatos et al. 2004).
The specific miRNAs cloned from the immunoprecipitates were similar to those
cloned from the supernatants, perhaps consistent with the majority of P19-associated
miRNAs being derived from a non-specific background. The same group of 20 miRNAs
comprised 79% and 83% of all miRNAs cloned from WCE IP and supernatant,
respectively, and comprised 78% and 77% of all miRNAs cloned from the NE IP and
supernatant, respectively. There was no significant difference in the GC-content or
length of the miRNAs cloned from the immunoprecipitates compared to those cloned
from the supernatants (not shown). If indeed the miRNAs present in P19
immunoprecipitates were due to background contamination, then a comparison of the
number cloned from each population would suggest that P19 immunoprecipitation gave a
7-fold enrichment of bound short RNAs versus miRNAs from the supernatant of the
WCE, and a 3-fold enrichment of bound short RNAs versus miRNAs from the
supernatant of the NE. These are minimal estimates as some miRNAs might be
preferentially bound to P19.
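One way to reproduce the enrichment estimates above is to treat the miRNAs in each immunoprecipitate as background and take the ratio of miRNA fractions in supernatant and immunoprecipitate. The sketch below does this with the percentages quoted earlier in this section; using percentages as a stand-in for clone counts is an assumption, and the names are hypothetical.

```python
# Percent of clones mapping to miRNA hairpins: (supernatant, immunoprecipitate),
# taken from the text above.
MIRNA_PERCENT = {
    "WCE": (48.9, 7.5),
    "NE": (57.9, 17.3),
}


def fold_enrichment(percent_sup: float, percent_ip: float) -> float:
    """If IP miRNAs are background, their depletion relative to the supernatant
    gives the minimal fold enrichment of P19-bound RNA over that background."""
    return percent_sup / percent_ip


if __name__ == "__main__":
    for extract, (sup, ip) in MIRNA_PERCENT.items():
        print(f"{extract}: {fold_enrichment(sup, ip):.1f}-fold")
```

The ratios come out near 6.5 for the WCE and 3.3 for the NE, which round to the 7-fold and 3-fold figures quoted.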
All but one of the short rRNAs cloned from the supernatants and the
immunoprecipitations were in the sense orientation relative to the full-length transcribed
45S pre-rRNA and 5S RNA, and 97% of these sequences mapped to the mature 18S,
5.8S, or 28S rRNA. Figure 2A shows a representation of all the short rRNAs cloned,
aligned to bases 3,900 to 13,000 of the 13,404 base 45S pre-ribosomal RNA. Only 13
out of 564 total cloned short rRNAs did not have an exact match in this region; 7 of these
were matches to the 5S rRNA and the remaining 6 mapped to regions of the 45S
precursor not included in Figure 2A. Particular classes of short rRNAs were highly
enriched in the P19-bound RNA compared to unbound controls (marked with an * in
Figure 2A). Mapping these RNAs to established rRNA secondary structure maps shows
the enriched short rRNAs are not necessarily from regions of the ribosome that resemble
canonical Drosha or Dicer substrates (not shown; (Cannone et al. 2002)).
Because P19 binds almost exclusively to double-stranded RNA, it was surprising
that no short RNAs with exact complementarity to P19-enriched short rRNAs were cloned.
Speculating that these RNAs could possibly form partial duplexes with other short RNAs,
we investigated whether the most abundantly cloned rRNAs in the immunoprecipitations
(cloned more than 5x) could form bulged duplexes with other abundantly cloned species.
We limited our initial analysis to the most abundantly cloned short RNAs, reasoning that
abundantly cloned short RNAs could have been likely complements in binding to P19.
The short rRNAs numbered 1 through 12 in Figure 2A were folded against each other in
all possible permutations using the Mfold server (Zuker 2003). Of the 140 predicted
structures, we found 14 that when annealed had a double-stranded region of the right size
to potentially bind P19 with high affinity, between 18 and 20 base pairs (Vargason et al.
2003). In all cases, there was at least one bulged region in the RNA duplex predicted by
Mfold (Figure 2B for selected examples); however, it has been previously observed that
P19 associates with bulged duplexes in plants (Chapman et al. 2004). This partial
complementarity between particular abundantly cloned species provides a possible
explanation for the enrichment of some short rRNAs in the P19 IP, but not all. Notably,
although short rRNA #6 was the second most abundantly cloned short rRNA in the P19
immunoprecipitations, it was not predicted to form any duplexed structures with other
abundantly cloned short rRNAs.
P19 binds to regions of short dsRNA.
It is possible that P19 also binds substrates with dsRNA regions that are siRNA-like in length, but contain one strand that is significantly longer than the standard siRNA
length of 21 nucleotides. Biochemical analysis shows that RNA duplexes without 3'
overhangs bind with slightly higher affinity to P19 than their 3' overhang-containing
counterparts, indicating that the length of the RNA duplex and not the overhang is the
key factor in determining P19 binding (Vargason et al. 2003). This hypothesis could
explain why some molecules immunoprecipitated by P19 form no canonical siRNA-like
duplexes with other cloned RNAs; short RNAs complementary to longer RNAs could be
bound by P19, and the iterative size selection in the short RNA cloning protocol would
exclude the longer RNA binding partner from the final set of cloned sequences. The high
GC content of the short RNAs bound by P19 is consistent with this hypothesis (Table 1),
allowing a greater promiscuity for stable binding to longer RNA partners. Indeed, in
addition to enriching for short RNAs, P19 immunoprecipitation reproducibly pulls down
larger RNA species that are the same size as abundant tRNAs, snRNAs, and rRNAs
(Figures 1D, 4A). While no immunoprecipitated RNAs have exact complements in the set
of known tRNAs, snRNAs, and rRNAs, gapped alignments allowing G:U pairing show
that P19-enriched RNAs could form duplexes with several ncRNA species, potentially
explaining the enrichment for ncRNAs as well as short rRNAs in the
immunoprecipitation (not shown).
We designed three different RNA duplexes to test the affinity of P19V5 for short
RNAs bound to longer RNAs (Figure 3A). One strand of each duplex was a frequently
cloned short rRNA, while the other varied such that its complement was either: in the 5'
region of a 32 nucleotide long RNA species (5' complementary), in the 3' region of a 35
nucleotide long RNA species (3' complementary), or a 21 nucleotide long RNA that
formed a canonical 19 base pair siRNA duplex (siRNA). The length of the double-stranded region in the 5' complementary RNA was 19 base pairs, and 21 base pairs in the
3' complementary RNA. Since P19 preferentially binds 19 base pair duplexes, the latter
may be expected to have a lower affinity than the former. In all cases, the dsRNA
species had 5' phosphate and 3' hydroxyl groups, mimicking endogenous siRNA
structure. Binding assays were performed by incubating radiolabelled RNA substrate in
increasing concentration with P19V5-Protein G agarose bead complexes (Figure 3).
Control experiments showed that all input RNA was double-stranded and of the
appropriate dilution (not shown). Under the conditions assayed, there was negligible
non-specific association of RNA with beads, and the P19V5-bead complexes had no
affinity for single-stranded RNA (Figure 3B). The apparent dissociation constant, Kapp,
of P19V5 for an siRNA duplex was determined to be 2.8 ± 0.5 nM, nearly identical to
previously published binding studies using a P19 C-terminally tagged with GST (Figure
3C; Lakatos et al. 2004). P19V5 bound the 5' and 3' complementary RNAs with a Kapp
of 7.4 ± 0.7 nM and 27 ± 4 nM, respectively, supporting our hypothesis that P19 binds
regions of dsRNA ~19 base pairs long and not only siRNAs (Figure 3C).
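To put the three Kapp values on a common scale, the sketch below expresses each as a fold difference relative to the siRNA duplex and, assuming simple single-site binding of the same form as the curve used in the Material and methods, the fraction of maximal binding at a given RNA concentration. The function names are hypothetical.

```python
# Apparent dissociation constants (nM) reported above (Figure 3C).
KAPP_NM = {
    "siRNA": 2.8,
    "5' complementary": 7.4,
    "3' complementary": 27.0,
}


def fold_vs_sirna(kapp_nm: dict) -> dict:
    """Kapp of each substrate divided by the Kapp of the siRNA duplex."""
    reference = kapp_nm["siRNA"]
    return {name: kapp / reference for name, kapp in kapp_nm.items()}


def fraction_of_max_bound(rna_nm: float, kapp_nm: float) -> float:
    """Single-site occupancy, [RNA] / ([RNA] + Kapp); an assumed model."""
    return rna_nm / (rna_nm + kapp_nm)


if __name__ == "__main__":
    for name, fold in fold_vs_sirna(KAPP_NM).items():
        print(f"{name}: {fold:.1f}x the siRNA Kapp")
```

By this measure the 5' complementary substrate binds about 2.6-fold more weakly than the siRNA, and the 3' complementary substrate about 9.6-fold more weakly.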
Unknown function of P19 bound short rRNAs.
We next tried to determine the function of the short rRNAs enriched in the P19
immunoprecipitation. Short RNAs with exact matches to abundant tRNAs, rRNAs, and
snRNAs have been frequently dismissed as non-functional degradation products of
abundant ncRNAs, particularly in mammalian cloning efforts. While this may be true, it
is worth noting that these non-coding RNAs are derived from highly repetitive elements
in mammalian genomes, and in this respect are similar to other known targets of RNAi-mediated transcriptional silencing from which rasiRNAs have been cloned (Lander et al.
2001; Waterston et al. 2002).
Studies from S. pombe and Arabidopsis have implicated short rRNAs in the
formation of heterochromatin at rDNA repeats, setting a precedent for short rRNA
functionality (Xie et al. 2004; Cam et al. 2005). One common feature of these short
rRNAs involved in chromatin silencing is that they are both sense and antisense to the
full-length, transcribed rRNA, supporting the idea that they are generated from
processing of a longer dsRNA precursor (Xie et al. 2004; Cam et al. 2005). In contrast,
all but one of the short rRNAs cloned in this study were in the sense orientation relative
to transcription of the mature rRNAs, suggesting that they arose either by breakdown or
processing of mature rRNA sequence. However, this observation does not exclude a
possible role for short rRNAs in the chromatin silencing of ES cell rDNA repeats.
Recently, ncRNAs mapping directly upstream of the rDNA transcriptional start
site have been shown to direct the nucleolar remodeling complex, NoRC, to
transcriptionally silence rDNA repeats in mouse 3T3 cells (Mayer et al. 2006). No short
RNAs cloned in this study map to this region of the rDNA repeat, consistent with the
authors' observation that the NoRC associated RNAs are 150-300 nucleotides long.
It is also possible that the short rRNAs immunoprecipitated by P19 are involved
in post-transcriptional gene silencing (PTGS). If this were true, one might expect
complementary sequences to be present in exons or 3' UTRs of known genes. BLAST
analysis of the P19-enriched rRNAs against the mouse genome shows that exactly
complementary sequences are not present in known mRNAs, suggesting that these
enriched rRNAs are not involved in siRNA-like PTGS. In support of this, no effect was
seen on expression of a luciferase reporter when two perfectly complementary binding
sites to selected short rRNAs were inserted into its 3' UTR (Supplementary Figure 1A).
Finally, we examined whether P19V5 associates with short RNAs in an ES cell
line lacking Dicer RNAse III activity. These ES cells were derived from mice
homozygous for a conditional allele of Dicer in which the key catalytic residues in the
second RNAse III domain are floxed (Harfe et al. 2005). Consistent with previously
published results, excision of the floxed region via transient transfection of Cre
recombinase results in viable ES cells that do not express miRNAs ((Murchison et al.
2005); Figure 4B; J.M.C. and P.A.S. unpublished). P19V5 immunoprecipitates short
RNAs from Dicer null ES cells as efficiently as from Dicer containing cells, shown by 3'
end labeling of immunoprecipitated RNA (Figure 4A). Similarly, short RNA northern
blots probing P19 immunoprecipitates for enriched short rRNA #3 show the same
enrichment in the presence and absence of Dicer (Figure 4B). Reprobing of the same
northern blot shows miR295 is absent from P19 immunoprecipitates but present in the
supernatants from Dicer positive cells, confirming the cloning data that shows a selection
against miRNAs in P19 immunoprecipitates (Figure 4B). Northern blots to total RNA
preparations show that RNA #3 is present in ES cells at the same level in the presence and
absence of P19V5, and is not induced non-specifically by transfection (Supplementary
Figure 1B). A survey of RNA from various mouse tissues indicates that RNA #3 is
detectable in several samples, but at levels much lower than those observed in ES cells
(Supplementary Figure 1C). Together, these results indicate that enriched rRNAs are
generated independently of Dicer and are not byproducts of P19 expression or
transfection.
Discussion
We have shown that V5-tagged P19 associates with endogenous short RNAs in
mouse ES cells. Cloning of these RNAs from whole-cell and nuclear extracts revealed
that P19 associates predominantly with short, GC-rich RNAs that are exact matches to
portions of the mature 28S and 18S rRNAs. The function of these short rRNAs is not
clear. We show that they exist in the absence of Dicer and do not function in endogenous
PTGS, suggesting that they are not in the canonical RNAi pathway. In other organisms,
short rRNAs have been implicated in the chromatin silencing of rDNA repeats (Xie et al.
2004; Cam et al. 2005); whether the same is true of the short rRNAs identified in this
work is not clear. Intriguingly, P19V5 did not immunoprecipitate an easily detectable
pool of short RNAs from 293T cells as it did in ES cells, suggesting that similar short
rRNAs are not produced to the same extent in this human embryonic kidney cell line, or
that they are inaccessible to P19 (Supplementary Figure 2A). In plants, over-expression
of P19 leads to a general accumulation of miRNA* strands, and P19 immunoprecipitates
contain both miRNA and miRNA* strands, as detected by short RNA northern blots to
total and P19 immunoprecipitated RNA, respectively (Chapman et al. 2004; Dunoyer et
al. 2004). In contrast, P19 expression in mouse ES cells did not lead to accumulation of
miRNA* strands in cell extracts, nor were miRNA* strands selectively detected in P19
immunoprecipitates, suggesting that P19 is unable to access miRNA duplexes as
efficiently in ES cells as in plants.
Because of our experimental design, we obtained cloning data from a nuclear-enriched extract. We observed few significant differences between the short RNA
profiles of nuclear compared to cytoplasmic extracts, suggesting that novel short RNAs
are not abundant in the nucleus of ES cells. A surprisingly high proportion of short
RNAs in the nuclear extract were miRNAs given that mature miRNAs are predominantly
cytoplasmic (Houbaviy et al. 2005). A simple explanation for these results is that the
profile of short RNAs in the nucleus is similar to that in the cytoplasm. It is also possible
that most of the short RNAs present in the nuclear extract were from cytoplasmic
contamination. Controls testing the extent of fractionation in nuclear extracts suggest a
three-fold enrichment for components of this compartment. This level of enrichment
might not be sufficient for the identification of short, nuclear-localized RNAs if they are
low in abundance compared to cytoplasmic short RNAs. Alternatively, nuclear siRNAs
might be chemically modified such that they are not identifiable by the cloning methods
used here, which require a 5' phosphate and 2' or 3' hydroxyl groups.
Notably, between 2% and 3% of the short RNAs cloned in this study overlapped with
repetitive elements catalogued by Repeatmasker. It is unlikely that these RNAs exist
predominantly as double-stranded siRNAs in cell extracts as they were not enriched in
P19 immunoprecipitates. Their function is unclear, but they may be analogous to
repeat-associated siRNAs identified in S. pombe, C. elegans, Drosophila, and
Arabidopsis. Alternatively, because such a large proportion of the mouse genome is
annotated as repetitive, overlap of specific short RNAs with repeats may be coincidental
and not indicative of novel function (Lander et al. 2001; Waterston et al. 2002).
Recently, repeat-associated short RNAs were identified from mouse oocytes,
where they appear to be as abundant as miRNAs. Also, reporter constructs with
complementary 3' UTR sequences were destabilized, suggesting that at this
developmental stage, repeat-associated short RNAs are involved at least in the PTGS of
complementary transcripts (Watanabe et al. 2006). The short RNAs overlapping
repetitive elements identified in this study are probably at a far lower abundance, being
about 25-fold less abundant than miRNAs. This difference suggests a difference in
activity of these sequences in ES cells compared to oocytes.
Previous biochemical studies have shown that P19 dimers bind tightly to siRNA
duplexes (Vargason et al. 2003; Ye et al. 2003; Lakatos et al. 2004). Here, we show that
compared to an siRNA, P19 binds with roughly 3-fold reduced affinity to a dsRNA
species containing one strand that extends far beyond the edge of the RNA duplex. We
therefore conclude the likely reason for the association of P19 with specific short rRNAs
in ES cells is that they bind with partial complementarity to larger, abundant non-coding
RNAs to form RNA duplexes ~19 base pairs long. The observed 3-fold difference in
affinity for a 19 base pair duplexed siRNA compared to a 19 base pair duplex with an
extended 3' strand is small enough that abundant endogenous dsRNAs similar in
structure to the latter could compete with endogenous siRNAs for P19 binding.
It should be noted that P19V5 as well as a previously published, HA epitope-tagged P19 construct, did not inhibit siRNA-mediated knockdown of a reporter gene in
293T cells (Supplementary Figure 2C). This conflicts with results from HeLa cells,
where epitope-tagged P19 was an effective inhibitor of exogenously introduced siRNAs,
and with results from 293T cells, where untagged P19 was able to interfere with
endogenous miRNA activity (Dunoyer et al. 2004; Lecellier et al. 2005). The reason for
these discrepancies is unclear, but it suggests that P19 may not inhibit RNAi in
heterologous systems as robustly as previously anticipated.
The original objective of these experiments was to use the affinity of P19 for
siRNA duplexes to identify these types of short RNAs in mouse ES cells. Functional and
biochemical studies of P19 in plants and animals were consistent with this approach.
Analysis of the short RNAs bound by P19 in ES cells did not generate an apparent siRNA
fraction in spite of evidence for a strong enrichment of certain RNA species as compared
to the supernatants from associated ES cell extracts. RNAs with siRNA structure do bind
to P19 with high affinity, and the failure to detect them might indicate their scarcity in
these cells. Alternatively, P19 might be denied access to endogenous RNAi-related short
RNAs if the majority of siRNAs in ES cells are bound by RNAi pathway components
that have a higher affinity for these molecules than P19.
Material and methods
Plasmid construction.
CIRV P19 (obtained from J. Burgyan) was amplified via PCR and cloned into pcDNA3.1
(Invitrogen), using the primers 5'-CACCATGGAACGAGCTATACAAGGA-3' and
5'-CTCGCTTTCTTTCTTGAAGGTTTCA-3'. To make P19NLS, two SV40 NLS
sequences were added to P19V5 by annealing two DNA oligos with an 18 bp overlap
(5'-CCGCTCGAGTGATCCAAAAAAGAAGAGAAAGGTAGATCCAAAA-3' and
5'-CGAACCGCGGTACCTTTCTCTTCTTTTTTGGATCTACCTTTCT-3'), filling in with
Taq DNA polymerase, and inserting into the XhoI/SacII sites of pcDNA3.1. pRL-CMV6xCXCR4 was described previously (Doench et al. 2003). The P19HA construct was
obtained from O. Voinnet (Dunoyer et al. 2004). The V5-tagged β-Galactosidase control
plasmid was obtained from H. Houbaviy. The 3' UTR reporter constructs were made by
inserting 5' phosphorylated, annealed dsDNA oligos into the XhoI/ApaI sites in the 3'
UTR of pRL-CMV (Promega). The sequences of the DNA oligos used to make the UTR
constructs are available upon request. All DNA oligos were purchased from IDT.
Cell Culture, transfection, and luciferase assays.
J1 ES and 293T cells were grown as in (Houbaviy et al. 2003; Petersen et al. 2006).
Cells were transfected with Lipofectamine 2000 (Invitrogen) according to the
manufacturer's instructions. 25 µg of plasmid per 10 cm plate was used for ES cell and
293T immunoprecipitation experiments. Transfection efficiency as assessed by GFP
expression was between 60-80% for ES cell immunoprecipitation experiments, and >95%
for 293T experiments.
Dual luciferase assays were performed essentially as previously described
(Doench and Sharp 2004; Petersen et al. 2006). For ES cell assays, 1 ng of each 3' UTR
reporter construct, 100 ng of pGL3 control (Promega), and 700 ng of pWhiteScript carrier
DNA were transfected per well of a 24-well plate and lysed 24 hours post-transfection.
For 293T assays, 4 µg of plasmid (P19V5/NLS/HA or β-Gal control) was transfected per
well of a 6-well plate, and split 8 hours post-transfection into a 24-well plate, seeding
cells at 2x10^5 cells per well. 24 hours later, an additional 0.7 µg of plasmid was
cotransfected along with 30 ng of pRL and pGL3 plasmids (Promega) and the appropriate
amount of siRNA to attain the concentration indicated in Supplementary Figure 1. Cells
were lysed 24 hours post-transfection. The siRNA used perfectly targeted the coding
sequence of Renilla Luciferase (target sequence: 5'-GCCAAGAAGUUUCCUAAUA-3').
Western blots and immunohistochemistry.
For western analysis, cell extracts were fractionated on 4-20% SDS-polyacrylamide
gradient gels (Biorad) and transferred to Hybond-C membranes (Amersham).
Membranes were blocked with 5% milk in PBS then incubated with the indicated
antibody. Antibodies (αGAPDH (Chemicon); αV5 (Invitrogen); αCyclin T1 (Abcam))
were detected using HRP-conjugated antisera (Amersham) and chemiluminescence. For
immunofluorescent staining, ES cells were transfected on gelatinized cover slips, then
fixed with 4% PFA in PBS and permeabilized with 1% Triton X-100. Coverslips were
stained with primary and secondary antibodies in PBS for one hour and affixed to glass
slides with Prolong Gold with DAPI (Molecular Probes). Actin was visualized using
Alexa 488 Phalloidin (Molecular Probes).
Immunoprecipitations and RNA isolation and visualization.
For each immunoprecipitation experiment three 10cm plates of ES cells were lysed two
days post-transfection with WCE buffer (1% NP-40, 30 mM HEPES-KOH pH 7.5, 100
mM NaCl, 66 mM KCl, 1 mM MgCl2, 1 mM DTT, 1 U/µl SUPERase•In (Ambion),
Complete Protease Inhibitor EDTA-free (Roche), and Phosphatase Inhibitors I and II
(Sigma)) or NE buffer (0.2% NP-40, 100 mM NaCl, 5 mM MgCl2, 1 mM DTT, 1 U/µl
SUPERase•In, Complete Protease Inhibitor EDTA-free, and Phosphatase Inhibitors I and
II). The WCE lysate was spun at 20000 rpm in a tabletop centrifuge for 15 minutes after
lysis to pellet insoluble material. The NE cells were lysed for 5 minutes at 4°C and spun
for 5 minutes at 1000g. The pellet (containing nuclei) was resuspended in WCE buffer
and sonicated on ice 3 times for 5 seconds at power level 2 on a Branson Sonifier 450
sonicator, with one-minute rest on ice between sonication. These sonication conditions
did not disrupt endogenous short RNA binding to P19 (not shown). The sonicated
nuclear fraction was spun at 20000 rpm for 15 minutes in a tabletop centrifuge to remove
particulate. P19 was immunoprecipitated from extracts by adding 20 µl of Protein G Plus
agarose beads (Pierce) preconjugated overnight with 2 µg of V5 antibody per 10 cm plate,
and rotating at 4°C for 45 minutes. Beads were washed with 1 ml of WCE buffer 3
times, switching tubes for the final wash. Beads were then resuspended in 300 µl 1x
Proteinase K buffer (Nykanen et al. 2001) with 5 µl of Proteinase K (Roche) and rotated at
RT for 30 minutes, then extracted with phenol/chloroform and precipitated. To isolate
supernatant RNA, 300 µl of supernatant from the immunoprecipitation was extracted with
phenol/chloroform and precipitated. 100 ng of supernatant RNA and 1/10th the volume of
immunoprecipitated RNA were 3' end labeled with T4 RNA ligase (NEB) and [5'-32P]
cytidine 3',5'-bis(phosphate) (NEN) overnight at 4°C in 1x RNA ligase buffer and 30% v/v
DMSO. Unincorporated radioactivity was removed with G25 microspin columns
(Amersham), and one-half of the labeling reaction was resolved on a 12% or 15%
denaturing polyacrylamide gel (National Diagnostics). For size markers, 10 bp DNA
ladder (Invitrogen) or a 21 bp siRNA was 5' end-labeled using γ-32P-ATP (NEN) and T4
PNK (NEB). Gels were wrapped in saran wrap and quantitated on a phosphoimager.
Northern blots were performed as in (Houbaviy et al. 2003), except hybridization was
carried out in Oligo-Hyb (Ambion) at 37°C. Total mouse tissue RNA was purchased
from Ambion. The same conditions for the ES cell WCE immunoprecipitations were
used for 293T immunoprecipitations.
Binding assays.
293T lysates were used to make P19-containing cell extracts for binding assays because
of the lack of association between P19 and endogenous RNAs in these cells
(Supplementary Figure 1). Extracts were made 48 hours post-transfection with P19V5
using 400ul of WCE buffer per plate. Cells were lysed, and extracts spun at top speed in
a tabletop centrifuge for 15 minutes. Protein G Plus agarose beads pre-conjugated
overnight with V5 antibody were added to cleared lysate, and P19V5 was
immunoprecipitated for 45 minutes at 4°C. Beads were washed 3x in WCE buffer, and
resuspended in WCE buffer such that for each binding condition, 30 µg of extract, 10 µl of
50% protein G slurry, and 0.5 µg of V5 antibody were used. Radiolabeled dsRNA was
added to P19V5-bead mixtures at the indicated concentrations, and the binding reactions
were rotated for 30 minutes at room temperature. Beads were washed 2x with 500 µl of
WCE buffer, then 2x with 1 ml of WCE buffer, switching tubes for the final wash. The
bound RNA was eluted using the Proteinase K phenol/chloroform treatment described
above. One-half of each reaction was resolved on a 15% polyacrylamide gel and
visualized with a phosphoimager, or quantitated via scintillation counting. Data points
were then fit to a fixed-endpoint curve, m2*m0/(m0+m1), using KaleidaGraph
software, where m2 is the maximum amount bound, m1 is the apparent dissociation
constant, Kapp, and m0 is the RNA concentration, [RNA].
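The curve form above can be fit without specialised software. The sketch below is a pure-Python stand-in for the KaleidaGraph fit, not a reproduction of it: it scans Kapp on a log-spaced grid and, for each candidate, solves for the maximum amount bound in closed form by least squares. The grid bounds and function names are assumptions, the example data are synthetic (generated from the model itself at the RNA concentrations listed in the Figure 3 legend), and both parameters are fitted here although the original may have held the endpoint fixed.

```python
def model(rna_nm: float, max_bound: float, kapp_nm: float) -> float:
    """Amount bound = m2 * m0 / (m0 + m1), with m0 = [RNA], m1 = Kapp, m2 = maximum."""
    return max_bound * rna_nm / (rna_nm + kapp_nm)


def fit_binding_curve(conc_nm, bound, k_lo=0.01, k_hi=1000.0, steps=20000):
    """Return (max_bound, kapp_nm) minimising the sum of squared residuals."""
    best = None
    for i in range(steps + 1):
        kapp = k_lo * (k_hi / k_lo) ** (i / steps)      # log-spaced candidate Kapp
        x = [c / (c + kapp) for c in conc_nm]           # fractional saturation
        # For a fixed Kapp the model is linear in max_bound, so solve it directly.
        max_bound = sum(xi * yi for xi, yi in zip(x, bound)) / sum(xi * xi for xi in x)
        sse = sum((yi - max_bound * xi) ** 2 for xi, yi in zip(x, bound))
        if best is None or sse < best[0]:
            best = (sse, max_bound, kapp)
    return best[1], best[2]


# Synthetic, noise-free example at the concentrations used in Figure 3B.
CONC_NM = [1, 5, 10, 50, 100]
SYNTHETIC = [model(c, 100.0, 2.8) for c in CONC_NM]
```

Fitting `SYNTHETIC` recovers a Kapp close to the 2.8 nM used to generate it; real data would additionally need replicate measurements to estimate the reported uncertainties.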
To make radiolabeled, duplexed RNA, 5' phosphorylated RNA oligos
("RNA#3" 5'-CGGCUCCGGGACGGCCGGGAA-3';
"5' complementary" 5'-CCCGGCCGUCCCGGAGCCGGCUUGGCUUCGU-3';
"3' complementary" 5'-CUUGGCUUCGUUUCCCGGCCGUCCCGGAGCCGUU-3';
"siRNA complementary" 5'-CCCGGCCGUCCCGGAGCCGUU-3')
were annealed to their complementary strands at
12 µM in a solution containing 10 mM HEPES pH 7.5, 20 mM NaCl and 1 mM EDTA by
heating the RNAs to 95°C and cooling 1 degree/minute until the samples reached room
temperature. Radioactive 5' end-labeled RNA#3 was spiked into annealing reactions
before heating. Dilution series of each dsRNA species were made, and portions of each
were run on a 20% native polyacrylamide gel to assess efficiency of annealing and
accuracy of dilution. By this analysis, all dsRNA preparations used for binding were
>99% dsRNA and had an R² value for the dilution series of at least 0.99 (not shown).
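As a consistency check on the oligo designs, the length of the perfectly Watson-Crick paired region between RNA#3 and each partner strand can be computed from the sequences as printed above. The sketch below does this by finding the longest stretch shared between the partner and the reverse complement of RNA#3. It considers only contiguous A:U and G:C pairs (no G:U wobble or bulges), and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")

RNA3 = "CGGCUCCGGGACGGCCGGGAA"
PARTNERS = {
    "5' complementary": "CCCGGCCGUCCCGGAGCCGGCUUGGCUUCGU",
    "3' complementary": "CUUGGCUUCGUUUCCCGGCCGUCCCGGAGCCGUU",
    "siRNA complementary": "CCCGGCCGUCCCGGAGCCGUU",
}


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA sequence written 5' to 3'."""
    return rna.translate(COMPLEMENT)[::-1]


def longest_paired_stretch(strand: str, partner: str) -> int:
    """Length of the longest contiguous perfectly paired region, found as the
    longest common substring of the partner and the strand's reverse complement."""
    target = reverse_complement(strand)
    best = 0
    prev = [0] * (len(partner) + 1)
    for i in range(1, len(target) + 1):
        cur = [0] * (len(partner) + 1)
        for j in range(1, len(partner) + 1):
            if target[i - 1] == partner[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


if __name__ == "__main__":
    for name, seq in PARTNERS.items():
        print(name, longest_paired_stretch(RNA3, seq))
```

With these sequences the paired regions are 19, 21, and 19 base pairs for the 5' complementary, 3' complementary, and siRNA complementary strands, matching the duplex lengths stated in the Results.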
Short RNA cloning and sequence analysis.
Short RNAs were cloned using a procedure modified from (Lagos-Quintana et al. 2001;
Lau et al. 2001) (J.R. Neilson and P.A.S, manuscript in preparation). Before adaptor
ligation, 18-26 nucleotide long RNA was gel-purified from 5ug of supernatant RNA to
use as starting material; immunoprecipitated RNA from one 10cm plate equivalent was
used as starting material and was not gel purified. Short RNA sequences were extracted
from concatamers using scripts from (Houbaviy et al. 2003). Rfam
(http://www.sanger.ac.uk/Software/Rfam/) and NONCODE
(http://www.bioinfo.org.cn/NONCODE/) RNA databases were used to define known
miRNAs, tRNAs, rRNAs, snRNAs, and snoRNAs. All genome analysis was performed
using the August 2005 assembly of the mouse genome (mm7). BLAST was run with a
word size of 7 and the gap-opening penalty set to 1. Repeat and EST overlap was
determined using the UCSC genome browser (http://genome.ucsc.edu/) Repeatmasker
(http://www.repeatmasker.org) and mouse EST tracks. Sequences with multiple repeat
overlaps were annotated as the class of repeat that overlapped most frequently with the
short RNA in question. p-values comparing values between data sets were obtained
using a two-sample test for the difference in proportions.
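For readers unfamiliar with it, a two-sample test for the difference in proportions can be sketched as a pooled z-test, as below. This is one common formulation and an assumption about the exact variant used; the clone counts in the usage note are hypothetical, chosen only to resemble populations of roughly 300 clones.

```python
import math


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int):
    """Pooled two-sample z-test for a difference in proportions.

    x1, x2 are the counts with the property of interest; n1, n2 are sample sizes.
    Returns (z, two-sided p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

For instance, `two_proportion_z_test(147, 300, 23, 300)` compares hypothetical counts of 147 and 23 miRNA clones out of 300 in each population and returns a p-value far below 0.0001.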
Supplementary Information.
Supplementary information can be found at http://web.mit.edu/sharplab/calabrese_supp/.
Acknowledgements.
The authors thank J. Burgyan and O. Voinnet for providing P19 plasmids, and are
especially grateful to J. Neilson for protocols and help with short RNA cloning. We also
thank A. Seila for help with bioinformatics, C. Whittaker for help with Figure 2, A.
Leung for help with microscopy, and S. Erkeland, A. Garfinkel, J. Neilson, and A. Seila
for critical reading of this manuscript. This work was supported by a Program Project
Grant from the National Cancer Institute to P.A.S. and partially by the Cancer Center
Support (core) grant from the National Cancer Institute.
Table and Figure legends.
Table 1. WCE = whole cell extract (1%NP40 lysis); NE = nuclear extract (resuspended
pellet from 0.2% NP40 lysis); sup = supernatant; known ncRNAs = clones with at least
90% sequence identity to miRNAs, rRNAs, tRNAs, and snRNAs, as well as RNAs
involved in imprinting and other processes; novel short RNAs = clones that are not
known ncRNAs.
Figure 1. P19 binds endogenous short RNAs when expressed in ES cells. (A) P19
expression constructs used in this study. (B) Sub-cellular localization of P19V5 and
P19NLS in ES cells (scale bar = 15 µm). (C) Western blots showing protein composition
of P19-containing extracts. GAPDH (cytoplasmic) and Cyclin T1 (nuclear) serve as
fractionation controls. WCE = whole cell extract; CE = cytoplasmic extract; NE =
nuclear extract. (D) P19 constructs bind short RNAs when expressed in ES cells. Cells
transfected with GFP, P19V5, or P19NLS were lysed with either WCE or NE buffer and
immunoprecipitations were performed with αV5 Protein G agarose beads. Bound RNAs
were 3' end-labeled with [5'-32P] cytidine 3',5'-bis(phosphate) and resolved on a 12%
denaturing polyacrylamide gel. The size markers correspond to a 10 bp DNA ladder.
The arrow denotes a ~20 nucleotide band observed in the P19 immunoprecipitations.
Figure 2. P19 enriches for particular short rRNA species when expressed in ES cells.
(A) Specific short rRNAs are highly enriched in P19 immunoprecipitations compared to
control supernatants. Shown is a scaled representation of all the short rRNAs cloned,
aligned to bases 3,900 to 13,000 of the 13,404 base pair 45S pre-rRNA. Highlighted in
bold along the X-axis are the locations of the mature 18S, 5.8S, and 28S rRNA species
relative to the full-length 45S pre-rRNA. Each grey bar represents one cloned short
rRNA positioned directly above or below its matching sequence in the 45S pre-rRNA.
Grey bars above the X-axis were cloned from one or both of the immunoprecipitations,
and those below the X-axis were cloned from one or both of the supernatants. (B) Certain
P19-enriched short rRNAs form partial double-stranded RNA structures with themselves.
Shown are selected Mfold-predicted dsRNA structures of enriched rRNAs folded against
each other.
Figure 3. P19 binds with high affinity to RNAs containing 19 to 21 base pair double-stranded regions with extended 5' or 3' single-stranded segments. (A) Base composition
and secondary structure of short RNAs tested for binding to P19V5. In all cases, the
strand in grey is short rRNA #3 from Figure 2A. (B) Representative P19-binding assay.
Shown is the RNA bound by P19V5-bead complexes after incubation with increasing
concentrations of radiolabeled RNA (1, 5, 10, 50, and 100 nM). (C) Determination of the
affinity of P19 for selected RNA species. Shown is the quantitation of a binding assay
similar to that in (B). The Kapp was determined by fitting the data points to a fixed
endpoint curve using KaleidaGraph data analysis software.
Figure 4. Endogenous short rRNAs exist independently of Dicer. (A) P19 associates
with short RNAs in the absence of Dicer. Dicer +/+ or -/- cells were transfected with
either P19V5 or GFP as a negative control. Immunoprecipitations using V5 antibody
were performed, and the associated RNA was 3' end labeled and visualized as in Figure
1D. (B) Similar RNA species associate with P19V5 in the presence or absence of Dicer.
Shown is a short RNA northern blot probing immunoprecipitated and supernatant RNA
from 4A with probes complementary to RNA#3, miR295, or U6 snRNA.
Supplementary Figure 1. (A) Endogenous short rRNAs do not repress expression of a
luciferase reporter with 2 perfectly complementary binding sites inserted in its 3' UTR.
Three short rRNA sequences were tested for repression: RNA #3 was cloned frequently
in P19 immunoprecipitations but not in supernatants, RNA #5 was cloned frequently in
both P19 immunoprecipitations and supernatants, and RNA #13 was cloned only in
supernatants. Target sequences complementary to the endogenously expressed miR295
were included as a positive control. (B) RNA #3 is not upregulated in ES cells by
transfection of P19V5, P19NLS, or non-specifically by a GFP control. Shown is a short
RNA northern blot of total RNA prepared from ES cells after transfection of P19V5,
P19NLS, GFP, or from untransfected cells (Neg), probed either with the sequence
complementary to RNA #3 (top panel; the arrow denotes the ~20 bp species enriched in
P19 immunoprecipitates) or a tRNA loading control (bottom panel). (C) RNA #3 is
present at low levels in various mouse tissues, assessed here by short RNA northern
blotting. The blot was probed as in (B). In addition to the signal seen in ES cells, there is
a low but detectable signal migrating at ~20 bp in RNA from D10-12 embryo, ovary,
testicle, thymus, and spleen.
Supplementary Figure 2. (A) Immunoprecipitation of either P19V5 or P19HA does not
generate a band of short RNAs in 293T cells comparable to that seen in ES cells. 293T cells were
transfected with either GFP, P19 tagged at the C-terminus with 3 HA epitopes (Dunoyer
et al. 2004), or P19V5, and processed as in Figures 1D and 4A. P19V5-associated RNA
from ES cells was used as a positive control for labeling. (B) Western blots of samples
from (A) confirming successful P19V5/HA immunoprecipitation in 293T cells. (C)
Expression of P19V5/NLS/HA does not inhibit siRNA-mediated knockdown of
luciferase in 293T cells. 293T cells were transfected sequentially, first with
P19V5/NLS/HA or βGalactosidaseV5 (as a negative control), then again with luciferase
plasmids, siRNA, and more P19V5/NLS/HA or βGal. Luciferase assays were performed
24 hours after the second transfection.
Table 1. Gross statistics of short RNAs cloned from indicated RNA starting material.

RNA source:                            WCE sup   WCE P19V5 IP   NE sup   NE P19NLS IP
# of sequences cloned                  303       373            380      261
known ncRNAs cloned                    250       296            325      198
novel short RNAs                       53        77             55       63
avg length of clone (in nucleotides)   22±2      20±2           23±2     21±2
avg %GC of clone                       53±17     75±17          49±15    67±19
avg %GC of cloned ncRNA                53±17     76±16          50±15    68±18
avg %GC of novel clone                 52±13     71±21          47±13    63±20
Table 2. Percentage of cloned short RNAs from Table 1 mapping to selected genomic features.
RNA source:
% of clones mapping to:
known ncRNAs
miRs
miR*s
rRNAs
tRNAs
snRNAs
ESTs
known repeats
no match
WCE sup
WCE P19V5 IP
NE sup
NE P19NLS IP
82.5
46.9
2.0
29.4
3.6
79.4
85.5
54.5
3.4
22.1
4.5
75.9
16.9
0.7
2.6
2.3
11.9
7.5
0.0
69.4
0.5
1.9
2.1
2.1
13.7
0.4
51.0
5.7
1.1
1.9
3.7
2.6
7.4
5.7
3.1
14.6
Figure 1. [Image not reproduced. Labels recovered from the figure: p19V5 and p19NLS constructs (CMV, CIRV-p19); stains Actin, DAPI, and merged for P19V5 and P19NLS; WCE, CE, and NE extracts from GFP-, P19V5-, and P19NLS-transfected cells; blots for GAPDH and Cyclin T1; 20 nt and 30 nt size markers; lanes 1-8.]
Figure 2. [Image not reproduced. (A) Y-axes: # of hits (combined P19 IPs) and # of hits (combined P19 sups); X-axis regions: 18S, 5.8S, 28S; individual short rRNAs are numbered. (B) Duplexes shown:
3-9 (dG = -19.9): 5'-UCCGGUGAGCUCUCGCUGGCCC-3' / 3'-AAGGGCCGGCAGGGCCUCGGC-5'
4-11 (dG = -16.9): 5'-CGCCGAGGGCGCACCACCGG-3' / 3'-UCGACUCCGCUAGGUGCCC-5'
9-9 (dG = -24.3): 5'-UCCGGUGAGCUCUCGCUGGCCC-3' / 3'-CCCGGUCGCUCUCGAGUGGCCU-5']
Figure 3. [Image not reproduced. (A) Strands shown (labels: 5' complementary, 3' complementary, siRNA):
5'-CCCGGCCGUCCCGGAGCCGGCUUGGCUUCGU-3'
5'-CUUGGCUUCGUUUCCCGGCCGUCCCGGAGCCGUU-3'
5'-CCCGGCCGUCCCGGAGCCGUU-3'
(B) Gel labels: input [RNA]; bound RNA; 20 nt and 30 nt size markers; 5' complementary, siRNA, 3' complementary, single-stranded #3, bead-only IP. (C) Plot: X-axis [RNA] nM, 0 to 250; Y-axis 0 to 3500; series siRNA, 5' complementary, 3' complementary.]
Figure 4. [Image not reproduced. (A) Lanes: P19 Dicer +/+, GFP Dicer +/+, P19 Dicer -/-; IP. (B) Lanes: P19 Dicer +/+, GFP Dicer +/+, P19 Dicer -/-, each with S and IP; probes: RNA#3, miR295 (with pre-miR295), U6 control; 20 bp and 100 bp size markers.]
Supplementary Figure 1. [Image not reproduced. (A) X-axis: 3' UTR complementary sites (no site, RNA #3, RNA #5, RNA #13, miR295); legend entries: Dicer +/+, Dicer -/-. (B) Lanes: Neg, GFP, P19V5, P19NLS; 20 bp, 30 bp, and 60 bp size markers; tRNA control. (C) 20 bp and 30 bp size markers; tRNA control.]
Supplementary Figure 2. [Image not reproduced. (A) Lanes: GFP 293T, P19HA 293T, P19V5 293T, P19V5 ES, each with S and IP; 20 bp and 100 bp size markers. (B) WB: 293T GFP, 293T P19HA, and 293T P19V5, each with pre-IP sup, post-IP sup, and αV5 or αHA IP; blots probed with αV5 and αHA. (C) Plot: X-axis [siRNA] pM, 0 to 1000; Y-axis 0.00 to 1.40; series include P19V5 and P19HA.]
References
Alvarez-Garcia, I. and Miska, E.A. 2005. MicroRNA functions in animal development
and human disease. Development 132(21): 4653-4662.
Ambros, V. 2004. The functions of animal microRNAs. Nature 431(7006): 350-355.
Ambros, V., Lee, R.C., Lavanway, A., Williams, P.T., and Jewell, D. 2003. MicroRNAs
and other tiny endogenous RNAs in C. elegans. Curr Biol 13(10): 807-818.
Bartel, D.P. 2004. MicroRNAs: genomics, biogenesis, mechanism, and function. Cell
116(2): 281-297.
Baulcombe, D. 2004. RNA silencing in plants. Nature 431(7006): 356-363.
Cam, H.P., Sugiyama, T., Chen, E.S., Chen, X., FitzGerald, P.C., and Grewal, S.I. 2005.
Comprehensive analysis of heterochromatin- and RNAi-mediated epigenetic
control of the fission yeast genome. Nat Genet 37(8): 809-819.
Cannone, J.J., Subramanian, S., Schnare, M.N., Collett, J.R., D'Souza, L.M., Du, Y.,
Feng, B., Lin, N., Madabusi, L.V., Muller, K.M. et al. 2002. The comparative
RNA web (CRW) site: an online database of comparative sequence and structure
information for ribosomal, intron, and other RNAs. BMC Bioinformatics 3: 2.
Chapman, E.J., Prokhnevsky, A.I., Gopinath, K., Dolja, V.V., and Carrington, J.C. 2004.
Viral RNA silencing suppressors inhibit the microRNA pathway at an
intermediate step. Genes Dev 18(10): 1179-1186.
Cherry, S.R., Biniszkiewicz, D., van Parijs, L., Baltimore, D., and Jaenisch, R. 2000.
Retroviral expression in embryonic stem cells and hematopoietic stem cells. Mol
Cell Biol 20(20): 7419-7426.
Doench, J.G., Petersen, C.P., and Sharp, P.A. 2003. siRNAs can function as miRNAs.
Genes Dev 17(4): 438-442.
Doench, J.G. and Sharp, P.A. 2004. Specificity of microRNA target selection in
translational repression. Genes Dev 18(5): 504-511.
Dunoyer, P., Lecellier, C.H., Parizotto, E.A., Himber, C., and Voinnet, O. 2004. Probing
the microRNA and small interfering RNA pathways with virus-encoded
suppressors of RNA silencing. Plant Cell 16(5): 1235-1250.
Farh, K.K., Grimson, A., Jan, C., Lewis, B.P., Johnston, W.K., Lim, L.P., Burge, C.B.,
and Bartel, D.P. 2005. The widespread impact of mammalian MicroRNAs on
mRNA repression and evolution. Science 310(5755): 1817-1821.
Griffiths-Jones, S., Bateman, A., Marshall, M., Khanna, A., and Eddy, S.R. 2003. Rfam:
an RNA family database. Nucleic Acids Res 31(1): 439-441.
Griffiths-Jones, S., Moxon, S., Marshall, M., Khanna, A., Eddy, S.R., and Bateman, A.
2005. Rfam: annotating non-coding RNAs in complete genomes. Nucleic Acids
Res 33(Database issue): D121-124.
Harfe, B.D., McManus, M.T., Mansfield, J.H., Hornstein, E., and Tabin, C.J. 2005. The
RNaseIII enzyme Dicer is required for morphogenesis but not patterning of the
vertebrate limb. Proc Natl Acad Sci USA 102(31): 10898-10903.
He, L., Thomson, J.M., Hemann, M.T., Hernando-Monge, E., Mu, D., Goodson, S.,
Powers, S., Cordon-Cardo, C., Lowe, S.W., Hannon, G.J. et al. 2005. A
microRNA polycistron as a potential human oncogene. Nature 435(7043): 828-833.
Houbaviy, H.B., Dennis, L., Jaenisch, R., and Sharp, P.A. 2005. Characterization of a
highly variable eutherian microRNA gene. Rna 11(8): 1245-1257.
Houbaviy, H.B., Murray, M.F., and Sharp, P.A. 2003. Embryonic stem cell-specific
MicroRNAs. Developmental Cell 5(2): 351-358.
Jaenisch, R. 1997. DNA methylation and imprinting: why bother? Trends Genet 13(8):
323-329.
Karolchik, D., Baertsch, R., Diekhans, M., Furey, T.S., Hinrichs, A., Lu, Y.T., Roskin,
K.M., Schwartz, M., Sugnet, C.W., Thomas, D.J. et al. 2003. The UCSC Genome
Browser Database. Nucleic Acids Res 31(1): 51-54.
Kent, W.J., Sugnet, C.W., Furey, T.S., Roskin, K.M., Pringle, T.H., Zahler, A.M., and
Haussler, D. 2002. The human genome browser at UCSC. Genome Res 12(6):
996-1006.
Lagos-Quintana, M., Rauhut, R., Lendeckel, W., and Tuschl, T. 2001. Identification of
novel genes coding for small expressed RNAs. Science 294(5543): 853-858.
Lakatos, L., Szittya, G., Silhavy, D., and Burgyan, J. 2004. Molecular mechanism of
RNA silencing suppression mediated by p19 protein of tombusviruses. Embo J
23(4): 876-884.
Lander, E.S., Linton, L.M., Birren, B., Nusbaum, C., Zody, M.C., Baldwin, J., Devon, K.,
Dewar, K., Doyle, M., FitzHugh, W. et al. 2001. Initial sequencing and analysis of
the human genome. Nature 409(6822): 860-921.
Lau, N.C., Lim, L.P., Weinstein, E.G., and Bartel, D.P. 2001. An abundant class of tiny
RNAs with probable regulatory roles in Caenorhabditis elegans. Science
294(5543): 858-862.
Lecellier, C.H., Dunoyer, P., Arar, K., Lehmann-Che, J., Eyquem, S., Himber, C., Saib,
A., and Voinnet, O. 2005. A cellular microRNA mediates antiviral defense in
human cells. Science 308(5721): 557-560.
Lee, R.C., Hammell, C.M., and Ambros, V. 2006. Interacting endogenous and exogenous
RNAi pathways in Caenorhabditis elegans. Rna 12(4): 589-597.
Li, E. 2002. Chromatin modification and epigenetic reprogramming in mammalian
development. Nat Rev Genet 3(9): 662-673.
Lippman, Z. and Martienssen, R. 2004. The role of RNA interference in heterochromatic
silencing. Nature 431(7006): 364-370.
Lu, J., Getz, G., Miska, E.A., Alvarez-Saavedra, E., Lamb, J., Peck, D., Sweet-Cordero,
A., Ebert, B.L., Mak, R.H., Ferrando, A.A. et al. 2005. MicroRNA expression
profiles classify human cancers. Nature 435(7043): 834-838.
Mayer, C., Schmitz, K.M., Li, J., Grummt, I., and Santoro, R. 2006. Intergenic
Transcripts Regulate the Epigenetic State of rRNA Genes. Mol Cell 22(3): 351-361.
Murchison, E.P., Partridge, J.F., Tam, O.H., Cheloufi, S., and Hannon, G.J. 2005.
Characterization of Dicer-deficient murine embryonic stem cells. Proc Natl Acad
Sci USA 102(34): 12135-12140.
Novina, C.D. and Sharp, P.A. 2004. The RNAi revolution. Nature 430(6996): 161-164.
Nykanen, A., Haley, B., and Zamore, P.D. 2001. ATP requirements and small interfering
RNA structure in the RNA interference pathway. Cell 107(3): 309-321.
Petersen, C.P., Bordeleau, M.E., Pelletier, J., and Sharp, P.A. 2006. Short RNAs repress
translation after initiation in mammalian cells. Mol Cell 21(4): 533-542.
Plath, K., Mlynarczyk-Evans, S., Nusinow, D.A., and Panning, B. 2002. Xist RNA and
the mechanism of X chromosome inactivation. Annu Rev Genet 36: 233-278.
Scholthof, H.B. 2006. The Tombusvirus-encoded P19: from irrelevance to elegance. Nat
Rev Microbiol.
Silhavy, D., Molnar, A., Lucioli, A., Szittya, G., Hornyik, C., Tavazza, M., and Burgyan,
J. 2002. A viral protein suppresses RNA silencing and binds silencing-generated,
21- to 25-nucleotide double-stranded RNAs. Embo J 21(12): 3070-3080.
Sontheimer, E.J. and Carthew, R.W. 2005. Silence from within: endogenous siRNAs and
miRNAs. Cell 122(1): 9-12.
Stark, A., Brennecke, J., Bushati, N., Russell, R.B., and Cohen, S.M. 2005. Animal
MicroRNAs confer robustness to gene expression and have a significant impact
on 3'UTR evolution. Cell 123(6): 1133-1146.
Stewart, C.L., Stuhlmann, H., Jahner, D., and Jaenisch, R. 1982. De novo methylation,
expression, and infectivity of retroviral genomes introduced into embryonal
carcinoma cells. Proc Natl Acad Sci USA 79(13): 4098-4102.
Vagin, V.V., Sigova, A., Li, C., Seitz, H., Gvozdev, V., and Zamore, P.D. 2006. A
distinct small RNA pathway silences selfish genetic elements in the germline.
Science 313(5785): 320-324.
Vargason, J.M., Szittya, G., Burgyan, J., and Tanaka Hall, T.M. 2003. Size selective
recognition of siRNA by an RNA silencing suppressor. Cell 115(7): 799-811.
Voorhoeve, P.M., le Sage, C., Schrier, M., Gillis, A.J., Stoop, H., Nagel, R., Liu, Y.P.,
van Duijse, J., Drost, J., Griekspoor, A. et al. 2006. A genetic screen implicates
miRNA-372 and miRNA-373 as oncogenes in testicular germ cell tumors. Cell
124(6): 1169-1181.
Watanabe, T., Takeda, A., Tsukiyama, T., Mise, K., Okuno, T., Sasaki, H., Minami, N.,
and Imai, H. 2006. Identification and characterization of two novel classes of
small RNAs in the mouse germline: retrotransposon-derived siRNAs in oocytes
and germline small RNAs in testes. Genes Dev 20(13): 1732-1743.
Waterston, R.H., Lindblad-Toh, K., Birney, E., Rogers, J., Abril, J.F., Agarwal, P.,
Agarwala, R., Ainscough, R., Alexandersson, M., An, P. et al. 2002. Initial
sequencing and comparative analysis of the mouse genome. Nature 420(6915):
520-562.
Xie, Z., Johansen, L.K., Gustafson, A.M., Kasschau, K.D., Lellis, A.D., Zilberman, D.,
Jacobsen, S.E., and Carrington, J.C. 2004. Genetic and functional diversification
of small RNA pathways in plants. PLoS Biol 2(5): E104.
Ye, K., Malinina, L., and Patel, D.J. 2003. Recognition of small interfering RNA by a
viral suppressor of RNA silencing. Nature 426(6968): 874-878.
Zuker, M. 2003. Mfold web server for nucleic acid folding and hybridization prediction.
Nucleic Acids Res 31(13): 3406-3415.
Chapter 3
RNA sequence analysis defines Dicer's role in mouse
embryonic stem cells
This chapter appears in the context of its contemporary science, and is published in PNAS
104:18097-18102. The described experiments were an equal collaboration with Amy C.
Seila, and also performed with Gene W. Yeo.
All Supporting Figures referenced in the original manuscript are included in Chapter 3.
The Supporting Text and associated Supporting Figures to the original manuscript are
included as an Appendix to Chapter 3. All Supporting Tables and Alignment Files are
published online on the Proceedings of the National Academy of Sciences' website.
Abstract
Short RNA expression was analyzed from Dicer-positive and Dicer-knockout
mouse embryonic stem (ES) cells using high-throughput pyrosequencing. A correlation
of miRNA quantification with sequencing frequency estimates that there are 110,000
miRNAs per ES cell, the majority of which can be accounted for by six distinct miRNA
loci. Four of these miRNA loci or their human homologues have demonstrated roles in
cell cycle regulation or oncogenesis, suggesting that a major function of the miRNA
pathway in ES cells may be to shape their distinct cell cycle. 46 novel miRNAs were
identified, most of which are expressed at low levels and are less conserved than the set
of known miRNAs. Low abundance short RNAs matching all classes of repetitive
elements were present in cells lacking Dicer, although the production of some SINE- and
simple repeat-associated short RNAs appeared Dicer-dependent. These and other Dicer-dependent novel sequences resembled miRNAs. At a depth of sequencing that
approaches the total number of 5' phosphorylated short RNAs per cell, miRNAs
appeared to be Dicer's only substrate. The results presented suggest a model in which
repeat-associated miRNAs serve as host defenses against repetitive elements, a function
canonically ascribed to other classes of short RNA.
Introduction
RNA interference (RNAi) is a conserved set of gene regulatory mechanisms in
which short RNA molecules guide protein complexes to suppress expression of
complementary nucleic acid targets. Different classes of short RNAs, complexed with
specific Argonaute protein family members, either induce the degradation, prevent the
translation, or prevent the transcription of their target RNA species (Tolia and Joshua-Tor
2007).
In mammals, Argonaute proteins are thought to associate predominantly with a
class of non-coding RNA genes termed microRNAs (miRNAs). miRNAs are essential
regulators of diverse biological processes, including cell division, apoptosis, and
metabolism (Kloosterman and Plasterk 2006). miRNA precursors are processed
sequentially by the enzymes Drosha and Dicer to yield mature ~22 nucleotide (nt) long
single-stranded miRNAs (Bartel 2004). miRNAs are thought to primarily influence gene
expression by preventing productive translation of target mRNAs, though recent studies
suggest that they may have other mechanisms of action (Hwang et al. 2007; Nilsen 2007).
Other classes of short RNAs mediate different types of RNAi-based silencing. In
Arabidopsis and S. pombe, Argonaute-associated short-interfering RNAs (siRNAs)
cleave repetitive transcripts and nucleate heterochromatin at genomic repeats (Lippman
and Martienssen 2004). These siRNAs require Dicer and an RNA-dependent RNA
polymerase (RdRP) for biogenesis. Potentially analogous siRNA species were identified
in mouse oocytes, although it is not clear if these oocyte siRNAs nucleate
heterochromatin (Watanabe et al. 2006). In animal germ cells, the Argonaute subfamily
of Piwi proteins associate with Dicer-independent short RNAs, termed piRNAs. Like
Arabidopsis and S. pombe siRNAs, piRNAs are thought to silence repetitive sequences at
the level of transcription (O'Donnell and Boeke 2007). Finally, in C. elegans, endogenous
siRNAs exist that are thought to silence protein-coding genes at the post-transcriptional
level (Ambros et al. 2003b). These siRNAs also require Dicer and RdRPs for biogenesis,
and likely have 5' di- or tri- instead of 5' mono-phosphates (Lee et al. 2006; Ruby et al.
2006).
Embryonic stem (ES) cells are derived from the inner cell mass of the blastocyst
during the stage of development where epigenetic patterns of gene regulation are re-established in preparation for implantation (Reik et al. 2001). ES cells can be propagated
in vitro without the loss of pluripotency and induced to differentiate into specialized cell
types when given appropriate cues, making them potential sources of tissue in
regenerative therapies (Mayhall et al. 2004). Many cancers also have stem cell-like
characteristics, underscoring the clinical relevance of ES cell biology (Jones and Baylin
2007).
Despite recent fundamental advances in the understanding of global ES cell
chromatin architecture, much remains to be learned about the mechanisms by which ES
cells maintain the pluripotent state (Spivakov and Fisher 2007). Specifically, ES cells
lacking Dicer are viable, but are incapable of differentiation and display severe growth
defects, indicating that the RNAi pathway is required for pluripotency and aspects of ES
cell division (Kanellopoulou et al. 2005; Murchison et al. 2005). Presumably, these
defects are due to loss of miRNA biogenesis and not other types of short RNAs, as
previous sequencing of short cDNA libraries revealed miRNAs to be the predominant
class of short RNA in mouse ES cells (Houbaviy et al. 2003; Calabrese and Sharp 2006).
However, Dicer is critical to the biogenesis of almost all classes of short RNAs described,
with the potential exception of piRNAs; thus, it is possible that other previously
unidentified RNAs contribute to the Dicer null ES cell mutant phenotype.
To further our understanding of Dicer function and the mechanisms by which
short RNAs mediate gene regulation in ES cells, short RNA expression was profiled in
four independently derived ES cell cDNA libraries, including a library made from Dicer
null ES cells. From quantification of miRNA levels, we estimate that there are 130,000 5'
phosphorylated short RNAs per ES cell. 15% of these RNAs are generated independently
of Dicer, and consist of: short non-coding RNA (ncRNA) fragments, promoter proximal
RNAs (described elsewhere, in preparation), presumed breakdown products of mRNAs,
and low-abundance, highly repetitive sequences. The remaining 85% of 5'
phosphorylated ES cell short RNAs consist of miRNAs or miRNA-like species that
depend on Dicer for biogenesis. The majority of ES cell miRNAs appear to be generated
by six distinct loci, four of which have been implicated in cell cycle control or
oncogenesis. Notably, poorly conserved ES cell miRNA hairpins tend to overlap
annotated repetitive elements, potentially connecting the miRNA pathway to host defense
against accumulated repeats.
Results
Global statistics of short cDNA libraries.
Four separate short cDNA libraries made from mouse ES cells were sequenced
using high throughput pyrosequencing (Margulies et al. 2005). To determine whether
classes of short RNAs other than miRNAs depend on Dicer for biogenesis, short cDNA
libraries were made from a floxed Dicer ES cell line before and several months after
deletion of the floxed region containing the key catalytic residues of Dicer's second
RNase III domain (referred to as libraries "Dicer+/+" and "Dicer-/-", respectively). This
Dicer deletion cell line has been used in previous studies (Calabrese and Sharp 2006;
Leung et al. 2006) and largely recapitulates the phenotypic defects observed from earlier
studies of Dicer loss in mouse ES cells (Supporting Information (SI) Figure 4)
(Kanellopoulou et al. 2005; Murchison et al. 2005). Additionally, to determine if changes
in DNA methylation correlate with expression of novel classes of mammalian short
RNAs, libraries were sequenced from J1 ES cells before and five days after treatment
with the DNA methyltransferase inhibitor 5-aza-deoxycytidine (referred to as libraries
"J1" and "J1aza", respectively; SI Figure 5). The rationale for this experiment was based on
observations made in Arabidopsis, where production of short RNAs by the RNAi pathway stimulates DNA methylation at certain classes of repetitive elements (Lippman
and Martienssen 2004). Subsequent sequencing and analysis indicated few significant
differences between the J1 and J1aza cDNA libraries (data not shown), and for the
purpose of this study they were treated primarily as expression replicates for Dicer-containing ES cell libraries. Because of strain and sex chromosome differences between
J1 and Dicer+/+ ES cells, reads have only been compared between the Dicer+/+ and
Dicer-/- libraries when considering the consequences of Dicer loss.
In total, the four libraries contained 418,093 reads representing 79,265 distinct
sequences (Table 1). We focused our analysis on the 298,039 reads representing 29,016
distinct sequences that matched the mouse genome with 100% identity over their entire
length. On average, 82% of all reads from the Dicer-positive libraries matched annotated
miRNA hairpins, while 11% of reads matched other known ncRNAs (rRNAs, tRNAs,
snRNAs, etc.), and 7% of reads were previously uncharacterized short RNAs (referred to
as "novel" sequences; Table 1). As expected, the Dicer-/- library was nearly devoid of
miRNAs, and instead composed of other known ncRNAs (69%) and novel sequences
(31%; Table 1).
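The read accounting in this paragraph can be checked with simple arithmetic. The following minimal sketch uses only the totals quoted above; the variable names are illustrative and are not taken from the original analysis.

```python
# Totals quoted in the text for the four libraries combined.
total_reads = 418_093
total_distinct = 79_265
perfect_reads = 298_039      # reads matching the mouse genome at 100% identity
perfect_distinct = 29_016    # distinct sequences among those reads

# Share of reads and of distinct sequences retained for analysis.
frac_reads = perfect_reads / total_reads
frac_distinct = perfect_distinct / total_distinct

# Average sequencing depth per retained distinct sequence.
reads_per_sequence = perfect_reads / perfect_distinct

print(f"perfect-match reads: {frac_reads:.1%}")
print(f"perfect-match distinct sequences: {frac_distinct:.1%}")
print(f"mean reads per distinct perfect-match sequence: {reads_per_sequence:.1f}")
```

By this calculation, about 71% of all reads but only about 37% of distinct sequences were retained, with each retained sequence observed roughly 10 times on average.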
Expression and analysis of known miRNAs.
To validate that the cDNA libraries accurately recapitulated short RNA expression
in ES cells, the absolute numbers of seven known miRNAs were determined in J1 and
Dicer+/+ ES cells using the Direct miRNA assay (Table 2) (Neely et al. 2006). The
Pearson correlation coefficients between the miRNA quantification and sequencing
frequencies in the J1 and Dicer+/+ libraries were 0.62 and 0.95, respectively. Correlating
miRNA quantification to sequencing frequency, we conclude that a single ES cell
contains approximately 110,000 miRNAs from a total pool of 130,000 5' phosphorylated
short RNAs. The calculated number of miRNAs per femtogram of total ES cell RNA is
5.4±1.
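The calibration described above, in which measured copy numbers for a few miRNAs are correlated with their read counts and then used to scale a whole library, can be sketched as follows. This is an illustration under stated assumptions, not the procedure used here: all numbers are hypothetical placeholders rather than the measurements in Table 2, and the choice of a least-squares line through the origin as the scaling factor is an assumption.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def molecules_per_read(copies, reads):
    """Slope of a least-squares line through the origin (assumed calibration)."""
    return sum(c * r for c, r in zip(copies, reads)) / sum(r * r for r in reads)

# HYPOTHETICAL values for seven miRNAs; not the data from Table 2.
measured_copies = [5000, 3200, 2500, 1200, 800, 300, 100]  # molecules per cell
library_reads = [4100, 2900, 1800, 1100, 500, 350, 60]     # reads in one library

r = pearson(measured_copies, library_reads)
scale = molecules_per_read(measured_copies, library_reads)

mirna_reads_in_library = 90_000  # HYPOTHETICAL total of miRNA-matching reads
print(f"Pearson r = {r:.2f}")
print(f"estimated miRNAs per cell = {scale * mirna_reads_in_library:,.0f}")
```

As a consistency check on the figures quoted in the text, 110,000 miRNAs out of 130,000 5' phosphorylated short RNAs corresponds to about 85% of the pool.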
The number of reads obtained for each library approaches the total number of 5'
phosphorylated short RNAs per ES cell; thus, each cDNA library can be considered an
accurate sampling of the spectrum of 5' phosphorylated short RNAs in a single ES cell.
With this in mind, the Dicer+/+ and J1 libraries were used to determine the most
abundantly expressed ES cell miRNAs. Averaging values from the Dicer+/+ and J1
libraries estimates that 27 ES cell miRNAs are expressed above 1,000 molecules per cell,
with the most abundant present at about 5,000 molecules per cell (SI Table 4). When
considering the 126 miRNAs that are expressed at 50 or more molecules per cell, the
average and median miRNA expression per cell is 713 and 231 molecules, respectively
(SI Table 4).
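Because each library samples roughly one cell's worth of short RNAs, a read count can be converted to an approximate copy number by scaling it to the estimated pool size. The sketch below illustrates this conversion and the mean and median over expressed miRNAs; the miRNA names, read counts, and library size are hypothetical, and only the figure of 130,000 short RNAs per cell and the 50-molecule cutoff come from the text.

```python
from statistics import mean, median

SHORT_RNAS_PER_CELL = 130_000  # estimate quoted in the text
MIN_COPIES = 50                # expression cutoff quoted in the text

def copies_per_cell(read_counts, library_size):
    """Scale each read count by its share of the library."""
    return {name: SHORT_RNAS_PER_CELL * n / library_size
            for name, n in read_counts.items()}

# HYPOTHETICAL read counts and library size, for illustration only.
reads = {"miR-a": 4000, "miR-b": 900, "miR-c": 250, "miR-d": 120, "miR-e": 30}
library_size = 100_000

copies = copies_per_cell(reads, library_size)
expressed = [c for c in copies.values() if c >= MIN_COPIES]

print(f"expressed miRNAs: {len(expressed)}")
print(f"mean copies per cell: {mean(expressed):.0f}")
print(f"median copies per cell: {median(expressed):.0f}")
```

A mean well above the median, as reported here (713 versus 231 molecules), is what one would expect when a small number of miRNAs account for most of the molecules.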
The majority of miRNAs in both ES cell lines could be accounted for by six
genomic loci, representing 76% and 69% of Dicer+/+ and J1 miRNAs, respectively (Table
3; SI Table 4, 5). These include: the miR15b/miR16 cluster, the miR17-92 cluster,
miR21, the miR290-295 cluster, a repetitive miRNA cluster on chromosome 2 (SI Table
7), and an imprinted miRNA cluster on chromosome 12 (SI Table 7) (Seitz et al. 2004).
Certain of these miRNAs, specifically miR16 and several in the miR17-92 cluster, have
multiple genomic locations that may contribute to expression. There were significant
differences in expression of two of these miRNA clusters between J1 and Dicer+/+ ES
cells, possibly due to differences in strain or sex. J1 ES cells appear to express the
chromosome 12 cluster in higher abundance than Dicer+/+ ES cells, while Dicer+/+ ES
cells appear to express the chromosome 2 cluster in higher abundance than J1 ES cells.
The other four miRNA loci appeared quite similar in expression between the two cell
types.
Validation of known miRNAs.
Comparison of the Dicer+/+ and Dicer-/- libraries allowed for the genetic validation
of miRNAs expressed in ES cells, as true miRNAs should be absent in the Dicer-/-
library. Six annotated miRNA hairpins expressed in the Dicer-/- library had exact matches
to ribosomal or small-nuclear ncRNAs and are thus probably incorrectly designated as
miRNAs (denoted as "ncRNA" in SI Table 5). There were 2.5 times as many reads
matching these six miRNA hairpins in the Dicer-/- library as in the Dicer+/+ library,
consistent with their being generated from Dicer-independent processing of abundant
ncRNA transcripts and not miRNA hairpins. Excluding these six hairpins, the overall
ratio of Dicer+/+ to Dicer-/- reads was 213:1 for 240 miRNA hairpins present in the
Dicer+/+ and Dicer-/- libraries. This clear Dicer-dependence of miRNA expression
indicates that the previous annotation of mammalian miRNAs has been an accurate
process.
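The genetic validation above amounts to comparing, for each hairpin, the reads obtained with and without Dicer. A minimal sketch of such a comparison follows. The read counts, library totals, and cutoff are hypothetical, and normalizing by library size and adding a pseudocount are assumptions made for this illustration; the text reports only the overall ratios.

```python
def dependence_ratio(wt_reads, ko_reads, wt_total, ko_total, pseudocount=1):
    """Dicer+/+ : Dicer-/- read ratio for one hairpin, normalized by library size.

    The pseudocount avoids division by zero for hairpins with no Dicer-/- reads.
    """
    wt_rate = (wt_reads + pseudocount) / wt_total
    ko_rate = (ko_reads + pseudocount) / ko_total
    return wt_rate / ko_rate

# HYPOTHETICAL read counts per hairpin: (Dicer+/+ reads, Dicer-/- reads).
hairpins = {
    "hairpin_A": (2130, 9),   # behaves like a true miRNA
    "hairpin_B": (39, 99),    # behaves like an ncRNA fragment
}
WT_TOTAL = 100_000  # HYPOTHETICAL library sizes
KO_TOTAL = 100_000
MIN_RATIO = 10      # HYPOTHETICAL cutoff for calling a hairpin Dicer-dependent

calls = {}
for name, (wt, ko) in hairpins.items():
    ratio = dependence_ratio(wt, ko, WT_TOTAL, KO_TOTAL)
    calls[name] = "Dicer-dependent" if ratio >= MIN_RATIO else "Dicer-independent"
    print(f"{name}: ratio {ratio:.1f} -> {calls[name]}")
```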
Hypothesizing that a low level of Dicer-independent cleavage of pre-miRNA
hairpins generated the few miRNA-matching reads in the Dicer-/- library, we further
examined the sequence characteristics of the Dicer-/- miRNAs. Consistent with this
hypothesis, the lengths of the Dicer-/- miRNA reads were more broadly distributed as
compared to the lengths of the Dicer+/+ miRNA reads (SI Figure 6A). 58% of the Dicer-/- miRNA reads were 21-23 nt long, compared to 91% of the Dicer+/+ miRNA reads (p=7e-14). This difference was striking considering the similarity of the size distributions for all
other known ncRNAs between the Dicer+/+ and Dicer-/- libraries (SI Figure 6A).
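A difference between two proportions of this kind can be assessed with a standard two-proportion test. The sketch below implements a pooled two-sided z-test using only the standard library. The text does not state which test produced the reported p-value, so the choice of test is an assumption, and the read counts are hypothetical values chosen only to reproduce the 58% and 91% proportions.

```python
from math import erfc, sqrt

def two_proportion_z_test(k1, n1, k2, n2):
    """Pooled two-sided z-test for a difference between two proportions."""
    p1, p2 = k1 / n1, k2 / n2
    pooled = (k1 + k2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = erfc(abs(z) / sqrt(2))  # two-sided tail of the standard normal
    return z, p_value

# HYPOTHETICAL counts of miRNA reads that are 21-23 nt long.
ko_in_range, ko_total = 58, 100          # 58% in the Dicer-/- library
wt_in_range, wt_total = 9_100, 10_000    # 91% in the Dicer+/+ library

z, p = two_proportion_z_test(ko_in_range, ko_total, wt_in_range, wt_total)
print(f"z = {z:.2f}, two-sided p = {p:.2e}")
```

For small counts an exact test would be a more careful choice than this normal approximation.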
Next, we examined the extent of miRNA processing variability in each library,
defined here as the proportion of miRNA-matching reads that do not match the annotated
5' and 3' ends of mature miRNA sequences. Drosha defines the 5' ends of mature
miRNAs from the 5' arm of pre-miRNA hairpins and the 3' ends of mature miRNAs from
the 3' arm of pre-miRNA hairpins; Dicer defines the 5' ends of mature miRNAs from the 3'
arm of pre-miRNA hairpins and the 3' ends of mature miRNAs from the 5' arm of pre-miRNA hairpins (SI Figure 6B). If Dicer-/- miRNA reads were excised from pre-miRNA
hairpins by a Dicer-independent mechanism, more miRNA processing variability might
be expected in the Dicer-/- as compared to the Dicer+/+ library. Also, the ends of Dicer-/- miRNAs that would normally be defined by Dicer might show greater processing
variability compared to those defined by Drosha. Supporting these ideas, miRNA reads
exhibited more processing variability in the Dicer-/- as compared to the Dicer+/+ library
(SI Figure 6C, D), and, though Dicer-processed miRNA ends showed more variability
compared to Drosha-processed miRNA ends in all four libraries, this difference was
greatest in the Dicer-/- library (SI Figure 6C, D). While we cannot formally exclude the
possibility that some miRNAs in the Dicer-/- library could be due to cross-contamination
from Dicer-positive libraries, these clear differences in expression characteristics suggest
that many of the miRNAs in the Dicer-/- library were generated by inefficient Dicer-independent processing of pre-miRNA hairpins.
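Processing variability as defined above can be computed separately for each end of a mature miRNA and then attributed to Drosha or Dicer according to the hairpin arm. The following sketch shows one way to do this; the coordinates and read counts are hypothetical, and the data layout (start, end, count within the hairpin) is an assumption made for illustration.

```python
# Which enzyme defines each end of the mature miRNA, by hairpin arm (see text).
ENZYME_BY_ARM = {
    "5p": {"five_prime": "Drosha", "three_prime": "Dicer"},
    "3p": {"five_prime": "Dicer", "three_prime": "Drosha"},
}

def end_variability(reads, annotated_start, annotated_end):
    """Fractions of reads whose 5' end or 3' end differs from the annotation.

    reads: list of (start, end, count) tuples in hairpin coordinates.
    """
    total = sum(count for _, _, count in reads)
    off_5p = sum(count for start, _, count in reads if start != annotated_start)
    off_3p = sum(count for _, end, count in reads if end != annotated_end)
    return off_5p / total, off_3p / total

# HYPOTHETICAL reads for one miRNA from the 5' arm, annotated at positions 10-31.
reads = [(10, 31, 80), (10, 32, 12), (11, 31, 5), (10, 30, 3)]
var_5p, var_3p = end_variability(reads, 10, 31)

enzymes = ENZYME_BY_ARM["5p"]
print(f"{enzymes['five_prime']}-defined 5' end variability: {var_5p:.2f}")
print(f"{enzymes['three_prime']}-defined 3' end variability: {var_3p:.2f}")
```

In this toy example the Dicer-defined end is the more variable one, which is the pattern described for all four libraries.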
Annotation of novel miRNAs.
Using guidelines for miRNA annotation established by Ambros et al. (Ambros et
al. 2003a), and incorporating rules for Drosha processing of primary miRNA transcripts
(Zeng et al. 2005; Han et al. 2006), 46 novel miRNAs were identified in the Dicer-positive libraries (see SI; SI Table 6; SI Alignment File). These 46 novel miRNA hairpins
generate miRNAs with 42 distinct seeds, defined as bases 2-7 from the 5' end of the
miRNA (Lewis et al. 2005). 40 of these 42 seeds are novel. As a group, the novel
miRNAs are expressed at low levels in ES cells and less conserved than the set of known
miRNAs (Figure 1A). Despite their low expression levels, most of the novel miRNAs
were consistently present in each Dicer-containing ES cell library. 36 of the 46 novel
miRNAs were sequenced in at least two of the three libraries made from ES cells with
functional Dicer, with 21 of these being present in all three Dicer-containing libraries. 20
of the novel miRNAs mapped into large clusters of previously identified miRNAs on
chromosomes 2, 12, and X (SI Table 7). Out of the remaining 26 novel miRNA hairpins,
only 2 were located within 5kb of a known miRNA.
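The seed definition used above (bases 2-7 from the 5' end) translates directly into a string slice, and counting distinct and novel seeds is then a set operation. In the sketch below the sequences, names, and the set of known seeds are hypothetical placeholders, not entries from SI Table 6.

```python
def seed(mirna):
    """Return bases 2-7 (1-based) from the 5' end of a mature miRNA."""
    return mirna[1:7]

# HYPOTHETICAL mature miRNA sequences, written 5' to 3'.
candidates = {
    "novel-1": "UAGCUUAUCAGACUGAUGUUGA",
    "novel-2": "CAGCUUAGGCAUCGAUCGAUCG",
    "novel-3": "UUGGCAUCCGAUAGCUAGCUAA",
}
known_seeds = {"AGCUUA"}  # HYPOTHETICAL seeds of previously annotated miRNAs

distinct_seeds = {seed(s) for s in candidates.values()}
novel_seeds = distinct_seeds - known_seeds

print(f"{len(candidates)} miRNAs, {len(distinct_seeds)} distinct seeds, "
      f"{len(novel_seeds)} novel")
```

Two miRNAs that differ outside positions 2-7 can share a seed, which is why the number of distinct seeds (42) is smaller than the number of novel hairpins (46).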
Consistent with the novel miRNAs being less conserved than the set of known
miRNAs, 24 of the 46 novel miRNA hairpins overlapped at least partially with annotated
repetitive elements. By comparison, only 31 known miRNA hairpins overlap repeats in
the set of 360 mouse miRNAs that map to the mm7 build of the mouse genome (Figure
1B). As expected, the proportion of miRNA hairpins overlapping repeats decreases as
miRNA conservation increases (Figure 1A).
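Expressed as proportions, the counts above are 24 of 46 novel hairpins versus 31 of 360 known hairpins. The sketch below computes these proportions and shows a simple interval test for deciding whether a hairpin overlaps an annotated repeat. The coordinates are hypothetical, and treating annotations as closed intervals on a named chromosome is an assumption of this illustration.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if two closed intervals share at least one base."""
    return a_start <= b_end and b_start <= a_end

def fraction_overlapping(hairpins, repeats):
    """Fraction of hairpins overlapping any repeat on the same chromosome."""
    hits = sum(
        any(h_chr == r_chr and overlaps(h_s, h_e, r_s, r_e)
            for r_chr, r_s, r_e in repeats)
        for h_chr, h_s, h_e in hairpins
    )
    return hits / len(hairpins)

# HYPOTHETICAL coordinates: (chromosome, start, end).
hairpins = [("chr2", 100, 180), ("chr12", 500, 570), ("chrX", 900, 980)]
repeats = [("chr2", 150, 400), ("chr12", 600, 700), ("chr2", 900, 950)]
print(f"toy example: {fraction_overlapping(hairpins, repeats):.2f}")

# Proportions from the counts quoted in the text.
novel_fraction = 24 / 46
known_fraction = 31 / 360
print(f"novel hairpins overlapping repeats: {novel_fraction:.0%}")
print(f"known hairpins overlapping repeats: {known_fraction:.1%}")
```

By this arithmetic, roughly half of the novel hairpins overlap repeats, compared with under a tenth of the known hairpins.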
Analysis of repeat-overlapping novel reads.
A small number of short RNAs overlapping highly repetitive sequences existed in
each of the four libraries, defined as those sequences with at least 20 exact matches to the
genome (SI Table 8; see SI for further analysis). The 1211 unique sequences in this group
were represented by 1991 reads, and had 3,935,923 total hits to the genome covering
approximately 48 Mb of DNA. Based on correlations of miRNA quantification with
sequencing frequency (Table 2), as a class these repetitive RNAs are present at
approximately 225-750 copies per ES cell. There were no strong biases in the first
nucleotide or length of these highly repetitive short RNAs, although there were slightly
more sequences beginning with 'U' as compared to the set of novel sequences with less
than 20 matches to the genome (Figure 2A). Examining the length distribution of
repetitive sequences, we observed a peak above background at 22 nt (Figure 2B). This
peak is due solely to a Dicer-independent short RNA that is antisense to the primer-binding
site of the early transposon (ETn) repeat, an endogenous retrovirus abundantly
expressed in the early mouse embryo and ES cells (Maksakova and Mager 2005).
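The 20-match threshold used to define highly repetitive sequences amounts to a simple partition of the unique sequences. The sketch below assumes a hypothetical mapping from each unique sequence to its number of exact genome matches; the names and counts are invented for illustration and are not data from the libraries.

```python
REPETITIVE_MIN_HITS = 20  # threshold stated in the text: at least 20 exact genome matches


def partition_by_genome_hits(hit_counts, min_hits=REPETITIVE_MIN_HITS):
    """Split sequences into (highly_repetitive, non_repetitive) lists by genome hit count."""
    repetitive = [seq for seq, hits in hit_counts.items() if hits >= min_hits]
    non_repetitive = [seq for seq, hits in hit_counts.items() if hits < min_hits]
    return repetitive, non_repetitive


# Hypothetical hit counts (assumed values).
hit_counts = {"seq_a": 1, "seq_b": 19, "seq_c": 20, "seq_d": 1330}
repetitive, non_repetitive = partition_by_genome_hits(hit_counts)
print(repetitive, non_repetitive)
```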
The proportions of repetitive sequences overlapping SINE and simple repeats
were significantly lower in the Dicer-/- library as compared to the Dicer+/+ library (Figure 2C).
This suggests either that certain SINE- and simple repeat-associated RNAs are processed
by Dicer from precursor dsRNA structures, or that a transcriptional difference between
Dicer+/+ and Dicer-/- cells results in differential expression of these short RNAs. Northern
blots showed no significant difference in full-length SINE B1 RNA levels between
Dicer+/+ and Dicer-/- ES cells (SI Figure 7), arguing against the latter hypothesis.
In contrast, short RNAs overlapping centromeric satellite repeats, LINEs, and
LTR elements were clearly not dependent on Dicer for biogenesis (Figure 2C). This was
surprising, as previous studies have suggested that Dicer-dependent siRNAs processed
from long dsRNA precursors are important for silencing of these elements
(Kanellopoulou et al. 2005; Watanabe et al. 2006; Yang and Kazazian 2006). The Dicer-/-
ES cells analyzed here maintain genomic DNA methylation at satellite repeats and LINEs
(SI Figure 4E), demonstrating that RNAi is not required for maintenance of global repeat
methylation and suggesting that loss of centromeric silencing in certain Dicer null ES
cell lines may be an indirect effect of Dicer loss (Kanellopoulou et al. 2005).
Few non-miRNA Dicer-dependent sequences are expressed in ES cells.
Because Dicer is involved in the production of short RNAs other than miRNAs in
several organisms, we next sought to determine what non-miRNA short RNAs might be
Dicer-dependent in ES cells. Sequences present at least 3 times in the Dicer+/+ library and
absent in the Dicer-/- library were flagged as potentially dependent on Dicer for
biogenesis (referred to as "Dicer-dependent" below) and subjected to further analysis.
There were 50 distinct sequences, represented by 233 reads in the Dicer+/+ and 139 reads
in the J1 and J1aza libraries, which matched these criteria and were not annotated above
as novel miRNAs. Consistent with their being Dicer products, the length distribution of
these sequences peaked more sharply at ~21 nt when compared to all other novel
sequences in the Dicer+/+ library (Figure 3A; p=4.0e-5). The Dicer-dependent short
RNAs are biased towards sequences that begin with 'A' as compared to the set of all
novel reads, though this bias is not as strong as the 'U' bias seen for known miRNAs
(Figure 3B). As expected from the analysis of highly repetitive reads, these sequences
were enriched in SINE and simple repeat elements as compared to the set of novel
sequences that did not meet the criteria for Dicer-dependence (SI Figure 8). Two groups
of Dicer-dependent sequences, composed of 48 and 87 reads, were related in sequence
(Figure 3C). Both of these sequence groups appeared to be repeat-derived, with Group 1
composed entirely of SINE B1-overlapping reads, and Group 2 displaying more
heterogeneity with respect to its repeat overlap (Figure 3C).
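The criteria for flagging a sequence as "Dicer-dependent" (at least 3 reads in the Dicer+/+ library, no reads in the Dicer-/- library, and not already annotated as a novel miRNA) can be sketched as a filter over per-library read counts. The function name, the dictionary layout, and the counts below are assumptions made for illustration, not the code or data used in this study.

```python
def flag_dicer_dependent(wt_reads, ko_reads, novel_mirnas=(), min_wt_reads=3):
    """Return sequences with >= min_wt_reads in Dicer+/+ and zero reads in Dicer-/-,
    excluding sequences already annotated as novel miRNAs."""
    excluded = set(novel_mirnas)
    return sorted(
        seq
        for seq, count in wt_reads.items()
        if count >= min_wt_reads
        and ko_reads.get(seq, 0) == 0
        and seq not in excluded
    )


# Hypothetical read counts per unique sequence (assumed values).
wt_reads = {"s1": 5, "s2": 3, "s3": 2, "s4": 7, "s5": 4}
ko_reads = {"s4": 1, "s6": 9}
flagged = flag_dicer_dependent(wt_reads, ko_reads, novel_mirnas={"s5"})
print(flagged)
```

Here `s3` falls below the read threshold, `s4` is present in the Dicer-/- counts, and `s5` is excluded as an annotated novel miRNA, so only `s1` and `s2` are flagged.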
The possibility that Dicer-dependent sequences represent endogenous siRNAs
processed by Dicer from long double-stranded RNA was examined. Endogenous siRNAs
processed from a single precursor would be expected to cluster near other short RNA
sequences. In contrast, Dicer-dependent novel sequences do not cluster with any greater
frequency than novel sequences not defined as Dicer-dependent. 22% of novel sequences
both defined and not defined as Dicer-dependent fell within 500 bases of at least one
other short RNA from the set of 25,040 non-repetitive sequences present in all four
libraries (10 out of 45 Dicer-dependent sequences, and 2,493 out of 11,493 other novel
sequences; non-repetitive sequences were defined as having <20 matches to the genome).
Moreover, of the 10 Dicer-dependent sequences that did cluster near other short RNA
loci, eight overlapped protein-coding genes in the sense orientation, again not consistent
with these sequences being canonical siRNAs involved in gene silencing processes.
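The clustering comparison above asks, for each sequence, whether at least one other short RNA lies within 500 bases. A minimal sketch of that test follows, under the simplifying assumptions that each sequence is represented by a single (chromosome, position) locus and that every query locus also appears once in the full locus list; the function name and the example loci are hypothetical. The final line recomputes the two proportions reported in the text.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


def fraction_clustered(query_loci, all_loci, window=500):
    """Fraction of query loci with at least one other locus within `window` bases.

    Loci are (chromosome, position) tuples; every query locus is assumed to
    appear exactly once in `all_loci`.
    """
    positions_by_chrom = defaultdict(list)
    for chrom, pos in all_loci:
        positions_by_chrom[chrom].append(pos)
    for positions in positions_by_chrom.values():
        positions.sort()

    clustered = 0
    for chrom, pos in query_loci:
        positions = positions_by_chrom.get(chrom, [])
        lo = bisect_left(positions, pos - window)
        hi = bisect_right(positions, pos + window)
        # Subtract one so that the query locus does not count as its own neighbour.
        if hi - lo - 1 > 0:
            clustered += 1
    return clustered / len(query_loci)


# Hypothetical loci (assumed for illustration).
all_loci = [("chr1", 100), ("chr1", 400), ("chr1", 5000), ("chr2", 100)]
print(fraction_clustered(all_loci, all_loci))

# Counts reported in the text: both classes cluster at about 22%.
print(round(100 * 10 / 45), round(100 * 2493 / 11493))
```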
Instead of representing a class of endogenous siRNAs, it seems likely that many
of these Dicer-dependent sequences are miRNA-like reads whose surrounding genomic
sequences did not form prototypical miRNA hairpins. The two groups of related
Dicer-dependent sequences support this hypothesis (Figure 3C). The five SINE B1-associated
sequences from Group 1 aligned to hairpins which were miRNA-like, but did
not meet the minimum requirements for miRNA hairpin base-pairing used in this study
(SI Alignment File). The Group 2 sequences are related to known miRNAs on
Chromosome 2 (SI Table 7), and two sequences from this group also aligned to
miRNA-like hairpins with poorly defined secondary structure (SI Alignment File). Again, these
observations are consistent with Group 2 sequences being miRNA-like and not
siRNA-like in origin.
Sequences present fewer than three times in the Dicer+/+ library were not evaluated
for Dicer-dependence, because the transcriptional programs of Dicer+/+ and Dicer-/- ES
cells are likely quite different and minor differences in short RNA expression between the
two cell types would be expected. There remained 1,096 novel sequences each present
and represented by fewer than three reads in the Dicer+/+ library, which were absent in the
Dicer-/- library and potentially dependent on Dicer for biogenesis. While some are
expected to be Dicer products, as a class they clearly differed from the Dicer-dependent
sequences described above; most notably, these sequences exhibited a broad length
distribution uncharacteristic of Dicer products (SI Figure 9). Thus, if non-miRNA
Dicer-dependent short RNAs are expressed in ES cells, they are beyond the limits of detection
in the cDNA libraries analyzed here.
Discussion
Of the estimated 130,000 5' phosphorylated short RNAs in an ES cell, roughly
85% are Dicer-dependent miRNAs or miRNA-like species and 15% are
Dicer-independent short RNAs. These Dicer-independent RNAs consist primarily of short
ncRNA species, promoter proximal RNAs that are likely the products of paused RNA
polymerase II (described elsewhere, in preparation), presumed breakdown products of
mRNAs, and highly repetitive short RNA sequences.
At a depth of sequencing approaching the total number of 5' phosphorylated short
RNAs per ES cell, the miRNA was the only class of short RNA found to be
Dicer-dependent. Other classes of Dicer-dependent short RNAs found in many non-mammalian
organisms do not appear to be expressed in ES cells. Specifically not observed were the
Dicer-dependent heterochromatic siRNAs, analogous to those seen in Arabidopsis and S.
pombe, that have been proposed to guide the silencing of ES cell centromeric repeats
(Kanellopoulou et al. 2005). While short RNAs corresponding to highly repetitive
sequences were detected at low levels in the ES cells analyzed here, their biogenesis was
Dicer-independent. Moreover, the potential mammalian counterparts to these siRNAs,
piRNAs, were also not detected in the analyzed libraries, nor were C. elegans-like
siRNAs that are anti-sense to mRNAs (see SI). Direct comparison of the Dicer+/+ and
Dicer-/- libraries did detect a small number of sequences, representing 0.5% of all Dicer+/+
reads, which appeared Dicer-dependent and were not annotated as miRNAs; however,
many of these sequences appeared miRNA-like. In summary, the presented data strongly
favor the hypothesis that Dicer's sole catalytic role in ES cells is to produce miRNAs,
and that the phenotypic consequences of ES cell Dicer deletion are due solely to miRNA
loss (Kanellopoulou et al. 2005; Murchison et al. 2005).
In total, 323 distinct known and novel miRNA sequences were observed in the J1
and Dicer+/+ libraries. The most abundant of these have implied functions consistent with
the severe growth defects of Dicer null ES cells; miR21, the miR17-92 cluster, the
miR15b/16 cluster, and the miR290-295 cluster, or their human homologues, have
demonstrated roles in cell-cycle regulation or oncogenesis (He et al. 2005; Si et al. 2006;
Voorhoeve et al. 2006; Linsley et al. 2007). Almost half of the 110,000 ES cell miRNAs
can be accounted for by these four loci, suggesting that a major function of the miRNA
pathway in ES cells is to contribute to the control of cell division.
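The statement that almost half of ES cell miRNAs derive from these four loci can be checked against the cluster percentages in Table 3. The short sketch below sums the Table 3 entries for miR21, miR17-92, miR15b/16, and miR290-295; it assumes the Table 3 values are paired with clusters in the order in which they are listed, and the variable names are illustrative.

```python
# Percentages of miRNA reads per cluster, as listed in Table 3: (J1, Dicer+/+).
# The pairing of values with clusters assumes the order shown in the table.
table3 = {
    "290-295": (23, 29),
    "17-92": (17, 11),
    "chr2": (6, 27),
    "chr12": (14, 4),
    "21": (6, 2),
    "15b/16": (4, 3),
}

four_loci = ["21", "17-92", "15b/16", "290-295"]
j1_total = sum(table3[cluster][0] for cluster in four_loci)
wt_total = sum(table3[cluster][1] for cluster in four_loci)
print(j1_total, wt_total)
```

Under that reading, the four loci account for 50% of J1 miRNA reads and 45% of Dicer+/+ miRNA reads.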
Close to two-thirds of the 323 ES cell miRNAs are expressed at less than 50
copies per cell. A subset of these lowly expressed miRNAs may play important roles in
defining the ES cell state; however, many may have more critical roles in cell types other
than ES cells, especially those that are the most conserved. Considering the latter
possibility, their apparent ES cell expression could be due to the existence of a small
number of differentiated cells within a larger population of undifferentiated ES cells.
Alternatively, the diverse set of lowly expressed miRNAs might reflect the heterogeneity
of regulatory systems inherent within a pluripotent ES cell population.
Many of the least conserved ES cell miRNA hairpins overlap annotated repetitive
elements, suggesting that particular miRNAs may partially function to silence
complementary repeat-containing RNAs (Smalheiser and Torvik 2005; Piriyapongsa et
al. 2007). This repression could occur through a canonical miRNA-based targeting
mechanism, resulting in the translational inhibition and targeting to cellular processing
bodies of repeat-containing RNAs with seed complements to repeat-derived miRNAs.
Alternatively, the most repetitive miRNA sequences have the potential to direct cleavage
of transcripts with perfect or near perfect complementarity. Finally, in certain cases, it is
possible that recognition of the miRNA hairpin itself may be the initiating signal for a
silencing event in cis.
In mouse oocytes, repetitive sequences appear to be under Dicer-dependent
repression. Certain repeat-containing mRNAs were found to be expressed at higher levels
in Dicer-/- compared to Dicer+/+ oocytes (Murchison et al. 2007). Further, expression of
EGFP reporters with retrotransposon-derived 3'UTRs was repressed in mouse oocytes
(Watanabe et al. 2006). These repressive effects were conjectured to be due to
endogenous siRNA species arising from genomic repeats (Watanabe et al. 2006;
Murchison et al. 2007). Similarly, LINE retrotransposition has been proposed to be
repressed by Dicer-dependent siRNA species in human cells (Yang and Kazazian 2006).
The apparent absence of analogous siRNA species in mouse ES cells, coupled with the
observed relationship between miRNAs and repetitive elements, suggests that in certain
contexts the miRNA pathway may perform functions canonically thought of as
siRNA-specific. This hypothesis argues for the re-evaluation of repressive effects associated with
mammalian repetitive elements, and potentially has important implications during early
mouse development, where repetitive element expression is dynamic (Peaston et al.
2004).
Methods
ES cell culture and manipulation.
Generation of Dicer+/+ and Dicer-/- ES cells, and of J1aza RNA, is described in the SI.
miRNA quantification was performed essentially as described (Neilson et al. 2007).
Briefly, trypsinized ES cells were counted and lysed directly in Trizol. 1.5 or 3 picomoles
of single-stranded siRNA was spiked into Trizol solutions and quantified to normalize for
short RNA recovery. From 15 preparations, the average total RNA per ES cell was 20 pg
and the average short RNA recovery was 76%. miRNA levels were quantified using the
Direct assay (Neely et al. 2006). miRNA molarity per sample was determined by
comparison to standard curves of synthetic miRNAs and normalized for short RNA
recovery. miRNA per cell values were obtained by dividing miRNA copy number per
sample by the number of ES cell equivalents of RNA measured per assay. The number of
5' phosphorylated short RNAs per ES cell, 130,000, was obtained by dividing the miRNA
copy number per cell by the sequencing frequency of each quantified miRNA (SI Table
4) and taking the average for 7 miRNAs quantified in J1 and Dicer+/+ ES cells. Mature
miRNAs sequenced per library included those truncated on their 3' end by one
nucleotide, and those extending beyond the annotated 3' end.
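The per-cell estimate described above can be summarized in a short sketch: each quantified miRNA gives an estimate of the total short RNA pool as its copy number per cell divided by its sequencing frequency, and the estimates are averaged. The function names and input values below are hypothetical (they are not the measurements in Table 2 or SI Table 4), and the recovery correction assumes that "normalized for short RNA recovery" means dividing by the recovered fraction.

```python
def copies_per_cell(measured_copies, recovery, cell_equivalents):
    """Per-cell copy number, corrected for fractional short RNA recovery.

    Assumption: normalization divides by the recovered fraction (e.g. 0.76).
    """
    return measured_copies / recovery / cell_equivalents


def total_short_rnas_per_cell(copies, reads, library_size):
    """Average, over quantified miRNAs, of copies per cell / sequencing frequency."""
    estimates = [copies[m] / (reads[m] / library_size) for m in copies]
    return sum(estimates) / len(estimates)


# Hypothetical inputs (assumed values for illustration only).
copies = {"miR-A": 1000.0, "miR-B": 500.0}
reads = {"miR-A": 800, "miR-B": 350}
estimate = total_short_rnas_per_cell(copies, reads, library_size=100_000)
print(round(estimate))
```

Averaging across several miRNAs dampens the effect of any single miRNA whose sequencing frequency over- or under-represents its true abundance.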
Short cDNA library preparation and read processing.
Short cDNA libraries were made as described (Neilson et al. 2007). Gel purifications of
short RNA/DNA species extended from 16 to slightly past 30 nt. Downstream analysis
was performed on sequences with perfect matches to either: the NCBI build 35 of the
mouse genome (mm7), miRBase 8.2 (Griffiths-Jones 2004; Griffiths-Jones et al. 2006),
tRNA sequences (Lowe and Eddy 1997), the NONCODE RNA database (Liu et al. 2005),
ENSEMBL non-coding RNAs (Hubbard et al. 2005), or the complete rDNA repeating
unit (Grozdanov et al. 2003). Conservation and repeat information was obtained using the
UCSC table browser (Karolchik et al. 2004); see SI for details.
Novel miRNA annotation.
Novel miRNAs were annotated according to pre-established guidelines, also
incorporating rules for Drosha processing of primary miRNA transcripts (Ambros et al.
2003a; Zeng et al. 2005; Han et al. 2006); see SI for details. 16 of the 46 novel miRNAs
were verified by other studies at the time of submission (Rfam version 10).
Sequence information.
Analyzed sequences are provided in SI Tables 9-13.
Acknowledgements
We thank A. Young and J. Neilson for critical reading of this manuscript, G. Zheng, C.
Whittaker, and G. Ruby for advice on bioinformatic analysis, M. Lindstrom for figure
help, and the Broad Institute for pyrosequencing. Thanks to D. Livingston for Dicer
antibodies. ACS was supported by NIH fellowship 5-F32-HD051190 and GWY was
funded by the Crick-Jacobs Center for Theoretical and Computational Biology. This
work was supported by United States Public Health Service grants R01-GM34277 from
the NIH, P01-CA42063 from the NCI to PAS and partially by grant P30-CA14051 from
the NCI.
Table and Figure legends.
Table 1. Composition of cDNA libraries analyzed, represented as percentages of the total
number of reads matching the August 2005 build of the mouse genome ("match mm7").
Also shown is the total number of reads sequenced in each library ("all reads").
Table 2. Direct quantification of specific miRNAs per ES cell. The measured miRNA
copy number is compared to the sequencing frequency per 130,000 reads in the J1 and
Dicer+/+ libraries. Error is the SEM from 2 to 21 triplicate measurements.
Table 3. The major miRNAs expressed in ES cells. The genomic location and miRNAs
contained in the chr2 and chr12 clusters are described in SI Table 7.
Figure 1. Conservation, expression, and repeat overlap of known and novel miRNA
hairpins. (A) Conservation and ES cell expression of known and novel miRNA hairpins.
The percentage of miRNA hairpins overlapping repeats is bracketed for three bins of
conservation. (B) Repeatmasker overlap of known and novel miRNA hairpins. Numbers
refer to the total number of miRNA hairpins in each category. "Multiple" refers to those
hairpins overlapping more than one class of repeat.
Figure 2. Analysis of highly repetitive novel sequences. (A) First nucleotide distribution
of highly repetitive novel sequences (≥ 20 hits to the genome) compared to non-repetitive
novel sequences (< 20 hits to the genome), and known miRNAs. (B) Length distribution
of highly repetitive novel sequences compared to all non-repetitive novel sequences. (C)
Repeatmasker classification of highly repetitive novel sequences, represented as
proportions of novel reads per library. The number of novel reads per library is in
parentheses.
Figure 3. Description of Dicer-dependent novel sequences. (A) Length distribution of
Dicer-dependent novel sequences compared to all other Dicer+/+ and Dicer-/- novel
sequences. (B) First nucleotide distribution of Dicer-dependent novel sequences. (C) Two
groups of Dicer-dependent sequences share sequence similarity. Shown are identified
sequence motifs along with aligning sequences, total reads by library, number of genome
matches, and overlapping repeats.
SI Figure 4. Characterization of Dicer+/+ and Dicer-/- ES cells used in this study. (A)
Schematic of the conditional allele of Dicer from (Harfe et al. 2005) and (B) genotyping
PCR confirming successful deletion of the floxed region of Dicer after clonal selection of
Cre-treated Dicer+/+ ES cells. Deletion of Dicer Exon 24 from ES cells resulted in a
decrease in proliferation rate followed by a growth recovery after several weeks in
culture, as in (Murchison et al. 2005) (not shown). (C) PCR assay for sex-chromosome
determination. Dicer+/+ and Dicer-/- ES cells appear female, shown by the absence of
SRY. The microsatellite nds serves as a positive control. (D) Western blot confirms loss
of detectable Dicer in Dicer-/- ES cells. (E) Genomic repeats maintain DNA methylation
in the absence of Dicer. DNA from Dicer+/+, Dicer-/-, and DNMT1 null ES cells (labeled as
"c/c") was digested with the methylation-sensitive restriction enzyme Hpa II or its
methylation-insensitive isoschizomer, Msp I. Representative Southern blot probed with
the minor satellite repeat or L1 LINE element shows that Dicer-/- ES cells retain global
levels of DNA methylation, as in (Murchison et al. 2005). An unmethylated
mitochondrial DNA fragment serves as a loading control. Dicer-/- ES cells consistently
displayed elevated levels of DNA methylation compared to Dicer+/+ controls.
SI Figure 5. Analysis of 5-aza-deoxycytidine treated ES cells. (A) HPLC trace showing
distinct 5-methyl-C peak. (B) Total DNA methylation assessed by HPLC following
5-aza-dC treatment. (C) Quantitative RT-PCR of MuERV element following 5-aza-dC
treatment. The star in (B) and (C) denotes when RNA was collected for analysis.
SI Figure 6. Sequence characteristics of miRNAs from the Dicer+/+ and Dicer-/-
libraries. (A) Length distributions of known ncRNAs between the Dicer+/+ and Dicer-/-
libraries: (i) miRNAs; (ii) tRNAs; (iii) non-miRNA/non-tRNA/non-rRNA known
ncRNAs; (iv) rRNAs. Associated D- and p-values for the differences in length
distributions are shown above length histograms. (B) Schematic of miRNA-end
definition. (C, D) miRNA processing variability in the four cDNA libraries.
SI Figure 7. Levels of SINE B1 RNA in Dicer+/+ and Dicer-/- ES cells. The glutamine
tRNA serves as a loading control.
SI Figure 8. Dicer-dependent novel sequences are enriched in SINE and Simple repeat
overlap. Shown is the proportional representation of repeat overlap for all
Dicer-dependent, Dicer+/+, and Dicer-/- novel reads. "No rep overlap" refers to those sequences
not overlapping annotated repeats.
SI Figure 9. Comparison of Dicer-dependent novel sequences with Dicer+/+ novel
sequences represented by 1 or 2 reads. (A) First nucleotide comparison shows no bias in
the set of Dicer+/+ novel sequences represented by 1 or 2 reads compared to
Dicer-dependent novels. (B) 1- and 2-read Dicer+/+ novels are not enriched in SINE or Simple
repeat overlap. (C) 1- and 2-read Dicer+/+ novels have a broad length distribution
uncharacteristic of Dicer products.
Table 1.
Feature         J1         J1aza      Dicer+/+   Dicer-/-
%miRNA          86.2       81.6       78.0       0.5
%rRNA           4.8        4.2        9.3        43.8
%ncRNA          2.4        4.1        1.0        7.9
%tRNA           1.6        2.1        3.0        16.9
%novel reads    5.0        8.0        8.7        30.9
match mm7       104,220    115,304    45,320     33,195
all reads       149,986    155,934    57,834     54,339
Table 2.
miRNA       J1 (quant)   J1 (reads)   Dicer+/+ (quant)   Dicer+/+ (reads)
miR15a      290±50       175          280±20             293
miR15b      950±20       2301         970±40             1621
miR16       1130±140     2037         1090±120           1199
miR17-5p    1510±110     795          1440±170           1509
miR19b      2140±490     14777        2340±550           3918
miR21       2750±410     6172         1340±450           2272
miR30c      250±20       2946         220±40             379
Table 3.
miRNA cluster   % of J1 miRNAs   % of Dicer+/+ miRNAs
290-295         23               29
17-92           17               11
chr2            6                27
chr12           14               4
21              6                2
15b/16          4                3
Figure 1. [Panel A: conservation score and ES cell expression for known miRs, novel miRs, and chr2, chr12, and chrX novel miRs. Panel B: repeat overlap (LTR, LINE, SINE, Simple, DNA, Multiple, No Repeat) of novel miRs (46) and known miRs (360).]
Figure 2. [Panel B: length (nt) distribution, 16-32 nt, of repetitive novels and non-repetitive novels.]
Figure 3. [Panel A: length (nt) distribution of Dicer-dependent novels and all other +/+ and -/- novels. Panel C: Group 1 and Group 2 sequences with total reads, reads by library (J1, J1aza, +/+, -/-), genome hits, and repeat overlap; Group 1 sequences overlap B1 SINE.]
SI Figure 4. [Panels: Dicer domain schematic (DEAD helicase, PAZ, RNase III) showing the deleted region and the endogenous and floxed alleles; genotyping PCR bands (351 bp, 410 bp, 530 bp); nds and SRY PCR; Dicer and GAPDH Western blot; Hpa II and Msp I digests probed for LINE 1, minor satellite, and mitochondrial DNA.]
SI Figure 5. [Panels B and C are plotted against days post 5-aza-dC treatment.]
SI Figure 6. [Panel A: length (nt) distributions, 16-26 nt, for (i) miRNAs, (ii) tRNAs, (iii) ncRNAs, and (iv) rRNAs. Panel B: 5p arm of pre-miRNA, Drosha-defined 5' end and Dicer-defined 3' end; 3p arm of pre-miRNA, Dicer-defined 5' end and Drosha-defined 3' end. Panels C and D: 5' ends and 3' ends of miRNA reads for 5p and 3p miRs in the J1, J1aza, +/+, and -/- libraries.]
SI Figure 7. [Blot lanes: +/+ and -/-; probes: SINE B1 and Q-tRNA.]
SI Figure 8. [Proportions (0 to 0.8) of Dicer-dependent novels, all +/+ novels, and all -/- novels in the categories No rep overlap, SINE, Simple repeat, LTR, and LINE.]
SI Figure 9. [Panels compare Dicer-dependent novels with 1- and 2-read novels; panel C shows the length (nt) distribution, 17-27 nt.]
References
Ambros, V., Bartel, B., Bartel, D.P., Burge, C.B., Carrington, J.C., Chen, X., Dreyfuss,
G., Eddy, S.R., Griffiths-Jones, S., Marshall, M., et al. 2003a. A uniform
system for microRNA annotation. RNA 9(3): 277-279.
Ambros, V., Lee, R.C., Lavanway, A., Williams, P.T., and Jewell, D. 2003b. MicroRNAs
and other tiny endogenous RNAs in C. elegans. Curr Biol 13(10): 807-818.
Bartel, D.P. 2004. MicroRNAs: genomics, biogenesis, mechanism, and function. Cell
116(2): 281-297.
Calabrese, J.M. and Sharp, P.A. 2006. Characterization of the short RNAs bound by the
P19 suppressor of RNA silencing in mouse embryonic stem cells. RNA 12(12):
2092-2102.
Griffiths-Jones, S. 2004. The microRNA Registry. Nucleic Acids Res 32(Database issue):
D109-111.
Griffiths-Jones, S., Grocock, R.J., van Dongen, S., Bateman, A., and Enright, A.J. 2006.
miRBase: microRNA sequences, targets and gene nomenclature. Nucleic Acids
Res 34(Database issue): D140-144.
Grozdanov, P., Georgiev, O., and Karagyozov, L. 2003. Complete sequence of the 45-kb
mouse ribosomal DNA repeat: analysis of the intergenic spacer. Genomics 82(6):
637-643.
Han, J., Lee, Y., Yeom, K.H., Nam, J.W., Heo, I., Rhee, J.K., Sohn, S.Y., Cho, Y.,
Zhang, B.T., and Kim, V.N. 2006. Molecular basis for the recognition of primary
microRNAs by the Drosha-DGCR8 complex. Cell 125(5): 887-901.
Harfe, B.D., McManus, M.T., Mansfield, J.H., Hornstein, E., and Tabin, C.J. 2005. The
RNaseIII enzyme Dicer is required for morphogenesis but not patterning of the
vertebrate limb. Proc Natl Acad Sci USA 102(31): 10898-10903.
He, L., Thomson, J.M., Hemann, M.T., Hernando-Monge, E., Mu, D., Goodson, S.,
Powers, S., Cordon-Cardo, C., Lowe, S.W., Hannon, G.J., and Hammond, S.M.
2005. A microRNA polycistron as a potential human oncogene. Nature
435(7043): 828-833.
Houbaviy, H.B., Murray, M.F., and Sharp, P.A. 2003. Embryonic stem cell-specific
MicroRNAs. Developmental Cell 5(2): 351-358.
Hubbard, T., Andrews, D., Caccamo, M., Cameron, G., Chen, Y., Clamp, M., Clarke, L.,
Coates, G., Cox, T., Cunningham, F., Curwen, V., Cutts, T., Down, T., Durbin,
R., Fernandez-Suarez, X.M., Gilbert, J., Hammond, M., Herrero, J., Hotz, H.,
Howe, K., Iyer, V., Jekosch, K., Kahari, A., Kasprzyk, A., Keefe, D., Keenan, S.,
Kokocinsci, F., London, D., Longden, I., McVicker, G., Melsopp, C., Meidl, P.,
Potter, S., Proctor, G., Rae, M., Rios, D., Schuster, M., Searle, S., Severin, J.,
Slater, G., Smedley, D., Smith, J., Spooner, W., Stabenau, A., Stalker, J., Storey,
R., Trevanion, S., Ureta-Vidal, A., Vogel, J., White, S., Woodwark, C., and
Birney, E. 2005. Ensembl 2005. Nucleic Acids Res 33(Database issue): D447-453.
Hwang, H.W., Wentzel, E.A., and Mendell, J.T. 2007. A hexanucleotide element directs
microRNA nuclear import. Science 315(5808): 97-100.
Jones, P.A. and Baylin, S.B. 2007. The epigenomics of cancer. Cell 128(4): 683-692.
Kanellopoulou, C., Muljo, S.A., Kung, A.L., Ganesan, S., Drapkin, R., Jenuwein, T.,
Livingston, D.M., and Rajewsky, K. 2005. Dicer-deficient mouse embryonic stem
cells are defective in differentiation and centromeric silencing. Genes Dev 19(4):
489-501.
Karolchik, D., Hinrichs, A.S., Furey, T.S., Roskin, K.M., Sugnet, C.W., Haussler, D., and
Kent, W.J. 2004. The UCSC Table Browser data retrieval tool. Nucleic Acids Res
32(Database issue): D493-496.
Kloosterman, W.P. and Plasterk, R.H. 2006. The diverse functions of microRNAs in
animal development and disease. Dev Cell 11(4): 441-450.
Lee, R.C., Hammell, C.M., and Ambros, V. 2006. Interacting endogenous and exogenous
RNAi pathways in Caenorhabditis elegans. RNA 12(4): 589-597.
Leung, A.K., Calabrese, J.M., and Sharp, P.A. 2006. Quantitative analysis of Argonaute
protein reveals microRNA-dependent localization to stress granules. Proc Natl
Acad Sci USA 103(48): 18125-18130.
Lewis, B.P., Burge, C.B., and Bartel, D.P. 2005. Conserved seed pairing, often flanked
by adenosines, indicates that thousands of human genes are microRNA targets.
Cell 120(1): 15-20.
Linsley, P.S., Schelter, J., Burchard, J., Kibukawa, M., Martin, M.M., Bartz, S.R.,
Johnson, J.M., Cummins, J.M., Raymond, C.K., Dai, H., et al. 2007.
Transcripts targeted by the microRNA-16 family cooperatively regulate cell cycle
progression. Mol Cell Biol 27(6): 2240-2252.
Lippman, Z. and Martienssen, R. 2004. The role of RNA interference in heterochromatic
silencing. Nature 431(7006): 364-370.
Liu, C., Bai, B., Skogerbo, G., Cai, L., Deng, W., Zhang, Y., Bu, D., Zhao, Y., and Chen,
R. 2005. NONCODE: an integrated knowledge database of non-coding RNAs.
Nucleic Acids Res 33(Database issue): D112-115.
Lowe, T.M. and Eddy, S.R. 1997. tRNAscan-SE: a program for improved detection of
transfer RNA genes in genomic sequence. Nucleic Acids Res 25(5): 955-964.
Maksakova, I.A. and Mager, D.L. 2005. Transcriptional regulation of early transposon
elements, an active family of mouse long terminal repeat retrotransposons. J Virol
79(22): 13865-13874.
Margulies, M., Egholm, M., Altman, W.E., Attiya, S., Bader, J.S., Bemben, L.A., Berka,
J., Braverman, M.S., Chen, Y.J., Chen, Z., et al. 2005. Genome sequencing in
microfabricated high-density picolitre reactors. Nature 437(7057): 376-380.
Mayhall, E.A., Paffett-Lugassy, N., and Zon, L.I. 2004. The clinical potential of stem
cells. Curr Opin Cell Biol 16(6): 713-720.
Murchison, E.P., Partridge, J.F., Tam, O.H., Cheloufi, S., and Hannon, G.J. 2005.
Characterization of Dicer-deficient murine embryonic stem cells. Proc Natl Acad
Sci USA 102(34): 12135-12140.
Murchison, E.P., Stein, P., Xuan, Z., Pan, H., Zhang, M.Q., Schultz, R.M., and Hannon,
G.J. 2007. Critical roles for Dicer in the female germline. Genes Dev 21(6): 682-693.
Neely, L.A., Patel, S., Garver, J., Gallo, M., Hackett, M., McLaughlin, S., Nadel, M.,
Harris, J., Gullans, S., and Rooke, J. 2006. A single-molecule method for the
quantitation of microRNA gene expression. Nat Methods 3(1): 41-46.
Neilson, J.R., Zheng, G.X., Burge, C.B., and Sharp, P.A. 2007. Dynamic regulation of
miRNA expression in ordered stages of cellular development. Genes Dev 21(5):
578-589.
Nilsen, T.W. 2007. Mechanisms of microRNA-mediated gene regulation in animal cells.
Trends Genet.
O'Donnell, K.A. and Boeke, J.D. 2007. Mighty Piwis defend the germline against
genome intruders. Cell 129(1): 37-44.
Peaston, A.E., Evsikov, A.V., Graber, J.H., de Vries, W.N., Holbrook, A.E., Solter, D.,
and Knowles, B.B. 2004. Retrotransposons regulate host genes in mouse oocytes
and preimplantation embryos. Dev Cell 7(4): 597-606.
Piriyapongsa, J., Marino-Ramirez, L., and Jordan, I.K. 2007. Origin and evolution of
human microRNAs from transposable elements. Genetics 176(2): 1323-1337.
Reik, W., Dean, W., and Walter, J. 2001. Epigenetic reprogramming in mammalian
development. Science 293(5532): 1089-1093.
Ruby, J.G., Jan, C., Player, C., Axtell, M.J., Lee, W., Nusbaum, C., Ge, H., and Bartel,
D.P. 2006. Large-scale sequencing reveals 21U-RNAs and additional microRNAs
and endogenous siRNAs in C. elegans. Cell 127(6): 1193-1207.
Seitz, H., Royo, H., Bortolin, M.L., Lin, S.P., Ferguson-Smith, A.C., and Cavaille, J.
2004. A large imprinted microRNA gene cluster at the mouse Dlk1-Gtl2 domain.
Genome Res 14(9): 1741-1748.
Si, M.L., Zhu, S., Wu, H., Lu, Z., Wu, F., and Mo, Y.Y. 2006. miR-21-mediated tumor
growth. Oncogene.
Smalheiser, N.R. and Torvik, V.I. 2005. Mammalian microRNAs derived from genomic
repeats. Trends Genet 21(6): 322-326.
Spivakov, M. and Fisher, A.G. 2007. Epigenetic signatures of stem-cell identity. Nat Rev
Genet 8(4): 263-271.
Tolia, N.H. and Joshua-Tor, L. 2007. Slicer and the argonautes. Nat Chem Biol 3(1): 36-43.
Voorhoeve, P.M., le Sage, C., Schrier, M., Gillis, A.J., Stoop, H., Nagel, R., Liu, Y.P.,
van Duijse, J., Drost, J., Griekspoor, A., et al. 2006. A genetic screen
implicates miRNA-372 and miRNA-373 as oncogenes in testicular germ cell
tumors. Cell 124(6): 1169-1181.
Watanabe, T., Takeda, A., Tsukiyama, T., Mise, K., Okuno, T., Sasaki, H., Minami, N.,
and Imai, H. 2006. Identification and characterization of two novel classes of
small RNAs in the mouse germline: retrotransposon-derived siRNAs in oocytes
and germline small RNAs in testes. Genes Dev 20(13): 1732-1743.
Yang, N. and Kazazian, H.H., Jr. 2006. L1 retrotransposition is suppressed by
endogenously encoded small interfering RNAs in human cultured cells. Nat Struct
Mol Biol 13(9): 763-771.
Zeng, Y., Yi, R., and Cullen, B.R. 2005. Recognition and cleavage of primary microRNA
precursors by the nuclear processing enzyme Drosha. EMBO J 24(1): 138-148.
Chapter 3, Appendix
Supporting information referenced in Chapter 3,
"RNA sequence analysis defines Dicer's role in mouse
embryonic stem cells".
This chapter appears in the context of its contemporary science, and, with the exception
of the included description of short rRNA fragments, appears as Supporting Information
for PNAS 104:18097-18102 on the Proceedings of the National Academy of Sciences'
website.
Complete description of repeat-overlapping sequences.
The genomic distribution of repetitive sequences correlated well with total repeat
content along chromosomes, and specific high-density clusters of repeat overlapping
reads were not observed (SI Figure 10A,B). Of note, the ratio of short RNA associated
repeats to total chromosome repeat content in all 4 libraries is highest on chromosome X
(SI Figure 10B,C). This increase is due to a large number of matches to LINE-associated
short RNAs, and not an absolute increase in the number of distinct repetitive short RNA
sequences on chromosome X (SI Figure 10D,E).
Because of their large number of corresponding genomic locations, the highly
repetitive sequences were not searched comprehensively for novel miRNAs (see SI
Methods). Nevertheless, the total proportions of highly repetitive sequences compared to
all novel reads were similar between the Dicer+/+ and Dicer-/- libraries, indicating that as a
class they exist independently of Dicer activity (SI Table 8). Further, these sequences
share several descriptive characteristics with the set of repeat-overlapping novel
sequences with less than 20 hits to the genome, described below, again indicating they
exist as a class of sequences separate from miRNAs.
The repeat overlap in the set of novel sequences with less than 20 hits to the
genome was analyzed (referred to as "non-repetitive repeat-overlapping novel reads"
below). Because these sequences had far fewer genomic hits than the highly repetitive
sequences, a more in-depth analysis of their relationship to surrounding genomic
sequence was feasible. Foremost, these novel reads were comprehensively evaluated as
potential miRNAs, and their surrounding genomic sequences are devoid of clear hairpin
structures. As described above for the set of highly repetitive sequences, the proportions
of SINE and Simple repeat overlap in this set are reduced in the Dicer-/- library as
compared to the Dicer+/+ library, further supporting the idea that a subset of repeat-associated
short RNAs depend on Dicer for biogenesis (SI Figure 11A). The non-repetitive,
repeat-overlapping novel sequences showed no clear strand bias with respect
to overlapping repetitive elements, and had the same distribution of length and first
nucleotides as the novel sequences that did not overlap repeats (SI Figure 11A, B, not
shown). As expected, the non-repetitive, repeat-overlapping novel sequences were more
frequently complementary to intergenic regions, and significantly less conserved when
compared to the novel reads that did not overlap repeats (SI Figure 11C,D). Consistent
with the broad chromosomal distribution of the set of highly repetitive novel sequences,
there is no significant clustering of the repeat overlapping reads with less than 20 hits to
the genome (not shown).
No evidence for piRNAs in mouse ES cells.
The recent description of piRNAs in the mouse, rat, and human testis prompted us
to examine if ES cells expressed similar short RNAs (O'Donnell and Boeke 2007). There
was no evidence for a distinct class of 29-31 nt piRNA-like species in any of the four
cDNA libraries. There were, however, 51 distinct sequences, represented by 112 reads,
which uniquely overlapped 14 known piRNA clusters (Lau et al. 2006). 30 of these
sequences, represented by 82 reads, fell into one piRNA cluster and were generated by a
group of known and novel miRNAs that was identified on chromosome X (Table 4).
Accordingly, the length distribution and first nucleotide bias of the reads falling into
piRNA clusters were miRNA-like, with a major peak of reads surrounding 22 nt, and
60% of sequences beginning with a 'U' (SI Figure 12A,B).
No evidence for C. elegans-like siRNAs in mouse ES cells.
The possibility that ES cells express endogenous siRNAs similar to those
observed in C. elegans was examined (Ambros et al. 2003b; Ruby et al. 2006; Pak and
Fire 2007; Sijen et al. 2007). Such siRNAs are antisense to protein coding genes, and do
not typically have the 5' mono-phosphates that were selected for in this study.
Nevertheless, an analysis of 5' mono-phosphate-containing short RNAs from C. elegans
did identify two distinct siRNA populations that had a strong 'G' first nucleotide bias and
peaked at 22 and 26 nt, respectively (Ruby et al. 2006). There was no evidence that these
species exist in ES cells. In the four libraries analyzed here, there were 190 distinct
sequences, represented by 261 reads, that were uniquely anti-sense to known protein
coding exons and could be considered potential analogues of C. elegans siRNAs.
However, unlike C. elegans siRNAs, these sequences had no distinct length or first
nucleotide bias when compared to the set of 1319 distinct sequences (1500 reads) that
were uniquely sense to known protein coding genes (SI Figure 12C,D). This apparent
absence of analogous siRNA species in ES cells is not entirely surprising considering that
mammals do not have RdRPs homologous to those required for siRNA production in C.
elegans. Anti-sense exon-overlapping short RNAs were present in all four libraries (SI
Figure 12E). The low abundance of these species in our libraries suggests that they have
limited physiological roles; however, similar, more abundant molecules may be 5' end
modified such that they were excluded in the libraries analyzed here.
Description of ES cell short rRNA fragments
Because the RNAi pathway has been implicated in the transcriptional silencing
and nuclear organization of rDNA repeats in several model organisms, characteristics of
the ES cell short RNAs matching the rDNA repeat were examined (Xie et al. 2004; Cam
et al. 2005; Peng and Karpen 2007). It was previously observed in mouse ES cells that
specific short rRNAs exist in the absence of Dicer, and do not function as siRNAs
(Calabrese and Sharp 2006). The large number of reads sequenced in the libraries
described in Chapter 3 allowed for a more extensive analysis of the short rRNAs
expressed in mouse ES cells. Consistent with previous observations, short rRNAs
mapped predominantly to the 45S precursor region of the rDNA repeat, and were
overwhelmingly in the sense orientation with respect to rDNA transcription. 91.9 % of
the 27,899 reads mapping to the rDNA repeat mapped to either the mature 18s, 5.8s, or
28s rRNA, and all of these reads were in the sense orientation relative to rDNA
transcription. The distribution of short RNAs along the mature rRNA species was
surprisingly non-random, and nearly identical between the Dicer+/+ and Dicer-/- libraries,
demonstrating that these sequences, like other non-miRNA known ncRNAs, are
generated independently of Dicer (SI Figure 13). Based on miRNA quantification, as a
class, short rRNA species are represented by approximately 9,000 molecules per ES cell.
Supporting methods.
Generation and characterization of Dicer-/- ES cells.
All ES cells were cultured and transfected as described (Calabrese and Sharp 2006).
Dicer ES cells were derived from mice homozygous for the floxed Dicer allele
described in (Harfe et al. 2005), and floxed GFP described in (Ventura et al. 2004). To
generate clonal Dicer-/- cell lines, Cre recombinase was transiently transfected into
Dicer+/+ ES cells. 24 hours post-transfection cells were plated at clonal density onto
feeder layers and individual GFP negative colonies were selected and cultured until
growth recovered, then expanded, removed from feeder layers, and used for subsequent
analysis. Dicer genotyping oligos, from 5' to 3' as illustrated in SI Figure 4, are as
follows: (1) 5'-CATGACTCTTCAACTCAAACT-3'; (2) 5'-CCTGACAGTGACGGTCCAAAG-3';
(3) 5'-AGCATGGGGGCACCCTGGTCCTGG-3'. The sex of ES cells was determined as described
(Conner 2000). Dicer antibody 1416 was from (Kanellopoulou et al. 2005). 5 µg of DNA
was used for the Southern blot in SI Figure 1. The minor satellite probe was described in
(Martens et al. 2005). The mitochondrial DNA probe and DNMT1 null ES cells were gifts from R.
Jaenisch. LINE L1 and SINE B1 probes were PCR amplified using primers from
(Martens et al. 2005). The SINE B1 northern blot was performed as in (Calabrese and
Sharp 2006).
J1aza treatment and analysis.
J1 ES cells were treated with 30 µM 5-aza-2'-deoxycytidine, 5-aza-dC, (Sigma) dissolved
in DMSO, or DMSO only, for 24 hours. DNA and RNA samples were collected for each
sample approximately every two days for a total of 2 weeks. To determine the percent
cellular DNA methylation HPLC analysis was used as described (Ramsahoye 2002).
Samples were loaded onto a Vydac 218TP52 reverse phase C18 HPLC column and a 60
minute isocratic run in 50mM Ammonium phosphate dibasic buffer pH 4.1 was used for
separation. As a control, dTMP, dAMP, dCMP, dGMP, and 5mCMP (Sigma and
Reliable Biopharmaceutical Corporation) were mixed equally and eluted as above.
MuERV-L primers are from (Peaston et al. 2004).
Calculation of mature miRNA processing variability.
To calculate miRNA processing variability, the number of miRNA reads matching each
annotated 5'/3' miRNA end was divided by the total number of reads overlapping the arm
of the hairpin on which the mature miRNA was located. If previously unannotated, 5p
and 3p miRNAs were assigned from miRNA* strands if the miRNA* species represented
>20% of all reads originating from the hairpin. For miRNA hairpins with multiple
genomic locations, only the hairpin with the most aligning reads was evaluated.
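The calculation described above can be sketched as follows. This is a minimal illustration, not the script used in the study: the read representation (start, end, count tuples for reads overlapping one hairpin arm), the coordinate convention, and the function name `end_fidelity` are all assumptions made for the example.

```python
def end_fidelity(reads, annotated_start, annotated_end):
    """Return (5' fraction, 3' fraction) of reads matching the annotated ends.

    reads: iterable of (start, end, count) tuples for reads overlapping one
    hairpin arm; coordinates are hypothetical 1-based inclusive positions.
    """
    reads = list(reads)
    total = sum(count for _, _, count in reads)
    if total == 0:
        raise ValueError("no reads overlap this hairpin arm")
    # Reads whose 5' end coincides with the annotated miRNA 5' end.
    five_prime = sum(c for s, _, c in reads if s == annotated_start)
    # Reads whose 3' end coincides with the annotated miRNA 3' end.
    three_prime = sum(c for _, e, c in reads if e == annotated_end)
    return five_prime / total, three_prime / total


# Toy data: 100 reads on one arm, most of them sharing the annotated 5' end.
arm_reads = [(10, 31, 80), (10, 32, 15), (11, 32, 5)]
five, three = end_fidelity(arm_reads, annotated_start=10, annotated_end=31)
```

In this toy case the 5' end is shared by 95 of 100 reads and the 3' end by 80 of 100, which mirrors the general expectation that miRNA 5' ends are processed more uniformly than 3' ends.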
Novel miRNA annotation.
Only sequences with <20 matches to the genome were evaluated as potential miRNAs,
with the exception of the 8 sequences with >20 hits to the genome that were sequenced
>3x in the Dicer+/+ library and absent in the Dicer-/- library. For each short RNA genomic
location, two potential miRNA hairpins were defined that encompassed 20 and 80 nt of
sequence around the 5' and 3' ends of the short RNA. Potential miRNAs were evaluated
for their ability to form hairpins that had secondary structures consistent with Drosha and
Dicer processing (Ambros et al. 2003a; Zeng et al. 2005; Han et al. 2006). Those hairpins
whose RNAfold output (Hofacker et al. 1994) exhibited base pairing over at least 70% of
the length of the potential miRNA, base pairing over or adjacent to the processed ends of
the putative pre-miRNA, symmetrical bulges, and double-strandedness existing
approximately one helical turn past the Drosha-processed end of the miRNA (between 7 and
17 nt after the end) were annotated as novel miRNAs. Additionally, in order for a hairpin
to be annotated as a novel miRNA, we required that there existed a sequence comprising
the majority of all reads aligning to the novel hairpin (a dominant miRNA), and that this
dominant miRNA was sequenced more than once and between 20 and 25 nt long.
Requiring a dominant miRNA allowed for unambiguous assignment of the novel
miRNA's seed.
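The dominant-miRNA requirement can be illustrated with a short sketch. It covers only the read-count and length criteria, not the RNAfold-based structural checks. The function name, the input format (one string per aligned read), and the reading of "majority" as strictly more than half of the reads are assumptions of this example.

```python
from collections import Counter


def dominant_mirna(read_sequences, min_len=20, max_len=25):
    """Return the dominant sequence for a hairpin, or None if there is none.

    read_sequences: list with one entry per read aligned to the hairpin.
    """
    if not read_sequences:
        return None
    seq, n = Counter(read_sequences).most_common(1)[0]
    is_majority = n > len(read_sequences) / 2   # assumed meaning of "majority"
    sequenced_more_than_once = n > 1
    length_ok = min_len <= len(seq) <= max_len
    if is_majority and sequenced_more_than_once and length_ok:
        return seq
    return None


# Arbitrary 22 nt example sequence; it is not a real miRNA.
example = "ACGU" * 5 + "AC"
reads_ok = [example] * 3 + [example[:-1]]   # 3 of 4 reads are identical
reads_single = [example]                    # only one read: stays a candidate
```

Hairpins for which this function returns None would correspond to the miRNA candidates described in the next paragraph, provided their structure is otherwise miRNA-like.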
miRNA-like hairpins with aligned reads that did not produce a dominant miRNA
of the specified length, that had only one aligned read, or that had a predicted secondary
structure that did not completely satisfy our requirements, were deemed miRNA
candidates and not included in the final set of novel miRNAs (SI Alignment File). This
set of candidate miRNA hairpins had secondary structures and expression characteristics
similar enough to known miRNAs that we believe short RNAs mapping to these loci
were likely generated by the miRNA-processing pathway; however, the putative hairpins
either lacked sufficient expression for confident annotation, or had characteristics that
differed slightly from known miRNAs.
From the set of 8 sequences with >20 hits to the genome that were sequenced >3x in
the Dicer+/+ library and absent in the Dicer-/- library, 3 novel miRNAs were identified.
The reads aligning to these three hairpins had 120, 306, 379, and 1416 hits to the genome,
respectively.
Repeat analysis.
Repeat overlap was determined using the Repeatmasker track of the UCSC table browser
(Karolchik et al. 2004). For sequences with >20 hits to the genome, repeat-identity was
assigned to the repeat and class that overlapped most frequently with each short RNA
sequence. For those sequences with >500 hits to the genome, 250 hits were randomly
selected 5 times, and specific repeat and class was assigned if the majority of the 250 hits
overlapped the same repeat and class all 5 times. Otherwise the specific repeat and class
were designated "can't distinguish". For sequences with <20 hits to the genome, the
specific repeat and class were annotated for each genomic hit and normalized to the
number of genomic hits for each sequence.
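The subsampling rule for sequences with more than 500 genomic hits can be sketched as below. The label format, the fixed random seed, and the function name `assign_repeat` are assumptions for illustration; the original analysis may have been implemented differently.

```python
import random
from collections import Counter

UNRESOLVED = "can't distinguish"


def assign_repeat(hit_labels, sample_size=250, rounds=5, seed=0):
    """Assign one (repeat, class) label to a sequence with many genomic hits.

    hit_labels holds one hashable label per genomic hit, for example a
    (repeat_name, repeat_class) tuple taken from a repeat annotation track.
    """
    if len(hit_labels) < sample_size:
        raise ValueError("need at least sample_size hits to subsample")
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    winners = set()
    for _ in range(rounds):
        sample = rng.sample(hit_labels, sample_size)
        label, count = Counter(sample).most_common(1)[0]
        if count <= sample_size / 2:  # no strict majority in this draw
            return UNRESOLVED
        winners.add(label)
    # The same label must hold the majority in every round.
    return winners.pop() if len(winners) == 1 else UNRESOLVED


# Illustrative inputs: 600 hits to one repeat, and 600 hits split six ways.
line_hits = [("L1Md_A", "LINE")] * 600
mixed_hits = [("repeat_%d" % k, "class_%d" % k) for k in range(6)] * 100
```

With the six-way split no label can exceed 100 of the 250 sampled hits, so the result is always "can't distinguish"; with a single repeat the label is returned in every round.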
Conservation and motif analysis.
Four-way mammalian alignments (mm7, hg17, canFam2 and rn3) were extracted from
the UCSC genome browser's 17-way mammalian alignments for a given region of length
L. The conservation score for a region was determined by $\left(\sum_{i=1}^{L} F_i\right)/L$, where $F_i$ is the
number of bases in the other species (hg17, canFam2 or rn3) that are identical to that of
mm7 at position i, divided by the number of aligned species (maximum of 3) at position i.
Information content, $I_M$, of motif M of length N was computed as $I_M = \sum_{i=1}^{N} \sum_{j} f_{ij} \log_2(f_{ij}/g_j)$,
where $f_{ij}$ was the frequency of nucleotide j at position i, and $g_j$ was the background
frequency of nucleotide j. Nucleotide background frequencies ($g_A$, $g_T$, $g_G$, $g_C$) were
defined as (0.3, 0.3, 0.2, 0.2). For small sets of sequences, the algorithm MEME was
utilized to identify motifs with a minimum width of 6, and an e-value cutoff of 0.001
(Bailey 1994).
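The two scores defined above can be written out as a short sketch. The input format (pre-aligned strings of equal length with '-' marking unaligned positions), the treatment of positions with no aligned species as contributing zero, and the function names are assumptions of this example rather than details taken from the original analysis.

```python
import math

# Background frequencies as stated in the text.
BACKGROUND = {"A": 0.3, "T": 0.3, "G": 0.2, "C": 0.2}


def conservation_score(mouse, others):
    """Mean over positions of the fraction of aligned species matching mouse.

    mouse: mouse sequence of length L; others: aligned sequences of the same
    length from the other species, with '-' where a species is not aligned.
    """
    total = 0.0
    for i, base in enumerate(mouse):
        aligned = [seq[i] for seq in others if seq[i] != "-"]
        if aligned:  # positions with no aligned species add 0 (assumption)
            total += sum(b == base for b in aligned) / len(aligned)
    return total / len(mouse)


def information_content(sequences, background=BACKGROUND):
    """Sum over positions i and nucleotides j of f_ij * log2(f_ij / g_j)."""
    total = 0.0
    for i in range(len(sequences[0])):
        column = [s[i] for s in sequences]
        for nt, g in background.items():
            f = column.count(nt) / len(column)
            if f > 0:  # terms with f_ij = 0 contribute nothing
                total += f * math.log2(f / g)
    return total


score = conservation_score("ACGT", ["ACGT", "ACGA", "AC--"])
ic = information_content(["A", "A"])
```

In the toy alignment the first three positions are fully conserved among the aligned species and the last is conserved in one of two, giving a score of 3.5/4. A motif position that is always 'A' contributes log2(1/0.3) bits under the stated background.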
Figure legends.
SI Figure 10. Highly repetitive reads distribute uniformly across chromosomes, with the
highest density of short RNA-producing loci on chromosome X. (A) Count of short
RNA-producing loci and total repeat content in 0.5 Mb windows is shown for
representative chromosomes (1 and X). (B) The ratio of repetitive short RNA-producing
loci to total repeat content, per chromosome. (C) The ratios from (B) represented by
cDNA library, normalized to the total number of short RNA-producing loci divided by
total genome repeat content. (D) Proportion of repeat-associated short RNA hits per
chromosome, by repeat class. (E) Count of distinct repetitive short RNAs matching each
chromosome.
SI Figure 11. Analysis of repeat-overlapping novel reads with less than 20 hits to the
genome. (A) Hit-normalized Repeatmasker classification of novel reads with less than 20
hits to the genome; "s" denotes sense overlap; "a" denotes anti-sense overlap. (B) Length
distribution of repeat-overlapping compared to non-repeat overlapping novel reads. (C)
Location of repeat overlapping novel reads with respect to UCSC known genes. (D)
Conservation of repeat-overlapping compared to non-repeat overlapping novel reads. (E)
Ratio of repeat overlapping short RNA loci to total repeat content, per chromosome.
SI Figure 12. Description of sequences that are within piRNA clusters, and anti-sense to
exons. Length distribution, (A), and first nucleotide, (B), of sequences mapping to
piRNA clusters compared to known miRNA sequences. Length distribution, (C), and first
nucleotide, (D), of sequences mapping anti-sense and sense to exons. (E) Proportion of
sequences that are uniquely sense and anti-sense to exons, by cDNA library.
SI Figure 13. Comparison of Dicer+/+ and Dicer-/- rDNA reads. Distribution of short
rRNA hit starts along the 45kb rDNA precursor, bases 3500-14000, with Dicer+/+
reads above, and Dicer-/- reads below the x-axis. The locations of the mature 18s,
5.8s, and 28s rRNA sequences are indicated below the graph.
[SI Figure 10. Image not reproduced. Panels plot repetitive short RNA hits and total repeat content against position on chromosomes 1 and X, and per-chromosome ratios for the J1, J1aza, Dicer+/+ and Dicer-/- libraries; repeat classes shown are LINE, SINE, Simple_repeat, LTR and Other.]
[SI Figure 11. Image not reproduced. Panels compare novel reads with and without repeat overlap for the J1, J1aza, Dicer+/+ and Dicer-/- libraries; repeat categories are sense (s) and anti-sense (a) SINE, Simple and LINE; axes include Length (nt) and Conservation score.]
[SI Figure 12. Image not reproduced. Panels compare miRs with reads from piRNA clusters, and reads sense with reads anti-sense to exons, by Length (nt) and first nucleotide, for all libraries and for the J1, J1aza, Dicer+/+ and Dicer-/- libraries.]
[SI Figure 13. Image not reproduced. Dicer+/+ reads are plotted above and Dicer-/- reads below the x-axis, with the positions of the 18s, 5.8s and 28s rRNAs marked.]
References.
Ambros, V., Bartel, B., Bartel, D.P., Burge, C.B., Carrington, J.C., Chen, X., Dreyfuss,
G., Eddy, S.R., Griffiths-Jones, S., Marshall, M. et al. 2003a. A uniform system
for microRNA annotation. Rna 9(3): 277-279.
Ambros, V., Lee, R.C., Lavanway, A., Williams, P.T., and Jewell, D. 2003b. MicroRNAs
and other tiny endogenous RNAs in C. elegans. Curr Biol 13(10): 807-818.
Bailey, T.L. and Elkan, C. 1994. Fitting a mixture model by expectation maximization to
discover motifs in biopolymers. Proceedings of the Second International
Conference on Intelligent Systems for Molecular Biology, August: 28-36.
Calabrese, J.M. and Sharp, P.A. 2006. Characterization of the short RNAs bound by the
P19 suppressor of RNA silencing in mouse embryonic stem cells. Rna 12(12):
2092-2102.
Cam, H.P., Sugiyama, T., Chen, E.S., Chen, X., FitzGerald, P.C., and Grewal, S.I. 2005.
Comprehensive analysis of heterochromatin- and RNAi-mediated epigenetic
control of the fission yeast genome. Nat Genet 37(8): 809-819.
Conner, D.A. 2000. Mouse Embryonic Stem (ES) Cell Isolation. Current Protocols in
Molecular Biology 23.4.1.
Han, J., Lee, Y., Yeom, K.H., Nam, J.W., Heo, I., Rhee, J.K., Sohn, S.Y., Cho, Y.,
Zhang, B.T., and Kim, V.N. 2006. Molecular basis for the recognition of primary
microRNAs by the Drosha-DGCR8 complex. Cell 125(5): 887-901.
Harfe, B.D., McManus, M.T., Mansfield, J.H., Hornstein, E., and Tabin, C.J. 2005. The
RNaseIII enzyme Dicer is required for morphogenesis but not patterning of the
vertebrate limb. Proc Natl Acad Sci U S A 102(31): 10898-10903.
Hofacker, I.L., Fontana, W., Stadler, P.F., Bonhoeffer, L.S., Tacker, M., and Schuster, P.
1994. Fast Folding and Comparison of RNA Secondary Structures. Monatsh
Chem 125: 167-188.
Kanellopoulou, C., Muljo, S.A., Kung, A.L., Ganesan, S., Drapkin, R., Jenuwein, T.,
Livingston, D.M., and Rajewsky, K. 2005. Dicer-deficient mouse embryonic stem
cells are defective in differentiation and centromeric silencing. Genes Dev 19(4):
489-501.
Karolchik, D., Hinrichs, A.S., Furey, T.S., Roskin, K.M., Sugnet, C.W., Haussler, D., and
Kent, W.J. 2004. The UCSC Table Browser data retrieval tool. Nucleic Acids Res
32(Database issue): D493-496.
Lau, N.C., Seto, A.G., Kim, J., Kuramochi-Miyagawa, S., Nakano, T., Bartel, D.P., and
Kingston, R.E. 2006. Characterization of the piRNA complex from rat testes.
Science 313(5785): 363-367.
Martens, J.H., O'Sullivan, R.J., Braunschweig, U., Opravil, S., Radolf, M., Steinlein, P.,
and Jenuwein, T. 2005. The profile of repeat-associated histone lysine
methylation states in the mouse epigenome. Embo J 24(4): 800-812.
O'Donnell, K.A. and Boeke, J.D. 2007. Mighty Piwis defend the germline against
genome intruders. Cell 129(1): 37-44.
Pak, J. and Fire, A. 2007. Distinct populations of primary and secondary effectors during
RNAi in C. elegans. Science 315(5809): 241-244.
Peaston, A.E., Evsikov, A.V., Graber, J.H., de Vries, W.N., Holbrook, A.E., Solter, D.,
and Knowles, B.B. 2004. Retrotransposons regulate host genes in mouse oocytes
and preimplantation embryos. Dev Cell 7(4): 597-606.
Peng, J.C. and Karpen, G.H. 2007. H3K9 methylation and RNA interference regulate
nucleolar organization and repeated DNA stability. Nat Cell Biol 9(1): 25-35.
Ramsahoye, B.H. 2002. Measurement of genome wide DNA methylation by reversed-phase
high-performance liquid chromatography. Methods 27(2): 156-161.
Ruby, J.G., Jan, C., Player, C., Axtell, M.J., Lee, W., Nusbaum, C., Ge, H., and Bartel,
D.P. 2006. Large-scale sequencing reveals 21U-RNAs and additional microRNAs
and endogenous siRNAs in C. elegans. Cell 127(6): 1193-1207.
Sijen, T., Steiner, F.A., Thijssen, K.L., and Plasterk, R.H. 2007. Secondary siRNAs result
from unprimed RNA synthesis and form a distinct class. Science 315(5809): 244-247.
Ventura, A., Meissner, A., Dillon, C.P., McManus, M., Sharp, P.A., Van Parijs, L.,
Jaenisch, R., and Jacks, T. 2004. Cre-lox-regulated conditional RNA interference
from transgenes. Proc Natl Acad Sci U S A 101(28): 10380-10385.
Xie, Z., Johansen, L.K., Gustafson, A.M., Kasschau, K.D., Lellis, A.D., Zilberman, D.,
Jacobsen, S.E., and Carrington, J.C. 2004. Genetic and functional diversification
of small RNA pathways in plants. PLoS Biol 2(5): E104.
Zeng, Y., Yi, R., and Cullen, B.R. 2005. Recognition and cleavage of primary microRNA
precursors by the nuclear processing enzyme Drosha. Embo J 24(1): 138-148.
Chapter 4
Short RNAs in the sense and anti-sense orientation from
transcription initiation sites in mouse embryonic stem cells
The described experiments were an equal collaboration with Amy C. Seila, and also
performed with Gene W. Yeo, and Stuart S. Levine.
Abstract
Short RNAs 20-30 nucleotides (nt) in length mediate essential regulatory
processes in eukaryotes by interfering with gene expression on pre- and post-transcriptional
levels in processes termed RNA interference (RNAi) (Bartel 2004; Tolia
and Joshua-Tor 2007). To further our understanding of the mechanisms by which short
RNAs regulate gene expression in mammalian cells, we characterized mouse ES cell
short RNAs to a depth of one RNA per cell. Here we describe a new class of short RNAs
that lie in close proximity to the transcription start sites of many genes (TSS-3kb RNAs).
TSS-3kb RNAs exist independently of Dicer activity, and, surprisingly, are found in both
the sense and anti-sense orientation relative to transcription start sites of their associated
genes. TSS-3kb RNAs have no 5' nucleotide bias, in sharp contrast to the initial 'G' bias
of known RNA polymerase II products, suggesting that TSS-3kb RNAs are generated by
a processing event that occurs at the 3' ends of nascent transcripts. Genes associated with
TSS-3kb RNAs show chromatin marks associated with active transcription in human ES
cells, and are predominantly, although not exclusively, expressed in mouse ES cells. We
hypothesize that TSS-3kb RNAs are evidence of widespread bidirectional initiation and
pausing of Pol II in ES cells, and speculate that this frequent bidirectional initiation and
pausing may help maintain chromatin structure in a state poised for rapid transcriptional
activation.
Introduction
The initiation of productive transcription is a multi-step process that requires the
concerted action of transcription factors, chromatin modifying enzymes, and nucleosome
remodeling complexes. In part, regulation of this process is achieved by modulating the
recruitment of RNA Polymerase II (Pol II) to promoters (Ptashne and Gann 1997).
However, recent work suggests a large fraction of genes are subject to regulation at the
stage of transcriptional elongation, presumably by a process that induces the pausing of
Pol II near transcription start sites (Guenther et al. 2007; Muse et al. 2007; Zeitlinger et
al. 2007).
Pol II pausing is a phenomenon by which Pol II initiates transcription but pauses
approximately 20-50 nt downstream of initiation sites while remaining elongation
competent (Saunders et al. 2006). The pausing of Pol II has been described for many
genes, including several Drosophila heat shock genes, and the mammalian c-myc, c-fos,
and junB (Rougvie and Lis 1990; Spencer and Groudine 1990; Aida et al. 2006). In these
specific examples, pausing is induced by DSIF and the negative elongation factor NELF,
and is likely overcome by P-TEFb guided phosphorylation of Pol II's carboxy terminal
domain, and by the transcript cleavage activity of TFIIS (Saunders et al. 2006).
Genome-scale location analyses of markers for transcription initiation suggest that
Pol II pausing occurs at a large fraction of eukaryotic genes. The promoters of many
protein-coding genes in several human cell lines were recently found to associate with
marks of transcription initiation, including RNA Polymerase II (Pol II), and H3 lysine 4
trimethylated (H3K4me3) nucleosomes (Guenther et al. 2007). Notably, a significant
fraction of genes associated with these initiation marks were not expressed at detectable
levels, suggesting the frequent occurrence of initiation without elongation in human cells
(Guenther et al. 2007).
Genome-wide location analysis of Pol II in Drosophila melanogaster also
supports the notion that a significant fraction of genes are subject to post-initiation
transcriptional regulatory mechanisms. Examination of whole embryos or Drosophila S2
cells showed a promoter proximal enrichment of Pol II at 10-20% of genes (Muse et al.
2007; Zeitlinger et al. 2007). Many of these genes were expressed at low or undetectable
levels, consistent with the observations described above for human cells (Guenther et al.
2007; Muse et al. 2007; Zeitlinger et al. 2007). Further, ontological analysis showed
significant enrichment for genes involved in developmental control or response to
external stimuli, supporting the idea that promoter proximal pausing of Pol II keeps genes
poised to rapidly respond to developmental or environmental changes (Muse et al. 2007;
Zeitlinger et al. 2007). Knockdown of NELF in Drosophila S2 cells results in loss of
promoter proximal Pol II enrichment at most genes, mechanistically linking genome-wide
Pol II promoter enrichment to the previously described Pol II pausing phenomenon
(Muse et al. 2007).
Herein, we describe a class of short RNAs identified in mouse ES cells that we
hypothesize are products of bidirectionally initiated and paused Pol II. These RNAs are
low in abundance, exist independently of Dicer activity, and cluster near transcription
start sites that associate with features of transcription initiation in ES cells. The existence
of these short RNAs suggests that Pol II may initiate bidirectionally at many ES cell
protein-coding genes, potentially as a mechanism that helps maintain chromatin in an
active state.
Results
During the analysis of approximately 300,000 short RNA reads from four mouse
ES cell cultures (described in Chapter 3), we identified a class of short RNAs (TSS-3kb
RNAs) that map in close proximity to the transcription start sites of known protein-coding
genes. Specifically, 4,150 reads map to within ±3kb of the transcription start sites
of 2,637 distinct genes. These RNAs cluster bidirectionally around transcription start
sites in a strikingly non-random distribution (Figure 1A). There is a major peak of
sequences transcribed in the same orientation as their surrounding genes (defined as the
"sense" orientation), located between 0 and 50 nts downstream of transcription start sites.
A similar, but slightly broader, peak of sequences is observed at approximately 250 nts
upstream of transcription start sites. Surprisingly, the sequences that map upstream of
transcription start sites are transcribed in the opposite orientation (defined as the "anti-sense"
orientation) to their associated genes. These metagene profiles do not significantly
change when subpopulations of sequences that map uniquely to the genome or when
sequences that map to a single gene annotation are analyzed (Figure 1B). Further, similar
TSS-3kb metagene profiles were present in 4 independently derived short cDNA
libraries, including a library made from ES cells lacking Dicer, indicating that these
RNAs are common to multiple ES cell lines and are not generated by Dicer processing
(Figure 1C).
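A metagene profile of the kind described above can be built by converting each read to an offset and orientation relative to its nearest transcription start site and then binning. The sketch below is only an illustration of that idea: the input tuple layout, the 50 nt bin size, and the function names are assumptions, not details of the original analysis.

```python
from collections import Counter


def relative_position(read_pos, read_strand, tss, gene_strand):
    """Offset of a read from the TSS in the gene's frame, plus orientation."""
    # Positive offsets are downstream of the TSS with respect to the gene.
    offset = read_pos - tss if gene_strand == "+" else tss - read_pos
    orientation = "sense" if read_strand == gene_strand else "antisense"
    return offset, orientation


def metagene_profile(reads, window=3000, bin_size=50):
    """Count reads per (orientation, bin start) within +/- window of a TSS.

    reads: iterable of (read_pos, read_strand, tss, gene_strand) tuples.
    """
    profile = Counter()
    for read_pos, read_strand, tss, gene_strand in reads:
        offset, orientation = relative_position(
            read_pos, read_strand, tss, gene_strand)
        if -window <= offset <= window:
            # Floor division keeps negative offsets in the correct bin.
            profile[(orientation, (offset // bin_size) * bin_size)] += 1
    return profile


toy_reads = [
    (1020, "+", 1000, "+"),   # 20 nt downstream, sense
    (760, "-", 1000, "+"),    # 240 nt upstream, antisense
    (4980, "-", 5000, "-"),   # 20 nt downstream of a minus-strand gene, sense
    (9000, "+", 1000, "+"),   # outside the 3 kb window, ignored
]
profile = metagene_profile(toy_reads)
```

Plotting the sense and antisense counts per bin would give the two-peak picture described in the text, with sense reads just downstream and antisense reads upstream of the start site.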
The length distribution of TSS-3kb RNAs is broad and extends from 16 to 31
nucleotides with a mean of 21.5 (Figure 2A). The distinct peak at 21.5 nucleotides
suggests that these RNAs do not arise from a process expected to generate a uniform
distribution, as the cDNA library preparation method selected for RNAs between 17 and
30 nts in length.
Comparison to quantification of miRNA levels (Chapter 3) suggests that
individual TSS-3kb RNAs are present at a maximum of five copies per ES cell. Despite
this low abundance, there are common TSS-3kb sequences and associated genes between
cDNA libraries, suggesting a non-random biogenesis. For example, there were 29
identical TSS-3kb sequences present between the two libraries derived from J1 ES cells,
indicating that the site of short RNA biogenesis at each gene is not random. Moreover,
TSS-3kb RNAs associate with common genes between cDNA libraries, supporting the
idea that a specific subset of genes generates promoter proximal short RNAs. 115
common genes were associated with short RNAs between the two J1 ES cell derived
libraries, from a total of 582 and 819 TSS-3kb RNA associated genes, respectively.
From this overlap, we estimate the total cellular pool of TSS-3kb associated genes to be
approximately 4,100 (Methods). Further, the frequency of short RNAs per gene in this
set is approximately 0.2, suggesting that the average initiation site in this set of genes has
a detectable short RNA associated with it approximately 1/5 of the time.
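The Methods describing this estimate are not part of this section. As a hedged illustration, a simple capture-recapture (Lincoln-Petersen) calculation, which is assumed here and may not be the exact procedure used, gives a value close to the one quoted when applied to the two J1-derived libraries.

```python
def estimate_pool(n_first, n_second, n_shared):
    """Lincoln-Petersen estimate of total pool size from two samples."""
    if n_shared <= 0:
        raise ValueError("the two samples must share at least one gene")
    return n_first * n_second / n_shared


# Gene counts from the text: 582 and 819 genes, 115 in common.
pool = estimate_pool(582, 819, 115)
```

The estimate is 582 x 819 / 115, or about 4,145 genes, in line with the approximately 4,100 stated above. The estimator assumes that each library samples genes from the pool independently and with equal probability, which is unlikely to hold exactly for genes that differ in short RNA abundance.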
Their Dicer-independent biogenesis, together with their non-random orientation
around transcription start sites, suggests that the TSS-3kb RNAs are generated by the
bidirectional initiation of Pol II. Notably, the sense peak of TSS-3kb RNAs is located in
a region relative to transcription initiation sites known to frequently harbor paused Pol II
(Saunders et al. 2006; Muse et al. 2007; Zeitlinger et al. 2007). Moreover, the transition
from transcription of a naked DNA template to one wrapped around nucleosomes induces
the pausing of Pol II (Kireeva et al. 2005), and the sense and anti-sense peaks of TSS-3kb
RNAs surround promoter regions known to typically have low nucleosome density
(Lee et al. 2004; Giresi et al. 2007; Ozsolak et al. 2007). These observations raise the
possibility that Pol II accesses regions of low nucleosome density bidirectionally, and
pauses when transcription complexes encounter regions of higher nucleosomal density.
We hypothesize TSS-3kb RNAs to be the products of this putative promoter proximal
pausing of Pol II in mouse ES cells.
TSS-3kb RNAs have no 5' nucleotide bias, in sharp contrast to the 'G' bias seen at
the 5' ends of most Pol II initiation sites. 25% of TSS-3kb RNAs initiate with 'G', as
compared to the 46% of TSS-3kb associated genes, where RNA synthesis initiates with
'G' (Figure 2B). Consistent with the bias observed for TSS-3kb associated genes,
sequences associated with 5' capped mRNAs frequently begin with 'G'. Transcription
start sites have been mapped throughout the mouse genome using a method known as
Cap Analysis Gene Expression (CAGE). This method is based on the preparation and
sequencing of DNA tags derived from the initial 20 nucleotides of 5' capped mRNAs
(Shiraki et al. 2003; Carninci et al. 2005). Considering that CAGE tags cluster around
transcription start sites, we analyzed the 5' nucleotide bias in this population. Similar to
the bias observed for TSS-3kb associated genes, 51% of CAGE tags begin with 'G'
(Figure 2B). The differences between the 5'nucleotide composition of the TSS-3kb
RNAs and 5'nucelotide of Pol II initiation products strongly suggests that the 5' ends of
the former are generated by processing of Pol II transcripts.
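
The first-nucleotide comparison above amounts to tabulating the 5' base of each sequence set. The following is a minimal Python sketch of that tabulation, assuming each set is available as a list of sequence strings. The function name and the toy reads are hypothetical illustrations, not the thesis data or pipeline.

```python
from collections import Counter

def first_nt_fractions(seqs):
    """Return the fraction of sequences beginning with A, C, G, or U."""
    # Treat DNA-style T as U so short RNA reads and cDNA tags are comparable.
    firsts = [s[0].upper().replace("T", "U") for s in seqs if s]
    counts = Counter(firsts)
    total = sum(counts[b] for b in "ACGU")
    if total == 0:
        raise ValueError("no sequences with an A/C/G/U first base")
    return {b: counts[b] / total for b in "ACGU"}

# Hypothetical toy input: one read starting with each base.
toy_reads = ["GACU", "AUGC", "CUGA", "UGCA"]
fractions = first_nt_fractions(toy_reads)
```

Applying the same function to the short RNAs, to the annotated gene start sites, and to the CAGE tags would give three rows that can be compared directly, as in Figure 2B.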
Interestingly, TSS-3kb associated promoters are associated with CpG islands more
often than would be expected by chance. 80% of TSS-3kb associated genes
have a CpG island within 1kb of their transcription start site. This represents a strong
enrichment over the total gene population or randomly selected sets of genes (Methods),
55% and 56±1% of which, respectively, have promoters mapping within 1kb of a CpG
island.
We next performed gene ontology analysis of TSS-3kb associated genes to
determine if they are enriched in notable biological processes (Ashburner et al. 2000;
Beissbarth and Speed 2004). In Drosophila, the pausing of Pol II frequently occurs at
genes involved in responses to developmental or external stimuli (Muse et al. 2007;
Zeitlinger et al. 2007). In contrast, gene ontology analysis of TSS-3kb associated genes
shows no significant enrichment for genes involved in developmental responses or
response to stimuli, although TSS-3kb associated genes are frequently associated with
cellular metabolic processes (Figure 3B).
Supporting the hypothesis that TSS-3kb RNAs are a product of bidirectionally
initiated Pol II, the genes around which they cluster associate with marks of active
transcription. The majority of TSS-3kb associated genes produce full-length transcripts
that are detected in ES cells. For this analysis, we utilized two previously published
mouse ES cell microarrays (Ivanova et al. 2002; Hailesellasse Sene et al. 2007). 74% of
TSS-3kb genes with unambiguous calls on the arrays were present at detectable levels in
ES cells. The remaining 26% of TSS-3kb genes with unambiguous calls on the arrays
were not expressed at detectable levels. Sense and anti-sense short RNAs associated
equally with expressed and non-expressed genes (Figure 4A). Together, these
observations show that, although most TSS-3kb associated genes are expressed at
detectable levels, the presence of a TSS-3kb RNA at a promoter is not directly related to
transcript levels.
To determine if the chromatin structure surrounding TSS-3kb associated genes is
consistent with the presence of a bidirectionally initiating Pol II, we compared TSS-3kb
short RNA coordinates with previously published genome-wide location analyses of
chromatin modifications and associated factors in human ES cells (Lee et al. 2006;
Guenther et al. 2007). In order to perform this analysis, mouse TSS-3kb short RNA
coordinates were converted to their homologous human location using the UCSC liftover
utility LocusLink (http://genome.ucsc.edu/; Kent et al. 2002). After this conversion,
2,000 TSS-3kb short RNA coordinates (60% of the total) mapped uniquely to the human
genome, 1,409 of which were within ±3kb of an annotated human gene TSS and present
on the tiling arrays used for the genome-wide analysis. These coordinates showed the
same distribution with respect to transcription start sites as the total population of TSS-3kb RNAs, indicating that they comprise a representative subset of the larger population
(not shown).
Comparison to the genome-scale chromatin analyses shows that TSS-3kb RNAs
associate with features of active transcription. 95% of converted TSS-3kb short RNA
coordinates overlapped with sites of H3K4me3, and 79% overlapped with chromatin
bound by the initiated form of Pol II, compared to 81±1% and 47±2% for the background
set of coordinates, respectively (Figure 4B). Both of these chromatin marks are
indicative of active transcription. Further, only 5% of converted TSS-3kb coordinates
overlapped with the Polycomb complex component Suz12, as compared to 11±1% for the
background set (Figure 4B). Therefore, consistent with their strong association with
markers of transcription activation, TSS-3kb short RNAs are not preferentially associated
with genes that are repressed by the Polycomb complex.
Discussion
In summary, analysis of ES cell short RNA expression to single molecule depth
revealed a class of low abundance, Dicer-independent short RNAs that cluster non-randomly around transcription start sites of protein-coding genes. Based on previous
work mapping Pol II pause sites and nucleosome density at promoters, we hypothesize
that these RNAs are evidence of widespread bidirectional transcription initiation and
nucleosome-induced pausing of ES cell Pol II. Consistent with this hypothesis, the genes
around which short RNAs cluster associate with features of active transcription in ES
cells, including the presence of H3K4me3 nucleosomes and Pol II at their promoters.
Although it is known that Pol II can pause in the sense direction with respect to its bound
genes, we believe the data presented here to be the first evidence for the presence of a
paused Pol II in the anti-sense direction at many initiation sites. The position of paused,
anti-sense Pol II is inferred from the presence of short RNAs most frequently located ~250
nucleotides upstream of the TSS. Anti-sense short RNAs were observed at almost the
same frequency as sense short RNAs, suggesting that significantly more promoters are
capable of executing bidirectional initiation than previously expected (Li et al. 2006).
Also, genes associated with short RNAs are enriched in CpG island promoters,
suggesting that bidirectional transcription may frequently occur at this promoter
structure.
TSS-3kb RNAs have no first nucleotide bias, suggesting they do not represent the
initial ~22 nucleotides of RNA transcribed by Pol II, but rather have been processed from
the 3' ends of nascent transcripts. There is precedent for the generation of short RNAs
from nascent transcripts during the TFIIS-enhanced release of promoter-proximal Pol II
pause, although the production of anti-sense short RNAs in this process has not been
previously described (Adelman et al. 2005; Kireeva et al. 2005; Galburt et al. 2007).
TFIIS aids in the reversal of Pol II pause by stimulating the intrinsic nuclease activity of
Pol II, resulting in cleavage near the nascent transcript's 3' end, escape from pause, and
transition to productive elongation (Wind and Reines 2000). Initial in vitro studies
showed that TFIIS cleavage typically releases di-nucleotide RNA fragments from nascent
transcripts; however, pauses induced by specific DNA sequences or nucleosomal barriers
generate longer TFIIS cleavage fragments, up to 30 nt (Izban and Luse 1993a; Izban and
Luse 1993b; Adelman et al. 2005; Kireeva et al. 2005). Although plausible, it seems
unlikely that ES cell TSS-3kb RNAs are produced by TFIIS cleavage, as productive
elongation is largely unidirectional and TSS-3kb RNAs exist in both orientations
surrounding transcription start sites. Nevertheless, there is clearly an established link
between the pausing of Pol II near promoters and the generation of short RNA fragments,
and it is possible that a process mechanistically related to TFIIS cleavage is responsible
for TSS-3kb short RNA production.
Alternatively, TSS-3kb RNAs may be generated by exonucleolytic degradation of
nascent transcripts associated with paused Pol II. The 5' to 3' exonuclease Xrn2 is known
to facilitate a pause-dependent degradation of uncapped Pol II transcripts after poly(A)
site cleavage (Gromak et al. 2006), and may be expected to generate ~22 nt TSS-3kb
RNAs if paused Pol II remains associated with nascent transcripts during degradation.
This putative biosynthetic pathway would predict the association of Xrn2 with promoter
regions, and would also require that paused transcripts are decapped, as Xrn2 can only
degrade substrates lacking a 7meG cap structure.
Physical detection of the TSS-3kb short RNAs will likely provide insights into
their mechanisms of biogenesis and biochemical properties. The broad, non-random size
distribution of TSS-3kb short RNAs suggests that they are produced over a large range of
sizes, and peak at ~20 nt in length. Visualization of specific TSS-3kb short RNAs,
perhaps via a hybridization-based protocol, will be necessary to either confirm or refute
this observation. Initial detection attempts using a standard short RNA northern blotting
procedure and large quantities of ES cell RNA have failed to detect TSS-3kb short RNAs
(A.C. Seila, not shown), necessitating the development of more sensitive detection
methods. In addition to size visualization, the establishment of a hybridization-based
detection protocol will allow the determination of the TSS-3kb short RNA end
modifications. The technology used to prepare the analyzed short cDNA libraries relied
on the presence of a 5' phosphate and 3' hydroxyl, and thus the sequenced TSS-3kb RNAs
likely have these end modifications; however, it is possible that related short
RNA species were excluded from the prepared cDNA libraries due to incompatible end
modifications, such as a 7meG cap structure.
Moreover, the physical detection of a transcribing, anti-sense Pol II is needed to
support the hypothesis that Pol II initiates and pauses bidirectionally at thousands of ES
cell genes. The detection of an anti-sense transcription bubble in vivo via potassium
permanganate cleavage assays would be a strong indication of the presence of a
transcriptionally engaged anti-sense Pol II at promoters. Alternatively, genome-scale
location analysis of Pol II using DNA fragmented with micrococcal nuclease may
provide high enough resolution to differentially detect sense and anti-sense Pol II at the
same promoter. Lastly, high-density promoter tiling arrays or large-scale sequencing of
~200 nt RNAs may reveal the larger anti-sense transcripts proposed to be the precursors
of TSS-3kb short RNAs. One such analysis performed in human cells did reveal the
presence of <200 nt anti-sense RNAs near promoters (Kapranov et al. 2007); whether or
not these human short RNAs are related to the process that generates ES cell TSS-3kb
RNAs is unclear.
We hypothesize that the putative bidirectional initiation and pausing of Pol II may
be a mechanism by which ES cell genes are maintained in a state poised for
transcriptional activation. Future work examining the location and polarity of Pol II at ES
cell promoters will be the first step towards proof of this hypothesis. It will also be of
great interest to see if other cell types express short RNAs that similarly cluster around
transcription start sites. Of particular interest are the short RNAs present in Drosophila
S2 cells, a single cell type in which the genome-wide pausing of Pol II is prevalent and
now well documented (Muse et al. 2007). It is possible that bidirectional initiation of Pol
II is an unobserved aspect of previously described pausing phenomena, and that
downstream factors, such as pTEFb, impose the unidirectionality of productive
elongation upon release from pause.
Methods
Short RNA sequencing and identification of TSS-3kb RNAs.
Preparation of short cDNA libraries and 454 read processing are described in Chapter 3
of this thesis. To identify TSS-3kb RNAs, genomic coordinates of previously
uncharacterized short RNAs were compared with all mouse genes annotated in the UCSC
known gene and RefSeq databases (http://genome.ucsc.edu/; Pruitt et al. 2005; Hsu et al.
2006). Those short RNAs within ±3kb of an annotated transcription start site (TSS) were
defined as TSS-3kb RNAs. Based on this definition, a single TSS-3kb RNA can map to
multiple gene annotations for the same gene. The distance to the TSS is defined as the
distance from the TSS to the end of the short RNA closest to the TSS.
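
The ±3kb classification described above can be sketched in a few lines of Python. This is an illustrative sketch only: it assumes 1-based inclusive coordinates on a single chromosome, and the function name, argument layout, and sign convention (negative means upstream of the TSS in the gene's orientation) are hypothetical choices rather than the scripts used for the thesis.

```python
def tss_offset(rna_start, rna_end, rna_strand, tss, gene_strand, window=3000):
    """Return (signed_distance, orientation), or None if outside the window.

    The distance is measured from the TSS to the end of the short RNA that
    lies closest to the TSS, as in the definition given in the text.
    """
    # Pick whichever end of the short RNA is nearest to the TSS.
    nearest = min((rna_start, rna_end), key=lambda p: abs(p - tss))
    distance = nearest - tss
    # Express the distance in the gene's frame: negative means upstream.
    if gene_strand == "-":
        distance = -distance
    if abs(distance) > window:
        return None
    orientation = "sense" if rna_strand == gene_strand else "anti-sense"
    return distance, orientation
```

Because the comparison is made per annotation, a short RNA near several annotated start sites of one gene would be returned once for each of them, which matches the note above that a single TSS-3kb RNA can map to multiple annotations.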
Estimation of total TSS-3kb associated gene number.
The Petersen estimator (Seber 1982) was used to estimate the number of genes associated
with TSS-3kb short RNAs in ES cells, N:
$$N = \frac{n_1 n_2}{m_2}$$
where $n_1$ and $n_2$ are the numbers of TSS-3kb genes sampled in the first and second
libraries, respectively, $m_2$ is the number of TSS-3kb genes found in both libraries,
and $N$ is the estimated total number of TSS-3kb genes. The J1 and Jlaza libraries
contained 582 ($n_1$) and 819 ($n_2$) TSS-3kb genes, respectively. 115 of these genes overlap
between the two independent samplings ($m_2$), suggesting that the total cellular pool
of TSS-3kb associated genes is approximately 4,100.
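
As a check on the arithmetic, the estimate can be reproduced with the library counts quoted above. The short Python sketch below uses a hypothetical function name; only the three counts come from the text.

```python
def petersen_estimate(n1, n2, m2):
    """Two-sample capture-recapture estimate of population size: n1 * n2 / m2."""
    if m2 <= 0:
        raise ValueError("the two samples must share at least one member")
    return n1 * n2 / m2

# Counts quoted in the text: 582 and 819 genes in the two libraries, 115 shared.
estimate = petersen_estimate(582, 819, 115)
```

The result is close to 4,145 genes, which the text rounds to approximately 4,100.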
CpG island overlap, first nucleotide analysis, and Gene Ontology analysis.
CpG island and CAGE coordinates were downloaded from the UCSC genome database
(Karolchik et al. 2003; Karolchik et al. 2004). Genes with transcription start sites located
within 1kb of CpG islands were counted as overlapping. To create background sets of
genes, for each TSS-3kb short RNA a gene on the same chromosome was selected
randomly from the list of UCSC known gene and RefSeq annotations. This
randomization was run 100 times. The CpG island association for the background set
was calculated by taking the average of 100 random runs and the error is defined as the
standard deviation for the 100 random sets. All UCSC known gene and RefSeq
annotations were used in the determination of the first nucleotide of protein-coding
genes. Gene ontology analysis was performed using GOstat, comparing to all genes in
the mouse GO database and using the Benjamini method for multiple testing correction
(Beissbarth and Speed 2004).
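
The background calculation described above (one random same-chromosome gene per short RNA, repeated 100 times, reported as mean and standard deviation) could be sketched as follows. This is a hedged illustration: the function name, the input data structures, the fixed seed, and the use of the sample standard deviation are all assumptions, since the text does not specify them.

```python
import random
import statistics

def background_cpg_fraction(rna_chroms, genes_by_chrom, cpg_genes,
                            n_runs=100, seed=0):
    """Mean and SD of the CpG-island-associated fraction over random gene sets.

    rna_chroms: chromosome of each TSS-3kb short RNA (one entry per RNA).
    genes_by_chrom: dict mapping chromosome -> list of annotated gene IDs.
    cpg_genes: set of gene IDs with a CpG island within 1kb of the TSS.
    """
    rng = random.Random(seed)  # fixed seed is an assumption, for repeatability
    fractions = []
    for _ in range(n_runs):
        # Draw one random gene per short RNA from the same chromosome.
        picks = [rng.choice(genes_by_chrom[c]) for c in rna_chroms]
        fractions.append(sum(g in cpg_genes for g in picks) / len(picks))
    # Sample standard deviation across runs (an assumption, see lead-in).
    return statistics.mean(fractions), statistics.stdev(fractions)
```

Drawing the random gene from the same chromosome as each short RNA keeps the chromosomal distribution of the background set matched to that of the observed set.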
ES cell expression analysis.
Data from previously published mouse ES cell microarrays was used for expression
analysis (Ivanova et al. 2002; Hailesellasse Sene et al. 2007). The data from
Hailesellasse Sene and colleagues consisted of 3 replicates per chip with present, absent
and marginal calls defined for each gene in each replicate (Hailesellasse Sene et al.
2007). For this analysis the three replicates were combined; each gene with three present
calls was defined as present, each gene with three absent calls was defined as absent, and
each gene with different calls or three marginal calls was excluded from the analysis. For
the data collected by Ivanova and colleagues, we utilized the present and absent calls
published in the manuscript (Ivanova et al. 2002). For both data sets, each probe ID in
the Affymetrix data set was associated with the appropriate gene symbol. If more than
one probe ID was associated with a given gene symbol, then the probe with the highest
expression was used to represent that gene symbol.
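
The replicate-combination and probe-selection rules above can be expressed compactly. The Python sketch below is illustrative only: the single-letter call codes ('P', 'A', 'M'), the tuple layout, and the function names are assumptions, not the format of the published array data.

```python
def combine_calls(calls):
    """Combine three replicate detection calls into one call.

    Returns 'present' for three present calls, 'absent' for three absent
    calls, and None (excluded) for mixed calls or three marginal calls.
    """
    if len(calls) != 3:
        raise ValueError("expected exactly three replicate calls")
    if all(c == "P" for c in calls):
        return "present"
    if all(c == "A" for c in calls):
        return "absent"
    return None

def best_probe_per_gene(probes):
    """Keep the highest-expression probe for each gene symbol.

    probes: iterable of (probe_id, gene_symbol, expression) tuples.
    Returns a dict mapping gene_symbol -> (probe_id, expression).
    """
    best = {}
    for probe_id, symbol, expression in probes:
        if symbol not in best or expression > best[symbol][1]:
            best[symbol] = (probe_id, expression)
    return best
```

Genes for which `combine_calls` returns `None` correspond to the ambiguous calls that were left out of the present/absent percentages.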
Comparison to human ChIP-chip
The mm7 TSS-3kb coordinates were mapped to human assembly hg17 using liftover
(http://genome.ucsc.edu/). Of the 3,372 mouse coordinates, 2,000 mapped to the human
genome and 1,599 of these were present on the ChIP-Chip array (Lee et al. 2006). All
transcripts assigned an Entrez-gene ID were used to determine if the TSS-3kb RNA was
within ±3kb of a TSS in both the human and mouse genomes (NCBI; Boyer et al. 2005).
Homologene was used to determine if each mouse TSS-3kb associated gene mapped
uniquely to a human gene (Wheeler et al. 2002;
http://www.ncbi.nlm.nih.gov/HomoloGene). 85% of these genes mapped to human genes
that were proximal to a remapped human TSS-3kb short RNA. Overlap was counted if a
TSS-3kb associated gene fell within 1kb of an enriched region for the factors described
(Lee et al. 2006; Guenther et al. 2007). A background set of short RNAs was created to
compare the enrichment of TSS-3kb RNA genes in H3K4me3, RNA Pol II, and Suz12
with what would be expected by chance. For each TSS-3kb RNA, the distance from
the short RNA to the nearest promoter was determined, a random gene was selected from
the mouse genome, and the sequence of equal distance from that gene was selected. The
coordinates of this random sequence were then mapped to the human genome assembly
hg17 using liftover. Randomizing in mouse instead of in human was done to remove any
bias arising from the liftover process. We then determined if the human gene proximal to
this random sequence was enriched for H3K4me3, RNA Pol II and Suz12 as above. The
enrichment for factor binding was calculated by determining the enrichment mean from
100 random data sets and the error is the standard deviation from the 100 data sets.
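
The distance-matched randomization described above, in which each short RNA is replaced by a sequence at the same offset from a randomly chosen mouse promoter, might be sketched as below. The function name, the gene tuple layout, and the signed-offset convention (negative means upstream in the gene's orientation) are hypothetical; the subsequent liftover step is not shown.

```python
import random

def matched_random_site(offset, length, genes, rng):
    """Place an interval at the same TSS offset next to a randomly chosen gene.

    offset: signed distance from the TSS to the interval end nearest the TSS,
            in the gene's orientation (negative = upstream).
    length: interval length in nucleotides.
    genes:  list of (chrom, tss, strand) tuples to sample from.
    Returns (chrom, start, end) with start <= end.
    """
    chrom, tss, strand = rng.choice(genes)
    sign = 1 if strand == "+" else -1
    # Genomic position that sits at the requested offset in the gene's frame.
    anchor = tss + sign * offset
    # Extend away from the anchor along the gene's orientation.
    start, end = sorted((anchor, anchor + sign * (length - 1)))
    return chrom, start, end

# Hypothetical usage with a seeded generator and a one-gene toy annotation.
rng = random.Random(0)
site = matched_random_site(-250, 22, [("chr1", 1000, "+")], rng)
```

Repeating this for every TSS-3kb RNA yields one random data set; 100 such sets would then give the mean and standard deviation reported for the background.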
Figure Legends
Figure 1. Distribution of short RNAs around transcription start sites of known genes.
(A) Shown is a metagene profile of all TSS-3kb RNAs and their associated distances to
UCSC known gene or RefSeq gene transcription start sites. Counts of TSS-3kb RNA
start positions relative to gene transcription start sites are binned in 50 nucleotide
windows. Red and blue bars represent bins of TSS-3kb RNAs in the sense and anti-sense
orientation with respect to gene transcription, respectively. (B) Metagene profiles for the
67% of all TSS-3kb RNAs that map uniquely to the genome or for the 24% that map to a
single gene annotation. (C) Metagene profiles of TSS-3kb short RNAs in individual
cDNA libraries.
Figure 2. TSS-3kb RNA length and 1st nucleotide distributions. (A) The length
distribution of the TSS-3kb short RNA sequences. (B) 1st nucleotide distribution for the
TSS-3kb short RNAs, TSS-3kb associated genes, and all CAGE tags in the mouse
genome.
Figure 3. CpG island overlap (A) and significantly enriched GO terms (B) for TSS-3kb
associated genes.
Figure 4. TSS-3kb genes tend to associate with features of active transcription. (A)
Metagene profiles of TSS-3kb short RNA locations relative to genes with detectable
(present) or undetectable (absent) expression in ES cells. (B) Fraction of TSS-3kb
associated genes enriched in protein-bound chromatin fragments compared to a random
background.
[Figure 1. Metagene histograms of TSS-3kb RNA counts versus distance to the TSS (-3000 to 3000 nucleotides), with sense and anti-sense orientations plotted separately. Panel labels: unique sites (B); J1, Jlaza, Dicer -/-, and Dicer +/+ libraries (C).]
[Figure 2. (A) Histogram of TSS-3kb short RNA read length (x-axis: read length, 16 to 30 nucleotides).]
(B)
Sequence set                  A     C     G     U
First nt TSS-3kb RNAs         0.27  0.24  0.25  0.24
First nt genes with TSS-3kb   0.25  0.21  0.46  0.09
First nt CAGE elements        0.20  0.16  0.51  0.13
[Figure 3. (A) Bar chart of CpG island overlap (scale 0 to 100) for TSS-3kb genes, all genes, and the random set. (B) Bar chart of enriched GO terms (RNA metabolism, protein metabolism, nucleotide metabolism, cellular metabolic processes); x-axis: significance of enrichment (p-value = 1e-X), 20 to 80.]
[Figure 4. (A) Metagene profiles; x-axis: distance to the TSS (-3000 to 3000). (B) Bar chart (scale 0 to 100) comparing TSS-3kb genes with the random set for H3K4me3, Pol II, and Suz12.]
Adelman, K., Marr, M.T., Werner, J., Saunders, A., Ni, Z., Andrulis, E.D., and Lis, J.T.
2005. Efficient release from promoter-proximal stall sites requires transcript
cleavage factor TFIIS. Mol Cell 17(1): 103-112.
Aida, M., Chen, Y., Nakajima, K., Yamaguchi, Y., Wada, T., and Handa, H. 2006.
Transcriptional pausing caused by NELF plays a dual role in regulating
immediate-early expression of the junB gene. Mol Cell Biol 26(16): 6094-6104.
Ashburner, M., Ball, C.A., Blake, J.A., Botstein, D., Butler, H., Cherry, J.M., Davis,
A.P., Dolinski, K., Dwight, S.S., Eppig, J.T. et al. 2000. Gene ontology: tool for
the unification of biology. The Gene Ontology Consortium. Nat Genet 25(1): 25-29.
Bartel, D.P. 2004. MicroRNAs: genomics, biogenesis, mechanism, and function. Cell
116(2): 281-297.
Beissbarth, T. and Speed, T.P. 2004. GOstat: find statistically overrepresented Gene
Ontologies within a group of genes. Bioinformatics 20(9): 1464-1465.
Boyer, L.A., Lee, T.I., Cole, M.F., Johnstone, S.E., Levine, S.S., Zucker, J.P., Guenther,
M.G., Kumar, R.M., Murray, H.L., Jenner, R.G. et al. 2005. Core transcriptional
regulatory circuitry in human embryonic stem cells. Cell 122(6): 947-956.
Carninci, P., Kasukawa, T., Katayama, S., Gough, J., Frith, M.C., Maeda, N., Oyama, R.,
Ravasi, T., Lenhard, B., Wells, C. et al. 2005. The transcriptional landscape of the
mammalian genome. Science 309(5740): 1559-1563.
Galburt, E.A., Grill, S.W., Wiedmann, A., Lubkowska, L., Choy, J., Nogales, E.,
Kashlev, M., and Bustamante, C. 2007. Backtracking determines the force
sensitivity of RNAP II in a factor-dependent manner. Nature 446(7137): 820-823.
Giresi, P.G., Kim, J., McDaniell, R.M., Iyer, V.R., and Lieb, J.D. 2007. FAIRE
(Formaldehyde-Assisted Isolation of Regulatory Elements) isolates active
regulatory elements from human chromatin. Genome Res 17(6): 877-885.
Gromak, N., West, S., and Proudfoot, N.J. 2006. Pause sites promote transcriptional
termination of mammalian RNA polymerase II. Mol Cell Biol 26(10): 3986-3996.
Guenther, M.G., Levine, S.S., Boyer, L.A., Jaenisch, R., and Young, R.A. 2007. A
chromatin landmark and transcription initiation at most promoters in human cells.
Cell 130(1): 77-88.
Hailesellasse Sene, K., Porter, C.J., Palidwor, G., Perez-Iratxeta, C., Muro, E.M.,
Campbell, P.A., Rudnicki, M.A., and Andrade-Navarro, M.A. 2007. Gene
function in early mouse embryonic stem cell differentiation. BMC Genomics 8:
85.
Hsu, F., Kent, W.J., Clawson, H., Kuhn, R.M., Diekhans, M., and Haussler, D. 2006. The
UCSC Known Genes. Bioinformatics 22(9): 1036-1046.
http://genome.ucsc.edu/.
Ivanova, N.B., Dimos, J.T., Schaniel, C., Hackney, J.A., Moore, K.A., and Lemischka,
I.R. 2002. A stem cell molecular signature. Science 298(5593): 601-604.
Izban, M.G. and Luse, D.S. 1993a. The increment of SII-facilitated transcript cleavage
varies dramatically between elongation competent and incompetent RNA
polymerase II ternary complexes. J Biol Chem 268(17): 12874-12885.
Izban, M.G. and Luse, D.S. 1993b. SII-facilitated transcript cleavage in RNA polymerase II complexes stalled early
after initiation occurs in primarily dinucleotide increments. J Biol Chem 268(17):
12864-12873.
Kapranov, P., Cheng, J., Dike, S., Nix, D.A., Duttagupta, R., Willingham, A.T., Stadler,
P.F., Hertel, J., Hackermuller, J., Hofacker, I.L. et al. 2007. RNA maps reveal
new RNA classes and a possible function for pervasive transcription. Science
316(5830): 1484-1488.
Karolchik, D., Baertsch, R., Diekhans, M., Furey, T.S., Hinrichs, A., Lu, Y.T., Roskin,
K.M., Schwartz, M., Sugnet, C.W., Thomas, D.J. et al. 2003. The UCSC Genome
Browser Database. Nucleic Acids Res 31(1): 51-54.
Karolchik, D., Hinrichs, A.S., Furey, T.S., Roskin, K.M., Sugnet, C.W., Haussler, D., and
Kent, W.J. 2004. The UCSC Table Browser data retrieval tool. Nucleic Acids Res
32(Database issue): D493-496.
Kent, W.J., Sugnet, C.W., Furey, T.S., Roskin, K.M., Pringle, T.H., Zahler, A.M., and
Haussler, D. 2002. The human genome browser at UCSC. Genome Res 12(6):
996-1006.
Kireeva, M.L., Hancock, B., Cremona, G.H., Walter, W., Studitsky, V.M., and Kashlev,
M. 2005. Nature of the nucleosomal barrier to RNA polymerase II. Mol Cell
18(1): 97-108.
Lee, C.K., Shibata, Y., Rao, B., Strahl, B.D., and Lieb, J.D. 2004. Evidence for
nucleosome depletion at active regulatory regions genome-wide. Nat Genet 36(8):
900-905.
Lee, T.I., Jenner, R.G., Boyer, L.A., Guenther, M.G., Levine, S.S., Kumar, R.M.,
Chevalier, B., Johnstone, S.E., Cole, M.F., Isono, K. et al. 2006. Control of
developmental regulators by Polycomb in human embryonic stem cells. Cell
125(2): 301-313.
Li, Y.Y., Yu, H., Guo, Z.M., Guo, T.Q., Tu, K., and Li, Y.X. 2006. Systematic analysis
of head-to-head gene organization: evolutionary conservation and potential
biological relevance. PLoS Comput Biol 2(7): e74.
Muse, G.W., Gilchrist, D.A., Nechaev, S., Shah, R., Parker, J.S., Grissom, S.F.,
Zeitlinger, J., and Adelman, K. 2007. RNA polymerase is poised for activation
across the genome. Nat Genet.
NCBI. Homologene. In.
Ozsolak, F., Song, J.S., Liu, X.S., and Fisher, D.E. 2007. High-throughput mapping of
the chromatin structure of human promoters. Nat Biotechnol 25(2): 244-248.
Pruitt, K.D., Tatusova, T., and Maglott, D.R. 2005. NCBI Reference Sequence (RefSeq):
a curated non-redundant sequence database of genomes, transcripts and proteins.
Nucleic Acids Res 33(Database issue): D501-504.
Ptashne, M. and Gann, A. 1997. Transcriptional activation by recruitment. Nature
386(6625): 569-577.
Rougvie, A.E. and Lis, J.T. 1990. Postinitiation transcriptional control in Drosophila
melanogaster. Mol Cell Biol 10(11): 6041-6045.
Saunders, A., Core, L.J., and Lis, J.T. 2006. Breaking barriers to transcription elongation.
Nat Rev Mol Cell Biol 7(8): 557-567.
Seber, G. 1982. The Estimation of Animal Abundance and Related Parameters. Arnold,
London.
Shiraki, T., Kondo, S., Katayama, S., Waki, K., Kasukawa, T., Kawaji, H., Kodzius, R.,
Watahiki, A., Nakamura, M., Arakawa, T. et al. 2003. Cap analysis gene
expression for high-throughput analysis of transcriptional starting point and
identification of promoter usage. Proc Natl Acad Sci U S A 100(26): 15776-15781.
Spencer, C.A. and Groudine, M. 1990. Transcription elongation and eukaryotic gene
regulation. Oncogene 5(6): 777-785.
Tolia, N.H. and Joshua-Tor, L. 2007. Slicer and the argonautes. Nat Chem Biol 3(1): 36-43.
Wheeler, D.L., Church, D.M., Lash, A.E., Leipe, D.D., Madden, T.L., Pontius, J.U.,
Schuler, G.D., Schriml, L.M., Tatusova, T.A., Wagner, L. et al. 2002. Database
resources of the National Center for Biotechnology Information: 2002 update.
Nucleic Acids Res 30(1): 13-16.
Wind, M. and Reines, D. 2000. Transcription elongation factor SII. Bioessays 22(4): 327-336.
Zeitlinger, J., Stark, A., Kellis, M., Hong, J.W., Nechaev, S., Adelman, K., Levine, M.,
and Young, R.A. 2007. RNA polymerase stalling at developmental control genes
in the Drosophila melanogaster embryo. Nat Genet.
Chapter 5
Examining miRNA function in mouse embryonic stem
cells
The experiments described in this chapter were performed with the help of Arvind Ravi,
Grace X. Zheng, Stuart S. Levine, and Charlie Whittaker.
Introduction
The work described in the preceding chapters of this thesis suggests that miRNAs
are the sole regulatory short RNAs that function through the RNAi pathway in mouse
embryonic stem cells. Here, a set of preliminary experiments is described that lay the
foundation for future studies of miRNA function in mouse ES cells. Examination of
large-scale sequencing data shows that although most miRNA genes express a single
mature miRNA species, a minority of miRNA genes are alternatively processed to
generate multiple mature miRNAs expected to each carry distinct regulatory potential.
miRNAs were also shown to repress artificial target genes in ES cells in a manner that
correlated linearly with their log-2 expression values, suggesting that the most highly
expressed miRNAs dominate miRNA-mediated regulation in ES cells. Consistent with
this hypothesis, preliminary experiments indicate that miRNA abundance is a useful
predictor of endogenous miRNA target repression. Finally, the initial characterization of
the growth defect induced by deletion of Dicer in ES cells is described. Steps towards
the establishment of an experimental system in which this growth defect can be
efficiently studied are also described. mRNA expression profiling of cells with and
without functional Dicer shows, surprisingly, that loss of all miRNAs does not
significantly alter mRNA expression profiles in ES cells.
Results and discussion
Alternate miRNA processing in ES cells.
The extent of base pairing between the 5' ends of miRNAs and their targets is a
critical determinant of miRNA specificity and activity in target repression. 3' end
complementarity appears generally less important for target repression (Lewis et al.
2003; Doench and Sharp 2004; Brennecke et al. 2005; Lewis et al. 2005). We therefore
examined the variability of ES cell miRNA processing using the cDNA libraries
described in Chapter 3 of this thesis. Because of the importance of the 5' end in target
recognition, miRNA genes that express significant alternate processing products are
expected to regulate additional target genes beyond that predicted by a single mature
miRNA species.
The starting point for this analysis was the set of 153 miRNA genes encoded by a
single locus that had at least 30 aligning reads from the 4 ES cell cDNA libraries
described in Chapter 3 of this thesis. Because many mature miRNAs are encoded by
multiple genomic locations, and the pre-miRNA hairpins of these paralogous miRNA
genes often have sequence differences between genomic locations, it is possible that
miRNA paralogues are subject to differential processing. Therefore, mature miRNA end
variability for paralogous miRNAs cannot be unambiguously assigned to a single miRNA
hairpin, and these miRNA genes were excluded from the analysis described below. Also,
hairpins with fewer than 30 aligning reads were not considered in the analysis so as to
increase confidence in observed alternate processing products.
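
The filtering and end-variability tally described above can be illustrated with a short Python sketch. It assumes that, for each single-locus hairpin, the 5' start positions of all aligning reads are available as a list of integers. The function name and input format are hypothetical, and only the 30-read threshold is taken from the text.

```python
from collections import Counter

def five_prime_profile(read_starts, min_reads=30):
    """Fraction of reads beginning at each 5' position of one hairpin.

    Returns None when the hairpin has fewer than min_reads aligning reads,
    mirroring the exclusion of under-sampled hairpins described in the text.
    Positions are returned in order of decreasing read count.
    """
    total = len(read_starts)
    if total < min_reads:
        return None
    counts = Counter(read_starts)
    return {pos: n / total for pos, n in counts.most_common()}
```

A hairpin whose profile has a second 5' start position carrying a substantial share of the reads would be a candidate for alternate 5' processing; what counts as substantial is a threshold the analyst would need to choose.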
The major miRNA produced from 64 of the 153 genes examined differed from
the annotated sequence in the current release of the miRNA database (miRBase 10.0)
(Griffiths-Jones et al. 2006). A full list of the differences is located at
http://luria.mit.edu/caw web/ccr-bcc/mauro/miRBase-diffs.xls. The majority of these
differences were 1 or 2 nucleotide additions or deletions at the miRNA 3' end, and thus
are not expected to significantly alter miRNA function. 13 miRNA genes showed
significant 5' alternate processing compared to the annotated sequence in miRBase 10.0
(see Figure 1B for selected examples). The hairpins encoding miRNAs 292, 363, and
367 each produce two major products from their 3' arms, and the hairpin encoding miR-142 produces two major products from both its 5' and 3' arms. The alternate products
produced by miR-142 have also been identified in primary T-cells, indicating these
processing events are not ES cell specific (Wu et al. 2007). Because of the importance of
5' sequence in miRNA-based repression, these shifted miRNA pairs are expected to
repress expression of different sets of genes, although this hypothesis has not yet been
experimentally tested.
Additionally, there were 12 miRNA genes whose major mature miRNA came
from the hairpin arm opposite from the annotated sequence in miRBase 10.0.
Interestingly, for the two examples shown in Figure 1C, the ES cell strand bias is also the
opposite of what has been observed in a survey of miRNA expression from various
mouse tissues (Landgraf et al. 2007). For example, miR-154 has a 9-fold bias towards
expression from the 3' arm of its pre-miRNA hairpin in ES cells, whereas the sum of
miR-154 expression in various mouse tissues indicates a ~3-fold bias towards 5' arm
expression (Landgraf et al. 2007). Similarly, miR-411 has a 9-fold 3' arm bias in ES
cells, whereas the tissue panel of mouse miRNA expression indicates a 12-fold 5' arm
bias (Landgraf et al. 2007). If these sequencing results are verified by an independent
method of miRNA quantification, they would suggest the existence of a novel pathway
that directs the differential expression of mature miRNAs from pre-miRNA hairpin arms.
miRNA expression levels correlate with repressive capacity.
Mammalian miRNAs are thought to be equally incorporated into different
Argonaute-containing RISCs (Liu et al. 2004), and are presumed to function by
translationally inhibiting or cleaving target mRNAs. The large-scale sequencing data
described in Chapter 3 shows that approximately 300 miRNAs are expressed in ES cells
at levels that span approximately 3 orders of magnitude (Figure 2A). It is not yet known
if other mammalian cell types express similar numbers of miRNAs. Large-scale
sequencing data from 6 different human colorectal cell lines potentially reveals fewer
distinct miRNAs expressed per cell (Figure 2B); however, differences in the number of
sequences per cDNA library preclude an accurate comparison to the ES cell cDNA
libraries (Cummins et al. 2006). It currently is unclear how the many ES cell miRNAs
affect gene expression. Specifically, it is unknown at what level of expression a miRNA
is expected to carry significant regulatory capacity.
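The library-size caveat above can be made concrete with a small calculation. The following Python sketch uses the miRNA counts and library sizes listed with Figure 2B; the per-library colorectal size is the approximation described in the Figure 2 legend (total reported reads divided by six libraries), and the function name is illustrative rather than taken from the thesis.

```python
# Reads per distinct miRNA for selected libraries, using values listed with Figure 2B.
# The function name is illustrative and not part of the original analysis.

def reads_per_distinct_mirna(library_size, distinct_mirnas):
    """Average number of sequence reads per distinct miRNA detected."""
    return library_size / distinct_mirnas

# Colorectal library size is approximated as total reported reads / number of libraries.
colorectal_size = 266430 / 6

libraries = {
    # name: (library size in reads, distinct miRNAs detected)
    "J1": (104220, 306),
    "380N": (colorectal_size, 156),
    "SW480": (colorectal_size, 125),
}

for name, (size, n_mirnas) in libraries.items():
    depth = reads_per_distinct_mirna(size, n_mirnas)
    print(f"{name}: {n_mirnas} distinct miRNAs, {depth:.0f} reads per distinct miRNA")
```

Because the J1 library is more than twice the size of the approximated colorectal libraries, the raw counts of distinct miRNAs are not directly comparable between them.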
As a first step towards addressing this question, ES cell miRNAs expressed at
varying levels were tested for their ability to repress an artificial miRNA target. Renilla
luciferase reporters containing two perfectly complementary miRNA binding sites in
their 3' UTRs were co-transfected into Dicer+/+ and Dicer-/- ES cells with an un-targeted
firefly luciferase to normalize for transfection efficiency. miRNA-mediated repression
was estimated by comparing normalized targeted reporter expression to that of a reporter
lacking miRNA-binding sites. The miRNAs tested, along with their estimated ES cell
expression from Chapter 3, are shown in Table 1. In Dicer+/+ ES cells, reporter
expression correlated inversely with the log-2 transformation of miRNA expression
values, consistent with the hypothesis that miRNA expression levels correlate with
repressive capacity (Figure 3A). This inverse correlation was absent in Dicer-/- ES cells,
indicating it is dependent on miRNA expression (Figure 3B). Further, the correlation
coefficient between reporter repression and miRNA expression increased when Dicer+/+
relative expression was normalized to Dicer-/- relative expression (Figure 3C). The
observed increase in correlation suggests that comparison between the Dicer+/+ and
Dicer-/- cell lines
introduces an additional level of normalization, perhaps normalizing for variability in
DNA plasmid quality. Together, these results show that the most highly expressed
miRNAs have the largest regulatory capacity in ES cells.
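The relationship described above can be examined directly from the values in Table 1. The sketch below is a minimal Python illustration, assuming an ordinary least-squares fit of relative reporter expression (WT/KO) against the log-2 of miRNA molecules per cell; the thesis does not state which fitting routine was used, and the variable and function names are illustrative.

```python
import math

# Table 1 values: miRNA molecules per ES cell and relative 2x perfect reporter
# expression (WT/KO), listed in the same row order as the table.
molecules_per_cell = [5069, 4435, 4082, 3918, 3872, 2272, 1199,
                      528, 431, 384, 172, 147, 123]
relative_expression = [0.05, 0.16, 0.03, 0.06, 0.29, 0.37, 0.21,
                       0.41, 0.56, 0.54, 0.72, 0.78, 0.71]

def linear_fit(x, y):
    """Ordinary least-squares fit of y on x; returns (slope, intercept, r_squared)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared

log2_copies = [math.log2(m) for m in molecules_per_cell]
slope, intercept, r_squared = linear_fit(log2_copies, relative_expression)

# Fold repression (KO/WT) is the inverse of relative expression (WT/KO).
fold_repression = [1 / e for e in relative_expression]

print(f"slope = {slope:.3f}, R^2 = {r_squared:.2f}")
```

A negative slope corresponds to the inverse relationship between reporter expression and miRNA abundance, and the resulting R² can be compared with the values reported in Figure 3.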
Although the correlation between miRNA expression and reporter repression is
high, there are apparent exceptions. miR-467a is expressed at levels similar to miR-293
and miR-295, but the miR-467a reporter is repressed 3-5 fold less than the miR-293 and
295 reporters (Table 1). Similarly, miR-669a is expressed at similar levels to miR-19b,
but represses reporter expression approximately 4-fold less effectively (Table 1). The
significance of these differences in repressive capacity is currently unclear. One potential
hypothesis is that miR-467a and 669a have many more endogenous targets than miRNAs
expressed at similar levels, and so fewer active RISCs containing these miRNAs are
available for repression of the transfected reporter. Alternatively, because the reporter
assay appears to be predominantly measuring mRNA cleavage and not translational
repression (see explanation below), miR-467a and 669a may not be loaded as efficiently
into Ago2-containing complexes, which are the only RISCs capable of catalyzing
complementary mRNA cleavage.
Because most miRNAs likely function by binding with imperfect
complementarity to their target mRNAs, selected miRNAs from Table 1 were also tested
for their ability to repress reporters containing two imperfect binding sites in their 3'
UTRs (Table 1, underlined rows; Figure 3, red dots). The apparent dynamic range of
repression in this assay was low compared to the assay done with perfect reporters. Only
the miR-293 and 295 bulged sites induced more than 2-fold repression, again suggesting
that only the most abundant miRNAs have the potential to significantly alter target gene
expression. Reporters with imperfectly complementary sites to miRNAs 16 and 21 were
approximately 3-fold less repressed compared to reporters with perfectly complementary
sites. Similarly, reporters with imperfectly complementary sites to miRNAs 293 and 295
were approximately 6-fold less repressed compared to reporters with perfectly
complementary sites. These differences suggest that the repression of reporters with
perfect miRNA binding sites is predominantly due to the perfectly matching miRNA and
not related miRNAs with matching seeds.
miRNA abundance is likely a useful predictor of endogenous target repression.
Given the correlation between miRNA expression levels and repressive capacity,
we examined whether the differential expression of luciferase reporters containing
endogenous 3' UTRs between Dicer+/+ and Dicer-/- ES cells would correlate with the
abundance of the miRNAs targeting those UTRs. For this analysis, twelve 3' UTRs over
a range of targeting abundance (defined below) were cloned into the 3' UTR of Renilla
luciferase and assayed for their ability to confer repression as in Figure 3C. The assayed
3' UTRs are listed in Table 2. miRNA:mRNA pairs listed in the TargetScan database
were utilized to determine a list of miRNA binding sites per 3' UTR (Grimson et al.
2007). Contained in the database are all instances of miRNA "seed" matches to
annotated Refseq 3' UTRs, defined as perfect complementarity between the mRNA and
bases 2-7 of the miRNA. To determine targeting abundance, the number of miRNA
molecules per ES cell was summed for all annotated seed matches over each 3' UTR. To
reduce noise from low abundance miRNAs, only miRNAs expressed at greater than 200
copies per ES cell were considered. When considering all binding sites to miRNAs
expressed at greater than 200 molecules per ES cell, no strong correlation between
targeting abundance and repression was observed for the UTRs tested (Figure 4A).
Surprisingly, when considering only conserved miRNA binding sites (i.e. those
conserved in human, mouse, rat, and dog), the correlation coefficient between repression
and targeting abundance, as assessed via linear regression analysis, increased about
10-fold (Figure 4B). These results are preliminary, and more UTRs need to be tested before
confident interpretation; however, a potentially similar difference in functionality
between conserved and non-conserved sites was noted in previous studies analyzing
miRNA-mediated mRNA down-regulation (Lim et al. 2005; Grimson et al. 2007;
Nielsen et al. 2007).
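The targeting abundance calculation described above can be sketched as follows. This Python example is illustrative only: the thesis took seed matches from the TargetScan database rather than searching sequences directly, and the summation of log-2 transformed copy numbers follows the wording of the Figure 4 legend. The miRNA names, sequences, copy numbers, and UTR below are hypothetical, and counting each site separately is an assumption.

```python
import math

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def seed_match_site(mirna_seq):
    """mRNA 6-mer (5' to 3') perfectly complementary to miRNA bases 2-7."""
    seed = mirna_seq.upper().replace("T", "U")[1:7]
    return "".join(COMPLEMENT[base] for base in reversed(seed))

def count_sites(utr_seq, site):
    """Count occurrences of a site in the UTR, allowing overlaps."""
    utr = utr_seq.upper().replace("T", "U")
    width = len(site)
    return sum(1 for i in range(len(utr) - width + 1) if utr[i:i + width] == site)

def targeting_abundance(utr_seq, mirnas, min_copies=200):
    """Sum log2(molecules per cell) over every seed match in the UTR,
    skipping miRNAs expressed at or below min_copies per cell."""
    total = 0.0
    for seq, copies in mirnas.values():
        if copies <= min_copies:
            continue
        total += count_sites(utr_seq, seed_match_site(seq)) * math.log2(copies)
    return total

# Hypothetical miRNAs (sequence, molecules per cell) and a hypothetical UTR.
toy_mirnas = {
    "toy-miR-A": ("UACGGAUCCAGUUCAGGACUAA", 1024),
    "toy-miR-B": ("UGCAUUGACCGGAUAACGUCCA", 150),   # below the 200-copy cutoff
    "toy-miR-C": ("AUUCGCAAGGCUAGUCCAUGGA", 4096),
}
toy_utr = "GGAUCCGUAAACAAUGCUUUGCGAAGGAUCCGUCC"

# Two toy-miR-A sites (log2 1024 = 10 each) plus one toy-miR-C site (log2 4096 = 12).
print(targeting_abundance(toy_utr, toy_mirnas))
```

Restricting the calculation to conserved sites, as in Figure 4B, would amount to filtering the site list before the summation; that filter depends on cross-species alignments and is not shown here.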
In the assay described in Figure 4, the simplest interpretation of a difference in
normalized reporter expression between Dicer+/+ and Dicer-/- ES cells is that the inserted
UTR represses luciferase expression in a miRNA-dependent manner. The high degree of
correlation between miRNA expression and repression described in Figure 3 argues that
this simple interpretation is at least partially valid; however, the UTRs assayed in Figure
4 ranged in length between 1,000 and 1,700 base pairs, and so potentially contain
non-miRNA sequence elements that may be differentially affected between Dicer+/+ and
Dicer-/- ES cells. These putative non-miRNA effects may at least partially explain the
imperfect correlation between conserved targeting abundance and WT/KO expression
ratio.
Characterizing the phenotypic effects of ES cell Dicer deletion.
It is clear from previously published work and the results presented in this thesis
that ES-like cells can survive in the absence of Dicer (Kanellopoulou et al. 2005;
Murchison et al. 2005). However, although the cells that survive Dicer deletion appear
ES-like in morphology and express normal levels of the pluripotency markers Oct4 and
Nanog, they are incapable of differentiating in all assays tested and so are no longer
functionally stem cells (Kanellopoulou et al. 2005; Murchison et al. 2005; Leung et al.
2006). This loss of pluripotency is likely partly due to the necessity of miRNAs
expressed in differentiated cell types. However, miRNAs almost certainly have cell
autonomous functions in ES cells, as ES cells express a specific set of miRNAs
(Houbaviy et al. 2003), and Dicer deletion results in an acute loss in ES cell growth rate
(described in Figure 5 below).
The cell growth rate of clonal ES cell lines was monitored for three weeks
immediately following deletion of Dicer activity with Cre recombinase. Three different
Dicer conditional ES cell lines were used in the analysis: the Dicer+/+ ES cells described
in Chapter 3, and two Dicer conditional ES lines derived from a cross between Dicer"/
(Harfe et al. 2005) and Dicerfl+/Cre-ERT2 mice, referred to below as cell lines 2FA and
2FC. To monitor growth rate, Dicer conditional ES lines were transfected with Cre
recombinase, plated at clonal density 24 hours post-transfection, and 6 single colonies
were picked from each cell line for further analysis. Because of the clonal selection, the
need to grow small numbers of ES cells on feeder cells, and the growth delay induced by
Dicer loss, reliable growth counts were not obtained for all cell lines until 15 days post
Cre-treatment. At final genotyping, 5 Dicer deletion lines were obtained, 3 from parental
line 2FA and 2 from parental line 2FC. No deletion lines from the Dicer- l ES cell line
were obtained; however, the growth recovery kinetics from the 2FA and 2FC Dicer
deletion lines were similar to what was previously observed for the Dicer+'- deletion
lines (J.M.C. and P.A.S., unpublished).
The 5 deletion lines were compared in growth rate and morphology to 9 ES cell
colonies picked from the same transfections that remained Dicer conditional, as assessed
via genotyping PCR (not shown). The averaged growth rate of the deletion lines was
approximately 2-3 fold lower when compared to the conditional lines, up to 18 days post
Cre-treatment (Figure 5A). This difference in growth rate did not appear to be due to
excessive cell death, as the Dicer deletion lines had normal ES cell morphology (Figure
5B for selected examples). After Day 18, the deletion lines showed a coordinated growth
recovery, and at 3 weeks post Cre-treatment, their average growth rate was 21 hours per
doubling as compared to the 16 hours per doubling of the control Dicer conditional ES
lines (Figure 5A). The near simultaneous recovery of several clonal deletion lines argues
against bypass of the growth defect by an acquired DNA mutation, as one would expect
this to occur at random times throughout the 3 week time course. Instead, it seems more
likely that an accumulated change in signaling or epigenetic state results in the growth
recovery of Dicer deletion ES cells.
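The "hours per doubling" values quoted above follow from two cell counts and the time between them. The Python sketch below assumes simple exponential growth between counts and subtracts the count from MEF-only wells to estimate the ES cell contribution; the exact procedure used in the thesis is not specified, and the function name and example counts are hypothetical.

```python
import math

def doubling_time_hours(count_start, count_end, elapsed_hours,
                        mef_start=0.0, mef_end=0.0):
    """Hours per doubling, assuming exponential growth between two counts.

    Counts from MEF-only wells are subtracted to estimate the number of
    ES cells at each time point (an assumed correction).
    """
    es_start = count_start - mef_start
    es_end = count_end - mef_end
    if es_start <= 0 or es_end <= es_start:
        raise ValueError("counts must show net ES cell growth")
    return elapsed_hours * math.log(2) / math.log(es_end / es_start)

# Hypothetical example: 1.1e5 -> 4.1e5 total cells over 48 hours,
# with 1e4 feeder cells present in both counts.
hours = doubling_time_hours(1.1e5, 4.1e5, 48, mef_start=1e4, mef_end=1e4)
print(f"{hours:.1f} hours per doubling")
```

In this example the ES cell number increases four-fold in 48 hours, which corresponds to about 24 hours per doubling.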
Because a clear understanding of the growth defect induced by Dicer deletion is
important for the understanding of ES cell miRNA function, attempts were made to
establish a system in which Dicer deletion could be induced efficiently in a population of
ES cells. Such a system would allow a characterization of the acute effects induced by
ES cell miRNA loss. ES cell lines were derived that were homozygous for the floxed
allele of Dicer used throughout this thesis (Harfe et al. 2005), and heterozygous for an
allele conferring tamoxifen-inducible Cre expression. This tamoxifen-inducible allele
expresses Cre fused to the estrogen-binding domain of the estrogen receptor (ER), such
that until the ER agonist tamoxifen is added to cells, Cre remains anchored
non-functionally in the cytoplasm. This Cre-ER fusion has been shown to induce deletion of
various floxed alleles upon tamoxifen addition at close to 100% efficiency in ES cells
(Vallier et al. 2001). Dicere +mice harboring a single integration of the Cre-ER fusion
allele in the Rosa26 locus (constructed by M. E. McLaughlin in the laboratory of T.
Jacks) were obtained from M.S. Kumar (laboratory of T. Jacks) and crossed to Dicert
mice to generate ES cell lines.
Treatment of several different Cre-ER ES cell lines with varying levels of
tamoxifen resulted in inefficient deletion of Dicer, such that no population-wide growth
defect was detectable. At the highest level of tamoxifen treatment, 1.5 µM for 3 days,
approximately 50% deletion of the floxed Dicer allele was detectable (A. Ravi, not
shown). This poor efficacy of Dicer deletion compared to previously published (Vallier
et al. 2001) and unpublished (M.E. McLaughlin) studies using the same Cre-ER allele is
likely due to the strong selection against Dicer loss in ES cells coupled with the low
amount of Cre-ER expression driven by the Rosa26 promoter. Higher deletion frequency
may be achieved in future studies by expressing the Cre-ER fusion protein off of a strong
ES cell promoter, such as the synthetic CAGGS promoter (Niwa et al. 1991).
Co-transfection of Cre and GFP followed by sorting of transfected cells was a
much more effective strategy for Dicer deletion, resulting in polyclonal Dicer deletion
populations that were close to 90% pure. Dicer conditional ES cells were transfected and
sorted for GFP positive cells 24 hours after transfection. PCR genotyping, Dicer western
blotting, and short RNA northern blotting for an abundant ES cell miRNA indicate that
FACS-sorted ES cells have approximately a 90% reduction in miRNA levels 5 days post-Cre
transfection (Figure 6A). Consistent with the described growth defect of Dicer
deletion cells, there is a selection against Dicer loss after day 5 post-Cre transfection. At
9 days post-Cre, miRNA and Dicer protein levels have almost returned to pre-deletion levels
(Figure 6A). It is currently unclear whether many Dicer null ES cells die after day 5, or
whether they are simply out-competed by faster growing Dicer conditional cells.
To gain insight into the potential change in ES cell identity induced by Dicer loss,
microarray analysis was performed on mRNA from days 0, 5 and 9 post-Cre transfection
(Figure 6A), and from 3 clonal Dicer null ES lines cultured for several months after
Dicer deletion (Figure 6B). Replicate microarrays have not yet been performed and thus
differences between expression of individual genes cannot be interpreted from the arrays
shown in Figure 6B. However, it is clear that loss of Dicer from ES cells does not result
in a major change in ES cell state. The 6 expression profiles are remarkably similar, with
the two most divergent samples having a Pearson-correlation of 0.95 (Figure 6C). These
results are preliminary but intriguing, suggesting that miRNAs do not have an essential
role in governing ES cell expression profiles. Rather, specific transcription factors, likely
Oct4, Sox2, Nanog, and potentially others, appear to be the dominant regulators of ES
cell identity. Considering these results, it seems likely that the function of the major
miRNAs expressed in ES cells is to fine-tune the hard-wired output of these transcription
factors, potentially as a means of optimizing the signaling pathways that govern rapid
self-renewal.
Methods
Plasmid construction and transfection.
2x miRNA sites or single 3' UTRs were cloned into the XhoI-ApaI or XhoI-NotI sites in
the pRL-CMV-FLAG 3' UTR from (Petersen et al. 2006). Transfections were performed
as in (Calabrese and Sharp 2006). 2x bulged miRNA reporters were head-to-tail
insertions of 2 copies of the sequence complementary to the annotated miRNA with an
imperfect match from bases 9-13, usually comprised of TTTTT. 2x perfect reporters
were head-to-tail insertions of 2 copies of the sequence complementary to the annotated
miRNA. For 3' UTR reporters, sequence was obtained from the TargetScan database,
and primers were designed that amplified all miRNA sites in each UTR, extending into
flanking genomic sequence when necessary. Exact sequences of DNA oligos used to
construct reporter plasmids are available upon request.
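The layout of the 2x reporter inserts described above can be sketched in Python. This is an illustration under stated assumptions, not the actual oligo design: "bases 9-13" is taken to mean positions numbered from the 5' end of the miRNA, the two copies are joined without a spacer, and the miRNA sequence is hypothetical. The function names are illustrative.

```python
# Complement table accepting RNA or DNA input and returning DNA bases.
DNA_COMPLEMENT = {"A": "T", "U": "A", "T": "A", "G": "C", "C": "G"}

def perfect_site(mirna_seq):
    """DNA sequence (5' to 3') fully complementary to the miRNA."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(mirna_seq.upper()))

def bulged_site(mirna_seq, first=9, last=13, filler="TTTTT"):
    """Perfect site with the bases opposite miRNA positions first..last replaced."""
    site = perfect_site(mirna_seq)
    n = len(site)
    start = n - last      # 0-based index of the base opposite miRNA position `last`
    end = n - first + 1   # exclusive end, just past the base opposite position `first`
    return site[:start] + filler + site[end:]

def two_copy_insert(site):
    """Head-to-tail arrangement: two copies in the same orientation (no spacer assumed)."""
    return site + site

# Hypothetical 22-nt miRNA with no A at positions 9-13, so TTTTT cannot pair there.
toy_mirna = "UACGGAUCCUCUCCAGGACUAA"

print(two_copy_insert(perfect_site(toy_mirna)))
print(two_copy_insert(bulged_site(toy_mirna)))
```

A TTTTT filler would still pair with any A at miRNA positions 9-13, so a different filler would be needed for such miRNAs; this may be why the filler is described as only usually TTTTT.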
ES cell culture and biological assays.
ES cell culture and northern blots were performed as in (Calabrese and Sharp 2006).
Western blots and genotyping were performed as in (Calabrese et al. 2007). RNA was
prepared using Trizol (Invitrogen). For cell counts, ES cells were plated in wells of
24-well plates on top of approximately 1 x 10^4 lethally irradiated MEFs, and counted using a
hemacytometer. To estimate MEF contribution, MEF-only wells were also counted at
each trypsinization. For array analysis, 5 µg of RNA from each time point was
hybridized to the Affymetrix 430 2 chip. Expression data were summarized using
GCRMA (http://www.bioconductor.org), and hierarchical clustering was performed
using Spotfire software.
Table and Figure legends.
Table 1. 2x perfect miRNA reporters used in Figure 3. Also shown are the molecules
per Dicer+/+ ES cell of each miRNA, along with the relative reporter expression in
Dicer+/+ compared to Dicer-/- ES cells (WT/KO) and the inverse of this ratio, or fold
repression of each miRNA reporter (KO/WT). The miR-467a and miR-669a constructs were
made by G.X. Zheng.
Table 2. 3' UTR reporters used in Figure 4, along with per UTR targeting abundance,
considering either all binding sites (All sites log2 mpc), or only those sites conserved in
mouse, rat, human, and dog (Conserved log2 mpc). All summations only consider those
miRNAs expressed at greater than 200 copies per ES cell. Also shown is the relative
reporter expression in Dicer+/+ compared to Dicer-/- ES cells (WT/KO) and the inverse of
this ratio, or fold repression of each 3' UTR reporter (KO/WT).
Figure 1. Notable examples of alternate miRNA processing events from ES cell cDNA
library analysis. The miRNA name is shown below each hairpin, with the total number
of matching reads from the 4 ES cell cDNA libraries shown in parentheses. (A) miRNAs
15b and 106a are examples of miRNAs that have canonical processing patterns. For each
hairpin, one major miRNA species is produced that starts and ends at a defined location.
miRNAs originating from the opposite strand of each hairpin, termed miRNA* species,
are detectable but represent a minority of accumulated sequences. Percentages indicate
the total number of reads matching the highlighted sequence. (B) miRNA genes that
produce multiple mature miRNAs from the same hairpin arm. Percentages indicate the
total number of reads initiating at the indicated nucleotide. (C) miRNA genes whose
major mature sequences expressed in ES cells are on the opposite hairpin arm of the
major annotated species. Percentages indicate the total number of reads matching the
highlighted sequence.
Figure 2. Distinct miRNAs expressed in ES cells. (A) The number of distinct miRNAs
expressed in ES cells at the indicated copy number. (B) The number of distinct miRNAs
detected in 6 human colorectal cancer cell lines (Cummins et al. 2006) compared to the
ES cell libraries from Chapter 3 of this thesis. "Library size" refers to the number of
sequence reads per cDNA library. In the case of the colorectal libraries, this number had
to be approximated by dividing the total number of reported reads (266,430) by the
number of libraries analyzed (6).
Figure 3. miRNA repressive capacity correlates with copy number. R² values refer to
the linearity of relative 2x perfect reporter expression with the log-2 of Dicer+/+ miRNA
copy number. (A) Ratio of 2x miRNA reporter expression divided by no-site reporter
expression in Dicer+/+ ES cells. (B) Ratio of 2x miRNA reporter expression divided by
no-site reporter expression in Dicer-/- ES cells. (C) Ratio of relative 2x miRNA reporter
expression in Dicer+/+ ES cells divided by relative 2x miRNA reporter expression in
Dicer-/- ES cells.
Figure 4. Relative expression (WT/KO) for the 3' UTR luciferase constructs in Table 2,
compared to the per UTR summation of the log2 transformation of the molecules per ES
cell for seed-matching miRNAs. "Cumulative UTR mpc" refers to the sum of the molecules
per cell for miRNAs with seed matches to the individual UTRs tested. (A) All seed matches,
(B) only those seed matches conserved in mouse, rat, human, and dog.
Figure 5. Doubling time and morphology of Dicer conditional ES cells after transfection
with Cre recombinase. (A) Hours per doubling of clonal ES cell lines picked after Cre
transfection. The blue graph shows doubling times of cells successfully deleted for Dicer
and the red graph cells that remained conditional for Dicer after Cre transfection.
Growth rates are binned in two-day windows counting after Cre transfection. (B)
Morphology and doubling times for selected knockout or conditional ES cell lines at day
15 post-Cre transfection. Circles highlight ES cell colonies. Knockout cells grow slowly
but have a characteristic ES cell morphology.
Figure 6. Expression analysis of Dicer conditional ES cells immediately following, and
several months after Dicer deletion. (A) (i) Genotyping PCR indicating the relative
quantities of floxed and deleted Dicer alleles in a sorted population of ES cells at the
indicated days following Cre transfection. (ii) Dicer western blot analysis of the cell
populations analyzed in panel (i). The bottom band is a non-specific hybridization that
does not reproducibly appear using this antibody. (iii) Short RNA northern blot analysis
of RNA from the cell populations analyzed in panel (i). Quantification of miR-292
relative to Day 0 cells is shown below the blots. The work shown in (A) was performed
by Arvind Ravi during a summer rotation project. (B) Dendrogram of Affymetrix
microarray expression analysis of RNA from Days 0, 5, and 9 post-Cre transfection, and
from three clonal Dicer knockout lines cultured for approximately 2 months after Dicer
deletion. KO #11 is the cell line that was analyzed for short RNA expression in Chapter
3. (C) Pearson-correlation coefficients between expression profiles in (B). The
microarrays were performed in collaboration with S. S. Levine (laboratory of R. Young)
and panels (B) and (C) were made by C. Whittaker.
Table 1.
miRNA   Mol. per cell   Rel. exp. (WT/KO)   Fold rep. (KO/WT)
293     5069            0.05                19.8
467a    4435            0.16                6.4
295     4082            0.03                33.8
19b     3918            0.06                15.4
669a    3872            0.29                3.5
21      2272            0.37                2.7
16      1199            0.21                4.8
18a     528             0.41                2.4
669d    431             0.56                1.8
34a     384             0.54                1.9
466k    172             0.72                1.4
466f    147             0.78                1.3
485     123             0.71                1.4
Table 2.
3' UTR     All sites log2 mpc   Conserved log2 mpc   Rel. exp. (WT/KO)   Fold rep. (KO/WT)
DEK        10.9                 0.7                  0.84                1.2
NUDCD1     11.1                 0.7                  1.02                1
ABCC5      15.2                 7.6                  0.73                1.4
M6PR       17.5                 0.0                  0.62                1.6
PTPN9      23.7                 8.3                  0.96                1
TMEM168    24.4                 19.0                 0.93                1.1
TOMM70A    41.1                 1.4                  1.2                 0.8
DAZAP2     45.1                 44.6                 0.28                3.6
YPEL5      63.0                 13.0                 0.78                1.3
SCML2      67.4                 23.1                 0.61                1.7
LATS2      68.0                 46.1                 0.15                6.7
ELL2       145.8                22.2                 0.82                1.2
Figure 1. [Hairpin diagrams not recoverable from the extracted text. Hairpins shown,
with total matching reads in parentheses: (A) miR-15b (1158), miR-106a (4444);
(B) miR-142 (307), miR-292 (12207), miR-367 (725), miR-363 (876);
(C) miR-154 (2405), miR-411 (1648).]
Figure 2.
(A) [Bar chart not recoverable from the extracted text. x-axis: miRNA molecules per ES
cell, binned as >1000, 250-999, 50-249, and 0-49; y-axis: number of distinct miRNAs,
0 to 180.]
(B)
Library   # of miRNAs   Library size   Reads per distinct miRNA   Distinct miRNAs per read
J1        306           104220         341                        0.003
Dicer     254           45320          178                        0.006
380N      156           44405          285                        0.004
380T      122           44405          364                        0.003
309N      151           44405          294                        0.003
309T      126           44405          352                        0.003
CACO-2    135           44405          329                        0.003
SW480     125           44405          355                        0.003
Figure 3. [Scatter plots not recoverable from the extracted text. Three panels:
Dicer+/+ (R² = 0.76), Dicer-/- (R² = 0.13), and Combined (R² = 0.89). x-axis: molecules
per cell (log scale, 1 to 10000); series: 2x perfect and 2x bulged reporters.]
Figure 4. [Scatter plots not recoverable from the extracted text. Two panels:
All sites (R² = 0.046) and Conserved sites (R² = 0.652). x-axis: Cumulative UTR mpc.]
Figure 5. [Plots and micrographs not recoverable from the extracted text. (A) x-axis:
days post Cre, binned as 15/16, 17/18, 19/20, and 21/22; y-axis: hours per doubling.
(B) Panel labels: 43 hours per doubling, 49 hours per doubling, 22 hours per doubling.]
Figure 6. [Blots, dendrogram, and correlation matrix not recoverable from the extracted
text. Labels recovered from panel (A): deleted, floxed, Dicer, GAPDH, U6 control,
pre-miR-292, miR-292. Panel (C) scale: 0.95 to 1.00.]
References
Brennecke, J., Stark, A., Russell, R.B., and Cohen, S.M. 2005. Principles of
microRNA-target recognition. PLoS Biol 3(3): e85.
Calabrese, J.M., Seila, A.C., Yeo, G.W., and Sharp, P.A. 2007. RNA sequence analysis
defines Dicer's role in mouse embryonic stem cells. Proc Natl Acad Sci USA
104(46): 18097-18102.
Calabrese, J.M. and Sharp, P.A. 2006. Characterization of the short RNAs bound by the
P19 suppressor of RNA silencing in mouse embryonic stem cells. RNA 12(12):
2092-2102.
Cummins, J.M., He, Y., Leary, R.J., Pagliarini, R., Diaz, L.A., Jr., Sjoblom, T., Barad,
O., Bentwich, Z., Szafranska, A.E., Labourier, E. et al. 2006. The colorectal
microRNAome. Proc Natl Acad Sci USA 103(10): 3687-3692.
Doench, J.G. and Sharp, P.A. 2004. Specificity of microRNA target selection in
translational repression. Genes Dev 18(5): 504-511.
Griffiths-Jones, S., Grocock, R.J., van Dongen, S., Bateman, A., and Enright, A.J. 2006.
miRBase: microRNA sequences, targets and gene nomenclature. Nucleic Acids
Res 34(Database issue): D140-144.
Grimson, A., Farh, K.K., Johnston, W.K., Garrett-Engele, P., Lim, L.P., and Bartel, D.P.
2007. MicroRNA targeting specificity in mammals: determinants beyond seed
pairing. Mol Cell 27(1): 91-105.
Harfe, B.D., McManus, M.T., Mansfield, J.H., Hornstein, E., and Tabin, C.J. 2005. The
RNaseIII enzyme Dicer is required for morphogenesis but not patterning of the
vertebrate limb. Proc Natl Acad Sci USA 102(31): 10898-10903.
Houbaviy, H.B., Murray, M.F., and Sharp, P.A. 2003. Embryonic stem cell-specific
MicroRNAs. Developmental Cell 5(2): 351-358.
Kanellopoulou, C., Muljo, S.A., Kung, A.L., Ganesan, S., Drapkin, R., Jenuwein, T.,
Livingston, D.M., and Rajewsky, K. 2005. Dicer-deficient mouse embryonic stem
cells are defective in differentiation and centromeric silencing. Genes Dev 19(4):
489-501.
Landgraf, P., Rusu, M., Sheridan, R., Sewer, A., Iovino, N., Aravin, A., Pfeffer, S., Rice,
A., Kamphorst, A.O., Landthaler, M. et al. 2007. A mammalian microRNA
expression atlas based on small RNA library sequencing. Cell 129(7): 1401-1414.
Leung, A.K., Calabrese, J.M., and Sharp, P.A. 2006. Quantitative analysis of Argonaute
protein reveals microRNA-dependent localization to stress granules. Proc Natl
Acad Sci USA 103(48): 18125-18130.
Lewis, B.P., Burge, C.B., and Bartel, D.P. 2005. Conserved seed pairing, often flanked
by adenosines, indicates that thousands of human genes are microRNA targets.
Cell 120(1): 15-20.
Lewis, B.P., Shih, I.H., Jones-Rhoades, M.W., Bartel, D.P., and Burge, C.B. 2003.
Prediction of mammalian microRNA targets. Cell 115(7): 787-798.
Lim, L.P., Lau, N.C., Garrett-Engele, P., Grimson, A., Schelter, J.M., Castle, J., Bartel,
D.P., Linsley, P.S., and Johnson, J.M. 2005. Microarray analysis shows that some
microRNAs downregulate large numbers of target mRNAs. Nature 433(7027):
769-773.
Liu, J., Carmell, M.A., Rivas, F.V., Marsden, C.G., Thomson, J.M., Song, J.J.,
Hammond, S.M., Joshua-Tor, L., and Hannon, G.J. 2004. Argonaute2 is the
catalytic engine of mammalian RNAi. Science 305(5689): 1437-1441.
Murchison, E.P., Partridge, J.F., Tam, O.H., Cheloufi, S., and Hannon, G.J. 2005.
Characterization of Dicer-deficient murine embryonic stem cells. Proc Natl Acad
Sci USA 102(34): 12135-12140.
Nielsen, C.B., Shomron, N., Sandberg, R., Hornstein, E., Kitzman, J., and Burge, C.B.
2007. Determinants of targeting by endogenous and exogenous microRNAs and
siRNAs. RNA 13(11): 1894-1910.
Niwa, H., Yamamura, K., and Miyazaki, J. 1991. Efficient selection for high-expression
transfectants with a novel eukaryotic vector. Gene 108(2): 193-199.
Petersen, C.P., Bordeleau, M.E., Pelletier, J., and Sharp, P.A. 2006. Short RNAs repress
translation after initiation in mammalian cells. Mol Cell 21(4): 533-542.
Vallier, L., Mancip, J., Markossian, S., Lukaszewicz, A., Dehay, C., Metzger, D.,
Chambon, P., Samarut, J., and Savatier, P. 2001. An efficient system for
conditional gene expression in embryonic stem cells and in their in vitro and in
vivo differentiated derivatives. Proc Natl Acad Sci USA 98(5): 2467-2472.
Wu, H., Neilson, J.R., Kumar, P., Manocha, M., Shankar, P., Sharp, P.A., and Manjunath,
N. 2007. miRNA Profiling of Naive, Effector and Memory CD8 T Cells. PLoS
ONE 2(10): e1020.
Conclusions and future directions
This thesis presents a quantitative analysis of ES cell short RNA populations.
Short RNA fragments of larger non-coding RNAs are present in ES cells at levels high
enough to be detected via standard hybridization-based protocols, and, though they have
no known function, are apparently generated by non-random mechanisms. Additionally,
Pol II transcription start sites associate with a novel class of low abundance
bidirectionally oriented short RNAs. Finally, contrary to initial hypotheses based on the
functions of RNAi in various non-mammalian organisms, miRNAs appear to be the sole
short RNAs expressed through a Dicer-dependent pathway in mouse ES cells. The
current challenge is to fully understand how these various short RNAs contribute to ES
cell biology.
The sequencing of short RNAs associated with the double-stranded RNA binding
protein, P19, led to the observation that short fragments of rRNA were present in ES cells
in a non-random distribution along the lengths of the mature 18S, 5.8S, and 28S species.
High-throughput sequencing of ES cell short RNAs strengthened this initial observation,
showing an almost identical profile of short rRNA fragments in ES cell lines with and
without functional Dicer. These short rRNAs are expressed in ES cells at levels easily
detectable by short RNA northern blotting, and as a class they are relatively abundant,
present at about 9,000 copies per ES cell; however, reporter assays indicate that they do
not have miRNA-like repressive capability. The function of these short rRNA species is
entirely unknown. Although the reporter assays suggest otherwise, it would be of use to
know whether short rRNAs associate with Argonaute proteins in ES cells. It would also
be of interest to measure short rRNA half-life in ES cells. A long half-life would
potentially suggest the existence of a mechanism for short rRNA stabilization, and may
be consistent with an unknown function, whereas a short half-life would perhaps suggest
that rRNA fragments represent transient, non-functional degradation products of mature
rRNA species. An additional puzzling observation is that short rRNA species associate
with P19 in ES cells but not in the human embryonic kidney cell line, 293T cells.
Determining whether P19 associates with short rRNA species in other mouse and human
cell lines may provide additional insight into their functionality.
High-throughput sequencing also identified a class of low abundance short RNAs
that cluster in close proximity to the transcription start sites of protein-coding genes.
These RNAs exhibit a strikingly non-random distribution around transcription start sites,
flanking a region of promoters that consistently exhibits low nucleosome density (Lee et
al. 2004; Giresi et al. 2007; Ozsolak et al. 2007). The existence of these short RNAs
raises the possibility that the bidirectional initiation of Pol II at nucleosome-depleted
regions may be a previously unobserved, general feature of transcription. Further,
because productive Pol II elongation is unidirectional from most promoters, the existence
of bidirectionally-oriented, promoter-associated short RNAs potentially indicates that
additional signals, perhaps the concerted action of pTEFb and TFIIS, impose the
unidirectionality of productive elongation post-initiation.
A host of experiments are needed to verify these hypotheses. Perhaps the most
illuminating will be those that prove the existence of bidirectional Pol II transcription at
many ES cell genes. The position of an anti-sense Pol II is inferred by the presence of
promoter proximal anti-sense short RNAs; however, physical detection of an anti-sense
promoter proximal Pol II is needed before this conclusion can be definitively made.
Additionally, it will be important to experimentally determine the frequency of sense vs.
anti-sense initiation. Individual promoter proximal short RNAs are expressed at an
estimated 0.2 copies per cell, and approximately 50% of promoter proximal short RNAs
are in an anti-sense orientation. These observations raise a number of important
questions. Is the number of bidirectional initiation events equal to the number of
anti-sense short RNAs present in an ES cell? That is, is anti-sense initiation an event that
occurs at a single gene in approximately one out of ten ES cells? Or does anti-sense
initiation occur at a higher frequency, with the associated short RNAs either not generated
or escaping detection by our cDNA preparation protocol? Moreover, at a single gene,
do bidirectionally oriented transcription complexes exist contemporaneously?
These questions await the application of a technique that can detect the polarity of
DNA-bound Pol II at single-nucleosome resolution, perhaps via potassium permanganate
cleavage assays or chromatin immunoprecipitation and sequencing of Pol II-bound DNA
after digestion with micrococcal nuclease.
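The one-in-ten figure can be read as simple arithmetic on the two estimates quoted in this paragraph. The short Python sketch below makes that reading explicit. It assumes that the estimate of 0.2 copies per cell covers the promoter proximal short RNAs of a single gene in both orientations; this is an interpretation, not something stated in the text, and the function names are illustrative only.

```python
def antisense_copies_per_cell(copies_per_cell, antisense_fraction):
    """Expected copies per cell of anti-sense promoter proximal short RNA."""
    return copies_per_cell * antisense_fraction


def cells_per_antisense_copy(copies_per_cell, antisense_fraction):
    """Average number of cells that together contain one anti-sense short RNA."""
    return 1.0 / antisense_copies_per_cell(copies_per_cell, antisense_fraction)


# Estimates quoted in the text: 0.2 copies per cell, about 50% anti-sense.
per_cell = antisense_copies_per_cell(0.2, 0.5)
cells = cells_per_antisense_copy(0.2, 0.5)
print(f"{per_cell:.2f} anti-sense copies per cell, about 1 per {cells:.0f} cells")
```

Under this reading, the observed abundance sets only a lower bound on initiation frequency: if some anti-sense initiation events leave no detectable short RNA, the true frequency would be higher than one cell in ten.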
It will be of great interest to see if promoter proximal short RNAs can be detected
in cell types other than ES cells. Of particular interest would be the short RNAs
expressed in Drosophila S2 cells, where recent work has shown the frequent occurrence
of Pol II pausing downstream of transcription start sites (Muse et al. 2007; Zeitlinger et
al. 2007). It is possible that previously described Pol II pausing phenomena are
associated with bidirectional initiation.
The RNAi pathway has a conserved role in the silencing of repeating and
transposable elements, in many cases through the action of Dicer-dependent siRNA
molecules. Dicer-dependent siRNA species have yet to be documented in mammals, and
the work presented in Chapter 3 of this thesis shows no evidence for expression of
siRNA-like molecules in ES cells. Highly repetitive short RNAs are expressed in ES
cells, but at very low levels, and are thus not expected to be functional. Coupled with the
Dicer-dependent repression of repeating elements in other mammalian cell types
(Watanabe et al. 2006; Yang and Kazazian 2006; Murchison et al. 2007), these
observations raise the possibility that repeat-derived miRNAs may in certain cases
function similarly to putative repeat-derived siRNAs: by silencing fully or partially
complementary repeating elements. Preliminary experiments show no consistent
deregulation of abundantly encoded LINE and SINE repeats in Dicer knockout ES cells
compared to controls (Figure 1), suggesting that miRNAs do not play a major role in the
destabilization of LINE- and SINE-derived RNA in this ES cell line. However, this
negative result does not preclude a role for repeat-derived miRNAs in the
post-transcriptional suppression of complementary repeats, nor does it preclude a role for
miRNAs in the silencing of complementary repeats in the early embryo or other
developmental stages.
It is unclear to what extent mammalian repeating elements are silenced through an
RNAi-based mechanism. The expression of repeat-derived miRNAs provides a tangible
mechanism by which the RNAi pathway could repress repetitive element propagation.
Additionally, repeat-derived miRNAs could serve as a means to coordinately repress
diverse repeat containing mRNAs. Different classes of transposable elements, as well as
fusion transcripts between transposable elements and mRNAs, show differential
expression in the earliest stages of mouse development (Peaston et al. 2004). It is
possible then, that repeat-derived miRNAs may coordinately regulate expression of target
repeat-containing mRNAs in a stage-specific manner during early development. Further,
it is possible that other currently undefined classes of short RNAs silence mammalian
repeats in early development; alternatively, repetitive sequences may not be under
RNAi-based control in mammals, although the existence of mammalian piRNAs suggests that
repeating sequences may be repressed via RNAi in the germline. In-depth sequence
analysis of short RNA species from early mouse development will be necessary to further
evaluate the potential role for RNAi in the silencing of repeating elements.
The large amount of short RNA sequence data from ES cells provided interesting
examples of non-canonical miRNA processing events. Many miRNA genes were found
to express overlapping but separate mature miRNAs from the same hairpin arm,
potentially increasing the regulatory capacity of these miRNA genes. Further, it was
found that certain miRNA genes express major ES cell products originating from the
hairpin arm opposite from what has been previously observed in other cell types. This
observation suggests the existence of a pathway that directs the differential incorporation
of mature miRNAs into the RISC. Direct quantification of these differences in ES cells
and other differentiated cell types will be the first step towards testing this hypothesis.
Deletion of Dicer from mouse ES cells results in an acute drop in growth rate,
followed by an apparent recovery approximately 3 weeks post Dicer deletion. Dicer
knockout ES cells are also incapable of differentiation (Kanellopoulou et al. 2005). The
observation that Dicer's sole catalytic role in ES cells is to generate miRNAs suggests
that these phenotypic effects are entirely due to loss of miRNA expression. Surprisingly,
the preliminary data described in Chapter 5, where mRNA expression of ES cells was
profiled immediately and several months after Dicer deletion, shows few major
expression changes between cells with and without Dicer. Together, these observations
suggest that the core transcription factors in ES cells dominantly control identity, and
abundant ES cell miRNAs tune this transcriptional output to increase the rate of ES cell
self-renewal. Additionally, miRNAs are required for the transition from pluripotency to a
differentiated cell state. Consistent with the former hypothesis, many of the most
abundant ES cell miRNAs have documented roles in cell-cycle control or oncogenesis
(He et al. 2005; Si et al. 2006; Voorhoeve et al. 2006; Linsley et al. 2007). In the most
striking example, expression of early embryo-specific miRNAs in primary fibroblasts is
sufficient for bypass of Ras-induced senescence, indicating that these miRNAs can
manipulate existing expression profiles to induce cell division (Voorhoeve et al. 2006).
It is possible that ES cells express a set of miRNAs that primarily function to promote
self-renewing cell division, and it would therefore be of great interest to determine if
other known stem cell populations express sets of miRNAs similar to those found in
mouse ES cells.
miRNA expression levels were found to correlate well with repressive capacity.
Surprisingly, only the most abundant ES cell miRNAs, those expressed at greater than
1,000 copies per ES cell, silenced reporter gene expression by 5-fold or more. This
observation suggests that, although ES cells express approximately 300 different miRNA
species, only the ~30 most highly expressed contribute significantly to miRNA-mediated
repression. Consistent with this hypothesis, preliminary experiments indicate that the
abundance of the miRNAs targeting specific 3' UTRs is a useful predictor of repression.
Notably, this predictive measure only appeared true for miRNA target sites that were
conserved between human, mouse, rat, and dog, suggesting that additional factors outside
of the miRNA target site are important for repression. It will be of great interest to see if
these trends hold true in other cell types. Recently, a quantitative model of
miRNA-mediated repression was described that predicts the efficacy of different miRNA target
sites, incorporating both target-site-dependent and -independent variables (Grimson et al.
2007). Incorporating endogenous miRNA abundance with this model of target site
efficacy may be an accurate way to predict the extent of miRNA-mediated repression for
individual genes.
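As a rough illustration of how miRNA abundance might be attached to individual 3' UTRs, the Python sketch below sums the copies per cell of abundant miRNAs that have a conserved target site in each UTR and ranks UTRs by that sum. This is a hypothetical scoring scheme, not the analysis performed in this thesis and not the model of Grimson et al. (2007). The function names, the input layout, and the choice to sum abundances are all assumptions; only the threshold of 1,000 copies per cell comes from the text.

```python
# Threshold from the text: only miRNAs above 1,000 copies per cell
# silenced reporter expression by 5-fold or more.
ABUNDANT_COPIES_PER_CELL = 1000


def targeting_abundance(site_mirnas, copies_per_cell,
                        min_copies=ABUNDANT_COPIES_PER_CELL):
    """Sum copies per cell of abundant miRNAs with a conserved site in one UTR.

    site_mirnas: names of miRNAs with a conserved site in the 3' UTR.
    copies_per_cell: dict mapping miRNA name to estimated copies per cell.
    """
    total = 0
    for name in site_mirnas:
        copies = copies_per_cell.get(name, 0)  # unmeasured miRNAs count as 0
        if copies > min_copies:
            total += copies
    return total


def rank_utrs(utr_sites, copies_per_cell):
    """Return UTR names ordered from highest to lowest targeting abundance."""
    scores = {utr: targeting_abundance(mirnas, copies_per_cell)
              for utr, mirnas in utr_sites.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

Calling `rank_utrs` with a dictionary of conserved sites per UTR and a dictionary of measured copy numbers returns the UTRs in order of predicted susceptibility under this simple assumption. A fuller treatment would weight each site by its predicted efficacy rather than counting all conserved sites equally.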
From a basic biological standpoint, it is important to place the discoveries and
hypotheses generated from mouse ES cells in the context of the early mouse embryo. ES
cells are derivatives of the inner cell mass (ICM) of the pre-implantation blastocyst, a cell
compartment that exists for only a few hours during mouse development. During its brief
existence, the inner cell mass must divide rapidly while remaining pluripotent. Rapid cell
division and pluripotency are hallmarks of ES cells, and so the proposed role for ES cell
miRNAs in promoting in vitro self-renewal is consistent with a similar role in vivo.
However, there are differences between the ICM and ES cells, and so it is possible that
certain abundant ES cell miRNAs have no discernable role in ES cells, but have critical
functions in the ICM, or that certain abundant miRNAs expressed in the ICM are not
expressed in ES cells.
Nevertheless, the understanding of in vitro ES cell biology has important clinical
implications. The guided differentiation of ES cells into various tissue types raises the
possibility that ES-like cells may be a future tissue source in regenerative therapies (Pera
and Trounson 2004). Remarkably, four transcription factors, including Oct4 and Nanog, are
sufficient to turn differentiated mouse or human cells pluripotent (Takahashi and
Yamanaka 2006; Okita et al. 2007; Takahashi et al. 2007; Wernig et al. 2007; Yu et al.
2007). This finding is a large step forward for the eventual application of regenerative
therapy. Thus the understanding of ES cell gene regulatory networks, including those
governed by short RNAs, will likely have direct clinical applications.
Figure 1. Full-length RNA northern blot probed for GAPDH, LINE, and SINE. The
first three lanes represent triplicate collections of Dicer+/+ ES cell RNA. The following
four lanes show RNA from clonally derived deletion lines, followed by a lane loaded
with J1 ES cell RNA, and a lane with RNA from NIH 3T3 fibroblasts. The short RNA
expression analysis from Chapter 3 was performed on the Dicer+/+ and KO-11 ES lines.
Because the SINE RNA probe hybridized so extensively, the entire blot is shown,
with the locations of the 28S and 18S rRNA, and the likely full-length SINE RNA,
marked beside the blot. The northern blot was performed as described in Calabrese and
Sharp (2006).
[Figure 1 image: northern blot. Labels recoverable from the figure: Parent; Dicer +/+;
probes GAPDH, LINE, and SINE B1; size markers 28S rRNA and 18S rRNA.]
References
Calabrese, J.M. and Sharp, P.A. 2006. Characterization of the short RNAs bound by the
P19 suppressor of RNA silencing in mouse embryonic stem cells. RNA 12(12):
2092-2102.
Giresi, P.G., Kim, J., McDaniell, R.M., Iyer, V.R., and Lieb, J.D. 2007. FAIRE
(Formaldehyde-Assisted Isolation of Regulatory Elements) isolates active
regulatory elements from human chromatin. Genome Res 17(6): 877-885.
Grimson, A., Farh, K.K., Johnston, W.K., Garrett-Engele, P., Lim, L.P., and Bartel, D.P.
2007. MicroRNA targeting specificity in mammals: determinants beyond seed
pairing. Mol Cell 27(1): 91-105.
He, L., Thomson, J.M., Hemann, M.T., Hernando-Monge, E., Mu, D., Goodson, S.,
Powers, S., Cordon-Cardo, C., Lowe, S.W., Hannon, G.J. et al. 2005. A
microRNA polycistron as a potential human oncogene. Nature 435(7043): 828-833.
Kanellopoulou, C., Muljo, S.A., Kung, A.L., Ganesan, S., Drapkin, R., Jenuwein, T.,
Livingston, D.M., and Rajewsky, K. 2005. Dicer-deficient mouse embryonic stem
cells are defective in differentiation and centromeric silencing. Genes Dev 19(4):
489-501.
Lee, C.K., Shibata, Y., Rao, B., Strahl, B.D., and Lieb, J.D. 2004. Evidence for
nucleosome depletion at active regulatory regions genome-wide. Nat Genet 36(8):
900-905.
Linsley, P.S., Schelter, J., Burchard, J., Kibukawa, M., Martin, M.M., Bartz, S.R.,
Johnson, J.M., Cummins, J.M., Raymond, C.K., Dai, H. et al. 2007. Transcripts
targeted by the microRNA-16 family cooperatively regulate cell cycle
progression. Mol Cell Biol 27(6): 2240-2252.
Murchison, E.P., Stein, P., Xuan, Z., Pan, H., Zhang, M.Q., Schultz, R.M., and Hannon,
G.J. 2007. Critical roles for Dicer in the female germline. Genes Dev 21(6): 682-693.
Muse, G.W., Gilchrist, D.A., Nechaev, S., Shah, R., Parker, J.S., Grissom, S.F.,
Zeitlinger, J., and Adelman, K. 2007. RNA polymerase is poised for activation
across the genome. Nat Genet.
Okita, K., Ichisaka, T., and Yamanaka, S. 2007. Generation of germline-competent
induced pluripotent stem cells. Nature 448(7151): 313-317.
Ozsolak, F., Song, J.S., Liu, X.S., and Fisher, D.E. 2007. High-throughput mapping of
the chromatin structure of human promoters. Nat Biotechnol 25(2): 244-248.
Peaston, A.E., Evsikov, A.V., Graber, J.H., de Vries, W.N., Holbrook, A.E., Solter, D.,
and Knowles, B.B. 2004. Retrotransposons regulate host genes in mouse oocytes
and preimplantation embryos. Dev Cell 7(4): 597-606.
Pera, M.F. and Trounson, A.O. 2004. Human embryonic stem cells: prospects for
development. Development 131(22): 5515-5525.
Si, M.L., Zhu, S., Wu, H., Lu, Z., Wu, F., and Mo, Y.Y. 2006. miR-21-mediated tumor
growth. Oncogene.
Takahashi, K., Tanabe, K., Ohnuki, M., Narita, M., Ichisaka, T., Tomoda, K., and
Yamanaka, S. 2007. Induction of Pluripotent Stem Cells from Adult Human
Fibroblasts by Defined Factors. Cell.
Takahashi, K. and Yamanaka, S. 2006. Induction of pluripotent stem cells from mouse
embryonic and adult fibroblast cultures by defined factors. Cell 126(4): 663-676.
Voorhoeve, P.M., le Sage, C., Schrier, M., Gillis, A.J., Stoop, H., Nagel, R., Liu, Y.P.,
van Duijse, J., Drost, J., Griekspoor, A. et al. 2006. A genetic screen implicates
miRNA-372 and miRNA-373 as oncogenes in testicular germ cell tumors. Cell
124(6): 1169-1181.
Watanabe, T., Takeda, A., Tsukiyama, T., Mise, K., Okuno, T., Sasaki, H., Minami, N.,
and Imai, H. 2006. Identification and characterization of two novel classes of
small RNAs in the mouse germline: retrotransposon-derived siRNAs in oocytes
and germline small RNAs in testes. Genes Dev 20(13): 1732-1743.
Wernig, M., Meissner, A., Foreman, R., Brambrink, T., Ku, M., Hochedlinger, K.,
Bernstein, B.E., and Jaenisch, R. 2007. In vitro reprogramming of fibroblasts into
a pluripotent ES-cell-like state. Nature 448(7151): 318-324.
Yang, N. and Kazazian, H.H., Jr. 2006. L1 retrotransposition is suppressed by
endogenously encoded small interfering RNAs in human cultured cells. Nat Struct
Mol Biol 13(9): 763-771.
Yu, J., Vodyanik, M.A., Smuga-Otto, K., Antosiewicz-Bourget, J., Frane, J.L., Tian, S.,
Nie, J., Jonsdottir, G.A., Ruotti, V., Stewart, R. et al. 2007. Induced Pluripotent
Stem Cell Lines Derived from Human Somatic Cells. Science.
Zeitlinger, J., Stark, A., Kellis, M., Hong, J.W., Nechaev, S., Adelman, K., Levine, M.,
and Young, R.A. 2007. RNA polymerase stalling at developmental control genes
in the Drosophila melanogaster embryo. Nat Genet.
J. Mauro Calabrese
Biographical note
Education
Ph.D. Biology 2007
Massachusetts Institute of Technology, Cambridge, MA
B.S. Chemistry, Biochemistry, and Molecular Biology 2001
University of Wisconsin-Madison, Madison, WI
Awards
Phi Beta Kappa
Elvehjem Scholarship for Excellence in Biochemistry
University of Wisconsin Alumni Foundation Scholarship
Order Sons of Italy Scholar
Research Experience
Graduate Student, MIT Department of Biology, 2002-2007
Advisor: Phillip A. Sharp, Ph.D.
Doctoral Thesis: Dicer deletion and short RNA expression analysis in mouse ES cells
Research Assistant, Department of Biochemistry, University of Wisconsin-Madison,
2000-2001
Advisor: Hector Deluca, Ph.D.
Project: Purifying a novel co-activator of the aryl-hydrocarbon receptor from pig lung
Research Assistant, Department of Ophthalmology, University of Wisconsin-Madison
1999-2000
Advisor: Len Levin, M.D., Ph.D.
Project: Synthesizing a photoactivatable inducer of the Tet-On gene-expression system
Research Assistant, Department of Bioinorganic Chemistry, University of Delaware 1998
Advisor: Charles Riordan, Ph.D.
Project: Chemical modeling of anaerobic bacterial methanogenesis
Publications
J. Mauro Calabrese*, Amy C. Seila*, Gene W. Yeo, and Phillip A. Sharp. (2007). RNA
sequence analysis defines Dicer's role in mouse embryonic stem cells. PNAS
104:18097-18102.
J. Mauro Calabrese and Phillip A. Sharp. (2006) Characterization of the short RNAs
bound by the p19 suppressor of RNA silencing in mouse embryonic stem cells. RNA 12:
2092-2102.
Anthony K. L. Leung, J. Mauro Calabrese, and Phillip A. Sharp. (2006) Quantitative
analysis of argonaute protein reveals microRNA-dependent localization to stress
granules. PNAS 103: 18125-18130.
* denotes equal contribution.
Teaching Experience
Teaching Assistant, 2006
Course Title: Principles of Human Disease
Instructors: David E. Housman and Jacqueline A. Lees
Teaching Assistant, 2004
Course Title: Undergraduate Cell Biology
Instructors: Angelika Amon and Harvey F. Lodish
Presentations
Oral presentation: "In depth analysis of embryonic stem cell short RNA expression:
Observations and functional implications" Center for Cancer Research Fall Retreat,
Waterville Valley, NH, 2007.
Oral presentation: "In depth analysis of embryonic stem cell short RNA expression."
RNA society conference, Madison, WI, 2007.
Poster presentation: "RNAi and the silencing of mammalian repetitive elements."
Genomic Impact of Eukaryotic Transposable Elements, Asilomar, CA, 2006.
Poster presentation: "RNAi and the silencing of mammalian repetitive elements."
Keystone symposium on RNA interference, Vancouver, Canada, 2005.