© 2004 Nature Publishing Group http://www.nature.com/naturegenetics
ARTICLES
Evolution of microRNA genes by inverted duplication
of target gene sequences in Arabidopsis thaliana
Edwards Allen1,2, Zhixin Xie1,2, Adam M Gustafson1,2, Gi-Ho Sung2, Joseph W Spatafora2 &
James C Carrington1,2
MicroRNAs (miRNAs) in plants and animals function as post-transcriptional regulators of target genes, many of which are
involved in multicellular development. miRNAs guide effector complexes to target mRNAs through base-pair complementarity,
facilitating site-specific cleavage or translational repression. Biogenesis of miRNAs involves nucleolytic processing of a precursor
transcript with extensive foldback structure. Here, we provide evidence that genes encoding miRNAs in plants originated by
inverted duplication of target gene sequences. Several recently evolved genes encoding miRNAs in Arabidopsis thaliana and other
small RNA–generating loci possess the hallmarks of inverted duplication events that formed the arms on each side of their
respective foldback precursors. We propose a model for miRNA evolution that suggests a mechanism for de novo generation of
new miRNA genes with unique target specificities.
miRNAs (~20–22 nucleotides) function as guides that direct effector
complexes to specific mRNA targets, resulting in negative regulation
through degradative pathways or cotranslational repression1. miRNA
biogenesis involves multistep processing of self-complementary (foldback) precursor transcripts. In animals, pri- (foldback excision) and
pre-miRNA processing is catalyzed by the RNaseIII-like enzymes
Drosha and Dicer2–4; in plants, multiple steps may be catalyzed by
Dicer-like1 (DCL1)5–7. Although miRNAs have been found only in
land plants and animals, most eukaryotes possess the RNA interference (RNAi) machinery through which miRNAs operate8. Formation
of many genes encoding miRNAs in plants predates the split of
monocots and dicots (~150 million years ago); many animal genes
encoding miRNAs predate the divergence of metazoans (~600
million years ago)9,10. There are no known orthologous miRNAs or
common miRNA targets between plants and animals1. Furthermore,
several miRNA-controlled regulatory networks, such as those involving miR165/166, are found only in plant lineages11,12. Plant miRNA
targets typically contain single sites that are complementary to one
miRNA family, whereas animal targets usually contain multiple,
weakly complementary sites within 3′ untranslated regions1.
A. thaliana contains at least 22 miRNA families, most of which are
conserved between monocots and dicots13,14. Twelve of these families
target mRNAs encoding transcription factors involved in development.
Perturbation of miRNA biogenesis or targeting frequently leads to
developmental defects, including alterations in leaf morphology, meristem identity, patterning and reproductive development15–21. Several
A. thaliana miRNAs affect hormone signaling pathways13,19, and
others regulate genes involved in stress responses13,22. Two miRNAs
(miR162 and miR168) regulate genes required for miRNA biogenesis
or activity23,24. A. thaliana also contains several nonconserved miRNAs, such as miR161 and miR163. The nonconserved miRNAs are
represented by single genes rather than multigene families13.
New gene regulatory networks can result from duplication of
protein-coding sequences and regulatory elements25,26. In this paper,
we explore the possibility that genes encoding miRNAs in plants
originated by duplication events from sequences in target gene families.
RESULTS
MIR161 and MIR163 have extended similarity to target genes
Loci capable of forming a transcript that adopts an extended foldback
structure can arise by inverted duplication events. If the originating
sequence is a protein-coding gene, then the originating gene and
closely related family members could be brought under negative
regulation by RNAi through short interfering RNAs (siRNAs)
spawned at the duplication locus8. If sequences at the duplication
locus diverge under constraints to maintain a foldback structure and
adapt to the miRNA biogenesis apparatus, then the new locus might
evolve into a miRNA gene with specificity for one or more targets
related to the founder gene. This is the inverted duplication hypothesis
for evolution of genes encoding miRNAs.
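The hypothesis can be illustrated with a minimal Python sketch that builds a hypothetical inverted duplication from a founder segment and checks how well the two arms can pair. The founder sequence, the `loop` spacer and all function names are illustrative assumptions, not data or methods from this study.

```python
# Minimal sketch (assumed names and toy sequences): an inverted duplication
# places a segment next to its own reverse complement, so a transcript of
# the locus can fold back on itself.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def inverted_duplication(founder_segment: str, loop: str = "TTTT") -> str:
    """Hypothetical locus: segment + spacer + reverse complement of segment."""
    return founder_segment + loop + reverse_complement(founder_segment)

def arm_pairing_fraction(locus: str, arm_len: int) -> float:
    """Fraction of 5' arm positions that can Watson-Crick pair with the 3' arm."""
    five_arm = locus[:arm_len]
    three_arm = locus[-arm_len:]
    # Position i of the 5' arm faces position -(i + 1) of the 3' arm.
    paired = sum(
        1 for a, b in zip(five_arm, reversed(three_arm))
        if a.translate(COMPLEMENT) == b
    )
    return paired / arm_len

founder = "ATGAAGAACAAAATCGGAACC"   # arbitrary toy segment
locus = inverted_duplication(founder)
print(arm_pairing_fraction(locus, len(founder)))  # 1.0 for a perfect duplication

# A single substitution in one arm (sequence divergence) lowers the pairing.
diverged = locus[:-1] + ("A" if locus[-1] != "A" else "C")
print(arm_pairing_fraction(diverged, len(founder)))
```

In this toy setting a freshly duplicated locus pairs perfectly, and each substitution in either arm introduces a mismatch of the kind seen in miRNA precursors.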
The inverted duplication hypothesis predicts that the foldback arms
of recently evolved miRNA genes might contain relatively long
segments with similarity to target gene sequences. To investigate this
possibility, we used the foldback sequences from 91 miRNA loci14 in
FASTA searches against the A. thaliana gene database (Fig. 1).
Complementarity or similarity to the ~21-nucleotide miRNA
1Center for Gene Research and Biotechnology and 2Department of Botany and Plant Pathology, Oregon State University, Corvallis, Oregon 97331, USA. Correspondence
should be addressed to J.C.C. (carrington@cgrb.oregonstate.edu).
Published online 21 November 2004; doi:10.1038/ng1478
NATURE GENETICS VOLUME 36 | NUMBER 12 | DECEMBER 2004
[Figure 1 flowchart. Starting material: A. thaliana miRNAs and endogenous siRNAs.
Track 1, miRNA loci (91): FASTA foldback lacking miRNA sequence against A. thaliana gene database (P < 0.001, miR161 and miR163); then FASTA miR161 and miR163 foldback arms, with and without miRNA and miRNA* sequences, against A. thaliana gene database; then global sequence alignment of intact and randomized miR161 and miR163 foldback arms with top FASTA hits.
Track 2, miRNA and siRNA loci (ASRP database) from nonrepetitive, intergenic regions (1,816 loci): BLASTn small RNA loci, including 500 nt flanking each side, against A. thaliana gene database (P ≤ 6 × 10⁻⁵); then identify potential foldbacks containing small RNA by EINVERTED (Score >50, 16 loci); then identify small RNA loci with potential foldbacks that are not also present in A. thaliana gene hit (9 loci).]
Figure 1 Flowchart for identification of miRNA foldbacks and endogenous
small RNA loci with properties that are consistent with derivation by inverted
duplication from protein coding genes. A. thaliana miRNA foldback
sequences and endogenous siRNAs were as defined by the miRNA
Registry14 and the ASRP database.
sequence alone was insufficient to detect a significant signal under the
conditions used. Most miRNA foldbacks lacked extended similarity or
complementarity to any gene sequences, including those from their
respective target genes (Fig. 2a). But queries with foldback sequences
from MIR161 and MIR163 resulted in significant hits to several genes,
which corresponded to their respective target genes or closely related
family members (Fig. 2a). We detected significant scores to the same
genes even when the miRNA was computationally deleted from the
foldback sequence (Fig. 2b,c). miR161 targets several mRNAs coding
for pentatricopeptide repeat proteins (PPRs), and miR163 was predicted to target several S-adenosylmethionine-dependent methyltransferases (SAMTs)5,27,28. The MIR161 locus is unusual in that it encodes
overlapping miRNAs (miR161.1 and miR161.2) from a single precursor sequence (Fig. 3), comprising a contiguous, 29-nucleotide
miR161.1-miR161.2 sequence containing complementarity to a contiguous sequence in each PPR target mRNA. miR163 is also unusual in
that it is 24 nucleotides, rather than the more typical ~21 nucleotides,
in length6,7,29 (Fig. 3).
We detected discrete segments of complementarity and similarity
to target gene sequences in miRNA-containing and miRNA-complementing (miRNA*) arms, respectively, for MIR161 and
MIR163 (Fig. 4a,b). The foldback arms of MIR163 share significant
similarity to sense and antisense polarities of a segment of three
SAMT-like genes (Fig. 4b). The miR163 precursor also contains
significant target sequence similarity in a region between the two
foldback arms (Fig. 4b). We also detected segmental similarity
between MIR161 foldback arms and PPR target genes; the miR161*
arm had the greatest similarity (Fig. 4a). To analyze these matches
statistically, we used 1,000 randomized versions of each foldback arm
from MIR161 and MIR163 in FASTA searches against the A. thaliana
gene database. We also did this using arms lacking miRNA or miRNA-complementary sequences. We selected the top-scoring gene using
each randomized sequence and subjected it to global sequence alignment with the corresponding query sequence. Using each intact
MIR161 and MIR163 foldback arm, the top hit (always a target
gene) scored 2.7–15.0 standard deviations (s.d.) (P < 7 × 10⁻³)
higher than the mean of the top hits for the randomized sequences
(Fig. 2c). Using foldback arms lacking the miR163 or miR163-complementary sequences, the top target gene hit scored at least 8.3
s.d. (P < 1.3 × 10⁻¹⁵) higher than the mean top score with the
randomized arms. Among the arms lacking the miR161 or miR161-complementary sequences, top target gene scores were significantly
higher (P < 0.01) than the randomized top score for only the
miR161* arm lacking the miR161.1-complementary sequence
(Fig. 2c). Although sequence similarity was clearly evident outside
the miR161 and miR161-complementary regions of each MIR161
Figure 2 Computational analysis of miRNA and endogenous small RNA–generating foldback sequences. (a) On the left, A. thaliana gene matches in FASTA
searches using foldback sequences from 91 miRNA genes (presented in numerical order from left to right in Supplementary Table 1 online). The top four
hits based on Expect value are presented. Genes corresponding to miRNA targets (or closely related target gene family members) or nontargets are indicated
by red or black dots, respectively. Scores corresponding to P = 0.05 and P = 0.001 values are indicated. On the right, FASTA scores for the top four hits of
predicted foldback sequences from seven small RNA–generating loci (Table 1). (b) Same analysis as in a, except that the miRNA sequence was
computationally removed from each foldback sequence. (c) Needleman-Wunsch (NW) scores of alignments between MIR161, MIR163 and ASRP1729
foldback arms and top four gene hits (red) detected by FASTA searches, and mean ± s.d. of top gene hits detected using 1,000 shuffled sequences from
each foldback arm. For MIR163, four foldback arm sequences were tested: (1) complete miRNA-containing foldback arm; (2) complete miRNA* arm;
(3) miRNA arm with miR163 deleted; and (4) miRNA* arm with the miR163-complementary sequence deleted. For both MIR161 and ASRP1729, two sets
of analyses were done in which the adjacent or overlapping miRNA (miR161.1 and miR161.2) or small RNA (ASRP1729 and ASRP1763) sequences were
individually deleted.
foldback arm (Fig. 4a), the relatively short length of the precursor segments after removal of miR161.1 or miR161.2 (or their
complementary sequences) limited the statistical power of the
methods used.
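The logic of the randomization analysis can be sketched in Python as follows. This is a simplified illustration, not the pipeline used here: it aligns each shuffled arm against one fixed gene sequence instead of rerunning a FASTA search per shuffle, the scoring values (`match`, `mismatch`, `gap`) are arbitrary assumptions, and converting a z-score to a P value through a normal approximation is likewise an assumption, since the text does not state how P values were derived.

```python
import random
import statistics

def nw_score(a, b, match=1, mismatch=-1, gap=-1):
    """Global (Needleman-Wunsch) alignment score with a linear gap penalty."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]

def shuffle_z_score(query, target, n_shuffles=1000, seed=0):
    """Standard deviations by which the real score exceeds the shuffled mean."""
    rng = random.Random(seed)
    observed = nw_score(query, target)
    letters = list(query)
    null_scores = []
    for _ in range(n_shuffles):
        rng.shuffle(letters)  # preserves base composition, destroys order
        null_scores.append(nw_score("".join(letters), target))
    sd = statistics.stdev(null_scores)
    if sd == 0:
        raise ValueError("shuffled scores have zero variance")
    return (observed - statistics.mean(null_scores)) / sd

def two_tailed_p(z):
    """Two-tailed P value under a normal approximation (an assumption)."""
    return 2.0 * (1.0 - statistics.NormalDist().cdf(abs(z)))

# Toy example: a hypothetical foldback arm embedded in a hypothetical gene.
arm = "ATGAAGAACAAAATCGGAACCCCGATGTAG"
gene = "TTT" + arm + "GGG"
z = shuffle_z_score(arm, gene, n_shuffles=200)
print(z, two_tailed_p(z))
```

Under this normal approximation, a score 2.7 s.d. above the shuffled mean gives a two-tailed P of about 0.007, in line with the lower bound of the range reported for the intact arms.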
Inverted duplications at other small RNA–producing loci
The inverted duplication hypothesis predicts that a locus undergoes
transitional evolutionary stages before acquiring miRNA-forming
capacity. Initially, inverted duplication loci may contain segments
with perfect or near-perfect repeats corresponding to sequences of
protein-coding genes. If expressed, these transitional loci may yield
heterogeneous siRNA populations just like transgenes engineered to
express hairpin-forming transcripts30,31. Biogenesis of these siRNAs
may or may not require an RNA-dependent RNA polymerase, and
they may form through the activity of DCL1 (as do most or all
miRNAs) or another DCL protein associated with other classes of
endogenous siRNAs32. We postulated that, if these transitional forms
led to miRNA genes in the past, transitional forms with the potential
to lead to miRNA genes in the future would be evident in the current
A. thaliana genome. We devised a computational strategy to search for
such loci (Fig. 1). We identified all known small RNA–generating loci
from nonrepetitive intergenic sequences in the Arabidopsis thaliana
Small RNA Project (ASRP) database and combined them with
the miRNA loci (1,816 total loci). We used a genomic segment
comprised of 500 nucleotides flanking both sides of each small RNA
sequence, or up to the beginning of an adjacent gene, as a query
sequence against the Arabidopsis gene database in BLASTn searches.
Loci that hit one or more genes with a score that met or exceeded that
of the MIR161 locus (PPR target gene hit) were subjected to
EINVERTED analysis to identify inverted duplications with the
potential to form a foldback structure. We screened these loci
manually to eliminate those in which the protein-coding gene(s)
identified in the BLASTn search contained the same inverted duplication as the small RNA–generating locus.
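The idea behind the inverted duplication screen can be sketched with a simple seed search in Python. This is not the EINVERTED algorithm, which scores gapped and mismatched alignments; it only reports exact k-mers whose reverse complement occurs downstream. The function names, the seed length `k` and the toy sequence are assumptions made for illustration.

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def find_inverted_repeat_seeds(seq, k=8, min_spacer=0):
    """Return (i, j) pairs where seq[i:i+k] is the reverse complement of
    seq[j:j+k] and the second copy starts at or after i + k + min_spacer."""
    index = defaultdict(list)
    for j in range(len(seq) - k + 1):
        index[seq[j:j + k]].append(j)
    seeds = []
    for i in range(len(seq) - k + 1):
        partner = reverse_complement(seq[i:i + k])
        for j in index.get(partner, []):
            if j >= i + k + min_spacer:
                seeds.append((i, j))
    return seeds

# Toy locus: a 12-nt arm, a 5-nt spacer and the reverse complement of the arm.
left_arm = "ATGAAGAACAGT"
toy_locus = left_arm + "CCCCC" + reverse_complement(left_arm)
print(find_inverted_repeat_seeds(toy_locus, k=8))
```

Runs of adjacent seeds, such as (0, 21), (1, 20) and so on in the toy locus, mark the two arms of a potential foldback. A real screen would extend these seeds while tolerating mismatches and bulges.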
We identified nine loci in the final set, two of which were MIR161
and MIR163 (Table 1). Each of the other seven contained inverted
duplications with the potential to form foldback structures of ~195–
2,684 nucleotides (Supplementary Fig. 1 online) and were represented
by 1–20 small RNAs in the cloned sequence database (Table 1). The
predicted foldback sequences and individual arms from each locus
were used in FASTA searches against the A. thaliana gene database,
revealing similarity or complementarity in each arm with at least one
gene (Fig. 2a and Table 1). The ASRP2039 locus is part of a larger
inverted duplication consisting of two annotated genes (At3g44570
and At3g44580). Part of the ASRP172 locus comprises a pseudogene
(At4g04408) with similarity to histone H2A. The remaining five non-miRNA loci reside at nonannotated intergenic positions and contain
similarity to genes encoding RAN1, flavin monooxygenase-like and
F-box proteins, and proteins of unknown function (Table 1).
We subjected the predicted foldback structure containing
ASRP1729 to more detailed analysis. The foldback region is represented by two tandem small RNAs in the database, ASRP1729 and
ASRP1763 (Fig. 3). The entire length of each arm, approximately one-half of which is shown in Figure 4c, aligned with several genes coding
for Divergent C1 (DC1) domain proteins. The DC1 family is represented by ~170 expressed or predicted A. thaliana genes of unknown
function. Using the randomization technique described above, the
statistical significance of similarity between each foldback arm and the
most closely related DC1 domain gene sequences was very high (P <
1.3 × 10⁻¹⁵ with small RNAs either included or computationally
deleted; Fig. 2c). Although relatively long (474 nucleotides) compared
[Figure 3 image: genomic maps of the MIR161 (chromosome I), MIR163 (chromosome I) and ASRP1729 (chromosome V) regions, with primary or predicted foldbacks and cloned small RNAs.]
Figure 3 Genomic regions corresponding to A. thaliana MIR161, MIR163
and the small RNA–generating locus ASRP1729. Target genes (orange
arrows), nontarget genes (green arrows), retroelements (red triangles) and
cloned miRNAs or small RNAs (blue lines) are shown. DC1 domain–
containing genes (light blue arrows) related to ASRP1729 are shown. The
number of times each miRNA or small RNA was sequenced in the ASRP
database is given in parentheses.
with A. thaliana miRNA foldback structures (139 ± 50 nucleotides),
the ASRP1729 foldback contained mismatches and bulges similar to
miRNA precursors (Fig. 4c).
Formation and function of miR161 and miR163
Because of the unusual properties and extended target gene similarity
of MIR161 and MIR163, we analyzed the biogenesis requirements and
activity of miRNAs from each locus. Canonical miRNAs, such as
miR169, generally require DCL1 (but not DCL2 or DCL3) and HEN1
but not RNA-dependent RNA polymerases RDR1, RDR2 and RDR6
(Fig. 5a)32. trans-acting siRNAs (such as ASRP255), a recently
identified class of small RNAs that functions to suppress or degrade
target mRNAs, require DCL1, HEN1 and RDR6 (Fig. 5a)33,34.
Chromatin-associated and perhaps other classes of endogenous
siRNAs require DCL3 and RDR2 (ref. 32). A. thaliana mutants with
defects in DCL and RDR genes are, therefore, useful to distinguish
among different classes of small RNAs. Accumulation of miR163 was
completely DCL1- and HEN1-dependent but was independent of
DCL2, DCL3, RDR1, RDR2 and RDR6 (Fig. 5a). miR161.1 and
miR161.2 had genetic requirements similar to those of miR163 and
other miRNAs, but each was insensitive to the dcl1-7 mutation
(Fig. 5a). miR161.1 accumulation was lost in the dcl1-9 mutant
(Fig. 5b), however, indicative of a DCL1 requirement but allele-specific sensitivity to the two mutations35. ASRP1729 was HEN1-dependent but was insensitive to individual mutations in each dcl
mutant plant (Fig. 5a,b). This could reflect a biogenesis pathway that
requires DCL4, which was not tested, or that involves redundant DCL
activities. There was no evidence that accumulation of ASRP1729
small RNA required a specific RDR gene (Fig. 5a).
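The genetic requirements stated above can be summarized as a small lookup. The Python sketch below encodes only the requirements named in this section; the dictionary, the function name and the exact-match rule are illustrative assumptions, and the requirement sets may be incomplete for classes described only briefly here.

```python
# Requirements as stated in the text (assumed to be complete for this sketch).
CLASS_REQUIREMENTS = {
    "canonical miRNA": frozenset({"DCL1", "HEN1"}),
    "trans-acting siRNA": frozenset({"DCL1", "HEN1", "RDR6"}),
    "chromatin-associated siRNA": frozenset({"DCL3", "RDR2"}),
}

def matching_classes(required_genes):
    """Return the small RNA classes whose stated requirements match exactly."""
    required = frozenset(required_genes)
    return [name for name, needs in CLASS_REQUIREMENTS.items() if needs == required]

# miR163 accumulation depended on DCL1 and HEN1 only.
print(matching_classes({"DCL1", "HEN1"}))  # ['canonical miRNA']
# ASRP1729 depended on HEN1 but on no single DCL or RDR gene tested.
print(matching_classes({"HEN1"}))          # []
```

An empty result, as for ASRP1729, reflects a requirement profile that fits none of the three classes as defined here.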
We obtained further evidence for miRNA function of miR161 and
miR163 by analyzing several target mRNAs for each miRNA. Using a
5′ RACE assay developed to map miRNA cleavage sites, which
generally occur at a position 10 nucleotides from the 3′ end of the
miRNA-complementary sequence, we validated three PPR gene targets
[Figure 4, panels a–c: predicted foldback structures of MIR161, MIR163 and ASRP1729, and alignments of their foldback arms with At1g62590, At5g41170, At1g64580 and At1g63630 (a); At1g66690, At1g66700 and At1g66720 (b); and At1g61840, At5g55770 and At2g40050 (c).]
Figure 4 Similarity between foldback arms and protein-coding genes. Alignment of MIR161 (a) and MIR163 (b) foldback arm sequences and corresponding
target genes, and ASRP1729 foldback arms and most closely related DC1 domain–containing genes (c). To visualize similarity, the sense orientation of each
gene sequence was aligned with the actual miRNA* foldback arm and the reverse complement of the miRNA foldback arm. The color code for alignments
represents the overall match quality at each position, as determined by T-Coffee. Color codes using T-Coffee indicate alignment quality in a regional context.
Conserved positions across all sequences, and across target gene sequences and only one foldback arm, are indicated by asterisks and dots, respectively,
in the consensus (Cons) bar. Positions corresponding to miRNA or miRNA-complementary sites are indicated. In b, a diagrammatic representation of
MIR163 sequences corresponding to duplication events is presented using the SAMT-like target gene At1g66700. The FASTA E value for each color-coded
segment is shown.
(Fig. 5c). One of these (At1g06580) was validated previously28. Based
on the predominant position of cleavage, two PPR mRNAs were
targeted by miR161.1 and one by miR161.2, indicating that both
MIR161-derived species were functional. Each of four SAMT-like
mRNAs was targeted by miR163 (Fig. 5c). Notably, we detected
cleavage at the canonical position relative to the 3′ end of the
complementary target site, despite the long length (24 nucleotides) of
miR163. Several attempts to validate many DC1 domain–containing
mRNAs (At1g61840, At1g53340, At1g44050, At2g13900, At2g13910,
At2g02610, At2g02620, At2g02630, At5g02330, At5g02350, At3g26550,
Table 1 Small RNA–generating loci containing inverted gene duplications with similarity to protein-coding genes
Small RNA locus | Small RNAs (a) | Foldback size (nt) | Gene | Domain family (b) | 5′ arm E value | 5′ arm strand (c) | 3′ arm E value | 3′ arm strand (c)
MIR161 | 7 | 173 | At4g41170 | PPR | 8.1 × 10⁻⁴ | − | 1.3 × 10⁻⁵ | +
MIR163 | 3 | 374 | At1g66700 | SAMT | 5.2 × 10⁻¹⁵ | | 3.2 × 10⁻¹¹ |
ASRP1729 | 2 | 474 | At1g61840 | DC1 | 3.2 × 10⁻¹² | | 2.0 × 10⁻¹² |
ASRP2020 | 1 | 195 | At5g20010 | RAN | 9.2 × 10⁻⁸ | − | 7.9 × 10⁻¹⁰ | +
ASRP1868 | 1 | 486 | At1g66290 | F-box | 3.9 × 10⁻³⁶ | − | 1.2 × 10⁻⁴⁵ | +
ASRP1268 | 1 | 470 | At3g11000 | Unclassified | 1.0 × 10⁻⁸ | − | 8.0 × 10⁻⁴ | +
ASRP1171 | 2 | 416 | At1g48910 | FMO | 2.0 × 10⁻³² | | 1.0 × 10⁻⁷³ |
ASRP172 | 1 | 810 | At3g54560 | H2A | 3.6 × 10⁻²⁶ | | 9.5 × 10⁻⁴⁶ |
ASRP2039 | 20 | 2684 | At3g44570 | Unclassified | 0.0 (d) | + | 0.0 (d) | −
(a) Unique small RNA sequences in the ASRP database from each small RNA locus foldback. (b) Family name abbreviations are as follows: PPR, pentatricopeptide repeat; SAMT, S-adenosyl-L-Met:salicylic acid carboxyl methyltransferase; DC1, Divergent C1; RAN, Ras-associated nuclear small GTP-binding protein; FMO, flavin monooxygenase; H2A, Histone 2A. (c) Orientation of the gene alignment relative to the small RNA locus. (d) Foldback contains FASTA match to annotated genes (At3g44570 and At3g44580).
[Figure 5, panels a–c: small RNA blots of ASRP255, miR169a, miR161.1, miR161.2, miR163 and ASRP1729 in control and mutant plants (a, b), and 5′ RACE cleavage-site maps for the PPR mRNAs At1g06580, At1g63150 and At5g41170 and the SAMT-like mRNAs At1g66700, At1g66690, At1g66720 and At3g44860 (c).]
Figure 5 Biogenesis and function of A. thaliana miR161, miR163 and ASRP1729.
(a) Small RNA–blot analysis of total RNA from inflorescence (lanes 1–9) or 3-d postgermination seedlings (lanes 10, 11) from control (La-er and Col-0) and mutant plants.
miR169a and ASRP255 are shown as canonical miRNA and trans-acting siRNA controls,
respectively. (b) RNA-blot analysis of miR161.1 and ASRP1729 from dcl1-7 and
dcl1-9 mutant plants. (c) Validation of miRNA-guided target mRNA cleavage by 5′ RACE.
The miR161.1 and miR161.2 sequences are highlighted by black letters. The frequency of
cloned sequences corresponding to each inferred cleavage site is shown above the arrows.
At1g62030, At1g47890, At1g53290 and At1g65680) as ASRP1729 or
ASRP1763 targets were unsuccessful, although at least one of these
mRNAs (At1g61840) was cleaved at a single position upstream from
the putative target sites (data not shown). This could reflect the
activity of another, as yet undetected, small RNA derived from the
ASRP1729 foldback. Thus, despite some unusual properties, both
miR161 and miR163 possess biogenesis and activity profiles that are
typical of miRNAs. ASRP1729, however, has distinct biogenesis
requirements and uncertain trans-active targeting functions.
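The positional rule used to interpret the 5′ RACE results, cleavage 10 nucleotides from the 3′ end of the miRNA-complementary site, can be written as a short Python sketch. It assumes perfect complementarity between miRNA and target, which real plant targets often lack (the miR163 sites, for example, contain a bulge), and the miRNA and mRNA sequences are hypothetical.

```python
COMPLEMENT_RNA = str.maketrans("ACGU", "UGCA")

def reverse_complement_rna(seq):
    """Reverse complement of an RNA sequence."""
    return seq.translate(COMPLEMENT_RNA)[::-1]

def predicted_cleavage_index(mrna, mirna, offset_from_site_3prime=10):
    """Return index c such that cleavage is predicted between mrna[c - 1]
    and mrna[c], or None if no perfectly complementary site is found."""
    site = reverse_complement_rna(mirna)  # target site, written 5' to 3'
    start = mrna.find(site)
    if start == -1:
        return None
    return start + len(site) - offset_from_site_3prime

# Hypothetical 21-nt miRNA and an mRNA carrying its perfect target site.
mirna = "UUGACAGAUCGGAACUGAGCA"
mrna = "AAAA" + reverse_complement_rna(mirna) + "CCCC"
print(predicted_cleavage_index(mrna, mirna))  # 15
```

The 10 target nucleotides downstream of the predicted cut pair with the first 10 nucleotides of the miRNA counted from its 5′ end, which is why the same offset applies to the 24-nucleotide miR163 as to shorter miRNAs.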
Phylogenetics of MIR161, MIR163 and ASRP1729
Most miRNA genes in A. thaliana are not genetically linked to their
respective target genes. In contrast, MIR163 is located immediately
adjacent to a cluster of three SAMT-like target genes (Fig. 3b).
MIR161 is not tightly linked to target genes but is ~5.7 Mbp away
from a region containing a high density of predicted and validated
PPR target genes (Fig. 3). The ASRP1729 locus resides ~0.4 Mbp
away from a cluster of closely related DC1 domain–containing genes
with high similarity (Fig. 3). The PPR, SAMT-like and DC1 domain–
containing gene clusters resulted from relatively recent duplication
events and were inferred to be members of rapidly evolving families.
The proximity of miRNA genes and other foldback-encoding loci to
expanding target families is suggestive of an evolutionary relationship.
To test the relationships between MIR161 and MIR163 and their
respective target gene families, we carried out phylogenetic reconstructions (Bayesian and maximum parsimony methods36) using each
foldback arm as an independent taxon in a gene set that included
validated targets, predicted targets and the 23 (MIR161) or 18
(MIR163) most closely related target family members. The topology
of both trees supported monophyly of each miRNA foldback arm and
target genes (Fig. 6a,b). MIR161 arms each clustered in a well-supported clade containing the 3 validated and 16 predicted miR161
targets (Fig. 6a). The MIR163 foldback arms clustered tightly with the
three validated SAMT-like target genes located adjacent to the MIR163
locus and formed a larger clade consisting exclusively of all six
validated or predicted miR163 targets (Fig. 6b).
We also generated phylogenetic trees using ASRP1729 foldback
arms and 39 DC1 domain–containing genes. Both ASRP1729 foldback
arms clustered with 25 annotated members of the DC1 domain family
in a strongly supported clade (Fig. 6c). Given the consensus properties
of known small RNA–target interactions that trigger mRNA
cleavage13,27,37,38, most of the genes in this clade are potential targets
for RNAi triggered by the ASRP1729-derived small RNAs.
DISCUSSION
We conclude that A. thaliana MIR161 and MIR163 genes evolved
relatively recently by inverted duplication events associated with active
expansion of target gene families and then adapted to the miRNA
biogenesis apparatus. The proposed evolutionary pathway includes
transitional gene forms that probably spawned siRNAs and that
resemble several inverted duplication loci derived from protein-coding
genes (such as ASRP1729) evident in the A. thaliana genome today.
We propose a general mechanism to spawn and evolve miRNA genes
with unique target specificities (Fig. 7). Inverted duplication events
from a founder gene (step 1) result in head-to-head or tail-to-tail
orientations of complete or partial gene sequences. Duplication may
include the founder gene promoter or may result in capture of a new
promoter. The inverted duplication can occur directly from a genomic
sequence or, potentially, during integration of a pseudogene-like
sequence after reverse transcription. The new locus can even form
through juxtaposition of two closely related sequences from different
members of a gene family. Transcription of the nascent locus yields
foldback transcripts that potentially serve as DCL substrates for siRNA
biogenesis, bringing the founder gene and closely related family
members under control of RNAi at the post-transcriptional or chromatin levels8. Given that miRNA target sites generally occur outside of
family-defining domains15,18,19,27, we suggest that duplication events
yielding foldbacks from highly conserved domains are under strong
negative selection. Limited sequence divergence at the inverted duplication
locus (step 2), under constraints to maintain both the foldback
character and recognition by a DCL activity, generates one or more
specific siRNA populations, such as those from the ASRP1729 locus.
Adaptation of the foldback transcript to the miRNA biogenesis pathway involving DCL1 (step 3) results in formation of a uniform miRNA
population. Additional sequence divergence (step 4), under foldback
and DCL1-recognition constraints, occurs to the point that only the
miRNA or miRNA-complementary sequences resemble the originating
gene sequence. Duplication of the miRNA locus (step 5) results in a
Figure 6 Phylogenetic analysis of MIR161 and MIR163
foldback arms and target families and of ASRP1729
foldback arms and DC1 domain–containing genes.
Consensus phylogenetic trees for MIR161 foldback arms and
related PPR target gene family members (a); MIR163 arms
and SAMT-like target family genes (b); and ASRP1729 arms
and DC1-domain family genes (c). Color codes show miRNA-containing and miRNA-complementary (miRNA*) foldback
arms (red), predicted target genes (light blue) and validated
target genes (dark blue). Bayesian posterior probability (top)
and maximum parsimony (bottom) values are shown for
nodes that were supported by both methods. Unlabeled
nodes were supported by values of 100; asterisks indicate
nodes with scores <50 (poor or nonsupported nodes). The
posterior probability for each clade containing the foldback
arms with the miRNA excluded is shown in parentheses.
Tightly clustered PPR genes labeled 1–17: At1g62670,
At1g63070, At1g63080, At1g62910 (two targets),
At1g63130, At1g62930, At1g63400, At1g63150,
At1g62590, At1g63330, At1g62680, At5g41170,
At1g64580, At1g06580, At5g16640 and At1g62720.
Figure 7 Inverted duplication model for miRNA
gene evolution in plants. Red arrows and step
labels indicate miRNA gene evolution events.
Black arrows and clear step labels indicate target
family evolution events. Sequences in proposed
transitional genes and miRNA genes with
sequence similarity or complementarity to target
gene sequences are indicated by black shading.
Sequences that are unique or divergent relative
to target gene sequences are indicated in red.
Putative or known foldback structures from each
transitional locus or miRNA gene are shown.
[Figure 7 diagram labels: miRNA gene evolution (inverted duplication;
siRNA-generating locus, e.g., ASRP1729; young miRNA gene, e.g., miR161,
miR163; old miRNA gene, e.g., miR169; gene duplication yielding MIRNA genes
'a' and 'b') and target family evolution ('family' domain; miRNA target
genes; nontarget genes; gene duplication yielding miRNA target genes 1 and 2).]
multigene miRNA family, perhaps leading to miRNA target specificity after
further sequence divergence. Most modern miRNA genes of A. thaliana are
members of multigene families39 and lack extended target gene similarity
outside of the miRNA or miRNA-complementary sequences. We propose that
MIR161 and MIR163 are young genes that have progressed only to step three.
The model is incomplete without consideration of target-gene family
evolution (Fig. 7). Most miRNA target genes in plants are subsets of larger
gene families9,13,15,18,19,27,40. Target-family gene duplication (step 6)
provides the pool from which regulatory diversification emerges. After
formation of an siRNA- or young miRNA-generating locus (steps 2 or 3),
retention (step 7) or loss (step 8a) of small RNA–complementary target sites
among family members leads to differential post-transcriptional regulation.
This may be accompanied by changes in transcriptional control elements
(step 8b), leading to further regulatory diversification. Further
duplication and subsequent diversification of miRNA target genes (step 9)
leads to specialized regulation by distinct miRNA family members. Thus, new
regulatory networks might emerge through a series of duplication events
followed by retention or loss of sequence complementarity between regulator
(miRNA) and target genes. The evolutionary force driving these events is
probably selective advantage gained by differential regulation of target
family members26. Application of RNA-based control may be particularly
well-suited for genes conferring cell-fate properties or responses to
stress13,22.
Seven A. thaliana small RNA–generating loci, including ASRP1729, possess
the properties predicted for transitional forms resulting after step 2 in
the model (Fig. 7). We do not consider these to be miRNA genes because the
small RNA biogenesis requirements are not clear. Also, the functionality of
these loci as negative regulators of related protein-coding genes has not
been established. Further, the ultimate evolution of these loci toward
miRNA gene status cannot be determined at this point. We propose, however,
that these and similar loci comprise an evolutionary reservoir from which
to sample RNA-based elements that confer advantageous regulatory properties
to target family members41. This model does not explain how loci that
generate trans-acting siRNA evolved33,34. Evidence is lacking for
transcripts with miRNA-like foldback structure arising from loci that
generate trans-acting siRNA. It is possible that these loci yield
transcripts that are adapted as RDR6 templates rather than direct DCL
substrates33,34.
The miRNA evolution model offers a mechanism for de novo generation of new,
RNA-based regulatory genes from protein-coding genes. If all plant miRNA
genes arose through this mechanism, it may explain why there are no common
miRNAs or miRNA-regulated targets between plants and animals. Lack of
orthologous miRNA genes between kingdoms is expected if plant miRNA genes
arose de novo after the split of animal-fungi and plant lineages. But we
cannot rule out the possibility that other evolutionary pathways, though
currently unknown or unexplored, contributed to the miRNA gene class in
plants. This model does not necessarily apply to animal miRNA genes, which
encode foldback structures that are too short to search for evidence of
derivation from target genes. Bartel and Chen42 proposed that animal miRNAs
function primarily as redundant modulators to fine-tune expression of
thousands of genes through combinatorial interactions within 3′
untranslated regions. Thus, new animal target genes are more likely to come
under the control of miRNAs using existing miRNAs through
gain-of-interaction events, such as target gene recombination or
duplication of the 3′ untranslated region sequence, rather than by de novo
miRNA formation events.
Finally, given that most plant miRNAs regulate genes belonging to families
with roles in development, miRNA-directed regulation may have provided
selective advantages for evolution of a multicellular body plan. Recent
evidence suggests that organism complexity arises primarily from
application of new regulatory control over duplicated genes rather than by
invention of new activities43. The coincident expansion of gene families
controlling development and miRNA-based control mechanisms may have had a
profound influence on the independent evolution of multicellularity in
plants and animals11.
METHODS
Identification of small RNA loci from inverted duplications of
protein-coding genes. We identified matches between small RNA loci in the
ASRP database and protein-coding genes (The Institute for Genome Research
AGI gene database, version 3.0) by BLAST searches using segments containing
500 nucleotides flanking both sides of the small RNA. We included only
small RNAs from nonrepetitive, intergenic regions in the search. We treated
each small RNA sequence as an independent locus, regardless of overlap
among related
sequences. We omitted segments of annotated genes within 500 nucleotides of
small RNA loci from the analysis. We analyzed small RNA loci with gene
matches with an expect value E r 6 10–5 for potential to form a foldback
structure using EINVERTED44. We manually inspected loci with an EINVERTED score >50 for the presence of an inverted repeat that included the
small RNA sequence. We eliminated small RNA–generating loci that had
BLAST hits to protein-coding genes containing the same or similar inverted
duplication. We predicted foldback structures using RNAFold45.
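As an illustration of the locus-screening logic described above, the following Python sketch extends a small RNA coordinate by 500 nucleotides on each side and scores how well two candidate arms pair as an inverted repeat. It is not the pipeline used in this study, which relied on BLAST, EINVERTED and RNAFold. The function names (flanked_window, reverse_complement, arm_complementarity), the 1-based inclusive coordinate convention and the ungapped Watson-Crick comparison (no G:U pairs, no bulges) are assumptions made here for clarity.

```python
# Illustrative sketch only; names and conventions are hypothetical.

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T, N)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def flanked_window(chrom_len: int, start: int, end: int, flank: int = 500):
    """Extend a 1-based inclusive interval by `flank` nt on both sides,
    clamped to the chromosome ends."""
    return max(1, start - flank), min(chrom_len, end + flank)


def arm_complementarity(arm5: str, arm3: str) -> float:
    """Fraction of positions at which the 5' arm pairs with the 3' arm when
    the 3' arm is read in reverse (ungapped, Watson-Crick pairs only)."""
    partner = reverse_complement(arm3)
    n = min(len(arm5), len(partner))
    if n == 0:
        return 0.0
    matches = sum(a == b for a, b in zip(arm5.upper(), partner))
    return matches / n


if __name__ == "__main__":
    # A small RNA at positions 300-321 near a chromosome start.
    print(flanked_window(10_000, 300, 321))
    # A perfect inverted repeat pairs at every position.
    print(arm_complementarity("AAGGCT", reverse_complement("AAGGCT")))
```

A perfect inverted duplication gives a value of 1.0, and sequence divergence in either arm lowers it. A dedicated tool is still needed to handle gaps and to apply a score threshold such as the EINVERTED cutoff mentioned above.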
Computational analysis of miRNA foldback and target gene sequences. We
used FASTA to search The Institute for Genome Research AGI transcript
database (version 4.0) for genes with similarity to a reference set of 91 miRNA
precursor foldback sequences (Supplementary Table 1 online). Searches used a
5:–4 matrix. We also did a second search using the foldback with the mature
miRNA sequence excluded. We analyzed further those foldback sequences (with
miRNA sequence computationally deleted) that hit gene sequences with an
expect value E < 0.001 by FASTA searches against the transcript database using
each foldback arm independently. We calculated global alignment scores using
the foldback arms and gene sequences with NEEDLE using the Needleman-Wunsch
alignment algorithm44. Gene sequences from At5g41170, At1g64590,
At1g62590 and At1g63630 were aligned to the MIR161 foldback arms;
sequences from At1g66700, At1g66690, At1g66720 and At3g44840 were aligned
to MIR163 arms; and sequences from At1g61830, At2g13900, At5g02340 and
At2g13910 were aligned to ASRP1729 arms. We repeated this analysis with
foldback arms lacking miRNA or miRNA-complementary sequences. To
evaluate the significance of the Needleman-Wunsch score, we created 1,000
shuffled sequences for each miRNA foldback segment using SHUFFLESEQ44.
We used FASTA to find the top-scoring gene hit (based on expect value) for
each shuffled sequence. We calculated Needleman-Wunsch scores for each
shuffled sequence and corresponding top gene hit and used them to determine
a mean and s.d. for shuffled foldback and target sequence alignments. We
calculated the probability of a random match46.
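The shuffling test described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not a reimplementation of the analysis: the sketch uses a linear gap penalty of -10 (an assumed value) with match and mismatch scores of 5 and -4, aligns each shuffled sequence to one fixed target rather than to its own top FASTA hit, and converts the resulting z-score to an upper-tail normal probability with the exact complementary error function rather than the approximation of ref. 46. The function names nw_score and shuffle_pvalue are hypothetical.

```python
# Illustrative sketch only; scoring parameters and names are assumptions.
import math
import random
import statistics


def nw_score(a: str, b: str, match: int = 5, mismatch: int = -4,
             gap: int = -10) -> int:
    """Needleman-Wunsch global alignment score with a linear gap penalty."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur.append(max(diag, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]


def shuffle_pvalue(query: str, target: str, n_shuffles: int = 1000,
                   seed: int = 0):
    """Compare the observed score with scores of shuffled queries.

    Returns (observed, mean, sd, p), where p is the upper-tail probability
    of the observed score under a normal fit to the shuffled scores.
    """
    if n_shuffles < 2:
        raise ValueError("need at least two shuffles to estimate s.d.")
    rng = random.Random(seed)
    observed = nw_score(query, target)
    letters = list(query)
    null_scores = []
    for _ in range(n_shuffles):
        rng.shuffle(letters)  # preserves base composition
        null_scores.append(nw_score("".join(letters), target))
    mean = statistics.fmean(null_scores)
    sd = statistics.stdev(null_scores)
    if sd == 0:
        return observed, mean, sd, float("nan")
    z = (observed - mean) / sd
    p = 0.5 * math.erfc(z / math.sqrt(2))
    return observed, mean, sd, p


if __name__ == "__main__":
    arm = "ACGTACGTTGCAGGCTAACG"
    print(shuffle_pvalue(arm, arm, n_shuffles=200, seed=1))
```

Because shuffling preserves base composition, a low probability indicates that the similarity depends on the order of bases in the foldback arm and not merely on its nucleotide content.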
miRNA and small RNA–blot analysis. We isolated total RNA from independent pools of inflorescence or 3-d-post-germination seedlings using Trizol
reagent (Invitrogen) and purified it with an RNA/DNA Midi Kit (Qiagen). We
resolved total RNA (7.5 μg) on a 17% polyacrylamide-urea gel, blotted it to a
nylon membrane and probed it with a 32P end-labeled oligonucleotide probe30.
A. thaliana mutants containing dcl1-7, dcl1-9, dcl2-1, dcl3-1, rdr1-1, rdr2-1
and hen1-1 were described previously5,32,35. The rdr6-15 allele (SAIL_617)
contains a T-DNA insertion at a position 312 nucleotides beyond the start
codon of At3g49500.
5′ RACE analysis of miRNA target genes. We mapped cleavage sites in the
mRNAs of miRNA target genes using a modified 5′ RACE procedure as
described previously40. We designed gene-specific primers ~500 nucleotides
downstream of the predicted miRNA target site. We cloned purified PCR
products into pGEM-T Easy (Promega) and sequenced them.
Phylogenetic reconstructions. We aligned the amino acid sequences of
proteins from genes used in phylogeny reconstruction using T-Coffee47 and
then converted them to the corresponding nucleotide sequence using
TRANALIGN44. We then aligned miRNA precursor arms to the prealigned
protein-coding sequences using T-Coffee and excluded the ambiguous regions. We
carried out Bayesian phylogenetic analyses with MrBayes v3.0B4 (ref. 48). Each
codon of each miRNA or target family was assigned its own model of evolution
using the likelihood ratio test implemented in Modeltest49. In each case, we used
the GTR + γ model for individual positions. Analyses were done using four
chains and run for 1,000,000 generations. Trees were sampled every tenth
generation and the resulting 100,000 trees were plotted against their likelihoods
to discard all trees before the convergence point in the burn-in phase. A
majority-rule consensus was generated with the remaining 90,000 trees for each
gene to calculate the posterior probability. The maximum parsimony analyses
were also done for each gene using PAUP50 with the following options: 100
replicates of random sequence addition, TBR (Tree bisection-reconnection)
branch swapping and Multrees in effect. Relative support for the resulting tree
was determined by 1,000 bootstrap replications and the corresponding bootstrap values were presented in the Bayesian consensus trees with the posterior
probability. The Bayesian and maximum parsimony trees had similar
topologies for all gene families.
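The sampling arithmetic and the meaning of a posterior probability on a majority-rule consensus tree can be illustrated with a short Python sketch. The numbers follow the text (1,000,000 generations sampled every tenth generation gives 100,000 trees; retaining 90,000 implies that 10,000 sampled trees were discarded as burn-in). The representation of each sampled tree as a set of clades, and the function names n_sampled, clade_posteriors and majority_rule, are assumptions made for illustration. This is not how MrBayes stores or summarizes trees internally.

```python
# Illustrative sketch only; the tree representation and names are hypothetical.
from collections import Counter


def n_sampled(generations: int, sample_every: int) -> int:
    """Number of trees saved when sampling every `sample_every` generations."""
    return generations // sample_every


def clade_posteriors(trees, burn_in: int):
    """Posterior probability of each clade, estimated as the fraction of
    post-burn-in trees that contain it. Each tree is an iterable of
    frozensets of taxon names."""
    kept = trees[burn_in:]
    if not kept:
        raise ValueError("burn-in discards every sampled tree")
    counts = Counter(clade for tree in kept for clade in set(tree))
    return {clade: count / len(kept) for clade, count in counts.items()}


def majority_rule(posteriors, threshold: float = 0.5):
    """Keep only clades found in more than `threshold` of the retained trees."""
    return {c: p for c, p in posteriors.items() if p > threshold}


if __name__ == "__main__":
    total = n_sampled(1_000_000, 10)
    print(total, total - 10_000)

    ab, bc, abc = frozenset("AB"), frozenset("BC"), frozenset("ABC")
    sampled = [{bc}, {ab, abc}, {ab, abc}, {bc, abc}, {ab, abc}]
    post = clade_posteriors(sampled, burn_in=1)
    print(majority_rule(post))
```

In this toy example the first tree is discarded as burn-in, and the clade (A,B) appears in three of the four retained trees, so it would be labeled with a posterior probability of 0.75 on the consensus tree.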
URLs. The ASRP database is available at http://asrp.cgrb.oregonstate.edu/.
Note: Supplementary information is available on the Nature Genetics website.
ACKNOWLEDGMENTS
We thank S. Givan, D. Smith and C. Sullivan for assistance and advice with
computational resources; L. Johansen for initial propagation of the rdr6-15
mutant; and S. Poethig and H. Vaucheret for discussions about trans-acting
siRNAs and the suggestion to analyze miR161 in multiple dcl1 mutated alleles.
This work was supported by grants from the US National Science Foundation,
the US National Institutes of Health and the United States Department
of Agriculture.
COMPETING INTERESTS STATEMENT
The authors declare that they have no competing financial interests.
Received 17 August; accepted 26 October 2004
Published online at http://www.nature.com/naturegenetics/
1. Bartel, D. MicroRNAs: genomics, biogenesis, mechanism, and function. Cell 116,
281–297 (2004).
2. Ketting, R.F. et al. Dicer functions in RNA interference and in synthesis of small
RNA involved in developmental timing in C. elegans. Genes Dev. 15, 2654–2659
(2001).
3. Hutvagner, G. et al. A cellular function for the RNA-interference enzyme Dicer in the
maturation of the let-7 small temporal RNA. Science 293, 834–838 (2001).
4. Lee, Y. et al. The nuclear RNase III Drosha initiates microRNA processing. Nature
425, 415–419 (2003).
5. Park, W., Li, J., Song, R., Messing, J. & Chen, X. CARPEL FACTORY, a Dicer homolog,
and HEN1, a novel protein, act in microRNA metabolism in Arabidopsis thaliana. Curr.
Biol. 12, 1484–1495 (2002).
6. Kurihara, Y. & Watanabe, Y. Arabidopsis micro-RNA biogenesis through Dicer-like 1
protein functions. Proc. Natl. Acad. Sci. USA 101, 12753–12758 (2004).
7. Reinhart, B.J., Weinstein, E.G., Rhoades, M.W., Bartel, B. & Bartel, D.P. MicroRNAs
in plants. Genes Dev. 16, 1616–1626 (2002).
8. Finnegan, E.J. & Matzke, M.A. The small RNA world. J. Cell Sci. 116, 4689–4693
(2003).
9. Floyd, S.K. & Bowman, J.L. Gene regulation: ancient microRNA target sequences
in plants. Nature 428, 485–486 (2004).
10. Pasquinelli, A.E. et al. Conservation of the sequence and temporal expression of
let-7 heterochronic regulatory RNA. Nature 408, 86–89 (2000).
11. Meyerowitz, E.M. Plants compared to animals: the broadest comparative study
of development. Science 295, 1482–1485 (2002).
12. Poethig, R.S. Life with 25,000 genes. Genome Res. 11, 313–316 (2001).
13. Jones-Rhoades, M.W. & Bartel, D.P. Computational identification of plant
microRNAs and their targets, including a stress-induced miRNA. Mol. Cell 14,
787–799 (2004).
14. Griffiths-Jones, S., Bateman, A., Marshall, M., Khanna, A. & Eddy, S.R. Rfam: an
RNA family database. Nucleic Acids Res. 31, 439–441 (2003).
15. Palatnik, J.F. et al. Control of leaf morphogenesis by microRNAs. Nature 425,
257–263 (2003).
16. Aukerman, M.J. & Sakai, H. Regulation of flowering time and floral organ identity
by a microRNA and its APETALA2-like target genes. Plant Cell 15, 2730–2741
(2003).
17. Chen, X. A microRNA as a translational repressor of APETALA2 in Arabidopsis
flower development. Science 303, 2022–2025 (2004).
18. Mallory, A.C., Dugas, D.V., Bartel, D.P. & Bartel, B. MicroRNA regulation of NAC-domain targets is required for proper formation and separation of adjacent embryonic,
vegetative, and floral organs. Curr. Biol. 14, 1035–106 (2004).
19. Achard, P., Herr, A., Baulcombe, D.C. & Harberd, N.P. Modulation of floral development by a gibberellin-regulated microRNA. Development 131, 3357–3365 (2004).
20. Emery, J.F. et al. Radial patterning of Arabidopsis shoots by class III HD-ZIP and
KANADI genes. Curr. Biol. 13, 1768–1774 (2003).
21. Laufs, P., Peaucelle, A., Morin, H. & Traas, J. MicroRNA regulation of the CUC genes
is required for boundary size control in Arabidopsis meristems. Development 131,
4311–4322 (2004).
22. Sunkar, R. & Zhu, J.K. Novel and stress-regulated microRNAs and other small RNAs
from Arabidopsis. Plant Cell 16, 2001–2019 (2004).
23. Xie, Z., Kasschau, K.D. & Carrington, J.C. Negative feedback regulation of Dicer-Like1
in Arabidopsis by microRNA-guided mRNA degradation. Curr. Biol. 13, 784–789
(2003).
24. Vaucheret, H., Vazquez, F., Crete, P. & Bartel, D.P. The action of ARGONAUTE1 in the
miRNA pathway and its regulation by the miRNA pathway are crucial for plant
development. Genes Dev. 18, 1187–117 (2004).
25. Levine, M. & Tjian, R. Transcription regulation and animal diversity. Nature 424,
147–151 (2003).
26. Teichmann, S.A. & Babu, M.M. Gene regulatory network growth by duplication.
Nat. Genet. 36, 492–496 (2004).
27. Rhoades, M.W. et al. Prediction of plant microRNA targets. Cell 110, 513–520
(2002).
28. Vazquez, F., Gasciolli, V., Crete, P. & Vaucheret, H. The nuclear dsRNA binding protein
HYL1 is required for microRNA accumulation and plant development, but not
posttranscriptional transgene silencing. Curr. Biol. 14, 346–351 (2004).
29. Dunoyer, P., Lecellier, C.H., Parizotto, E.A., Himber, C. & Voinnet, O. Probing the
microRNA and small interfering RNA pathways with virus-encoded suppressors of RNA
silencing. Plant Cell 16, 1235–1250 (2004).
30. Llave, C., Kasschau, K.D., Rector, M.A. & Carrington, J.C. Endogenous and silencing-associated small RNAs in plants. Plant Cell 14, 1605–1619 (2002).
31. Papp, I. et al. Evidence for nuclear processing of plant micro RNA and short interfering
RNA precursors. Plant Physiol. 132, 1382–1390 (2003).
32. Xie, Z. et al. Genetic and functional diversification of small RNA pathways in plants.
PLoS Biol. 2, E104 (2004).
33. Peragine, A., Yoshikawa, M., Wu, G., Albrecht, H.L. & Poethig, R.S. SGS3 and SGS2/
SDE1/RDR6 are required for juvenile development and the production of trans-acting
siRNAs in Arabidopsis. Genes Dev. 18, 2368–2379 (2004).
34. Vazquez, F. et al. Endogenous trans-acting siRNAs regulate the accumulation of
Arabidopsis mRNAs. Mol. Cell 16, 69–79 (2004).
35. Schauer, S.E., Jacobsen, S.E., Meinke, D.W. & Ray, A. DICER-LIKE1: blind men and
elephants in Arabidopsis development. Trends Plant Sci. 7, 487–491 (2002).
36. Holder, M. & Lewis, P.O. Phylogeny estimation: traditional and Bayesian approaches.
Nat. Rev. Genet. 4, 275–284 (2003).
37. Mallory, A.C. et al. MicroRNA control of PHABULOSA in leaf development: importance
of pairing to the microRNA 5′ region. EMBO J. 23, 3356–3364 (2004).
38. Kasschau, K.D. et al. P1/HC-Pro, a viral suppressor of RNA silencing, interferes with
Arabidopsis development and miRNA function. Dev. Cell 4, 205–217 (2003).
39. Bartel, B. & Bartel, D.P. MicroRNAs: at the root of plant development? Plant Physiol.
132, 709–717 (2003).
40. Llave, C., Xie, Z., Kasschau, K.D. & Carrington, J.C. Cleavage of Scarecrow-like
mRNA targets directed by a class of Arabidopsis miRNA. Science 297, 2053–2056
(2002).
41. Herbert, A. The four Rs of RNA-directed evolution. Nat. Genet. 36, 19–25 (2004).
42. Bartel, D.P. & Chen, C.Z. Micromanagers of gene expression: the potentially
widespread influence of metazoan microRNAs. Nat. Rev. Genet. 5, 396–400
(2004).
43. Hurles, M. Gene duplication: the genomic trade in spare parts. PLoS Biol. 2, E206
(2004).
44. Rice, P., Longden, I. & Bleasby, A. EMBOSS: the European Molecular Biology Open
Software Suite. Trends Genet. 16, 276–277 (2000).
45. Hofacker, I.L. Vienna RNA secondary structure server. Nucleic Acids Res. 31,
3429–3431 (2003).
46. Lin, J.-T. Alternatives to Hamaker’s approximations to the cumulative normal
distribution and its inverse. Statistician 37, 413–414 (1988).
47. Notredame, C., Higgins, D.G. & Heringa, J. T-Coffee: A novel method for fast and
accurate multiple sequence alignment. J. Mol. Biol. 302, 205–217 (2000).
48. Huelsenbeck, J.P. & Ronquist, F. MRBAYES: Bayesian inference of phylogenetic
trees. Bioinformatics 17, 754–755 (2001).
49. Posada, D. & Crandall, K.A. MODELTEST: testing the model of DNA substitution.
Bioinformatics 14, 817–818 (1998).
50. Swofford, D. PAUP*: Phylogenetic analysis using parsimony (*and other methods)
4.0b10 edn. (Sinauer, Sunderland, Massachusetts, 2002).