Ahmed et al.
Supplemental Results and Discussions: Comprehensive
analysis of small RNA-seq data reveals that combination of
miRNA with its isomiRs increase the accuracy of target
prediction in Arabidopsis
Firoz Ahmed1$, Muthappa Senthil-Kumar1, 2, Seonghee Lee1, Xinbin Dai1, Kirankumar S.
Mysore1 and Patrick Xuechun Zhao1, *
1 Plant Biology Division, Samuel Roberts Noble Foundation, Ardmore, Oklahoma 73401, USA
2 National Institute of Plant Genome Research, Aruna Asaf Ali Marg, New Delhi, 110067, India
$ Current Address: Center for Genomics & Systems Biology, New York University, 12 Waverly
Place, New York, NY 10003, USA
* Corresponding author: Patrick Xuechun Zhao, E-mail: pzhao@noble.org, Phone: +1-580-2246725, Fax: +1-580-224-4743
Supplemental Results
In this section, we describe our investigation of sib-miRs with a focus on their biogenesis,
characteristic length, and terminal sequence patterns leading to Argonaute (AGO) protein
loading and DCL cleavage.
Features of sib-miRs
Overview of sib-miRs and their orientation on pre-miRNAs
Although one arm of a pre-miRNA typically generates a canonical miRNA, previous studies
have reported an arm-switching phenomenon in which the source of the dominant mature miRNA
changes from one arm to the other depending on the tissue and developmental stage 1-4.
Therefore, in order to better understand the biogenesis of mature sequences, we analyzed the
propensity of each arm (5p and 3p) of 132 Arabidopsis pre-miRNAs (miRBase 18) to generate
sib-miRs. From deep-sequencing data of small RNAs, 3,574,444 reads (normalized as Reads Per
Million, see Materials and Methods) were mapped to the 5p-arm, which contained 1,322 distinct
sib-miRs including 76 annotated miRNAs and one annotated miRNA*. Similarly, 8,457,739
reads were mapped to the 3p-arm, which contained 1,408 distinct sib-miRs including 71
annotated miRNAs and three annotated miRNAs*.
Overall, the read count of sib-miRs generated from the 3p-arm was 2.3 times greater than that
of the 5p-arm. However, a large share of the 3p-arm reads came from ath-MIR159a alone
(3,820 reads from its 5p-arm and 6,607,434 from its 3p-arm). We also
observed a few unusual features of the sib-miRs. One sRNA from the 5p-arm mapped to two
different pre-miRNAs (ath-MIR167a, ath-MIR167c). Another sRNA mapped to three different
locations on ath-MIR2933a (5p-arm, loop-region, and 3p-arm), revealing a stretch of sequence
“GGCGAUUUUGAGUGAAAUCGGAGAGG” found in triplicate. Moreover, 19 sRNAs from
the 3p-arm mapped to two different locations within ath-MIR783, revealing a stretch of sequence
“UCAUGUUCUCCUGAAUCUUCCGACAAAA” in duplicate. Further analysis of the number of
distinct sib-miRs demonstrated that the median number of sib-miRs generated from each
pre-miRNA was 5 (mean = 10) from the 5p-arm and 5 (mean = 10.7) from the 3p-arm, with no
significant difference (Wilcoxon test p-value = 0.6162, W = 9022.5) (Figure S9). We also
compared the read counts of sib-miRs generated from each arm of a pre-miRNA and found that a
median of 463 (mean = 27,079) and 351 (mean = 64,073) reads were generated from the 5p- and
3p-arms, respectively, with no significant difference (Wilcoxon test p-value = 0.6332, W =
9008.5) (Figure S10).
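The arm-level comparisons above can be illustrated with a short sketch. It recomputes the 3p/5p read ratio from the totals quoted in the text and shows how a two-sample Wilcoxon rank-sum statistic (W) is obtained from per-pre-miRNA counts. The helper names `midranks` and `rank_sum_w` and the toy counts are hypothetical, and the unpaired form of the test is an assumption, since the text reports only W and the p-value.

```python
def midranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_w(x, y):
    """W statistic of sample x in an unpaired two-sample Wilcoxon test."""
    ranks = midranks(list(x) + list(y))
    rank_sum_x = sum(ranks[:len(x)])
    return rank_sum_x - len(x) * (len(x) + 1) / 2


# Totals quoted in the text (RPM-normalized reads mapped to each arm).
reads_5p, reads_3p = 3_574_444, 8_457_739
ratio = reads_3p / reads_5p
mir159a_share = 6_607_434 / reads_3p  # ath-MIR159a 3p-arm reads / all 3p-arm reads

# Hypothetical counts of distinct sib-miRs for five pre-miRNAs (illustration only).
toy_5p = [5, 12, 3, 7, 20]
toy_3p = [5, 9, 4, 15, 2]
w = rank_sum_w(toy_5p, toy_3p)

print(f"3p/5p read ratio: {ratio:.2f}")
print(f"ath-MIR159a share of 3p-arm reads: {mir159a_share:.1%}")
print(f"W (toy data) = {w}")
```

Under the unpaired assumption, W for two samples of 132 pre-miRNAs can range from 0 to 132 × 132 = 17,424, with a null expectation of 8,712, which is in line with the reported values of 9022.5 and 9008.5.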
miRNA abundance may be influenced by the transcription rate of miRNA genes, tissue
type, stage of cell differentiation, and environmental conditions 5-7. To negate the effect of these
variables on miRNA gene expression levels, we normalized sib-miR read counts within each
pre-miRNA (see Materials and Methods). Doing so yielded more reliable data representing the
features of sib-miRs. Therefore, unless otherwise stated, these normalized data were used
throughout the remainder of the study. Analysis of these results showed that the median fraction
of mature sequences generated from the 5p- and 3p-arms was 54.79% (mean = 51.28%) and 45.21% (mean =
48.71%), respectively, with no statistical difference (Wilcoxon test p-value = 0.8107, W = 8861)
(Figure S11).
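The exact normalization is described in Materials and Methods, which is not reproduced here. As a minimal sketch, assuming that each read count is divided by the total reads mapped to its pre-miRNA and then summed per arm, the per-arm fractions could be computed as follows. The function name `arm_fractions`, the record layout, and the identifiers `pre-A` and `pre-B` are hypothetical.

```python
from collections import defaultdict


def arm_fractions(records):
    """records: iterable of (pre_mirna, arm, reads).

    Returns {pre_mirna: {arm: percent of that pre-miRNA's reads}}.
    Pre-miRNAs with zero total reads are skipped.
    """
    totals = defaultdict(float)
    per_arm = defaultdict(lambda: defaultdict(float))
    for pre, arm, reads in records:
        totals[pre] += reads
        per_arm[pre][arm] += reads
    return {
        pre: {arm: 100.0 * n / totals[pre] for arm, n in arms.items()}
        for pre, arms in per_arm.items()
        if totals[pre] > 0
    }


# Hypothetical records; names and counts are illustrative only.
records = [
    ("pre-A", "5p", 300.0), ("pre-A", "5p", 100.0), ("pre-A", "3p", 600.0),
    ("pre-B", "5p", 10.0), ("pre-B", "3p", 30.0),
]
fractions = arm_fractions(records)
for pre, arms in fractions.items():
    print(pre, {arm: round(pct, 2) for arm, pct in arms.items()})
```

Because every pre-miRNA contributes fractions that sum to 100%, a highly expressed locus such as ath-MIR159a no longer dominates the comparison between arms.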
Length diversity of sib-miRs
Plants produce different classes of endogenous sRNA through a distinct mechanism of
transcriptional or post-transcriptional gene silencing 8-10. Each class is characterized by a specific
length and accomplishes certain biological functions 8, 9. Moreover, sRNA length determines its
potential to sort into a specific AGO protein 11. In order to gain a deeper understanding of the
optimal miRNA size in Arabidopsis, we analyzed the size distribution of sib-miRs. Our data
demonstrated that the length varies from 18-28 nucleotides for the 5p-arm and 18-27 nucleotides
for the 3p-arm. Overall, 67% of all sib-miRs were 21 nucleotides in length while 14% were 22
nucleotides long (Figure S12). Both arms contributed nearly the same ratio of the various
lengths.
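A size distribution of this kind can be tabulated directly from the sequences. The following is a minimal sketch, not the authors' pipeline; the function name `length_distribution` and the toy sequences are hypothetical.

```python
from collections import Counter


def length_distribution(sequences):
    """Return {length: percent of sequences with that length}, sorted by length."""
    counts = Counter(len(s) for s in sequences)
    total = sum(counts.values())
    return {length: 100.0 * n / total for length, n in sorted(counts.items())}


# Hypothetical sequences built to known lengths (three of 21 nt, one of 22 nt).
toy_sequences = ["U" + "A" * 20, "U" + "C" * 20, "A" + "G" * 20, "U" + "A" * 21]
distribution = length_distribution(toy_sequences)
for length, percent in distribution.items():
    print(f"{length} nt: {percent:.1f}%")
```

Running the same function separately on the 5p-arm and 3p-arm sequences would allow the two arms to be compared length by length.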
Terminal nucleotide characteristics of sib-miRs
The importance of terminal nucleotides in miRNA/siRNA/pre-miRNA has been established by
several investigations 11, 12. Nucleotides at both termini provide asymmetrical stability to the
miRNA duplex and can differentiate miRNA from miRNA* 13-15. Moreover, nucleotides at the
5’- and 3’-ends are responsible for loading into distinct AGO complexes 11 and controlling the
miRNA half-life 16-19, respectively. Therefore, the terminal nucleotides of sib-miRs were
analyzed to gain insight into any nucleotide patterns that may be preferred at each terminus. Our
data revealed the highest preference for U at the 5’-end, found in 60% of all the sib-miRs examined. The
frequencies of the remaining nucleotides at the 5’-end ranked A > C > G (Figure S13A).
Furthermore, a clear pattern of dinucleotides at the 5’-terminus was also observed (Table S10)
with UU and GC as the most (22%) and least (0.42%) preferred, respectively. Interestingly, both
arms contributed equally to the mono- and di-nucleotide patterns observed.
To examine the sequence preferences at the 3’-end of sib-miRs, we computed the
frequency of mono- and dinucleotides. Our data demonstrated clear differences in the nucleotide
preferences of the two arms. While the 5p-arm showed a preference of A > G > C > U, the
3p-arm preferred U > A > C > G (Figure S13B). On the 5p-arm, AA was the most preferred
dinucleotide (15.7%), followed by CA (12.88%), while on the 3p-arm, AU (10.68%) and UU (10.08%)
were highly preferred (Table S10). A few dinucleotides were disfavored in the sib-miR sequences
analyzed in this study, namely CU (1.58%) on the 5p-arm and GG (1.53%) on the 3p-arm.
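The mono- and dinucleotide preferences described above amount to counting the first or last one or two bases of each sequence. The following is a minimal sketch with toy input; the function name `terminal_frequencies` and its parameters are hypothetical and do not come from the original analysis.

```python
from collections import Counter


def terminal_frequencies(sequences, k=1, end="5p"):
    """Percent frequency of the k-mer at the 5' (end="5p") or 3' (end="3p") terminus.

    Sequences shorter than k are ignored. Keys are ordered from most to least frequent.
    """
    if end not in ("5p", "3p"):
        raise ValueError("end must be '5p' or '3p'")
    kmers = [s[:k] if end == "5p" else s[-k:] for s in sequences if len(s) >= k]
    counts = Counter(kmers)
    total = sum(counts.values())
    return {kmer: 100.0 * n / total for kmer, n in counts.most_common()}


# Hypothetical short sequences, for illustration only.
toy = ["UUGAC", "UAGCA", "AUGCU", "UUCGA"]
print("5' mononucleotides:", terminal_frequencies(toy, k=1, end="5p"))
print("5' dinucleotides:  ", terminal_frequencies(toy, k=2, end="5p"))
print("3' mononucleotides:", terminal_frequencies(toy, k=1, end="3p"))
```

Applying the function to the 5p-arm and 3p-arm sequences separately would give the arm-specific rankings reported in Figure S13 and Table S10.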
Sorting of sib-miRs into AGO proteins
The AGO protein is the catalytic core of the RNA-induced silencing complex (RISC) for gene
silencing, which is guided by sRNA 20. The Arabidopsis genome encodes ten AGO
proteins, which have different biological functions (e.g., RNA slicing, RNA binding, and
chromatin modification) and binding affinity with different classes of sRNA 21, 22. To understand
the miRNA sorting specificity towards different AGO proteins for different biological functions,
we analyzed how sib-miRs are distributed between four AGO proteins, namely AGO1, AGO2,
AGO4, and AGO5. Among 2,709 unique sib-miRs, only 960 sequences have been confirmed
experimentally to bind AGOs 23 (Figure S14A). Our analysis of these sequences found that 254
sorted only to AGO1, 100 to AGO2, 171 to AGO4, and 131 to AGO5. Several sequences sorted
into more than one AGO protein; thus, the overall number of unique sequences sorted into each
AGO was 487 for AGO1, 248 for AGO2, 370 for AGO4, and 334 for AGO5.
The remaining 1,749 sequences were predicted for AGO sorting specificity using our
in-house SVM models (unpublished). SVM is a state-of-the-art machine learning technique that has
been extensively applied to classification and regression problems, for instance, prediction of
the guide strand of miRNA 13, polyadenylation signals 24, siRNA design 25, and protein subcellular
localization 26-28. We used the SVM models at a threshold of “0” and found 1,740 sequences assigned to
AGOs and nine sequences that lacked specificity for any AGO protein (Figure S14B). Here we
found 411 sequences specifically sorted into AGO1, while 52 were assigned to AGO2, 306 to
AGO4, and 207 to AGO5. The total numbers of sequences, including those sorted into more than
one AGO, were 844 for AGO1, 377 for AGO2, 877 for AGO4, and 422 for
AGO5.
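The SVM models themselves are unpublished, so the following sketch covers only the bookkeeping step: given one decision score per AGO model, a sequence is assigned to every AGO whose score exceeds the threshold, and sequences are then tallied as AGO-specific, shared, or unassigned. The score values, the names `assign_agos` and `summarize`, and the strict "greater than 0" rule are assumptions made for illustration.

```python
AGOS = ("AGO1", "AGO2", "AGO4", "AGO5")


def assign_agos(scores, threshold=0.0):
    """scores: {AGO name: decision score}. Return the AGOs scoring above threshold."""
    return [ago for ago in AGOS if scores.get(ago, float("-inf")) > threshold]


def summarize(all_scores, threshold=0.0):
    """Tally AGO-specific counts, total counts (specific + shared), and unassigned."""
    specific = dict.fromkeys(AGOS, 0)
    total = dict.fromkeys(AGOS, 0)
    unassigned = 0
    for scores in all_scores.values():
        hits = assign_agos(scores, threshold)
        if not hits:
            unassigned += 1
        elif len(hits) == 1:
            specific[hits[0]] += 1
        for ago in hits:
            total[ago] += 1
    return specific, total, unassigned


# Hypothetical decision scores for three sequences.
toy_scores = {
    "seq1": {"AGO1": 1.2, "AGO2": -0.4, "AGO4": -0.9, "AGO5": -1.1},  # AGO1 only
    "seq2": {"AGO1": 0.3, "AGO2": -0.2, "AGO4": 0.8, "AGO5": -0.5},   # AGO1 and AGO4
    "seq3": {"AGO1": -0.7, "AGO2": -0.1, "AGO4": -0.3, "AGO5": -0.6}, # none
}
specific, total, unassigned = summarize(toy_scores)
print("specific:", specific)
print("total:   ", total)
print("unassigned:", unassigned)
```

In this scheme the "specific" tallies correspond to counts such as 411 for AGO1, the "total" tallies to counts such as 844 for AGO1, and "unassigned" to the nine sequences without a predicted AGO.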
Expression of sib-miRs from conserved and non-conserved miRNA genes
Studies suggest that miRNA genes have evolved over time and were retained or removed from
the genome based on fitness 29, 30. Several A. thaliana pre-miRNAs are highly conserved across
other species of plants, while a few are species-specific and considered evolutionarily young.
Previous studies reported that conserved miRNA genes are expressed at higher levels than
non-conserved ones 31-33. However, to the best of our knowledge, the relationship between miRNA
gene conservation across species and their expression level as sib-miRs has yet to be studied in
Arabidopsis. Therefore, we examined the Spearman correlation between the number of plant
species in which an ath-miRNA gene is conserved and the number of distinct sib-miRs generated
from it. We found a positive correlation of ρ = 0.31688 with p-value = 2.14e-4 (Figure S15).
When we calculated the correlation between the number of plant species in which an ath-miRNA
gene is conserved and its expression level (as read counts of sib-miRs), a positive correlation
of ρ = 0.49 with p-value = 2.22e-9 was observed (Figure S16). Finally, when we examined the
correlation between the number of distinct sib-miRs generated from a pre-miRNA in Arabidopsis
and its expression level (as read counts of sib-miRs), a positive correlation of ρ = 0.84 with
p-value = 1.13e-36 was observed (Figure S17). These results indicate that conserved miRNA genes
express more distinct sib-miRs, and at higher levels, than non-conserved miRNA genes.
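For readers who wish to reproduce this kind of analysis, Spearman's ρ is the Pearson correlation of the ranks of the two variables. The following is a minimal sketch with toy values; the function names and the data are hypothetical, and the p-value calculation is omitted.

```python
from math import sqrt


def midranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation computed on midranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = midranks(x), midranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return cov / sqrt(var_x * var_y)


# Hypothetical data: number of species in which a gene is conserved,
# and number of distinct sib-miRs produced by that gene.
species_conserved = [1, 2, 5, 10, 20]
distinct_sib_mirs = [2, 1, 6, 8, 30]
rho = spearman_rho(species_conserved, distinct_sib_mirs)
print(f"rho = {rho:.2f}")
```

Because the statistic depends only on ranks, it is unaffected by the logarithmic scaling used on the axes of Figures S15 to S17.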
Accuracy of DCL cleavage of miRNA hairpins
Along with a canonical miRNA, several isomiRs are also generated with slight variations in the
terminal sequence. Previous studies indicate that terminal variation arises due to slippage of DCL
at the cleavage site on precursor molecules; however, the exact mechanism remains unclear 34, 35.
Thus, to understand the accuracy of pre-miRNA cleavage in A. thaliana, we analyzed the
terminal heterogeneity at the 5’- and 3’-ends of miRNAs generated from both arms (see Materials and
Methods). On the 5p-arm, DCL cleaves the 5’-end with a mean heterogeneity of 0.447 (median =
0.0867), while the 3’-end is cleaved with a mean heterogeneity of 0.443 (median = 0.1595)
(Figure S18A). A similar pattern was observed on the 3p-arm, in which the 5’- and 3’-ends
displayed a mean heterogeneity of 0.4376 (median = 0.0758559) and 0.5436162 (median =
0.1555154), respectively (Figure S18B). We observed more variation in 3’-end processing than
in 5’-end processing for isomiRs generated from the 3p-arm. However, processing at the 5’- and
3’-ends was not statistically different in Arabidopsis (Wilcoxon test p-value: 0.3302 for the
5p-arm and 0.1998 for the 3p-arm).
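The heterogeneity measure is defined in Materials and Methods, which is not reproduced here. As an illustration only, the sketch below uses one common definition, the read-weighted mean absolute distance of each read's terminus from the most abundant terminus; this definition is an assumption and may differ from the one the values above were computed with. The function name and toy data are hypothetical.

```python
from collections import Counter


def terminal_heterogeneity(positions_and_reads):
    """positions_and_reads: iterable of (terminus position, read count).

    Returns the read-weighted mean absolute offset from the dominant position
    (assumed definition). A value of 0 means every read ends at the same position.
    """
    by_position = Counter()
    for position, reads in positions_and_reads:
        by_position[position] += reads
    total = sum(by_position.values())
    if total == 0:
        raise ValueError("no reads supplied")
    # Dominant terminus: most reads; ties are broken by the smaller position.
    dominant = min(by_position, key=lambda p: (-by_position[p], p))
    return sum(abs(p - dominant) * n for p, n in by_position.items()) / total


# Hypothetical 5'-end positions on a hairpin with their read counts.
toy_five_prime = [(10, 900), (11, 80), (9, 20)]
h = terminal_heterogeneity(toy_five_prime)
print(f"heterogeneity = {h:.3f}")
```

Under this definition, the gap between the reported means (about 0.44 to 0.54) and medians (about 0.08 to 0.16) would indicate that most pre-miRNAs are cleaved precisely while a minority show large offsets.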
Binding targets of miRNAs and corresponding isomiRs in human
We also analyzed the targets of miRNAs and their isomiRs in human to check the relevance of
isomiRs in target binding outside of plant species. We retrieved deep-sequencing reads of
hsa-let-7a-5p, hsa-miR-17-5p, and hsa-miR-21-5p from miRBase 19, and selected reads having an
RPM equal to or greater than 1. The miRanda-3.3 tool 36 was then used to predict the targets of
miRNAs and their isomiRs against the 3’-UTR sequences of human coding RefSeqs. (The 3’-UTR
sequences were downloaded from assembly hg19 from the UCSC Human Genome Browser
and were made non-redundant before use 37.) Candidate targets were selected if binding with
miRNA/isomiRs resulted in a Max Score > 140 and Max Energy (ΔG) < −10 kcal/mol 36. For
each miRNA, this threshold yielded thousands of candidate targets, which were further compared
against experimentally validated target mRNAs compiled from miRTarBase, TarBase,
miRecords, miR2Disease, and published literature 38-43. Our miRanda analysis recovered 26 out
of 42 validated targets of hsa-let-7a-5p, 43 out of 56 validated targets of hsa-miR-17-5p, and 51
out of 76 validated targets of hsa-miR-21-5p. We found that nearly all validated hsa-miR-17-5p
target mRNAs are recognized by several isomiRs (Figure S19). Similar results were obtained for
hsa-let-7a-5p and hsa-miR-21-5p (Figures S20 and S21). Furthermore, incorporating our new finding,
we have predicted targets of miRNAs that still need to be validated experimentally (Figures S20 and S21).
All available miRNA target prediction tools produce false positives. Nevertheless, we found that
miRNA target binding in animals was similar to that in plants, suggesting that validated mRNAs
are targeted by most isomiRs of a given miRNA.
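The selection and comparison steps described above can be sketched as follows. The code does not run or parse miRanda; it assumes the predictions have already been reduced to (query, target, score, energy) tuples, and the function names, gene labels, and toy values are hypothetical. Only the two cutoffs and the recovered/validated counts are taken from the text.

```python
from collections import defaultdict


def candidate_targets(hits, min_score=140.0, max_energy=-10.0):
    """Keep hits with score > min_score and energy < max_energy (kcal/mol).

    hits: iterable of (query, target, score, energy).
    Returns {target: set of miRNA/isomiR queries predicted to bind it}.
    """
    by_target = defaultdict(set)
    for query, target, score, energy in hits:
        if score > min_score and energy < max_energy:
            by_target[target].add(query)
    return dict(by_target)


def recovery(validated, by_target):
    """Return (validated targets that were predicted, fraction recovered)."""
    found = sorted(set(validated) & set(by_target))
    return found, len(found) / len(validated)


# Hypothetical hits for a canonical miRNA and two of its isomiRs.
toy_hits = [
    ("canonical", "GENE_A", 155.0, -18.2),
    ("isomiR_1", "GENE_A", 149.0, -15.0),
    ("isomiR_1", "GENE_B", 138.0, -20.1),  # fails the score cutoff
    ("isomiR_2", "GENE_C", 160.0, -8.5),   # fails the energy cutoff
    ("isomiR_2", "GENE_D", 151.0, -12.3),
]
predicted = candidate_targets(toy_hits)
found, fraction = recovery(["GENE_A", "GENE_B"], predicted)
print(predicted, found, fraction)

# Recovery rates implied by the counts reported in the text.
reported = {"hsa-let-7a-5p": (26, 42), "hsa-miR-17-5p": (43, 56), "hsa-miR-21-5p": (51, 76)}
rates = {name: hit / total for name, (hit, total) in reported.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:.1%}")
```

Grouping hits by target, as done here, makes it straightforward to count how many isomiRs recognize each validated mRNA, which is the information summarized in the heatmaps of Figures S19 to S21.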
Supplemental Discussion
Our initial analysis demonstrated that there are no significant differences in generating sib-miRs
from the 5p- or 3p-arm of a pre-miRNA (Figure S11). This suggests that both arms have similar
potential to generate mature sequences, and that arm-switching may be governed by features of
the mature sequence, cellular conditions, or other unknown factors. We observed that sib-miR
size is not distributed evenly, with 21-nt sib-miRs in greatest abundance (67%) (Figure S12).
This result is consistent with earlier findings in Arabidopsis (~77%) 44 and (~80%) 45. Our
analysis revealed that sib-miRs of 22 and 24 nucleotides in length were the second and third
most abundant species in Arabidopsis, respectively. Another study reported that sRNAs of 24
nucleotides in length were expressed most highly (83%) in Arabidopsis and mostly belonged to
cis-acting siRNA 10. However, that study was conducted on a small dataset consisting of different
classes of small RNAs including miRNA 10.
The terminal nucleotides of sRNAs do not participate in target site recognition, but play
other important roles 46, 47. For instance, a 5’-terminal uridine, adenine, adenine, or cytosine
generally facilitates sorting of an sRNA into AGO1, AGO2, AGO4, or AGO5,
respectively 11. Uridylation and adenylation at the 3’-end of miRNAs lead to destabilization and
stabilization, respectively, thereby controlling miRNA decay in plants and animals 16-19, 48-51. Our
data show a high prevalence of U at the 5’-end of sib-miRs, indicating its importance in making
a guide strand (Figure S13A). A higher prevalence of adenine and lower prevalence of uridine at
the 3’-end of sib-miRs may play an important role in stabilizing sib-miRs (Figure S13B).
Interestingly, the 5’- and 3’-ends of sib-miRs also display a higher propensity for certain
dinucleotides. In addition, we found that most of the 2,709 unique sib-miRs have an affinity for
sorting into multiple AGO proteins, with AGO1 being the most preferred (Figure S14). This
result also supports other studies demonstrating that miRNAs mostly sort into AGO1 in
Arabidopsis 10, 23.
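The 5’-nucleotide rule cited above (U to AGO1, A to AGO2 and AGO4, C to AGO5) can be written as a simple lookup. This is an illustrative sketch of the general tendency only, not a predictor; the names `FIVE_PRIME_RULE` and `expected_agos` are hypothetical, and sequences starting with G are returned without an expected AGO because the rule as stated does not cover them.

```python
# General tendency described in the text; real sorting is not strictly deterministic.
FIVE_PRIME_RULE = {
    "U": ("AGO1",),
    "A": ("AGO2", "AGO4"),
    "C": ("AGO5",),
}


def expected_agos(sequence):
    """Return the AGOs favored by the 5'-terminal nucleotide of an RNA sequence.

    DNA-style input is tolerated by treating T as U. Unknown or missing
    5' nucleotides give an empty tuple.
    """
    first = sequence[:1].upper().replace("T", "U")
    return FIVE_PRIME_RULE.get(first, ())


# Hypothetical sequences, for illustration only.
for seq in ("UGGAGAAGCA", "AUGCAUGCAU", "CGGAUCCGAU", "GCAUGCAUGC"):
    print(seq, "->", expected_agos(seq))
```

Such a lookup is useful mainly as a baseline against which experimentally observed or model-predicted AGO assignments can be compared.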
The variations observed in sib-miR size have biological significance. Previous studies have
indicated that Dicer acts like a ruler by recognizing the 3’-end with its PAZ domain and cleaving
the pre-miRNA at approximately 25 nucleotides from the 3’-end with the RNase III domain 52-54.
Therefore, one would expect the length of sib-miRs derived from pre-miRNA to be consistent,
especially from the 3p-arm. However, variations in length were observed instead (i.e., 18-28
nucleotides from the 5p-arm, 18-27 nucleotides from the 3p-arm). In addition, we observed a
clear pattern of nucleotide preference at both sib-miR termini. Dicer possesses an alpha helix
between the PAZ and RNase III domains that is enriched in basic amino acids and interacts with
pre-miRNA through electrostatic interactions 53. Moreover, one study reported that the affinity of the PAZ
domain for the siRNA 3’-end differs among 3’-end sequence patterns 12.
sRNAs with CA overhangs are processed more efficiently than those with an AC overhang 55. Our
data support these findings, as CA (7.24%) was about twice as prevalent as AC at the
3’-terminus of the 3p-arm (Table S10). Studies indicate that the pre-miRNA structure and imprecise
cleavage by Drosha and Dicer are primarily responsible for generating miRNAs of varying
lengths in humans 56, 57. Studies also indicate that the loop position in the hairpin 58 and the
distance from the 5’-end 59 play critical roles in Dicer processing. However, another possibility
is that certain sequence patterns are more favored for cleavage by Dicer/DCL 10, 57 and that the
alpha helix of Dicer adjusts to achieve a more favorable cleavage pattern, which leads to
different lengths of sib-miRs. It is also possible that the terminal nucleotides of sib-miRs play
other roles. For example, high-affinity binding with different DCL domains may facilitate AGO
loading or another functional role that has yet to be identified experimentally.
The characteristic features of sib-miRs discovered in this study may impart a
physiological advantage in the RNAi mechanism, which needs further experimental validation.
We consider the features discovered in this study to be reliable and unbiased since the dataset
examined was generated using the following stringent criteria: (1) inclusion of canonical miRNA
and all sRNA generated from the miRNA precursor, (2) use of two normalization methods to negate
the influence of the transcription rate of miRNA genes, and (3) examination of only small RNAs
generated from non-redundant sets of miRNA genes. Therefore, we believe the features
identified by our study can be used to discover new miRNAs 10, 45 and to design more effective
and specific siRNA/artificial miRNA 25, 60, 61.
Our data also demonstrate a positive correlation between the conservation of a miRNA gene and
its expression, which is consistent with previous findings 31-33. The correlation data indicate
that more highly conserved miRNA genes express more distinct sib-miRs (Figure S15) and at higher
copy number (Figure S16). Irrespective of conservation, we observed that highly expressed miRNA
genes produce more distinct sib-miRs (Figure S17).
In animals, different proteins, namely Drosha and Dicer, produce the terminal ends of
miRNAs. However, DCL1 generates both ends in plants 35, 62. Studies in animals have shown
that miRNA precursors are more precisely processed at the 5’-end than the 3’-end irrespective of
the miRNA source arm 34, 35. Intriguingly, our study did not identify any significant differences
in the precision of 5’- and 3’-end processing in Arabidopsis. However, slightly greater 3’-end
heterogeneity of isomiRs was observed on the 3p-arm. Overall, these observations indicate
that DCL cleaves both hairpin arms with similar accuracy (Figures S18A and S18B). When we
analyzed the effect of terminal heterogeneity of miRNAs in human, we again observed that
validated target mRNAs have the potential to bind several miRNA/isomiRs. This result further
strengthens the role of isomiRs in specific gene silencing and strongly favors the implementation
of our novel method beyond plants.
References
1. Griffiths-Jones S, Hui JH, Marco A, Ronshaugen M. MicroRNA evolution by arm
switching. EMBO reports 2011; 12:172-7.
2. Glazov EA, Cottee PA, Barris WC, Moore RJ, Dalrymple BP, Tizard ML. A microRNA
catalog of the developing chicken embryo identified by a deep sequencing approach. Genome
research 2008; 18:957-64.
3. Jagadeeswaran G, Zheng Y, Sumathipala N, Jiang H, Arrese EL, Soulages JL, et al. Deep
sequencing of small RNA libraries reveals dynamic regulation of conserved and novel
microRNAs and microRNA-stars during silkworm development. BMC genomics 2010; 11:52.
4. Marco A, Hui JH, Ronshaugen M, Griffiths-Jones S. Functional shifts in insect
microRNA evolution. Genome biology and evolution 2010; 2:686-96.
5. Sood P, Krek A, Zavolan M, Macino G, Rajewsky N. Cell-type-specific signatures of
microRNAs on target mRNA expression. Proceedings of the National Academy of Sciences of
the United States of America 2006; 103:2746-51.
6. Lagos-Quintana M, Rauhut R, Yalcin A, Meyer J, Lendeckel W, Tuschl T. Identification
of tissue-specific microRNAs from mouse. Current biology : CB 2002; 12:735-9.
7. Meng Y, Shao C, Wang H, Chen M. The regulatory activities of plant microRNAs: a
more dynamic perspective. Plant physiology 2011; 157:1583-95.
8. Xie Z, Johansen LK, Gustafson AM, Kasschau KD, Lellis AD, Zilberman D, et al.
Genetic and functional diversification of small RNA pathways in plants. PLoS Biol 2004;
2:E104.
9. Vaucheret H. Post-transcriptional small RNA pathways in plants: mechanisms and
regulations. Genes Dev 2006; 20:759-71.
10. Wang X, Laurie JD, Liu T, Wentz J, Liu XS. Computational dissection of Arabidopsis
smRNAome leads to discovery of novel microRNAs and short interfering RNAs associated with
transcription start sites. Genomics 2011; 97:235-43.
11. Mi S, Cai T, Hu Y, Chen Y, Hodges E, Ni F, et al. Sorting of small RNAs into
Arabidopsis Argonaute complexes is directed by the 5' terminal nucleotide. Cell 2008; 133:116-27.
12. Ma JB, Ye K, Patel DJ. Structural basis for overhang-specific small interfering RNA
recognition by the PAZ domain. Nature 2004; 429:318-22.
13. Ahmed F, Ansari HR, Raghava GP. Prediction of guide strand of microRNAs from its
sequence and secondary structure. BMC bioinformatics 2009; 10:105.
14. Schwarz DS, Hutvagner G, Du T, Xu Z, Aronin N, Zamore PD. Asymmetry in the
assembly of the RNAi enzyme complex. Cell 2003; 115:199-208.
15. Khvorova A, Reynolds A, Jayasena SD. Functional siRNAs and miRNAs exhibit strand
bias. Cell 2003; 115:209-16.
16. Bail S, Swerdel M, Liu H, Jiao X, Goff LA, Hart RP, et al. Differential regulation of
microRNA stability. RNA 2010; 16:1032-9.
17. Ruegger S, Grosshans H. MicroRNA turnover: when, how, and why. Trends in
biochemical sciences 2012; 37:436-46.
18. Li J, Yang Z, Yu B, Liu J, Chen X. Methylation protects miRNAs and siRNAs from a
3'-end uridylation activity in Arabidopsis. Current biology : CB 2005; 15:1501-7.
19. Neilsen CT, Goodall GJ, Bracken CP. IsomiRs - the overlooked repertoire in the dynamic
microRNAome. Trends in genetics : TIG 2012; 28:544-9.
20. Mallory A, Vaucheret H. Form, function, and regulation of ARGONAUTE proteins. The
Plant cell 2010; 22:3879-89.
21. Vaucheret H. Plant ARGONAUTES. Trends in plant science 2008; 13:350-8.
22. Kim VN. Sorting out small RNAs. Cell 2008; 133:25-6.
23. Mi S, Cai T, Hu Y, Chen Y, Hodges E, Ni F, et al. Sorting of small RNAs into
Arabidopsis argonaute complexes is directed by the 5' terminal nucleotide. Cell 2008; 133:116-27.
24. Ahmed F, Kumar M, Raghava GP. Prediction of polyadenylation signals in human DNA
sequences using nucleotide frequencies. In silico biology 2009; 9:135-48.
25. Ahmed F, Raghava GPS. Designing of Highly Effective Complementary and Mismatch
siRNAs for Silencing a Gene. PLoS ONE 2011; 6:e23443.
26. Kaundal R, Raghava GP. RSLpred: an integrative system for predicting subcellular
localization of rice proteins combining compositional and evolutionary information. Proteomics
2009; 9:2324-42.
27. Kaundal R, Saini R, Zhao PX. Combining machine learning and homology-based
approaches to accurately predict subcellular localization in Arabidopsis. Plant physiology 2010;
154:36-54.
28. Garg A, Bhasin M, Raghava GP. Support vector machine-based method for subcellular
localization of human proteins using amino acid compositions, their order, and similarity search.
J Biol Chem 2005; 280:14427-32.
29. Fahlgren N, Howell MD, Kasschau KD, Chapman EJ, Sullivan CM, Cumbie JS, et al.
High-throughput sequencing of Arabidopsis microRNAs: evidence for frequent birth and death
of MIRNA genes. PLoS One 2007; 2:e219.
30. Lu J, Shen Y, Wu Q, Kumar S, He B, Shi S, et al. The birth and death of microRNA
genes in Drosophila. Nat Genet 2008; 40:351-5.
31. Roux J, Gonzalez-Porta M, Robinson-Rechavi M. Comparative analysis of human and
mouse expression data illuminates tissue-specific evolutionary patterns of miRNAs. Nucleic
acids research 2012; 40:5890-900.
32. Liang H, Li WH. Lowly expressed human microRNA genes evolve rapidly. Molecular
biology and evolution 2009; 26:1195-8.
33. Ruby JG, Stark A, Johnston WK, Kellis M, Bartel DP, Lai EC. Evolution, biogenesis,
expression, and target predictions of a substantially expanded set of Drosophila microRNAs.
Genome Res 2007; 17:1850-64.
34. Hu HY, Yan Z, Xu Y, Hu H, Menzel C, Zhou YH, et al. Sequence features associated
with microRNA strand selection in humans and flies. BMC Genomics 2009; 10:413.
35. Seitz H, Ghildiyal M, Zamore PD. Argonaute loading improves the 5' precision of both
MicroRNAs and their miRNA* strands in flies. Curr Biol 2008; 18:147-51.
36. Enright AJ, John B, Gaul U, Tuschl T, Sander C, Marks DS. MicroRNA targets in
Drosophila. Genome Biol 2003; 5:R1.
37. Kent WJ, Sugnet CW, Furey TS, Roskin KM, Pringle TH, Zahler AM, et al. The human
genome browser at UCSC. Genome Res 2002; 12:996-1006.
38. Cloonan N, Brown MK, Steptoe AL, Wani S, Chan WL, Forrest AR, et al. The
miR-17-5p microRNA is a key regulator of the G1/S phase cell cycle transition. Genome biology 2008;
9:R127.
39. Wang P, Ning S, Wang Q, Li R, Ye J, Zhao Z, et al. mirTarPri: improved prioritization of
microRNA targets through incorporation of functional genomics data. PLoS ONE 2013;
8:e53685.
40. Hsu SD, Lin FM, Wu WY, Liang C, Huang WC, Chan WL, et al. miRTarBase: a
database curates experimentally validated microRNA-target interactions. Nucleic Acids Res
2011; 39:D163-9.
41. Vergoulis T, Vlachos IS, Alexiou P, Georgakilas G, Maragkakis M, Reczko M, et al.
TarBase 6.0: capturing the exponential growth of miRNA targets with experimental support.
Nucleic Acids Res 2012; 40:D222-9.
42. Jiang Q, Wang Y, Hao Y, Juan L, Teng M, Zhang X, et al. miR2Disease: a manually
curated database for microRNA deregulation in human disease. Nucleic acids research 2009;
37:D98-104.
43. Xiao F, Zuo Z, Cai G, Kang S, Gao X, Li T. miRecords: an integrated resource for
microRNA-target interactions. Nucleic Acids Res 2009; 37:D105-10.
44. Meng Y, Shao C, Gou L, Jin Y, Chen M. Construction of microRNA- and
microRNA*-mediated regulatory networks in plants. RNA biology 2011; 8:1124-48.
45. Rajagopalan R, Vaucheret H, Trejo J, Bartel DP. A diverse and evolutionarily fluid set of
microRNAs in Arabidopsis thaliana. Genes Dev 2006; 20:3407-25.
46. Du Q, Thonberg H, Wang J, Wahlestedt C, Liang Z. A systematic analysis of the
silencing effects of an active siRNA at all single-nucleotide mismatched target sites. Nucleic
Acids Res 2005; 33:1671-7.
47. Dahlgren C, Zhang HY, Du Q, Grahn M, Norstedt G, Wahlestedt C, et al. Analysis of
siRNA specificity on targets with double-nucleotide mismatches. Nucleic Acids Res 2008;
36:e53.
48. Ren G, Chen X, Yu B. Uridylation of miRNAs by hen1 suppressor1 in Arabidopsis.
Current biology : CB 2012; 22:695-700.
49. Zhao Y, Yu Y, Zhai J, Ramachandran V, Dinh TT, Meyers BC, et al. The Arabidopsis
nucleotidyl transferase HESO1 uridylates unmethylated small RNAs to trigger their degradation.
Current biology : CB 2012; 22:689-94.
50. Katoh T, Sakaguchi Y, Miyauchi K, Suzuki T, Kashiwabara S, Baba T. Selective
stabilization of mammalian microRNAs by 3' adenylation mediated by the cytoplasmic poly(A)
polymerase GLD-2. Genes & development 2009; 23:433-8.
51. Lu S, Sun YH, Chiang VL. Adenylation of plant miRNAs. Nucleic acids research 2009;
37:1878-85.
52. MacRae IJ, Zhou K, Doudna JA. Structural determinants of RNA recognition and
cleavage by Dicer. Nat Struct Mol Biol 2007; 14:934-40.
53. Macrae IJ, Zhou K, Li F, Repic A, Brooks AN, Cande WZ, et al. Structural basis for
double-stranded RNA processing by Dicer. Science 2006; 311:195-8.
54. Lima WF, Murray H, Nichols JG, Wu H, Sun H, Prakash TP, et al. Human Dicer binds
short single-strand and double-strand RNA with high affinity and interacts with different regions
of the nucleic acids. J Biol Chem 2009; 284:2535-48.
55. Vermeulen A, Behlen L, Reynolds A, Wolfson A, Marshall WS, Karpilow J, et al. The
contributions of dsRNA structure to Dicer specificity and efficiency. RNA 2005; 11:674-82.
56. Starega-Roslan J, Krol J, Koscianska E, Kozlowski P, Szlachcic WJ, Sobczak K, et al.
Structural basis of microRNA length variety. Nucleic Acids Res 2011; 39:257-68.
57. Ahmed F, Kaundal R, Raghava GP. PHDcleav: a SVM based method for predicting
human Dicer cleavage sites using sequence and secondary structure of miRNA precursors. BMC
bioinformatics 2013; 14 Suppl 14:S9.
58. Gu S, Jin L, Zhang Y, Huang Y, Zhang F, Valdmanis PN, et al. The loop position of
shRNAs and pre-miRNAs is critical for the accuracy of dicer processing in vivo. Cell 2012;
151:900-11.
59. Park JE, Heo I, Tian Y, Simanshu DK, Chang H, Jee D, et al. Dicer recognizes the 5' end
of RNA for efficient and accurate processing. Nature 2011; 475:201-5.
60. Tyagi A, Ahmed F, Thakur N, Sharma A, Raghava GPS, Kumar M. HIVsirDB: A
Database of HIV Inhibiting siRNAs. PLoS ONE 2011; 6:e25917.
61. Senthil-Kumar M, Mysore KS. Caveat of RNAi in plants: the off-target effect. Methods
in molecular biology 2011; 744:13-25.
62. Henderson IR, Zhang X, Lu C, Johnson L, Meyers BC, Green PJ, et al. Dissecting
Arabidopsis thaliana DICER function in small RNA processing, gene silencing and DNA
methylation patterning. Nature genetics 2006; 38:721-5.
Figure Legends
Figure S9: Number of sib-miRNAs generated from the 5p- and 3p-arm of each pre-miRNA.
The median number of sib-miRNAs generated was 5 (mean = 10) from the 5p-arm and 5 (mean
= 10.7) from the 3p-arm with no significant difference (Wilcoxon test p-value = 0.6162).
Figure S10: Read counts of sib-miRNAs generated from the 5p- and 3p-arm of each
pre-miRNA. A median of 463 (mean = 27,079) and 351 (mean = 64,073) reads were generated from
the 5p- and 3p-arms, respectively, with no significant difference (Wilcoxon test p-value =
0.6332).
Figure S11: Fraction of sib-miRs generated from the 5p- and 3p-arm of each pre-miRNA. The
median fraction of mature sequence was 54.79% (mean = 51.28%) for the 5p-arm and 45.21%
(mean = 48.71%) for the 3p-arm with no statistical difference (Wilcoxon test p-value = 0.8107).
Figure S12: Distribution of sib-miR sizes generated from a population of 132 pre-miRNAs.
Figure S13: Frequency of nucleotide occurrences at the (A) 5’-end and the (B) 3’-end of
sib-miRs.
Figure S14: Sorting of unique sib-miRs into different AGO proteins. (A) Results from
experimental validation of sorting 960 sib-miR sequences with AGO proteins. (B) sib-miR
sequences (1,740) predicted to sort with AGO proteins.
Figure S15: Correlation between conserved plant pre-miRNAs and the number of distinct
sib-miRNAs produced (ρ = 0.31, p = 2.14e-4). The y-axis is presented in logarithmic scale.
Figure S16: Correlation between conserved plant pre-miRNAs and the read counts (RPM) of
sib-miRNAs produced (ρ = 0.49, p = 2.22e-9). The y-axis is presented in logarithmic scale.
Figure S17: Correlation between the number of sib-miRNAs generated from a pre-miRNA and
their read count (RPM) in A. thaliana (ρ = 0.84, p = 1.13e-36). The x- and y-axes are presented
in logarithmic scale.
Figure S18: Cleavage accuracy of DCL on pre-miRNAs. (A) For the 5p-arm, the median
heterogeneity at the 5’- and 3’-ends was 0.086 and 0.159, respectively (Wilcoxon test p-value =
0.3302). (B) For the 3p-arm, the median heterogeneity at the 5’- and 3’-ends was 0.0758559 and
0.1555154, respectively (Wilcoxon test p-value = 0.1998).
Figure S19: Effect of terminal heterogeneity of isomiRs on hsa-miR-17-5p target genes. Green
and black in the heatmap indicate the presence and absence of an isomiR target, respectively. The
canonical miRNA, hsa-miR-17-5p, is denoted by the red box while experimentally validated
targets are listed in red.
Figure S20: Effect of terminal heterogeneity of isomiRs on hsa-let-7a-5p target genes. Green
and black in the heatmap indicate the presence and absence of an isomiR target, respectively. The
canonical miRNA, hsa-let-7a-5p, is denoted by the red box while experimentally validated
targets are listed in red and putative targets are indicated in blue.
Figure S21: Effect of terminal heterogeneity of isomiRs on hsa-miR-21-5p target genes. Green
and black in the heatmap indicate the presence and absence of an isomiR target, respectively. The
canonical miRNA, hsa-miR-21-5p, is denoted by the red box while experimentally validated
targets are listed in red and putative targets are indicated in blue.