Ahmed et al. 1

Supplemental Results and Discussions: Comprehensive analysis of small RNA-seq data reveals that combination of miRNA with its isomiRs increase the accuracy of target prediction in Arabidopsis

Firoz Ahmed1$, Muthappa Senthil-Kumar1, 2, Seonghee Lee1, Xinbin Dai1, Kirankumar S. Mysore1 and Patrick Xuechun Zhao1, *

1 Plant Biology Division, Samuel Roberts Noble Foundation, Ardmore, Oklahoma 73401, USA
2 National Institute of Plant Genome Research, Aruna Asaf Ali Marg, New Delhi, 110067, India
$ Current Address: Center for Genomics & Systems Biology, New York University, 12 Waverly Place, New York, NY 10003, USA
* Corresponding author: Patrick Xuechun Zhao, E-mail: pzhao@noble.org, Phone: +1-580-2246725, Fax: +1-580-224-4743

Supplemental Results

In this section, we describe our investigation of sib-miRs with a focus on their biogenesis, characteristic length, and terminal sequence patterns leading to Argonaute (AGO) protein loading and DCL cleavage.

Features of sib-miRs

Overview of sib-miRs and their orientation on pre-miRNAs

Although one arm of a pre-miRNA typically generates a canonical miRNA, previous studies have reported an arm-switching phenomenon in which the source of the dominant mature miRNA changes from one arm to the other depending on the tissue and developmental stage 1-4. Therefore, in order to better understand the biogenesis of mature sequences, we analyzed the propensity of each arm (5p and 3p) of 132 Arabidopsis pre-miRNAs (miRBase 18) to generate sib-miRs. From deep-sequencing data of small RNAs, 3,574,444 reads (normalized as Reads Per Million, RPM; see Materials and Methods) were mapped to the 5p-arm, which contained 1,322 distinct sib-miRs including 76 annotated miRNAs and one annotated miRNA*. Similarly, 8,457,739 reads were mapped to the 3p-arm, which contained 1,408 distinct sib-miRs including 71 annotated miRNAs and three annotated miRNAs*.
On average, our data demonstrate that the read count of sib-miRs generated from the 3p-arm was 2.3 times greater than that from the 5p-arm. However, the 3p-arm of ath-MIR159a alone contributed a large share of these reads (3,820 reads from the 5p-arm and 6,607,434 from the 3p-arm). We also observed a few unusual features of the sib-miRs. One sRNA from the 5p-arm mapped to two different pre-miRNAs (ath-MIR167a, ath-MIR167c). Another sRNA mapped to three different locations on ath-MIR2933a (5p-arm, loop region, and 3p-arm), revealing a stretch of sequence “GGCGAUUUUGAGUGAAAUCGGAGAGG” found in triplicate. Moreover, 19 sRNAs from the 3p-arm mapped to two different locations within ath-MIR783, revealing a stretch of sequence “UCAUGUUCUCCUGAAUCUUCCGACAAAA” in duplicate. Further analysis of the distinct sib-miRs demonstrated that the median number of sib-miRs generated from each pre-miRNA was 5 (mean = 10) from the 5p-arm and 5 (mean = 10.7) from the 3p-arm, with no significant difference (Wilcoxon test p-value = 0.6162, W = 9022.5) (Figure S9). We also compared the read counts of sib-miRs generated from each arm of a pre-miRNA and found that a median of 463 (mean = 27,079) and 351 (mean = 64,073) reads were generated from the 5p- and 3p-arms, respectively, with no significant difference (Wilcoxon test p-value = 0.6332, W = 9008.5) (Figure S10).

miRNA abundance may be influenced by the transcription rate of miRNA genes, tissue type, stage of cell differentiation, and environmental conditions 5-7. To negate the effect of these variables on miRNA gene expression levels, we normalized sib-miRs within each pre-miRNA (see Materials and Methods). Doing so yielded more reliable data representing the features of sib-miRs. Therefore, unless otherwise stated, these normalized data were used throughout the remainder of the study.
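As a rough illustration of the within-precursor normalization described above, the sketch below converts read counts into per-arm fractions for each pre-miRNA. The exact procedure is given in Materials and Methods and is not reproduced here, so the function name `arm_fractions`, the `(precursor, arm, reads)` record layout, and the toy counts are all assumptions made for illustration only.

```python
from collections import defaultdict
from statistics import median


def arm_fractions(records):
    """Return {precursor: {"5p": fraction, "3p": fraction}}.

    `records` is an iterable of (precursor, arm, reads) tuples, one per
    sib-miR. This layout is hypothetical, not the authors' file format.
    """
    total = defaultdict(float)    # reads per precursor
    per_arm = defaultdict(float)  # reads per (precursor, arm)
    for precursor, arm, reads in records:
        total[precursor] += reads
        per_arm[(precursor, arm)] += reads
    # Divide each arm's reads by the precursor total; skip empty precursors.
    return {
        precursor: {arm: per_arm[(precursor, arm)] / reads for arm in ("5p", "3p")}
        for precursor, reads in total.items()
        if reads > 0
    }


# Toy counts, invented for illustration only.
toy = [
    ("pre-A", "5p", 30), ("pre-A", "3p", 50), ("pre-A", "3p", 20),
    ("pre-B", "5p", 5),
]
fractions = arm_fractions(toy)
median_5p = median(f["5p"] for f in fractions.values())
```

Under this assumed scheme each precursor contributes fractions that sum to one, so a very highly expressed precursor carries the same weight as a weakly expressed one when medians are compared between arms.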
Analysis of the normalized data showed that the median fraction of mature sequences generated from the 5p- and 3p-arms was 54.79% (mean = 51.28%) and 45.21% (mean = 48.71%), respectively, with no statistical difference (Wilcoxon test p-value = 0.8107, W = 8861) (Figure S11).

Length diversity of sib-miRs

Plants produce different classes of endogenous sRNA through distinct mechanisms of transcriptional or post-transcriptional gene silencing 8-10. Each class is characterized by a specific length and accomplishes certain biological functions 8, 9. Moreover, sRNA length determines its potential to sort into a specific AGO protein 11. In order to gain a deeper understanding of the optimal miRNA size in Arabidopsis, we analyzed the size distribution of sib-miRs. Our data demonstrated that the length varied from 18 to 28 nucleotides for the 5p-arm and from 18 to 27 nucleotides for the 3p-arm. Overall, 67% of all sib-miRs were 21 nucleotides in length while 14% were 22 nucleotides long (Figure S12). Both arms contributed nearly the same ratio of the various lengths.

Terminal nucleotide characteristics of sib-miRs

The importance of terminal nucleotides in miRNA/siRNA/pre-miRNA has been established by several investigations 11, 12. Nucleotides at both termini provide asymmetrical stability to the miRNA duplex and can differentiate miRNA from miRNA* 13-15. Moreover, nucleotides at the 5’- and 3’-ends are responsible for loading into distinct AGO complexes 11 and controlling the miRNA half-life 16-19, respectively. Therefore, the terminal nucleotides of sib-miRs were analyzed to gain insight into any nucleotide patterns that may be preferred at each terminus. Our data revealed the highest preference for U at the 5’-end, in 60% of all the sib-miRs examined. The frequency of the remaining nucleotides at the 5’-end was A > C > G (Figure S13A).
Furthermore, a clear pattern of dinucleotides at the 5’-terminus was also observed (Table S10), with UU and GC as the most (22%) and least (0.42%) preferred, respectively. Interestingly, both arms contributed equally to the mono- and dinucleotide patterns observed. To examine the sequence preferences at the 3’-end of sib-miRs, we computed the frequency of mono- and dinucleotides. Our data demonstrated clear differences in the nucleotide preferences of the two arms. While the 5p-arm showed a preference of A > G > C > U, the 3p-arm preferred U > A > C > G (Figure S13B). At the 5p-arm, AA was the most preferred dinucleotide (15.7%), followed by CA (12.88%), while at the 3p-arm, AU (10.68%) and UU (10.08%) were highly preferred (Table S10). A few dinucleotides were rarely observed in the sib-miR sequences analyzed in this study, namely CU (1.58%) at the 5p-arm and GG (1.53%) at the 3p-arm.

Sorting of sib-miRs into AGO proteins

The AGO protein is the catalytic core of the RNA-induced silencing complex (RISC) for gene silencing, which is guided by sRNA 20. The Arabidopsis genome encodes ten AGO proteins, which have different biological functions (e.g., RNA slicing, RNA binding, and chromatin modification) and different binding affinities for the various classes of sRNA 21, 22. To understand the miRNA sorting specificity towards different AGO proteins for different biological functions, we analyzed how sib-miRs are distributed among four AGO proteins, namely AGO1, AGO2, AGO4, and AGO5. Among 2,709 unique sib-miRs, only 960 sequences have been confirmed experimentally to bind AGOs 23 (Figure S14A). Within this set, our analysis found that 254 sequences sort specifically to AGO1, 100 to AGO2, 171 to AGO4, and 131 to AGO5. Several sequences sorted into more than one AGO protein; thus, the overall numbers of unique sequences sorted into AGOs were 487 for AGO1, 248 for AGO2, 370 for AGO4, and 334 for AGO5.
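The two kinds of counts given above (sequences found with a single AGO, and per-AGO totals that also include shared sequences) can be sketched as a simple set tally. This is only an illustration: the function name `tally_ago`, the mapping `ago_hits`, and the sequence labels are hypothetical placeholders, not data from this study.

```python
from collections import Counter


def tally_ago(ago_hits):
    """Count sequences per AGO.

    `ago_hits` maps a sequence to the set of AGO proteins it sorts into.
    Returns (specific, total): `specific` counts sequences seen with
    exactly one AGO, `total` counts every sequence seen with that AGO.
    """
    specific, total = Counter(), Counter()
    for agos in ago_hits.values():
        total.update(agos)
        if len(agos) == 1:
            specific.update(agos)
    return specific, total


# Placeholder labels; real keys would be sib-miR sequences.
ago_hits = {
    "seq1": {"AGO1"},
    "seq2": {"AGO1", "AGO4"},
    "seq3": {"AGO2"},
    "seq4": {"AGO1", "AGO2", "AGO5"},
}
specific, total = tally_ago(ago_hits)
```

Reading the first set of counts in the text as AGO-specific, they sum to 254 + 100 + 171 + 131 = 656, which would leave 960 − 656 = 304 experimentally confirmed sequences shared between two or more of the four AGO proteins.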
The remaining 1,749 sequences were predicted for AGO sorting specificity using our in-house SVM models (unpublished). SVM is a state-of-the-art machine learning technique that has been extensively applied to classification and regression problems, for instance prediction of the guide strand of miRNA 13, polyadenylation signals 24, siRNA design 25, and protein subcellular localization 26-28. We used the SVM models at a threshold of 0 and found 1,740 sequences assigned to AGOs and nine sequences that lacked specificity for any AGO protein (Figure S14B). Here we found 411 sequences specifically sorted into AGO1, while 52 were assigned to AGO2, 306 to AGO4, and 207 to AGO5. The total numbers of sequences, including those sorted into more than one AGO, were 844 for AGO1, 377 for AGO2, 877 for AGO4, and 422 for AGO5.

Expression of sib-miRs from conserved and non-conserved miRNA genes

Studies suggest that miRNA genes have evolved over time and were retained or removed from the genome based on fitness 29, 30. Several A. thaliana pre-miRNAs are highly conserved across other species of plants, while a few are species-specific and considered evolutionarily young. Previous studies reported that conserved miRNA genes are expressed at higher levels than non-conserved ones 31-33. However, to the best of our knowledge, the relationship between miRNA gene conservation across species and their expression level as sib-miRs has yet to be studied in Arabidopsis. Therefore, we examined the Spearman correlation between the number of plant species in which an ath-miRNA gene is conserved and the number of distinct sib-miRs generated from it. We found a positive correlation of ρ = 0.31688 with p-value = 2.14e-4 (Figure S15).
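For readers unfamiliar with the statistic, Spearman's ρ is the Pearson correlation computed on ranks, with tied values sharing their average rank. The following is a minimal standard-library sketch of that calculation; the helper names and the example vectors are invented for illustration and do not reproduce the data behind Figure S15.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of positions start..end, 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / (sx * sy)


# Invented example: species count per gene vs. distinct sib-miRs per gene.
species_conserved = [1, 1, 3, 8, 20, 35]
distinct_sib_mirs = [2, 4, 3, 9, 15, 40]
rho = spearman_rho(species_conserved, distinct_sib_mirs)
```

Because only ranks enter the calculation, the coefficient is unaffected by the logarithmic axis scaling used in the corresponding figures.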
When we calculated the correlation between the number of plant species in which an ath-miRNA gene is conserved and their expression level (as read counts of sib-miRs), a positive correlation of ρ = 0.49 with p-value = 2.22e-9 was observed (Figure S16). Finally, when we examined the correlation between the number of distinct sib-miRs generated from a pre-miRNA in Arabidopsis and their expression level (as read counts of sib-miRs), a positive correlation of ρ = 0.84 with p-value = 1.13e-36 was observed (Figure S17). The results indicate that conserved miRNA genes express more distinct sib-miRs, and at higher levels, than non-conserved miRNA genes.

Accuracy of DCL cleavage of miRNA hairpins

Along with a canonical miRNA, several isomiRs are also generated with slight variations in the terminal sequence. Previous studies indicate that terminal variation arises due to slippage of DCL at the cleavage site on precursor molecules; however, the exact mechanism remains unclear 34, 35. Thus, to understand the accuracy of pre-miRNA cleavage in A. thaliana, we analyzed the terminal heterogeneity at the 5’- and 3’-ends of miRNAs generated from both arms (see Materials and Methods). On the 5p-arm, DCL cleaves the 5’-end with a mean heterogeneity of 0.447 (median = 0.0867), while the 3’-end is cleaved with a mean heterogeneity of 0.443 (median = 0.1595) (Figure S18A). A similar pattern was observed on the 3p-arm, in which the 5’- and 3’-ends displayed a mean heterogeneity of 0.4376 (median = 0.0758559) and 0.5436162 (median = 0.1555154), respectively (Figure S18B). We observed more variation in 3’-end processing than in 5’-end processing for isomiRs generated from the 3p-arm. However, processing at the 5’- and 3’-ends was not statistically different in Arabidopsis (Wilcoxon test p-value: 0.3302 for the 5p-arm and 0.1998 for the 3p-arm).
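The Wilcoxon comparisons reported in this section can be sketched with a small standard-library implementation of the two-sample rank-sum (Mann-Whitney) test. This assumes the unpaired form of the test and uses a normal approximation with tie and continuity corrections; the software and options actually used are not stated here, and the heterogeneity values below are invented. The statistic follows the convention of the W value reported by R's `wilcox.test` for unpaired samples.

```python
from collections import Counter
from math import erfc, sqrt


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum (Mann-Whitney) test, normal approximation.

    Returns (W, p). W counts the (x, y) pairs with x > y, with each tied
    pair counting one half.
    """
    n1, n2 = len(x), len(y)
    w = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    n = n1 + n2
    # Tie correction: sum of t^3 - t over groups of identical values.
    ties = sum(t ** 3 - t for t in Counter(list(x) + list(y)).values())
    variance = n1 * n2 / 12 * ((n + 1) - ties / (n * (n - 1)))
    if variance == 0:  # every observation identical
        return w, 1.0
    # Continuity-corrected z score, clipped at zero.
    z = max(abs(w - n1 * n2 / 2) - 0.5, 0.0) / sqrt(variance)
    return w, erfc(z / sqrt(2))


# Invented per-precursor heterogeneity values, for illustration only.
het_5_end = [0.00, 0.05, 0.09, 0.40, 1.20]
het_3_end = [0.02, 0.16, 0.30, 0.55, 1.50]
w, p = rank_sum_test(het_5_end, het_3_end)
```

Under the null hypothesis W is centered on n1·n2/2, so for two groups of 132 precursors the expected value would be 8,712, close to the W values reported for the arm comparisons earlier in this section.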
Binding targets of miRNAs and corresponding isomiRs in human

We also analyzed the targets of miRNAs and their isomiRs in human to check the relevance of isomiRs in target binding outside of plant species. We retrieved deep-sequencing reads of hsa-let-7a-5p, hsa-miR-17-5p, and hsa-miR-21-5p from miRBase 19, and selected reads having an RPM equal to or greater than 1. The miRanda-3.3 tool 36 was then used to predict the targets of miRNAs and their isomiRs against the 3′-UTR sequences of human coding RefSeqs. (The 3′-UTR sequences were downloaded from assembly hg19 from the UCSC Human Genome Browser and were made non-redundant before use 37.) Candidate targets were selected if binding with miRNA/isomiRs resulted in a Max Score > 140 and Max Energy (ΔG) < −10 kcal/mol 36. For each miRNA, this threshold yielded thousands of candidate targets, which were further compared against experimentally validated target mRNAs compiled from miRTarBase, TarBase, miRecords, miR2Disease, and published literature 38-43. Our miRanda analysis recovered 26 out of 42 validated targets of hsa-let-7a-5p, 43 out of 56 validated targets of hsa-miR-17-5p, and 51 out of 76 validated targets of hsa-miR-21-5p. We found that nearly all validated hsa-miR-17-5p target mRNAs are recognized by several isomiRs (Figure S19). Similar results were obtained for hsa-let-7a-5p and hsa-miR-21-5p (Figures S20 and S21). Furthermore, based on this finding, we predicted additional miRNA targets that remain to be validated experimentally (Figures S20 and S21). All available miRNA target prediction tools produce false positives. Nevertheless, we found that miRNA target binding in animals was similar to that in plants, suggesting that validated mRNAs are targeted by most members of an isomiR family.

Supplemental Discussion

Our initial analysis demonstrated that there are no significant differences in generating sib-miRs from the 5p- or 3p-arm of a pre-miRNA (Figure S11). This suggests that both arms have similar potential to generate mature sequences, and that arm-switching may be governed by features of the mature sequence, cellular conditions, or other unknown factors. We observed that sib-miR size is not distributed evenly, with 21-nt sib-miRs in greatest abundance (67%) (Figure S12). This result is consistent with earlier findings in Arabidopsis (~77% 44 and ~80% 45). Our analysis revealed that sib-miRs of 22 and 24 nucleotides in length were the second and third most abundant species in Arabidopsis, respectively. Another study reported that sRNAs of 24 nucleotides in length were expressed most highly (83%) in Arabidopsis and mostly belonged to cis-acting siRNA 10. However, that study was conducted on a small dataset consisting of different classes of small RNAs including miRNA 10.

The terminal nucleotides of sRNAs do not participate in target site recognition, but play other important roles 46, 47. For instance, a 5’-terminal uridine, adenine, adenine, and cytosine generally facilitates sorting of an sRNA into AGO1, AGO2, AGO4, and AGO5, respectively 11. Uridylation and adenylation at the 3’-end of miRNAs lead to destabilization and stabilization, respectively, thereby controlling miRNA decay in plants and animals 16-19, 48-51. Our data show a high prevalence of U at the 5’-end of sib-miRs, indicating its importance in making a guide strand (Figure S13A). The higher prevalence of adenine and lower prevalence of uridine at the 3’-end of sib-miRs may play an important role in stabilizing sib-miRs (Figure S13B). Interestingly, the 5’- and 3’-ends of sib-miRs also display a higher propensity for certain dinucleotides. In addition, we found that most of the 2,709 unique sib-miRs have an affinity for sorting into multiple AGO proteins, with AGO1 being the most preferred (Figure S14). This result agrees with other studies demonstrating that miRNAs mostly sort into AGO1 in Arabidopsis 10, 23. The variations observed in sib-miR size have biological significance.
Previous studies have indicated that Dicer acts like a ruler by recognizing the 3’-end with its PAZ domain and cleaving the pre-miRNA at approximately 25 nucleotides from the 3’-end with the RNase III domain 52-54. Therefore, one would expect the length of sib-miRs derived from pre-miRNA to be consistent, especially from the 3p-arm. However, variations in length were observed instead (i.e., 18-28 nucleotides from the 5p-arm, 18-27 nucleotides from the 3p-arm). In addition, we observed a clear pattern of nucleotide preference at both sib-miR termini. Dicer possesses an alpha helix between the PAZ and RNase III domains that is enriched in basic amino acids and interacts with pre-miRNA through electrostatic interactions 53. Moreover, one study reported that the affinity of the PAZ domain for the siRNA 3’-end differs between 3’-end sequence patterns 12. sRNAs with CA overhangs are processed more efficiently than those with an AC overhang 55. Our data are consistent with these findings: CA (7.24%) was twice as prevalent as AC at the 3’-terminus of the 3p-arm (Table S10). Studies indicate that the pre-miRNA structure and imprecise cleavage by Drosha and Dicer are primarily responsible for generating miRNAs of varying lengths in humans 56, 57. Studies also indicate that the loop position in the hairpin 58 and the distance from the 5’-end 59 play critical roles in Dicer processing. However, another possibility is that certain sequence patterns are more favored for cleavage by Dicer/DCL 10, 57 and that the alpha helix of Dicer adjusts to achieve a more favorable cleavage pattern, which leads to sib-miRs of different lengths. It may also be possible that terminal nucleotides of sib-miRs play other roles. For example, high-affinity binding with different DCL domains may facilitate AGO loading or another functional role that has yet to be identified experimentally.
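Terminal mono- and dinucleotide preferences of the kind discussed above (Figure S13, Table S10) come down to tallying the first or last one or two nucleotides of each sequence. The sketch below shows one way to do this; it assumes unweighted counts over distinct sequences, which may differ from the normalization used in this study, and the function name `terminal_frequencies` and the four toy sequences are invented for illustration.

```python
from collections import Counter


def terminal_frequencies(sequences, end="5p", width=1):
    """Percentage of each terminal k-mer among `sequences`.

    end="5p" takes the first `width` nucleotides, end="3p" the last.
    Sequences shorter than `width` are ignored.
    """
    if end not in ("5p", "3p"):
        raise ValueError("end must be '5p' or '3p'")
    kmers = Counter(
        seq[:width] if end == "5p" else seq[-width:]
        for seq in sequences
        if len(seq) >= width
    )
    total = sum(kmers.values())
    return {kmer: 100.0 * count / total for kmer, count in kmers.items()}


# Invented sequences, not taken from the dataset.
toy = [
    "UUGACAGAAGAUAGAGAGCA",
    "UGCCAAAGGAGAUUUGCCCA",
    "AGAAUCUUGAUGAUGCUGCA",
    "UCGGACCAGGCUUCAUUCAU",
]
five_prime_mono = terminal_frequencies(toy, end="5p", width=1)
three_prime_di = terminal_frequencies(toy, end="3p", width=2)
```

Running the same tally separately on 5p-arm and 3p-arm sequences would give the per-arm comparison, such as the CA versus AC prevalence at the 3’-terminus.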
The characteristic features of sib-miRs discovered in this study may impart a physiological advantage in the RNAi mechanism, which needs further experimental validation. The features discovered in this study are more reliable and unbiased since the dataset examined was generated using the following stringent criteria: (1) inclusion of canonical miRNA and all sRNA generated from the miRNA precursor, (2) use of two normalization methods to negate the influence of the transcription rate of miRNA genes, and (3) examination of only small RNAs generated from non-redundant sets of miRNA genes. Therefore, we believe the features identified by our study can be used to discover new miRNA 10, 45 and to design more effective and specific siRNA/artificial miRNA 25, 60, 61.

Our data also demonstrate a positive correlation between the conservation of a miRNA gene and its expression, which is consistent with previous findings 31-33. The correlation data indicate that more highly conserved miRNA genes express more distinct sib-miRs (Figure S15) and with high copy number (Figure S16). Irrespective of evolution, we observed that highly expressed miRNA genes produce more distinct sib-miRs (Figure S17).

In animals, different proteins, namely Drosha and Dicer, produce the terminal ends of miRNAs. However, DCL1 generates both ends in plants 35, 62. Studies in animals have shown that miRNA precursors are more precisely processed at the 5’-end than the 3’-end irrespective of the miRNA source arm 34, 35. Intriguingly, our study did not identify any significant differences in the precision of 5’- and 3’-end processing in Arabidopsis. However, slightly greater 3’-end heterogeneity of isomiRs was observed on the 3p-arm. Together, these observations indicate that DCL cleaves both hairpin arms with similar accuracy (Figures S18A and S18B).
When we analyzed the effect of terminal heterogeneity of miRNAs in human, we again observed that a validated target mRNA has the potential to bind several miRNA/isomiRs. This result further strengthens the role of isomiRs in specific gene silencing and strongly favors the implementation of our novel method beyond plants.

References

1. Griffiths-Jones S, Hui JH, Marco A, Ronshaugen M. MicroRNA evolution by arm switching. EMBO reports 2011; 12:172-7.
2. Glazov EA, Cottee PA, Barris WC, Moore RJ, Dalrymple BP, Tizard ML. A microRNA catalog of the developing chicken embryo identified by a deep sequencing approach. Genome research 2008; 18:957-64.
3. Jagadeeswaran G, Zheng Y, Sumathipala N, Jiang H, Arrese EL, Soulages JL, et al. Deep sequencing of small RNA libraries reveals dynamic regulation of conserved and novel microRNAs and microRNA-stars during silkworm development. BMC genomics 2010; 11:52.
4. Marco A, Hui JH, Ronshaugen M, Griffiths-Jones S. Functional shifts in insect microRNA evolution. Genome biology and evolution 2010; 2:686-96.
5. Sood P, Krek A, Zavolan M, Macino G, Rajewsky N. Cell-type-specific signatures of microRNAs on target mRNA expression. Proceedings of the National Academy of Sciences of the United States of America 2006; 103:2746-51.
6. Lagos-Quintana M, Rauhut R, Yalcin A, Meyer J, Lendeckel W, Tuschl T. Identification of tissue-specific microRNAs from mouse. Current biology : CB 2002; 12:735-9.
7. Meng Y, Shao C, Wang H, Chen M. The regulatory activities of plant microRNAs: a more dynamic perspective. Plant physiology 2011; 157:1583-95.
8. Xie Z, Johansen LK, Gustafson AM, Kasschau KD, Lellis AD, Zilberman D, et al. Genetic and functional diversification of small RNA pathways in plants. PLoS Biol 2004; 2:E104.
9. Vaucheret H. Post-transcriptional small RNA pathways in plants: mechanisms and regulations. Genes Dev 2006; 20:759-71.
10. Wang X, Laurie JD, Liu T, Wentz J, Liu XS.
Computational dissection of Arabidopsis smRNAome leads to discovery of novel microRNAs and short interfering RNAs associated with transcription start sites. Genomics 2011; 97:235-43.
11. Mi S, Cai T, Hu Y, Chen Y, Hodges E, Ni F, et al. Sorting of small RNAs into Arabidopsis Argonaute complexes is directed by the 5' terminal nucleotide. Cell 2008; 133:116-27.
12. Ma JB, Ye K, Patel DJ. Structural basis for overhang-specific small interfering RNA recognition by the PAZ domain. Nature 2004; 429:318-22.
13. Ahmed F, Ansari HR, Raghava GP. Prediction of guide strand of microRNAs from its sequence and secondary structure. BMC bioinformatics 2009; 10:105.
14. Schwarz DS, Hutvagner G, Du T, Xu Z, Aronin N, Zamore PD. Asymmetry in the assembly of the RNAi enzyme complex. Cell 2003; 115:199-208.
15. Khvorova A, Reynolds A, Jayasena SD. Functional siRNAs and miRNAs exhibit strand bias. Cell 2003; 115:209-16.
16. Bail S, Swerdel M, Liu H, Jiao X, Goff LA, Hart RP, et al. Differential regulation of microRNA stability. RNA 2010; 16:1032-9.
17. Ruegger S, Grosshans H. MicroRNA turnover: when, how, and why. Trends in biochemical sciences 2012; 37:436-46.
18. Li J, Yang Z, Yu B, Liu J, Chen X. Methylation protects miRNAs and siRNAs from a 3'-end uridylation activity in Arabidopsis. Current biology : CB 2005; 15:1501-7.
19. Neilsen CT, Goodall GJ, Bracken CP. IsomiRs - the overlooked repertoire in the dynamic microRNAome. Trends in genetics : TIG 2012; 28:544-9.
20. Mallory A, Vaucheret H. Form, function, and regulation of ARGONAUTE proteins. The Plant cell 2010; 22:3879-89.
21. Vaucheret H. Plant ARGONAUTES. Trends in plant science 2008; 13:350-8.
22. Kim VN. Sorting out small RNAs. Cell 2008; 133:25-6.
23. Mi S, Cai T, Hu Y, Chen Y, Hodges E, Ni F, et al. Sorting of small RNAs into Arabidopsis argonaute complexes is directed by the 5' terminal nucleotide. Cell 2008; 133:116-27.
24. Ahmed F, Kumar M, Raghava GP.
Prediction of polyadenylation signals in human DNA sequences using nucleotide frequencies. In silico biology 2009; 9:135-48.
25. Ahmed F, Raghava GPS. Designing of Highly Effective Complementary and Mismatch siRNAs for Silencing a Gene. PLoS ONE 2011; 6:e23443.
26. Kaundal R, Raghava GP. RSLpred: an integrative system for predicting subcellular localization of rice proteins combining compositional and evolutionary information. Proteomics 2009; 9:2324-42.
27. Kaundal R, Saini R, Zhao PX. Combining machine learning and homology-based approaches to accurately predict subcellular localization in Arabidopsis. Plant physiology 2010; 154:36-54.
28. Garg A, Bhasin M, Raghava GP. Support vector machine-based method for subcellular localization of human proteins using amino acid compositions, their order, and similarity search. J Biol Chem 2005; 280:14427-32.
29. Fahlgren N, Howell MD, Kasschau KD, Chapman EJ, Sullivan CM, Cumbie JS, et al. High-throughput sequencing of Arabidopsis microRNAs: evidence for frequent birth and death of MIRNA genes. PLoS One 2007; 2:e219.
30. Lu J, Shen Y, Wu Q, Kumar S, He B, Shi S, et al. The birth and death of microRNA genes in Drosophila. Nat Genet 2008; 40:351-5.
31. Roux J, Gonzalez-Porta M, Robinson-Rechavi M. Comparative analysis of human and mouse expression data illuminates tissue-specific evolutionary patterns of miRNAs. Nucleic acids research 2012; 40:5890-900.
32. Liang H, Li WH. Lowly expressed human microRNA genes evolve rapidly. Molecular biology and evolution 2009; 26:1195-8.
33. Ruby JG, Stark A, Johnston WK, Kellis M, Bartel DP, Lai EC. Evolution, biogenesis, expression, and target predictions of a substantially expanded set of Drosophila microRNAs. Genome Res 2007; 17:1850-64.
34. Hu HY, Yan Z, Xu Y, Hu H, Menzel C, Zhou YH, et al. Sequence features associated with microRNA strand selection in humans and flies. BMC Genomics 2009; 10:413.
35. Seitz H, Ghildiyal M, Zamore PD.
Argonaute loading improves the 5' precision of both MicroRNAs and their miRNA* strands in flies. Curr Biol 2008; 18:147-51.
36. Enright AJ, John B, Gaul U, Tuschl T, Sander C, Marks DS. MicroRNA targets in Drosophila. Genome Biol 2003; 5:R1.
37. Kent WJ, Sugnet CW, Furey TS, Roskin KM, Pringle TH, Zahler AM, et al. The human genome browser at UCSC. Genome Res 2002; 12:996-1006.
38. Cloonan N, Brown MK, Steptoe AL, Wani S, Chan WL, Forrest AR, et al. The miR-17-5p microRNA is a key regulator of the G1/S phase cell cycle transition. Genome biology 2008; 9:R127.
39. Wang P, Ning S, Wang Q, Li R, Ye J, Zhao Z, et al. mirTarPri: improved prioritization of microRNA targets through incorporation of functional genomics data. PLoS ONE 2013; 8:e53685.
40. Hsu SD, Lin FM, Wu WY, Liang C, Huang WC, Chan WL, et al. miRTarBase: a database curates experimentally validated microRNA-target interactions. Nucleic Acids Res 2011; 39:D163-9.
41. Vergoulis T, Vlachos IS, Alexiou P, Georgakilas G, Maragkakis M, Reczko M, et al. TarBase 6.0: capturing the exponential growth of miRNA targets with experimental support. Nucleic Acids Res 2012; 40:D222-9.
42. Jiang Q, Wang Y, Hao Y, Juan L, Teng M, Zhang X, et al. miR2Disease: a manually curated database for microRNA deregulation in human disease. Nucleic acids research 2009; 37:D98-104.
43. Xiao F, Zuo Z, Cai G, Kang S, Gao X, Li T. miRecords: an integrated resource for microRNA-target interactions. Nucleic Acids Res 2009; 37:D105-10.
44. Meng Y, Shao C, Gou L, Jin Y, Chen M. Construction of microRNA- and microRNA*-mediated regulatory networks in plants. RNA biology 2011; 8:1124-48.
45. Rajagopalan R, Vaucheret H, Trejo J, Bartel DP. A diverse and evolutionarily fluid set of microRNAs in Arabidopsis thaliana. Genes Dev 2006; 20:3407-25.
46. Du Q, Thonberg H, Wang J, Wahlestedt C, Liang Z. A systematic analysis of the silencing effects of an active siRNA at all single-nucleotide mismatched target sites.
Nucleic Acids Res 2005; 33:1671-7.
47. Dahlgren C, Zhang HY, Du Q, Grahn M, Norstedt G, Wahlestedt C, et al. Analysis of siRNA specificity on targets with double-nucleotide mismatches. Nucleic Acids Res 2008; 36:e53.
48. Ren G, Chen X, Yu B. Uridylation of miRNAs by hen1 suppressor1 in Arabidopsis. Current biology : CB 2012; 22:695-700.
49. Zhao Y, Yu Y, Zhai J, Ramachandran V, Dinh TT, Meyers BC, et al. The Arabidopsis nucleotidyl transferase HESO1 uridylates unmethylated small RNAs to trigger their degradation. Current biology : CB 2012; 22:689-94.
50. Katoh T, Sakaguchi Y, Miyauchi K, Suzuki T, Kashiwabara S, Baba T. Selective stabilization of mammalian microRNAs by 3' adenylation mediated by the cytoplasmic poly(A) polymerase GLD-2. Genes & development 2009; 23:433-8.
51. Lu S, Sun YH, Chiang VL. Adenylation of plant miRNAs. Nucleic acids research 2009; 37:1878-85.
52. MacRae IJ, Zhou K, Doudna JA. Structural determinants of RNA recognition and cleavage by Dicer. Nat Struct Mol Biol 2007; 14:934-40.
53. Macrae IJ, Zhou K, Li F, Repic A, Brooks AN, Cande WZ, et al. Structural basis for double-stranded RNA processing by Dicer. Science 2006; 311:195-8.
54. Lima WF, Murray H, Nichols JG, Wu H, Sun H, Prakash TP, et al. Human Dicer binds short single-strand and double-strand RNA with high affinity and interacts with different regions of the nucleic acids. J Biol Chem 2009; 284:2535-48.
55. Vermeulen A, Behlen L, Reynolds A, Wolfson A, Marshall WS, Karpilow J, et al. The contributions of dsRNA structure to Dicer specificity and efficiency. RNA 2005; 11:674-82.
56. Starega-Roslan J, Krol J, Koscianska E, Kozlowski P, Szlachcic WJ, Sobczak K, et al. Structural basis of microRNA length variety. Nucleic Acids Res 2011; 39:257-68.
57. Ahmed F, Kaundal R, Raghava GP. PHDcleav: a SVM based method for predicting human Dicer cleavage sites using sequence and secondary structure of miRNA precursors. BMC bioinformatics 2013; 14 Suppl 14:S9.
58. Gu S, Jin L, Zhang Y, Huang Y, Zhang F, Valdmanis PN, et al. The loop position of shRNAs and pre-miRNAs is critical for the accuracy of dicer processing in vivo. Cell 2012; 151:900-11.
59. Park JE, Heo I, Tian Y, Simanshu DK, Chang H, Jee D, et al. Dicer recognizes the 5' end of RNA for efficient and accurate processing. Nature 2011; 475:201-5.
60. Tyagi A, Ahmed F, Thakur N, Sharma A, Raghava GPS, Kumar M. HIVsirDB: A Database of HIV Inhibiting siRNAs. PLoS ONE 2011; 6:e25917.
61. Senthil-Kumar M, Mysore KS. Caveat of RNAi in plants: the off-target effect. Methods in molecular biology 2011; 744:13-25.
62. Henderson IR, Zhang X, Lu C, Johnson L, Meyers BC, Green PJ, et al. Dissecting Arabidopsis thaliana DICER function in small RNA processing, gene silencing and DNA methylation patterning. Nature genetics 2006; 38:721-5.

Figure Legends

Figure S9: Number of sib-miRNAs generated from the 5p- and 3p-arm of each pre-miRNA. The median number of sib-miRNAs generated was 5 (mean = 10) from the 5p-arm and 5 (mean = 10.7) from the 3p-arm with no significant difference (Wilcoxon test p-value = 0.6162).
Figure S10: Read counts of sib-miRNAs generated from the 5p- and 3p-arm of each pre-miRNA. A median of 463 (mean = 27,079) and 351 (mean = 64,073) reads was generated from the 5p- and 3p-arms, respectively, with no significant difference (Wilcoxon test p-value = 0.6332).
Figure S11: Fraction of sib-miRs generated from the 5p- and 3p-arm of each pre-miRNA. The median fraction of mature sequence was 54.79% (mean = 51.28%) for the 5p-arm and 45.21% (mean = 48.71%) for the 3p-arm with no statistical difference (Wilcoxon test p-value = 0.8107).
Figure S12: Distribution of sib-miR sizes generated from a population of 132 pre-miRNAs.
Figure S13: Frequency of nucleotide occurrences at the (A) 5’-end and the (B) 3’-end of sib-miRs.
Figure S14: Sorting of unique sib-miRs into different AGO proteins.
(A) Results from experimental validation of sorting 960 sib-miR sequences with AGO proteins. (B) sib-miR sequences (1,740) predicted to sort with AGO proteins.
Figure S15: Correlation between conserved plant pre-miRNAs and the number of distinct sib-miRNAs produced (ρ = 0.31, p = 2.14e-4). The y-axis is presented in logarithmic scale.
Figure S16: Correlation between conserved plant pre-miRNAs and the read counts (RPM) of sib-miRNAs produced (ρ = 0.49, p = 2.22e-9). The y-axis is presented in logarithmic scale.
Figure S17: Correlation between the number of sib-miRNAs generated from a pre-miRNA and their read count (RPM) in A. thaliana (ρ = 0.84, p = 1.13e-36). The x- and y-axes are presented in logarithmic scale.
Figure S18: Cleavage accuracy of DCL on pre-miRNAs. (A) For the 5p-arm, the median heterogeneities at the 5’- and 3’-ends were 0.086 and 0.159, respectively (Wilcoxon test p-value = 0.3302). (B) For the 3p-arm, the median heterogeneities at the 5’- and 3’-ends were 0.0758559 and 0.1555154, respectively (Wilcoxon test p-value = 0.1998).
Figure S19: Effect of terminal heterogeneity of isomiRs on hsa-miR-17-5p target genes. Green and black in the heatmap indicate the presence and absence of isomiR target, respectively. The canonical miRNA, hsa-miR-17-5p, is denoted by the red box while experimentally validated targets are listed in red.
Figure S20: Effect of terminal heterogeneity of isomiRs on hsa-let-7a-5p target genes. Green and black in the heatmap indicate the presence and absence of isomiR target, respectively. The canonical miRNA, hsa-let-7a-5p, is denoted by the red box while experimentally validated targets are listed in red and putative targets are indicated in blue.
Figure S21: Effect of terminal heterogeneity of isomiRs on hsa-miR-21-5p target genes. Green and black in the heatmap indicate the presence and absence of isomiR target, respectively.
The canonical miRNA, hsa-miR-21-5p, is denoted by the red box while experimentally validated targets are listed in red and putative targets are indicated in blue.