Comments by Stefan (yellow); Glossary (green); Referral to other chapters, please do not change (pink); List of vendors (gray); Reagents added to the database (blue)

Comparative genomics methods for the prediction of small RNA binding sites

Rym Kachouri-Lafond1 and Mihaela Zavolan1
1Biozentrum, University of Basel and Swiss Institute of Bioinformatics, Klingelbergstrasse 50-70, CH-4056 Basel, Switzerland. [Comment: e-mail?]

One page summary: Graphic overview
[Figure: graphic overview (image not available)]
This chapter presents a comparative genomics approach to the identification of binding sites for regulatory RNAs.

1. Abstract
In recent years, new classes of small regulatory RNAs, as well as new members of already known classes, have been discovered. Examples are small nucleolar RNAs (snoRNAs), which act in ribonucleoprotein complexes in which they guide modification or splicing of target transcripts, and microRNAs (miRNAs), which modulate the turnover rate and protein output of mRNAs. Although small RNAs recognize their targets through RNA-RNA hybridization, this interaction is constrained by protein factors within the ribonucleoprotein complexes, which makes target prediction challenging. Genomic regions that are functionally important for the organism, be they protein-coding genes, non-coding RNA genes or regulatory sites, are conserved over longer evolutionary distances than regions that do not carry functional elements. Since whole-genome sequencing became available, comparison of the genomic sequences of different species has become instrumental to the prediction and identification of functional elements. Here we describe comparative genomics approaches that we applied in order to discover binding sites of small regulatory RNAs (summarized in the graphic overview at the beginning of the chapter). In these approaches, we first define a model that describes the interaction between the regulatory RNA and its target.
Then we identify the putative binding sites of the regulatory RNA genome- or transcriptome-wide in the species of interest. We then examine the orthologous genomic regions from other species to determine whether each site is conserved. In parallel, we apply this procedure to randomized variants of the regulatory RNA that have similar properties (e.g. nucleotide composition). To the extent that the number of putative sites conserved across species is larger for the real regulatory RNA than for its randomized variants, we can infer that i) the RNA-RNA interaction model that we have defined is appropriate for describing productive interactions of the small RNA of interest, and that ii) we can distinguish functional targets of the small RNA from similar sequences in the reference genome.

2. Theoretical background
Initial efforts to clone and sequence small RNAs revealed numerous such molecules (see chapter 2 Meister). They are encoded in the genome, from which they are transcribed, processed, and incorporated into various ribonucleoprotein (RNP) complexes [1-7]. Deep sequencing studies in fact revealed that much of the genome is transcribed, generating numerous types of non-protein-coding RNAs. Moreover, even molecules that have been extensively studied, such as the snoRNAs, appear to give rise to processing products that may have acquired novel functions [8].

2.1. SnoRNAs
SnoRNAs (small nucleolar RNAs) are relatively short RNAs that accumulate either in the nucleolus or in the Cajal bodies (for reviews see [9-11]). Those located in the Cajal bodies are called small Cajal body-specific RNAs (scaRNAs). Based on their structural features, two major classes of snoRNAs have been defined: the C/D box and the H/ACA box snoRNAs (Figure 1). They associate with specific sets of proteins, distinct for each snoRNA class, to form snoRNP complexes (reviewed in [12]).
For instance, C/D box snoRNPs contain the protein fibrillarin (Nop1p in yeast), which is thought to be the methyltransferase component of C/D snoRNPs [12]. A third structure-defined class, found only among the scaRNAs, contains both the C/D and the H/ACA domains. In human, snoRNA size ranges from 50 to 235 nucleotides for C/D snoRNAs, 120 to 250 for H/ACA snoRNAs, and 80 to 550 for scaRNAs (see also the database of [13]). In vertebrates, most snoRNAs are cotranscribed as intronic sequences in host precursor mRNAs, and only a few are independently transcribed by RNA polymerase II (reviewed in [9, 14]).

Most of the snoRNAs base-pair with ribosomal RNA (rRNA) sequences, guiding either their methylation (C/D snoRNAs) or pseudouridylation (H/ACA snoRNAs). [Comment: is this true? I thought that most of them are orphan, i.e. they have no target.] The U3 C/D box snoRNA instead guides pre-rRNA cleavage. Ribosomal RNAs are not the only substrates of snoRNA-guided modifications. Other substrates of snoRNA-guided methylation and/or pseudouridylation include the small nuclear RNAs (snRNAs) U1, U2, U4, U5, U6 and U12 in vertebrates (summarized in database [13]), U2 and U5 in plants (database introduced in [16]), and U2 snRNA in yeast [17, 18]. In addition, tRNAs in archaea are processed with the help of snoRNAs [19, 20], which in this context have been termed sRNAs (sno-like RNAs).

Although hundreds of sequences that have the hallmarks of snoRNAs have been cloned and sequenced, for many of them no apparent target could be readily identified. [Comment: see my comment above.] For this reason, these snoRNAs were called "orphan" snoRNAs [1, 21-27]. Finding their targets is of great interest, particularly because the antisense boxes, which in prototypical snoRNAs are involved in rRNA recognition, are conserved in many of these snoRNAs, suggesting that they have a biological function.
It has recently been determined that, among these "orphan" snoRNAs, HBII-52 (also called SNORD115) in fact targets an mRNA, namely the mRNA encoding the serotonin receptor 2C [21]. The effect of the snoRNA-mRNA interaction has been reported to be either alternative splicing [28] or editing [29]. Together with the HBII-85 C/D box snoRNA (also called SNORD116), which also occurs in multiple copies in the genome, HBII-52 is part of a region of chromosome 15 that appears to be deleted in the Prader-Willi syndrome (PWS) [21, 22, 28, 30]. Consistent with the neurological phenotype of this genetic syndrome, the two families of snoRNAs have brain-specific expression. Moreover, they are both expressed from the paternal chromosome, whereas the corresponding allelic region in the maternal chromosome is silenced due to genomic imprinting [31]. [Comment: There is of course now the follow-up on this: Kishore, S., Khanna, A., Zhang, Z., Hui, J., Balwierz, P., Stefan, M., Beach, C., Nicholls, R.D., Zavolan, M. and Stamm, S. (2010) The snoRNA MBII-52 (SNORD 115) is processed into smaller RNAs and regulates alternative splicing. Hum Mol Genet, in press.]

The first predictions of snoRNA-rRNA interactions were part of a reverse-engineering approach for snoRNA identification in the yeast genome that started from known modified rRNA nucleotides [32, 33]. Following the study of Cavaille and Bachellerie [34], it was generally assumed that rRNA-snoRNA interactions require 10 to 21 nucleotides of complementarity (Watson-Crick or G-U pairing; see chapter 1 Baralle T) between the antisense box of the snoRNA and the rRNA target. The rRNA nucleotide that pairs with the 5th snoRNA nucleotide upstream of the D (or D') box undergoes methylation. Following the discovery of "orphan" snoRNAs and the reports that they may target mRNAs, some attempts have been made to predict such mRNA targets. snoTARGET [35] is a tool that was developed for this purpose.
The criteria defining the putative target sites were taken from previous studies that analyzed snoRNA/target interactions (rRNAs and snRNAs) [36, 37]. That is, the duplex length was required to be between 9 and 20 nucleotides, and a maximum of three G-U pairs and a single mismatch, located either at position 2 or at a position >11 in the antisense box, were allowed. The positions that appear to tolerate mismatches were inferred in a study by Chen et al. [37], in which more than 400 known rRNA and C/D snoRNA complementary sequences were analyzed. The first position was defined to be the first nucleotide after the D or D' box, and it did not participate in base-pairing interactions. The application of snoTARGET allowed the authors to infer that predicted target sites of HBII-85 snoRNAs are enriched near exons and preferentially in alternatively spliced genes, supporting a proposed role of these snoRNAs in alternative splicing.

2.2. MiRNAs
MiRNAs (microRNAs) are genome-encoded RNAs of 21-23 nucleotides. They are transcribed by RNA polymerase II [38], either from independent genes or as part of the introns of protein-coding transcripts. Within the primary transcripts, the miRNAs are part of hairpin structures that are recognized and processed by the Drosha/DGCR8 complex to release individual pre-miRNAs [39-42], which are then transported by exportin 5 [43] to the cytoplasm. Here, the Dicer/TRBP complex [Comment: what does TRBP stand for?] releases from the pre-miRNAs the 21-23 nt-long duplexes, from which usually only one strand is incorporated into an miRNA-induced silencing complex (miRISC). This complex, in which the miRNA acts as a guide, additionally contains an Argonaute protein and binds to target mRNAs to induce translational repression, deadenylation and degradation of the mRNA target [44, 45].
(see chapter 2, Meister) Structural studies revealed that the miRNA 5' end is anchored in the MID domain [Comment: what does MID stand for?] of the Argonaute protein, and that the 5' half of the miRNA is accessible and in a relatively rigid conformation for target recognition. This explains numerous previous observations that the 5' end of the miRNA is most important for target recognition [46-51]. [Comment: What about the fact that in siRNA the PAZ domain contacts the 3' end of the RNA; is there a similar arrangement in miRNAs?]

Although the principles of miRNA-target recognition are only partially understood, many methods for predicting targets have already been proposed. The variables that are taken into consideration include evolutionary conservation [50, 52-55], the sequence composition of the environment of the putative target site, its location within the 3'UTR, the base-pairing pattern in the 3' region of the miRNA [56], the structural accessibility of the target site and the energy of interaction between miRNA and target [57, 58]. High-throughput measurements of mRNA [56, 59-62] or protein [61, 62] levels upon miRNA transfection enable one to evaluate the performance of prediction methods. Recent studies [63, 64] indicate that for miRNAs, comparative genomics-based target prediction methods with very few assumptions perform remarkably well in comparison to methods that aim to capture mechanistic information.

The aim of comparative genomic analyses is to identify regions that are conserved in evolution, which indicates that they are under selective constraint. For instance, Xie et al. [65] and Lewis et al. [50] predicted targets of miRNAs by identifying miRNA-complementary 3'UTR segments that were conserved among a chosen set of species.
To be able to handle the continuously growing whole-genome sequence data, various authors proposed methods for quantifying the degree of conservation of putative target sites [54, 55] or the probability that a putative target site has been under selection [53].

3. Protocol
In order to predict targets of non-coding RNAs, be they snoRNAs or miRNAs, we first need to define a model of interaction between the small RNA and the target. Because only one example of a snoRNA-mRNA interaction is known [28], in predicting mRNA targets of orphan snoRNAs we were guided by the principles of snoRNA-rRNA interaction. Cavaille and Bachellerie [34] showed that snoRNA-rRNA duplexes are usually 10-21 bp long, with an average of 12-13 bp. A more recent study that included 415 rRNA and C/D snoRNA complementary sequences from animals, plants and yeasts showed that the constraint on duplex length can be further relaxed to 7-24 bp [37]. Thermodynamic stability of the duplex appears to play an important role, because short duplexes are always GC-rich, whereas long duplexes appear to tolerate G-U wobble and non-canonical base pairs, particularly when the duplex is GC-rich [34]. We therefore defined a putative snoRNA target site as either a genomic region that has perfect complementarity to at least 10 contiguous nucleotides in the snoRNA antisense box, or a genomic region that can form a stable hybrid (free energy of hybridization lower than -15 kcal/mol) with the antisense box of the snoRNA. We performed pattern matching to identify regions of at least 10 nucleotides of perfect complementarity, and we applied RNAhybrid [66] to predict stable hybrids. [Comment: can you give the url for all programs used in a table?] We imposed additional constraints on the hybrids, as suggested by the results of Cavaille and Bachellerie [34]. Namely, we only allowed a maximum of two bulged nucleotides in the snoRNA and/or the target sequence.
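The pattern-matching step described above can be illustrated with a minimal sketch. It covers only the first criterion (perfect Watson-Crick complementarity to at least 10 contiguous antisense-box nucleotides); the energy-based criterion evaluated with RNAhybrid and the bulge constraint are not reproduced here. The function names, the antisense-box sequence and the toy target region are hypothetical and chosen for illustration only.

```python
# Hypothetical helper names and sequences; only the ">= 10 contiguous
# perfectly complementary nucleotides" criterion is sketched here.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Watson-Crick reverse complement of an RNA string (A, C, G, U only)."""
    return "".join(COMPLEMENT[n] for n in reversed(rna))


def find_perfect_sites(antisense_box, target, min_len=10):
    """Return sorted (target_start, box_start) pairs (0-based) at which
    min_len contiguous antisense-box nucleotides pair perfectly with target."""
    box = antisense_box.upper().replace("T", "U")
    tgt = target.upper().replace("T", "U")
    hits = set()
    for box_start in range(len(box) - min_len + 1):
        # The target site is the reverse complement of the box window.
        site = reverse_complement(box[box_start:box_start + min_len])
        pos = tgt.find(site)
        while pos != -1:
            hits.add((pos, box_start))
            pos = tgt.find(site, pos + 1)
    return sorted(hits)


box = "UGACCGUAUCAGGCA"  # hypothetical antisense box
# Toy target: 12 nt complementary to box positions 2..13, with A flanks.
region = "AAAA" + reverse_complement(box[2:14]) + "AAAA"
hits = find_perfect_sites(box, region)
print(hits)
```

Each hit reports one 10-nt window; a longer complementary stretch shows up as several overlapping windows, which can be merged afterwards if maximal duplexes are wanted.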
The final step of our snoRNA target prediction in human was to extract the orthologous regions in rhesus macaque, mouse, cow and dog and to determine whether they also contain putative snoRNA target sites as defined above. The final set of predictions included only putative target sites that were conserved in all these species. Because we were interested in the potential involvement of the snoRNAs in alternative splicing, we intersected the set of predictions with the loci of protein-coding genes.

For miRNAs we considered as models of interaction perfect complementarity between the target and nucleotides 1-8, 1-7 or 2-8 of the miRNA. We further expanded on the relatively simple model described above, developing a Bayesian model to quantify the selection pressure on sites with particular patterns of conservation across species. [Comment: can you explain this here and also add to the glossary?] This enables us to incorporate information from any number of species located at arbitrary evolutionary distances from each other, appropriately weighing conservation between species that are close in evolutionary distance and between species that are farther apart. Given a putative site in the species of interest, we would like to know the probability that the site has been under evolutionary selection. The genome sequence data, however, only enable us to determine whether the site has been conserved in other species, and it is of course more likely that the site is conserved in closely related species than in more distantly related species. We thus set out to estimate the probability $P(s \mid c)$ that a site with an observed pattern $c$ of conservation across species has a pattern of selection $s$ in these species. A conservation pattern $c$ is defined as a vector of 0's and 1's, with 1 meaning that the site is present in a given species and can form the same base pairs with the miRNA as in the reference species, and 0 meaning that at least one of these two conditions does not hold.
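The miRNA interaction models and the definition of a conservation pattern can be illustrated with a short sketch. It assumes a gapless reference row and a column-aligned multiple alignment of 3'UTRs supplied as a dictionary; the toy alignment, the species list and the function names are hypothetical. Because the interaction models require perfect complementarity, "can form the same base pairs" is taken here to mean that the aligned segment is identical to the reference site.

```python
# Hypothetical names and toy data; a sketch, not the published pipeline.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def seed_site(mirna, first, last):
    """Target sequence perfectly complementary to miRNA nucleotides
    first..last (1-based, inclusive), written 5' to 3'."""
    seed = mirna[first - 1:last]
    return "".join(COMPLEMENT[n] for n in reversed(seed))


def conservation_pattern(site, start, aligned_utrs, species):
    """Tuple of 0/1 values: 1 if the species carries the identical site
    at the aligned position, 0 otherwise (including gapped positions)."""
    end = start + len(site)
    return tuple(int(aligned_utrs[sp][start:end] == site) for sp in species)


mirna = "UGAGGUAGUAGGUUGUAUAGUU"  # example miRNA sequence
sites = {name: seed_site(mirna, a, b)
         for name, (a, b) in {"1-8": (1, 8), "1-7": (1, 7), "2-8": (2, 8)}.items()}

aligned = {  # toy column-aligned 3'UTR fragments
    "human": "AAGCUACCUCAGG",
    "mouse": "AAGCUACCUCAGG",
    "dog":   "AAGCUACGUCAGG",
    "cow":   "AAG-UACCUCAGG",
}
start = aligned["human"].find(sites["2-8"])
c = conservation_pattern(sites["2-8"], start, aligned, ["mouse", "dog", "cow"])
print(sites, start, c)
```

The resulting tuple plays the role of the vector $c$ in the text, with one entry per non-reference species.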
Similarly, a selection pattern $s$ is defined as a vector of '+' and '-' entries, with '+' denoting that the site is under selection on the branch of the evolutionary tree leading to the given species, and '-' meaning the opposite. The vector $s$ having only '-' entries corresponds to the situation in which the site has not been under selection in any of the considered species, whatever its conservation pattern is. If we know in which species a site is under selection, we can infer the probabilities for observing any of the possible conservation patterns. These are simply given by

$$P(c \mid s) = \frac{P(c \mid \mathrm{bg})}{\sum_{c' \in C(s)} P(c' \mid \mathrm{bg})}.$$

That is, we identify all the conservation patterns $C(s)$ that are consistent with the chosen selection pattern $s$, and among these we compute the relative probability of conservation pattern $c$. By a conservation pattern that is consistent with a selection pattern we mean that the site is conserved in all the species in which it is under selection, whereas in species in which it is not under selection it may or may not be conserved. Conservation in species in which the site is not under selection is a chance occurrence whose probability, $P(c \mid \mathrm{bg})$, we can estimate as follows. Ideally we would need 7- and 8-mer sequences that are evolving neutrally, i.e. are under no particular selection. Because in reality we do not know what parts of the transcripts are evolving neutrally, we instead estimate the probability of observing each possible conservation pattern over all possible 7- and 8-mers, only a small fraction of which corresponds to miRNA-complementary 7- or 8-mers.

What we are interested in, however, is not the probability of a conservation pattern given a selection pattern, but rather the probability of a selection pattern given the observed conservation pattern. From Bayes' theorem [Comment: can you explain this here and also add to the glossary?] we have that

$$P(s \mid c) = \frac{P(c \mid s)\,P(s)}{P(c)} = \frac{P(c \mid s)\,P(s)}{\sum_{s'} P(c \mid s')\,P(s')},$$

where $P(s)$ is the prior probability of selection pattern $s$.
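The two formulas above can be made concrete with a small numerical sketch for two non-reference species. All numbers (background frequencies and priors) are invented for illustration, the function names are hypothetical, and $P(c \mid s)$ is set to zero for conservation patterns that are not consistent with $s$, which is how we read the definition of $C(s)$.

```python
# Toy numbers and hypothetical names; a sketch of the Bayesian update only.

def consistent(c, s):
    """A pattern c is consistent with s if the site is conserved (1)
    in every species in which it is under selection ('+')."""
    return all(ci == 1 for ci, si in zip(c, s) if si == "+")


def p_c_given_s(c, s, p_bg):
    """P(c|s): background probability of c renormalized over C(s)."""
    if not consistent(c, s):
        return 0.0
    norm = sum(p for c2, p in p_bg.items() if consistent(c2, s))
    return p_bg[c] / norm


def posterior(c, priors, p_bg):
    """P(s|c) for every selection pattern s, via Bayes' theorem."""
    joint = {s: p_c_given_s(c, s, p_bg) * ps for s, ps in priors.items()}
    total = sum(joint.values())
    return {s: j / total for s, j in joint.items()}


# Invented background frequencies of conservation patterns over all k-mers.
p_bg = {(0, 0): 0.60, (0, 1): 0.15, (1, 0): 0.15, (1, 1): 0.10}
# Invented priors over selection patterns.
priors = {("-", "-"): 0.90, ("+", "-"): 0.03, ("-", "+"): 0.03, ("+", "+"): 0.04}

post = posterior((1, 1), priors, p_bg)
p_functional = 1.0 - post[("-", "-")]
print(post, p_functional)
```

In this sketch the priors $P(s)$ are simply supplied as inputs.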
We estimate these priors by maximizing the likelihood of the conservation patterns. That is, the likelihood of the conservation data is given by

$$L = \prod_{c} P(c)^{n(c)},$$

where $n(c)$ is the number of times conservation pattern $c$ has been observed in the data, and

$$P(c) = \sum_{s \in S} P(c \mid s)\,P(s),$$

the probability for conservation pattern $c$, is given in terms of a quantity which can be computed as described above, $P(c \mid s)$, and the prior probabilities over selection patterns, $P(s)$. The probabilities over selection patterns are the parameters that we need to optimize. The number of these parameters grows exponentially with the number of species, i.e. as $2^{g-1}$, where $g$ is the number of species that we take into consideration (including the reference species for which we want to predict sites). It is clear that as the number of fully sequenced genomes that we use in our inference grows, the number of parameters that we would have to estimate would quickly become too large. To circumvent this problem, we used the following approximation. We computed the probability of a selection pattern as a product of the probabilities of the selection patterns at every node in the evolutionary tree. With the further assumption that sites can only be lost in evolution, at each node in the tree we can have one of three situations: the site is under selection only along the left branch, only along the right branch, or along both branches originating at the node. [Comment: Why do you rule out that sites are created?] In the branch linking the reference species with the rest of the evolutionary tree we have an additional probability that the site is under selection. Finally, we take into consideration the pattern of conservation of the miRNAs, reasoning that if a miRNA is absent in a species, regions that are complementary to the miRNA in this species cannot be under selection.

In the end, for each miRNA-complementary site in the reference species we estimate the probability that the site is under selection in any other of the considered species as

$$P(\mathrm{functional} \mid c) = \sum_{s \neq s_{-}} P(s \mid c) = 1 - \frac{P(c \mid \mathrm{bg})\,P(s_{-})}{\sum_{s \in S} P(c \mid s)\,P(s)}.$$

Here the vector $s_{-}$ denotes absence of selection in all the considered species. A complete discussion of this model is presented in [53].

[Comment: Can you summarize this in a bullet-style list, step 1, 2, 3, etc., as a condensed protocol? Can you also put this protocol on the eurasnet site? What kind of controls do you use for your model?]

4. Example of an experiment
With the methods described above we predicted targets for all miRNAs in human, mouse, rat, fish, fruitfly and worm, and targets of the orphan snoRNA HBII-52 in human mRNAs. In the case of HBII-52, we obtained 222 predicted target sites located within, or at most 200 nucleotides away from, a known human exon. [Comment: Can you also indicate the web site that was used in the paper?] Many of these predictions were tested experimentally by the transfection of neuronal cells (Neuro2A) with MBII-52, the mouse ortholog of HBII-52 (Stamm group; Kishore, S., Khanna, A., Zhang, Z., Hui, J., Balwierz, P., Stefan, M., Beach, C., Nicholls, R.D., Zavolan, M. and Stamm, S. (2010) The snoRNA MBII-52 (SNORD 115) is processed into smaller RNAs and regulates alternative splicing. Hum Mol Genet, in press). RT-PCR of the isolated RNA revealed the splicing pattern of each of the candidate genes. As a control, a mutant MBII-52 construct with a scrambled antisense box was used. Based on these experiments, the Stamm group identified five additional targets whose splicing pattern is affected by MBII-52 (Stamm group, submitted). Returning to the question of evidence of evolutionary selection on MBII-52-complementary sites, we found that applying the algorithm to the real snoRNAs does not yield larger numbers of conserved putative target sites than applying it to randomized sequences with the same dinucleotide composition.
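Randomized variants with the same dinucleotide composition can be generated in several ways; the text does not specify which procedure was used. The sketch below shows one standard option, an Euler-path shuffle in the spirit of Altschul and Erickson, which preserves every dinucleotide count as well as the first and last nucleotide. The function names and the example sequence are hypothetical.

```python
# One possible dinucleotide-preserving shuffle (Euler-path approach);
# an illustrative sketch, not necessarily the procedure used in the chapter.
import random
from collections import Counter


def _reaches(v, target, final_edge):
    """True if following the chosen final edges from v leads to target."""
    seen = set()
    while v != target:
        if v in seen:
            return False
        seen.add(v)
        v = final_edge[v]
    return True


def dinucleotide_shuffle(seq, rng):
    """Return a random sequence with the same dinucleotide counts as seq."""
    if len(seq) < 3:
        return seq
    last = seq[-1]
    edges = {}  # nucleotide -> list of successor nucleotides
    for a, b in zip(seq, seq[1:]):
        edges.setdefault(a, []).append(b)
    # Pick, for every nucleotide except the last one, the edge it uses last,
    # and retry until these edges form a tree leading to the last nucleotide.
    while True:
        final_edge = {v: rng.choice(succ) for v, succ in edges.items() if v != last}
        if all(_reaches(v, last, final_edge) for v in final_edge):
            break
    order = {}
    for v, succ in edges.items():
        rest = list(succ)
        if v != last:
            rest.remove(final_edge[v])
        rng.shuffle(rest)
        if v != last:
            rest.append(final_edge[v])
        order[v] = rest
    # Walk the edges in the chosen order, starting from the first nucleotide.
    used = Counter()
    cur = seq[0]
    out = [cur]
    for _ in range(len(seq) - 1):
        nxt = order[cur][used[cur]]
        used[cur] += 1
        out.append(nxt)
        cur = nxt
    return "".join(out)


rng = random.Random(0)
original = "UGACCGUAUCAGGCAUGGACUUACG"  # hypothetical small RNA
variant = dinucleotide_shuffle(original, rng)
print(variant)
```

Each variant is then run through the same site search and conservation filter as the real RNA, and the numbers of conserved putative sites are compared. For MBII-52, as stated above, the real sequence did not yield more conserved putative sites than its randomized variants.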
This indicates that we have not yet captured the relevant determinants of functional snoRNA-mRNA interactions and that additional work will be needed to uncover these determinants. Alternatively, the number of targets of orphan snoRNAs may be too small to yield a statistical signal when compared to predicted targets of randomized snoRNA sequences. [Comment: We could show that the snoRNAs are actually processed, which would leave other parts of the molecule open for pairing, not just the antisense box; is this a possibility?] On the other hand, many miRNAs have hundreds of strongly conserved target sites, many more than could be expected by chance. This indicates that the miRNA 5' end is indeed a very important determinant of target recognition and that miRNAs are part of vast regulatory networks. Target predictions for all human, mouse, rat, fish, fruitfly and worm miRNAs obtained with the Bayesian method described above can be found at www.mirz.unibas.ch. Although only 3'UTRs were taken into consideration for miRNA target prediction, the method can equally well be applied to other transcript or genomic regions (5'UTRs, CDS, promoters, etc.), provided that the probabilities of "chance conservation" patterns are estimated from the region of interest. [Comment: To make it less abstract, can you provide some screenshots as a figure for this experiment?]

5. Troubleshooting
For many miRNAs the number of target sites appears to be in the range of hundreds, and particularly among conserved sites, miRNA-complementary sites clearly outnumber sites of similar length that are not complementary to miRNAs. Thus, we can be relatively confident in such predictions. This is, however, not true for all miRNAs. Perhaps the most notorious case is the lsy-6 miRNA in the worm. This miRNA is involved in the establishment of left-right asymmetry in the ASE taste receptor sensory neurons, and it has only one known target, the cog-1 (Connection of Gonad defective family member 1) transcript [67].
Although the predicted lsy-6 sites in the cog-1 transcript are completely conserved in worms, they are assigned very low posterior probabilities because lsy-6-complementary sites cannot be distinguished from other 8-mers that are conserved without being under selection. Thus, if a site is assigned a low probability, it does not necessarily mean that it is not functional. It can also mean that the corresponding small RNA has a low number of sites in the transcriptome, with the number of conserved instances being just as high as we would predict for randomized variants of the small RNA. We would also obtain low posterior probabilities for the sites if the small RNA-target interaction model were inaccurate. For these reasons, the prediction of small RNA interaction sites using comparative genomics methods needs to build on the knowledge about the relevant determinants of small RNA-target interaction that is accumulated through experiments or other computational analyses.

Figure legend
Figure 1. A. Secondary structure consensus of C/D snoRNAs. The classical drawing of helices with canonical as well as other isosteric base pairs that form the K-turn follows [15]. The C' and D' boxes, which are not always conserved (in contrast to the C and D boxes), are shown in lowercase letters and with dashed base pair symbols (forming preferentially a K-loop instead of a K-turn). B. 2'-O-ribose methylation reaction catalyzed by C/D snoRNPs. C. Secondary structure consensus of H/ACA snoRNAs. D. Pseudouridylation reaction catalyzed by H/ACA snoRNPs: a uridine (U) base is isomerized into a pseudouridine (Ψ).

References
1. Huttenhofer, A., Kiefmann, M., Meier-Ewert, S., O'Brien, J., Lehrach, H., Bachellerie, J. P., and Brosius, J. (2001). RNomics: an experimental approach that identifies 201 candidates for novel, small, non-messenger RNAs in mouse. EMBO J 20, 2943-2953. 2. Lau, N. C., Lim, L. P., Weinstein, E. G., and Bartel, D. P. (2001). An abundant class of tiny RNAs with probable regulatory roles in Caenorhabditis elegans. Science 294, 858-862. 3. Lee, R. C., and Ambros, V. (2001). An extensive class of small RNAs in Caenorhabditis elegans. Science 294, 862-864. 4. Lagos-Quintana, M., Rauhut, R., Lendeckel, W., and Tuschl, T. (2001). Identification of novel genes coding for small expressed RNAs. Science 294, 853-858. 5. Aravin, A., et al. (2006). A novel class of small RNAs bind to MILI protein in mouse testes. Nature 442, 203-207. 6. Girard, A., Sachidanandam, R., Hannon, G. J., and Carmell, M. A. (2006). A germline-specific class of small RNAs binds mammalian Piwi proteins. Nature 442, 199-202. 7. Lau, N. C., Seto, A. G., Kim, J., Kuramochi-Miyagawa, S., Nakano, T., Bartel, D. P., and Kingston, R. E. (2006). Characterization of the piRNA complex from rat testes. Science 313, 363-367. 8. Ender, C., Krek, A., Friedlander, M. R., Beitzinger, M., Weinmann, L., Chen, W., Pfeffer, S., Rajewsky, N., and Meister, G. (2008). A human snoRNA with microRNA-like functions. Mol Cell 32, 519-528. 9. Kiss, T. (2002). Small nucleolar RNAs: an abundant group of noncoding RNAs with diverse cellular functions. Cell 109, 145-148. 10. Bachellerie, J. P., Cavaille, J., and Huttenhofer, A. (2002). The expanding snoRNA world. Biochimie 84, 775-790. 11. Matera, A. G., Terns, R. M., and Terns, M. P. (2007). Non-coding RNAs: lessons from the small nuclear and small nucleolar RNAs. Nat Rev Mol Cell Biol 8, 209-220. 12. Reichow, S. L., Hamma, T., Ferre-D'Amare, A. R., and Varani, G. (2007). The structure and function of small nucleolar ribonucleoproteins. Nucleic Acids Res 35, 1452-1464. 13. Lestrade, L., and Weber, M. J. (2006). snoRNA-LBME-db, a comprehensive database of human H/ACA and C/D box snoRNAs. Nucleic Acids Res 34, D158-62. 14. Dieci, G., Preti, M., and Montanini, B. (2009). Eukaryotic snoRNAs: a paradigm for gene expression flexibility. Genomics 94, 83-88. 15. Leontis, N. B., and Westhof, E. (2001). Geometric nomenclature and classification of RNA base pairs. RNA 7, 499-512. 16. Brown, J. W., Echeverria, M., Qu, L. H., Lowe, T. M., Bachellerie, J. P., Huttenhofer, A., Kastenmayer, J. P., Green, P. J., Shaw, P., and Marshall, D. F. (2003). Plant snoRNA database. Nucleic Acids Res 31, 432-435. 17. Ma, X., Yang, C., Alexandrov, A., Grayhack, E. J., Behm-Ansmant, I., and Yu, Y. T. (2005). Pseudouridylation of yeast U2 snRNA is catalyzed by either an RNA-guided or RNA-independent mechanism. EMBO J 24, 2403-2413. 18. Piekna-Przybylska, D., Decatur, W. A., and Fournier, M. J. (2007). New bioinformatic tools for analysis of nucleotide modifications in eukaryotic rRNA. RNA 13, 305-312. 19. Omer, A. D., Lowe, T. M., Russell, A. G., Ebhardt, H., Eddy, S. R., and Dennis, P. P. (2000). Homologs of small nucleolar RNAs in Archaea. Science 288, 517-522. 20. Clouet d'Orval, B., Bortolin, M. L., Gaspin, C., and Bachellerie, J. P. (2001). Box C/D RNA guides for the ribose methylation of archaeal tRNAs. The tRNATrp intron guides the formation of two ribose-methylated nucleosides in the mature tRNATrp. Nucleic Acids Res 29, 4518-4529. 21. Cavaille, J., Buiting, K., Kiefmann, M., Lalande, M., Brannan, C. I., Horsthemke, B., Bachellerie, J. P., Brosius, J., and Huttenhofer, A. (2000). Identification of brain-specific and imprinted small nucleolar RNA genes exhibiting an unusual genomic organization. Proc Natl Acad Sci U S A 97, 14311-14316. 22. Cavaille, J., Vitali, P., Basyuk, E., Huttenhofer, A., and Bachellerie, J. P. (2001). A novel brain-specific box C/D small nucleolar RNA processed from tandemly repeated introns of a noncoding RNA gene in rats. J Biol Chem 276, 26374-26383. 23. Cavaille, J., Seitz, H., Paulsen, M., Ferguson-Smith, A. C., and Bachellerie, J. P. (2002). Identification of tandemly-repeated C/D snoRNA genes at the imprinted human 14q32 domain reminiscent of those at the Prader-Willi/Angelman syndrome region. Hum Mol Genet 11, 1527-1538. 24.
Vitali, P., Royo, H., Seitz, H., Bachellerie, J. P., Huttenhofer, A., and Cavaille, J. (2003). Identification of 13 novel human modification guide RNAs. Nucleic Acids Res 31, 6543-6551. 25. Kiss, A. M., Jady, B. E., Bertrand, E., and Kiss, T. (2004). Human box H/ACA pseudouridylation guide RNA machinery. Mol Cell Biol 24, 5797-5807. 26. Fedorov, A., Stombaugh, J., Harr, M. W., Yu, S., Nasalean, L., and Shepelev, V. (2005). Computer identification of snoRNA genes using a Mammalian Orthologous Intron Database. Nucleic Acids Res 33, 4578-4583. 27. Yang, J. H., Zhang, X. C., Huang, Z. P., Zhou, H., Huang, M. B., Zhang, S., Chen, Y. Q., and Qu, L. H. (2006). snoSeeker: an advanced computational package for screening of guide and orphan snoRNA genes in the human genome. Nucleic Acids Res 34, 5112-5123. 28. Kishore, S., and Stamm, S. (2006). The snoRNA HBII-52 regulates alternative splicing of the serotonin receptor 2C. Science 311, 230-232. 29. Vitali, P., Basyuk, E., Le Meur, E., Bertrand, E., Muscatelli, F., Cavaille, J., and Huttenhofer, A. (2005). ADAR2-mediated editing of RNA substrates in the nucleolus is inhibited by C/D small nucleolar RNAs. J Cell Biol 169, 745-753. 30. Sahoo, T., del Gaudio, D., German, J. R., Shinawi, M., Peters, S. U., Person, R. E., Garnica, A., Cheung, S. W., and Beaudet, A. L. (2008). Prader-Willi phenotype caused by paternal deficiency for the HBII-85 C/D box small nucleolar RNA cluster. Nat Genet 40, 719-721. 31. Horsthemke, B., and Wagstaff, J. (2008). Mechanisms of imprinting of the Prader-Willi/Angelman region. Am J Med Genet A 146A, 2041-2052. 32. Lowe, T. M., and Eddy, S. R. (1999). A computational screen for methylation guide snoRNAs in yeast. Science 283, 1168-1171. 33. Wood, V., et al. (2002). The genome sequence of Schizosaccharomyces pombe. Nature 415, 871-880. 34. Cavaille, J., and Bachellerie, J. P. (1998).
SnoRNA-guided ribose methylation of rRNA: structural features of the guide RNA duplex influencing the extent of the reaction. Nucleic Acids Res 26, 1576-1587. 35. Bazeley, P. S., Shepelev, V., Talebizadeh, Z., Butler, M. G., Fedorova, L., Filatov, V., and Fedorov, A. (2008). snoTARGET shows that human orphan snoRNA targets locate close to alternative splice junctions. Gene 408, 172-179. 36. Huttenhofer, A., Cavaille, J., and Bachellerie, J. P. (2004). Experimental RNomics: a global approach to identifying small nuclear RNAs and their targets in different model organisms. Methods Mol Biol 265, 409-428. 37. Chen, C. L., Perasso, R., Qu, L. H., and Amar, L. (2007). Exploration of pairing constraints identifies a 9 base-pair core within box C/D snoRNA-rRNA duplexes. J Mol Biol 369, 771-783. 38. Lee, Y., Kim, M., Han, J., Yeom, K. H., Lee, S., Baek, S. H., and Kim, V. N. (2004). MicroRNA genes are transcribed by RNA polymerase II. EMBO J 23, 4051-4060. 39. Lee, Y., et al. (2003). The nuclear RNase III Drosha initiates microRNA processing. Nature 425, 415-419. 40. Yeom, K. H., Lee, Y., Han, J., Suh, M. R., and Kim, V. N. (2006). Characterization of DGCR8/Pasha, the essential cofactor for Drosha in primary miRNA processing. Nucleic Acids Res 34, 4622-4629. 41. Han, J., Lee, Y., Yeom, K. H., Nam, J. W., Heo, I., Rhee, J. K., Sohn, S. Y., Cho, Y., Zhang, B. T., and Kim, V. N. (2006). Molecular basis for the recognition of primary microRNAs by the Drosha-DGCR8 complex. Cell 125, 887-901. 42. Han, J., Lee, Y., Yeom, K. H., Kim, Y. K., Jin, H., and Kim, V. N. (2004). The Drosha-DGCR8 complex in primary microRNA processing. Genes Dev 18, 3016-3027. 43. Lund, E., Guttinger, S., Calado, A., Dahlberg, J. E., and Kutay, U. (2004). Nuclear export of microRNA precursors. Science 303, 95-98. 44. Eulalio, A., Huntzinger, E., Nishihara, T., Rehwinkel, J., Fauser, M., and Izaurralde, E. (2009). Deadenylation is a widespread effect of miRNA regulation. RNA 15, 21-32. 45. Fabian, M.
R., et al. (2009). Mammalian miRNA RISC recruits CAF1 and PABP to affect PABP-dependent deadenylation. Mol Cell 35, 868-880.
46. Lai, E. C. (2002). Micro RNAs are complementary to 3' UTR sequence motifs that mediate negative post-transcriptional regulation. Nat Genet 30, 363-364.
47. Lewis, B. P., Shih, I. H., Jones-Rhoades, M. W., Bartel, D. P., and Burge, C. B. (2003). Prediction of mammalian microRNA targets. Cell 115, 787-798.
48. Doench, J. G., and Sharp, P. A. (2004). Specificity of microRNA target selection in translational repression. Genes Dev 18, 504-511.
49. Rajewsky, N., and Socci, N. D. (2004). Computational identification of microRNA targets. Dev Biol 267, 529-535.
50. Lewis, B. P., Burge, C. B., and Bartel, D. P. (2005). Conserved seed pairing, often flanked by adenosines, indicates that thousands of human genes are microRNA targets. Cell 120, 15-20.
51. Brennecke, J., Stark, A., Russell, R. B., and Cohen, S. M. (2005). Principles of microRNA-target recognition. PLoS Biol 3, e85.
52. Krek, A., et al. (2005). Combinatorial microRNA target predictions. Nat Genet 37, 495-500.
53. Gaidatzis, D., van Nimwegen, E., Hausser, J., and Zavolan, M. (2007). Inference of miRNA targets using evolutionary conservation and pathway analysis. BMC Bioinformatics 8, 69.
54. Kheradpour, P., Stark, A., Roy, S., and Kellis, M. (2007). Reliable prediction of regulator targets using 12 Drosophila genomes. Genome Res 17, 1919-1931.
55. Friedman, R. C., Farh, K. K., Burge, C. B., and Bartel, D. P. (2009). Most mammalian mRNAs are conserved targets of microRNAs. Genome Res 19, 92-105.
56. Grimson, A., Farh, K. K., Johnston, W. K., Garrett-Engele, P., Lim, L. P., and Bartel, D. P. (2007). MicroRNA targeting specificity in mammals: determinants beyond seed pairing. Mol Cell 27, 91-105.
57. Long, D., Lee, R., Williams, P., Chan, C. Y., Ambros, V., and Ding, Y. (2007). Potent effect of target structure on microRNA function. Nat Struct Mol Biol 14, 287-294.
58.
Kertesz, M., Iovino, N., Unnerstall, U., Gaul, U., and Segal, E. (2007). The role of site accessibility in microRNA target recognition. Nat Genet 39, 1278-1284.
59. Lim, L. P., Lau, N. C., Garrett-Engele, P., Grimson, A., Schelter, J. M., Castle, J., Bartel, D. P., Linsley, P. S., and Johnson, J. M. (2005). Microarray analysis shows that some microRNAs downregulate large numbers of target mRNAs. Nature 433, 769-773.
60. Linsley, P. S., et al. (2007). Transcripts targeted by the microRNA-16 family cooperatively regulate cell cycle progression. Mol Cell Biol 27, 2240-2252.
61. Selbach, M., Schwanhausser, B., Thierfelder, N., Fang, Z., Khanin, R., and Rajewsky, N. (2008). Widespread changes in protein synthesis induced by microRNAs. Nature 455, 58-63.
62. Baek, D., Villen, J., Shin, C., Camargo, F. D., Gygi, S. P., and Bartel, D. P. (2008). The impact of microRNAs on protein output. Nature 455, 64-71.
63. Hausser, J., Landthaler, M., Jaskiewicz, L., Gaidatzis, D., and Zavolan, M. (2009). Relative contribution of sequence and structure features to the mRNA binding of Argonaute/EIF2C-miRNA complexes and the degradation of miRNA targets. Genome Res.
64. Bartel, D. P. (2009). MicroRNAs: target recognition and regulatory functions. Cell 136, 215-233.
65. Xie, X., Lu, J., Kulbokas, E. J., Golub, T. R., Mootha, V., Lindblad-Toh, K., Lander, E. S., and Kellis, M. (2005). Systematic discovery of regulatory motifs in human promoters and 3' UTRs by comparison of several mammals. Nature 434, 338-345.
66. Rehmsmeier, M., Steffen, P., Hochsmann, M., and Giegerich, R. (2004). Fast and effective prediction of microRNA/target duplexes. RNA 10, 1507-1517.
67. Johnston, R. J., and Hobert, O. (2003). A microRNA controlling left/right neuronal asymmetry in Caenorhabditis elegans. Nature 426, 845-849.
Bayesian model (zavolan)
Bayes’ theorem (zavolan)
prior probability (zavolan)
posterior probabilities (zavolan)

Abbreviations
cog-1: Connection of Gonad defective family member 1
K-turn: Kink-turn
miRNA: microRNA
mRNA: messenger RNA
nt: nucleotides
PWS: Prader-Willi syndrome
Ψ: Pseudouridine
RNase: Ribonuclease
RNP: Ribonucleoprotein
rRNA: ribosomal RNA
scaRNA: small Cajal bodies RNA
snoRNA: small nucleolar RNA
snoRNP: small nucleolar ribonucleoprotein particle
snRNA: small nuclear RNA
sRNA: sno-like RNA
tRNA: transfer RNA
U: Uridine