Comparative genomics methods for the prediction of small RNA

Comparative genomics methods for the prediction of small RNA binding
sites
Rym Kachouri-Lafond1 and Mihaela Zavolan1
1Biozentrum,
University of Basel and Swiss Institute of Bioinformatics,
Klingelbergstrasse 50-70, CH-4056 Basel, Switzerland.
One page summary:
Graphic overview
This chapter presents a comparative genomics approach to the identification
of binding sites for regulatory RNAs.
1. Abstract
In recent years, new classes of small regulatory RNAs, as well as new members
of already known classes, have been discovered. Examples are small
nucleolar RNAs (snoRNAs) that act in ribonucleoprotein complexes, in which
they guide modification or splicing of target transcripts, and microRNAs
(miRNAs) that modulate the turnover rate and protein output of mRNAs.
Although small RNAs recognize their targets through RNA-RNA hybridization,
this interaction is constrained by protein factors within the ribonucleoprotein
complexes, thus making target prediction challenging.
Genomic regions that are functionally important for the organism, be they
protein-coding genes, non-coding RNA genes or regulatory sites, are
conserved over longer evolutionary distances than regions that do not
carry functional elements. As whole-genome sequencing became available,
comparison of genomic sequences of different species has become
instrumental to the prediction and identification of functional elements. Here
we describe comparative genomics approaches that we applied in order to
discover binding sites of small regulatory RNAs (summarized in the graphic
overview at the beginning of the chapter). In these approaches, we first define
a model that describes the interaction between the regulatory RNA and its
target. Then we identify the putative binding sites of the regulatory RNA
genome- or transcriptome-wide in any given species of interest. We then
examine the orthologous genomic regions from other species to determine
whether the site is conserved. In parallel, we apply this procedure to
randomized variants of the regulatory RNAs that have similar properties (e.g.
nucleotide composition). To the extent to which the number of putative sites
that are conserved across species is larger for a real regulatory RNA
compared to its randomized variants, we can infer that i) the RNA-RNA
interaction model that we have defined is appropriate for describing
productive interactions of the small RNA of interest, and that ii) we can
distinguish functional, meaningful targets of the small RNAs from similar
sequences in the reference genome.
2. Theoretical background
Initial efforts to clone and sequence small RNAs revealed numerous such
molecules (see chapter 2 Meister). They are encoded in the genome, from
which they are transcribed, processed, and incorporated into various
ribonucleoprotein (RNP) complexes [1-7]. Deep sequencing studies revealed
in fact that much of the genome is transcribed, generating numerous types of
non-protein-coding RNAs. Moreover, even molecules that have been
extensively studied such as the snoRNAs appear to give rise to processing
products that may have acquired novel functions [8].
2.1. SnoRNAs
SnoRNAs (small nucleolar RNAs) are relatively short RNAs that accumulate
either in the nucleolus or the Cajal bodies (for reviews see [9-11]). When
located in the Cajal bodies they are called small Cajal body-specific RNAs or
scaRNAs. Based on their structural features, two major classes of snoRNAs
have been defined: the C/D box and the H/ACA box snoRNAs (Figure 1).
They associate with specific sets of proteins, distinct for each snoRNA class,
to form snoRNP complexes (reviewed in [12]). For instance, C/D box
snoRNPs contain the protein fibrillarin (Nop1p in yeast), which is thought to be
the methyltransferase component of C/D snoRNPs [12]. A third structure-defined
class, found only among the scaRNAs, contains both the C/D and the
H/ACA domains. In human, snoRNA sizes range from 50 to 235 nucleotides
for C/D snoRNAs, 120 to 250 for H/ACA snoRNAs, and 80 to 550 for
scaRNAs (see also the database of [13]). In vertebrates, most of the
snoRNAs are cotranscribed as intronic sequences in host precursor mRNAs,
and only a few are independently transcribed by RNA polymerase II (reviewed
in [9, 14]).
Most of the snoRNAs base-pair with ribosomal RNA sequences (rRNAs),
guiding either their methylation (C/D snoRNAs) or pseudouridylation (H/ACA
snoRNAs). The U3 C/D box snoRNA instead guides pre-rRNA cleavage.
Ribosomal RNAs are not the only substrates of snoRNA-guided
modifications. Other substrates of snoRNA-guided methylation and/or
pseudouridylation include the small nuclear RNAs (snRNAs) U1, U2, U4, U5,
U6 and U12 in vertebrates (summarized in database [13]), U2 and U5 in
plants (database introduced in [16]), and U2 in yeast [17, 18]. In
addition, tRNAs in archaea are processed with the help of snoRNAs [19, 20],
which in this context have been termed sRNAs (sno-like RNAs).
Although hundreds of sequences that have the hallmarks of snoRNAs have
been cloned and sequenced, for many of them no apparent target could be
readily identified. For this reason, these snoRNAs
were called “orphan” snoRNAs [1, 21-27]. Finding their targets is of great
interest, particularly because the antisense boxes, which in prototypical
snoRNAs are involved in rRNA recognition, are conserved in many of these
snoRNAs, suggesting that they have a biological function.
It has recently been determined that one of these “orphan” snoRNAs,
HBII-52 (also called SNORD115), in fact targets an mRNA, namely the mRNA
encoding the serotonin receptor 2C [21]. The effect of the snoRNA-mRNA
interaction has been reported to be either alternative splicing [28] or editing
[29]. Together with the HBII-85 C/D box snoRNA (also called SNORD116),
which also occurs in multiple copies in the genome, HBII-52 is part of a region
of chromosome 15 that appears to be deleted in the Prader-Willi syndrome
(PWS) [21, 22, 28, 30]. Consistent with the neurological phenotype of this
genetic syndrome, the two families of snoRNAs have brain-specific
expression. Moreover, they are both expressed from the paternal
chromosome, whereas the corresponding allelic region in the maternal
chromosome is silenced due to genomic imprinting [31]. A follow-up study
reported that MBII-52 (SNORD115) is processed into smaller RNAs and
regulates alternative splicing (Kishore, S., Khanna, A., Zhang, Z., Hui, J.,
Balwierz, P., Stefan, M., Beach, C., Nicholls, R.D., Zavolan, M. and Stamm,
S. (2010) The snoRNA MBII-52 (SNORD 115) is processed into smaller RNAs
and regulates alternative splicing. Hum Mol Genet, in press).
The first predictions of snoRNA-rRNA interactions were part of a
reverse-engineering approach for snoRNA identification in the yeast genome that
started from known modified rRNA nucleotides [32, 33]. Following the study of
Cavaille and Bachellerie [34] it was generally assumed that rRNA-snoRNA
interactions require 10 to 21 nucleotides of complementarity (Watson-Crick or
G-U pairing, see chapter 1 Baralle T) between the antisense box of the
snoRNA and the rRNA target. The rRNA nucleotide that pairs with the 5th
snoRNA nucleotide upstream of the D (or D’) box undergoes methylation.
Following the discovery of “orphan” snoRNAs and the reports that they may
target mRNAs, some attempts have been made to predict such mRNA
targets. snoTARGET [35] is a tool that was developed for this purpose. The
criteria defining the putative target sites were taken from previous studies
that analyzed snoRNA/target interactions (rRNAs and snRNAs) [36, 37]. That
is, the duplex length was required to be between 9 and 20 nucleotides, and a
maximum of three G-U pairs and a single mismatch, located either at position
2 or at a position >11 in the antisense box, were allowed. The positions that
appear to tolerate mismatches were inferred in a study of Chen et al. [37], in
which more than 400 known rRNA and C/D snoRNA complementary
sequences were analyzed. The first position was defined to be the first
nucleotide after the D or D’ box and it did not participate in base-pairing
interactions. The application of snoTARGET allowed the authors to infer that
predicted target sites of HBII-85 snoRNAs are enriched near exons and
preferentially in alternatively spliced genes, supporting a proposed role of
these snoRNAs in alternative splicing.
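The duplex criteria listed above can be written as a simple filter. The sketch below is not the snoTARGET implementation; the function name and the input representation are hypothetical. It assumes an ungapped duplex given as two equal-length RNA strings, with the first guide nucleotide at antisense-box position 2 (position 1 does not pair), and it assumes that a mismatched position still counts toward the duplex length.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def passes_duplex_criteria(guide, target, first_position=2):
    """Check an ungapped snoRNA/target duplex against the criteria in the text.

    guide[i] is assumed to face target[i]; guide[0] sits at antisense-box
    position `first_position` (an assumption of this sketch).
    """
    if len(guide) != len(target) or not 9 <= len(guide) <= 20:
        return False
    wobbles = 0
    mismatch_positions = []
    for offset, pair in enumerate(zip(guide, target)):
        if pair in WATSON_CRICK:
            continue
        if pair in WOBBLE:
            wobbles += 1
        else:
            mismatch_positions.append(first_position + offset)
    # At most three G-U pairs and at most one mismatch are tolerated.
    if wobbles > 3 or len(mismatch_positions) > 1:
        return False
    # A mismatch is only tolerated at position 2 or beyond position 11.
    return all(p == 2 or p > 11 for p in mismatch_positions)
```

For example, a 10-nt perfectly paired duplex passes, whereas the same duplex with a mismatch at position 5 is rejected.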
2.2. MiRNAs
MiRNAs (microRNAs) are genome-encoded RNAs that are 21-23 nucleotides long.
They are transcribed by RNA polymerase II [38] either from independent genes or
as part of the introns of protein-coding transcripts. Within the primary
transcripts, the miRNAs are part of hairpin structures that are recognized and
processed by the Drosha/DGCR8 complex to release individual pre-miRNAs
[39-42], which are then transported by exportin 5 [43] to the cytoplasm. Here,
the Dicer/TRBP (TAR RNA-binding protein) complex releases from the
pre-miRNAs 21-23 nt-long duplexes, from which usually only one strand is
incorporated into an miRNA-induced silencing complex (miRISC). This
complex, in which the miRNA acts as a guide, additionally contains an
Argonaute protein, and binds to target mRNAs to induce translational
repression, deadenylation and degradation of the mRNA target [44, 45] (see
chapter 2 Meister). Structural studies revealed that the miRNA 5' end is
anchored in the MID (middle) domain of the Argonaute
protein, and that the 5' half of the miRNA is accessible and in a relatively rigid
conformation for target recognition. This explains numerous previous
observations of the 5' end of the miRNAs being most important for target
recognition [46-51].
Although the principles of miRNA-target recognition are only partially
understood, many methods for predicting targets have already been
proposed. The variables that are taken into consideration include
evolutionary conservation [50, 52-55], the sequence composition of the
environment of the putative target site, its location within the 3’UTR, the
base-pairing pattern in the 3' region of the miRNA [56], the structural accessibility of
the target site and the energy of interaction between miRNA and target [57,
58]. High-throughput measurements of mRNA [56, 59-62] or protein [61, 62]
levels upon miRNA transfection enable one to evaluate the performance of
prediction methods. Recent studies [63, 64] indicate that for miRNAs,
comparative genomics-based target prediction methods with very few
assumptions perform remarkably well in comparison to methods that aim to
capture mechanistic information. The aim of comparative genomic analyses is
to identify regions that are conserved in evolution, indicating that they are
under selective constraint. For instance, Xie et al. [65] and Lewis et al. [50]
predicted targets of miRNAs by identifying miRNA-complementary 3'UTR
segments that were conserved among a chosen set of species. To be able to
handle the continuously growing whole genome sequence data, various
authors proposed methods for quantifying the degree of conservation of
putative target sites [54, 55] or the probability that a putative target site has
been under selection [53].
3. Protocol
In order to predict targets of non-coding RNAs, be they snoRNAs or miRNAs,
we first need to define a model of interaction between the small RNA and the
target. Because only one example of a snoRNA-mRNA interaction is known
[28], in predicting mRNA targets of orphan snoRNAs we were guided
by principles of snoRNA-rRNA interaction. Cavaille and Bachellerie
[34] showed that snoRNA-rRNA duplexes are usually 10-21 bp long, with an
average of 12-13 bp. A more recent study that included 415 rRNA and C/D
snoRNA complementary sequences from animals, plants and yeasts showed
that the constraint on duplex length can be further relaxed to 7-24 bp [37].
Thermodynamic stability of the duplex appears to play an important role,
because short duplexes are always GC-rich, whereas long duplexes appear
to tolerate G-U wobble and non-canonical base pairs, particularly when the
duplex is GC rich [34].
We therefore defined a putative snoRNA target site as either a genomic
region that has perfect complementarity to at least 10 contiguous nucleotides
in the snoRNA antisense box, or a genomic region that can form a stable
hybrid (free energy of hybridization lower than -15 kcal/mol) with the antisense
box of the snoRNA. We performed pattern matching to identify regions with at
least 10 nucleotides of perfect complementarity, and we applied RNAhybrid [66]
to predict stable hybrids.
We imposed additional constraints on the hybrids, as suggested by the results
of Cavaille and Bachellerie [34]. Namely, we only allowed a maximum of two
bulged nucleotides in the snoRNA and/or the target sequence. The final step
of our snoRNA target prediction in human was to extract the orthologous
regions in rhesus macaque, mouse, cow and dog and to determine whether
they also contain putative snoRNA target sites as defined above. The final set
of predictions included only putative target sites that were conserved in all
these species. Because we were interested in the potential involvement of the
snoRNAs in alternative splicing, we intersected the set of predictions with the
loci of protein-coding genes.
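The pattern-matching part of this procedure (at least 10 contiguous nucleotides of perfect complementarity, followed by a conservation check in the orthologous regions) can be sketched as follows. This is only an illustration under simplifying assumptions: the function names are hypothetical, sequences are plain RNA strings, the orthologous regions are assumed to have been extracted already, and the free-energy-based hybrid prediction performed with RNAhybrid is not reproduced here.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna):
    """Return the reverse complement of an RNA string (Watson-Crick only)."""
    return rna.translate(COMPLEMENT)[::-1]


def find_complementary_sites(antisense_box, target, min_len=10):
    """Start positions in `target` of windows of `min_len` nucleotides that
    are perfectly complementary to some window of the antisense box."""
    probes = {
        reverse_complement(antisense_box[i:i + min_len])
        for i in range(len(antisense_box) - min_len + 1)
    }
    return [
        j for j in range(len(target) - min_len + 1)
        if target[j:j + min_len] in probes
    ]


def conserved_in_all(antisense_box, orthologous_regions, min_len=10):
    """True if every orthologous region (dict: species -> sequence)
    also contains at least one putative site."""
    return all(
        find_complementary_sites(antisense_box, region, min_len)
        for region in orthologous_regions.values()
    )
```

A site found in the reference species would be retained only if `conserved_in_all` returns true for the corresponding regions of the other species.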
For miRNAs we considered as models of interaction perfect complementarity
between the target and nucleotides 1-8, 1-7 or 2-8 of the miRNA. We
further expanded on the relatively simple model described above, developing
a Bayesian model to
quantify the selection pressure on sites with particular patterns of
conservation across species. This enables us to incorporate information from
any number of species located at arbitrary evolutionary distances from each
other, appropriately weighing conservation between species that are close in
evolutionary distance and between species that are farther apart. Given a
putative site in the species of interest, we are interested to know the
probability that the site has been under evolutionary selection. The genome
sequence data however, only enables us to determine whether the site has
been conserved in other species, and it is of course more likely that the site is
conserved in closely related species relative to more distantly related species.
We thus set out to estimate the probability P(s | c) that a site with an observed
pattern c of conservation across species has a pattern of selection s in these
species. A conservation pattern c is defined as a vector of 0’s and 1’s, with 1
meaning that the site is present in a given species and can form the same
base pairs with the miRNA as in the reference species, and 0 meaning that
either of these two conditions does not hold. Similarly, a selection pattern s is
defined as a vector of +’s and –’s, with a ‘+’ denoting that the site is under
selection on the branch of the evolutionary tree leading to the given species,
and ‘–’ meaning the opposite. The vector s having only ‘–’ entries
corresponds to the situation in which the site has not been under selection in
any of the considered species, whatever its conservation pattern is. If we
know in which species a site is under selection, we can infer the probabilities
for observing any of the possible conservation patterns. These are simply
given by
$$P(c \mid s) = \frac{P(c \mid bg)}{\sum_{c' \in C(s)} P(c' \mid bg)}.$$
That is, we identify all the conservation
patterns C(s) that are consistent with the chosen selection pattern s, and
among these we compute the relative probability of conservation pattern c.
By a conservation pattern that is consistent with a selection pattern we mean
that the site is conserved in all the species in which it is under selection, but in
the species in which it is not under selection it may or may not be conserved.

Conservation in species in which the site is not under selection is a chance
occurrence whose probability we can estimate as follows. Ideally we would
need 7- and 8-mer sequences that are evolving neutrally, i.e. are under no
particular selection. Because in reality we do not know what parts of the
transcripts are evolving neutrally, we instead estimate the probability of observing
each possible conservation pattern over all possible 7- and 8-mers, only a
small fraction of which correspond to miRNA-complementary 7- or 8-mers.
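The computation of P(c | s) from background pattern frequencies can be illustrated with a small sketch. The function names and the toy counts are hypothetical. Conservation patterns are represented as tuples of 0/1 and selection patterns as strings of '+' and '-', both defined here over the non-reference species only, which is an assumption of this sketch.

```python
from itertools import product


def background_probabilities(pattern_counts):
    """Turn counts of conservation patterns (over all k-mers) into P(c | bg)."""
    total = sum(pattern_counts.values())
    return {c: n / total for c, n in pattern_counts.items()}


def consistent_patterns(s):
    """All conservation patterns consistent with selection pattern s:
    conserved (1) wherever s is '+', free (0 or 1) wherever s is '-'."""
    options = [(1,) if x == "+" else (0, 1) for x in s]
    return list(product(*options))


def p_c_given_s(c, s, p_bg):
    """P(c | s): background probability of c, renormalized over the
    patterns consistent with s; zero if c is inconsistent with s."""
    allowed = consistent_patterns(s)
    if tuple(c) not in allowed:
        return 0.0
    norm = sum(p_bg.get(a, 0.0) for a in allowed)
    return p_bg.get(tuple(c), 0.0) / norm if norm > 0 else 0.0
```

With two species and toy counts {(0,0): 50, (0,1): 10, (1,0): 10, (1,1): 30}, the selection pattern "+-" leaves only (1,0) and (1,1) as consistent patterns, so P((1,1) | "+-") = 0.3 / (0.1 + 0.3) = 0.75.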
What we are interested in however is not the probability of a conservation
pattern given a selection pattern, but rather the probability of a selection
pattern given the observed conservation pattern. From Bayes’ theorem, we have that
$$P(s \mid c) = \frac{P(c \mid s)\,P(s)}{P(c)} = \frac{P(c \mid s)\,P(s)}{\sum_{s'} P(c \mid s')\,P(s')},$$
where P(s) is the prior probability of
selection pattern s. We estimate these priors by maximizing the likelihood of
the conservation patterns. That is, the likelihood of the conservation data is
given by
$$L = \prod_{c} P(c)^{n(c)},$$
where n(c) is the number of times conservation
pattern c has been observed in the data, and
$$P(c) = \sum_{s \in S} P(c \mid s)\,P(s),$$
the probability for conservation pattern c, is given in terms of a quantity which
can be computed as described above, P(c | s), and the prior probabilities
over selection patterns, P(s). The probabilities over selection patterns are the
parameters that we need to optimize. The number of these parameters grows
exponentially with the number of species, i.e. as $2^{g-1}$, where g is the number
of species that we take into consideration (including the reference species for
which we want to predict sites). It is clear that as the number of fully
sequenced genomes that we use in our inference grows, the number of
parameters that we would have to estimate would quickly become too large.
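For a small number of species, the priors P(s) can be fitted directly by maximizing the likelihood above. The sketch below uses an expectation-maximization (EM) update for mixture weights, which is one standard way to do this and is an assumption here; the text does not state which optimizer was used. All names and the toy numbers are hypothetical.

```python
def n_selection_patterns(g):
    """Number of selection patterns for g species (reference included)."""
    return 2 ** (g - 1)


def em_priors(counts, likelihood, n_iter=200):
    """Estimate the priors P(s) by maximizing prod_c P(c)^n(c).

    counts:     dict mapping conservation pattern c -> n(c)
    likelihood: dict mapping selection pattern s -> {c: P(c | s)}
    Returns a dict mapping s -> estimated P(s).
    """
    patterns = list(likelihood)
    prior = {s: 1.0 / len(patterns) for s in patterns}
    for _ in range(n_iter):
        expected = dict.fromkeys(patterns, 0.0)
        for c, n in counts.items():
            # P(c) = sum_s P(c | s) P(s) under the current priors
            p_c = sum(likelihood[s].get(c, 0.0) * prior[s] for s in patterns)
            if p_c == 0.0:
                continue
            for s in patterns:
                # expected number of observations of c generated by s
                expected[s] += n * likelihood[s].get(c, 0.0) * prior[s] / p_c
        total = sum(expected.values())
        prior = {s: expected[s] / total for s in patterns}
    return prior
```

In a toy case with a single non-reference species, P(c | '-') = {0: 0.8, 1: 0.2}, P(c | '+') = {1: 1.0} and counts n(0) = 720, n(1) = 280, the maximum-likelihood prior for '+' is 0.1, since 0.8 × (1 − 0.1) = 0.72. With many species the number of priors grows as `n_selection_patterns(g)`, which motivates the approximation used in the chapter.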
To circumvent this problem, we used the following approximation. We
computed the probability of a selection pattern as a product of the
probabilities of the selection patterns at every node in the evolutionary tree.
With the further assumption that sites can only be lost in evolution, at each
node in the tree we can have one of three situations: the site is under
selection only along the left branch, only along the right branch, or along
both branches originating at the node. In the branch linking the reference
species with the rest of the evolutionary tree we have an additional
probability $\rho$ that the site is under selection. Finally, we take into
consideration the pattern of conservation of the miRNAs, reasoning that if a
miRNA is absent in a species, regions that are complementary to the miRNA in
this species cannot be under selection. A complete discussion of this model is
presented in [53]. In the end, for each miRNA-complementary site in the
reference species we estimate the probability that the site is under selection
in any other of the considered species as
$$P(\mathrm{functional} \mid c) = 1 - P(s_{\emptyset} \mid c) = 1 - \frac{P(c \mid bg)\,(1 - \rho)}{\sum_{s \in S} P(c \mid s)\,P(s)}.$$
Here the vector $s_{\emptyset}$ denotes absence of selection in all the considered species.
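The final posterior can be illustrated numerically. The sketch below assumes that the likelihoods P(c | s) and the priors P(s) are already available as dictionaries (hypothetical names and toy values), and that the prior of the pattern with no selection in any species plays the role of one minus the probability of selection on the reference branch.

```python
def posterior_functional(c, likelihood, prior, null_pattern):
    """P(functional | c) = 1 - P(no selection anywhere | c).

    likelihood:   dict mapping selection pattern s -> {c: P(c | s)}
    prior:        dict mapping s -> P(s)
    null_pattern: key of the selection pattern with no selection in any species
    """
    # P(c) = sum_s P(c | s) P(s)
    p_c = sum(likelihood[s].get(c, 0.0) * prior[s] for s in prior)
    p_null = likelihood[null_pattern].get(c, 0.0) * prior[null_pattern]
    return 1.0 - p_null / p_c
```

With the toy values P(c | '-') = {0: 0.8, 1: 0.2}, P(c | '+') = {1: 1.0}, P('-') = 0.9 and P('+') = 0.1, a conserved site (c = 1) receives a posterior of 1 − 0.18 / 0.28 ≈ 0.36, whereas a non-conserved site (c = 0) receives a posterior of 0.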
4. Example of an experiment
With the methods described above we predicted targets for all miRNAs in
human, mouse, rat, fish, fruitfly and worm, and targets of the orphan snoRNA
HBII-52 in human mRNAs.
In the case of HBII-52, we obtained 222 predicted target sites located within
or at most 200 nucleotides away from a known human exon. Many of these
predictions were tested experimentally (Kishore, S.,
Khanna, A., Zhang, Z., Hui, J., Balwierz, P., Stefan, M., Beach, C., Nicholls,
R.D., Zavolan, M. and Stamm, S. (2010) The snoRNA MBII-52 (SNORD 115)
is processed into smaller RNAs and regulates alternative splicing. Hum Mol
Genet, in press) by the transfection of neuronal cells (Neuro2A) with MBII-52
(the mouse ortholog of HBII-52). RT-PCR of the isolated RNA revealed the
splicing pattern of each of the candidate genes. As a control, a mutant MBII-52
construct with a scrambled antisense box was used. Based on these
experiments, the Stamm group identified five additional targets whose
splicing pattern is affected by MBII-52 (Kishore et al., 2010). Returning to
the question of evidence of evolutionary selection on MBII-52-complementary
sites, we found that the real snoRNAs do not yield larger numbers of
conserved putative target sites than randomized sequences
with the same dinucleotide composition. This indicates that we have
not yet captured the relevant determinants of functional snoRNA-mRNA
interactions and that additional work will be needed to uncover these
determinants. Alternatively, the number of targets of orphan snoRNAs may be
too small to yield a statistical signal when compared to predicted targets of
randomized snoRNA sequences.
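The comparison between a real small RNA and its randomized variants can be sketched as follows. Note the simplification: the shuffle below preserves only the mononucleotide composition, whereas the analysis described in the text used randomized sequences with the same dinucleotide composition, which requires a dedicated shuffling algorithm. Function names are hypothetical, and the counts of conserved sites are assumed to come from the prediction pipeline.

```python
import random


def shuffled_variants(seq, n, seed=0):
    """Generate n random permutations of seq (same nucleotide composition)."""
    rng = random.Random(seed)
    variants = []
    for _ in range(n):
        letters = list(seq)
        rng.shuffle(letters)
        variants.append("".join(letters))
    return variants


def compare_to_random(real_count, random_counts):
    """Return the mean number of conserved sites over the randomized variants
    and the fraction of variants with at least as many sites as the real RNA."""
    mean_random = sum(random_counts) / len(random_counts)
    fraction = sum(r >= real_count for r in random_counts) / len(random_counts)
    return mean_random, fraction
```

A real small RNA whose count of conserved sites is not clearly above the mean of its randomized variants, as observed here for the snoRNAs, gives no statistical evidence for selection on the predicted sites.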
On the other hand, many miRNAs have hundreds of strongly conserved target
sites, many more than could be expected by chance. This indicates that the
miRNA 5’ end is indeed a very important determinant of target recognition and
that miRNAs are part of vast regulatory networks. Target predictions of all
human, mouse, rat, fish, fruitfly and worm miRNAs with the Bayesian method
described above can be found at www.mirz.unibas.ch. Although only 3’UTRs
were taken into consideration for miRNA target prediction, the method can be
equally well applied to other transcript or genomic regions (5’UTRs, CDS,
promoters, etc.), provided that the probabilities of “chance conservation”
patterns are estimated from the region of interest.
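The miRNA interaction models used for these predictions (perfect complementarity to nucleotides 1-8, 1-7 or 2-8 of the miRNA) amount to a simple string search. The sketch below is illustrative only; the function names are hypothetical and the sequences in the usage note are made up for the example.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")

# 0-based, end-exclusive slices for miRNA nucleotides 1-8, 1-7 and 2-8
SEED_MODELS = {"1-8": (0, 8), "1-7": (0, 7), "2-8": (1, 8)}


def reverse_complement(rna):
    """Return the reverse complement of an RNA string (Watson-Crick only)."""
    return rna.translate(COMPLEMENT)[::-1]


def seed_sites(mirna, utr):
    """Return sorted (start, model) pairs for every perfect match of the
    reverse complement of each seed segment in the UTR sequence."""
    hits = []
    for name, (start, end) in SEED_MODELS.items():
        site = reverse_complement(mirna[start:end])
        pos = utr.find(site)
        while pos != -1:
            hits.append((pos, name))
            pos = utr.find(site, pos + 1)
    return sorted(hits)
```

For a miRNA starting with UGAGGUAG, the 1-8 model corresponds to the target segment CUACCUCA; a UTR containing this segment is reported under all three models, with the 1-7 match starting one nucleotide further downstream.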
5. Troubleshooting
For many miRNAs the number of target sites appears to be in the range of
hundreds, and particularly among conserved sites, miRNA-complementary
sites clearly outnumber the sites of similar length that are not complementary
to miRNAs. Thus, we can be relatively confident in such predictions. This is,
however, not true for all miRNAs. Perhaps the most notorious case is the
lsy-6 miRNA in the worm. This miRNA is involved in the establishment of left-right
asymmetry in the ASE taste receptor sensory neurons, and it has only one
known target, the cog-1 (Connection of Gonad defective family member 1)
transcript [67]. Although the predicted lsy-6 sites in the cog-1 transcript are
completely conserved in worms, they are assigned very low posterior
probabilities because lsy-6-complementary sites cannot be distinguished from
other 8-mers that are conserved without being under selection. Thus, if a site
is assigned a low probability, it does not necessarily mean that it is not
functional. It can also mean that the corresponding small RNA has a low
number of sites in the transcriptome, with the number of conserved instances
being just as high as we would predict for randomized variants of the small
RNA. We would also obtain low posterior probabilities for the sites if the small
RNA-target interaction model were inaccurate. For these reasons, predicting
small RNA interaction sites using comparative genomics methods needs to
build on the knowledge about the relevant determinants of small RNA–
target interaction that is accumulated through experiments or other
computational analyses.
Figure legend
Figure 1. A. Secondary structure consensus of C/D snoRNAs. The classical
drawing of helices with canonical as well as other isosteric base pairs that
form the K-turn follows [15]. The C’ and D’ boxes that are not always
conserved (by contrast to the C and D boxes) are shown in lowercase letters
and dashed base pair symbols (forming preferentially a K-loop instead of a K-
turn). B. 2’-O-ribose methylation reaction catalyzed by C/D snoRNPs. C.
Secondary structure consensus of H/ACA snoRNAs. D. Pseudouridylation
reaction catalyzed by H/ACA snoRNPs: a uridine (U) base is isomerized into a
pseudouridine (Ψ).
References
1. Huttenhofer, A., Kiefmann, M., Meier-Ewert, S., O'Brien, J., Lehrach, H.,
Bachellerie, J. P., and Brosius, J. (2001). RNomics: an experimental
approach that identifies 201 candidates for novel, small, non-messenger
RNAs in mouse. EMBO J 20, 2943-2953.
2. Lau, N. C., Lim, L. P., Weinstein, E. G., and Bartel, D. P. (2001). An
abundant class of tiny RNAs with probable regulatory roles in
Caenorhabditis elegans. Science 294, 858-862.
3. Lee, R. C., and Ambros, V. (2001). An extensive class of small RNAs in
Caenorhabditis elegans. Science 294, 862-864.
4. Lagos-Quintana, M., Rauhut, R., Lendeckel, W., and Tuschl, T. (2001).
Identification of novel genes coding for small expressed RNAs. Science
294, 853-858.
5. Aravin, A., et al. (2006). A novel class of small RNAs bind to MILI protein
in mouse testes. Nature 442, 203-207.
6. Girard, A., Sachidanandam, R., Hannon, G. J., and Carmell, M. A.
(2006). A germline-specific class of small RNAs binds mammalian Piwi
proteins. Nature 442, 199-202.
7. Lau, N. C., Seto, A. G., Kim, J., Kuramochi-Miyagawa, S., Nakano, T.,
Bartel, D. P., and Kingston, R. E. (2006). Characterization of the piRNA
complex from rat testes. Science 313, 363-367.
8. Ender, C., Krek, A., Friedlander, M. R., Beitzinger, M., Weinmann, L.,
Chen, W., Pfeffer, S., Rajewsky, N., and Meister, G. (2008). A human
snoRNA with microRNA-like functions. Mol Cell 32, 519-528.
9. Kiss, T. (2002). Small nucleolar RNAs: an abundant group of noncoding
RNAs with diverse cellular functions. Cell 109, 145-148.
10. Bachellerie, J. P., Cavaille, J., and Huttenhofer, A. (2002). The
expanding snoRNA world. Biochimie 84, 775-790.
11. Matera, A. G., Terns, R. M., and Terns, M. P. (2007). Non-coding RNAs:
lessons from the small nuclear and small nucleolar RNAs. Nat Rev Mol
Cell Biol 8, 209-220.
12. Reichow, S. L., Hamma, T., Ferre-D'Amare, A. R., and Varani, G. (2007).
The structure and function of small nucleolar ribonucleoproteins. Nucleic
Acids Res 35, 1452-1464.
13. Lestrade, L., and Weber, M. J. (2006). snoRNA-LBME-db, a
comprehensive database of human H/ACA and C/D box snoRNAs.
Nucleic Acids Res 34, D158-62.
14. Dieci, G., Preti, M., and Montanini, B. (2009). Eukaryotic snoRNAs: a
paradigm for gene expression flexibility. Genomics 94, 83-88.
15. Leontis, N. B., and Westhof, E. (2001). Geometric nomenclature and
classification of RNA base pairs. RNA 7, 499-512.
16. Brown, J. W., Echeverria, M., Qu, L. H., Lowe, T. M., Bachellerie, J. P.,
Huttenhofer, A., Kastenmayer, J. P., Green, P. J., Shaw, P., and
Marshall, D. F. (2003). Plant snoRNA database. Nucleic Acids Res 31,
432-435.
17. Ma, X., Yang, C., Alexandrov, A., Grayhack, E. J., Behm-Ansmant, I.,
and Yu, Y. T. (2005). Pseudouridylation of yeast U2 snRNA is catalyzed
by either an RNA-guided or RNA-independent mechanism. EMBO J 24,
2403-2413.
18. Piekna-Przybylska, D., Decatur, W. A., and Fournier, M. J. (2007). New
bioinformatic tools for analysis of nucleotide modifications in eukaryotic
rRNA. RNA 13, 305-312.
19. Omer, A. D., Lowe, T. M., Russell, A. G., Ebhardt, H., Eddy, S. R., and
Dennis, P. P. (2000). Homologs of small nucleolar RNAs in Archaea.
Science 288, 517-522.
20. Clouet d'Orval, B., Bortolin, M. L., Gaspin, C., and Bachellerie, J. P.
(2001). Box C/D RNA guides for the ribose methylation of archaeal
tRNAs. The tRNATrp intron guides the formation of two ribose-methylated
nucleosides in the mature tRNATrp. Nucleic Acids Res 29,
4518-4529.
21. Cavaille, J., Buiting, K., Kiefmann, M., Lalande, M., Brannan, C. I.,
Horsthemke, B., Bachellerie, J. P., Brosius, J., and Huttenhofer, A.
(2000). Identification of brain-specific and imprinted small nucleolar RNA
genes exhibiting an unusual genomic organization. Proc Natl Acad Sci U
S A 97, 14311-14316.
22. Cavaille, J., Vitali, P., Basyuk, E., Huttenhofer, A., and Bachellerie, J. P.
(2001). A novel brain-specific box C/D small nucleolar RNA processed
from tandemly repeated introns of a noncoding RNA gene in rats. J Biol
Chem 276, 26374-26383.
23. Cavaille, J., Seitz, H., Paulsen, M., Ferguson-Smith, A. C., and
Bachellerie, J. P. (2002). Identification of tandemly-repeated C/D
snoRNA genes at the imprinted human 14q32 domain reminiscent of
those at the Prader-Willi/Angelman syndrome region. Hum Mol Genet
11, 1527-1538.
24. Vitali, P., Royo, H., Seitz, H., Bachellerie, J. P., Huttenhofer, A., and
Cavaille, J. (2003). Identification of 13 novel human modification guide
RNAs. Nucleic Acids Res 31, 6543-6551.
25. Kiss, A. M., Jady, B. E., Bertrand, E., and Kiss, T. (2004). Human box
H/ACA pseudouridylation guide RNA machinery. Mol Cell Biol 24, 5797-5807.
26. Fedorov, A., Stombaugh, J., Harr, M. W., Yu, S., Nasalean, L., and
Shepelev, V. (2005). Computer identification of snoRNA genes using a
Mammalian Orthologous Intron Database. Nucleic Acids Res 33, 4578-4583.
27. Yang, J. H., Zhang, X. C., Huang, Z. P., Zhou, H., Huang, M. B., Zhang,
S., Chen, Y. Q., and Qu, L. H. (2006). snoSeeker: an advanced
computational package for screening of guide and orphan snoRNA
genes in the human genome. Nucleic Acids Res 34, 5112-5123.
28. Kishore, S., and Stamm, S. (2006). The snoRNA HBII-52 regulates
alternative splicing of the serotonin receptor 2C. Science 311, 230-232.
29. Vitali, P., Basyuk, E., Le Meur, E., Bertrand, E., Muscatelli, F., Cavaille,
J., and Huttenhofer, A. (2005). ADAR2-mediated editing of RNA
substrates in the nucleolus is inhibited by C/D small nucleolar RNAs. J
Cell Biol 169, 745-753.
30. Sahoo, T., del Gaudio, D., German, J. R., Shinawi, M., Peters, S. U.,
Person, R. E., Garnica, A., Cheung, S. W., and Beaudet, A. L. (2008).
Prader-Willi phenotype caused by paternal deficiency for the HBII-85
C/D box small nucleolar RNA cluster. Nat Genet 40, 719-721.
31. Horsthemke, B., and Wagstaff, J. (2008). Mechanisms of imprinting of
the Prader-Willi/Angelman region. Am J Med Genet A 146A, 2041-2052.
32. Lowe, T. M., and Eddy, S. R. (1999). A computational screen for
methylation guide snoRNAs in yeast. Science 283, 1168-1171.
33. Wood, V., et al. (2002). The genome sequence of Schizosaccharomyces
pombe. Nature 415, 871-880.
34. Cavaille, J., and Bachellerie, J. P. (1998). SnoRNA-guided ribose
methylation of rRNA: structural features of the guide RNA duplex
influencing the extent of the reaction. Nucleic Acids Res 26, 1576-1587.
35. Bazeley, P. S., Shepelev, V., Talebizadeh, Z., Butler, M. G., Fedorova,
L., Filatov, V., and Fedorov, A. (2008). snoTARGET shows that human
orphan snoRNA targets locate close to alternative splice junctions. Gene
408, 172-179.
36. Huttenhofer, A., Cavaille, J., and Bachellerie, J. P. (2004). Experimental
RNomics: a global approach to identifying small nuclear RNAs and their
targets in different model organisms. Methods Mol Biol 265, 409-428.
37. Chen, C. L., Perasso, R., Qu, L. H., and Amar, L. (2007). Exploration of
pairing constraints identifies a 9 base-pair core within box C/D
snoRNA-rRNA duplexes. J Mol Biol 369, 771-783.
38. Lee, Y., Kim, M., Han, J., Yeom, K. H., Lee, S., Baek, S. H., and Kim, V.
N. (2004). MicroRNA genes are transcribed by RNA polymerase II.
EMBO J 23, 4051-4060.
39. Lee, Y., et al. (2003). The nuclear RNase III Drosha initiates microRNA
processing. Nature 425, 415-419.
40. Yeom, K. H., Lee, Y., Han, J., Suh, M. R., and Kim, V. N. (2006).
Characterization of DGCR8/Pasha, the essential cofactor for Drosha in
primary miRNA processing. Nucleic Acids Res 34, 4622-4629.
41. Han, J., Lee, Y., Yeom, K. H., Nam, J. W., Heo, I., Rhee, J. K., Sohn, S.
Y., Cho, Y., Zhang, B. T., and Kim, V. N. (2006). Molecular basis for the
recognition of primary microRNAs by the Drosha-DGCR8 complex. Cell
125, 887-901.
42. Han, J., Lee, Y., Yeom, K. H., Kim, Y. K., Jin, H., and Kim, V. N. (2004).
The Drosha-DGCR8 complex in primary microRNA processing. Genes
Dev 18, 3016-3027.
43. Lund, E., Guttinger, S., Calado, A., Dahlberg, J. E., and Kutay, U.
(2004). Nuclear export of microRNA precursors. Science 303, 95-98.
44. Eulalio, A., Huntzinger, E., Nishihara, T., Rehwinkel, J., Fauser, M., and
Izaurralde, E. (2009). Deadenylation is a widespread effect of miRNA
regulation. RNA 15, 21-32.
45. Fabian, M. R., et al. (2009). Mammalian miRNA RISC recruits CAF1 and
PABP to affect PABP-dependent deadenylation. Mol Cell 35, 868-880.
46. Lai, E. C. (2002). Micro RNAs are complementary to 3' UTR sequence
motifs that mediate negative post-transcriptional regulation. Nat Genet
30, 363-364.
47. Lewis, B. P., Shih, I. H., Jones-Rhoades, M. W., Bartel, D. P., and
Burge, C. B. (2003). Prediction of mammalian microRNA targets. Cell
115, 787-798.
48. Doench, J. G., and Sharp, P. A. (2004). Specificity of microRNA target
selection in translational repression. Genes Dev 18, 504-511.
49. Rajewsky, N., and Socci, N. D. (2004). Computational identification of
microRNA targets. Dev Biol 267, 529-535.
50. Lewis, B. P., Burge, C. B., and Bartel, D. P. (2005). Conserved seed
pairing, often flanked by adenosines, indicates that thousands of human
genes are microRNA targets. Cell 120, 15-20.
51. Brennecke, J., Stark, A., Russell, R. B., and Cohen, S. M. (2005).
Principles of microRNA-target recognition. PLoS Biol 3, e85.
52. Krek, A., et al. (2005). Combinatorial microRNA target predictions. Nat
Genet 37, 495-500.
53. Gaidatzis, D., van Nimwegen, E., Hausser, J., and Zavolan, M. (2007).
Inference of miRNA targets using evolutionary conservation and pathway
analysis. BMC Bioinformatics 8, 69.
54. Kheradpour, P., Stark, A., Roy, S., and Kellis, M. (2007). Reliable
prediction of regulator targets using 12 Drosophila genomes. Genome
Res 17, 1919-1931.
55. Friedman, R. C., Farh, K. K., Burge, C. B., and Bartel, D. P. (2009). Most
mammalian mRNAs are conserved targets of microRNAs. Genome Res
19, 92-105.
56. Grimson, A., Farh, K. K., Johnston, W. K., Garrett-Engele, P., Lim, L. P.,
and Bartel, D. P. (2007). MicroRNA targeting specificity in mammals:
determinants beyond seed pairing. Mol Cell 27, 91-105.
57. Long, D., Lee, R., Williams, P., Chan, C. Y., Ambros, V., and Ding, Y.
(2007). Potent effect of target structure on microRNA function. Nat Struct
Mol Biol 14, 287-294.
58. Kertesz, M., Iovino, N., Unnerstall, U., Gaul, U., and Segal, E. (2007).
The role of site accessibility in microRNA target recognition. Nat Genet
39, 1278-1284.
59. Lim, L. P., Lau, N. C., Garrett-Engele, P., Grimson, A., Schelter, J. M.,
Castle, J., Bartel, D. P., Linsley, P. S., and Johnson, J. M. (2005).
Microarray analysis shows that some microRNAs downregulate large
numbers of target mRNAs. Nature 433, 769-773.
60. Linsley, P. S., et al. (2007). Transcripts targeted by the microRNA-16
family cooperatively regulate cell cycle progression. Mol Cell Biol 27,
2240-2252.
61. Selbach, M., Schwanhausser, B., Thierfelder, N., Fang, Z., Khanin, R.,
and Rajewsky, N. (2008). Widespread changes in protein synthesis
induced by microRNAs. Nature 455, 58-63.
62. Baek, D., Villen, J., Shin, C., Camargo, F. D., Gygi, S. P., and Bartel, D.
P. (2008). The impact of microRNAs on protein output. Nature 455,
64-71.
63. Hausser, J., Landthaler, M., Jaskiewicz, L., Gaidatzis, D., and Zavolan,
M. (2009). Relative contribution of sequence and structure features to
the mRNA binding of Argonaute/EIF2C-miRNA complexes and the
degradation of miRNA targets. Genome Res
64. Bartel, D. P. (2009). MicroRNAs: target recognition and regulatory
functions. Cell 136, 215-233.
65. Xie, X., Lu, J., Kulbokas, E. J., Golub, T. R., Mootha, V., Lindblad-Toh,
K., Lander, E. S., and Kellis, M. (2005). Systematic discovery of
regulatory motifs in human promoters and 3' UTRs by comparison of
several mammals. Nature 434, 338-345.
66. Rehmsmeier, M., Steffen, P., Hochsmann, M., and Giegerich, R. (2004).
Fast and effective prediction of microRNA/target duplexes. RNA 10,
1507-1517.
67. Johnston, R. J., and Hobert, O. (2003). A microRNA controlling left/right
neuronal asymmetry in Caenorhabditis elegans. Nature 426, 845-849.
Bayesian model
Bayes’ theorem
prior probability
posterior probabilities
Abbreviations
cog-1     Connection of Gonad defective family member 1
K-turn    Kink-turn
miRNA     microRNA
mRNA      messenger RNA
nt        nucleotides
PWS       Prader-Willi syndrome
Ψ         Pseudouridine
RNase     Ribonuclease
RNP       Ribonucleoprotein
rRNA      ribosomal RNA
scaRNA    small Cajal body-specific RNA
snoRNA    small nucleolar RNA
snoRNP    small nucleolar ribonucleoprotein particle
snRNA     small nuclear RNA
sRNA      sno-like RNA
tRNA      transfer RNA
U         Uridine