Aadel Chaudhuri
Long Term Project
Prediction of let-7 Target Sites and Analyses of Possible
lin-4 Target Sites in Caenorhabditis elegans
Introduction
MicroRNAs (miRNAs) are a large class of naturally occurring small RNA
sequences found in Caenorhabditis elegans, Drosophila, humans, rodents, plants,
and possibly other organisms [3, 5]. These sequences seem to function by regulating
gene expression [2, 3]. Much of this regulation, as shown in C. elegans, occurs during
development. Specifically, miRNAs such as let-7 and lin-4 appear to play a decisive role
in the timing of the worm's early development [2].
Exactly how this regulation proceeds is still largely unknown. Accumulating data
indicate that miRNAs might form silencing complexes very similar to those formed by
small interfering RNAs (siRNAs) [1]. However, evidence also indicates that while
siRNAs bind with perfect complementarity to target sequences, miRNAs, with the
exception of some plant miRNAs, do not [1, 2, 5, 6]. As a result, locating miRNA target
sequences has been a difficult task for biologists and bioinformaticians alike [5].
Here we propose a bioinformatics method for identifying miRNA binding sites
using known patterns of binding. Recent research shows that miRNAs seem to bind
targets with stricter complementarity at their 5' and 3' ends, and with less
stringent complementarity, and possible looping, in the middle of the miRNA sequence [1, 4].
Additionally, known miRNAs such as let-7 and lin-4 bind preferentially to the 3'
UTR of their target mRNAs [1, 2, 4, 10]. MicroRNAs also appear to regulate
mRNAs that carry numerous binding sites [6]. Using this information, we have constructed a
bioinformatics method designed to find potential miRNA binding sites. We have
tested this method on C. elegans, for which the roles of several miRNAs and some
binding site sequences are already known [2, 3, 7, 15].
Methods
Development of Search Strategy
The let-7:lin-41 interaction, which has been determined experimentally, was
analyzed for a pattern motif, and a loose pattern resembling this interaction was
developed (figure 1) [7]. The pattern did not require complementarity at either the 5'
terminus or the last two 3' bases of the miRNA. It did require perfect complementarity
between the mRNA target and the six miRNA bases at the 3' end (not counting the two
terminal bases). At the 5' end of the miRNA, it required complementarity of four bases
within a moving window with three possible positions, allowing a bulge of one
nucleotide. The middle of the site was allowed to contain any sequence whose length
fell between five less than and five greater than the length of the middle of the miRNA
sequence. Guanine:uridine pairing was allowed at any position [4].
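As an illustration of how such a pattern can be expressed, the following Python sketch
builds a regular expression from a miRNA's 5' and 3' segments, with G:U wobble pairing
permitted and a central loop of variable length. It is a simplified stand-in and not the
Patscan pattern file used here: it omits the moving window and the bulge, and the
function names and the toy segments in the usage note are hypothetical.

```python
import re

# Target bases that can pair with each miRNA base.
# Watson-Crick partners plus G:U wobble partners.
PARTNERS = {"A": "U", "C": "G", "G": "[CU]", "U": "[AG]"}


def segment_pattern(mirna_segment):
    """Regex for the target bases (5'->3') that pair with a miRNA segment (5'->3').

    Pairing is antiparallel, so the miRNA segment is read in reverse.
    """
    return "".join(PARTNERS[base] for base in reversed(mirna_segment))


def loop_pattern(middle_length, slack=5):
    """Regex for an unconstrained loop within +/- slack nt of the miRNA middle."""
    shortest = max(0, middle_length - slack)
    longest = middle_length + slack
    return "[ACGU]{%d,%d}" % (shortest, longest)


def site_pattern(five_prime, middle_length, three_prime):
    """Compile a target-site regex, read 5'->3' along the mRNA.

    The target first pairs with the miRNA 3' segment, then loops,
    then pairs with the miRNA 5' segment.
    """
    return re.compile(
        segment_pattern(three_prime)
        + loop_pattern(middle_length)
        + segment_pattern(five_prime)
    )
```

With the toy segments site_pattern("GAGG", 3, "AUAG"), the target CUAUAAACCUC matches
through Watson-Crick pairs alone, while UUGUUUUU matches only because G:U pairs are
accepted.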
Using these specifications, a pattern file for the Patscan pattern-matching program
was written [8, 9]. A non-redundant invertebrate UTR database containing known and
predicted 3' UTR sequences was downloaded [10, 11]. Patscan was run on this database
using the let-7 pattern. The results were then sorted by number of hits per UTR FASTA
entry, with the entries having the greatest number of hits placed at the top of the
list. UTR FASTA entries containing only one potential binding site were discarded.
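The sorting and filtering step can be sketched in Python as follows. The input format is
an assumption made for illustration (a list of (UTR identifier, start, end) tuples); it
is not Patscan's actual output format, and the function name is hypothetical.

```python
from collections import Counter


def rank_utrs_by_hits(hits, min_hits=2):
    """Count hits per UTR entry, drop entries below min_hits, sort by hit count.

    hits: iterable of (utr_id, start, end) tuples, one per candidate site.
    Returns a list of (utr_id, hit_count), most hits first; ties are
    broken alphabetically by identifier so the order is deterministic.
    """
    counts = Counter(utr_id for utr_id, _start, _end in hits)
    kept = [(utr_id, n) for utr_id, n in counts.items() if n >= min_hits]
    return sorted(kept, key=lambda item: (-item[1], item[0]))
```

The default min_hits=2 mirrors the rule that entries with a single potential binding
site are discarded.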
[Figure 1 schematic: the let-7 sequence aligned with its reverse complement, then
converted into a search pattern with an exact-match region, a moving window (allowing
1 bulge), and a central loop (approximately ±5 nt).]
Figure 1: Method for making a let-7 search pattern
The same approach was taken to predict binding sites for the lin-4 miRNA. The
lin-4:lin-41 interaction has not been determined experimentally, but has been predicted
computationally [7]. Here the pattern differed from that of let-7. The 3' and 5' terminal
bases were not considered in the pattern. Excluding these, perfect complementarity
was required for the first four bases at the 5' terminus, and complementarity was also
required for the last seven bases at the 3' terminus, with one bulge of one nucleotide
allowed. The middle of the site was allowed to contain any sequence whose length fell
between five less than and five greater than the length of the middle of the miRNA
sequence. Unlike for let-7, guanine:uridine pairing was not allowed.
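One way to express "complementarity with one bulge of one nucleotide allowed" is to
enumerate the perfect match together with every variant carrying a single extra
nucleotide. The Python sketch below does this with Watson-Crick pairing only. It assumes
the bulged nucleotide lies on the target strand, which the pattern description does not
specify; the function names and the toy segment in the usage note are hypothetical.

```python
import re

# Strict Watson-Crick partners; no G:U wobble pairs.
WATSON_CRICK = {"A": "U", "C": "G", "G": "C", "U": "A"}


def paired_target(mirna_segment):
    """Target bases (5'->3') perfectly complementary to a miRNA segment (5'->3')."""
    return "".join(WATSON_CRICK[base] for base in reversed(mirna_segment))


def bulge_pattern(mirna_segment):
    """Regex matching the perfect complement or one with a single inserted base.

    Assumption: the bulge is one unpaired nucleotide on the target strand,
    placed between any two adjacent paired positions.
    """
    target = paired_target(mirna_segment)
    options = [target]
    for i in range(1, len(target)):
        options.append(target[:i] + "[ACGU]" + target[i:])
    return re.compile("(?:" + "|".join(options) + ")")
```

For the toy segment GAGU, the pattern accepts ACUC (perfect match) and ACGUC (one
bulged G) but rejects ACGGUC, which would need a two-nucleotide bulge.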
Testing Randomized miRNAs
The search approach described above was also applied to randomized let-7 and lin-4
sequences, which served as controls. A common randomizing algorithm, the Fisher-Yates
shuffle, was used to generate ten randomized sequences for let-7 and five randomized
sequences for lin-4 [16]. Potential binding sites were ranked as described above.
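For reference, a minimal Python sketch of the Fisher-Yates shuffle applied to a
nucleotide string is given below. It is not the script used in this work (reference 16
is a Perl text); the function name and the optional rng parameter are choices made
here for illustration.

```python
import random


def fisher_yates_shuffle(sequence, rng=None):
    """Return a randomly permuted copy of a sequence string.

    Walks from the last position to the second, swapping each position
    with a uniformly chosen position at or before it. Base composition
    is preserved; only the order changes.
    """
    if rng is None:
        rng = random.Random()
    bases = list(sequence)
    for i in range(len(bases) - 1, 0, -1):
        j = rng.randint(0, i)  # inclusive on both ends
        bases[i], bases[j] = bases[j], bases[i]
    return "".join(bases)
```

Passing a seeded generator, for example random.Random(1), makes a set of control
sequences reproducible.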
Measuring Thermodynamics of Binding
Sorted search results for all randomized and actual sequences were analyzed for
binding strength by running the miRNA and potential target sequences through mfold, an
RNA folding program that also estimates the thermodynamics of binding [12, 13].
Binding sites for let-7 were ranked according to the free energy of binding.
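A ranking by free energy might be sketched as below. The sketch assumes the free
energies have already been extracted from the folding output into (UTR identifier, dG)
pairs, and it sums the values per UTR entry, an aggregation suggested by the "Sum of dG"
axis label in figure 4 rather than stated in the text. The function name is hypothetical.

```python
from collections import defaultdict


def rank_by_total_free_energy(site_energies):
    """Sum dG over the sites of each UTR entry and sort, most negative first.

    site_energies: iterable of (utr_id, dg) pairs, one per candidate site.
    A more negative total indicates stronger predicted binding.
    """
    totals = defaultdict(float)
    for utr_id, dg in site_energies:
        totals[utr_id] += dg
    return sorted(totals.items(), key=lambda item: item[1])
```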
Final Ranking of Target Sites
Top-ranked let-7 binding sites from both ranked lists (thermodynamics and hits
per UTR) were compared. Only those that appeared in both lists were chosen as
predicted let-7 targets.
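The comparison of the two ranked lists amounts to an intersection of their top entries,
as in the Python sketch below. The cutoff top_n is an assumption, since the number of
top-ranked entries taken from each list is not stated; the function name is hypothetical.

```python
def consensus_targets(by_hits, by_energy, top_n):
    """Return UTR identifiers present in the top_n entries of both rankings.

    by_hits, by_energy: lists of UTR identifiers, best first.
    The result keeps the order of the hits-per-UTR ranking.
    """
    top_energy = set(by_energy[:top_n])
    return [utr_id for utr_id in by_hits[:top_n] if utr_id in top_energy]
```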
Functional Analysis of let-7 Binding Sites
The functions of the protein products corresponding to the predicted let-7 binding
sites were determined using the UTR database. These sites were compared with known
experimentally determined target sites.
Results
Determination of let-7 Target Sites
The let-7 search of the invertebrate 3' UTR database yielded 67 predicted targets
out of a total of 26,664 UTR FASTA entries. Of these, 31 have known
developmental roles, while 36 have other or unknown roles. Of the targets with
developmental roles, six are known to be heterochronic, three are involved in neuronal
remodelling, five have vulval functions, and five are nuclear hormone receptors. The
functions of the top 20 targets are all developmental, with the exception of one whose
function is unknown (table 1) [14].
Table 1: Predicted let-7 Target Genes in C. elegans
Top 20 of 67 predicted target genes (in ranking order)

Rank  Target gene                   Phenotype
1     DAF-12                        dauer formation
2     Hunchback-related protein     heterochronic
3     Majuk protein (DLG-1)         neurogenesis
4     Histone H1.1                  differentiation
5     Lin-41B                       heterochronic
6     Nuclear receptor NHR-51       development
7     Kinesin-like protein          neurogenesis
8     Nuclear receptor NHR-66       development
9     hnRNP protein                 ??
10    Gene 5N224                    novel
11    Eat-20                        notch-homolog
12    LET-413                       neurogenesis
13    Lin-14                        heterochronic
14    Nuclear receptor yk452        development
15    Nuclear receptor yk487        development
16    DAF-16                        dauer formation
17    Tyrosine phosphatase          development
18    Lin-14 (exon4-13)             heterochronic
19    Notch-like membrane receptor  development
20    EFL-2                         development

Repeats* (in ranking order): 5, 4, 4, 3, 2, 3, 2, 2, 2, 3, 2, 3, 2, 2, 2, 2, 2, 2, 2
*Number of predicted let-7 binding sites within target gene
The predicted binding interactions between let-7 and the UTR sites of its two highest-ranked targets (DAF-12 and Hunchback) are shown in figure 2.
[Figure 2 panels: predicted duplexes between let-7 and DAF-12 sites 1 to 5 and
Hunchback sites 1 to 4.]
Figure 2: Predicted let-7 Binding Interactions of DAF-12 and
Hunchback Gene 3' UTR Sequences*
*Interactions were predicted by mfold [13].
Thermodynamic analysis of let-7 binding to all possible target sites revealed
free energies of binding ranging from -58 kJ/mol to -15 kJ/mol (figure 3) [12, 13].
Free energies of binding for the ten randomized sequences fell within 25
kJ/mol of the wild-type interaction, with seven randomized sequences binding (on
average) with lower free energy to their candidate sites. This indicates that randomized
let-7 sequences may bind their respective candidate sites as strongly as wild-type
let-7 does.
Figure 3: Free energy of binding values for wild type let-7 and ten randomized
let-7 sequences. Dark black line in graph indicates wild type sequence. Colored
lines indicate randomized sequences. The ten randomized sequences are
shown above with conserved regions (between them) highlighted.*
*Conservation determined with Clustal Alignment [17]
Analyses of lin-4 Binding Interactions
Thermodynamic analysis of binding between lin-4 and its target sequences revealed
free energies between -105 kJ/mol and -18 kJ/mol (figure 4). The curve relating free
energy to ranked UTR entry is considerably steeper for lin-4 than for any of the
randomized sequences, and the 40 strongest lin-4 target sites have binding free
energies that are on average 35 kJ/mol lower than those of the scrambled sequences.
[Figure 4 plot: x-axis, candidate lin-4 target sequence (0 to 500); y-axis, sum of dG
(0 to -120); series: lin-4 and random1 to random5.]
Figure 4: Free energy of binding values for wild type lin-4 and five randomized
lin-4 sequences. Dark black line in graph indicates wild type sequence. Colored
lines indicate randomized sequences. The five randomized sequences are
shown above with conserved regions (between them) highlighted.*
*Conservation determined with Clustal Alignment [17]
The top-ranked lin-4 target was the transcript of lin-14, a heterochronic gene
with developmental roles. The lin-4 binding sites in the lin-14 UTR are shown in
figure 5 [7].
Figure 5: lin-4 is complementary to 7 sites in the lin-14 3' UTR.*
*Figure courtesy of Banerjee and Slack [7].
References
1. McManus, M. & Sharp, P.A. Gene silencing in mammals by short interfering RNAs.
Nature Reviews Genetics 3, 737-47 (2002).
2. Lau, N., Lim, L., Weinstein, E. & Bartel, D. An Abundant Class of Tiny RNAs with
Probable Regulatory Roles in Caenorhabditis elegans. Science 294, 858-62 (2001).
3. Lee, R. & Ambros, V. An Extensive Class of Small RNAs in Caenorhabditis elegans.
Science 294, 862-4 (2001).
4. Lai, E.C. Micro RNAs are complementary to 3' UTR sequence motifs
that mediate negative post-transcriptional regulation. Nature Genetics 30, 363-4 (2002).
5. Rhoades, M., Reinhart, B., Lim, L., Burge, C., Bartel, B. & Bartel, D. Prediction of
Plant MicroRNA Targets. Cell DOI: 10.1016/S0092867402008632.
6. Reinhart, B., Weinstein, E., Rhoades, M., Bartel, B. & Bartel, D. MicroRNAs in
plants. Genes and Development 16, 1616-26 (2002).
7. Banerjee, D. & Slack, F. Control of developmental timing by small temporal RNAs: A
paradigm of RNA-mediated regulation of gene expression. Bioessays 24, 119-129 (2002).
8. Ross Overbeek's Patscan (http://wwwunix.mcs.anl.gov/compbio/PatScan/HTML/patscan.html).
9. Dsouza, M., Larsen, N. & Overbeek, R. Searching for patterns in genomic data.
Trends Genet 12, 497-8 (1997).
10. Pesole, G., Liuni, S., Grillo, G., Licciulli, F., Mignone, F., Gissi, C. & Saccone, C.
UTRdb and UTRsite: specialized databases of sequences and functional elements of 5'
and 3' untranslated regions of eukaryotic mRNAs. Update 2002. Nucleic Acids Research
30, 335-40 (2002).
11. UTR database (http://bighost.area.ba.cnr.it/BIG/UTRHome/).
12. Jacobson, A., Kumar, H. & Zuker, M. Effect of Spermidine on the Conformation of
Bacteriophage MS2 RNA: Electron Microscopy and Computer Modeling.
J. Mol. Biology 181, 517-531 (1985).
13. Michael Zuker's mfold (12) Web Server.
(http://www.bioinfo.rpi.edu/applications/mfold/).
14. Grishok, A., Pasquinelli, A., Conte, D., Li, N., Parrish, S., Ha, I., Baillie, D., Fire, A.,
Ruvkun, G. & Mello, C. Genes and mechanisms related to RNA interference regulate
expression of the small temporal RNAs that control C. elegans developmental timing.
Cell 106, 23-34 (2001).
15. Pasquinelli, A., Reinhart, B., Slack, F., Martindale, M., Kuroda, M., Maller, B.,
Hayward, D., Ball, E., Degnan, B., Muller, P., Spring, J., Srinivasan, A., Fishman, M.,
Finnerty, J., Corbo, J., Levine, M., Leahy, P., Davidson, E. & Ruvkun, G. Conservation
of the sequence and temporal expression of let-7 heterochronic regulatory RNA. Nature
408, 86-9 (2000).
16. Christiansen, T. & Torkington, N. Perl Cookbook. Cambridge: O'Reilly,
1998.
17. Clustal Alignment Tool.