Aadel Chaudhuri
Long Term Project

Prediction of let-7 Target Sites and Analyses of Possible lin-4 Target Sites in Caenorhabditis elegans

Introduction

MicroRNAs (miRNAs) are a large class of naturally occurring small RNA sequences found in Caenorhabditis elegans, Drosophila, humans, rodents, plants, and possibly other organisms [3, 5]. These sequences appear to function by regulating gene expression [2, 3]. Much of this regulation, as shown in C. elegans, occurs during development. Specifically, miRNAs such as let-7 and lin-4 appear to play a decisive role in the timing of the worm's early development [2]. Exactly how this regulation proceeds is still largely unknown. Accumulating data indicate that miRNAs might form silencing complexes very similar to those formed by small interfering RNAs (siRNAs) [1]. However, evidence also indicates that while siRNAs bind with perfect complementarity to target sequences, miRNAs, with the exception of some plant miRNAs, do not [1, 2, 5, 6]. As a result, locating miRNA target sequences has been a difficult task for biologists and bioinformaticians alike [5].

Here we propose a bioinformatics method for determining miRNA binding sites using known patterns of binding. Recent research indicates that miRNAs bind their targets with stricter complementarity at their 5' and 3' ends and with less stringent complementarity, possibly including looping, in the middle of the miRNA sequence [1, 4]. Additionally, known miRNAs such as let-7 and lin-4 preferentially bind to the 3' UTR of their target mRNAs [1, 2, 4, 10]. MicroRNAs also appear to regulate mRNAs that carry numerous binding sites [6]. Using this information, we have constructed a bioinformatics method designed to identify potential miRNA binding sites. We have tested this method in C. elegans, for which the roles of several miRNAs and some binding site sequences are already known [2, 3, 7, 15].
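
As a rough illustration of this idea (and not the Patscan pattern used in this work), the sketch below builds a regular expression that demands exact complementarity at both ends of a miRNA, tolerates a middle region of variable length, and counts how many such sites a UTR contains. The let-7 sequence is the commonly reported 21-nt form and should be treated as an assumption here. The function names, the anchor lengths (`n5`, `n3`), the `slack` parameter, and the toy UTR are all hypothetical, and G:U pairing and bulges are ignored for simplicity.

```python
import re

# Assumption: commonly reported 21-nt let-7 sequence, written 5'->3'.
LET7 = "UGAGGUAGUAGGUUGUAUAGU"

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the perfectly complementary strand, written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def end_anchored_pattern(mirna, n5=7, n3=6, slack=5):
    """Build a regex for target sites (5'->3' on the mRNA).

    n5, n3 : number of bases at the miRNA 5' and 3' ends that must pair
             exactly (hypothetical values, chosen only for illustration).
    slack  : how much shorter or longer than the miRNA middle the
             unconstrained target middle may be.
    """
    site = reverse_complement(mirna)
    left = site[:n3]      # pairs with the miRNA 3' end
    right = site[-n5:]    # pairs with the miRNA 5' end
    middle = len(mirna) - n5 - n3
    lo, hi = max(0, middle - slack), middle + slack
    # The lookahead lets overlapping candidate sites be reported.
    return re.compile(f"(?=({left}[ACGU]{{{lo},{hi}}}{right}))")


def count_sites(pattern, utr):
    """Count start positions in the UTR at which the pattern matches."""
    return len(list(pattern.finditer(utr)))


if __name__ == "__main__":
    pattern = end_anchored_pattern(LET7)
    perfect = reverse_complement(LET7)
    # Toy UTR: one perfect site plus one site with a short, looped middle.
    toy_utr = perfect + "AAAA" + perfect[:6] + "GGGGG" + perfect[-7:]
    print(count_sites(pattern, toy_utr))
```

Counting sites per UTR in this way mirrors the observation that regulated mRNAs tend to carry several binding sites; a real search would also need to model wobble pairs and bulges.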
Methods

Development of Search Strategy

The let-7:lin-41 interaction, which has been determined experimentally, was analyzed for a pattern motif. A loose pattern resembling this interaction was developed (figure 1) [7]. The pattern did not require complementarity of either the 5'-terminal or the last two 3' miRNA bases. It did require perfect complementarity between the mRNA target and six bases at the 3' end of the miRNA (not including the two terminal bases). At the 5' end of the miRNA, it required four complementary bases within a moving window of three possible positions, allowing a bulge of one nucleotide. The middle region was allowed to be any sequence with a length between five nucleotides shorter and five nucleotides longer than the middle of the miRNA sequence. Guanine:uracil (G:U) pairing was allowed at any position [4].

Using these specifications, a pattern file compatible with the Patscan pattern-search program was written [8, 9]. A non-redundant invertebrate UTR database containing known and predicted 3' UTR sequences was downloaded [10, 11]. Patscan was run on this database using the let-7 pattern specifications. The results were then sorted by number of hits per UTR fasta entry, with the entries containing the most hits placed at the top of the list. UTR fasta entries containing only one potential binding site were discarded.

    Reverse complement

        3'- ACUAUACAACGUACUACCUCA -5'   let-7
            |||||||||||||||||||||
        5'- UGAUAUGUUGCAUGAUGGAGU -3'

    Make a pattern

               moving window (allow 1 bulge)    exact match*
        let-7  3'- ACUAUACAACGUACUACCUCA -5'
                   ||||||||||||||||||||
               5'- UGAUAUGUUGCAUGAUGGAGU -3'
                   allow for a loop (~ ±5 nt)

Figure 1: Method for making a let-7 search pattern

The same approach was taken to predict binding sites for the lin-4 miRNA. The lin-4:lin-41 interaction has not been determined experimentally, but has been predicted computationally [7]. Here the pattern differed from that of let-7.
The 3' and 5' terminal bases were not considered in the pattern. Excluding these, perfect complementarity was required for the first four bases at the 5' terminus. Complementarity was also required for the last seven bases at the 3' terminus, with one bulge of one nucleotide allowed. The middle region was allowed to be any sequence with a length between five nucleotides shorter and five nucleotides longer than the middle of the miRNA sequence. Unlike for let-7, G:U pairing was not allowed.

Testing Randomized miRNAs

The search approach described above was also applied to randomized let-7 and lin-4 sequences, which served as controls. A common randomizing algorithm, the Fisher-Yates shuffle, was used to generate ten randomized sequences for let-7 and five randomized sequences for lin-4 [16]. Potential binding sites were ranked as described above.

Measuring Thermodynamics of Binding

Sorted search results for all randomized and actual sequences were analyzed for binding strength by running each miRNA and its potential target sequences through mfold, an RNA folding program that also estimates the thermodynamics of binding [12-13]. Binding sites for let-7 were ranked according to the free energy of binding.

Final Ranking of Target Sites

Top-ranked let-7 binding sites from both ranked lists (thermodynamics and hits per UTR) were compared. Only those that appeared in both lists were chosen as predicted let-7 targets.

Functional Analysis of let-7 Binding Sites

The functions of the protein products corresponding to the predicted let-7 binding sites were determined using the UTR database. These sites were compared with known, experimentally determined target sites.

Results

Determination of let-7 Target Sites

The let-7 search on the invertebrate 3' UTR database yielded sixty-seven predicted targets out of a total of 26,664 UTR fasta entries. Of these, 31 have known developmental roles, while 36 have other or unknown roles.
Of the targets with developmental roles, six are known to be heterochronic, three are involved in neuronal remodeling, five have vulval functions, and five are nuclear hormone receptors. The functions of the top 20 targets are all developmental, with the exception of one whose function is unknown (table 1) [14].

Table 1: Predicted let-7 Target Genes in C. elegans
Top 20 of 67 predicted target genes (in ranking order)

    Rank  Target gene                     Phenotype
    1     DAF-12                          dauer formation
    2     Hunchback-related protein       heterochronic
    3     Majuk protein (DLG-1)           neurogenesis
    4     Histone H1.1                    differentiation
    5     Lin-41B                         heterochronic
    6     Nuclear receptor NHR-51         development
    7     Kinesin-like protein            neurogenesis
    8     Nuclear receptor NHR-66         development
    9     hnRNP protein                   ??
    10    Gene 5N224                      novel
    11    Eat-20                          notch-homolog
    12    LET-413                         neurogenesis
    13    Lin-14                          heterochronic
    14    Nuclear receptor yk452          development
    15    Nuclear receptor yk487          development
    16    DAF-16                          dauer formation
    17    Tyrosine phosphatase            development
    18    Lin-14 (exon4-13)               heterochronic
    19    Notch-like membrane receptor    development
    20    EFL-2                           development

    repeats* (in the order listed): 5, 4, 4, 3, 2, 3, 2, 2, 2, 3, 2, 3, 2, 2, 2, 2, 2, 2, 2

*Number of predicted let-7 binding sites within target gene

The binding interactions between let-7 and its two highest-ranked target UTRs (DAF-12 and Hunchback) are shown in figure 2.
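
The repeats column reflects the hits-per-UTR count described in Methods. A minimal sketch of that counting and filtering step is given below, assuming the search output has already been reduced to one UTR identifier per predicted site. The function name `rank_utrs_by_hits`, the `min_hits` parameter, the alphabetical tie-break, and the singleton entry are hypothetical; only the counts of five and four sites for DAF-12 and Hunchback come from Table 1.

```python
from collections import Counter


def rank_utrs_by_hits(hits, min_hits=2):
    """Rank UTR entries by number of predicted binding sites.

    hits     : iterable of UTR identifiers, one per predicted site.
    min_hits : entries with fewer sites are discarded (entries with a
               single potential binding site were dropped in Methods).
    Ties are broken alphabetically, which is an arbitrary choice made
    only to keep this sketch deterministic.
    """
    counts = Counter(hits)
    kept = [(utr, n) for utr, n in counts.items() if n >= min_hits]
    return sorted(kept, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Toy input: five DAF-12 sites, four Hunchback sites, one singleton.
    toy_hits = ["DAF-12"] * 5 + ["Hunchback"] * 4 + ["hypothetical-singleton"]
    for utr, n in rank_utrs_by_hits(toy_hits):
        print(utr, n)
```

In the final ranking, this list was additionally compared with the list ranked by free energy of binding, so hit count alone does not determine the order shown in Table 1.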
[Figure 2 panels: mfold-predicted duplexes of let-7 with DAF-12 sites 1-5 and Hunchback sites 1-4]

Figure 2: Predicted let-7 Binding Interactions of DAF-12 and Hunchback Gene 3' UTR Sequences*
*Interactions were predicted by mfold [13].

Thermodynamic analysis of let-7 binding to all candidate target sites gave free energies of binding ranging from -58 kJ/mol to -15 kJ/mol (figure 3) [12-13]. Free energies of binding for the ten randomized sequences fell within 25 kJ/mol of the wild-type interaction, and seven of the randomized sequences bound their candidate sites with lower free energy, on average, than the wild type. This indicates that randomized let-7 sequences may bind their respective candidate targets as strongly as wild-type let-7 binds its own.

Figure 3: Free energy of binding values for wild-type let-7 and ten randomized let-7 sequences. The dark black line indicates the wild-type sequence. Colored lines indicate randomized sequences.
The ten randomized sequences are shown above with the regions conserved between them highlighted.*
*Conservation determined with Clustal Alignment [17]

Analyses of lin-4 Binding Interactions

The free energies of binding between lin-4 and its candidate target sequences ranged from -105 kJ/mol to -18 kJ/mol (figure 4). The curve relating free energy to UTR entry is considerably steeper for lin-4 than for any of the randomized sequences. The 40 strongest lin-4 target sites have free energies of binding that are, on average, 35 kJ/mol lower than those of the scrambled sequences.

[Figure 4 plot: sum of dG (y-axis, 0 to -120) against candidate lin-4 target sequence (x-axis, 0 to 500), with curves for lin-4 and random1 through random5]

Figure 4: Free energy of binding values for wild-type lin-4 and five randomized lin-4 sequences. The dark black line indicates the wild-type sequence. Colored lines indicate randomized sequences. The five randomized sequences are shown above with the regions conserved between them highlighted.*
*Conservation determined with Clustal Alignment [17]

The top lin-4 target was the transcript of the lin-14 gene, a heterochronic gene with developmental roles. The lin-4 binding sites in the lin-14 UTR are shown in figure 5 [7].

Figure 5: lin-4 is complementary to 7 sites in the lin-14 3' UTR.*
*Figure courtesy of Banerjee and Slack [7].

References

1. McManus, M. & Sharp, P.A. Gene silencing in mammals by small interfering RNAs. Nature Reviews Genetics 3, 737-47 (2002).
2. Lau, N., Lim, L., Weinstein, E. & Bartel, D. An Abundant Class of Tiny RNAs with Probable Regulatory Roles in Caenorhabditis elegans. Science 294, 858-62 (2001).
3. Lee, R. & Ambros, V. An Extensive Class of Small RNAs in Caenorhabditis elegans. Science 294, 862-4 (2001).
4. Lai, E. Micro RNAs are complementary to 3' UTR sequence motifs that mediate negative post-transcriptional regulation. Nature Genetics 30, 363-4 (2002).
5.
Rhoades, M., Reinhart, B., Lim, L., Burge, C., Bartel, B. & Bartel, D. Prediction of Plant MicroRNA Targets. Cell, DOI: 10.1016/S0092-8674(02)00863-2 (2002).
6. Reinhart, B., Weinstein, E., Rhoades, M., Bartel, B. & Bartel, D. MicroRNAs in plants. Genes and Development 16, 1616-26 (2002).
7. Banerjee, D. & Slack, F. Control of developmental timing by small temporal RNAs: A paradigm of RNA-mediated regulation of gene expression. Bioessays 24, 119-129 (2002).
8. Ross Overbeek's Patscan (http://www-unix.mcs.anl.gov/compbio/PatScan/HTML/patscan.html).
9. Dsouza, M., Larsen, N. & Overbeek, R. Searching for patterns in genomic data. Trends Genet 13(12), 497-8 (1997).
10. Pesole, G., Liuni, S., Grillo, G., Licciulli, F., Mignone, F., Gissi, C. & Saccone, C. UTRdb and UTRsite: specialized databases of sequences and functional elements of 5' and 3' untranslated regions of eukaryotic mRNAs. Update 2002. Nucleic Acids Research 30, 335-40 (2002).
11. UTR database (http://bighost.area.ba.cnr.it/BIG/UTRHome/).
12. Jacobson, A., Kumar, H. & Zuker, M. Effect of Spermidine on the Conformation of Bacteriophage MS2 RNA: Electron Microscopy and Computer Modeling. J. Mol. Biology 181, 517-531 (1985).
13. Michael Zuker's mfold (12) Web Server (http://www.bioinfo.rpi.edu/applications/mfold/).
14. Grishok, A., Pasquinelli, A., Conte, D., Li, N., Parrish, S., Ha, I., Baillie, D., Fire, A., Ruvkun, G. & Mello, C. Genes and mechanisms related to RNA interference regulate expression of the small temporal RNAs that control C. elegans developmental timing. Cell 106, 23-34 (2001).
15. Pasquinelli, A., Reinhart, B., Slack, F., Martindale, M., Kuroda, M., Maller, B., Hayward, D., Ball, E., Degnan, B., Muller, P., Spring, J., Srinivasan, A., Fishman, M., Finnerty, J., Corbo, J., Levine, M., Leahy, P., Davidson, E. & Ruvkun, G. Conservation of the sequence and temporal expression of let-7 heterochronic regulatory RNA. Nature 408, 86-9 (2000).
16. Christiansen, Tom and Nathan Torkington. Perl Cookbook.
Cambridge: O'Reilly, 1998.
17. Clustal Alignment Tool