+ miRNA Discovery and Prediction Algorithms
George Michopoulos

+ microRNAs
What are they?
Why do we care about them?
How do we discover them?
  Biological Methods
  Computational Methods
What limitations do these methods have?

+ What is microRNA?

+ miRNA structure
Small non-coding RNAs, ~22-25 bases long
Characterized by their hairpin precursors, composed of the mature miRNA, the loop, and the star miRNA

+ miRNA biogenesis
Transcribed in the nucleus
The pri-miRNA hairpin is cut by the Drosha enzyme
The resulting pre-miRNA is then processed into the mature miRNA, usually through cleavage by the Dicer enzyme
The miRNA is then bound by an Argonaute protein to form an RNA-induced silencing complex (RISC)
The complex then binds target mRNA and cleaves it or represses its translation

+ Why do we care?
miRNAs regulate the expression of proteins, including those involved in:
  Cancer: inhibit proteins responsible for controlling proliferation
  Neural development: linked to schizophrenia
  Cardiac development: linked to cardiomyopathies
  DNA methylation and histone modification: can alter the expression of target genes

+ Why do we care?
Antagomirs, chemically engineered oligonucleotides, could be used as a therapy for such diseases by silencing endogenous microRNA
Non-coding RNAs account for a significant portion of the genome, so their homology can be used as a tool to assess phylogeny

+ Detection and Discovery
Biological Methods:
  RT-PCR and qPCR can be used for individual miRNAs
  Microarrays can be used to detect multiple miRNAs
Computational Methods:
  Mining deep-sequencing data and using predictive algorithms to detect miRNA characteristics and compare potential sequences to homologs
  Bentwich et al. (2005)
  miRAlign: Wang et al. (2005)
  miRDeep: Friedländer et al. (2008)
  miRDeep2: Friedländer et al.
(2011)

+ RT-PCR
Reverse transcription polymerase chain reaction, not real-time PCR (qPCR)
The desired RNA is reverse transcribed and the resulting cDNA is amplified using qPCR
Useful for detecting very low copy numbers of RNA molecules; the oldest method, and not specific to miRNA

+ Northern Blotting
Measures levels of RNA expression using probes with partial homology
This picture shows a northern blot that has detected 4/5 of the shown microRNAs
Lower sensitivity but higher specificity than RT-PCR
Fewer false positives

+ Microarray Detection
Microarrays were first used to detect miRNAs in 2004, by different groups
Probes can be developed and the chip then ordered through companies (Barad et al.)
Alternatively, everything can be developed and put together using amine-binding slides and an array printer (Miska et al.)
Far more efficient for large-scale discovery, but limited by the need for prior sequence data for probe development

+ Barad et al. (2004)
Took known miRNA sequences
Created DNA chips with probes complementary to those sequences
Hybridized miRNA samples onto the chips
Performed clustering analysis
Used mirMASA to confirm findings
Found that the microarray method has higher sensitivity and specificity than previous miRNA identification methods

+ Useful Programs: RNAfold
RNAfold is a program in the Vienna RNA Package
Takes in RNA sequences and calculates their minimum free energy structure, outputting the following results:

+ Useful Programs: ClustalW
ClustalW is a multiple sequence alignment tool that is frequently used to compare homologous sequences across species, or to compare families of genes
Performs pairwise alignments of the input sequences, builds a guide tree from them, and then uses that tree to conduct the multiple alignment

+ Bentwich et al. (2005)
Scanning the entire human genome identified 11 million hairpins, including 86% of known microRNA precursors.
After microarray sampling, the 359 expressed microRNAs were subjected to confirmation by sequencing
Successfully cloned and sequenced 89 human microRNA genes that do not appear in the microRNA registry
Using UCSC BlastZ alignment and ClustalW, found that 53 of these are located in two large non-conserved clusters
One cluster, on chromosome 19, is expressed only in the placenta and was the largest microRNA cluster reported at the time; it comprises 43 new predicted microRNAs, all of which show similarity to a neighboring miRNA family specifically expressed in human embryonic stem cells
The other cluster is on the X chromosome and its miRNAs are expressed only in the testis
Homology analysis showed that both clusters are conserved only in chimpanzees and possibly rhesus monkeys

+ miRAlign: Wang et al. (2005)
A novel genome-wide computational approach to detect miRNAs in animals, based on both sequence and structure alignment
Uses RNAfold to test secondary structures, then ClustalW to perform pairwise alignment, its own algorithms to confirm the miRNA’s position on the stem-loop, and finally RNAforester to conduct pairwise structure alignment

+ miRAlign: Wang et al. (2005)
miRAlign outperforms BLAST search in both sensitivity and selectivity; furthermore, nearly all the known miRNAs found by BLAST can also be detected by miRAlign
The average number of false positives is 7.1 for BLAST and 0.9 for miRAlign
The algorithm depends on pre-existing data to search against, so it is only useful for finding miRNAs that are closely related to previously annotated ones

+ miRDeep: Friedländer et al.
(2008)
Suite of Perl scripts
Uses a probabilistic model of miRNA biogenesis to score the compatibility of the position and frequency of sequenced RNA with the secondary structure of the miRNA precursor

+ Algorithm for P(sequence is a precursor)
score = log( P(pre | data) / P(bgr | data) )
The probability of the sequence being a precursor is given by Bayes’ theorem:
  P(pre | data) = P(data | pre) P(pre) / P(data)
  P(pre | data) = P(abs | pre) P(rel | pre) P(sig | pre) P(star | pre) P(nuc | pre) P(pre) / P(data)
The same holds for the probability of the sequence being a background hairpin:
  P(bgr | data) = P(data | bgr) P(bgr) / P(data)
  P(bgr | data) = P(abs | bgr) P(rel | bgr) P(sig | bgr) P(star | bgr) P(nuc | bgr) P(bgr) / P(data)

+ miRDeep: Friedländer et al. (2008)
Of the 555 known human mature miRNA sequences, 213 were present in the data set
Of these, 154 (72%) were successfully recovered by miRDeep
The total estimated number of false positives was 6 ± 2
This pipeline recovers miRNAs from deep-sequencing data much more effectively than the previous methods

+ miRDeep2: Friedländer et al. (2011)
Analyzing data from seven animal species representing the major animal clades, miRDeep2 identified miRNAs with an accuracy of 98.6–99.9% and reported hundreds of novel miRNAs
The new package includes many more options and graphical outputs that make the software more accessible

+ miRDeep2: Friedländer et al. (2011)
Relative to miRDeep1:
Performs excision by scanning the genome for stacks of reads, where a stack is one or more reads that map to the exact same 5′ and 3′ positions in the genome
When identifying miRNAs in data from sea squirts, known to harbor large numbers of non-canonical miRNAs, the first version of miRDeep only reports 46 known and 31 novel miRNAs.
In contrast, miRDeep2 reports 313 known and 127 novel ones
Can detect anti-sense miRNAs (+/-)
Supports single or multiple mismatches
Performs substantially better on the human data, reporting 186 known and 36 novel miRNAs (compared to 154 known and 10 novel in the initial publication)
More accurate detection of low-abundance miRNAs
Faster: analyzed 30 million RNAs in less than 5 h using 3 GB of memory
More intuitive interface for biologists

+ Beyond miRDeep2
Remaining challenges in identifying miRNAs and detecting their expression levels:
  miRBase, the primary source of miRNA annotations in use today, is far from pristine
  It is hard to tell whether detected novel miRNAs actually have a biological function; settling that will take a lot of biological experimentation
  Algorithms still have room for improvement in terms of accessibility and efficiency

+ Questions?

+ References
Barad, O., Meiri, E., Avniel, A., Aharonov, R., Barzilai, A., Bentwich, I., Einav, U., et al. (2004). MicroRNA expression detected by oligonucleotide microarrays: System establishment and expression profiling in human tissues. Genome Research, 2486-2494. doi:10.1101/gr.2845604
Bentwich, I., Avniel, A., Karov, Y., Aharonov, R., Gilad, S., Barad, O., Barzilai, A., et al. (2005). Identification of hundreds of conserved and nonconserved human microRNAs. Nature Genetics, 37(7), 766-770. doi:10.1038/ng1590
Friedländer, M. R., Chen, W., Adamidi, C., Maaskola, J., Einspanier, R., Knespel, S., & Rajewsky, N. (2008). Discovering microRNAs from deep sequencing data using miRDeep. Nature Biotechnology, 26(4), 407-15. doi:10.1038/nbt1394
Friedländer, M. R., Mackowiak, S. D., Li, N., Chen, W., & Rajewsky, N. (2011). miRDeep2 accurately identifies known and hundreds of novel microRNA genes in seven animal clades. Nucleic Acids Research, 1-16. doi:10.1093/nar/gkr688
Krüger, J., & Rehmsmeier, M. (2006). RNAhybrid: microRNA target prediction easy, fast and flexible.
Nucleic Acids Research, 34(Web Server issue), W451-4. doi:10.1093/nar/gkl243
Miska, E. A., Alvarez-Saavedra, E., Townsend, M., Yoshii, A., Sestan, N., Rakic, P., Constantine-Paton, M., et al. (2004). Microarray analysis of microRNA expression in the developing mammalian brain. Genome Biology, 5(9), R68. doi:10.1186/gb-2004-5-9-r68
Wang, X., Zhang, J., Li, F., Gu, J., He, T., Zhang, X., & Li, Y. (2005). MicroRNA identification based on sequence and structure alignment. Bioinformatics (Oxford, England), 21(18), 3610-4. doi:10.1093/bioinformatics/bti562
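
+ Supplementary sketch: miRDeep-style log-odds score
The score from the "Algorithm for P(sequence is a precursor)" slide can be illustrated with a few lines of Python. This is only a sketch of the arithmetic, not the miRDeep implementation: the function names, the feature likelihoods, and the prior below are all hypothetical, the natural logarithm is assumed, and the background prior is assumed to be 1 - P(pre). Because P(data) appears in both posteriors, it cancels in the ratio and never has to be computed.

```python
import math

# Feature labels as they appear in the factorized posterior on the slide.
FEATURES = ("abs", "rel", "sig", "star", "nuc")


def log_odds_score(lik_pre, lik_bgr, prior_pre):
    """Return log( P(pre | data) / P(bgr | data) ).

    lik_pre[f] stands for P(f | pre) and lik_bgr[f] for P(f | bgr).
    Assumption: precursor and background are the only two hypotheses,
    so P(bgr) = 1 - P(pre). P(data) cancels in the ratio.
    """
    if not 0.0 < prior_pre < 1.0:
        raise ValueError("prior_pre must lie strictly between 0 and 1")
    # Start from the log prior odds, then add one log-likelihood ratio per feature.
    score = math.log(prior_pre) - math.log(1.0 - prior_pre)
    for feature in FEATURES:
        score += math.log(lik_pre[feature]) - math.log(lik_bgr[feature])
    return score


def posterior_from_score(score):
    """Convert the log-odds score back to P(pre | data) under the same assumption."""
    return 1.0 / (1.0 + math.exp(-score))


# Hypothetical numbers, chosen only to make the arithmetic easy to follow.
LIK_PRE = {"abs": 0.5, "rel": 0.6, "sig": 0.9, "star": 0.4, "nuc": 0.3}
LIK_BGR = {"abs": 0.25, "rel": 0.3, "sig": 0.1, "star": 0.05, "nuc": 0.3}
PRIOR_PRE = 0.1

if __name__ == "__main__":
    s = log_odds_score(LIK_PRE, LIK_BGR, PRIOR_PRE)
    print(f"score = {s:.3f}, P(pre | data) = {posterior_from_score(s):.3f}")
```

With these made-up inputs the likelihood ratios multiply to 288 and the prior odds are 1/9, so the posterior odds are 32 to 1 in favor of a precursor. A positive score means the precursor model explains the data better than the background model; a negative score means the opposite.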