Computational identification of biologically relevant secondary structures in single-stranded viral genomes

Emil P. Tanov1, Brejnev Muhire2, Gordon W. Harkins1, Darren P. Martin2

1South African National Bioinformatics Institute, SA Medical Research Council Unit for Bioinformatics, University of the Western Cape
2Institute of Infectious Diseases and Molecular Medicine, Computational Biology Group, University of Cape Town

Abstract

Single-stranded RNA and DNA viruses cause a wide range of diseases in vertebrate, insect and plant hosts, the most notable being clinical syndromes such as HIV/AIDS, aseptic meningitis, paralytic poliomyelitis, SARS and hepatitis. In addition to encoding an array of proteins, viral genomes often contain functionally important secondary structures that are vital during various stages of the viral life cycle. Most potential regulatory structures within viral genomes are currently uncharacterised, which raises the possibility that new categories of RNA structure-mediated regulation remain to be identified. Despite tremendous advances in traditional RNA structure determination approaches such as RNA crystallography, nuclear magnetic resonance spectroscopy and chemical modification, these approaches are often still costly and arduous. Computational prediction of biologically functional viral secondary structures provides a relatively accurate, rapid and cost-effective alternative, and recent advances in next-generation sequencing have made abundant viral RNA sequence information readily available. However, the challenge of interpreting these data remains, since viral RNAs are structurally dynamic and the underlying RNA sequence contains many layers of information. Current prediction methods that focus on a single minimum free-energy structure may not always identify functionally relevant structures without additional experimental restraints.
More specifically, nucleotide sites that coincide with biologically functional genomic secondary structures are expected to display distinct signals of natural selection favouring the maintenance of these structures. Here we present a statistically rigorous computational workflow to predict, identify and quantitatively rank structural elements according to their biological relevance, using a high-performance computing cluster [1][2]. Our approach employs a range of peer-reviewed, open-source, publicly available programs and algorithms, as well as tests and software developed in-house, to detect evolutionary signals that might include codon usage biases, decreased rates of synonymous substitution, higher rates of reversion substitutions and increased frequencies of complementarily co-evolving nucleotide pairs. These analyses provide a means of ranking the computationally inferred secondary structures in order of their likely biological relevance, as well as of visualising and examining regions of interest in more detail. Owing to the computationally intensive nature of the algorithms, high-performance parallel computing clusters are a fundamental prerequisite for making this type of approach a viable alternative to traditional laboratory identification of secondary structures.

References

[1] Muhire, B.M., Golden, M., Murrell, B., Lefeuvre, P., Lett, J.M., Gray, A., Poon, A.Y.F., Ngandu, N.K., Tanov, E.P., Semegni, Y., Monjane, A.L., Harkins, G.W., Varsani, A., Shepherd, D., Martin, D.P. (2014) Evidence of Pervasive Biologically Functional Secondary Structures within the Genomes of Eukaryotic Single-Stranded DNA Viruses. Journal of Virology 88, 1972-89.
[2] Cloete, L.J., Tanov, E.P., Muhire, B.M., Harkins, G.W., Martin, D.P. (2014) The influence of secondary structure, selection and recombination on rubella virus nucleotide substitution rate estimates. Virology Journal, doi:10.1186/1743-422X-11-166

emil@sanbi.ac.za 079 097 4060