
Computational identification of biologically relevant secondary structures in
single-stranded viral genomes
Emil P. Tanov1, Brejnev Muhire2, Gordon W. Harkins1, Darren P. Martin2
1South African National Bioinformatics Institute, SA Medical Research Council Unit for Bioinformatics,
University of the Western Cape
2Institute of Infectious Diseases and Molecular Medicine, Computational Biology Group, University of
Cape Town
Abstract
Single-stranded RNA and DNA viral species cause a wide range of diseases in vertebrate, insect and
plant hosts, the most notable being clinical syndromes such as HIV/AIDS, aseptic meningitis,
paralytic poliomyelitis, SARS and hepatitis, amongst other serious health threats. In addition to encoding
an array of proteins, viral genomes often contain functionally important secondary structures that are
vital during the various stages of the viral life cycle. Most potential regulatory structures within viral
genomes are currently uncharacterized and this raises the possibility that new categories of RNA
structure-mediated regulation remain to be identified. Despite tremendous advances in traditional
RNA structure determination approaches, such as RNA crystallography, nuclear magnetic resonance
spectroscopy and chemical modification, these approaches are often still costly and arduous.
Computational prediction of biologically functional viral secondary structures provides a relatively
accurate, rapid and cost-effective alternative to traditional approaches, and recent advances in
next-generation sequencing have made abundant viral RNA sequence information readily available.
However, the challenge of how to interpret these data still remains, since viral RNAs are structurally
dynamic and the underlying RNA sequence contains many layers of information. Current prediction
methods focusing on a single minimum free-energy structure may not always identify functionally
relevant structures without additional experimental restraints. More specifically, it is expected that
nucleotide sites which coincide with the locations of biologically functional genomic secondary
structures will present distinct signals of natural selection favouring the maintenance of these
structures.
Here we present a statistically rigorous computational workflow to predict, identify and quantitatively
rank structural elements according to their biological relevance, using a high-performance computing
cluster [1][2]. Our approach employs a range of peer-reviewed, open-source, publicly available
programs and algorithms, as well as in-house developed tests and software, to detect evolutionary
signals which might include codon usage biases, decreased rates of synonymous substitution, higher
rates of reversion substitutions and increased frequencies of complementarily co-evolving nucleotide
pairs. These analyses provide a means of ranking the computationally inferred secondary structures
in order of their likely biological relevance, in addition to visualising and examining regions of interest
in more detail.
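As a purely illustrative sketch of one such signal, the Python fragment below scores predicted structures by the fraction of their base pairs that show complementary co-evolution in a sequence alignment, and ranks them by that score. It is not the workflow's actual implementation: the function names (covaries, rank_structures), the scoring rule and the toy alignment are assumptions introduced here for illustration only.

```python
from itertools import combinations

# Watson-Crick and G-U wobble pairs (RNA alphabet; for DNA, map T to U first).
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"),
             ("C", "G"), ("G", "U"), ("U", "G")}


def covaries(alignment, i, j):
    """True if columns i and j pair canonically in every sequence AND at
    least two observed pair types differ at both positions, i.e. a change
    at one site was compensated by a change at its partner.
    (Hypothetical, simplified criterion; no statistical test is applied.)"""
    observed = {(seq[i], seq[j]) for seq in alignment}
    if not observed <= CANONICAL:
        return False
    return any(a[0] != b[0] and a[1] != b[1]
               for a, b in combinations(observed, 2))


def rank_structures(alignment, structures):
    """Rank structures (name -> list of zero-based paired columns) by the
    fraction of their base pairs that co-vary, highest first."""
    scores = {name: sum(covaries(alignment, i, j) for i, j in pairs) / len(pairs)
              for name, pairs in structures.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy, invented alignment of three equal-length sequences.
    alignment = ["GGCAAAGCC", "GACAAAGUC", "GGGAAACCC"]
    structures = {
        "hairpin": [(0, 8), (1, 7), (2, 6)],  # candidate stem
        "control": [(3, 5), (0, 4)],          # arbitrary non-pairing columns
    }
    for name, score in rank_structures(alignment, structures):
        print(f"{name}\t{score:.2f}")
```

In this toy case the candidate stem ranks above the control because two of its three pairs change in a compensatory manner. A real analysis would need to account for phylogenetic relatedness among the sequences and assess significance, which this sketch does not attempt.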
Due to the algorithms' computationally intensive nature, high-performance parallel computing clusters
are a fundamental prerequisite for making this type of approach a viable alternative to traditional
laboratory identification of secondary structures.
References
[1] Muhire, B.M., Golden, M., Murrell, B., Lefeuvre, P., Lett, J.M., Gray, A., Poon, A.Y.F., Ngandu, N.K.,
Tanov, E.P., Semegni, Y., Monjane, A.L., Harkins, G.W., Varsani, A., Shepherd, D., Martin, D.P.
(2014) Evidence of Pervasive Biologically Functional Secondary Structures within the
Genomes of Eukaryotic Single-Stranded DNA Viruses. Journal of Virology 88, 1972-89
[2] Cloete, L.J., Tanov, E.P., Muhire, B.M., Harkins, G.W., Martin, D.P. (2014) The influence of
secondary structure, selection and recombination on rubella virus nucleotide substitution rate
estimates. Virology Journal, doi:10.1186/1743-422X-11-166
emil@sanbi.ac.za 079 097 4060