RNA-Seq analysis of Mycobacterium smegmatis and its phage

RNA-Seq analysis of Mycobacterium smegmatis and its phage pathogen during infection

Nicholas Edgington, Ph.D.

Southern Connecticut State University

Dept. of Biology, New Haven CT, USA

Phages are viruses that attack bacteria. They are everywhere and represent an amazing amount of biomass on planet Earth. In fact, it is estimated that there are 10^30 viruses in the world's oceans alone.

Mycobacteriophages specifically attack mycobacteria, which include the important human pathogens that cause leprosy (Mycobacterium leprae) and tuberculosis (Mycobacterium tuberculosis), as well as the harmless Mycobacterium smegmatis (M. smeg).

According to the WHO, tuberculosis is the second-largest killer (after HIV) among diseases caused by a single infectious agent, and one third of the Earth's entire human population is infected with latent tuberculosis.

Background

To date, over 500 mycobacteriophage genomes have been sequenced, mostly through the HHMI Science Education Alliance's Phage Hunters Advancing Genomics & Evolutionary Science program (HHMI SEA-PHAGES) in conjunction with Dr. Graham Hatfull's University of Pittsburgh laboratory (phagesdb.org) (Pope et al., 2011).

About 90% of highly conserved mycobacteriophage gene “phamilies” have no known function!

Let's use ‘Dual RNA-Seq’ to validate mycobacteriophage gene annotations and determine the temporal pattern of gene expression during infection.

Module Research Goals

Use ‘Dual RNA-Seq’ to validate mycobacteriophage gene annotations.

Determine the temporal pattern of gene expression of the mycobacteriophage during infection.

Determine the temporal pattern of gene expression of the host M. smegmatis.

Identify the ‘repressor’ gene of temperate phage by analyzing a lysogen.

Croucher, N.J., and Thomson, N.R. (2010).

Student Learning Goals

Be able to explain host-pathogen interactions & mechanisms in a ‘simple’ bacteria-phage system.

Understand the advantages of using NGS technologies (including RNA-Seq) to elucidate gene expression patterns.

Be able to perform an analytic pipeline in a Galaxy environment in order to discover gene expression patterns in a ‘dual RNA-Seq’ experiment.

Be able to perform RNA-Seq experiments and understand their statistical implications.

Vision and Change Core Competencies

Ability to apply the process of science:

Perform the analysis of a dual RNA-Seq dataset.

Ability to use quantitative reasoning:

Perform quantitative analysis and apply mathematical reasoning to the analysis of an RNA-Seq dataset.

Ability to use modeling and simulation:

Be able to explain the complex systems that regulate host-phage interactions

Be able to run simulations of RNA-Seq datasets, and observe the effects of modifying program parameters

Ability to tap into the interdisciplinary nature of science:

NGS technologies represent an interdisciplinary science that intersects with physics, computer science, engineering, statistical inference, and information science.

Ability to communicate and collaborate with other disciplines:

Collaborate to identify the gene expression patterns of a phage and its host, and present the data to their peers

Ability to understand the relationship between science and society:

Understand that bacteriophages profoundly affect global ecosystems, can be used to treat human diseases, and can produce useful biotechnological tools for the benefit of humans.

GCAT-SEEK sequencing requirements

The organism is Mycobacterium smegmatis mc2 155, with or without phage infection.

The Mycobacterium smegmatis mc2 155 genome is a single circular chromosome of 6.99 Mb with 6,742 genes, and the ABCat phage genome is 76.131 kb with 145 predicted genes.

Samples would be pelleted and resuspended using the RNeasy Mini Kit (Qiagen). The suggested kit for rRNA depletion is RiboZero for Gram-positive bacteria (Epicentre).

Aim for around 200 million reads per sample, with around 160 million reads coming from the host (M. smeg) and, depending on the time point, between 0.3 and 3 million reads from the bacteriophage genome (so the phage transcripts would represent ~0.2-2% of the reads).

Generate single-end libraries with the Illumina TruSeq kit, without multiplexing.
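The read-depth estimate above can be checked with a few lines of arithmetic. The sketch below uses only the read counts quoted in this section; the helper name `percent` is illustrative. It computes the phage share against both the host reads and the total reads, since the text does not say which denominator the ~0.2-2% figure uses.

```python
# Back-of-envelope check of the sequencing-depth estimate.
# All counts are in millions of reads and are taken from the text above.
TOTAL_READS = 200.0        # target reads per sample
HOST_READS = 160.0         # reads expected from M. smegmatis
PHAGE_READS = (0.3, 3.0)   # low and high estimates for phage reads


def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


for label, denominator in (("host reads", HOST_READS), ("all reads", TOTAL_READS)):
    low, high = (percent(p, denominator) for p in PHAGE_READS)
    print(f"Phage reads as % of {label}: {low:.2f}% to {high:.2f}%")
```

Relative to the 160 million host reads the phage share is roughly 0.19% to 1.9%, and relative to all 200 million reads it is 0.15% to 1.5%, so either reading is broadly consistent with the quoted range.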

Computer/program requirements for data analysis

Internet connection; Mac OS or Linux (Ubuntu is nice)

Web browser (excluding IE)

Computer programs:

Galaxy (and pre-compiled RNA-Seq/NGS tools): public, local, or Amazon EC2 instance (URL: ‘usegalaxy.org’)

R stats (if using BioConductor NGS packages)

Python 2.7 (if using ‘bcbio-nextgen’ or ‘biopython’ modules)

Student Assessments

Pre- and post-tests for:

understanding of statistical methods, calculations, and considerations

comprehension of RNA-Seq methodology (wet-lab techniques and in silico analysis)

ability to explain host-pathogen interactions & mechanisms in a ‘simple’ bacteria-phage system.

Assessment of student confidence in using NGS computational tools and in navigating in a Linux environment.

Timeline

WEEK 1:

Learn to navigate and use “usegalaxy.org” (i.e., Galaxy) to create a workflow for the analysis of dual RNA-Seq data from Mycobacterium smegmatis mc2 155 and a mycobacteriophage.

Import into Galaxy the RNA-Seq datasets that will be received from the GCAT-SEEK sequencing facility in late summer.

Convert GenBank files for Mycobacterium smegmatis mc2 155 and the selected mycobacteriophage to GFF format using the Rätsch lab’s Galaxy instance or the “bcbio-nextgen” Python module.

Import “GTF” or “GFF3” formatted reference genomes for Mycobacterium smegmatis mc2 155

Import a FASTA file of the genomic sequence of the mycobacteriophage ABCat.
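To make the GenBank-to-GFF conversion step more concrete, here is a minimal sketch of the output a converter has to produce: nine tab-separated columns per feature, with 1-based inclusive coordinates. It does not parse GenBank files and is not the Rätsch lab tool or any library's API. The function name `gff3_line`, the `GenBank` source label, the sequence ID, and the feature coordinates are all hypothetical and used only for illustration.

```python
from urllib.parse import quote


def gff3_line(seqid, ftype, start, end, strand, attributes, source="GenBank"):
    """Format one feature as a nine-column, tab-separated GFF3 line.

    start and end are 1-based and inclusive, as GFF3 requires.
    """
    if start > end:
        raise ValueError("GFF3 requires start <= end")
    # Assumption: every CDS starts in frame, so its phase is 0.
    phase = "0" if ftype == "CDS" else "."
    # Percent-encode characters such as ';' and '=' that delimit attributes.
    attr_text = ";".join(
        f"{key}={quote(str(value), safe=' ')}" for key, value in attributes.items()
    )
    columns = [seqid, source, ftype, str(start), str(end), ".", strand, phase, attr_text]
    return "\t".join(columns)


# Hypothetical features: names and coordinates are made up for illustration.
features = [
    ("phage_genome", "gene", 101, 400, "+", {"ID": "gene_1"}),
    ("phage_genome", "CDS", 101, 400, "+",
     {"ID": "cds_1", "Parent": "gene_1",
      "product": "hypothetical protein; phamily unknown"}),
]

gff3_text = "##gff-version 3\n" + "\n".join(gff3_line(*f) for f in features) + "\n"
print(gff3_text)
```

In practice the feature list would come from parsing the GenBank record, and it is worth confirming in Galaxy that the sequence ID in column 1 matches the name used in the FASTA file that reads are mapped to.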

WEEKS 2-4:

Use the Galaxy RNA-Seq tools to map reads to the two reference genomes.

Bowtie, Cufflinks, RSEM

Identify genes that are differentially expressed
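As a rough illustration of what happens after mapping, the sketch below normalizes per-gene read counts to transcripts per million (TPM) and computes a log2 fold change between two time points. It is a simplified stand-in, not what Cufflinks or RSEM do internally, and it performs no statistical testing; a real analysis needs replicates and a proper differential expression method. The gene names, lengths, and counts are hypothetical, and normalizing phage genes within the phage read pool alone is an assumption of this sketch.

```python
import math


def tpm(counts, lengths_bp):
    """Convert raw read counts to transcripts per million (TPM).

    counts and lengths_bp are dicts keyed by gene ID.
    """
    rates = {gene: counts[gene] / lengths_bp[gene] for gene in counts}
    total = sum(rates.values())
    if total == 0:
        return {gene: 0.0 for gene in counts}
    return {gene: 1e6 * rate / total for gene, rate in rates.items()}


def log2_fold_change(early, late, pseudocount=1.0):
    """Per-gene log2(late / early), with a pseudocount to avoid dividing by zero."""
    return {
        gene: math.log2((late[gene] + pseudocount) / (early[gene] + pseudocount))
        for gene in early
    }


# Hypothetical phage genes, lengths, and counts; none of these are real data.
lengths = {"gp1": 900, "gp2": 1500, "gp3": 600}
counts_early = {"gp1": 450, "gp2": 150, "gp3": 0}
counts_late = {"gp1": 90, "gp2": 1500, "gp3": 600}

tpm_early = tpm(counts_early, lengths)
tpm_late = tpm(counts_late, lengths)
changes = log2_fold_change(tpm_early, tpm_late)

# List genes from most decreased to most increased over time.
for gene, value in sorted(changes.items(), key=lambda item: item[1]):
    print(f"{gene}\tlog2FC={value:+.2f}")
```

Genes with strongly negative values would be candidates for early expression and genes with strongly positive values for late expression, which is the kind of temporal pattern the module aims to uncover.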

Discussion & Lecture Topics

ELSI of NGS technologies

Bacterial Host-Pathogen interactions

Bacteriophage replication mechanisms

Lytic versus Lysogenic lifestyles

The connection between mycobacteriophage genome architecture and temporal gene expression patterns

Determining gene phylogeny through sequence comparisons using bioinformatic tools

References

Croucher, N.J., and Thomson, N.R. (2010). Studying bacterial transcriptomes using RNA-seq. Curr Opin Microbiol 13, 619–624.

Dedrick, R.M., Marinelli, L.J., Newton, G.L., Pogliano, K., Pogliano, J., and Hatfull, G.F. (2013). Functional requirements for bacteriophage growth: gene essentiality and expression in mycobacteriophage Giles. Mol Microbiol.

Giardine, B., Riemer, C., Hardison, R.C., Burhans, R., Elnitski, L., Shah, P., Zhang, Y., Blankenberg, D., Albert, I., Taylor, J., et al. (2005). Galaxy: a platform for interactive large-scale genome analysis. Genome Res 15, 1451–1455.

Goecks, J., Nekrutenko, A., Taylor, J., Galaxy Team (2010). Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences. Genome Biol 11, R86.

Haas, B.J., Chin, M., Nusbaum, C., Birren, B.W., and Livny, J. (2012). How deep is deep enough for RNA-Seq profiling of bacterial transcriptomes? BMC Genomics 13, 734.

Haas, B.J., and Zody, M.C. (2010). Advancing RNA-Seq analysis. Nat. Biotechnol. 28, 421–423.

Hatfull, G.F. (2012). The secret lives of mycobacteriophages. Adv. Virus Res. 82, 179–288.

Henry, M., and Debarbieux, L. (2012). Tools from viruses: bacteriophage successes and beyond. Virology 434, 151–161.

Jacobs-Sera, D., Marinelli, L.J., Bowman, C., Broussard, G.W., Guerrero Bustamante, C., Boyle, M.M., Petrova, Z.O., Dedrick, R.M., Pope, W.H., Science Education Alliance Phage Hunters Advancing Genomics and Evolutionary Science (SEA-PHAGES) Program, et al. (2012). On the nature of mycobacteriophage diversity and host preference. Virology.

Pope, W.H., Jacobs-Sera, D., Russell, D.A., Peebles, C.L., Al-Atrache, Z., Alcoser, T.A., Alexander, L.M., Alfano, M.B., Alford, S.T., Amy, N.E., et al. (2011). Expanding the diversity of mycobacteriophages: insights into genome architecture and evolution. PLoS ONE 6, e16329.

Soneson, C., and Delorenzi, M. (2013). A comparison of methods for differential expression analysis of RNA-seq data. BMC Bioinformatics 14, 91.

Westermann, A.J., Gorski, S.A., and Vogel, J. (2012). Dual RNA-seq of pathogen and host. Nat. Rev. Microbiol. 10, 618–630.
