Southern Connecticut State University
Dept. of Biology, New Haven CT, USA
Phages are viruses that attack bacteria. They are everywhere and represent an amazing amount of biomass on planet Earth. In fact, it is estimated that there are 10^30 viruses in the world’s oceans alone.
Mycobacteriophages specifically attack mycobacteria, which include the important human pathogens that cause leprosy (Mycobacterium leprae) and tuberculosis (Mycobacterium tuberculosis), as well as the harmless Mycobacterium smegmatis (M. smeg).
According to the WHO, tuberculosis is the second largest killer (after HIV) among diseases caused by a single infectious agent, and one third of the world’s human population is infected with latent tuberculosis.
To date, over 500 mycobacteriophage genomes have been sequenced, mostly through the HHMI Science Education Alliance’s Phage Hunters Advancing Genomics & Evolutionary Science program (HHMI SEA-PHAGES) in conjunction with Dr. Graham Hatfull’s laboratory at the University of Pittsburgh (phagesdb.org) (Pope et al., 2011).
About 90% of highly conserved mycobacteriophage gene “phamilies” have no known function!
Let’s use ‘Dual RNA-Seq’ to validate mycobacteriophage gene annotations and determine the temporal pattern of gene expression during infection.
Use ‘Dual RNA-Seq’ to validate mycobacteriophage gene annotations.
Determine the temporal pattern of gene expression of the mycobacteriophage during infection.
Determine the temporal pattern of gene expression of the host M. smegmatis.
Identify the ‘repressor’ gene of temperate phage by analyzing a lysogen.
Croucher, N.J., and Thomson, N.R. (2010).
Be able to explain host-pathogen interactions & mechanisms in a ‘simple’ bacteria-phage system.
Understand the advantages of using NGS technologies (including RNA-Seq) to elucidate gene expression patterns.
Be able to perform an analytic pipeline in a Galaxy environment in order to discover gene expression patterns in a ‘dual RNA-Seq’ experiment.
Be able to perform and understand the statistical implications of RNA-Seq experiments.
Ability to apply the process of science:
Perform the analysis of a dual RNA-Seq dataset.
Ability to use quantitative reasoning:
Perform quantitative analysis and apply mathematical reasoning to the analysis of an RNA-Seq dataset.
Ability to use modeling and simulation:
Be able to explain the complex systems that regulate host-phage interactions
Be able to run simulations of RNA-Seq datasets, and observe the effects of modifying program parameters
Ability to tap into the interdisciplinary nature of science:
NGS technologies represent an interdisciplinary science that intersects with physics, computer science, engineering, statistical inference, and information science.
Ability to communicate and collaborate with other disciplines:
Collaborate to identify the gene expression patterns of a phage and its host, and present the data to their peers
Ability to understand the relationship between science and society:
Understand that bacteriophage profoundly affect global ecosystems, can be used to treat human diseases, and can produce useful biotechnological tools for the benefit of humans.
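The modeling and simulation competency above can be illustrated with a toy read-sampling experiment. This is only a minimal sketch, not a realistic RNA-Seq simulator: the function name, the 20 hypothetical phage genes, their relative expression weights, the 1% phage fraction, and the 5-read detection threshold are all assumptions made for illustration. It shows how changing one parameter (sequencing depth) changes how many phage genes are detected.

```python
import random

# Hypothetical relative expression weights for 20 made-up phage genes.
WEIGHTS = list(range(1, 21))


def simulate_phage_detection(depth, phage_fraction, weights, min_reads=5, seed=0):
    """Sample `depth` reads; each read is a phage read with probability
    `phage_fraction` and is then assigned to a gene in proportion to `weights`.
    Returns (per-gene read counts, number of genes with >= min_reads reads)."""
    rng = random.Random(seed)          # fixed seed keeps runs reproducible
    total_weight = float(sum(weights))
    counts = [0] * len(weights)
    for _ in range(depth):
        if rng.random() >= phage_fraction:
            continue                   # host read: ignored in this toy model
        pick = rng.random() * total_weight
        running = 0.0
        for index, weight in enumerate(weights):
            running += weight
            if pick < running:
                counts[index] += 1
                break
    detected = sum(1 for count in counts if count >= min_reads)
    return counts, detected


# Vary the depth parameter and observe the effect on gene detection.
for depth in (1000, 10000, 100000):
    counts, detected = simulate_phage_detection(depth, 0.01, WEIGHTS)
    print("depth=%d phage_reads=%d genes_detected=%d of %d"
          % (depth, sum(counts), detected, len(WEIGHTS)))
```

Students could repeat the run with a different `phage_fraction`, `min_reads`, or `seed` to see how sensitive the result is to each choice.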
The organism is Mycobacterium smegmatis mc2 155, with and without phage infection.
The Mycobacterium smegmatis mc2 155 genome is a single circular chromosome of 6.99 Mb with 6,742 genes.
The ABCat phage genome is 76.131 kb with 145 predicted genes.
Samples would be pelleted and resuspended for RNA extraction with the RNeasy Mini Kit (Qiagen). The suggested kit for rRNA depletion is RiboZero for Gram-positive bacteria (Epicentre).
Aim for around 200 million reads per sample, with around 160 million reads coming from the host (M. smeg) and, depending on the time point, between 0.3 and 3 million reads from the bacteriophage genome (the phage transcripts would therefore represent ~0.2-2% of the reads).
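The read budget above can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted in the text; the variable and function names are illustrative. It reports the phage share relative to the host reads and relative to all reads, since the text does not say which denominator the ~0.2-2% estimate uses.

```python
# Read-depth budget per sample, using the figures quoted in the text.
TOTAL_READS = 200e6                 # target reads per sample
HOST_READS = 160e6                  # reads expected from M. smegmatis
PHAGE_READS_RANGE = (0.3e6, 3e6)    # phage reads, depending on time point


def percent(part, whole):
    """Return `part` as a percentage of `whole`."""
    return 100.0 * part / whole


for phage_reads in PHAGE_READS_RANGE:
    print("%.1f M phage reads: %.2f%% of host reads, %.2f%% of all reads" % (
        phage_reads / 1e6,
        percent(phage_reads, HOST_READS),
        percent(phage_reads, TOTAL_READS),
    ))
```

Measured against the host reads, the phage share comes out close to the quoted ~0.2-2%; measured against all 200 million reads, it is somewhat lower.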
Generate single-end libraries with the Illumina TruSeq kit, without multiplexing.
Internet connection, Mac OS, Linux (Ubuntu is nice)
Web browser (excluding IE)
Computer programs:
Galaxy (with pre-compiled RNA-Seq/NGS tools): public, local, or Amazon EC2 instance (URL: ‘usegalaxy.org’)
R stats (if using BioConductor NGS packages)
Python 2.7 (if using the ‘bcbio-nextgen’ or ‘biopython’ modules)
Pre- and post-tests for understanding of statistical methods, calculations, and considerations; comprehension of RNA-Seq methodology (wet-lab techniques and in silico analysis); and ability to explain host-pathogen interactions & mechanisms in a ‘simple’ bacteria-phage system.
Assessment of student confidence in using NGS computational tools and in navigating in a Linux environment.
WEEK 1:
Learn to navigate and use “usegalaxy.org” (i.e., Galaxy) to create a workflow for the analysis of dual RNA-Seq data from Mycobacterium smegmatis mc2 155 and a mycobacteriophage.
Import into Galaxy the RNA-Seq datasets that will be received from the GCAT-SEEK sequencing facility in late summer.
Convert the GenBank files for Mycobacterium smegmatis mc2 155 and the selected mycobacteriophage to GFF format using the Rätsch lab’s Galaxy instance or the “bcbio-nextgen” Python module.
Import “GTF” or “GFF3” formatted reference genomes for Mycobacterium smegmatis mc2 155.
Import a FASTA file of the genomic sequence of the mycobacteriophage ABCat.
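The GenBank-to-GFF conversion step above produces a 9-column, tab-separated file. The sketch below illustrates only the output side of that conversion, using the standard library. The function names are illustrative, and the two features are hypothetical placeholders rather than real ABCat annotations. In practice the features would be parsed from the GenBank files, for example with Biopython, instead of being typed in.

```python
# Minimal GFF3 writer. The features at the bottom are hypothetical
# placeholders, NOT real ABCat annotations.

def gff3_line(seqid, ftype, start, end, strand, attributes, source="GenBank"):
    """Format one feature as a 9-column, tab-separated GFF3 line.

    start and end are 1-based and inclusive, as GFF3 requires.
    Attribute values are written as-is (no escaping of special characters).
    """
    if start > end:
        raise ValueError("GFF3 requires start <= end")
    # Phase is only defined for CDS features; "0" assumes a complete CDS.
    phase = "0" if ftype == "CDS" else "."
    attr_text = ";".join("%s=%s" % (key, value) for key, value in attributes)
    columns = [seqid, source, ftype, str(start), str(end),
               ".", strand, phase, attr_text]
    return "\t".join(columns)


def to_gff3(features):
    """Return the lines of a GFF3 file: version header plus one line per feature."""
    lines = ["##gff-version 3"]
    for feature in features:
        lines.append(gff3_line(*feature))
    return lines


features = [
    ("ABCat", "gene", 100, 450, "+", [("ID", "gene1")]),
    ("ABCat", "CDS", 100, 450, "+", [("ID", "cds1"), ("Parent", "gene1")]),
]
print("\n".join(to_gff3(features)))
```

Whichever converter is used, the sequence name in the first column has to match the name in the FASTA file that the reads are mapped against, otherwise downstream tools cannot link annotations to alignments.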
WEEKS 2-4:
Use the Galaxy RNA-Seq tools to map reads to the two reference genomes.
Bowtie, Cufflinks, RSEM
Identify genes that are differentially expressed
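To make the idea of comparing expression between time points concrete, here is a minimal sketch that converts read counts to TPM (transcripts per million) and computes a log2 fold change. The gene names, lengths, and counts are hypothetical, and the pseudocount of 1 is an assumption. This is a descriptive calculation only, not a statistical test: dedicated differential expression tools additionally model variance between replicates.

```python
import math


def tpm(counts, lengths):
    """Convert raw read counts to TPM.

    counts and lengths are dicts keyed by gene name; lengths are in bases.
    """
    rates = dict((gene, counts[gene] / float(lengths[gene])) for gene in counts)
    total = sum(rates.values())
    return dict((gene, 1e6 * rate / total) for gene, rate in rates.items())


def log2_fold_change(before, after, pseudocount=1.0):
    """log2(after / before), with a pseudocount to avoid division by zero."""
    return math.log((after + pseudocount) / (before + pseudocount), 2)


# Hypothetical phage genes: lengths (bases) and read counts at two time points.
lengths = {"gp1": 1000, "gp2": 500, "gp3": 2000}
counts_early = {"gp1": 500, "gp2": 100, "gp3": 10}
counts_late = {"gp1": 200, "gp2": 100, "gp3": 900}

tpm_early = tpm(counts_early, lengths)
tpm_late = tpm(counts_late, lengths)

for gene in sorted(lengths):
    change = log2_fold_change(tpm_early[gene], tpm_late[gene])
    print("%s log2 fold change (late vs early): %.2f" % (gene, change))
```

One option in a dual RNA-Seq setting is to compute TPM separately for the phage and the host, because the phage share of the library changes during infection and would otherwise dominate the comparison.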
ELSI of NGS technologies
Bacterial Host-Pathogen interactions
Bacteriophage replication mechanisms
Lytic versus Lysogenic lifestyles
The connection between mycobacteriophage genome architecture and temporal gene expression patterns
Determining gene phylogeny through sequence comparisons using bioinformatic tools
Croucher, N.J., and Thomson, N.R. (2010). Studying bacterial transcriptomes using RNA-seq. Curr. Opin. Microbiol. 13, 619–624.
Dedrick, R.M., Marinelli, L.J., Newton, G.L., Pogliano, K., Pogliano, J., and Hatfull, G.F. (2013). Functional requirements for bacteriophage growth: gene essentiality and expression in mycobacteriophage Giles. Mol. Microbiol.
Giardine, B., Riemer, C., Hardison, R.C., Burhans, R., Elnitski, L., Shah, P., Zhang, Y., Blankenberg, D., Albert, I., Taylor, J., et al. (2005). Galaxy: a platform for interactive large-scale genome analysis. Genome Res. 15, 1451–1455.
Goecks, J., Nekrutenko, A., Taylor, J., and Galaxy Team (2010). Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences. Genome Biol. 11, R86.
Haas, B.J., Chin, M., Nusbaum, C., Birren, B.W., and Livny, J. (2012). How deep is deep enough for RNA-Seq profiling of bacterial transcriptomes? BMC Genomics 13, 734.
Haas, B.J., and Zody, M.C. (2010). Advancing RNA-Seq analysis. Nat. Biotechnol. 28, 421–423.
Hatfull, G.F. (2012). The secret lives of mycobacteriophages. Adv. Virus Res. 82, 179–288.
Henry, M., and Debarbieux, L. (2012). Tools from viruses: bacteriophage successes and beyond. Virology 434, 151–161.
Jacobs-Sera, D., Marinelli, L.J., Bowman, C., Broussard, G.W., Guerrero Bustamante, C., Boyle, M.M., Petrova, Z.O., Dedrick, R.M., Pope, W.H., Science Education Alliance Phage Hunters Advancing Genomics and Evolutionary Science (SEA-PHAGES) Program, et al. (2012). On the nature of mycobacteriophage diversity and host preference. Virology.
Pope, W.H., Jacobs-Sera, D., Russell, D.A., Peebles, C.L., Al-Atrache, Z., Alcoser, T.A., Alexander, L.M., Alfano, M.B., Alford, S.T., Amy, N.E., et al. (2011). Expanding the diversity of mycobacteriophages: insights into genome architecture and evolution. PLoS ONE 6, e16329.
Soneson, C., and Delorenzi, M. (2013). A comparison of methods for differential expression analysis of RNA-seq data. BMC Bioinformatics 14, 91.
Westermann, A.J., Gorski, S.A., and Vogel, J. (2012). Dual RNA-seq of pathogen and host. Nat. Rev. Microbiol. 10, 618–630.