Spring 2015
BIOL 312: Microbiology

A Town on Fire
Metagenomic Analysis of Bacterial Communities in Soils Overlying the Centralia, Pennsylvania Mine Fire

Instructor: Dr. Tammy Tobin
Susquehanna University
E-Mail: tobinjan@susqu.edu

Overview

In 1962, a surface trash fire ignited an anthracite coal seam in an abandoned strip mine in Centralia, Pennsylvania. Repeated efforts to extinguish the fire failed, and in 1984 Congress responded to the resulting high carbon monoxide levels and frequent land collapses by allocating more than $42 million for relocation efforts. Most of the residents have long since moved, and their homes have been demolished, leaving behind a ghost town where a coal mining community once thrived (Fig 1).

Figure 1: Above: Centralia, PA prior to the evacuation in 1984. The town had over 1800 residents, several businesses and churches. Right: Old Route 61 through Centralia (taken in 1997) showing steam, rich in carbon monoxide, venting upward through cracks caused by land collapses.

As a result of this mine fire, surface soil temperatures in affected areas regularly exceed 60°C and soils surrounding the vents are often rich in combustion products such as sulfur and nitrogen that microbial communities can use and transform as a part of their energy-generating processes. In this case study, you will use information in papers that describe typical geothermal soils and their microbial communities to hypothesize a single bacterial genus that you would expect to find living in Centralia’s fire-affected soils. You will use metagenomic analysis to test your hypothesis and then make a presentation that reports your findings and predicts the types of impacts that members of your genus might be having on the Centralia ecosystem.
Goals

As a result of participation in these activities, students will be able to:
1. Explain each step in the generation and analysis of Next Gen metagenomic 16S rRNA sequence data.
2. Discuss the basic biology assumptions that underlie sequence analysis (e.g. evolution, structure and function, conservation = function).
3. Evaluate the strengths and weaknesses of the methods employed in Next Gen sequencing, including the impact that data quality has on bioinformatics analysis.
4. Choose and justify the appropriate methods for a specific Next Gen sequencing application.
5. Apply Next Gen sequencing methodologies to answer their own research questions.

Evaluation

The final evaluation of this project will be based on the successful completion of Team Application Activities and the Final Presentation.

Materials

Recommended Readings: A Metagenomics Primer

Computer Resources: Quantitative Insights into Microbial Ecology (QIIME) for Macs. Students may instead use the Windows version of QIIME, but must also install VirtualBox to run the program. Installation instructions for both platforms can be found at the QIIME website at: http://qiime.org/

Metagenomics Sequence Resources: Centralia Metagenomics files Cen95 and Cen125 are available through the GCAT-SEEK consortium at http://lycofs01.lycoming.edu/~gcatseek/index.html

Team Application Activities:

Activity #1 Students will learn about the history and biogeochemistry of the Centralia Mine Fire environment and will take the GCAT-SEEK pre-test.

Activity #2 Students will work in teams to familiarize themselves with metagenomics, Linux and QIIME, and will propose hypotheses regarding the types of microbial species they expect to see in thermophilic versus mesophilic soils in Centralia.
Activity #3 Students will use QIIME to test their hypotheses.

Activity #4 Students will complete their QIIME analysis and begin to prepare their presentations.

Final Presentation Each student team will present their metagenomic findings.

Figure 2: Steam from “Anthracite Smokers” in Centralia, PA carries dissolved combustion products, such as nitrogen and sulfur, to the surface through soil fractures. As the steam rises it cools and precipitates chemicals into the surrounding soils where they can be utilized and transformed by nitrogen and sulfur-cycling bacterial communities.

An Introduction to Next Generation Sequencing and Metagenomics

Next Generation Sequencing and Pyrosequencing

“Next generation (Next Gen) sequencing” is a term that encompasses a variety of DNA sequencing technologies, all of which have a common core approach: they use DNA polymerase to generate thousands or millions of relatively short (compared to traditional sequencing technologies) sequences of a DNA template concurrently. Thus, these sequencing technologies are often referred to as being ‘massively parallel’. They then differ in the manner in which they determine when (and which) base is added to the replicating DNA (that is, in how they actually “read” the sequence). For example, Ion Torrent sequencing uses the tiny pH change that happens each time a new phosphodiester bond is created to determine whether or not a particular base was added.

The data that we will be using was generated using a technology called ‘pyrosequencing’ (Figure 3). In this technology, a library is first made by either fractionating genomic DNA into smaller fragments (300-800 base pairs) or, as in our case, a specific gene from an environment can be amplified using PCR, and all of the copies of that gene serve as the template DNA.
Short adaptors (shown as A and B in the figure) are then ligated onto the ends of the template fragments. The first adaptor is used to attach the DNA fragments onto streptavidin-coated beads. The second adaptor is used for amplification and sequencing of the fragments. The DNA library is then treated to make it single-stranded, and immobilized onto the beads at a dilution that ensures that each bead contains only a single, unique DNA fragment (top row, left hand side). The bead-bound library is emulsified with PCR reagents in a water-in-oil mixture. Each bead is captured within its own microreactor where PCR amplification occurs. This results in approximately 10 million copies of a single sequence (top row) attached to each bead. The beads, containing the amplified, single-stranded, template DNA library, are then added to individual wells of a PicoTiterPlate (center row) that contains the DNA polymerase, sulfurylase and luciferase enzymes. The latter two enzymes will make a flash of light if DNA polymerase successfully adds a base to the growing end of a daughter DNA strand during the sequencing reactions (bottom row).

Figure 3: Pyrosequencing

The loaded PicoTiterPlate device is placed into the sequencer, which floods all of the wells (each well, as you remember, has a different DNA fragment in it) with sequencing reagents containing buffers, primers and one of the bases. Let’s say G is added to all of the wells first. Since each well has a unique piece of DNA, the G will be complementary to the first base of the template DNA in some (but not all) of the wells. Thus, DNA polymerase will only add it to the growing daughter strand in those (complementary) wells. Multiple G’s will be added at this time if the template strand has more than one C in a row (e.g. CCC in positions 1, 2 and 3). Addition of one or more nucleotide(s) generates a flash of light, as previously described.
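The flow-by-flow logic for a single well can be sketched in a few lines of Python. This is only a toy illustration, not how the sequencer's software actually works. The function names, the G-A-T-C flow order, and the idealized noise-free signal (one unit of light per base incorporated) are all assumptions made for this example.

```python
# Toy model of pyrosequencing in ONE well (illustrative sketch only).
# Assumptions: flow order G, A, T, C; perfect chemistry; the light signal
# equals the number of bases incorporated during that flow.

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def simulate_flowgram(template, flow_order="GATC", n_cycles=1):
    """Return a list of (flowed_base, signal) pairs for one template.

    `template` is the single-stranded template written in the order in
    which DNA polymerase reads it (position 1 first).
    """
    position = 0  # next template base waiting to be copied
    flows = []
    for _ in range(n_cycles):
        for base in flow_order:
            incorporated = 0
            # Polymerase keeps adding the flowed base for as long as it is
            # complementary to the next template base (homopolymer runs).
            while (position < len(template)
                   and COMPLEMENT[template[position]] == base):
                incorporated += 1
                position += 1
            flows.append((base, incorporated))  # 0 means no flash of light
    return flows


def read_from_flowgram(flows):
    """Rebuild the daughter-strand sequence from the recorded signals."""
    return "".join(base * signal for base, signal in flows)


if __name__ == "__main__":
    flows = simulate_flowgram("CCCTAG")
    print(flows)                      # [('G', 3), ('A', 1), ('T', 1), ('C', 1)]
    print(read_from_flowgram(flows))  # GGGATC
```

With the template CCCTAG, the first flow of G reports a signal of 3 because three C's in a row are copied at once. A well whose template begins with a different base would report 0 for that same flow.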
The signal strength of the flash is proportional to the number of nucleotides added, so a GGG sequence will have a light signal three times as bright as a single G. If the base that is added is not complementary to the template strand, no light will be generated. When an entire plate is flooded with the sequencing reagents in this manner, some of the wells will glow and some will not. The sequencer can detect the light flashes, and will record which of the wells incorporated a G. The wells are then washed, the next base is added (either A, T or C) and the whole process is repeated, sequentially. After each addition, the sequencer ‘reads’ which wells incorporated the new base. This process is then repeated many times, ultimately generating short (up to several hundred base pair) sequences of all of the unique fragments in all of the wells at the same time…massively parallel, indeed!

Metagenomic Analysis of Bacterial 16S rRNA genes

(Much of this content was pirated shamelessly (but with permission) from Regina Lamendella, Juniata College)

The term ‘metagenomics’ was originally coined by Jo Handelsman in the late 1990s and is currently defined as “the application of modern genomics techniques to the study of microbial communities directly in their natural environments”. Metagenomic analyses allow microbiologists to tap into the vast, uncultured/unculturable microbial diversity of our world. Recently, massively parallel next generation sequencing has become cost-effective and informative, allowing taxonomic profiling of microbial communities, and leading to consortia such as the Earth Microbiome Project, the Hospital Microbiome Project, the Human Microbiome Project and others that are tasked with uncovering the distribution of microorganisms within us and in our world.
The rRNA operon contains genes that encode structural and functional portions of the ribosome (Figure 4). This operon contains both highly conserved sequences that can be used to design ‘universal’ and taxon-specific PCR primers and highly variable regions that allow researchers to distinguish between taxa. Within this operon, the small subunit rRNA (16S rRNA) gene has been particularly valuable for phylogenetic analysis. A vast amount of sequence data for this gene exists in a variety of international databases, and this data can be used to design phylogenetically conserved probes that target both individual and closely related groups of microorganisms without cultivation. Some of the most well-curated databases of 16S rRNA sequences include Greengenes, the Ribosomal Database Project, and ARB-Silva (see references section for links to these databases).

Figure 4. Structure of the rRNA operon in bacteria. Figure from Principles of Biochemistry 4th Edition Pearson Prentice Hall Inc. 2006.

In preparation for this case study, soil was collected from three boreholes in Centralia, PA (37°C, 52°C and 60°C), and genomic DNA was directly isolated from the samples using the MoBio PowerSoil Kit. PCR with universal bacterial 16S rRNA primers was then used to make copies of all of the bacterial 16S rRNA genes in each of these samples. These PCR products were then used as the template for Roche 454 pyrosequencing at the Penn State University genomics lab. You will be using this data to test hypotheses regarding the types of bacteria that live in the hot soils overlying the Centralia, PA mine fire. But first, you must learn a bit about the program that you will be using to perform the analyses.