An Introduction to Next Generation Sequencing and Metagenomics

Spring 2015
BIOL 312: Microbiology
A Town on Fire
Metagenomic Analysis of Bacterial
Communities in Soils Overlying the
Centralia, Pennsylvania Mine Fire
Instructor: Dr. Tammy Tobin
Susquehanna University
E-Mail: tobinjan@susqu.edu
Overview
In 1962, a surface trash fire ignited an anthracite coal seam in an abandoned strip mine in Centralia, Pennsylvania. Repeated
efforts to extinguish the fire failed, and in 1984 Congress responded to the resulting high carbon monoxide levels and frequent land
collapses by allocating more than $42 million for relocation efforts. Most of the residents have long since moved, and their homes
have been demolished, leaving behind a ghost town where a coal mining community once thrived (Fig. 1).
Figure 1: Above: Centralia, PA prior to the evacuation in 1984. The town
had over 1800 residents, several businesses and churches. Right: Old Route 61
through Centralia (taken in 1997) showing steam, rich in carbon monoxide,
venting upward through cracks caused by land collapses.
As a result of this mine fire, surface soil temperatures in affected areas
regularly exceed 60°C and soils surrounding the vents are often rich in
combustion products such as sulfur and nitrogen that microbial communities
can use and transform as a part of their energy-generating processes.
In this case study, you will use information in papers that describe typical
geothermal soils and their microbial communities to hypothesize a single bacterial genus that you would expect to find living in
Centralia’s fire-affected soils. You will use metagenomic analysis to test your hypothesis and then make a presentation that reports
your findings and predicts the types of impacts that members of your genus might be having on the Centralia ecosystem.
Goals
As a result of participation in these activities, students will be able to:
1. Explain each step in the generation and analysis of Next Gen metagenomic 16S rRNA sequence data.
2. Discuss the basic biology assumptions that underlie sequence analysis (e.g. evolution, structure and function, conservation = function).
3. Evaluate the strengths and weaknesses of the methods employed in Next Gen sequencing, including the impact that data quality has on bioinformatics analysis.
4. Choose and justify the appropriate methods for a specific Next Gen sequencing application.
5. Apply Next Gen sequencing methodologies to solve their own research questions.
Evaluation
The final evaluation of this project will be based on the successful completion of
Team Application Activities and the Final Presentation.
Materials
Recommended Readings:
A Metagenomics Primer
Computer Resources:
Quantitative Insights into Microbial Ecology (QIIME) for Macs. Students may also use the Windows version of QIIME, but must also install Virtual Box to run the program. Installation instructions for both platforms can be found at the QIIME website at: http://qiime.org/
Metagenomics Sequence Resources:
Centralia Metagenomics files Cen95 and Cen125 are available through the GCAT-SEEK consortium at
http://lycofs01.lycoming.edu/~gcatseek/index.html
Team Application Activities:
Activity #1
Students will learn about the history and biogeochemistry of the Centralia Mine Fire environment and will take the GCAT SEEK pre-test.
Activity #2
Students will work in teams in order to familiarize themselves with metagenomics, LINUX and QIIME, and will propose hypotheses regarding the types of microbial species they expect to see in thermophilic versus mesophilic soils in Centralia.
Activity #3
Students will use QIIME to test their hypotheses.
Activity #4
Students will complete their QIIME analysis and begin to prepare their presentations.
Final Presentation
Each student team will present their metagenomic findings.
Figure 2: Steam from “Anthracite Smokers” in Centralia, PA carries dissolved
combustion products, such as nitrogen and sulfur, to the surface through soil
fractures. As the steam rises it cools and precipitates chemicals into the
surrounding soils where they can be utilized and transformed by nitrogen and
sulfur-cycling bacterial communities.
An Introduction to Next Generation Sequencing and Metagenomics
Next Generation Sequencing and Pyrosequencing
“Next generation (Next Gen) sequencing” is a term that encompasses a variety of DNA sequencing technologies, all of
which have a common core approach: they use DNA polymerase to generate thousands or millions of relatively short
(compared to traditional sequencing technologies) sequences of a DNA template concurrently. Thus, these sequencing
technologies are often referred to as being ‘massively parallel’. They then differ in the manner in which they determine
when (and which) base is added to the replicating DNA (that is, in how they actually “read” the sequence). For example,
Ion Torrent sequencing uses the tiny pH change that happens each time a new phosphodiester bond is created to
determine whether or not a particular base was added.
The data that we will be using was generated using a technology called ‘pyrosequencing’ (Figure 3). In this technology, a
library is first made by either fractionating genomic DNA into smaller fragments (300-800 base pairs) or, as in our case, a
specific gene from an environment can be amplified using PCR, and all of the copies of that gene serve as the template
DNA. Short adaptors (shown as A and B in the figure) are then ligated onto the ends of the template fragments. The first
adaptor is used to attach the DNA fragments onto streptavidin-coated beads. The second adaptor is used for amplification
and sequencing of the fragments. The DNA library is then treated to make it single-stranded, and immobilized onto the
beads at a dilution that ensures that each bead contains only a single, unique DNA fragment (top row, left hand side).
The bead-bound library is emulsified with PCR reagents in a water-in-oil mixture. Each bead is captured within its own
microreactor, where PCR amplification occurs. This results in approximately 10 million copies of a single sequence (top
row) attached to each bead.
The beads, containing the amplified, single-stranded template DNA library, are then added to individual wells of a
PicoTiterPlate (center row) that contains the DNA polymerase, sulfurylase and luciferase enzymes. The latter two enzymes
will make a flash of light if DNA polymerase successfully adds a base to the growing end of a daughter DNA strand during
the sequencing reactions (bottom row): each base addition releases pyrophosphate, which sulfurylase converts to ATP, and
luciferase uses that ATP to produce light.
Figure 3: Pyrosequencing
The loaded PicoTiterPlate device is placed into the sequencer, which floods all of the wells (each well, as you remember,
has a different DNA fragment in it) with sequencing reagents containing buffers, primers and one of the bases. Let’s say G
is added to all of the wells first. Since each well has a unique piece of DNA, the G will be complementary to the first base
of the template DNA in some (but not all) of the wells. Thus, DNA polymerase will only add it to the growing daughter
strand in those (complementary) wells. Multiple G’s will be added at this time if the template strand has more than one C
in a row (e.g. CCC in positions 1, 2 and 3). Addition of one or more nucleotide(s) generates a flash of light, as previously
described. The signal strength of the flash is proportional to the number of nucleotides added, so a GGG sequence will
have a light signal three times as bright as a single G. If the base that is added is not complementary to the template strand,
no light will be generated.
When an entire plate is flooded with the sequencing reagents in this manner, some of the wells will glow and some will
not. The sequencer can detect the light flashes, and will record which of the wells incorporated a G. The wells are then
washed, the next base is added (either A, T or C) and the whole process is repeated, sequentially. After each addition, the
sequencer ‘reads’ which wells incorporated the new base. This process is then repeated many times, ultimately generating
short (up to several hundred base pair) sequences of all of the unique fragments in all of the wells at the same
time…massively parallel, indeed!
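The flow-by-flow logic described above can be sketched in a few lines of Python. This is a simplified illustration for a single well, not the sequencer's actual software: the function names, the G-A-T-C flow order and the example template are assumptions chosen for the example, and real light signals are noisy rather than whole numbers.

```python
# Simplified, idealized model of pyrosequencing in ONE well.
# Assumptions (not from the handout): flow order G, A, T, C; noise-free signals;
# the template is written in the order the polymerase copies it.

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def simulate_flowgram(template, flow_order="GATC", max_flows=40):
    """Return a list of (flowed_base, signal) pairs for one well.

    The signal is the number of bases added during that flow, which models
    a light flash whose brightness is proportional to the homopolymer length.
    """
    flows = []
    pos = 0  # next template position waiting to be copied
    for i in range(max_flows):
        if pos >= len(template):
            break  # the whole template has been copied
        base = flow_order[i % len(flow_order)]
        signal = 0
        # Keep adding the flowed base while it pairs with the template.
        while pos < len(template) and COMPLEMENT[template[pos]] == base:
            signal += 1
            pos += 1
        flows.append((base, signal))  # signal 0 means the well stayed dark
    return flows


def call_bases(flows):
    """Turn the recorded flashes back into the daughter-strand sequence."""
    return "".join(base * signal for base, signal in flows)


flows = simulate_flowgram("CCCTAG")
read = call_bases(flows)
print(flows)  # the first flow (G) gives a signal of 3 for the CCC run
print(read)
```

A well whose template begins with CCC gives a triple-strength signal on the first G flow, while a well whose template begins with any other base records a signal of 0 for that flow and waits for a later base.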
Metagenomic Analysis of Bacterial 16S rRNA genes (Much of this content was pirated
shamelessly (but with permission) from Regina Lamendella, Juniata College)
The term ‘metagenomics’ was originally coined by Jo Handelsman in the late 1990s and is currently defined as “the
application of modern genomics techniques to the study of microbial communities directly in their natural environments”.
Metagenomic analyses allow microbiologists to tap into the vast, uncultured/unculturable microbial diversity of our
world. Recently, massively parallel next generation sequencing has become cost-effective and informative, allowing
taxonomic profiling of microbial communities, and leading to consortia such as the Earth Microbiome Project, the
Hospital Microbiome Project, the Human Microbiome Project and others that are tasked with uncovering the distribution
of microorganisms within us and in our world.
The rRNA operon contains genes that encode structural and functional portions of the ribosome (Figure 4). This operon
contains both highly conserved sequences that can be used to design ‘universal’ and taxon-specific PCR primers and highly
variable regions that allow researchers to distinguish between taxa. Within this operon, the small subunit
RNA (16S rRNA) gene has been particularly valuable for phylogenetic analysis. A vast amount of sequence data for this
gene exists in a variety of international databases, and this data can be used to design phylogenetically conserved probes
that target both individual and closely related groups of microorganisms without cultivation. Some of the most well
curated databases of 16S rRNA sequences include Greengenes, the Ribosomal Database Project, and ARB-Silva (see
references section for links to these databases).
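To make the idea of conserved versus variable regions concrete, the short Python sketch below scores each column of a small alignment by the fraction of sequences that share the most common base. The sequences and taxon names are made up for illustration and are not real 16S rRNA data, and the scoring function is only one simple way to measure conservation.

```python
from collections import Counter

# Hypothetical, made-up aligned fragments (NOT real 16S rRNA sequences).
toy_alignment = {
    "taxon_A": "ACGTGGCATTAC",
    "taxon_B": "ACGTGACTTTAC",
    "taxon_C": "ACGTGTGATTAC",
    "taxon_D": "ACGTGCACTTAC",
}


def column_conservation(seqs):
    """For each alignment column, return the fraction of sequences
    that carry the most common base at that position."""
    scores = []
    for column in zip(*seqs):  # walk through the alignment column by column
        most_common_count = Counter(column).most_common(1)[0][1]
        scores.append(most_common_count / len(column))
    return scores


scores = column_conservation(list(toy_alignment.values()))
conserved = [i for i, s in enumerate(scores) if s == 1.0]
variable = [i for i, s in enumerate(scores) if s < 1.0]
print("conserved columns:", conserved)
print("variable columns:", variable)
```

In this toy example the fully conserved columns at either end are the kind of region where a ‘universal’ primer could bind, while the variable columns in the middle are the positions that would distinguish one taxon from another.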
Figure 4. Structure of the rRNA operon in bacteria. Figure from Principles of Biochemistry 4th Edition Pearson Prentice
Hall Inc. 2006.
In preparation for this case study, soil was collected from three boreholes in Centralia, PA (37°C, 52°C and 60°C), and
genomic DNA was directly isolated from the samples using the MoBio PowerSoil Kit. PCR with universal bacterial 16S
rRNA primers was then used to make copies of all of the bacterial 16S rRNA genes in each of these samples. These PCR
products were then used as the template for Roche 454 pyrosequencing at the Penn State University genomics lab. You
will be using this data to test hypotheses regarding the types of bacteria that live in the hot soils overlying the Centralia, PA
mine fire. But first, you must learn a bit about the program that you will be using to perform the analyses.