Part 4 - Susquehanna University

Spring 2015
BIOL 312: Microbiology
A Town on Fire
Metagenomic Analysis of Bacterial
Communities in Soils Overlying the
Centralia, Pennsylvania Mine Fire
Instructor: Dr. Tammy Tobin
Susquehanna University
E-Mail: tobinjan@susqu.edu
Team Application Activity #4: Beta Diversity and Phylogenetic
Analysis of Bacterial Communities.
Names of Team Members:
Introduction: During the last class period, you developed a phylogenetic tree from your metagenomics data, and then performed
alpha diversity analyses. Today, you will complete your metagenomics analyses by using beta diversity and phylogenetic analysis to
track community response to environmental parameters.
Beta Diversity
Beta-diversity metrics assess the differences between microbial communities. The fundamental output of these comparisons is a square
matrix where a “distance” or dissimilarity is calculated between every pair of community samples (S1 vs S2, S1 vs S3, and S2 vs S3),
reflecting the dissimilarity between those samples. The data in this distance matrix can be visualized with analyses such as Principal
Coordinate Analysis (PCoA). Like alpha diversity, there are many possible metrics that can be calculated with the QIIME pipeline - the
full list of options can be found at beta diversity metrics. Here, we will calculate beta diversity between our 3 microbial communities
using the default beta diversity metrics of weighted and unweighted UniFrac, which are phylogenetic measures used extensively in recent
microbial community sequencing projects. To perform this analysis, we will complete the following steps:
1. Rarefy OTU table (for more information, refer to single_rarefaction.py)
This step is slightly different from the rarefaction performed during the alpha diversity analysis, although both have the same general
goal. Rarefaction is an ecological approach that is done to remove sample heterogeneity in OTU tables. It allows users to standardize
the data obtained from samples with different sequencing efforts, and to compare the OTU richness of the samples using a
standardized platform. For instance, if one of your samples yielded 10,000 sequence counts, and another yielded only 1,000 counts,
the species diversity within those samples may be much more influenced by the number of sequences than by any underlying biology.
The approach of rarefaction is to randomly sample the same number of sequences from each sample (in this case 1,000), and use these data
to compare the communities. The script for this analysis is:
single_rarefaction.py -i output.biom -d 1000 -o single_rarefaction
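To make the idea of rarefaction concrete, the following is a minimal Python sketch of random subsampling without replacement. It is an illustration only and not the code that single_rarefaction.py runs; the function name rarefy, the OTU names, and the counts are all hypothetical. The sketch assumes that a sample with fewer sequences than the requested depth is simply dropped.

```python
import random


def rarefy(otu_counts, depth, seed=None):
    """Randomly draw `depth` sequences, without replacement, from one sample.

    otu_counts maps an OTU id to the number of sequences seen for that OTU.
    Returns a dict of subsampled counts, or None when the sample holds fewer
    than `depth` sequences (assumption: such samples are dropped).
    """
    total = sum(otu_counts.values())
    if total < depth:
        return None
    rng = random.Random(seed)
    # Expand the counts into one list entry per sequence, then subsample.
    pool = [otu for otu, n in otu_counts.items() for _ in range(n)]
    rarefied = {}
    for otu in rng.sample(pool, depth):
        rarefied[otu] = rarefied.get(otu, 0) + 1
    return rarefied


# Hypothetical samples: one sequenced deeply (10,000), one shallowly (1,000).
deep_sample = {"OTU_1": 6000, "OTU_2": 3000, "OTU_3": 900, "OTU_4": 100}
shallow_sample = {"OTU_1": 700, "OTU_2": 300}

rarefied_deep = rarefy(deep_sample, 1000, seed=42)
rarefied_shallow = rarefy(shallow_sample, 1000, seed=42)
print(sum(rarefied_deep.values()), sum(rarefied_shallow.values()))
```

After this step both samples contain exactly 1,000 sequences, so any difference in the number of OTUs observed is no longer a side effect of sequencing depth.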
2. Compute Beta Diversity (for more information, refer to beta_diversity.py)
Beta-diversity metrics assess the differences between microbial communities. The distance is calculated between pairs of samples (each
sample represents an organismal community). All taxa found in one or both samples are placed on a phylogenetic tree. A branch leading to
taxa from both samples is marked as "shared", and a branch leading to taxa that appear in only one sample is marked as "unshared".
The distance between the two samples is then calculated as (the sum of "unshared" branch lengths)/(the sum of all tree branch lengths (=
shared+unshared)), i.e. the fraction of total branch length that is unshared. This is the unweighted UniFrac measurement; weighted UniFrac
additionally accounts for the relative abundance of each of the taxa within the communities. We will analyze weighted UniFrac, which
works well for large datasets such as metagenomic data. The script for this analysis is
beta_diversity.py -i output.biom -o beta_diversity -t centralia_assigned_taxonomy/rep_set1_tax_assignments.txt
Both weighted and unweighted UniFrac matrices are generated in this analysis. The weighted UniFrac matrix will be the basis for later
analysis steps (principal coordinate analysis, in our case).
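The shared/unshared branch calculation can be sketched in a few lines of Python. This is an illustration under stated assumptions, not QIIME's implementation: the tree is hypothetical, ((A:1,B:1):1,(C:1,D:1):1), and is written as a list of branches, each with its length and the set of tips below it. The sample counts are invented. The weighted function shown is the raw (unnormalized) form, in which each branch length is multiplied by the difference between the fractions of each sample's sequences found below that branch.

```python
def unweighted_unifrac(branches, counts_a, counts_b):
    """Fraction of branch length that leads to taxa found in only one sample."""
    shared = unshared = 0.0
    for length, tips in branches:
        in_a = any(counts_a.get(t, 0) > 0 for t in tips)
        in_b = any(counts_b.get(t, 0) > 0 for t in tips)
        if in_a and in_b:
            shared += length
        elif in_a or in_b:
            unshared += length
    total = shared + unshared
    return unshared / total if total else 0.0


def weighted_unifrac(branches, counts_a, counts_b):
    """Raw weighted UniFrac: branch lengths weighted by abundance differences."""
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    distance = 0.0
    for length, tips in branches:
        frac_a = sum(counts_a.get(t, 0) for t in tips) / total_a
        frac_b = sum(counts_b.get(t, 0) for t in tips) / total_b
        distance += length * abs(frac_a - frac_b)
    return distance


# Hypothetical tree ((A:1,B:1):1,(C:1,D:1):1) as (branch length, tips below it).
branches = [
    (1.0, {"A"}), (1.0, {"B"}), (1.0, {"A", "B"}),
    (1.0, {"C"}), (1.0, {"D"}), (1.0, {"C", "D"}),
]
# Hypothetical sequence counts per taxon in three samples.
samples = {
    "S1": {"A": 8, "B": 2},
    "S2": {"B": 5, "C": 5},
    "S3": {"C": 1, "D": 9},
}

names = sorted(samples)
# Square distance matrix: one value for every pair of samples.
matrix = [[unweighted_unifrac(branches, samples[r], samples[c]) for c in names]
          for r in names]
for name, row in zip(names, matrix):
    print(name, ["%.2f" % d for d in row])
print("weighted S1 vs S2:", weighted_unifrac(branches, samples["S1"], samples["S2"]))
```

In this toy example S1 and S3 share no taxa, so every branch they touch is unshared and their unweighted distance is the maximum value of 1.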
3. Generate Principal Coordinates (for more information, refer to principal_coordinates.py)
Principal Coordinate Analysis (PCoA) is a technique that helps to extract and visualize a few highly informative components of
variation from complex, multidimensional data. The principal coordinates can be plotted in two or three dimensions to provide an
intuitive visualization of the data structure, showing both the differences between samples and the similarities among samples in the same category.
The script for this analysis is
principal_coordinates.py -i beta_diversity/weighted_unifrac_output.txt -o weighted_pc_matrix
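For readers curious about what happens inside a PCoA, here is a minimal, dependency-free Python sketch. It assumes a small hypothetical three-sample distance matrix and is not the code used by principal_coordinates.py. The distances are squared and double-centered, and the leading eigenvectors of the resulting matrix, scaled by the square roots of their eigenvalues, become the plotting coordinates. To avoid external libraries, the eigenvectors are found here by power iteration with deflation; a real analysis would use a full eigenvalue solver.

```python
import math


def double_center(dist):
    """Convert distances to a centered matrix B = -1/2 * J * D^2 * J."""
    n = len(dist)
    a = [[-0.5 * dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in a]  # equals column means (symmetric)
    grand_mean = sum(row_mean) / n
    return [[a[i][j] - row_mean[i] - row_mean[j] + grand_mean
             for j in range(n)] for i in range(n)]


def top_eigenpair(matrix, iterations=500):
    """Largest-magnitude eigenvalue and unit eigenvector by power iteration."""
    n = len(matrix)
    vec = [1.0 / (i + 1) for i in range(n)]  # arbitrary non-constant start
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm < 1e-12:  # nothing left to extract
            return 0.0, vec
        vec = [x / norm for x in nxt]
    value = sum(vec[i] * matrix[i][j] * vec[j]
                for i in range(n) for j in range(n))
    return value, vec


def pcoa(dist, n_axes=2):
    """Return (eigenvalues, coordinates) for the first n_axes axes."""
    b = double_center(dist)
    n = len(b)
    eigenvalues = []
    coords = [[] for _ in range(n)]
    for _ in range(n_axes):
        value, vec = top_eigenpair(b)
        eigenvalues.append(value)
        scale = math.sqrt(max(value, 0.0))
        for i in range(n):
            coords[i].append(vec[i] * scale)
        # Deflate: remove the axis just found before searching for the next.
        b = [[b[i][j] - value * vec[i] * vec[j] for j in range(n)]
             for i in range(n)]
    return eigenvalues, coords


# Hypothetical distance matrix for samples S1, S2 and S3.
dist = [
    [0.0, 0.6, 1.0],
    [0.6, 0.0, 0.6],
    [1.0, 0.6, 0.0],
]
eigenvalues, coords = pcoa(dist)
for name, point in zip(["S1", "S2", "S3"], coords):
    print(name, [round(x, 3) for x in point])
```

The sign of each axis is arbitrary, so a plot may appear mirrored from one run or program to another; only the relative positions of the samples carry meaning.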
4. Make Preference Files
This script allows you to specify the coloring of the 2D plots. The script is:
make_prefs_file.py -i mapping_centralia.txt -o prefs_file.txt
5. Generate 2D PCoA plots
The output of this script is an html file containing the 2D plots that will allow you to see if samples from similar environmental
conditions also have similar compositions. This works best when there are several samples from each environmental
location. The command for this step is:
make_2d_plots.py -i weighted_pc_matrix -m mapping_centralia.txt -o 2d_plots_weighted
6. Now you can open the resulting 2D plots folder in your Centralia Case Study folder. You should see an html file. Open
it and you will see 2D PCoA graphs for every metadata category. The first one is ammonia. If you mouse over the colored dots,
you will see that they represent each of our sampling locations. If this metagenomic dataset had replicates from each borehole,
and if ammonia were important in driving the comparative community diversity between these samples, then we would expect to
see all of the samples from S1 clustering together in the same quadrant.
Phylogenetic Analysis
We will now move to the analysis of our phylogenetic trees using Topiary Explorer
(http://topiaryexplorer.sourceforge.net/user_guide/quickstart.html). Your main goal in this analysis is to see if your taxon of interest is
predominantly associated with a particular environmental parameter...and to get a feel for just how massively huge these metagenomic
trees really are.
1. cd to the te_1.0 folder, and then open the Topiary Explorer program by typing
javaws topiaryexplorer.jnlp
Java 7 will now open the Topiary Explorer window. Click on the “I accept this risk” button and then open the application.
2. Click on New Project, navigate to the folder containing your tree file (in filtered sequences file...it has the suffix .tre) and open the
file. Tip Data is your taxonomic assignment file (centralia_assigned_taxonomy/rep_set1_tax_assignments.txt), OTU Data is your
output.biom table, and Sample Data is your mapping_centralia.txt file.
3. Two new windows will open. These will be referred to as the “Tree window” (below) and the “Topiary Explorer window” (shown in
step 4), respectively. Your tree window will appear something like this:
4. Drag the “Collapse tree” slider at the top of the new Tree Window to the right to fully uncollapse the tree. The big ugly wedge should
now resolve itself into branches.
5. Coloring branches. Next we’ll use metadata to color the tree’s branches. Move to the Tree Window and expand the Branch panel by
clicking the word Branch in the toolbar on the left (which we’ll refer to as the Tree Toolbar). Click the “Color By...” button, then select
Sample Metadata, and select the metadata category that you hypothesized would have the largest impact on the community in your
borehole (Temp, pH, etc.).
By default each option for the category will be colored randomly. To change this, switch to the Topiary Explorer window. The Color Key
toolbar on the left is used to choose colors on a per-category basis. To change the color for a given value, click the small colored box,
which will open the “Pick a Color” window. To sort the values, click on the header of the color key toolbar with the name of your
chosen metadata category. Choose any colors that make you happy, then click the Apply Colors button. The branches should now be
colored by environmental parameter.
Next uncheck the majority coloring checkbox in the Tree Edit Toolbar.
6. Labeling Branch Tips. Expand the Nodes panel by clicking on the word Node in the Tree Toolbar, and then click on the label tips
box. Choose Tip Data and Category 1, which is the taxonomic assignment of the OTU.
7. Search for your taxon by typing its exact name (Geobacillus or Firmicutes, etc.) in the search box at the bottom right of the program
window. Each time your taxon appears in the tree it will now be highlighted with a red line.
9. In order to study the tree more carefully, you need to focus on smaller subtrees of a larger tree. To view a subtree more closely, right
click (control click) on the root node of the subtree of interest (the one that contains your taxon of interest) as shown to the left and then
click "View Subtree in new Window". Drag the “Collapse tree” slider at the top of the new Tree Window to the right to fully uncollapse
the tree.
10. Is your taxon associated with your particular environmental parameter? Play around with the branch color designations. Is it
associated with any of the other environmental parameters?
11. To save your modified metadata and trees, click the Save Project button at the top of the Topiary Explorer window. This will create
a new .tep file that will allow you to pick up where you’ve left off.
12. To save this view of the tree as a PDF, choose ‘File > Export Tree Image’ in the Topiary Explorer window. Type a name for the
tree in the ‘Save as...’ field. The extension .pdf will be automatically added. Just enter a file name - not a path. Click the Export
button, and the PDF will be opened in your default PDF viewer. From there you can save the file to wherever you’d like in your file
system. After clicking the Export button it may take several seconds before the resulting PDF is generated and opened.