Spring 2015 BIOL 312: Microbiology
A Town on Fire
Metagenomic Analysis of Bacterial Communities in Soils Overlying the Centralia, Pennsylvania Mine Fire
Instructor: Dr. Tammy Tobin, Susquehanna University
E-Mail: tobinjan@susqu.edu

Team Application Activity #4: Beta Diversity and Phylogenetic Analysis of Bacterial Communities

Names of Team Members:

Introduction:
During the last class period, you developed a phylogenetic tree from your metagenomics data and then performed alpha diversity analyses. Today, you will complete your metagenomics analyses by using beta diversity and phylogenetic analysis to track community response to environmental parameters.

Beta Diversity
Beta diversity metrics assess the differences between microbial communities. The fundamental output of these comparisons is a square matrix in which a “distance” (dissimilarity) is calculated between every pair of community samples (S1 vs S2, S1 vs S3, and S2 vs S3). The data in this distance matrix can be visualized with analyses such as Principal Coordinate Analysis (PCoA). As with alpha diversity, many possible metrics can be calculated with the QIIME pipeline; the full list of options can be found under beta diversity metrics in the QIIME documentation. Here, we will calculate beta diversity between our 3 microbial communities using the default beta diversity metrics, weighted and unweighted UniFrac, which are phylogenetic measures used extensively in recent microbial community sequencing projects. To perform this analysis, we will carry out the following steps:

1. Rarefy OTU table (for more information, refer to single_rarefaction.py)
This step is slightly different from the rarefaction performed during the alpha diversity analysis, although both have the same general goal.
Rarefaction is an ecological approach used to remove heterogeneity in sampling depth from OTU tables. It allows users to standardize the data obtained from samples with different sequencing efforts and to compare the OTU richness of the samples on an equal footing. For instance, if one of your samples yielded 10,000 sequence counts and another yielded only 1,000, the apparent species diversity within those samples may be influenced much more by the number of sequences than by any underlying biology. Rarefaction randomly draws the same number of sequences from each sample (in this case 1000) and uses the resulting data to compare the communities. The script for this analysis is:

    single_rarefaction.py -i output.biom -d 1000 -o single_rarefaction

2. Compute Beta Diversity (for more information, refer to beta_diversity.py)
Beta diversity metrics assess the differences between microbial communities. The distance is calculated between pairs of samples (each sample represents an organismal community). All taxa found in one or both samples are placed on a phylogenetic tree. A branch leading to taxa from both samples is marked as "shared", and a branch leading to taxa that appear in only one sample is marked as "unshared". The distance between the two samples is then calculated as

    (sum of "unshared" branch lengths) / (sum of all tree branch lengths, shared + unshared)

that is, the fraction of the total branch length that is unshared. This fraction is the unweighted UniFrac distance; weighted UniFrac additionally weights each branch by the difference, between the two samples, in the relative abundance of the taxa descending from it. We will analyze weighted UniFrac, which works well for large datasets such as metagenomic data because it accounts for the relative abundance of each of the taxa within the communities. The script for this analysis is:

    beta_diversity.py -i output.biom -o beta_diversity -t centralia_assigned_taxonomy/rep_set1_tax_assignments.txt

Both weighted and unweighted UniFrac matrices are generated in this analysis.
The weighted UniFrac matrix will be the basis for later analysis steps (principal coordinate analysis, in our case).

3. Generate Principal Coordinates (for more information, refer to principal_coordinates.py)
Principal Coordinate Analysis (PCoA) is a technique that helps to extract and visualize a few highly informative components of variation from complex, multidimensional data. The principal coordinates can be plotted in two or three dimensions to provide an intuitive visualization of the data structure, showing differences between samples and similarities within sample categories. The script for this analysis is:

    principal_coordinates.py -i beta_diversity/weighted_unifrac_output.txt -o weighted_pc_matrix

4. Make Preference Files
This script allows you to specify the coloring of the 2D plots. The script is:

    make_prefs_file.py -i mapping_centralia.txt -o prefs_file.txt

5. Generate 2D PCoA plots
The output of this script is an html file containing the 2D plots, which let you see whether samples from similar environmental conditions also have similar community compositions. This works best when there are several samples from each environmental location. The command for this step is:

    make_2d_plots.py -i weighted_pc_matrix -m mapping_centralia.txt -o 2d_plots_weightedd

6. Now you can open the resulting 2d plots folder in your Centralia Case Study folder. You should see an html file. Open it and you will see 2D PCoA graphs for every metadata category. The first one is ammonia. If you mouse over the colored dots, you will see that they represent each of our sampling locations. If this metagenomic dataset had replicates from each borehole, and if ammonia were an important driver of the differences in community composition between these samples, then we would expect to see all of the samples from S1 clustering together in the same quadrant.
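To make steps 1 and 2 above more concrete, here is a small Python sketch of the two ideas: drawing an equal number of sequences from a sample, and computing the unshared fraction of branch length between two samples. It is only a toy illustration, not how QIIME implements these scripts. The function names (rarefy, unweighted_unifrac), the four-OTU tree (BRANCHES) and the OTU names A through D are all made up for this example.

```python
import random

def rarefy(counts, depth, seed=0):
    """Draw `depth` sequences at random, without replacement, from one sample.

    `counts` maps a (hypothetical) OTU name to its sequence count.
    """
    pool = [otu for otu, n in counts.items() for _ in range(n)]
    if len(pool) < depth:
        raise ValueError("sample has fewer sequences than the requested depth")
    rng = random.Random(seed)
    rarefied = {}
    for otu in rng.sample(pool, depth):
        rarefied[otu] = rarefied.get(otu, 0) + 1
    return rarefied

# Hypothetical toy tree: each branch is (length, set of OTU tips beneath it).
BRANCHES = [
    (1.0, {"A"}),
    (1.0, {"B"}),
    (0.5, {"A", "B"}),
    (2.0, {"C"}),
    (1.5, {"D"}),
    (0.5, {"C", "D"}),
]

def unweighted_unifrac(otus1, otus2, branches):
    """Fraction of branch length that leads to taxa from only one sample."""
    shared = 0.0
    unshared = 0.0
    for length, tips in branches:
        in1 = bool(tips & otus1)
        in2 = bool(tips & otus2)
        if in1 and in2:
            shared += length      # branch leads to taxa from both samples
        elif in1 or in2:
            unshared += length    # branch leads to taxa from one sample only
        # branches leading to neither sample are left out of the tree
    total = shared + unshared
    if total == 0:
        raise ValueError("neither sample contains any OTU on the tree")
    return unshared / total

# Rarefy a sample of 10,000 sequences down to 1,000.
sample = {"A": 6000, "B": 3000, "C": 1000}
print(sum(rarefy(sample, 1000).values()))  # 1000

# Compare two samples by which OTUs are present in each.
print(unweighted_unifrac({"A", "B"}, {"B", "C"}, BRANCHES))  # 0.7
```

In the last comparison, 3.5 units of branch length lead to taxa found in only one sample and 1.5 units are shared, so the distance is 3.5 / 5.0 = 0.7. Two identical samples give 0, and two samples with no branches in common give 1.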
Phylogenetic Analysis
We will now move to the analysis of our phylogenetic trees using Topiary Explorer (http://topiaryexplorer.sourceforge.net/user_guide/quickstart.html). Your main goal in this analysis is to see if your taxon of interest is predominantly associated with a particular environmental parameter, and to get a feel for just how massive these metagenomic trees really are.

1. cd to the te_1.0 folder, and then open the Topiary Explorer program by typing

    javaws topiaryexplorer.jnlp

Java 7 will now open the Topiary Explorer window. Click on the “I accept this risk” button and then open the application.

2. Click on New Project, navigate to the folder containing your tree file (look under filtered sequences; the tree file has the suffix .tre) and open the file. Tip Data is your taxonomic assignment file (centralia_assigned_taxonomy/rep_set1_tax_assignments.txt), OTU Data is your output.biom table, and Sample Data is your mapping_centralia.txt file.

3. Two new windows will open. These will be referred to as the “Tree Window” (below) and the “Topiary Explorer window” (shown in step 4), respectively. Your Tree Window will appear something like this:

[Figure: Tree Window]

4. Drag the “Collapse tree” slider at the top of the new Tree Window to the right to fully uncollapse the tree. The big ugly wedge should now resolve itself into branches.

5. Coloring branches. Next we’ll use metadata to color the tree’s branches. Move to the Tree Window and expand the Branch panel by clicking the word Branch in the toolbar on the left (which we’ll refer to as the Tree Toolbar). Click the “Color By...” button, then select Sample Metadata, and select the metadata category that you hypothesized would have the largest impact on the community in your borehole (Temp, pH, etc.). By default, each value in the category will be colored randomly. To change this, switch to the Topiary Explorer window.
The Color Key toolbar on the left is used to choose colors on a per-category basis. To change the color for a given value, click the small colored box, which will open the “Pick a Color” window. To sort the values, click the header of the Color Key toolbar that shows the name of your chosen metadata category. Choose any colors that make you happy, then click the Apply Colors button. The branches should now be colored by environmental parameter. Next, uncheck the majority coloring checkbox in the Tree Edit Toolbar.

6. Labeling Branch Tips. Expand the Nodes panel by clicking on the word Node in the Tree Toolbar, and then click on the label tips box. Choose Tip Data and Category 1, which is the taxonomic assignment of the OTU.

7. Search for your taxon by typing its exact name (Geobacillus, Firmicutes, etc.) in the search box at the bottom right of the program window. Each place your taxon appears in the tree will now be highlighted with a red line.

9. In order to study the tree more carefully, you need to focus on smaller subtrees of the larger tree. To view a subtree more closely, right-click (or Control-click) on the root node of the subtree of interest (the one that contains your taxon of interest) as shown to the left, and then click "View Subtree in new Window". Drag the “Collapse tree” slider at the top of the new Tree Window to the right to fully uncollapse the tree.

10. Is your taxon associated with your particular environmental parameter? Play around with the branch color designations. Is it associated with any of the other environmental parameters?

11. To save your modified metadata and trees, click the Save Project button at the top of the Topiary Explorer window. This will create a new .tep file that will allow you to pick up where you left off.

12. To save this view of the tree as a PDF, choose ‘File > Export Tree Image’ in the Topiary Explorer window.
Type a name for the tree in the ‘Save as...’ field. The extension .pdf will be added automatically. Just enter a file name, not a path. Click the Export button, and the PDF will be opened in your default PDF viewer. From there you can save the file wherever you’d like in your file system. After you click the Export button, it may take several seconds before the resulting PDF is generated and opened.