Part 3 - Susquehanna University

Spring 2015 BIOL 312: Microbiology

A Town on Fire: Metagenomic Analysis of Bacterial Communities in Soils Overlying the Centralia, Pennsylvania Mine Fire

Instructor: Dr. Tammy Tobin, Susquehanna University
E-Mail: tobinjan@susqu.edu

Team Application Activity #3: Statistical Analysis of Microbial Community Diversity and Composition

Names of Team Members:

Introduction

During the last class period, you assigned your metagenomic sequences to OTUs, picked a representative sequence for each OTU, made a .biom table to summarize your OTU data, and then processed the .biom tables into bar graphs to better visualize whether the results supported your hypothesis regarding the presence or absence of a single species in a Centralia soil sample. You were then asked to hypothesize which environmental parameter from the mapping file (temperature, pH, ammonia, nitrate, sulfate or total sulfur concentration) plays the largest role in determining microbial community diversity and structure in Centralia.

In this activity you will use QIIME to prepare your sequence data for phylogenetic analysis, to generate a phylogenetic tree from your sequence data (we will analyze that tree during the final class period), and to analyze how microbial community diversity varies with environmental conditions.

Diversity Analyses

We will be performing two different types of diversity analysis in this case study. Alpha diversity looks at the diversity within samples, in this case the OTU diversity within each borehole. Beta diversity describes the differences between samples. Both types of diversity analysis can be computed using QIIME. Since some of the diversity metrics we will be using require a phylogenetic tree, we will construct that first. We will not actually analyze the tree until the next session.
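To make the difference between alpha and beta diversity concrete, here is a minimal Python sketch. It is not part of the QIIME workflow. The OTU counts are made up (they are not Centralia data), alpha diversity is taken to be the simple count of observed OTUs, and beta diversity is illustrated with the Jaccard distance, which is an assumption chosen for simplicity rather than the metric used in this case study.

```python
# Hypothetical OTU counts for two made-up samples (not Centralia data).
otu_table = {
    "sample_A": {"OTU1": 40, "OTU2": 35, "OTU3": 25},
    "sample_B": {"OTU1": 90, "OTU4": 10},
}


def observed_otus(counts):
    """Alpha diversity as the number of OTUs with at least one sequence."""
    return sum(1 for n in counts.values() if n > 0)


def jaccard_distance(counts_a, counts_b):
    """Beta diversity as 1 - (shared OTUs / all OTUs seen in either sample)."""
    present_a = {otu for otu, n in counts_a.items() if n > 0}
    present_b = {otu for otu, n in counts_b.items() if n > 0}
    union = present_a | present_b
    if not union:
        return 0.0
    return 1 - len(present_a & present_b) / len(union)


# Alpha diversity is computed separately for each sample.
alpha = {name: observed_otus(counts) for name, counts in otu_table.items()}

# Beta diversity is computed for a pair of samples.
beta = jaccard_distance(otu_table["sample_A"], otu_table["sample_B"])

print(alpha)
print(beta)
```

Each sample gets its own alpha value, while the beta value belongs to the pair of samples: here the two samples share only one of the four OTUs seen overall.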
Using QIIME to Construct Phylogenetic Trees

The underlying assumption for all DNA sequence-based phylogenetic analyses is that the more closely related two species are evolutionarily, the more similar their DNA sequences will be. This assumption has some flaws that need to be kept in mind. As you have already learned, horizontal gene transfer between species can make two species look more (or less) related than they truly are. Also, not all DNA changes are equal in terms of phenotypic outcome. Some mutations are selectively neutral while others are not. Thus, selective pressures will affect the rate of nucleotide changes observed in different parts of the genome over time. Some analysis metrics take this into account by weighting base changes in different codon positions differently (to account for silent mutations, etc.). This is not done in 16S rRNA sequence analysis because there are no codons (no protein is produced).

1. Aligning sequences. In order to construct a phylogenetic tree, all of the 16S rRNA sequences in our quality-filtered fasta files first need to be aligned, so that the base changes observed between sequences reflect mutations at the same site in the gene rather than a comparison of two completely different parts of the gene. QIIME will also insert gaps, as needed, to account for the insertions and deletions of bases that occur during evolution. By way of example, take the three related phrases below:

    AFATCAT
    AFFATCAT
    TINYRATFEAREDAFFATCAT

If these phrases were compared in the default (left-justified) alignment above, they would show almost no similarity at all. Even phrases 1 and 2, which are obviously very similar, would have only two letters in common as they are currently aligned: the first A and F. The third phrase (beginning TI) would not match at all.
After those first two letters, almost every subsequent letter is different. Aligned versions of these phrases are shown below:

                 A_FATCAT
                 AFFATCAT
    TINYRATFEAREDAFFATCAT

In this scenario, QIIME has shifted phrases 1 and 2 to the right so that they line up with the corresponding part of phrase 3, and has also inserted a gap in phrase 1 to account for the additional F in phrases 2 and 3. This gives a much more accurate picture of the overall sequence identity.

The command for aligning sequences in QIIME is align_seqs.py, with your representative OTU fasta file as the input and aligned_sequences as the output directory. This step will take several minutes, so be patient and do not hit return or enter another command until you see the $ prompt. The total command is:

    align_seqs.py -i rep_set1.fna -o aligned_sequences

2. Next you will filter out uninformative sequence data. This script will remove positions that are gaps in every sequence (which can happen with some alignment programs), as well as positions such as 'TINYRATFEARED' in phrase 3 above, which are present in only one phrase and are therefore uninformative for tree building. The command for this is filter_alignment.py. The input file will be the aligned sequence file, and the output will be a folder entitled 'filtered_alignments'. The total command is:

    filter_alignment.py -i aligned_sequences/rep_set1_aligned.fasta -o filtered_alignments

3. Finally, you will ask QIIME to generate a phylogenetic tree from your filtered alignments using FastTree, a modified Neighbor-Joining method (it uses pairwise sequence comparisons to build the tree) that is much faster at metagenomics-level analyses than many other methods, but is still reliable (see article in References section for more details).
The command is:

    make_phylogeny.py -i filtered_alignments/rep_set1_aligned_pfiltered.fasta

You will be able to visualize trees in three basic ways, as shown below. The first two trees are 'rooted', while the last is not. Remember that no matter which way a tree is diagrammed, it shows phylogenetic relationships in the same way.

[Figure: three example phylogenetic trees]

1. In the first tree above, which taxon is most closely related to Nimravidae?

2. In the second tree above, which taxon is most closely related to Spirochetes?

That's it for phylogeny for now. We can next move on to calculating alpha diversity.

Calculating Alpha Diversity

To calculate alpha diversity, QIIME must first generate alpha rarefaction tables (in biom format). As you know from your readings, rarefaction data will not only provide information regarding the amount of diversity present within each sample, but will also help you determine whether you have sampled at a sufficient depth to reveal an acceptable level of the diversity present in your sample. We will use three different metrics to analyze alpha diversity: Faith's Phylogenetic Diversity, Heip's Evenness and total number of OTUs.

Faith's Phylogenetic Diversity is based on the phylogenetic tree you generated in the first part of this activity. This method adds up the branch lengths of the tree that connect the OTUs found in a sample as a measure of alpha diversity. Branch lengths are roughly analogous to the number of nucleotide changes represented in the tree. So, if you add a new OTU to a dataset that is closely related to another OTU in the sample (very few nucleotide changes between them), it will cause only a small increase in diversity. However, if you add a new OTU that comes from a totally different lineage than anything else in the sample, it will cause a much larger increase in diversity.

Heip's Evenness measures how close in abundance the OTUs in an environment are.
If a sample contains 100 sequences, and there are roughly equal numbers of sequences from each OTU, then Heip's evenness will be close to one. If, however, only one of the sequences came from one OTU, and 99 came from another, then the score will be very close to zero.

Total number of OTUs is exactly what it sounds like. If there are more OTUs, then the sample will be considered to be more diverse.

Alpha diversity is generated in four steps:

1. Generating rarefaction tables. In this step, QIIME will subsample the original OTU table at a variety of specified sequence depths, and will report the number of OTUs revealed at each depth. In the script we will use, the OTU table will first be subsampled 10 times at a depth of 10 sequences/sample, then 10 times at a depth of 120 sequences/sample, and so on until the maximum rarefaction depth is reached (we will stop at 1000 sequences/sample, although we could choose to look at even more sequences if our data indicate this would be beneficial). The step size is 110, which means that the sampling depth will be increased by 110 each time until 1000 sequences/sample is reached. This gives 10 sampling depths, and since each depth will be repeated 10 times, a total of 100 subsampled OTU tables will be generated in our output (multiple_rarefactions) folder. The script for this analysis is:

    multiple_rarefactions.py -i output.biom -o multiple_rarefactions -m 10 -x 1000 -n 10 -s 110

2. Next, the alpha diversity of the rarefied samples will be computed using the three different metrics.

    alpha_diversity.py -i multiple_rarefactions/ -o alpha_diversity -m PD_whole_tree,observed_species,heip_e -t filtered_alignments/rep_set1_aligned_pfiltered.tre

3. At this stage there are still a large number of separate files, so those will need to be collated into a single file per metric for graphing purposes. The command is:

    collate_alpha.py -i alpha_diversity/ -o collated_alpha_diversity

4. Now we can plot the results, including the original mapping data, so that we can see whether alpha diversity varies with sample site, chemical or temperature parameters.

    make_rarefaction_plots.py -i collated_alpha_diversity/ -m mapping_centralia.txt -o rarefaction_plots

5. Open the rarefaction_plots folder in your Centralia_Case_Study folder. Double click on rarefaction_plots.html to open it. From the "Select a Metric" drop-down, choose PD_whole_tree, and from the "Select a category" drop-down choose "Sample ID". You will now see Faith's Phylogenetic Diversity rarefaction curves for all three sample sites. Which sample has the highest species richness? Do you believe we have sufficiently sampled this location in order to see all of its microbial diversity? Justify your answer.

6. Change the metric to "Observed Species". Does this metric support your conclusions in question 5? Explain.

7. Finally, change the metric to "Heip's". Which microbial community shows the greatest evenness? How does species evenness differ from species richness?
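The numbers quoted in the alpha diversity section can be checked with a short Python sketch. It does not call QIIME. It assumes the maximum depth given with -x is included in the schedule, and it assumes the usual definition of Heip's evenness, (e^H - 1) / (S - 1), where H is the Shannon index computed with natural logarithms and S is the number of observed OTUs. The function names and the OTU counts are hypothetical and chosen only for illustration.

```python
import math


def rarefaction_depths(min_depth, max_depth, step):
    """Depths visited when stepping from min_depth up to max_depth (inclusive)."""
    return list(range(min_depth, max_depth + 1, step))


def heip_evenness(counts):
    """Heip's evenness, assumed here to be (exp(H) - 1) / (S - 1)."""
    counts = [c for c in counts if c > 0]
    total = sum(counts)
    s = len(counts)
    if s < 2:
        raise ValueError("need at least two observed OTUs")
    # Shannon index H, using natural logarithms.
    h = -sum((c / total) * math.log(c / total) for c in counts)
    return (math.exp(h) - 1) / (s - 1)


# Settings taken from the multiple_rarefactions.py command: -m 10 -x 1000 -s 110 -n 10
depths = rarefaction_depths(10, 1000, 110)
tables = len(depths) * 10  # 10 repetitions at each depth

# Two made-up samples of 100 sequences each.
even = heip_evenness([25, 25, 25, 25])  # four equally abundant OTUs
skewed = heip_evenness([1, 99])         # one sequence from one OTU, 99 from another

print(depths)
print(tables)
print(round(even, 3), round(skewed, 3))
```

The depth list starts at 10 and 120 and ends at 1000, which gives the 100 subsampled tables mentioned in step 1. The evenly distributed sample scores essentially one, while the 1-versus-99 sample scores close to zero, matching the description of Heip's evenness.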
