Creosote and Black Grama Root Endophytes: 454 Pyrosequencing / QIIME Exercise
Fall 2015

Background

This exercise is adapted from one that Dan Colman created for an Antarctic bacterial 16S ribosomal RNA dataset from Tina Vesbach's laboratory. Those data were used in a publication by Dave Van Horn et al. that has been uploaded to the class website.

The goal here is to compare our class's root fungal endophyte ITS sequences from black grama and creosote. There are three types of samples:
1) creosote from a stand near the south end of McKenzie Flats,
2) black grama from within the creosote stand, and
3) black grama outside the creosote stand (a few km away).
The collections were designed to determine the effect of plant species, and to some extent location, in shaping the fungal root endophyte community. The sample names are abbreviated CrB (creosote), GiB (grass in) and GoB (grass out). There are 17 samples in total.

This exercise uses a fairly large number of independent programs that have been bundled into a package called QIIME (pronounced "chime"), which stands for Quantitative Insights Into Microbial Ecology. It is perhaps the most widely used software package for microbial ecology studies based on next-generation sequencing data. The sequence of steps we will run constitutes what is generally called an analysis pipeline.

The files

The FASTA file that we will start with contains a 454 dataset covering 17 samples (100715DNits-pr.fasta). The mapping file is called creosoteB_mappingV3.txt. You can open the mapping file in Excel to view the sample IDs and locations, as well as the type of plant each sample came from.

Log into your CARC account, then use the following command to start an interactive session limited to 2 nodes with 8 processors each and a 24-hour walltime:

qsub -I -lwalltime=24:00:00 -lnodes=2:ppn=8

First we need to make a change to the FASTA data file. The sequence names contain a term (::) that one of the programs in this pipeline does not accept. We want to remove it without modifying the original file.
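If you want this clean-up to check its own result, the cp and sed steps can be wrapped in a small bash function. This is a minimal sketch: fix_fasta_headers is a hypothetical name (not part of QIIME), and it assumes GNU sed as found on typical Linux clusters (on macOS the in-place flag is written sed -i ''). After editing the copy, it confirms that the number of sequences (lines starting with >) is unchanged and that no :: remains.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper, not part of QIIME): copy a FASTA file, replace
# every "::" with "_" in the copy, and check the result. The source file is never edited.
# Assumes GNU sed; BSD/macOS sed needs `sed -i ''` instead.
fix_fasta_headers() {
  local src="$1" dst="$2" n_src n_dst
  cp "$src" "$dst" || return 1
  sed -i 's/::/_/g' "$dst" || return 1

  # Count records (header lines beginning with '>') before and after the edit.
  n_src=$(grep -c '^>' "$src")
  n_dst=$(grep -c '^>' "$dst")
  if [ "$n_src" -ne "$n_dst" ]; then
    echo "record count changed: $n_src -> $n_dst" >&2
    return 1
  fi

  # The edited copy should contain no "::" at all.
  if grep -q '::' "$dst"; then
    echo "'::' still present in $dst" >&2
    return 1
  fi
  echo "$n_dst records written to $dst"
}
```

For this dataset it would be called as fix_fasta_headers 100715DNits-pr.fasta 100715DNits-prV2.fasta, which produces the same file as the two commands in the handout.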
The first command creates a copy of the original FASTA file under a new name. The second command replaces every instance of :: in the new file with an underscore. Be sure to use an underscore in the second command, not a hyphen.

cp 100715DNits-pr.fasta 100715DNits-prV2.fasta
sed -i 's/::/_/g' 100715DNits-prV2.fasta

Wait for the prologue statement and then type qiime at the prompt.

1. Collapse the dataset into OTUs (operational taxonomic units, used as a first approximation of species)

pick_otus.py -i 100715DNits-prV2.fasta -o otu_output_creosoteB

2. Pick a representative sequence from each OTU

pick_rep_set.py -f 100715DNits-prV2.fasta -i otu_output_creosoteB/100715DNits-prV2_otus.txt -o creosoteB_rep_set.fasta

3. Assign taxonomic lineages to the sequences (currently, this works much better for bacteria and for other fungal datasets than it does for data from the Sevilleta)

assign_taxonomy.py -i creosoteB_rep_set.fasta -o creosoteB_taxonomy_assignments -t sh_taxonomy_qiime_ver7_97_s_01.08.2015.txt -r sh_refs_qiime_ver7_97_s_01.08.2015.fasta

4. Create an 'OTU table'

make_otu_table.py -i otu_output_creosoteB/100715DNits-prV2_otus.txt -t creosoteB_taxonomy_assignments/creosoteB_rep_set_tax_assignments.txt -o creosoteB_otu_table.biom

The table created by this command is stored in BIOM format, which is difficult to interpret when opened as a text file; the following is an example of the kind of information it contains. It records the number of times a given OTU occurs in each sample.

5. Look at differences in taxonomic composition among samples

summarize_taxa_through_plots.py -i creosoteB_otu_table.biom -o creosoteB_taxonomic_summary -m creosoteB_mappingV3.txt -s

Note that a large fraction of the sequences are 'Unclassified', meaning that they could not be matched to close relatives in the reference database.

6. Statistically analyze the differences in taxonomic composition

a. Subsample the data (898 is the number of sequences in the smallest sample)

single_rarefaction.py -i creosoteB_otu_table.biom -o otu_subsample_creosoteB -d 898

b. Calculate distances between samples

beta_diversity.py -i otu_subsample_creosoteB -m bray_curtis -o creosoteB_beta_diversity

c. Turn the distance matrix into an ordination (principal coordinates)

principal_coordinates.py -i creosoteB_beta_diversity/bray_curtis_otu_subsample_creosoteB.txt -o creosoteB_pcoa_output.txt

7. Plot the ordination by sample type ('Type' in the mapping file) and by the plant the fungal sequences came from (creosote or grama: 'Plant' in the mapping file)

make_2d_plots.py -i creosoteB_pcoa_output.txt -o creosoteB_pcoa_by_type_and_plant -m creosoteB_mappingV3.txt -b Type,Plant

Download the folder this creates to your desktop and open the html file in a browser.

8. Create a UPGMA tree that shows sample relatedness in the style of a phylogenetic tree

upgma_cluster.py -i creosoteB_beta_diversity/bray_curtis_otu_subsample_creosoteB.txt -o upgma_tree.tre

This file can be opened in the phylogenetic tree viewer (FigTree) that you downloaded earlier. The result reiterates what the 2-D ordination plots suggest, but gives more information on how individual samples are related to one another and whether certain types of samples form distinct groups. As in phylogenetic analyses, samples joined more closely on the tree are more similar in community composition. Recall that creosote samples are labeled CrB, grama-in samples GiB, and grama-out samples GoB.

9. Test whether sample type (black grama in the creosote stand, black grama outside the creosote stand, or creosote) is better supported statistically as a grouping category, or whether it only matters that the samples come from either grama or creosote plants

compare_categories.py --method permanova -i creosoteB_beta_diversity/bray_curtis_otu_subsample_creosoteB.txt -m creosoteB_mappingV3.txt -c Type -o significance_by_type

This outputs the results of a PERMANOVA (permutational multivariate analysis of variance) test, which assesses whether the different types of samples differ statistically from one another in community composition. Open the output file in the significance_by_type folder. It reports a p value that tells you whether the grouping category is statistically supported. A higher test statistic (pseudo-F) indicates stronger separation between groups relative to the variation within them, and generally goes with a lower p value.

Now rerun the command, comparing samples by the plant they come from:

compare_categories.py --method permanova -i creosoteB_beta_diversity/bray_curtis_otu_subsample_creosoteB.txt -m creosoteB_mappingV3.txt -c Plant -o significance_by_plant

Compare the results from the two tests. Which grouping is more statistically significant: the plant the samples come from, or the sample type? What does that say about fungal communities and their relationships to plants at the Sevilleta?
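Step 6a subsamples every sample to the same number of sequences (898, the size of the smallest sample) so that differences between samples are not driven simply by how deeply each one was sequenced. The bash function below is a minimal sketch of that idea for a single sample; rarefy_sample is a hypothetical helper, not a QIIME command, and it assumes GNU coreutils shuf. It expands the OTU counts into one line per read, draws the requested number of reads at random without replacement, and re-counts them per OTU.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper, not a QIIME command) of rarefying ONE sample:
# draw DEPTH reads at random, without replacement, from its OTU counts.
# stdin:  one "OTU_ID COUNT" pair per line
# stdout: "OTU_ID COUNT" for the subsample (OTUs never drawn are omitted)
rarefy_sample() {
  local depth="$1" counts total
  counts=$(cat)
  total=$(printf '%s\n' "$counts" | awk '{ s += $2 } END { print s + 0 }')
  if [ "$total" -lt "$depth" ]; then
    echo "sample has only $total reads, fewer than $depth" >&2
    return 1
  fi
  # One line per read, shuffle, keep the first DEPTH lines, then re-count per OTU.
  printf '%s\n' "$counts" \
    | awk '{ for (i = 0; i < $2; i++) print $1 }' \
    | shuf -n "$depth" \
    | sort | uniq -c \
    | awk '{ print $2, $1 }'
}
```

Because the draw is random, repeated runs give slightly different subsamples, but every result sums to exactly the requested depth. This sketch simply refuses samples that are smaller than the depth.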
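The distance matrix produced in step 6b (beta_diversity.py -m bray_curtis), which the ordination, the UPGMA tree and the PERMANOVA tests all use, contains one Bray-Curtis dissimilarity for every pair of samples: the sum of absolute differences in OTU counts divided by the sum of all counts in the two samples. A value of 0 means identical composition and 1 means the samples share no OTUs. The bash/awk function below is a sketch of that formula for two samples; bray_curtis is a hypothetical helper name, and it assumes both count lists are given in the same OTU order.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper): Bray-Curtis dissimilarity between two samples
# given as comma-separated OTU counts in the same OTU order:
#   sum_k |x_k - y_k| / sum_k (x_k + y_k)
bray_curtis() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    n = split(a, x, ","); m = split(b, y, ",")
    if (n != m) { print "count lists differ in length" > "/dev/stderr"; exit 1 }
    num = 0; den = 0
    for (i = 1; i <= n; i++) {
      d = x[i] - y[i]; if (d < 0) d = -d   # absolute difference for OTU i
      num += d; den += x[i] + y[i]
    }
    if (den == 0) { print "both samples are empty" > "/dev/stderr"; exit 1 }
    printf "%.6f\n", num / den
  }'
}
```

For example, bray_curtis 10,0,5 4,6,5 prints 0.400000: the absolute differences sum to 12 and the total count is 30.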
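The p value from the PERMANOVA in step 9 comes from permutations rather than from a lookup table: the sample labels (Type or Plant) are shuffled many times, the test statistic is recomputed for each shuffle, and the observed statistic is compared with that distribution. Permutation tests of this kind typically report p = (number of permuted statistics at least as large as the observed one + 1) / (number of permutations + 1). The function below sketches only that final step; permutation_p_value is a hypothetical helper, and the file of permuted statistics is an assumed input rather than something QIIME writes out.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper) of the last step of a permutation test.
# $1: observed test statistic (e.g. pseudo-F for the real sample labels)
# $2: file with one statistic per line, each computed after shuffling the labels
# Prints p = (number of permuted values >= observed + 1) / (permutations + 1).
permutation_p_value() {
  awk -v obs="$1" '
    NF == 0 { next }                           # skip blank lines
    { n++; if ($1 + 0 >= obs + 0) k++ }        # count permuted values >= observed
    END {
      if (n == 0) { print "no permuted statistics" > "/dev/stderr"; exit 1 }
      printf "%.4f\n", (k + 1) / (n + 1)
    }' "$2"
}
```

This shows why a larger test statistic goes with a smaller p value: fewer shuffled labelings beat it. It also means the smallest possible p value depends on the number of permutations; with 999 permutations it is 1/1000 = 0.001.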