QIIME Exercise

Creosote and Black Grama root endophytes
454 Pyrosequencing / QIIME Exercise
Fall 2015
Background
This exercise is adapted from one that Dan Colman created for an Antarctic bacterial 16S
ribosomal RNA dataset from Tina Vesbach’s laboratory. Those data were used in a
publication by Dave Van Horn et al. that has been uploaded to the class website. The goal
here is to compare the root fungal endophyte ITS sequences that our class obtained from
black grama and creosote. There are three types of samples: 1) creosote from a stand near
the south end of McKenzie Flats, 2) black grama from within the creosote stand, and 3) black
grama outside the creosote stand (a few km away). The collections were designed to
determine the effect of plant species, and to some extent location, on the fungal root
endophyte community. The sample names are abbreviated CrB (creosote), GiB (grass in)
and GoB (grass out). There are 17 samples in total.
This exercise uses a fairly large number of independent programs that have been put
together in a package called QIIME (pronounced "chime"), which stands for
Quantitative Insights Into Microbial Ecology. It is perhaps the most widely used software
package for microbial ecology studies that use next-generation sequence data.
The set of exercises we will be doing constitutes what is generally referred to as an
analysis pipeline.
The files
The FASTA file we will start with (100715DNits-pr.fasta) contains a 454 dataset spanning 17 samples.
The mapping file is called creosoteB_mappingV3.txt. You can open the mapping file in
Excel to view the sample IDs and locations, as well as the type of plant each sample came
from.
Log into your CARC account, then use the following command to request an interactive
session with a fixed number of nodes (2 nodes with 8 processors each, for up to 24 hours):
qsub -I -l walltime=24:00:00 -l nodes=2:ppn=8
First we need to make a change to the FASTA data file. Its sequence names contain a
string (::) that one program in this pipeline does not accept. We want to remove it without
modifying the original file. The first command creates a copy of the original FASTA file
under a new name. The second command replaces every instance of :: in the new file with
an underscore. Be sure to use an underscore in the second command, not a hyphen.
cp 100715DNits-pr.fasta 100715DNits-prV2.fasta
sed -i 's/::/_/g' 100715DNits-prV2.fasta
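Before continuing, it can help to confirm that the substitution worked. The function below is a minimal sketch and a hypothetical helper (`check_fasta_fix` is not part of QIIME): it assumes standard `grep`, checks that the copy has the same number of sequences (header lines starting with `>`) as the original, and checks that no `::` remains in the copy.

```shell
# Hypothetical helper, not part of QIIME: compare the original and edited
# FASTA files after the sed substitution.
check_fasta_fix() {
    local original="$1" edited="$2"
    local n_orig n_edit n_colons
    # Each sequence header starts with '>', so this counts sequences
    n_orig=$(grep -c '^>' "$original")
    n_edit=$(grep -c '^>' "$edited")
    # grep -c prints 0 (and exits 1) when nothing matches; '|| true' ignores that exit code
    n_colons=$(grep -c '::' "$edited" || true)
    echo "sequences: original=$n_orig edited=$n_edit; lines still containing '::': $n_colons"
    # Succeed only if no sequences were lost and no '::' remains
    [ "$n_orig" -eq "$n_edit" ] && [ "$n_colons" -eq 0 ]
}
```

For example: `check_fasta_fix 100715DNits-pr.fasta 100715DNits-prV2.fasta && echo OK`.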
Wait for the prologue statement and then type qiime at the prompt.
1. Collapse the dataset into OTUs (operational taxonomic units, used as a first
approximation of species)
pick_otus.py -i 100715DNits-prV2.fasta -o otu_output_creosoteB
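pick_otus.py writes an OTU map (the `_otus.txt` file used in the next steps). Each line holds an OTU ID followed by the tab-separated IDs of the sequences clustered into that OTU. The hypothetical `summarize_otu_map` function below is a minimal `awk` sketch, not a QIIME tool. It counts OTUs, sequences, and singleton OTUs (OTUs that contain only one sequence), assuming that tab-separated layout.

```shell
# Summarize a QIIME OTU map: one line per OTU, tab-separated;
# field 1 = OTU ID, remaining fields = IDs of the member sequences.
summarize_otu_map() {
    awk -F'\t' '
        { n_otus++; n_seqs += NF - 1; if (NF - 1 == 1) singletons++ }
        END { printf "OTUs: %d  sequences: %d  singleton OTUs: %d\n", n_otus, n_seqs, singletons + 0 }
    ' "$1"
}
```

For example: `summarize_otu_map otu_output_creosoteB/100715DNits-prV2_otus.txt`.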
2. Pick a representative sequence from each OTU
pick_rep_set.py -f 100715DNits-prV2.fasta -i otu_output_creosoteB/100715DNits-prV2_otus.txt -o creosoteB_rep_set.fasta
3. Assign taxonomic lineages to the sequences (currently, this works much better for
bacterial data, and for fungal data from other systems, than it does for data from the Sevilleta)
assign_taxonomy.py -i creosoteB_rep_set.fasta -o creosoteB_taxonomy_assignments -t sh_taxonomy_qiime_ver7_97_s_01.08.2015.txt -r sh_refs_qiime_ver7_97_s_01.08.2015.fasta
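To see how many representative sequences the reference database could place, you can tally the assignments file. This is a minimal sketch built on two assumptions. First, the lineage is in the second tab-separated column. Second, sequences with no match are labelled `Unassigned`. If your output uses a different label, pass it as the second argument. `count_unassigned` is a hypothetical helper, not a QIIME script.

```shell
# Count representative sequences that received no taxonomy.
# ASSUMPTIONS: column 2 holds the lineage string, and unmatched sequences
# start with the label "Unassigned" (override with a second argument).
count_unassigned() {
    local file="$1" label="${2:-Unassigned}"
    awk -F'\t' -v label="$label" '
        { total++; if (index($2, label) == 1) unassigned++ }
        END { printf "%d of %d sequences unassigned\n", unassigned, total }
    ' "$file"
}
```

For example: `count_unassigned creosoteB_taxonomy_assignments/creosoteB_rep_set_tax_assignments.txt`.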
4. Create an 'OTU table'
make_otu_table.py -i otu_output_creosoteB/100715DNits-prV2_otus.txt -t creosoteB_taxonomy_assignments/creosoteB_rep_set_tax_assignments.txt -o creosoteB_otu_table.biom
Although the table created by this command is encoded in a way that makes it difficult to
interpret when opened as a text file, the following is an example of the kind of information
it contains. It records the number of times each OTU occurs in each sample.
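If the table is converted to tab-separated text, you can sum each sample's column to get its sequencing depth. The sample with the smallest total is the depth used for subsampling in step 6. This is a minimal sketch under assumptions. It expects the layout written by the biom-format package's `biom convert -i creosoteB_otu_table.biom -o creosoteB_otu_table.txt --to-tsv` (check the command against your installed version): an optional `# Constructed from biom file` comment line, a `#OTU ID` header row, and no taxonomy column. `sample_depths` is a hypothetical helper.

```shell
# Per-sample sequence totals from a tab-separated OTU table
# (rows = OTUs, columns = samples). ASSUMES the `biom convert --to-tsv`
# layout with no trailing taxonomy column.
sample_depths() {
    awk -F'\t' '
        /^# Constructed/ { next }                              # skip the comment line
        /^#OTU ID/ { for (i = 2; i <= NF; i++) name[i] = $i; ncol = NF; next }
        { for (i = 2; i <= NF; i++) total[i] += $i }           # sum counts per sample column
        END { for (i = 2; i <= ncol; i++) printf "%s\t%d\n", name[i], total[i] }
    ' "$1"
}
```

To find the shallowest sample: `sample_depths creosoteB_otu_table.txt | sort -k2,2n | head -n 1`. Counts written as `10.0` are summed correctly and printed as integers.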
5. Look at differences in taxonomic composition among samples
summarize_taxa_through_plots.py -i creosoteB_otu_table.biom -o creosoteB_taxonomic_summary -m creosoteB_mappingV3.txt -s
Note that a large fraction of the sequences are 'Unclassified', meaning that they could not
be matched to close relatives in the reference database.
6. Statistically analyze the differences in taxonomic composition
a. Subsample the data
single_rarefaction.py -i creosoteB_otu_table.biom -o otu_subsample_creosoteB -d 898
(898 is the number of sequences in the smallest sample, so every sample is subsampled to this same depth)
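Rarefaction draws a fixed number of reads from each sample at random, without replacement, so that differences in sequencing depth do not masquerade as differences in community composition. The function below illustrates that idea for a single sample. It is a minimal sketch, not QIIME's implementation. It assumes integer counts in `<OTU ID><TAB><count>` lines and GNU `shuf`. Following QIIME's behavior of omitting samples that are shallower than the requested depth, it refuses such samples. `rarefy_sample` is a hypothetical name.

```shell
# Rarefy one sample: draw `depth` reads at random WITHOUT replacement
# and re-count them per OTU. Input lines: "<OTU ID><TAB><integer count>".
rarefy_sample() {
    local counts_file="$1" depth="$2" total
    total=$(awk -F'\t' '{ s += $2 } END { print s + 0 }' "$counts_file")
    # A sample with fewer reads than the depth cannot be rarefied; refuse it
    if [ "$total" -lt "$depth" ]; then
        echo "sample has only $total reads (< $depth)" >&2
        return 1
    fi
    # Expand each OTU into one line per read (e.g. OTU1 with count 3 -> 3 lines)
    awk -F'\t' '{ for (k = 0; k < $2; k++) print $1 }' "$counts_file" |
        shuf -n "$depth" |      # keep a random subset of exactly `depth` reads
        sort | uniq -c |        # count the kept reads per OTU
        awk '{ printf "%s\t%d\n", $2, $1 }'
}
```

Because the draw is random, repeated runs give slightly different tables. However, every run sums to exactly the requested depth.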
b. Calculate distances between samples
beta_diversity.py -i otu_subsample_creosoteB -m bray_curtis -o creosoteB_beta_diversity
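The Bray-Curtis dissimilarity between two samples is the sum of the absolute differences in OTU counts, divided by the sum of all counts in both samples. A value of 0 means identical composition, and 1 means no OTUs are shared. The `awk` sketch below computes the value for one pair of samples, to show what fills each cell of the distance matrix. `bray_curtis` is a hypothetical helper. It assumes both vectors list the same OTUs in the same order.

```shell
# Bray-Curtis dissimilarity between two count vectors:
#   BC = sum_i |x_i - y_i| / sum_i (x_i + y_i)
# $1, $2: space-separated counts for the same OTUs, in the same order.
bray_curtis() {
    awk -v xs="$1" -v ys="$2" 'BEGIN {
        n = split(xs, x, " "); m = split(ys, y, " ")
        if (n != m) { print "vectors differ in length" > "/dev/stderr"; exit 1 }
        for (i = 1; i <= n; i++) {
            d = x[i] - y[i]
            num += (d < 0 ? -d : d)    # absolute difference
            den += x[i] + y[i]         # total abundance in both samples
        }
        # BC is undefined when both samples are empty; this sketch prints 0 then
        printf "%.4f\n", (den > 0 ? num / den : 0)
    }'
}
```

For example, `bray_curtis "1 2 3" "3 2 1"` gives 4/12, printed as `0.3333`.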
c. Turn the distance matrix into an ordination graph
principal_coordinates.py -i creosoteB_beta_diversity/bray_curtis_otu_subsample_creosoteB.txt -o creosoteB_pcoa_output.txt
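For orientation, classical principal coordinates analysis turns the distance matrix into plotting coordinates roughly as shown below. This is the textbook (Gower) procedure, included to help you read the plots. QIIME's implementation may differ in details such as its handling of the negative eigenvalues that non-Euclidean distances like Bray-Curtis can produce.

```latex
% D = (d_{ij}): the n x n Bray-Curtis distance matrix
a_{ij} = -\tfrac{1}{2}\, d_{ij}^{2}, \qquad
B = J A J, \qquad J = I_n - \tfrac{1}{n}\,\mathbf{1}\mathbf{1}^{\top}
% eigen-decomposition B = V \Lambda V^{\top}, with \lambda_1 \ge \lambda_2 \ge \dots
% coordinate of sample i on axis k (eigenvectors v_k of unit length):
x_{ik} = v_{ik}\sqrt{\lambda_k}
% one common convention for the share of variation on axis k:
\lambda_k \Big/ \textstyle\sum_{j:\,\lambda_j > 0} \lambda_j
```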
7. Examine the ordination by sample type ('Type' in the mapping file) and by the plant the
fungal sequences came from (creosote or grama; 'Plant' in the mapping file):
make_2d_plots.py -i creosoteB_pcoa_output.txt -o creosoteB_pcoa_by_type_and_plant -m creosoteB_mappingV3.txt -b Type,Plant
Download the resulting folder to your desktop and open the HTML file in a browser.
8. Create a UPGMA tree that shows sample relatedness as in a phylogenetic tree:
upgma_cluster.py -i creosoteB_beta_diversity/bray_curtis_otu_subsample_creosoteB.txt -o upgma_tree.tre
This file can be opened in the phylogenetic tree viewer (FigTree) that you downloaded
earlier. The result reiterates what the 2-D ordination plots suggest, but gives more
information on how individual samples are related to one another and whether certain
types of samples group distinctly. As in phylogenetic analyses, samples whose branches
join more closely are more similar in community composition.
Recall that Creosote samples are labeled CrB, Grama-in are labeled GiB and Grama-out
are labeled GoB.
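UPGMA (unweighted pair group method with arithmetic mean) builds the tree by repeatedly merging the two closest clusters. The textbook merge rule is given below as a reading aid. It is not a description of QIIME's internals. The distance from a merged cluster to any other cluster is the size-weighted average of the two old distances. Equivalently, it is the mean of all pairwise sample distances between the two clusters.

```latex
% Start: every sample is its own cluster; repeat until one cluster remains.
% 1. Merge the clusters A and B with the smallest d(A, B);
%    the node joining them sits at height d(A, B) / 2.
% 2. Update the distance from the merged cluster to every other cluster C:
d(A \cup B,\, C) = \frac{|A|\, d(A, C) + |B|\, d(B, C)}{|A| + |B|}
```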
9. Test whether sample type (black grama inside the creosote stand, black grama outside
it, or creosote) is better supported statistically as a grouping category, or whether all
that matters is whether the samples come from grama or creosote plants:
compare_categories.py --method permanova -i creosoteB_beta_diversity/bray_curtis_otu_subsample_creosoteB.txt -m creosoteB_mappingV3.txt -c Type -o significance_by_type
This will output the results of a PERMANOVA test, which assesses whether the different
types of samples differ statistically from one another in community composition.
Open the output file in the significance_by_type folder. It reports a p value, which tells you
whether the grouping category is statistically supported, along with the test statistic. For a
given grouping, a higher test statistic indicates stronger separation between groups and
corresponds to a lower p value. Now rerun the command, comparing samples by the plant they come from:
compare_categories.py --method permanova -i creosoteB_beta_diversity/bray_curtis_otu_subsample_creosoteB.txt -m creosoteB_mappingV3.txt -c Plant -o significance_by_plant
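For reference, the test statistic in both output files is PERMANOVA's pseudo-F. Its usual definition, computed from squared distances, is given below. Here N = 17 samples and a is the number of groups: 3 for Type and 2 for Plant. Because the degrees of freedom differ between the two groupings, the p values are a safer basis for comparison than the raw F values. The p value comes from shuffling the group labels many times and recomputing F. The formula below is the standard permutation p value, shown as a sketch. QIIME's number of permutations depends on its settings.

```latex
% N samples in a groups; n_g samples in group g; d_{ij} = Bray-Curtis distance
SS_T = \frac{1}{N} \sum_{i<j} d_{ij}^{2}, \qquad
SS_W = \sum_{g=1}^{a} \frac{1}{n_g} \sum_{\substack{i<j \\ i,j \in g}} d_{ij}^{2}, \qquad
SS_A = SS_T - SS_W

F = \frac{SS_A / (a - 1)}{SS_W / (N - a)}

% P label permutations, each giving a permuted statistic F^{*}
p = \frac{\#\{F^{*} \ge F\} + 1}{P + 1}
```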
Compare the results from the two tests. Which grouping is more statistically significant:
the plant the samples come from, or the sample type? What does that say about
fungal communities and their relationships to plants at the Sevilleta?