Qiime_workshop_130123

Agenda
Aim: follow the workflow as given in the Qiime overview tutorial, but on a different
dataset. We will use a real-life example, running into (and fixing) the type of problems
you are likely to run into yourself, and we will look into the output of the different
steps, allowing us to evaluate our progress (and fix mistakes). In the parts marked
“Intermezzo”, Jos demonstrates a step in plenary. The results (i.e. the modified
input files) are also provided, and the relevant command for applying the changes is
given as “Alternative” in the workflow.
Input (on the USB stick / provide as part of VM file):
- 16S FLX sequencing data
  o sequences.fna
  o sequences.qual
- Sample description: Excel-file-like (saved as “tab-delimited text”)
  o mapping.txt
Workshop legend: commands to be executed (from the “workshop” folder) are given in
monospace:
solid border: essential steps for following the workshop
dashed border: essential steps if not manually correcting input files
no border: non-essential, opens result files for viewing
VirtualBox notes:
- You will need a 64-bit machine
- You need admin privileges to install VirtualBox and the extension pack (but not to run
  the VM)
- You may need to enable virtualization support in your BIOS (e.g. HP machines)
- Extension pack: double-click on the extension pack file after installing VirtualBox
- Guest Additions: “install guest additions” in VM
- Copy-paste: “bidirectional clipboard” in VM
- File sharing options (note: the last two options require an ssh server
  (“sudo apt-get install ssh” in the VM) and port forwarding (in VirtualBox under
  “settings” -> “network”))
  o Shared folders
  o SCP
  o ExpanDrive (free 30-day trial)
Qiime workshop NBIC Metagenomics course
Jos Boekhorst, NIZO Food Research
QIIME VM notes:
- Lots of info in the two folders on the Desktop
- Password: qiime (account has full sudo rights)
- Qiime error messages can be rather verbose, but they tend to be informative
Workshop
First we put the input files in our working directory:
cp input_files/sequences* ./; cp input_files/mapping.txt ./
Check the mapping file: is the sample description info we have in the correct format?
check_id_map.py -m mapping.txt -o mapping_output
There are syntax errors in the mapping file. Check the log file: less mapping_output/mapping.log
Intermezzo: fix errors in Excel.
Alternative:
cp input_files/mapping_corrected.txt ./mapping.txt
Remove previous output:
rm -rf mapping_output
Check again: we now should get no more errors.
check_id_map.py -m mapping.txt -o mapping_output
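To get a feel for the kind of problems the mapping check looks for, here is a minimal Python sketch. It is not the check_id_map.py implementation: the rules it tests (header starts with #SampleID and ends with Description, sample IDs contain only letters, digits and periods, barcodes are unique and of equal length) are my assumptions about the most common errors, and the sample IDs, barcodes and primer in the example are made up.

```python
import csv
import io
from collections import Counter

# Hypothetical mapping file content; IDs, barcodes and primer are made up.
MAPPING_TEXT = (
    "#SampleID\tBarcodeSequence\tLinkerPrimerSequence\tvolunteer\tDescription\n"
    "S1\tACGTAC\tCATGCTGCCTCCCGTAGGAGT\tA\tfirst sample\n"
    "S2\tACGTAC\tCATGCTGCCTCCCGTAGGAGT\tA\tduplicate barcode\n"
    "S_3\tTTGACA\tCATGCTGCCTCCCGTAGGAGT\tB\tunderscore in ID\n"
    "S4\tGGCAT\tCATGCTGCCTCCCGTAGGAGT\tB\tshort barcode\n"
)


def check_mapping(text):
    """Return a list of human-readable problems found in a mapping file."""
    rows = [r for r in csv.reader(io.StringIO(text), delimiter="\t") if r]
    header, records = rows[0], rows[1:]
    problems = []
    if header[0] != "#SampleID":
        problems.append("header: first column should be #SampleID")
    if header[-1] != "Description":
        problems.append("header: last column should be Description")
    if "BarcodeSequence" not in header:
        problems.append("header: BarcodeSequence column is missing")
        return problems
    barcode_col = header.index("BarcodeSequence")
    # Assume the most frequent barcode length is the intended one.
    lengths = Counter(len(rec[barcode_col]) for rec in records)
    expected_length = lengths.most_common(1)[0][0]
    seen = {}
    for line_no, rec in enumerate(records, start=2):
        sample_id, barcode = rec[0], rec[barcode_col]
        if not all(ch.isalnum() or ch == "." for ch in sample_id):
            problems.append(f"line {line_no}: invalid characters in {sample_id}")
        if len(barcode) != expected_length:
            problems.append(f"line {line_no}: barcode length of {sample_id} differs")
        if barcode in seen:
            problems.append(
                f"line {line_no}: {sample_id} shares its barcode with {seen[barcode]}"
            )
        seen.setdefault(barcode, sample_id)
    return problems


problems = check_mapping(MAPPING_TEXT)
for problem in problems:
    print(problem)
```

The real script reports its findings in the log file inside the output folder; the sketch simply prints one line per problem.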
Demultiplex (“what read belongs to what sample”) (note that in contrast to the qiime
tutorial workflow we use the “-c” flag to disable barcode correction):
split_libraries.py -m mapping.txt -f sequences.fna -q sequences.qual -o split_libraries_output -c
We got an error message: we did not specify the barcode length.
rm -rf split_libraries_output; split_libraries.py -m mapping.txt -f sequences.fna -q sequences.qual -o split_libraries_output -b 6 -c
Check the output: less split_libraries_output/split_library_log.txt
We have an additional barcode, while a single sample lacks reads… “forensic
bioinformatics”
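The “forensic” reasoning can be sketched in a few lines of Python: assign reads to samples by exact match on the first 6 bases (no barcode correction, as with “-c”), then list the barcodes that were seen but not expected and the samples that got no reads. This is an illustration only, not how split_libraries.py is implemented; the barcodes, reads and sample names are made up, and I assume the barcode sits at the very start of each read.

```python
from collections import Counter

BARCODE_LENGTH = 6  # corresponds to "-b 6"

# Hypothetical barcode-to-sample table, as it would be read from mapping.txt.
barcode_to_sample = {"ACGTAC": "S1", "TTGACA": "S2", "GGCATC": "S3"}

# Hypothetical reads; the first 6 bases are taken to be the barcode.
reads = [
    "ACGTACGGATTAGATACCC",
    "ACGTACGGATTAGATACCG",
    "TTGACAGGATTAGATACCC",
    "GGCTTCGGATTAGATACCC",
    "GGCTTCGGATTAGATACCA",
    "GGCTTCGGATTAGATACCT",
]


def demultiplex(reads, barcode_to_sample, barcode_length):
    """Count reads per sample using exact barcode matches only."""
    per_sample = Counter({sample: 0 for sample in barcode_to_sample.values()})
    unassigned = Counter()
    for read in reads:
        barcode = read[:barcode_length]
        sample = barcode_to_sample.get(barcode)
        if sample is None:
            unassigned[barcode] += 1  # barcode not present in the mapping file
        else:
            per_sample[sample] += 1
    return per_sample, unassigned


per_sample, unassigned = demultiplex(reads, barcode_to_sample, BARCODE_LENGTH)
empty_samples = [sample for sample, n in per_sample.items() if n == 0]
print("samples without reads:", empty_samples)
print("most frequent unexpected barcodes:", unassigned.most_common(3))
```

A sample without reads combined with an unexpected barcode that has many reads suggests a typo in the barcode column of the mapping file.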
Intermezzo: fix barcode of S119.
Alternative:
cp input_files/mapping_corrected2.txt ./mapping.txt
Demultiplex again:
rm -rf split_libraries_output; split_libraries.py -m mapping.txt -f sequences.fna -q sequences.qual -o split_libraries_output -b 6 -c
Pick OTUs:
pick_otus_through_otu_table.py -i split_libraries_output/seqs.fna -o otus
Intermezzo: this tends to be the slowest step. While waiting, let’s have a closer look at
the split_libraries output: histograms.txt in Excel (length distributions etc.).
This step generated what are arguably the most important output files: “otus/otu_table.biom”
and “otus/rdp_assigned_taxonomy/seqs_rep_set_tax_assignments.txt”; they answer the
question “what bacteria are found in what samples?”
Unfortunately, the table is rather non-human-readable. Convenient script:
convert_biom.py -b -i otus/otu_table.biom -o otus/otu_table.txt
Intermezzo: look at otu_table.txt in Excel. We combine this data with the tax
assignments: otus/rdp_assigned_taxonomy/seqs_rep_set_tax_assignments.txt. I use
Excel; the relevant formula is VLOOKUP.
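For those who prefer a script over VLOOKUP, the same lookup can be sketched in Python. The layout assumed here (an OTU table with a “#OTU ID” header row, and a taxonomy file with the columns OTU id, taxonomy string and confidence) is my assumption about the two files, and the counts and lineages are made up.

```python
import csv
import io

# Hypothetical stand-in for otus/otu_table.txt.
OTU_TABLE_TEXT = (
    "# Hypothetical OTU table in tab-delimited form\n"
    "#OTU ID\tS1\tS2\n"
    "0\t12\t0\n"
    "1\t3\t7\n"
    "2\t0\t5\n"
)

# Hypothetical stand-in for seqs_rep_set_tax_assignments.txt.
TAXONOMY_TEXT = (
    "0\tBacteria;Firmicutes\t0.98\n"
    "1\tBacteria;Bacteroidetes\t0.87\n"
)


def load_taxonomy(text):
    """Map OTU id to taxonomy string (the lookup table for the 'VLOOKUP')."""
    taxonomy = {}
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if row:
            taxonomy[row[0]] = row[1]
    return taxonomy


def annotate(otu_text, taxonomy):
    """Append a taxonomy column to every OTU row."""
    annotated = []
    for row in csv.reader(io.StringIO(otu_text), delimiter="\t"):
        if not row:
            continue
        if row[0] == "#OTU ID":
            annotated.append(row + ["taxonomy"])
        elif row[0].startswith("#"):
            continue  # skip other comment lines
        else:
            annotated.append(row + [taxonomy.get(row[0], "unassigned")])
    return annotated


annotated = annotate(OTU_TABLE_TEXT, load_taxonomy(TAXONOMY_TEXT))
for row in annotated:
    print("\t".join(row))
```

To use it on the workshop files, read the two files from disk instead of the example strings.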
Qiime can also generate figures from this data:
make_otu_heatmap_html.py -i otus/otu_table.biom -o otus/OTU_Heatmap
summarize_taxa_through_plots.py -i otus/otu_table.biom -o wf_taxa_summary -m mapping.txt
Output is in html format: use firefox otus/OTU_Heatmap/otu_table.html and firefox
wf_taxa_summary/taxa_summary_plots/bar_charts.html
Networks are a (in my opinion often over-rated) way of looking at the data:
make_otu_network.py -m mapping.txt -i otus/otu_table.biom -o otus/OTU_Network
(Visualization: Cytoscape. In my experience this requires additional filtering / reduction,
which is not in the scope of this workshop.)
Alpha diversity: how much variation is there in each individual sample?
echo "alpha_diversity:metrics shannon,PD_whole_tree,chao1,observed_species" > alpha_params.txt; alpha_rarefaction.py -i otus/otu_table.biom -m mapping.txt -o wf_arare/ -p alpha_params.txt -t otus/rep_set.tre
Output is in html: firefox wf_arare/alpha_rarefaction_plots/rarefaction_plots.html
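To make the metrics less of a black box, here is a minimal Python sketch of three of them, computed from the OTU counts of a single sample. The formulas are the textbook ones (Shannon with log base 2, bias-corrected Chao1); that they match Qiime's variants exactly is an assumption. PD_whole_tree is left out because it needs the tree. The counts are made up.

```python
import math


def observed_species(counts):
    """Number of OTUs with at least one read."""
    return sum(1 for c in counts if c > 0)


def shannon(counts):
    """Shannon index in bits: -sum(p * log2(p)) over the OTUs present."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def chao1(counts):
    """Bias-corrected Chao1: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1)).

    F1 is the number of singleton OTUs, F2 the number of doubleton OTUs.
    """
    singletons = sum(1 for c in counts if c == 1)
    doubletons = sum(1 for c in counts if c == 2)
    return observed_species(counts) + singletons * (singletons - 1) / (2 * (doubletons + 1))


# Hypothetical OTU counts for one sample.
sample = [10, 10, 0, 1, 1, 2]
print("observed species:", observed_species(sample))
print("shannon:", round(shannon(sample), 3))
print("chao1:", chao1(sample))
```

Singletons push Chao1 above the observed number of OTUs: many rare OTUs suggest that more remain unseen.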
Beta diversity: how much diversity is there between samples?
beta_diversity_through_plots.py -i otus/otu_table.biom -m mapping.txt -o wf_bdiv_even/ -t otus/rep_set.tre -e 95
This step produces distance matrices, e.g. wf_bdiv_even/weighted_unifrac_dm.txt
Intermezzo: Excel heatmaps; within-person versus between-person distances. Picture: boxplot.
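The within-person versus between-person comparison can also be done outside Excel. The Python sketch below reads a distance matrix (assumed layout: tab-delimited, sample IDs in the first row and first column) and splits all pairwise distances by whether the two samples come from the same volunteer. The distances and the sample-to-volunteer assignment are made up; in the workshop they would come from the distance matrix file and the volunteer column of mapping.txt.

```python
import csv
import io
from itertools import combinations

# Hypothetical distance matrix in the assumed tab-delimited layout.
DM_TEXT = (
    "\tS1\tS2\tS3\tS4\n"
    "S1\t0.0\t0.2\t0.6\t0.7\n"
    "S2\t0.2\t0.0\t0.5\t0.8\n"
    "S3\t0.6\t0.5\t0.0\t0.3\n"
    "S4\t0.7\t0.8\t0.3\t0.0\n"
)

# Hypothetical assignment of samples to volunteers.
sample_to_volunteer = {"S1": "A", "S2": "A", "S3": "B", "S4": "B"}


def read_distance_matrix(text):
    """Return the sample IDs and a dict mapping (row, column) to distance."""
    rows = [r for r in csv.reader(io.StringIO(text), delimiter="\t") if r]
    ids = rows[0][1:]
    distances = {}
    for row in rows[1:]:
        for other, value in zip(ids, row[1:]):
            distances[(row[0], other)] = float(value)
    return ids, distances


def split_distances(ids, distances, groups):
    """Split all unique sample pairs into within-group and between-group."""
    within, between = [], []
    for a, b in combinations(ids, 2):
        target = within if groups[a] == groups[b] else between
        target.append(distances[(a, b)])
    return within, between


ids, distances = read_distance_matrix(DM_TEXT)
within, between = split_distances(ids, distances, sample_to_volunteer)
print("within-person mean: ", sum(within) / len(within))
print("between-person mean:", sum(between) / len(between))
```

The two lists are exactly what goes into the two boxes of the boxplot.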
Check the robustness of the clustering:
jackknifed_beta_diversity.py -i otus/otu_table.biom -t otus/rep_set.tre -m mapping.txt -o wf_jack -e 70
make_bootstrapped_tree.py -m wf_jack/unweighted_unifrac/upgma_cmp/master_tree.tre -s wf_jack/unweighted_unifrac/upgma_cmp/jackknife_support.txt -o wf_jack/unweighted_unifrac/upgma_cmp/jackknife_named_nodes.pdf
Main result is visualized in a PDF file:
evince wf_jack/unweighted_unifrac/upgma_cmp/jackknife_named_nodes.pdf
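The idea behind the jackknife support values is repeated subsampling: draw a fixed number of reads per sample (70 here, the value given with “-e”) without replacement, rebuild the clustering, and count how often each node of the tree reappears. The Python sketch below shows only the subsampling part, under my assumption that this is how the replicates are drawn; the OTU names, the counts and the number of replicates are made up.

```python
import random
from collections import Counter


def subsample(counts, depth, rng):
    """Draw `depth` reads without replacement from one sample's OTU counts."""
    pool = [otu for otu, n in counts.items() for _ in range(n)]
    if len(pool) < depth:
        raise ValueError("sample has fewer reads than the requested depth")
    return Counter(rng.sample(pool, depth))


# Hypothetical OTU counts for one sample (100 reads in total).
sample_counts = {"OTU0": 50, "OTU1": 30, "OTU2": 20}

rng = random.Random(0)  # fixed seed so the example is reproducible
replicates = [subsample(sample_counts, 70, rng) for _ in range(10)]
for replicate in replicates[:3]:
    print(dict(replicate))
```

Samples with fewer reads than the chosen depth cannot be subsampled, so the depth has to be chosen with the read counts in the split_libraries log in mind.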
Intermezzo: fancy tree through iTOL: names, colors, bars etc.
Bonus, if time permits: supervised classification.
supervised_learning.py -i otus/otu_table.biom -m mapping.txt -c volunteer -o random_forest