Agenda

Aim: follow the workflow given in the QIIME overview tutorial, but on a different dataset. We will use a real-life example, running into (and fixing) the kinds of problems you are likely to encounter yourself, and we will look at the output of the different steps, which allows us to evaluate our progress (and fix mistakes).

The parts marked “Intermezzo” are demonstrated by Jos in a plenary fashion; the results (i.e. modified input files) are also provided, and the relevant command for applying the changes is given as “Alternative” in the workflow.

Input (on the USB stick / provided as part of the VM file):
- 16S FLX sequencing data
  o sequences.fna
  o sequences.qual
- Sample description: Excel-like file (saved as “tab-delimited text”)
  o mapping.txt

Workshop legend: commands to be executed (from the “workshop” folder) are given in monospace:
- solid border: essential steps for following the workshop
- dashed border: essential steps if you are not manually correcting the input files
- no border: non-essential, opens result files for viewing

VirtualBox notes:
- You will need a 64-bit machine
- You need admin privileges to install VirtualBox and the extension pack (but not to run the VM)
- You may need to enable virtualization support in your BIOS (e.g. on HP machines)
- Extension pack: double-click on the extension pack file after installing VirtualBox
- Guest Additions: “install guest additions” in the VM
- Copy-paste: “bidirectional clipboard” in the VM
- File sharing options (note: the last two options require an ssh server (“sudo apt-get install ssh” in the VM) and port forwarding (in VirtualBox under “settings” -> “network”))
  o Shared folders
  o SCP
  o ExpanDrive (free 30-day trial)

Qiime workshop, NBIC Metagenomics course
Jos Boekhorst, NIZO Food Research

QIIME VM notes:
- Lots of information in the two folders on the Desktop
- Password: qiime (the account has full sudo rights)
- QIIME error messages can be rather verbose, but they tend to be informative

Workshop

First we put the input files in our working directory:

    cp input_files/sequences* ./; cp input_files/mapping.txt ./

Check the mapping file: is the sample description information in the correct format?

    check_id_map.py -m mapping.txt -o mapping_output

There are syntax errors in the mapping file. Check the log file:

    less mapping_output/mapping.log

Intermezzo: fix the errors in Excel. Alternative:

    cp input_files/mapping_corrected.txt ./mapping.txt

Remove the previous output:

    rm -rf mapping_output

Check again; we should now get no more errors:

    check_id_map.py -m mapping.txt -o mapping_output

Demultiplex (“what read belongs to what sample”). Note that, in contrast to the QIIME tutorial workflow, we use the “-c” flag to disable barcode correction:

    split_libraries.py -m mapping.txt -f sequences.fna -q sequences.qual -o split_libraries_output -c

We get an error message: we did not specify the barcode length. Remove the output and run again with “-b 6”:

    rm -rf split_libraries_output; split_libraries.py -m mapping.txt -f sequences.fna -q sequences.qual -o split_libraries_output -b 6 -c

Check the output:

    less split_libraries_output/split_library_log.txt

We have an additional barcode, while a single sample lacks reads: time for some “forensic bioinformatics”.

Intermezzo: fix the barcode of S119. Alternative:

    cp input_files/mapping_corrected2.txt ./mapping.txt

Demultiplex again:

    rm -rf split_libraries_output; split_libraries.py -m mapping.txt -f sequences.fna -q sequences.qual -o split_libraries_output -b 6 -c

Pick OTUs:

    pick_otus_through_otu_table.py -i split_libraries_output/seqs.fna -o otus

Intermezzo: this tends to be the slowest step.
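Looking back at the demultiplexing step: the “forensic bioinformatics” that pointed to the wrong barcode for S119 can also be done with a few lines of Python instead of reading the log by eye. The sketch below is only an illustration and not part of QIIME. It assumes that the barcode is the first 6 bases of each read in sequences.fna, and that the mapping file has a header line starting with “#SampleID” and a “BarcodeSequence” column (the standard QIIME mapping layout). The function names are made up for this example.

```python
from collections import Counter


def read_fasta(lines):
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def count_barcodes(fasta_lines, barcode_length=6):
    """Count the leading `barcode_length` bases of every read.

    Assumption: the barcode sits at the very start of each read.
    Reads shorter than the barcode are skipped.
    """
    counts = Counter()
    for _, seq in read_fasta(fasta_lines):
        if len(seq) >= barcode_length:
            counts[seq[:barcode_length].upper()] += 1
    return counts


def mapping_barcodes(mapping_lines):
    """Return {barcode: sample_id} from tab-delimited mapping file lines."""
    barcodes, column = {}, None
    for line in mapping_lines:
        fields = line.rstrip("\r\n").split("\t")
        if line.startswith("#SampleID"):
            column = fields.index("BarcodeSequence")
        elif line.startswith("#") or not line.strip():
            continue  # comment or empty line
        elif column is not None and len(fields) > column:
            barcodes[fields[column].upper()] = fields[0]
    return barcodes


def compare_barcodes(counts, barcodes):
    """Return (unexpected barcodes with counts, samples without any reads)."""
    unexpected = {bc: n for bc, n in counts.items() if bc not in barcodes}
    missing = sorted(s for bc, s in barcodes.items() if counts[bc] == 0)
    return unexpected, missing
```

To use it on the workshop files, pass open file handles, e.g. `count_barcodes(open("sequences.fna"))` and `mapping_barcodes(open("mapping.txt"))`. Sequencing errors produce many unexpected barcodes with a handful of reads each; an unexpected barcode with about as many reads as a real sample, combined with one sample that has no reads at all, is the pattern to look for.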
While waiting, let’s take a closer look at the split_libraries output: open histograms.txt in Excel (length distributions etc.).

The OTU picking step generates arguably the most important output files: “otus/otu_table.biom” and “otus/rdp_assigned_taxonomy/seqs_rep_set_tax_assignments.txt”. Together they answer the question “what bacteria are found in what samples?” Unfortunately, the table is not very human-readable. A convenient script converts it to tab-delimited text:

    convert_biom.py -b -i otus/otu_table.biom -o otus/otu_table.txt

Intermezzo: look at otu_table.txt in Excel. We combine this data with the taxonomy assignments in otus/rdp_assigned_taxonomy/seqs_rep_set_tax_assignments.txt. I use Excel; the relevant formula is VLOOKUP.

QIIME can also generate this type of figure:

    make_otu_heatmap_html.py -i otus/otu_table.biom -o otus/OTU_Heatmap
    summarize_taxa_through_plots.py -i otus/otu_table.biom -o wf_taxa_summary -m mapping.txt

The output is in HTML format; view it with:

    firefox otus/OTU_Heatmap/otu_table.html
    firefox wf_taxa_summary/taxa_summary_plots/bar_charts.html

Networks are a (in my opinion often over-rated) way of looking at the data:

    make_otu_network.py -m mapping.txt -i otus/otu_table.biom -o otus/OTU_Network

(Visualization: Cytoscape. In my experience this requires additional filtering / reduction, which is outside the scope of this workshop.)

Alpha diversity: how much variation is there in each individual sample?

    echo "alpha_diversity:metrics shannon,PD_whole_tree,chao1,observed_species" > alpha_params.txt; alpha_rarefaction.py -i otus/otu_table.biom -m mapping.txt -o wf_arare/ -p alpha_params.txt -t otus/rep_set.tre

The output is in HTML:

    firefox wf_arare/alpha_rarefaction_plots/rarefaction_plots.html

Beta diversity: how much diversity is there between samples?

    beta_diversity_through_plots.py -i otus/otu_table.biom -m mapping.txt -o wf_bdiv_even/ -t otus/rep_set.tre -e 95

This step produces distance matrices, e.g. wf_bdiv_even/weighted_unifrac_dm.txt.

Intermezzo: Excel heatmaps, within-person versus between-person distances. Picture: boxplot.

Check the robustness of the clustering:

    jackknifed_beta_diversity.py -i otus/otu_table.biom -t otus/rep_set.tre -m mapping.txt -o wf_jack -e 70; make_bootstrapped_tree.py -m wf_jack/unweighted_unifrac/upgma_cmp/master_tree.tre -s wf_jack/unweighted_unifrac/upgma_cmp/jackknife_support.txt -o wf_jack/unweighted_unifrac/upgma_cmp/jackknife_named_nodes.pdf

The main result is visualized in a PDF file:

    evince wf_jack/unweighted_unifrac/upgma_cmp/jackknife_named_nodes.pdf

Intermezzo: a fancy tree through iTOL: names, colors, bars etc.

Bonus, if time permits: supervised classification.

    supervised_learning.py -i otus/otu_table.biom -m mapping.txt -c volunteer -o random_forest
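The comparison of within-person and between-person distances from the beta diversity intermezzo does not have to be done in Excel. The sketch below is a possible Python alternative, not a QIIME script. It assumes the distance matrix is tab-delimited, with a header row of sample IDs after an empty first cell and one row per sample, and that the mapping file has a “volunteer” column (the column used in the supervised_learning.py command). The function names are hypothetical.

```python
from itertools import combinations


def read_distance_matrix(lines):
    """Parse a tab-delimited distance matrix.

    Returns (sample_ids, {(id_a, id_b): distance}).
    """
    rows = [line.rstrip("\r\n").split("\t") for line in lines if line.strip()]
    ids = rows[0][1:]  # the first cell of the header row is empty
    dist = {}
    for row in rows[1:]:
        for other, value in zip(ids, row[1:]):
            dist[(row[0], other)] = float(value)
    return ids, dist


def read_mapping_column(lines, column):
    """Return {sample_id: value of `column`} from mapping file lines."""
    header, values = None, {}
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if line.startswith("#SampleID"):
            header = fields
        elif line.startswith("#") or not line.strip():
            continue  # comment or empty line
        elif header is not None:
            values[fields[0]] = fields[header.index(column)]
    return values


def split_distances(ids, dist, group):
    """Split all pairwise distances into within-group and between-group."""
    within, between = [], []
    for a, b in combinations(ids, 2):
        if a not in group or b not in group:
            continue  # sample missing from the mapping file
        target = within if group[a] == group[b] else between
        target.append(dist[(a, b)])
    return within, between
```

A possible call sequence: `ids, dist = read_distance_matrix(open("wf_bdiv_even/weighted_unifrac_dm.txt"))`, `group = read_mapping_column(open("mapping.txt"), "volunteer")`, then `within, between = split_distances(ids, dist, group)`. The two lists can be fed directly into a boxplot; if samples from the same volunteer resemble each other, the within-person distances should be clearly smaller.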