Supporting Information 2: SRA observations SRA Toolkit Download from ‘www.ncbi.nlm.nih.gov/Traces/sra/?view=software’ To uncompress the toolkit, execute ‘tar xfz sratoolkit.2.4.2-ubuntu64.tar.gz’ in your desired directory (e.g. ‘/usr/bin’ or ‘/bin’) To put in your executable path (using bash shell): o export PATH=$PATH:/bin/sratoolkit.2.4.2-ubuntu64/bin ‘latf-load’ , ‘blastn_vdb’, ‘fastq-dump’, ‘prefetch’, ‘vdb-config’, ‘align-info’, and ‘kar’ should now be in your path (the versioned ‘fastq-dump’ did not work) You can specify your SRA toolkit set using ‘vdb-config -i’ althought this may not be necessary. For example, you can specify a directory for public data during this configuration if you need to use a partition with more disk space (default is /home/<user>/ncbi/public’). Specified directories are created automatically if not in existence. All SRA toolkit commands have a ‘--help’ option. latf-load latf-load may require sufficient ‘tmp’ space which can be a specified directory (e.g. /blast/meta/tmp). Ideally this is a local file system. You can specify this location using the ‘--tmpfs’ option (e.g.--tmpfs /blast/meta/tmp). latf-load is not friendly with SRA fastq-dump output defline. For example, this fastq defline will not load: o @ERR091571.1 HSQ1009_86:5:1101:1495:2027 length=101 While this will load: o @HSQ1009_86:5:1101:1495:2027 If you have two reads you should use: o @HSQ1009_86:5:1101:1495:2027/1 o @HSQ1009_86:5:1101:1495:2027/2 latf-load does not like output names like ‘test’ and currently needs a name like ‘SRS015381.sra’ or ‘SRR015381’. latf-load creates a directory which can be ‘karred’ with: o kar -c SRS015381kar -d SRS015381.sra Original spot names are discarded by latf-load blastn_vdb • • • blastn_vdb requires [DES]RR at start of SRA archive used as the subject database blastn_vdb gets confused if you have a suffix other than .sra on the subject archive (e.g. SRR000001.foo). Use ‘SRR000001foo’ or ‘SRR000001’ instead. blastn_vdb is more effective for non-aligned SRA archives rather than archives using reference-based compression (called ‘csra’). A more effective solution for alignmentbased archives is currently being developed. To determine if you have a csra, look for an alignment tab at the SRA Run Browser or run ‘align-info <archive>| head’. • • • • • You can use ‘prefetch’ to pull an archive before running blastn_vdb and the result will be in the ‘sra’ sub-directory of your public or dbGap folder (e.g. /home/bob/ncbi/public/sra). Or you can execute blastn_vdb without prefetch and it will run as the download is occurring. Downloaded archives are already in a ‘kar’ format blastn_vdb can be multi-threaded using the ‘-num_threads’ option blastn_vdb results cannot be reformatted with blast_formatter One blastn_vdb execution completed (i.e. indicated letters and sequences in database) and then presented an error. This result will be investigated.