blastn_vdb

advertisement
Supporting Information 2: SRA observations
SRA Toolkit






Download from ‘www.ncbi.nlm.nih.gov/Traces/sra/?view=software’
To uncompress the toolkit, execute ‘tar xfz sratoolkit.2.4.2-ubuntu64.tar.gz’ in your
desired directory (e.g. ‘/usr/bin’ or ‘/bin’)
To put in your executable path (using bash shell):
o export PATH=$PATH:/bin/sratoolkit.2.4.2-ubuntu64/bin
‘latf-load’ , ‘blastn_vdb’, ‘fastq-dump’, ‘prefetch’, ‘vdb-config’, ‘align-info’, and ‘kar’
should now be in your path (the versioned ‘fastq-dump’ did not work)
You can specify your SRA toolkit set using ‘vdb-config -i’ althought this may not be
necessary. For example, you can specify a directory for public data during this
configuration if you need to use a partition with more disk space (default is
/home/<user>/ncbi/public’). Specified directories are created automatically if not in
existence.
All SRA toolkit commands have a ‘--help’ option.
latf-load







latf-load may require sufficient ‘tmp’ space which can be a specified directory (e.g.
/blast/meta/tmp). Ideally this is a local file system. You can specify this location using the
‘--tmpfs’ option (e.g.--tmpfs /blast/meta/tmp).
latf-load is not friendly with SRA fastq-dump output defline. For example, this fastq
defline will not load:
o @ERR091571.1 HSQ1009_86:5:1101:1495:2027 length=101
While this will load:
o @HSQ1009_86:5:1101:1495:2027
If you have two reads you should use:
o @HSQ1009_86:5:1101:1495:2027/1
o @HSQ1009_86:5:1101:1495:2027/2
latf-load does not like output names like ‘test’ and currently needs a name like
‘SRS015381.sra’ or ‘SRR015381’.
latf-load creates a directory which can be ‘karred’ with:
o kar -c SRS015381kar -d SRS015381.sra
Original spot names are discarded by latf-load
blastn_vdb
•
•
•
blastn_vdb requires [DES]RR at start of SRA archive used as the subject database
blastn_vdb gets confused if you have a suffix other than .sra on the subject archive (e.g.
SRR000001.foo). Use ‘SRR000001foo’ or ‘SRR000001’ instead.
blastn_vdb is more effective for non-aligned SRA archives rather than archives using
reference-based compression (called ‘csra’). A more effective solution for alignmentbased archives is currently being developed. To determine if you have a csra, look for an
alignment tab at the SRA Run Browser or run ‘align-info <archive>| head’.
•
•
•
•
•
You can use ‘prefetch’ to pull an archive before running blastn_vdb and the result will be
in the ‘sra’ sub-directory of your public or dbGap folder (e.g. /home/bob/ncbi/public/sra).
Or you can execute blastn_vdb without prefetch and it will run as the download is
occurring.
Downloaded archives are already in a ‘kar’ format
blastn_vdb can be multi-threaded using the ‘-num_threads’ option
blastn_vdb results cannot be reformatted with blast_formatter
One blastn_vdb execution completed (i.e. indicated letters and sequences in database)
and then presented an error. This result will be investigated.
Download