commands.

Box 1:
Evaluation of general quality assessment tools:
BIGpre (v 2.0.2):
solqs.pl -i se_input.fq -d T -o se_output
FastQC (v 0.10.1):
fastqc se_input.fq
HTQC (v 0.14.1):
ht-stat -e illumina -f pri_casava1.8 -S -l 93 -o se_output -q -m F se_input.fq
FASTX Statistics (v 0.0.13.2):
fastx_quality_stats -i se_input.seq -o se_output
NGS_QC_Toolkit (v 2.3.1):
IlluQC.pl -se se_input.fq N 4 -onlyStat -o se_output
Fastq-Utils (v 0.5.2b):
fastqutils stats se_input.fq
Prinseq (v 0.20.3):
prinseq-lite.pl -phred64 -fastq se_input.fq -graph_data se_output.gd -graph_stats ld,gc,qd,ns,de -out_good null -out_bad null
SolexaQA (v 2.2):
SolexaQA.pl se_input.fq -s 750000 -d se_output
Box 3:
Sequencing reads simulation:
pIRS (v 1.10):
pirs simulate -i hg19.fa.masked -l 100 -m 100 -v 10 -x 0.1 -Q 33 -c 0
#Note: To simulate adapter contamination, pIRS was modified so that when a selected genomic
sequence is shorter than the preset read length (100 bp in this study), adapter sequences are
concatenated at the ends of the genomic sequence before sequencing errors are simulated.
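The modification described in the note can be illustrated with a minimal Python sketch. This is not the modified pIRS code: the function name `pad_with_adapter` and the toy adapter are hypothetical, and the sketch assumes the adapter is appended to the 3' end of a fragment shorter than the read length, with the result truncated to the read length.

```python
def pad_with_adapter(fragment: str, adapter: str, read_length: int = 100) -> str:
    """Return the bases a read of `read_length` would cover for this fragment."""
    if len(fragment) >= read_length:
        # Fragment is long enough: the read contains genomic sequence only.
        return fragment[:read_length]
    # Short fragment: the read runs into the adapter (assumed 3' placement).
    # If fragment + adapter is still too short, the read is simply shorter.
    return (fragment + adapter)[:read_length]


# Toy example with a hypothetical 8 bp adapter and a read length of 10.
read = pad_with_adapter("ACGTAC", "TTTTGGGG", read_length=10)
```

Sequencing errors would then be simulated on the padded read, so the adapter portion carries errors just as the genomic portion does.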
de novo discovery of adapter contamination:
Kraken (v 13-274):
minion search-adapter se_input.fq -show 3 -write-fasta denovo_adapter.fasta
swan -r reference_adapters.fa -q denovo_adapter.fasta
Evaluation of adapter trimming tools for SE data:
AdapterRemoval (v 1.5.2):
AdapterRemoval --file se_input.fq --basename se_output --stats --minlength 1 --pcr1 [adapter_sequence]
AlienTrimmer (v 0.3.2):
java -jar AlienTrimmer.jar -i se_input.fq -o se_output.fq -c adapter.fa
Btrim (release date: 09/09/2011):
btrim64-static -p adapter.fa -t se_input.fq -o se_output.fq -3 -s se_output.stat
Cutadapt (v 1.3):
cutadapt -a [adapter_sequence] -o se_output.fq se_input.fq
ea-utils (v 1.1.2-537):
fastq-mcf -o se_output.fq -q 0 -m 1 adapter.fa se_input.fq
FASTX_Clipper (v 0.0.13.2):
fastx_clipper -Q33 -a [adapter_sequence] -l 1 -i se_input.fq -o se_output.fq
Flexbar (v 2.4):
flexbar -r se_input.fq -t se_output -f i1.8 -a adapter.fa -m 1
Reaper (v 13-274):
reaper -i se_input.fq -basename se_output -geom no-bc -3pa [adapter_sequence] -3p-global 100/0/0/0 -3p-prefix 1/2/0/2
Scythe (v 0.991beta):
scythe -o se_output.fq -n 1 -a adapter.fa se_input.fq
Trimmomatic (v 0.30):
java -jar trimmomatic-0.30.jar SE -phred33 -trimlog se_output.log se_input.fq se_output.fq ILLUMINACLIP:adapter.fa:2:7:10
Evaluation of adapter trimming tools for PE data:
AdapterRemoval (v 1.5.2):
AdapterRemoval --file pe_input_fwd.fq --file2 pe_input_rev.fq --basename pe_output --stats --minlength 1 --pcr1 [adapter_sequence_fwd] --pcr2 [adapter_sequence_rev]
AlienTrimmer (v 0.3.2):
java -jar AlienTrimmer.jar -if pe_input_fwd.fq -ir pe_input_rev.fq -of pe_output_fwd.fq -or pe_output_rev.fq -os pe_output_single.fq -cf adapter_fwd.fa -cr adapter_rev.fa
ea-utils (v 1.1.2-537):
fastq-mcf -o pe_output_fwd.fq -o pe_output_rev.fq -q 0 -m 1 adapter_both.fa pe_input_fwd.fq pe_input_rev.fq
Flexbar (v 2.4):
flexbar -r pe_input_fwd.fq -p pe_input_rev.fq -t pe_output -f i1.8 -a adapter_both.fa -m 1
SeqPrep (release date: 05/31/2013):
SeqPrep -f pe_input_fwd.fq -r pe_input_rev.fq -1 pe_output_fwd.fq.gz -2 pe_output_rev.fq.gz -A adapter_sequence_fwd -B adapter_sequence_rev
Trimmomatic (v 0.30):
java -jar trimmomatic-0.30.jar PE -phred33 -trimlog pe_input.log pe_input_fwd.fq pe_input_rev.fq pe_output_fwd_paired.fq pe_out_fwd_unpaired.fq pe_output_rev_paired.fq pe_out_rev_unpaired.fq ILLUMINACLIP:adapter.fa:2:30:7:1:true
Evaluation of the impact of adapter contamination on sequence read alignment:
Bowtie2 (v 2.1.0; “global” mode):
bowtie2 -x dmel-all-chromosome-r5.52 -U SRR611832.fastq -S SRR611832.global.sam
Bowtie2 (v 2.1.0; “local” mode):
bowtie2 --local -x dmel-all-chromosome-r5.52 -U SRR611832.fastq -S SRR611832.local.sam
BWA (v 0.7.5a):
bwa aln -f SRR611832.sai dmel-all-chromosome-r5.52 SRR611832.fastq
bwa samse -n 1 -f SRR611832.sam dmel-all-chromosome-r5.52 SRR611832.sai SRR611832.fastq
BWA (v 0.7.5a; “soft clipping”):
bwa mem -v 1 dmel-all-chromosome-r5.52 SRR611832.fastq > SRR611832.sam
Evaluation of the impact of adapter contamination on genome assembly:
#Note: In the complete SRR071758 dataset, more than 10 million reads have average
quality scores below 5. We first filtered out all such low-quality reads and refer to the
remaining reads as the “original” dataset.
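A filter of this kind can be sketched in Python as follows. This is only an illustration under stated assumptions, not the script used in the study: the helper names `mean_quality` and `filter_reads` are hypothetical, qualities are assumed to be Phred+33 encoded, and each read is filtered on its own (the note does not say how mates of paired reads were handled).

```python
from typing import Iterable, Iterator, Tuple

Record = Tuple[str, str, str]  # (header, sequence, quality string)


def mean_quality(quality: str, offset: int = 33) -> float:
    """Mean Phred score of a FASTQ quality string (Phred+33 assumed)."""
    if not quality:
        return 0.0
    return sum(ord(c) - offset for c in quality) / len(quality)


def filter_reads(records: Iterable[Record], min_mean: float = 5.0) -> Iterator[Record]:
    """Yield only the reads whose mean quality is at least `min_mean`."""
    for header, sequence, quality in records:
        if mean_quality(quality) >= min_mean:
            yield header, sequence, quality


# Toy input: 'I' encodes Phred 40 and '#' encodes Phred 2 under Phred+33.
reads = [("r1", "ACGT", "IIII"), ("r2", "ACGT", "####")]
kept = list(filter_reads(reads))
```

In the toy input, only `r1` passes the threshold of 5, so `kept` holds a single record.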
Adapter trimming:
SeqPrep -f SRR071758_1.original.fq -r SRR071758_2.original.fq -1 SRR071758_1.ar.fq.gz -2 SRR071758_2.ar.fq.gz -A adapter_sequence_fwd -B adapter_sequence_rev
#Note: Forward and reverse adapter sequences were inferred from the respective datasets using Minion
from the Kraken package.
Quality trimming:
java -jar trimmomatic-0.30.jar PE -phred33 SRR071758_1.original.fastq SRR071758_2.original.fastq SRR071758_1.qt.paired.fastq SRR071758_1.qt.unpaired.fastq SRR071758_2.qt.paired.fastq SRR071758_2.qt.unpaired.fastq LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:30
Error correction:
quake.py -f SRR071758.list -k 15 -p 8
de novo genome assembly:
All de novo genome assemblies were generated using SOAPdenovo (v 1.05).
Box 4:
Impact of 5’-end trimming on the completeness of de novo transcriptome assembly:
Trimming 13bp from the 5’-end of all reads:
fastx_trimmer -Q33 -f 13 -i input.fq -o output.fq
de novo transcriptome assembly:
All de novo transcriptome assemblies were generated using SOAPdenovo-Trans (v 1.0.3),
following the same procedure reported in (Xie et al. 2013).