Box 1: Evaluation of general quality assessment tools:

BIGpre (v 2.0.2):
    solqs.pl -i se_input.fq -d T -o se_output
FastQC (v 0.10.1):
    fastqc se_input.fq
HTQC (v 0.14.1):
    ht-stat -e illumina -f pri_casava1.8 -S -l 93 -o se_output -q -m F se_input.fq
FASTX Statistics (v 0.0.13.2):
    fastx_quality_stats -i se_input.seq -o se_output
NGS_QC_Toolkit (v 2.3.1):
    IlluQC.pl -se se_input.fq N 4 -onlyStat -o se_output
Fastq-Utils (v 0.5.2b):
    fastqutils stats se_input.fq
Prinseq (v 0.20.3):
    prinseq-lite.pl -phred64 -fastq se_input.fq -graph_data se_output.gd -graph_stats ld,gc,qd,ns,de -out_good null -out_bad null
SolexaQA (v 2.2):
    SolexaQA.pl se_input.fq -s 750000 -d se_output

Box 3: Sequencing read simulation:

pIRS (v 1.10):
    pirs simulate -i hg19.fa.masked -l 100 -m 100 -v 10 -x 0.1 -Q 33 -c 0
#Note: To simulate adapter contamination, pIRS was modified so that when selected genomic sequences are shorter than the preset read length (100 bp in this study), adapter sequences are concatenated at the ends of the genomic sequences before the subsequent simulation of sequencing errors.
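The modification described in the note above can be illustrated with a small shell sketch. This is not the modified pIRS source: the function name `pad_with_adapter`, the example sequences, and the choice to append the adapter at the 3' end and cut the result to the read length are all assumptions made for illustration.

```shell
#!/bin/sh
# Illustrative sketch only; not the modified pIRS code.
# pad_with_adapter FRAGMENT ADAPTER READ_LEN
# If FRAGMENT is shorter than READ_LEN, append ADAPTER to it, then
# keep at most the first READ_LEN bases of the result.
pad_with_adapter() {
    fragment=$1
    if [ "${#fragment}" -lt "$3" ]; then
        fragment="${fragment}$2"      # assumed: adapter joins the 3' end
    fi
    printf '%s\n' "$fragment" | cut -c "1-$3"
}

# Hypothetical example: a 6 bp fragment, a made-up adapter, a 10 bp read.
pad_with_adapter ACGTAC TTTTGGGG 10   # prints ACGTACTTTT
```

If the fragment plus adapter is still shorter than the read length, this sketch simply returns the shorter sequence; how the modified pIRS handles that case is not stated in the note.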
De novo discovery of adapter contamination:

Kraken (v 13-274):
    minion search-adapter se_input.fq -show 3 -write-fasta denovo_adapter.fasta
    swan -r reference_adapters.fa -q denovo_adapter.fasta

Evaluation of adapter trimming tools for SE data:

AdapterRemoval (v 1.5.2):
    AdapterRemoval --file se_input.fq --basename se_output --stats --minlength 1 --pcr1 [adapter_sequence]
AlienTrimmer (v 0.3.2):
    java -jar AlienTrimmer.jar -i se_input.fq -o se_output.fq -c adapter.fa
Btrim (release date: 09/09/2011):
    btrim64-static -p adapter.fa -t se_input.fq -o se_output.fq -3 -s se_output.stat
Cutadapt (v 1.3):
    cutadapt -a [adapter_sequence] -o se_output.fq se_input.fq
ea-utils (v 1.1.2-537):
    fastq-mcf -o se_output.fq -q 0 -m 1 adapter.fa se_input.fq
FASTX_Clipper (v 0.0.13.2):
    fastx_clipper -Q33 -a [adapter_sequence] -l 1 -i se_input.fq -o se_output.fq
Flexbar (v 2.4):
    flexbar -r se_input.fq -t se_output -f i1.8 -a adapter.fa -m 1
Reaper (v 13-274):
    reaper -i se_input.fq -basename se_output -geom no-bc -3pa [adapter_sequence] -3p-global 100/0/0/0 -3p-prefix 1/2/0/2
Scythe (v 0.991beta):
    scythe -o se_output.fq -n 1 -a adapter.fa se_input.fq
Trimmomatic (v 0.30):
    java -jar trimmomatic-0.30.jar SE -phred33 -trimlog se_output.log se_input.fq se_output.fq ILLUMINACLIP:adapter.fa:2:7:10

Evaluation of adapter trimming tools for PE data:

AdapterRemoval (v 1.5.2):
    AdapterRemoval --file pe_input_fwd.fq --file2 pe_input_rev.fq --basename pe_output --stats --minlength 1 --pcr1 [adapter_sequence_fwd] --pcr2 [adapter_sequence_rev]
AlienTrimmer (v 0.3.2):
    java -jar AlienTrimmer.jar -if pe_input_fwd.fq -ir pe_input_rev.fq -of pe_output_fwd.fq -or pe_output_rev.fq -os pe_output_single.fq -cf adapter_fwd.fa -cr adapter_rev.fa
ea-utils (v 1.1.2-537):
    fastq-mcf -o pe_output_fwd.fq -o pe_output_rev.fq -q 0 -m 1 adapter_both.fa pe_input_fwd.fq pe_input_rev.fq
Flexbar (v 2.4):
    flexbar -r pe_input_fwd.fq -p pe_input_rev.fq -t pe_output -f i1.8 -a adapter_both.fa -m 1
SeqPrep (release date: 05/31/2013):
    SeqPrep -f pe_input_fwd.fq -r pe_input_rev.fq -1 pe_output_fwd.fq.gz -2 pe_output_rev.fq.gz -A adapter_sequence_fwd -B adapter_sequence_rev
Trimmomatic (v 0.30):
    java -jar trimmomatic-0.30.jar PE -phred33 -trimlog pe_input.log pe_input_fwd.fq pe_input_rev.fq pe_output_fwd_paired.fq pe_out_fwd_unpaired.fq pe_output_rev_paired.fq pe_out_rev_unpaired.fq ILLUMINACLIP:adapter.fa:2:30:7:1:true

Evaluation of the impact of adapter contamination on sequence read alignment:

Bowtie2 (v 2.1.0; “global” mode):
    bowtie2 -x dmel-all-chromosome-r5.52 -U SRR611832.fastq -S SRR611832.global.sam
Bowtie2 (v 2.1.0; “local” mode):
    bowtie2 --local -x dmel-all-chromosome-r5.52 -U SRR611832.fastq -S SRR611832.local.sam
BWA (v 0.7.5a):
    bwa aln -f SRR611832.sai dmel-all-chromosome-r5.52 SRR611832.fastq
    bwa samse -n 1 -f SRR611832.sam dmel-all-chromosome-r5.52 SRR611832.sai SRR611832.fastq
BWA (v 0.7.5a; “soft clipping”):
    bwa mem -v 1 dmel-all-chromosome-r5.52 SRR611832.fastq > SRR611832.sam

Evaluation of the impact of adapter contamination on genome assembly:

#In the complete SRR071758 dataset, more than 10 million reads have average quality scores below 5. We first removed all such low-quality reads and refer to the remaining reads as the “original” dataset.
Adapter trimming:
    SeqPrep -f SRR071758_1.original.fq -r SRR071758_2.original.fq -1 SRR071758_1.ar.fq.gz -2 SRR071758_2.ar.fq.gz -A adapter_sequence_fwd -B adapter_sequence_rev
#Forward and reverse adapter sequences were inferred from the respective datasets using Minion from the Kraken package.
Quality trimming:
    java -jar trimmomatic-0.30.jar PE -phred33 SRR071758_1.original.fastq SRR071758_2.original.fastq SRR071758_1.qt.paired.fastq SRR071758_1.qt.unpaired.fastq SRR071758_2.qt.paired.fastq SRR071758_2.qt.unpaired.fastq LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:30
Error correction:
    quake.py -f SRR071758.list -k 15 -p 8
De novo genome assembly:
    All de novo genome assemblies were generated using SOAPdenovo (v 1.05).
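The genome assembly section above mentions removing reads with average quality scores below 5 but gives no command for that step. The following is a minimal sketch of such a filter, not the procedure the authors used. It assumes four-line FASTQ records and Phred+33 encoding (consistent with the `-phred33` option in the Trimmomatic command); the function name `filter_by_mean_quality` and the input file name in the usage comment are hypothetical.

```shell
#!/bin/sh
# Illustrative sketch only; the original filtering command is not given.
# filter_by_mean_quality MIN_MEAN < in.fq > out.fq
# Keeps reads whose mean Phred score is at least MIN_MEAN.
filter_by_mean_quality() {
    awk -v min="$1" '
    BEGIN {
        # Lookup table: ASCII character -> Phred score (Phred+33 assumed).
        for (i = 33; i <= 126; i++) phred[sprintf("%c", i)] = i - 33
    }
    { rec[NR % 4] = $0 }        # buffer the four lines of the current record
    NR % 4 == 0 {               # fourth line of a record: the quality string
        n = length($0); sum = 0
        for (i = 1; i <= n; i++) sum += phred[substr($0, i, 1)]
        if (n > 0 && sum / n >= min)
            print rec[1] "\n" rec[2] "\n" rec[3] "\n" rec[0]
    }'
}

# Hypothetical usage (input file name is an assumption):
#   filter_by_mean_quality 5 < SRR071758_1.fq > SRR071758_1.original.fq
```

This sketch filters one file at a time. For paired-end data the two mate files would also need to be kept in step, which the sketch does not attempt.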
Box 4: Impact of 5’-end trimming on the completeness of de novo transcriptome assembly

Trimming 13bp from the 5’-end of all reads:
    fastx_trimmer -Q33 -f 13 -i input.fq -o output.fq
De novo transcriptome assembly:
    All de novo transcriptome assemblies were generated using SOAPdenovo-Trans (v 1.0.3), following the procedure reported in Xie et al. (2013).
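To make the effect of the 5'-end trimming step concrete, the sketch below applies the same kind of operation to a FASTQ stream with awk. It is an illustration rather than a replacement for fastx_trimmer: the function name `trim_5prime` is hypothetical, and four-line FASTQ records are assumed. Its argument follows the "first base to keep" convention of the `-f` option of fastx_trimmer.

```shell
#!/bin/sh
# Illustrative sketch only; not part of the FASTX-Toolkit.
# trim_5prime FIRST < in.fq > out.fq
# Keeps bases from 1-based position FIRST onward, in both the sequence
# line and the quality line of every record.
trim_5prime() {
    awk -v first="$1" '
    NR % 2 == 0 { print substr($0, first); next }   # sequence and quality lines
    { print }                                       # header and "+" lines
    '
}

# Hypothetical example: keep bases from position 3 onward.
printf '@r1\nACGTACGT\n+\nIIIIHHHH\n' | trim_5prime 3
```

Under this convention a value of 13 retains each read from position 13 onward, and the quality string is shortened by the same amount so that the record stays valid.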