A Quick Guide to Velvet and Oases

VELVET is a simple yet efficient de novo genome assembler based on de Bruijn graphs that can use both short and longer reads. Widely used in Next Generation Sequencing applications, it can also tackle transcriptome assemblies in combination with the Oases software. Because of the nature of its de Bruijn graph algorithm, probably the most important parameter in Velvet is the K-mer length, which directly influences the specificity and sensitivity of the assembly. This quick guide assumes knowledge of UNIX operating systems and does not substitute for the complete Velvet (v1.2.10) manual.

INSTALLATION
Download from https://github.com/dzerbino/velvet/archive/master.zip, unzip the velvet-master.zip archive and compile with make, followed by make install for a basic installation. To benefit from multithreading and longer K-mer lengths (default maximum K-mer length = 31), compile using 'OPENMP=1' and 'MAXKMERLENGTH=X', where X must be an odd integer. The number of CPUs used by Velvet is then the value of the OMP_NUM_THREADS environment variable incremented by one.

GENERAL WORKFLOW
velveth first builds hash tables of all K-mer sub-sequences found in the sequencing read dataset. This is where the read type and K-mer length(s) are defined.
velvetg then builds de Bruijn graphs from the velveth output, removes errors and resolves repeats to eventually yield contig sequences.
oases may finally be used to process contigs into loci and their associated transcripts when dealing with RNA-Seq experiments.

VELVETH PARAMETERS
K-mer length:
  - Single K-mer length:
      velveth <out_dir> K <input_files> [opts]
  - Range k ≤ K-mer < K, with a step of s:
      velveth <out_dir> k,K,s <input_files> [opts]
Input format:
  -fasta or -fasta.gz   Classical FASTA format or its compressed version.
  -fastq or -fastq.gz   FASTQ format (sequences with embedded base-calling quality) or its compressed version.
  -sam or -bam          Sequence Alignment/Map format or its binary version.
Read categories:
Depending on the sequencing platform, the read category may be -short (default), -shortPaired, -long or -longPaired. Append the suffix 2, 3 and so forth for distinct libraries of the same read category. For example:
    -short lib1reads.fq -short2 lib2reads.fq
Velveth options:
  -strand_specific   For strand-specific sequencing reads.
  -noHash            Only prepare sequences for hashing.
  -reuse_Sequences   Use the sequences preprocessed by -noHash.
  -create_binary     Binary output of velveth to speed up velvetg.
  -interleaved       Paired-end reads are interleaved in one file (default).
  -separate          Paired-end reads are in two different files.

VELVETG PARAMETERS
Velvetg must be run separately on each velveth directory having a different K-mer length. A simple bash loop is one way to do this sequentially:
    for i in <dir>*; do velvetg $i <options>; done
  -cov_cutoff        Exclude low-coverage nodes (float | auto).
  -max_coverage      Exclude too highly covered contigs from the assembly (float).
  -exp_cov           Expected coverage, for standard genomic data only (integer | auto). An expected coverage value greatly improves the assembly, so providing one is highly advised.
  -min_contig_lgth   Minimum contig length (int | K-mer length * 2).
  -ins_length        Paired-end total insert size in base pairs, i.e. insert and both reads included (integer).
  -ins_length_sd     Insert size standard deviation (integer).
  -scaffolding       Scaffolding with N's of contigs that do not overlap (yes | no).
  -read_trkg         Produces a more detailed assembly (yes | no), usually required.
  -unused_reads      Outputs the reads left unused by the assembly (yes | no).
  -conserveLong      Keep contigs with long reads in them (yes | no).
Note that Velvet expects all coverage values to be given in K-mer coverage. The relation between common nucleotide coverage (C) and K-mer coverage (Ck) is:
    Ck = C * (L - k + 1) / L
where k is the hash length and L the read length.

TYPICAL COMMANDS
Short reads, single-end, K-mer = 31:
    velveth dir 31 -fastq -short reads.fq
    velvetg dir/ -read_trkg yes
Short paired-end reads, K-mer = 53:
    velveth dir 53 -fastq -separate -shortPaired F.fq R.fq
    velvetg dir/ -read_trkg yes -ins_length 100 -exp_cov 10
Long single-end & short paired-end, multiple K-mers:
    velveth dir 23,55,8 -fastq -shortPaired -separate \
        shortR1.fq shortR2.fq -long 454reads.fq
    for i in dir*; do velvetg $i -read_trkg yes \
        -ins_length 200; done
Assembly of assemblies, short reads with multiple K-mers:
    velveth dir 37,63,4 -fastq -short reads.fastq
    for i in dir*; do velvetg $i -read_trkg yes; done
    velveth MergeAssembly 27 -long dir*/contigs.fa
    velvetg MergeAssembly/ -read_trkg yes -conserveLong yes
NB: A K-mer length of 27 works well for most organisms when merging several assemblies. Also note that contigs from the initial assemblies are provided as -long to the merging assembly.

OASES software is the continuation of the Velvet assembler to process de novo transcriptome assemblies (RNA-Seq). A Velvet installation is therefore required, as well as a final output directory from a Velvet run. This Quick Guide is intended for mainstream applications of Oases v0.2.08 along with most of its parameters, and does not substitute for the complete manual.

INSTALLATION
Get the code either by downloading oases_x.x.x.tgz or via git (git clone git://github.com/dzerbino/oases.git). Prior to make install, the make compilation step must be run with the same parameters as for the Velvet compilation (cf. Velvet installation).

OASES PARAMETERS
  -ins_length        Specify the paired-end insert size in base pairs.
  -cov_cutoff        Minimum coverage for a transcript (int | 3).
  -min_pair_count    Minimum number of bridging reads to confirm the distance between two long contigs (int | 4).
  -min_trans_lgth    Minimum length for a transcript to be output (int | 100).
  -merge             To merge and process a Velvet assembly-of-assemblies.

TYPICAL COMMANDS
    oases dir/
Here dir/ is the output folder of a velveth run followed by a velvetg de novo assembly. Oases creates a transcripts.fa file containing all the loci and their related transcripts.
According to the official manual, it is advised to run Oases on an array of single-K assemblies (cf. Velvet with k ≤ K-mer < K), which are then merged.
Oases example with short reads:
1 - Array of single K-mer lengths:
    velveth dir 23,35,2 -fastq -short reads.fastq
    for i in dir*; do velvetg $i -read_trkg yes; done
2 - Merging of the single-K assemblies:
    velveth Merged 27 -long dir*/contigs.fa
    velvetg Merged/ -read_trkg yes -conserveLong yes
3 - Assembly of transcripts with Oases:
    oases Merged/ -merge -min_trans_lgth 200

OASES PYTHON PIPELINE
For convenience, Oases comes with a Python script designed to accomplish all tasks from the single-K assemblies to the final merged transcriptome assembly.
Example with paired-end reads:
    python oases_pipeline.py \
        -m 21 -M 35 -s 2 -o PE_assembly \
        -d '-fastq -shortPaired -separate reads1.fq reads2.fq' \
        -p '-ins_length 200 -min_trans_lgth 100'

OASES PYTHON OPTIONS
  -d 'FILES'     Velveth file descriptors (string).
  -p 'OPTIONS'   Oases options passed to the command line (string).
  -m KMIN        Minimum K-mer length (odd int).
  -M KMAX        Maximum K-mer length (odd int).
  -s KSTEP       Step in K-mer length (even int).
  -g KMERGE      K-mer length for the merging of assemblies.
  -o NAME        Output directory prefix (string).
  -r             Only do the merging.
  -c             Clean temporary files after all steps.

COPYRIGHTS
VELVET was developed and is maintained by Daniel R. Zerbino (zerbino@ebi.ac.uk) and Ewan Birney (birney@ebi.ac.uk). Please visit the www.ebi.ac.uk/~zerbino/velvet website for further details on Velvet. Mailing list at: http://listserver.ebi.ac.uk/mailman/listinfo/velvet-users.
OASES was developed and is maintained by Daniel R. Zerbino (zerbino@ebi.ac.uk) and Marcel Schulz (marcel.schulz@mogen.mpg.de). Please visit the www.ebi.ac.uk/~zerbino/oases website for further details on Oases. Mailing list at: http://listserver.ebi.ac.uk/mailman/listinfo/oasesusers.
CITATIONS:
- D. R. Zerbino and E. Birney. 2008. Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Research, 18:821-829.
- M. H. Schulz, D. R. Zerbino, M. Vingron and E. Birney. 2012. Oases: Robust de novo RNA-seq assembly across the dynamic range of expression levels. Bioinformatics. DOI:10.1093/bioinformatics/bts094

EMBnet - European Molecular Biology Network - is a worldwide bioinformatics support network. Most countries have a national node which can provide training courses and other forms of help for users of bioinformatics software. You can find information about your national node from the EMBnet site: http://www.embnet.org/

THIS DOCUMENT was written and designed by Axel Thieffry and is being distributed by the P&PR Publications Committee of EMBnet.
A Quick Guide To Velvet v1.2.10, First edition © 2014 Axel Thieffry
LICENSE: CC-BY-NC 3.0 http://creativecommons.org/licenses/by-nc/
THANKS to Jose Valerde for the help, Daniel Zerbino for review and validation, and the EMBnet community for helpful advice and review.
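
The relation between nucleotide coverage and K-mer coverage, Ck = C * (L - k + 1) / L, can be checked with a small shell helper before choosing coverage values for velvetg. This is only a sketch and is not part of Velvet: the function name kmer_cov and its argument order (C, L, k) are assumptions made for illustration.

```shell
# kmer_cov C L k
# Hypothetical helper (not shipped with Velvet): prints the K-mer coverage
# Ck = C * (L - k + 1) / L, rounded to two decimals.
#   C = nucleotide coverage, L = read length, k = hash (K-mer) length
kmer_cov() {
    awk -v c="$1" -v l="$2" -v k="$3" \
        'BEGIN { printf "%.2f\n", c * (l - k + 1) / l }'
}

# Example: 30x nucleotide coverage, 100 bp reads, K-mer length 31
kmer_cov 30 100 31
```

With these example inputs the helper prints 21.00, a value that could then be considered when setting coverage options such as -exp_cov.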
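
The k,K,s range syntax of velveth (k ≤ K-mer < K, with a step of s) can be previewed before launching a run. The sketch below only prints names and never calls Velvet. The function name list_k_dirs is hypothetical, and the <out_dir>_<k> directory naming is an assumption inferred from the dir* loops used in this guide.

```shell
# list_k_dirs PREFIX KMIN KMAX STEP
# Hypothetical helper: lists one directory name per K-mer length in the
# half-open range KMIN <= k < KMAX, assuming directories named PREFIX_k.
list_k_dirs() {
    prefix=$1
    kmin=$2
    kmax=$3
    step=$4
    k=$kmin
    while [ "$k" -lt "$kmax" ]; do
        echo "${prefix}_${k}"
        k=$((k + step))
    done
}

# Preview of the range used in "velveth dir 23,55,8 ..."
list_k_dirs dir 23 55 8
```

For the range 23,55,8 this lists dir_23, dir_31, dir_39 and dir_47. The upper bound 55 is excluded, and an even step keeps every K-mer length odd when the starting value is odd.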
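
Because an OPENMP=1 build of Velvet uses the value of OMP_NUM_THREADS incremented by one, the variable has to be set to one less than the number of CPUs wanted. A minimal sketch of that arithmetic follows. The function name omp_threads_for is hypothetical, and rejecting requests for fewer than two CPUs is an assumption made here so that the variable is never set to zero.

```shell
# omp_threads_for N
# Hypothetical helper: prints the OMP_NUM_THREADS value to export so that
# a multithreaded Velvet build uses N CPUs (Velvet uses OMP_NUM_THREADS + 1).
omp_threads_for() {
    n=$1
    if [ "$n" -lt 2 ]; then
        echo "omp_threads_for: need at least 2 CPUs" >&2
        return 1
    fi
    echo $((n - 1))
}

# Example: prepare the environment for an 8-CPU run
OMP_NUM_THREADS=$(omp_threads_for 8)
export OMP_NUM_THREADS
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"
```

Subsequent velveth and velvetg calls started from the same shell would inherit the exported variable.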