NGS Bioinformatics Workshop
2.3 Tutorial - Transcriptome Assembly
May 16th, 2012
IRMACS 10900
Facilitator: Richard Bruskiewich, Adjunct Professor, MBB

Workflow for Today
• Erratum about last week
• Questions from last week
• Galaxy @ WestGrid now available
• Transcriptome assembly
  • Map-based assembly: Bowtie, TopHat, Cufflinks
  • de novo assembly: Velvet + Oases, Trans-ABySS

Erratum: Running Velvet with two paired-end data files
• Run velveth:
    velveth outputdir k_mer -fastq -shortPaired paired_data_file_1 -shortPaired2 paired_data_file_2
• Run velvetg:
    velvetg outputdir -ins_length 200 -exp_cov 20

1st Question from last week…
What are the limits in read lengths (e.g. Sanger ~1000 - 1500) to NGS assemblers?
From p. 4 of the ALLPATHS-LG manual, "Capabilities and limitations":
  ALLPATHS-LG is a short-read assembler. It has been designed to use reads produced by new sequencing technology machines such as the Illumina Genome Analyser. The version described here has been optimized for, but not necessarily limited to, reads of length 100 bases. ALLPATHS is not designed to assemble Sanger or 454 FLX reads, or a mix of these with short reads.

1st Question from last week…
On p. 5 of the Velvet manual:
  Read lengths are stored on signed 16-bit integers, meaning that if you are assembling contigs longer than 32 kb, then more memory is required to store the coordinates. To do so, simply add the following option to the make command:
    make 'LONGSEQUENCES=1'
  (Note the single quotes and absence of spacing.) This will cost more memory overhead.

2nd Question from last week…
What are the limits to insert sizes of libraries?
From p. 10 of the ALLPATHS-LG manual, "Supported library constructions":
  …any input dataset should include at least one fragment library and one jumping library... A jumping library has a longer separation, typically in the 3kbp-10kbp range...
  …Additionally, ALLPATHS also supports long jumping libraries.
  A jumping library is considered to be long if the insert size is larger than 20 kbp.
The Velvet manual, p. 10, shows a command-line switch example: -ins_length_long 40000

Now available: Galaxy @ WestGrid
https://joffre.westgrid.ca/galaxy/

Accessing the WestGrid Galaxy instance
Use your WestGrid ID (your email name without the @ part) to log into Joffre. For example, if your email is 'rbruskie@sfu.ca', your server access ID is 'rbruskie'. Use your WestGrid password.

Logging into the Galaxy instance
Once into Galaxy, you need to register (the first time) or log in (if already registered) using your username (your full email, e.g. 'rbruskie@sfu.ca'). Important: use your WestGrid password as the Galaxy password.

Transcriptome Assembly - Overview
• As with whole-genome assembly, one can have a reference-based ('map-based') assembly, based on read alignment, or a 'de novo' assembly, based on De Bruijn graph construction.
• In some respects, transcriptome assembly can be more challenging, due to splice isoforms, overlapping transcripts, and other issues.
• For a detailed review of the issues and available software, see Martin JA and Wang Z. 2011. Next-generation transcriptome assembly. Nature Reviews Genetics 12:671-682.

Assembly by Mapping: Bowtie/TopHat/Cufflinks Suite
• Bowtie2: Ultrafast short read alignment.
  http://bowtie-bio.sourceforge.net/bowtie2
• TopHat: a fast splice junction mapper for RNA-Seq reads. It aligns RNA-Seq reads to large genomes using the ultra high-throughput short read aligner Bowtie, and then analyzes the mapping results to identify splice junctions between exons.
  http://tophat.cbcb.umd.edu
• Cufflinks: Isoform assembly and quantitation for RNA-Seq.
  http://cufflinks.cbcb.umd.edu/
• It is non-trivial to install this software suite…
• Fortunately, the software is installed under Galaxy and some useful tutorials are available (see https://main.g2.bx.psu.edu/u/jeremy/p/galaxy-rna-seqanalysis-exercise)

de novo Assembly: Velvet (last week) + Oases
• Obtain a version of Oases compatible with your Velvet: http://www.ebi.ac.uk/~zerbino/oases/
    wget …oases_latest.tgz
    tar -zxvf oases_latest.tgz
    make VELVET_DIR=/path/to/velvet
• Put the resulting oases executable on your $PATH.

Velvet + Oases with (BAM) paired-end read data
• Run velveth:
    velveth outputdir k_mer -bam -shortPaired read_data.bam
• Run velvetg:
    velvetg outputdir -ins_length 250 -exp_cov auto
• Run oases:
    oases outputdir -scaffolding yes -min_trans_lgth 100 -ins_length2 250 -unused_reads yes

Sort, Filter and Cluster your Transcripts
• Sorting and clustering transcripts: one can use the 'usearch' tool (http://www.drive5.com/usearch/)
    usearch --sort transcripts.fa --output transcripts.sorted.fa --minlen min# --maxlen max# --log sorted.log
    usearch --cluster transcripts.sorted.fa --id 0.95 --seedsout $@ --uc results.uc --minlen min# --maxlen max# --log clustered.log

trans-ABySS
• Obtain the software:
  • Download from http://www.bcgsc.ca/platform/bioinfo/software/trans-abyss
  • tar -zxvf …/trans-ABySS-v1.3.2.tar.gz
• Look under the release web page for the manual link:
  http://www.bcgsc.ca/platform/bioinfo/software/trans-abyss/releases/1.3.2
• Consult the manual for full details about how to set up and run trans-ABySS (non-trivial to set up, many dependencies).
• To execute, first run ABySS (abyss-pe) over a series of k-mer values, then run the trans-ABySS pipeline.
• Unfortunately, NOT installed (yet) under Galaxy…
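
The "run ABySS over a series of k-mer values" step above could be sketched roughly as follows. This is only a dry-run illustration, not the trans-ABySS manual's procedure: the function name abyss_sweep_commands, the sample name, the read file paths and the k values are all hypothetical placeholders. The one-directory-per-k layout with abyss-pe -C is assumed from the pattern in the ABySS README, so check it against the manual for your installed version. The script only prints the commands; it does not run ABySS.

```shell
# Sketch (assumptions: sample name, read paths and k values are placeholders).
# Prints one abyss-pe command per k-mer value; nothing is executed.
abyss_sweep_commands() {
    local name="$1" reads1="$2" reads2="$3"
    shift 3
    local k
    for k in "$@"; do
        # One output directory per k, so assemblies do not overwrite each other.
        # Absolute read paths are assumed, because -C changes directory first.
        printf "mkdir -p k%s && abyss-pe -C k%s k=%s name=%s in='%s %s'\n" \
            "$k" "$k" "$k" "$name" "$reads1" "$reads2"
    done
}

# Illustrative call: k from 26 to 50 in steps of 4 (values chosen arbitrarily).
abyss_sweep_commands sample /data/reads_1.fastq /data/reads_2.fastq 26 30 34 38 42 46 50
```

Once the printed commands look right, they could be redirected to a file and run with sh. The resulting k26, k30, … directories would then be the input to the trans-ABySS pipeline described in its manual.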
