CLC Academy Transcriptome assembly workshop

CLC bio
Finlandsgade 10-12
8200 Aarhus N
Denmark
Telephone: +45 70 22 55 09
Fax: +45 70 22 55 19
www.clcbio.com
info@clcbio.com

Contents

1 Axolotl data set on command line
2 Bombus terrestris in the Workbench
3 Litomosoides sigmodontis in the Workbench
  3.1 Assembly
  3.2 RNA-Seq

Chapter 1 Axolotl data set on command line

The first exercise focuses on the command line program, the CLC Assembly Cell. First, we run the assembler using the default parameters on the trimmed file:

    clc_novo_assemble -v -q Trimmed_Axolotl_GAP7TUS01_02_100000reads.sff -o Axolotl_default.fa

Remember the -v option, which reports the word size that was calculated automatically. It should be 18. Check the result using the sequence_info program with the -n option, which outputs the N50:

    sequence_info -n Axolotl_default.fa

Now try adjusting the word size to see its effect on the assembly:

    clc_novo_assemble -v -q Trimmed_Axolotl_GAP7TUS01_02_100000reads.sff -o Axolotl_small_kmer.fa -w 16

To see the effect of trimming on the assembly, run the untrimmed data set as well:

    clc_novo_assemble -v -q Axolotl_GAP7TUS01_02_100000reads.sff -o Axolotl_small_kmer_untrimmed.fa -w 16

Take the best assembly (which is Axolotl_small_kmer.fa) for submission to the BLAST script used yesterday to evaluate the assembly.

Chapter 2 Bombus terrestris in the Workbench

Open the CLC Genomics Workbench and go to the Tool bar:

    NGS Import | Roche 454

Select the 454 Flowgram (.sff) file filter in the import dialog and import the trimmed file Trimmed_Bombus_terrestris_100000reads.sff. Choose to Save the results.
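The N50 that sequence_info -n reports in Chapter 1 is just as useful for contigs exported from the Workbench. As a rough cross-check, the sketch below computes it from a FASTA file in plain Python. It assumes the common definition of N50 (the length L such that contigs of length L or longer hold at least half of all assembled bases). The function names are illustrative and not part of the CLC Assembly Cell, and sequence_info may treat edge cases differently.

```python
def read_fasta_lengths(path):
    """Return the length of every sequence in a FASTA file."""
    lengths = []
    current = None  # length of the record being read; None before the first header
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if current is not None:
                    lengths.append(current)
                current = 0
            elif line and current is not None:
                current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Length L such that contigs of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # no contigs at all


# Usage (file name taken from the exercise above):
#   print(n50(read_fasta_lengths("Axolotl_default.fa")))
```

Running this on two assemblies of the same reads gives a quick way to compare them, although N50 alone says nothing about how correct the contigs are.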
Run the de novo assembly from the Toolbox:

    Toolbox | High-throughput Sequencing | De Novo Assembly

Select the read file and proceed through the dialogs, leaving the settings at default. At the last step, select Simple contigs as output (see figure 2.1).

Figure 2.1: Choose simple contigs.

Export the resulting data for submission to the BLAST script used yesterday to evaluate the assembly.

Chapter 3 Litomosoides sigmodontis in the Workbench

Import the five sff files into the Workbench. Since the reads still contain adapter sequences, they need to be trimmed prior to assembly. First, add the MINT adapter to the list of adapters used for trimming:

    Edit | Preferences | Data

At the bottom of the panel, under Adapter trimming, press Add Row. Add information about the adapter sequence (see figure 3.1).

Figure 3.1: Add the MINT adapter.

The sequence is AAGCAGTGGTATCAACGCAGAGTACGGGG. When double-clicking the alignment score, you can specify settings for the match. This is used for a simple Smith-Waterman alignment against the reads. Each match in the alignment is awarded one point. Since the adapter is 29 bases long, we set a score of 20. That will allow three mismatches (the 26 matches score 26 and the three mismatches cost 6 in total, which gives 20) or two to three indels. We also allow end gaps with a score of 2. This means that adapters sitting at the end of the read will be removed. Although this will produce some false hits, it is better than leaving parts of the adapter on the reads.

Next, click OK and start the trimming:

    Toolbox | High-throughput Sequencing | Trim Sequences

Select all five read files and click Next. Leave the settings at default to trim away low quality regions on the reads. In the next step, select the MINT adapter you added and click the Search both strands option (see figure 3.2).

Figure 3.2: Trim preview.
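Returning to the alignment score chosen for the MINT adapter: the arithmetic behind the threshold of 20 can be checked with a few lines of Python. This is only a sketch of the scoring as described in the text, with one point per match and a mismatch cost of 2, which follows from three mismatches costing 6. The gap cost of 3 is an assumption, since the text does not state it, and the function name is illustrative rather than anything from the Workbench.

```python
ADAPTER = "AAGCAGTGGTATCAACGCAGAGTACGGGG"  # MINT adapter from the text
MIN_SCORE = 20     # alignment score threshold chosen in the text
MISMATCH_COST = 2  # implied by the text: three mismatches cost 6
GAP_COST = 3       # ASSUMPTION: the text does not state the gap cost


def alignment_score(matches, mismatches=0, gaps=0):
    """One point per match, minus the costs for mismatches and gaps."""
    return matches - MISMATCH_COST * mismatches - GAP_COST * gaps


full = len(ADAPTER)  # 29 bases

print(alignment_score(full))                    # perfect hit: 29
print(alignment_score(full - 3, mismatches=3))  # three mismatches: 20, accepted
print(alignment_score(full - 4, mismatches=4))  # four mismatches: 17, rejected

# Extra bases in the read: all 29 adapter bases still match.
print(alignment_score(full, gaps=3))            # 20, accepted
# Adapter bases missing from the read: each gap also removes a match.
print(alignment_score(full - 2, gaps=2))        # 21, accepted
print(alignment_score(full - 3, gaps=3))        # 17, rejected
```

Under the assumed gap cost, whether two or three indels are tolerated depends on whether the indel adds bases to the read or drops adapter bases from it.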
Since adapter trimming can be quite tricky, we have added a preview panel that dynamically shows how the settings in the dialog affect the trimming of the first 1000 reads selected. For example, you can double-click the Alignment score to see how a different scoring threshold affects the trim result. Click through the rest of the wizard and choose to Save the results. Once the analysis is complete, open the trim report to get an overview of how much was trimmed because of low quality and adapter sequence, respectively.

3.1 Assembly

Run de novo assembly on the trimmed data as explained for the previous data set. As there are well over 700,000 reads, this will take a few minutes (this is Java software - be inspired and go get a cup of coffee). Export the results as fasta for comparison using the BLAST script.

3.2 RNA-Seq

The following shows one way of working downstream with the data produced by the assembly. We will go through an RNA-Seq workflow using the contigs from the assembly as reference sequences. Basically, we want to compare the expression of the samples to identify candidate genes that are differentially expressed. For simplicity’s sake we will analyze two of the samples, t_msc... and t_fsc, since they are the smallest. For each sample, perform the following (remember to use the trimmed reads):

    Toolbox | High-throughput Sequencing | RNA-Seq Analysis

Select reads from one of the samples and click Next. Select to use Reference without annotation and select the contigs created by the de novo assembly. Leave the rest of the settings at default, proceed to the last step, and Save the results. If you open the results, you will see that each contig from the assembly now has an expression value based on the number of reads mapped back to it.
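To make "an expression value based on the number of reads mapped" more concrete, here is a minimal sketch of one widely used normalization, RPKM (reads per kilobase of reference per million mapped reads). Whether the Workbench reports exactly this measure depends on the settings in the wizard, so treat it as an illustration only. The contig names, read counts and lengths are hypothetical.

```python
def rpkm(read_count, contig_length, total_mapped_reads):
    """Reads per kilobase of contig per million mapped reads."""
    if contig_length <= 0 or total_mapped_reads <= 0:
        raise ValueError("contig length and total mapped reads must be positive")
    return read_count * 1e9 / (contig_length * total_mapped_reads)


# Hypothetical example: name -> (reads mapped to the contig, contig length in bases)
contigs = {
    "contig_1": (500, 2000),
    "contig_2": (500, 1000),
}
total_mapped_reads = 1_000_000  # hypothetical total for the whole sample

for name, (reads, length) in contigs.items():
    print(name, rpkm(reads, length, total_mapped_reads))
# contig_1 250.0
# contig_2 500.0
```

The two contigs attract the same number of reads, but the shorter one gets the higher value, which is why raw read counts are usually normalized by length before contigs are compared.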
Select the results and create an experiment to compare the expression of the two samples:

    Toolbox | Expression Analysis | Set Up Experiment

Setting up an experiment is a way to define the groups for a comparative analysis. This will create a new file with all the statistics for each of the RNA-Seq samples. Create two groups and assign one sample to each group. Open the experiment and switch to the Scatter Plot at the lower left corner of the view. This plots the expression values of the two samples against each other and gives you a feel for the spread of expression (see figure 3.3).

Figure 3.3: Scatter plot.

Feel free to explore all the tools in the expression analysis toolbox to get a feel for the possibilities of working further with this data.
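As one example of working further with these values, the comparison behind the scatter plot can be reduced to a per-contig log2 ratio between the two samples. The sketch below is plain Python and not a Workbench feature. The pseudocount of 1, the cutoff and the example values are assumptions chosen for illustration. With only one sample per group, a ratio like this can flag candidates but carries no statistical support.

```python
import math


def log2_fold_change(expr_a, expr_b, pseudocount=1.0):
    """log2 ratio of sample B over sample A; the pseudocount avoids division by zero."""
    return math.log2((expr_b + pseudocount) / (expr_a + pseudocount))


def candidates(sample_a, sample_b, min_abs_log2fc=1.0):
    """Contigs present in both samples whose |log2 fold change| reaches the cutoff."""
    hits = {}
    for contig in sample_a.keys() & sample_b.keys():
        lfc = log2_fold_change(sample_a[contig], sample_b[contig])
        if abs(lfc) >= min_abs_log2fc:
            hits[contig] = lfc
    return hits


# Hypothetical expression values for three contigs in the two samples
sample_a = {"contig_1": 7.0, "contig_2": 15.0, "contig_3": 3.0}
sample_b = {"contig_1": 31.0, "contig_2": 15.0, "contig_3": 0.0}

for contig, lfc in sorted(candidates(sample_a, sample_b).items()):
    print(contig, lfc)
# contig_1 2.0
# contig_3 -2.0
```

Contigs far from the diagonal of the scatter plot are the ones with large absolute log2 ratios.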
