Figure S1 - BioMed Central

Additional file 1

Figure S1. Quality for each base in reads viewed by software FastQC

Figure S2. Base composition of reads as visualized by the software FastQC

Figure S3. Evaluation of false mapping reads using duplicated genes

Figure S4. Genome-wide assessment of mapping RNA-seq reads

Figure S5. Sequencing randomness assessment

Figure S6. Gene coverage statistics

Figure S7. Comparison of gene expression between two samples of root and leaf tissues

Figure S8. RT-PCR validation of novel transcripts

Figure S9. Cumulative distribution of FPKM values of all and novel transcripts in each sample

Figure S10. Expression profile of B. rapa genome displayed in Integrative Genomics Viewer (IGV)

Figure S11. Numbers for different types of dinucleotides at the splicing borders

Figure S12. Genes with multiple AS events

[Figure S1 panels, one per read file: Callus_1.fq and Callus_2.fq (Sample GSM1059153); Root_1_1.fq and Root_1_2.fq (Sample GSM1059157); Root_2_1.fq and Root_2_2.fq (Sample GSM1059158); Leaf_1_1.fq and Leaf_1_2.fq (Sample GSM1059155); Leaf_2_1.fq and Leaf_2_2.fq (Sample GSM1059156); Flower_1.fq and Flower_2.fq (Sample GSM1059154); Stem_1.fq and Stem_2.fq (Sample GSM1059160); Silique_1.fq and Silique_2.fq (Sample GSM1059159)]

Figure S1. Quality for each base in reads viewed by software FastQC.

[Figure S2 panels, one per read file: Callus_1.fq and Callus_2.fq (Sample GSM1059153); Leaf_1_1.fq and Leaf_1_2.fq (Sample GSM1059155); Root_1_1.fq and Root_1_2.fq (Sample GSM1059157); Leaf_2_1.fq and Leaf_2_2.fq (Sample GSM1059156); Root_2_1.fq and Root_2_2.fq (Sample GSM1059158); Flower_1.fq and Flower_2.fq (Sample GSM1059154); Stem_1.fq and Stem_2.fq (Sample GSM1059160); Silique_1.fq and Silique_2.fq (Sample GSM1059159)]

Figure S2. Base composition of reads as visualized by the software FastQC.

Figure S3. Evaluation of false mapping reads using duplicated genes.

(a) Sequence similarities between duplicated gene pairs (10447) retained in triplicated genomic regions of B. rapa. After alignment, the identity between the two sequences of each duplicated gene pair was calculated over all aligned regions (red line) and over the aligned regions remaining after removal of those containing gaps (-) (blue line). Sequence similarities between duplicated genes were distributed over a wide range and peaked at around 90%.

Because at most 2 bp of mismatch was allowed in read mapping, the similarity between a trimmed read and its target reference sequence must be at least (81 - 2)/81 = 97.53%. We observed that fewer than 0.2% (14) of the duplicated gene pairs have a similarity higher than 96%. (b) Mapping of reads to the duplicated genes retained from triplicated genomic regions. We counted the reads that mapped to multiple duplicated genes and the duplicated gene pairs affected by multiply-mapped reads (Table 4); very few gene pairs were affected.

Thus, the duplicated genes derived from the Brassiceae lineage whole genome triplication (13–17 million years ago) have diverged to an extent that can be distinguished by RNA-seq reads.
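
The identity threshold quoted in the Figure S3 legend can be checked with a few lines of Python. This is only a minimal sketch of the arithmetic: the function name min_read_identity is hypothetical, and the values 81, 2, 14 and 10447 are taken from the legend above.

```python
def min_read_identity(read_length: int, max_mismatches: int) -> float:
    """Lowest read-to-reference identity that still allows a read to map."""
    if read_length <= 0 or not 0 <= max_mismatches <= read_length:
        raise ValueError("invalid read length or mismatch count")
    return (read_length - max_mismatches) / read_length


# Values from the legend: 81 bp trimmed reads, at most 2 mismatches.
threshold = min_read_identity(81, 2)
print(f"minimum identity: {threshold:.2%}")  # 97.53%

# Share of duplicated gene pairs with similarity above 96% (14 of 10447).
share_above_96 = 14 / 10447
print(f"pairs above 96%: {share_above_96:.2%}")  # 0.13%
```

Even the 14 pairs above 96% similarity are not necessarily above the 97.53% threshold, so the share of pairs that could attract mis-mapped reads is at most this small fraction.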

[Figure S4 panel labels. (A) Contiguously mapped reads (74%); unique mapped reads (100%); exon (81%); introns (3%). (B) Categories: exon, intron, intergenic regions, novel transcripts.]

Figure S4. Genome-wide assessment of mapping RNA-seq reads. (A) Circle plot of the RNA-seq reads mapped to the B. rapa genome. The innermost circle depicts the percentages of contiguously mapped, splice-mapped and unmapped reads. The middle circle depicts the percentages of uniquely mapped and multi-mapped reads. The outermost circle represents the distribution of reads in exons, introns and intergenic regions according to the annotated gene set. (B) Box and whisker plots of log2-transformed average read depth per base for the following genomic regions: all intergenic sequences, introns, coding sequences, and identified novel transcripts. The three horizontal lines in the boxes show the first quartile, median and third quartile, respectively. The other two horizontal lines show the inner boundaries, and empty circles represent data outside of the inner boundaries.

[Figure S5 panels: Root_1, Root_2, Leaf_1, Leaf_2, Stem, Flower, Callus, Silique]

Figure S5. Sequencing randomness assessment. The randomness of mRNA/cDNA fragmentation was evaluated from the distribution of reads along reference genes. Each gene was divided into 100 windows according to the total length of its coding sequence. The number of reads in each window was calculated as the sum of the read depth of every base in the window divided by 81 (bp).
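
The window calculation described in the Figure S5 legend can be sketched as follows. The function name reads_per_window and the integer rounding used to place window boundaries are assumptions made for illustration; the legend specifies only the 100 windows and the division by 81 bp.

```python
def reads_per_window(depth, n_windows=100, read_length=81):
    """Estimate the number of reads in each window of one gene.

    depth is the per-base read depth along the gene's coding sequence.
    Window boundaries use integer division (an assumption; the source
    does not say how uneven lengths were split).
    """
    n = len(depth)
    if n < n_windows:
        raise ValueError("coding sequence shorter than the number of windows")
    counts = []
    for w in range(n_windows):
        start = w * n // n_windows
        end = (w + 1) * n // n_windows
        counts.append(sum(depth[start:end]) / read_length)
    return counts


# Hypothetical gene: 1000 bp of coding sequence covered at a uniform depth of 81.
example_counts = reads_per_window([81] * 1000)
print(example_counts[:3])  # [10.0, 10.0, 10.0]
```

With uniform coverage every window receives the same count, which is the flat profile expected when fragmentation is random.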

[Figure S6 pie charts, one per sample (panel titles recovered: Root_1, Stem, Leaf_1, Silique, Callus, Flower). Each pie has slices labelled 1 to 10; the largest slice holds roughly 57-71% of genes and each remaining slice roughly 2-9%.]

Figure S6. Gene coverage statistics.

The percentage of each gene's length covered by reads was calculated. The number and percentage of genes in each coverage range are shown in the pie charts.
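
A minimal sketch of the coverage statistic in the Figure S6 legend is given below. The ten equal-width coverage ranges (0-10%, 10-20%, ..., 90-100%) and all function names are assumptions for illustration; the legend does not state which ranges the pie slices represent.

```python
from collections import Counter


def coverage_percent(depth):
    """Percentage of a gene's bases covered by at least one read."""
    covered = sum(1 for d in depth if d > 0)
    return 100.0 * covered / len(depth)


def coverage_bin(percent, n_bins=10):
    """Map a coverage percentage to a bin 1..n_bins (100% goes in the last bin)."""
    width = 100.0 / n_bins
    return min(int(percent // width) + 1, n_bins)


def bin_summary(gene_depths, n_bins=10):
    """Return {bin: (gene count, percentage of genes)} for all bins."""
    bins = Counter(coverage_bin(coverage_percent(d), n_bins)
                   for d in gene_depths.values())
    total = len(gene_depths)
    return {b: (bins.get(b, 0), 100.0 * bins.get(b, 0) / total)
            for b in range(1, n_bins + 1)}


# Hypothetical per-base depths for three genes.
genes = {"g1": [1] * 10, "g2": [0] * 5 + [2] * 5, "g3": [0] * 10}
summary = bin_summary(genes)
print(summary[10], summary[6], summary[1])
```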

Figure S7. Comparison of gene expression between two samples of root and leaf tissues. The expression level of B. rapa annotated genes was quantified as the normalized number of fragments per kilobase per million reads (FPKM) using the Cufflinks software (Trapnell, 2010 Nature Biotechnol 28, 5: 511-515). Pearson's correlation (R value) was calculated between the log2-transformed FPKM values of the two samples for leaf and root tissue, respectively.
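
The correlation described in the Figure S7 legend can be sketched in plain Python. The pseudocount added before the log2 transform is an assumption (the legend does not say how zero FPKM values were handled), and the FPKM values below are hypothetical.

```python
import math


def log2_fpkm(values, pseudocount=1.0):
    """log2-transform FPKM values; the pseudocount (an assumption) avoids log2(0)."""
    return [math.log2(v + pseudocount) for v in values]


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical FPKM values for the same four genes in two samples of one tissue.
sample_1 = [0.0, 1.0, 3.0, 7.0]
sample_2 = [0.0, 1.5, 2.5, 9.0]
r_value = pearson_r(log2_fpkm(sample_1), log2_fpkm(sample_2))
print(f"R = {r_value:.3f}")
```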

(A)

Transcript ID  Primer 1 (5’-3’)        Primer 2 (5’-3’)      Size (bp)  Scaffold        Start     End
CUFF.35943     AGTAACTTTCAATCTTGCCTCT  GGCGGAGTAAACAACAAATA  159        Scaffold000170  259523    259934
CUFF.22518     GCCAGTGAGTGAGCATAGGG    TGTGCCAAGGAGCTTTAGAC  305        A07             10371787  10373240
CUFF.7947      GTATTGGCTTACTGGCAGATGG  GAAAGGTCCGAGTTTGGTGG  496        A03             5018474   5020442
CUFF.16809     TGCATAGCGAGCAAACAGAC    GGGTGGGCGACTTCTAATGT  312        A05             19809444  19811671
CUFF.25971     GTTTACCGAACCCGAAGTCT    CGTGGAGCAAATACAGGAGG  362        A08             10262976  10263810
CUFF.36671     GATGAGCCTTTGATTAGAACTT  GTTGTCCTGGTCTGTCCCTA  310        Scaffold004752  166       420

(B) [RT-PCR gel image; transcripts labelled ID 1 to 6]

Figure S8. RT-PCR validation of novel transcripts. (A) The primers designed for 6 randomly selected novel transcripts. (B) RT-PCR amplification of the 6 randomly selected novel transcripts (ID 1-6 in this picture) predicted by Cufflinks in this paper. Each of the six Cufflinks-predicted transcripts was amplified in five tissues: root, stem, leaf, flower and silique (from left to right in the panel for each transcript). The primers used to amplify the selected novel Cufflinks TUs are listed in panel A.

[Figure S9 panels (x-axis: log2-FPKM, -2 to 12; y-axis: 0 to 100). Root_1: p-value < 2.2e-16; Root_2: p-value < 2.2e-16; Leaf_1: p-value < 2.2e-16; Leaf_2: p-value < 2.2e-16; Stem: p-value < 2.2e-16; Flower: p-value < 2.2e-16; Callus: p-value = 0.07625; Silique: p-value = 0.5009]

Figure S9. Cumulative distribution of FPKM values of all and novel transcripts in each sample. The P-value was calculated with the Wilcoxon test comparing annotated novel transcripts (red line) with all identified transcripts (green line) in each tissue.
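
A cumulative distribution of the kind plotted in Figure S9 can be sketched as below. This covers only the curve, not the Wilcoxon test. Dropping transcripts with an FPKM of zero before the log2 transform is an assumption, and the function name and input values are hypothetical.

```python
import math
from bisect import bisect_right


def cumulative_percent(fpkm_values, thresholds):
    """Percentage of transcripts whose log2-FPKM is at or below each threshold.

    Transcripts with FPKM <= 0 are dropped before the transform (an assumption).
    """
    logs = sorted(math.log2(v) for v in fpkm_values if v > 0)
    if not logs:
        raise ValueError("no positive FPKM values")
    return [100.0 * bisect_right(logs, t) / len(logs) for t in thresholds]


# Hypothetical FPKM values for four transcripts in one sample.
curve = cumulative_percent([0.25, 1.0, 4.0, 16.0], thresholds=[-2, 0, 2, 4])
print(curve)  # [25.0, 50.0, 75.0, 100.0]
```

Evaluating the function at each integer from -2 to 12 gives points on the same x-axis scale as the figure.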

Figure S10. Expression profile of B. rapa genome displayed in Integrative Genomics Viewer (IGV). (A) An example of an identified novel transcribed region that is not annotated in the B. rapa genome. (B) The tissue-conserved expression level and the extension of gene structure (the peak for each base) shown for a gene (Bra028466) on chromosome A07 of B. rapa.

[Figure S11 bar chart. y-axis: Junctions (0 to 180000); x-axis: dinucleotides at the splicing borders of junctions. GT-AG: 153984; GC-AG: 1912; AT-AC: 579; others: 41]

Figure S11. Numbers for different types of dinucleotides at the splicing borders.
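
The junction counts shown in Figure S11 can be turned into proportions with a short calculation. This is a sketch only; it assumes the four bar values pair with the four x-axis categories in the order in which they appear in the figure.

```python
# Junction counts read from Figure S11 (pairing with categories assumed by order).
junction_counts = {"GT-AG": 153984, "GC-AG": 1912, "AT-AC": 579, "others": 41}

total_junctions = sum(junction_counts.values())
for motif, count in junction_counts.items():
    # Share of all junctions carrying this dinucleotide pair.
    print(f"{motif}: {count} ({count / total_junctions:.2%})")
```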

[Figure S12 bar chart. y-axis: number of genes (0 to 5000); x-axis: number of AS events within a gene]

AS events:  1     2     3    4    5    6    7   8   9   10  11  12  13  14  15  16  17
Genes:      4405  1568  697  383  203  134  94  76  27  29  19  10  11  6   3   2   1

Figure S12. Genes with multiple AS events. Number of genes (y-axis) with a given number of AS events (x-axis).
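
The bar heights in Figure S12 can be summarized with a short calculation. The sketch assumes the 17 bar values correspond, in order, to 1 through 17 AS events per gene; the variable names are hypothetical.

```python
# Bar heights from Figure S12, assumed to align with 1..17 AS events per gene.
gene_counts = [4405, 1568, 697, 383, 203, 134, 94, 76, 27, 29, 19, 10, 11, 6, 3, 2, 1]
genes_by_event_count = dict(zip(range(1, 18), gene_counts))

genes_with_as = sum(genes_by_event_count.values())
genes_with_multiple_as = genes_with_as - genes_by_event_count[1]
total_as_events = sum(k * n for k, n in genes_by_event_count.items())

print(f"genes with at least one AS event: {genes_with_as}")
print(f"genes with two or more AS events: {genes_with_multiple_as}")
print(f"total AS events: {total_as_events}")
```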
