Figure S1. Per-base quality of reads as visualized by the software FastQC
Figure S2. Base composition of reads as visualized by the software FastQC
Figure S3. Evaluation of false mapping reads using duplicated genes
Figure S4. Genome-wide assessment of mapping RNA-seq reads
Figure S5. Sequencing randomness assessment
Figure S6. Gene coverage statistics
Figure S7. Comparison of gene expression between the two samples of root and of leaf tissue
Figure S8. RT-PCR validation of novel transcripts
Figure S9. Cumulative distribution of FPKM values of all and novel transcripts in each sample
Figure S10. Expression profile of B. rapa genome displayed in Integrative Genomics Viewer (IGV)
Figure S11. Numbers for different types of dinucleotides at the splicing borders
Figure S12. Genes with multiple AS events
[Figure S1 panels (read files: sample): Callus_1.fq, Callus_2.fq: GSM1059153; Root_1_1.fq, Root_1_2.fq: GSM1059157; Root_2_1.fq, Root_2_2.fq: GSM1059158; Leaf_1_1.fq, Leaf_1_2.fq: GSM1059155; Leaf_2_1.fq, Leaf_2_2.fq: GSM1059156; Flower_1.fq, Flower_2.fq: GSM1059154; Stem_1.fq, Stem_2.fq: GSM1059160; Silique_1.fq, Silique_2.fq: GSM1059159]
Figure S1. Per-base quality of reads as visualized by the software FastQC.
[Figure S2 panels (read files: sample): Callus_1.fq, Callus_2.fq: GSM1059153; Leaf_1_1.fq, Leaf_1_2.fq: GSM1059155; Root_1_1.fq, Root_1_2.fq: GSM1059157; Leaf_2_1.fq, Leaf_2_2.fq: GSM1059156; Root_2_1.fq, Root_2_2.fq: GSM1059158; Flower_1.fq, Flower_2.fq: GSM1059154; Stem_1.fq, Stem_2.fq: GSM1059160; Silique_1.fq, Silique_2.fq: GSM1059159]
Figure S2. Base composition of reads as visualized by the software FastQC.
Figure S3. Evaluation of false mapping reads using duplicated genes.
(a) Sequence similarities between duplicated gene pairs (10447) retained in triplicated genomic regions of B. rapa. After alignment, the identity between the two sequences of each duplicated gene pair was calculated over all aligned regions (red line) and over the aligned regions that remained after removing those containing gaps (-) (blue line). Sequence similarities between duplicated genes were distributed over a wide range, with a peak around 90%. Because at most 2 bp of mismatch were allowed in read mapping, the similarity between trimmed reads and their target reference sequences must be at least (81 - 2)/81 = 97.53%. We observed that less than 0.2% (14) of duplicated gene pairs have similarity higher than 96%.
(b) Mapping of reads to the duplicated genes retained from triplicated genomic regions. We counted the number of reads that mapped to multiple duplicated genes and the number of duplicated gene pairs affected by multiply mapped reads (Table 4); very few pairs were affected. Thus, the duplicated genes derived from the Brassiceae lineage whole genome triplication (13–17 million years ago) have diverged to an extent that allows them to be distinguished by RNA-seq reads.
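The two identity measures and the read-level threshold described in this legend can be illustrated with a minimal sketch. The function names, and the choice to count a gap column as a mismatch when all aligned columns are used, are assumptions made for illustration; the legend does not specify how the authors computed identity.

```python
def identity(seq_a: str, seq_b: str, skip_gaps: bool = False) -> float:
    """Percent identity between two aligned sequences of equal length.

    With skip_gaps=False every aligned column is used and a gap column
    counts as a mismatch (an assumption). With skip_gaps=True, columns
    containing a gap ('-') are removed before counting.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    columns = list(zip(seq_a, seq_b))
    if skip_gaps:
        columns = [(x, y) for x, y in columns if x != "-" and y != "-"]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y and x != "-")
    return 100.0 * matches / len(columns)


def min_read_identity(read_len: int = 81, max_mismatches: int = 2) -> float:
    """Lowest identity a read can have to its target and still map."""
    return 100.0 * (read_len - max_mismatches) / read_len


if __name__ == "__main__":
    print(round(min_read_identity(), 2))             # threshold for 81 bp reads
    print(identity("ACGT-A", "ACGTTA"))              # all aligned columns
    print(identity("ACGT-A", "ACGTTA", skip_gaps=True))  # gap columns removed
```

A gene pair whose identity is below the read-level threshold cannot share a read that maps to both copies within the allowed mismatches, which is the reasoning the legend relies on.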
[Figure S4 labels recovered from the image. (A) Exon (81%); Introns (3%); Contiguously mapped reads (74%); Unique mapped reads (100%). (B) Categories: Exon, Intron, Intergenic regions, Novel transcripts.]
Figure S4. Genome-wide assessment of mapping RNA-seq reads. (A) Circle plot of the RNA-seq reads mapped to the B. rapa genome. The innermost circle depicts the percentages of contiguously mapped, splice-mapped and unmapped reads. The middle circle depicts the percentages of uniquely mapped and multi-mapped reads. The outermost circle represents the distribution of reads in exons, introns and intergenic regions according to the annotated gene set. (B) Box and whisker plots of log2-transformed average read depth per base for the following genomic regions: all intergenic sequences, introns, coding sequences, and identified novel transcripts. The three horizontal lines in each box show the first quartile, median and third quartile, respectively. The other two horizontal lines show the inner boundaries, and empty circles represent data outside of the inner boundaries.
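The box plot summary in panel (B) can be sketched as follows. This is only an illustration: the legend does not define the inner boundaries, so the common convention of 1.5 times the interquartile range beyond the quartiles is assumed here, regions with zero depth are dropped before the log2 transform (also an assumption), and `box_stats` is a hypothetical name.

```python
import math
import statistics


def box_stats(mean_depths):
    """Quartiles, inner boundaries and outliers of log2 read depth.

    Assumes inner boundaries at Q1 - 1.5*IQR and Q3 + 1.5*IQR, and
    skips regions with zero depth because log2(0) is undefined.
    """
    logs = sorted(math.log2(d) for d in mean_depths if d > 0)
    if len(logs) < 2:
        raise ValueError("need at least two regions with non-zero depth")
    q1, median, q3 = statistics.quantiles(logs, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [v for v in logs if v < lower or v > upper]
    return {"q1": q1, "median": median, "q3": q3,
            "lower": lower, "upper": upper, "outliers": outliers}


if __name__ == "__main__":
    # Hypothetical average depths per base for a handful of regions.
    print(box_stats([1, 2, 4, 8, 16, 2 ** 20]))
```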
[Figure S5 panels: Root_1, Root_2, Leaf_1, Leaf_2, Stem, Flower, Callus, Silique]
Figure S5. Sequencing randomness assessment. The randomness of mRNA/cDNA fragmentation was evaluated from the distribution of reads along reference genes. Each gene was divided into 100 windows according to the total length of its coding sequence. The number of reads in each window of a gene = the sum of the read depth over all bases in the window / 81 (bp).
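The window formula above can be written out as a short sketch. The per-base depth list and the function name `window_read_counts` are hypothetical, and placing window boundaries by integer division is an assumption; the legend only states that each gene is split into 100 windows.

```python
def window_read_counts(depth, n_windows=100, read_len=81):
    """Approximate read count per window along one gene.

    depth is the per-base read depth over the gene's coding sequence.
    Each window's count is the summed depth divided by the read length.
    Window boundaries use integer division (an assumption), so genes
    shorter than n_windows bases yield some empty windows.
    """
    n = len(depth)
    counts = []
    for w in range(n_windows):
        start = w * n // n_windows
        end = (w + 1) * n // n_windows
        counts.append(sum(depth[start:end]) / read_len)
    return counts


if __name__ == "__main__":
    # Hypothetical gene of 200 bp with uniform depth 81.
    counts = window_read_counts([81] * 200)
    print(len(counts), counts[0], sum(counts))
```

Averaging these per-window counts over all genes gives a profile along the gene body; a flat profile is what random fragmentation would be expected to produce.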
[Figure S6 pie charts, one per sample (recovered panel titles: Root_1, Stem, Leaf_1, Silique, Callus, Flower). Slices are numbered 1-10. The largest recovered slice label in each pie lies between ~57% and ~71%; the remaining slice labels lie between ~2% and ~9%.]
Figure S6. Gene coverage statistics. For each gene, the percentage of its length covered by reads was calculated. The number and percentage of genes in each coverage range are shown as pie charts.
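A minimal sketch of the two steps in this legend follows: coverage per gene, then gene counts per coverage range. The ten equal-width ranges are an assumption suggested by the slice numbers 1-10 in the figure; the legend does not state the range boundaries, and the function names are hypothetical.

```python
from collections import Counter


def coverage_percent(depth):
    """Percentage of a gene's bases covered by at least one read."""
    if not depth:
        raise ValueError("gene has no bases")
    covered = sum(1 for d in depth if d > 0)
    return 100.0 * covered / len(depth)


def coverage_bins(percentages, n_bins=10):
    """Gene count and percentage of genes per coverage range.

    Assumes n_bins equal-width ranges over 0-100%; a gene with 100%
    coverage is placed in the last range.
    """
    if not percentages:
        raise ValueError("no genes given")
    width = 100.0 / n_bins
    counts = Counter(min(int(p // width), n_bins - 1) for p in percentages)
    total = len(percentages)
    return [(counts[i], 100.0 * counts[i] / total) for i in range(n_bins)]


if __name__ == "__main__":
    # Hypothetical coverage percentages for four genes.
    for i, (n, pct) in enumerate(coverage_bins([5.0, 95.0, 100.0, 100.0]), 1):
        print(i, n, pct)
```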
Figure S7. Comparison of gene expression between the two samples of root and of leaf tissue. The expression level of B. rapa annotated genes was quantified as the normalized number of fragments per kilobase per million reads (FPKM) using the Cufflinks software (Trapnell et al., 2010, Nature Biotechnology 28(5): 511-515). The Pearson correlation (R value) was calculated between the log2-transformed FPKM values of the two samples for leaf and root tissue, respectively.
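The correlation described above can be sketched in a few lines. How genes with an FPKM of zero were handled is not stated in the legend; this sketch simply drops genes that are zero in either sample before the log2 transform, which is an assumption, and `pearson_log2_fpkm` is a hypothetical name.

```python
import math


def pearson_log2_fpkm(fpkm_a, fpkm_b):
    """Pearson correlation of log2(FPKM) between two samples.

    Genes with FPKM <= 0 in either sample are skipped (an assumption),
    because log2 is undefined for them.
    """
    pairs = [(math.log2(a), math.log2(b))
             for a, b in zip(fpkm_a, fpkm_b) if a > 0 and b > 0]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two genes expressed in both samples")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:
        raise ValueError("expression values are constant in one sample")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical FPKM values for the same genes in two replicates.
    print(pearson_log2_fpkm([1, 2, 4, 8, 0], [2, 4, 8, 16, 3]))
```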
A)
ID  Transcript ID  Primer 1 (5'-3')        Primer 2 (5'-3')      Size (bp)
1   CUFF.35943     AGTAACTTTCAATCTTGCCTCT  GGCGGAGTAAACAACAAATA  159
2   CUFF.22518     GCCAGTGAGTGAGCATAGGG    TGTGCCAAGGAGCTTTAGAC  305
3   CUFF.7947      GTATTGGCTTACTGGCAGATGG  GAAAGGTCCGAGTTTGGTGG  496
4   CUFF.16809     TGCATAGCGAGCAAACAGAC    GGGTGGGCGACTTCTAATGT  312
5   CUFF.25971     GTTTACCGAACCCGAAGTCT    CGTGGAGCAAATACAGGAGG  362
6   CUFF.36671     GATGAGCCTTTGATTAGAACTT  GTTGTCCTGGTCTGTCCCTA  310
B)
Scaffold        Start     End
Scaffold000170  259523    259934
A07             10371787  10373240
A03             5018474   5020442
A05             19809444  19811671
A08             10262976  10263810
Scaffold004752  166       420
Figure S8. RT-PCR validation of novel transcripts. (A) The primers designed for 6 randomly selected novel transcripts. (B) RT-PCR amplification of the 6 randomly selected novel transcripts (ID 1-6 in the picture) predicted by Cufflinks in this paper. Each of the six Cufflinks-predicted transcripts was amplified in five tissues: root, stem, leaf, flower and silique (from left to right in the panel for each transcript). The primers used to amplify the selected novel Cufflinks TUs are listed in (A).
[Figure S9 panels (x-axis: log2-FPKM, -2 to 12; y-axis ticks: 0 to 100). Root_1: p-value < 2.2e-16; Root_2: p-value < 2.2e-16; Leaf_1: p-value < 2.2e-16; Leaf_2: p-value < 2.2e-16; Stem: p-value < 2.2e-16; Flower: p-value < 2.2e-16; Callus: p-value = 0.07625; Silique: p-value = 0.5009]
Figure S9. Cumulative distribution of FPKM values of all and novel transcripts in each sample. The p-value was calculated with the Wilcoxon test comparing the annotated novel transcripts (red line) with all identified transcripts (green line) in each tissue.
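The comparison behind these p-values can be illustrated with a small sketch. The legend says only "Wilcoxon test"; because the two groups differ in size, the unpaired rank-sum (Mann-Whitney) form is assumed here. The sketch uses the normal approximation with a tie correction and no continuity correction, so its p-values may differ slightly from those of a statistics package. The function name and input values are hypothetical.

```python
import math


def rank_sum_test(group_a, group_b):
    """Two-sided Wilcoxon rank-sum test (normal approximation).

    Returns (U statistic for group_a, p-value). Ties receive average
    ranks and the variance is corrected for ties.
    """
    n1, n2 = len(group_a), len(group_b)
    if n1 == 0 or n2 == 0:
        raise ValueError("both groups must be non-empty")
    combined = sorted([(v, 0) for v in group_a] + [(v, 1) for v in group_b])
    n = n1 + n2
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and combined[j][0] == combined[i][0]:
            j += 1
        average_rank = (i + 1 + j) / 2.0  # mean of ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = average_rank
        t = j - i
        tie_term += t ** 3 - t
        i = j
    rank_sum_a = sum(r for r, (_, g) in zip(ranks, combined) if g == 0)
    u_a = rank_sum_a - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u_a, 1.0
    z = (u_a - mean_u) / math.sqrt(var_u)
    return u_a, math.erfc(abs(z) / math.sqrt(2))


if __name__ == "__main__":
    # Hypothetical log2-FPKM values: novel transcripts vs. all transcripts.
    novel = [0.5, 1.0, 1.2, 2.0]
    everything = [1.5, 2.5, 3.0, 4.0, 5.5, 6.0]
    print(rank_sum_test(novel, everything))
```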
Figure S10. Expression profile of B. rapa genome displayed in Integrative Genomics Viewer (IGV). (A) An example of an identified novel transcribed region that was not annotated on the B. rapa chromosome. (B) Expression conserved across tissues and the extension of gene structure (peak height for each base) shown for a gene (Bra028466) on chromosome A07 of B. rapa.
[Figure S11 bar chart. y-axis: Junctions (0 to 180000); x-axis: dinucleotides at the splicing borders of junctions. GT-AG: 153984; GC-AG: 1912; AT-AC: 579; others: 41]
Figure S11. Numbers for different types of dinucleotides at the splicing borders.
[Figure S12 bar chart. x-axis: Number of AS events within a gene; y-axis ticks: 0 to 5000. Bar values:]
AS events  1     2     3    4    5    6    7   8   9   10  11  12  13  14  15  16  17
Genes      4405  1568  697  383  203  134  94  76  27  29  19  10  11  6   3   2   1
Figure S12. Genes with multiple AS events. The number of genes (y-axis) with a given number of AS events (x-axis).