BioMed Central

Supplemental Information
Percentage of paired-end and single-end data mapped.
If reads were allowed to map to up to 10 different locations (10 hits), about 80% of
the data could be mapped, and only about 15% of the data could not be mapped to the
mouse reference genome. We further analyzed the mapping results when only uniquely
mapped reads were allowed. For E18 stage data, about 60% of the paired-end data
could be uniquely mapped to the mm9 genome, compared with only about 40% of the
single-end data. For P7 stage data, the uniquely mapped percentage was 62% for
paired-end data and 54% for single-end data. Overall, 57% of the E18 data and 61% of
the P7 data could be uniquely mapped. There was no noticeable difference in read
quality between paired-end and single-end reads, and all E18 and P7 reads, both
paired-end and single-end, were 36 bp long. Nor was there any noticeable difference in
read quality between the E18 and P7 data. We found that paired-end data had higher
mappability when the RNA-seq data were mapped using TopHat [1].
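The unique and up-to-10-hit percentages above come from binning each read by the number of genomic locations it aligned to. The sketch below assumes that a per-read hit count has already been extracted from the aligner output. It also assumes that paired-end data are counted once per pair; the text does not state how pairs were counted. Reads that align to more than `max_hits` locations fall into a separate bin, which could explain part of the data that belongs neither to the ~80% mapped fraction nor to the ~15% unmapped fraction. The function name and the read IDs are hypothetical.

```python
from collections import Counter


def mapping_summary(hits_per_read, max_hits=10):
    """Percentage of reads in each mapping category (hypothetical helper).

    hits_per_read: dict mapping a read (or read-pair) ID to the number of
    genomic locations it aligned to; 0 means the read did not map.
    """
    counts = Counter()
    for n_hits in hits_per_read.values():
        if n_hits == 0:
            counts["unmapped"] += 1
        elif n_hits == 1:
            counts["unique"] += 1
        elif n_hits <= max_hits:
            counts["multi"] += 1
        else:
            # Aligned, but to more locations than the allowed maximum.
            counts["over_limit"] += 1
    total = len(hits_per_read)
    pct = {k: 100.0 * counts[k] / total
           for k in ("unique", "multi", "over_limit", "unmapped")}
    return {
        "unique": pct["unique"],
        "mapped_within_max_hits": pct["unique"] + pct["multi"],
        "over_limit": pct["over_limit"],
        "unmapped": pct["unmapped"],
    }


reads = {"r1": 1, "r2": 1, "r3": 3, "r4": 0, "r5": 12}
print(mapping_summary(reads))
# {'unique': 40.0, 'mapped_within_max_hits': 60.0, 'over_limit': 20.0, 'unmapped': 20.0}
```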
The distribution of the most expressed genes among chromosomes.
The top 500 most expressed genes were selected according to a previously
published method [2]. For each chromosome, the number of genes among the
top 500 was counted. This count was then normalized against the chromosome's
length in a manner similar to the RPKM method [3]:
$$CN = \frac{C \times 10^{12}}{CT \times L} \qquad \text{(S1)}$$

CN: normalized count for a given chromosome.
C: number of the most expressed genes located in the given chromosome.
CT: total number of the selected genes (in this case, 500).
L: length of the given chromosome (unit: bp).
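Formula (S1) translates directly into a per-chromosome calculation. The sketch below is a minimal version that assumes the chromosome of each selected gene is already known. The function name and the chromosome names and lengths in the example are hypothetical toy values, not mm9 lengths.

```python
from collections import Counter


def chromosome_cn(top_gene_chroms, chrom_lengths):
    """Normalized count CN for each chromosome, per formula (S1).

    top_gene_chroms: one chromosome name per selected top gene, so its
        length is CT (500 in the text).
    chrom_lengths: dict chromosome name -> length L in bp.
    """
    ct = len(top_gene_chroms)               # CT: total number of selected genes
    c_per_chrom = Counter(top_gene_chroms)  # C: selected genes on each chromosome
    return {
        chrom: c_per_chrom[chrom] * 1e12 / (ct * length)
        for chrom, length in chrom_lengths.items()
    }


# Toy input: 30 of 500 genes on a 150 Mb chromosome, 470 on a 200 Mb one.
genes = ["chrA"] * 30 + ["chrB"] * 470
lengths = {"chrA": 150_000_000, "chrB": 200_000_000}
print(chromosome_cn(genes, lengths))  # {'chrA': 400.0, 'chrB': 4700.0}
```

A chromosome that carries none of the selected genes gets CN = 0, so it still appears in the profile.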
The distribution of the most expressed genes among chromosomes for the E18 stage
was very similar to that for the P7 stage (Fig. S7). Both stages' profiles were also similar
to the distribution of total gene expression among chromosomes (Fig. S3).
Table S1. Chromosomal expression across stages in read counts
hESC
chr2
92476
11635
1
89415
13322
8
N2
12100
3
19501
0
chr3
68408
chr4
97551
76658
10728
9
92907
12398
8
99243
13795
4
chr5
76508
80364
chr6
chr7
81063
10292
0
78476
11568
0
83364
11025
7
12394
6
94414
10212
5
14432
3
chr8
58980
61532
chr9
chr1
0
chr1
1
chr1
2
chr1
3
chr1
4
chr1
5
chr1
6
chr1
7
chr1
8
chr1
9
77712
93932
74119
10233
9
79362
11061
7
67252
15284
9
73139
16642
0
77072
19616
7
89781
21568
9
51725
54751
71727
76497
42278
47373
51176
63538
50762
49722
56582
60894
69589
81617
79826
88766
40058
39991
47270
60747
63137
72609
82793
84524
51396
50382
58234
65976
48964
55859
67622
67584
chrX
chrY
65669
621
63436
372
98576
593
86681
779
chr1
N1
N3
11997
2
23184
5
E18
57664
2
86936
8
47207
0
62042
3
67664
1
51371
9
72680
1
54277
1
57440
5
51433
5
94512
2
46620
6
37499
4
33530
0
45465
7
33907
9
47587
0
33139
9
33580
1
33253
9
1008
P7
565546
106854
8
AMB
166459
1
273992
9
108296
7
172401
1
191483
6
133147
8
304719
5
180258
2
166740
1
153371
9
294699
1
484446
968689
342598
826123
351659
744911
313915
395946
343972
917395
148999
8
406472
631873
697665
749796
826537
131110
7
276625
646429
413794
839180
265056
516904
408372
855284
117464
3
750516
899498
358502
645
809519
1033
223777
1697
286129
2438
617325
977243
486040
635012
740624
538667
928153
626383
649769
546428
369717
498847
337969
AMM
836121
119039
4
597877
681761
503478
496872
339448
2
150385
3
AML
110157
9
106615
3
109176
3
107089
7
270804
6
926322
155014
7
874972
837867
126006
6
574747
141705
9
755932
117626
4
Table S2. Principal inertias (eigenvalues) of the Correspondence Analysis
Dimension   Value      Percentage   Accumulated
1           0.043309   66.34%       66.3%
2           0.019397   29.71%       96.1%
3           0.001323   2.03%        98.1%
4           0.000912   1.4%         99.5%
5           0.000177   0.27%        99.7%
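In Table S2, the Percentage column is each principal inertia's share of the total inertia, and Accumulated is the running sum of those shares. The five listed dimensions reach 99.7%, so any remaining dimensions hold the rest. The sketch below computes these quantities with a standard simple correspondence analysis: a singular value decomposition of the standardized residuals of a contingency table. It assumes a rows × columns count matrix, such as chromosomes × datasets. The text does not say which table was analyzed, and the function name and example counts are hypothetical.

```python
import numpy as np


def principal_inertias(counts):
    """Principal inertias of a simple correspondence analysis.

    counts: 2-D array of non-negative counts (rows x columns); every row
    and column must have a non-zero total.
    Returns (eigenvalues, percentage of total inertia, cumulative percentage),
    largest first.
    """
    n = np.asarray(counts, dtype=float)
    p = n / n.sum()                                     # correspondence matrix
    expected = np.outer(p.sum(axis=1), p.sum(axis=0))   # row mass x column mass
    residuals = (p - expected) / np.sqrt(expected)      # standardized residuals
    singular_values = np.linalg.svd(residuals, compute_uv=False)
    eig = singular_values ** 2                          # principal inertias
    pct = 100.0 * eig / eig.sum()
    return eig, pct, np.cumsum(pct)


table = [[120, 90, 60], [80, 85, 95], [40, 70, 110], [60, 60, 60]]
eig, pct, cum = principal_inertias(table)
for i, (value, share, acc) in enumerate(zip(eig, pct, cum), start=1):
    print(f"{i}  {value:.6f}  {share:.2f}%  {acc:.1f}%")
```

The total inertia (the sum of the eigenvalues) equals the table's chi-square statistic divided by its grand total. A table whose rows and columns are exactly independent has zero inertia, and the percentages are then undefined.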
Table S3. Manually validated expressed exons physically connected with other exons.
Chromosome   Start Position   Stop Position
chr14        79688556         79688927
chr14        79690050         79690115
chr14        79691477         79691586
chr7         25207317         25207797
chr7         25206137         25206284
chr7         25205830         25205961
chr2         130533586        130535067
chr16        66664625         66664753
chr16        66661317         66663459
chr16        7353023          7353902
chr5         36200474         36200614
chr5         36192164         36192845
chr12        81268532         81269469
chr12        81269844         81270003
chr12        81272714         81272863
chr7         29681107         29681254
chr7         29681893         29682074
chr7         29677979         29679406
chr7         29679519         29679679
chr7         29683087         29683270
chr5         121975057        121976421
chr5         121975057        121975250
chr5         121975670        121975779
chr5         121976746        121976871
chr5         121978875        121979001
chr1         158738249        158739325
chr8         125411214        125411526
chr6         28376068         28376500
chr6         28375737         28375866
chr6         28374721         28374832
Table S4. Manually validated expressed single-exon genes (SGEs).
Chromosome   Start Position   Stop Position
chr12        17178905         17183331
chr12        28020080         28027577
chr5         35621214         35624412
chr15        84383827         84388253
chr2         53936503         53937963
chr7         69493162         69494813
chrX         119508169        119510994
chrX         149932774        149936101
chr3         61168428         61172625
chr14        109309205        109313456
chr12        8504564          8506791
chr4         88364195         88368412
chr8         73222847         73224515
chr9         123168174        123169907
Table S5. E18 intronic TARs with hits in miRBase
Query                       Query Length   Subject         Subject Length   Bit Score   E-value    Alignment Length   Matched   Gaps
chr1-33852746-33857360      4615           mmu-mir-1935    60               77.8        1.66E-06   55                 51        0
chr1-56989661-56990743      1083           mmu-mir-1935    60               85.7        8.28E-08   51                 49        0
chr2-6719696-6721827        2132           mmu-mir-1935    60               79.8        6.12E-07   52                 49        0
chr3-158158463-158160069    1607           mmu-mir-1935    60               71.9        6.78E-06   60                 54        0
chr4-45834328-45836371      2044           mmu-mir-1935    60               93.7        5.60E-09   55                 53        0
chr11-104173757-104179094   5338           mmu-mir-1935    60               95.6        1.65E-08   60                 57        0
chr11-57103743-57105116     1374           mmu-mir-1935    60               69.9        6.14E-06   55                 50        0
chr12-118489259-118490932   1674           mdo-mir-153-2   87               157         9.24E-17   87                 85        0
chr12-68308091-68309299     1209           mmu-mir-1935    60               85.7        8.28E-08   55                 52        0
chr14-55697945-55699546     1602           mmu-mir-1935    60               69.9        6.14E-06   55                 50        0
chr18-43006931-43008045     1115           mmu-mir-1935    60               87.7        6.09E-08   60                 56        0
chr19-16227992-16228349     358            mmu-mir-1935    60               73.8        1.66E-06   53                 49        0
Table S6. E18 intronic TARs with hits in lncRNAdb
Query                       Query Length   Subject         Subject Length   Bit Score   E-value    Alignment Length   Matched   Gaps
chr6-49205737-49207540      1804           B2 SINE RNA     177              311         1.22E-37   177                172       0
chr7-75179481-75181002      1522           B2 SINE RNA     177              165         1.56E-18   142                128       2
chr11-104173757-104179094   5338           B2 SINE RNA     177              301         7.33E-36   176                170       0
chr11-24004772-24006928     2157           B2 SINE RNA     177              272         1.98E-32   177                168       1
chr17-5104261-5111031       6771           B2 SINE RNA     177              103         5.58E-10   104                91        0
chr19-40419656-40421427     1772           B2 SINE RNA     177              311         1.22E-37   177                172       0
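The hit tables in Tables S5 and S6 can be summarized by percent identity, and a set of hits can be screened with an E-value cutoff. The sketch below assumes that the Matched column counts identical aligned positions. The cutoff of 1e-6 in the example is hypothetical, because the text does not state which threshold was used. The two example rows are copied from Table S5, and the function names are illustrative.

```python
def percent_identity(matched, alignment_length):
    """Identical positions as a percentage of the alignment length."""
    return 100.0 * matched / alignment_length


def significant_hits(hits, max_evalue):
    """Hits with E-value <= max_evalue, best (smallest E-value) first."""
    return sorted((h for h in hits if h["evalue"] <= max_evalue),
                  key=lambda h: h["evalue"])


# Two rows copied from Table S5.
hits = [
    {"query": "chr1-33852746-33857360", "subject": "mmu-mir-1935",
     "evalue": 1.66e-06, "aln_len": 55, "matched": 51},
    {"query": "chr12-118489259-118490932", "subject": "mdo-mir-153-2",
     "evalue": 9.24e-17, "aln_len": 87, "matched": 85},
]
for h in significant_hits(hits, max_evalue=1e-6):
    identity = percent_identity(h["matched"], h["aln_len"])
    print(h["query"], h["subject"], f"{identity:.1f}%")
# chr12-118489259-118490932 mdo-mir-153-2 97.7%
```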
Supplemental Figure Legends
Figure S1. 100 kb resolution expression map of chromosome X. All symbols
represent the same information as in Fig. 1, except that each horizontal box
represents only a 100 kb genome region.
Figure S2. Expression level correlation between exonic and non-exonic regions. A.
Genome-wide exonic expression vs. intronic expression; interval size = 1 Mb. B.
Genome-wide exonic expression vs. intergenic expression; interval size = 1 Mb. C.
Chromosome 11 exonic expression vs. intronic expression; interval size = 100 kb. D.
Chromosome 11 exonic expression vs. intergenic expression; interval size = 100 kb.
Only regions with both types of expression were analyzed.
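The comparisons in Figure S2 can be framed as binning reads into fixed-size intervals (1 Mb genome-wide, 100 kb for chromosome 11) and correlating exonic against non-exonic values over intervals that have both. The sketch below makes several assumptions the legend does not state. It uses read start positions (0-based) as the expression measure. It applies Pearson correlation on untransformed values, whereas the correlation measure and any log transform are not specified. It also assumes reads have already been split into exonic and non-exonic sets. The function names are hypothetical.

```python
import numpy as np


def bin_counts(positions, chrom_length, interval):
    """Count 0-based read start positions falling in each fixed-size interval."""
    n_bins = -(-chrom_length // interval)  # ceiling division
    return np.bincount(np.asarray(positions, dtype=int) // interval,
                       minlength=n_bins)


def exon_vs_nonexon_r(exonic, nonexonic):
    """Pearson r over intervals where both expression values are > 0.

    Returns (r, number of intervals used).
    """
    x = np.asarray(exonic, dtype=float)
    y = np.asarray(nonexonic, dtype=float)
    both = (x > 0) & (y > 0)  # only regions with both types of expression
    return float(np.corrcoef(x[both], y[both])[0, 1]), int(both.sum())


exonic = bin_counts([5, 150_000, 160_000, 250_000], 300_000, 100_000)
intronic = bin_counts([10, 120_000, 290_000, 299_999], 300_000, 100_000)
print(exon_vs_nonexon_r(exonic, intronic))
```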
Figure S3. Expression level of each chromosome, measured in RPKM* as
described in formula (1). A. RNA-seq reads of all neural samples, along with hESC
reads, mapped onto the mouse reference genome. B. Comparison between adult mouse
brain, liver and muscle. C. RNA-seq reads of four human samples mapped onto the
human reference genome. D. Standard deviation (StdDev) of chromosomal
expression level across datasets. E. Mitochondrial expression level across datasets.
(hESC: human Embryonic Stem Cell, N1: early initiation of hESC, N2: neural
progenitor cell induced from hESC, N3: early glial-like cell from hESC, E18:
embryonic day 18 mouse brain cortices, P7: post-natal day 7 mouse brain cortices,
AMB: adult mouse brain, AMM: adult mouse muscle, AML: adult mouse liver)
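Panel D of Figure S3 summarizes each chromosome's expression level across datasets as a standard deviation. The sketch below assumes the per-chromosome RPKM* values from formula (1) of the main text are already available. It uses the sample standard deviation (n − 1 denominator), but the legend does not say whether the sample or population form was used. Only chromosomes present in every dataset are compared, since human and mouse reference genomes have different chromosome sets. The function name and example values are hypothetical.

```python
import statistics


def chromosome_stdev(expression_by_dataset):
    """Per-chromosome standard deviation of expression level across datasets.

    expression_by_dataset: dict dataset name -> dict chromosome -> expression
    level (e.g. RPKM*). Needs at least two datasets; only chromosomes present
    in every dataset are used.
    """
    tables = list(expression_by_dataset.values())
    shared = set.intersection(*(set(t) for t in tables))
    return {chrom: statistics.stdev([t[chrom] for t in tables])
            for chrom in sorted(shared)}


example = {
    "E18": {"chr1": 2.0, "chr2": 1.0, "chrY": 0.1},
    "P7": {"chr1": 4.0, "chr2": 1.0},
}
print(chromosome_stdev(example))  # chrY is skipped: it is missing from P7
```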
Figure S4. 1. Amino acid alignment of the intronic TAR detected in the mouse ATP2B1
gene and ATP2B1 exons from mouse and human. 2. DNA alignment of the intronic
TAR detected in the mouse Trim3 gene and Trim3 exons from rat, dog and human.
Figure S5. DNA level conservation between the intronic TARs detected in mouse
Zeb2, Ntrk3 and Odz2 genes and introns from rat, dog, human and opossum of the
same gene.
Figure S6. Scatter plot of orthologous gene expression level between selected stages.
Genes without detectable expression were not included.
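The scatter plots in Figure S6 pair each gene's expression level in one sample with that of its ortholog in another, after dropping genes without detectable expression. The sketch below rests on three assumptions. It assumes an ortholog table of (gene in sample a, gene in sample b) pairs is available. It treats "not detectable" as an expression level of zero or absent in either sample. It plots values on a log10 scale, although the legend states none of these details. The function name and gene IDs are hypothetical.

```python
import math


def ortholog_pairs(expr_a, expr_b, orthologs):
    """log10 expression pairs for orthologous genes expressed in both samples.

    expr_a, expr_b: dict gene ID -> expression level in each sample.
    orthologs: iterable of (gene ID in sample a, gene ID in sample b).
    """
    pairs = []
    for gene_a, gene_b in orthologs:
        a = expr_a.get(gene_a, 0.0)
        b = expr_b.get(gene_b, 0.0)
        if a > 0 and b > 0:  # drop genes without detectable expression
            pairs.append((math.log10(a), math.log10(b)))
    return pairs


points = ortholog_pairs(
    {"g1": 10.0, "g2": 0.0, "g3": 100.0},
    {"m1": 100.0, "m2": 5.0, "m3": 1.0},
    [("g1", "m1"), ("g2", "m2"), ("g3", "m3"), ("g4", "m4")],
)
print(points)  # only g1/m1 and g3/m3 remain
```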
Figure S7. Chromosomal distribution of the top 500 most highly expressed genes in
the E18 and P7 stages. The y-axis value CN was calculated as described in formula (S1).
References
1. Trapnell C, Pachter L, Salzberg SL: TopHat: discovering splice junctions
with RNA-Seq. Bioinformatics 2009, 25:1105-1111.
2. Han X, Wu X, Chung WY, Li T, Nekrutenko A, Altman NS, Chen G, Ma H:
Transcriptome of embryonic and neonatal mouse cortex by high-throughput RNA
sequencing. Proc Natl Acad Sci U S A 2009, 106:12741-12746.
3. Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B: Mapping and
quantifying mammalian transcriptomes by RNA-Seq. Nat Methods 2008,
5:621-628.