11427_OF_15_0095FL - SCIENCE CHINA Life Sciences

Supplementary Materials
| Supplementary file | Description |
|---|---|
| Figure S1 | Exon and transcript length distributions for lncRNAs and protein-coding genes |
| Figure S2 | Gene expression distributions for lncRNAs and protein-coding genes |
| Figure S3 | Distributions of tissue specificity scores for lncRNAs and protein-coding genes |
| Figure S4 | Number of human protein-coding genes found to be conserved through pair-wise conservation search |
| Figure S5 | Circos plots of the lncRNA conservation atlas for all human chromosomes |
| Figure S6 | Mammalian protein-coding gene expression phylogenies |
| Figure S7 | Mammalian lncRNA expression phylogenies (without rat) |
| Figure S8 | Mammalian lncRNA expression phylogenies (including rat) |
| Figure S9 | Mammalian lincRNA expression phylogenies (without rat) |
| Figure S10 | Mammalian lincRNA expression phylogenies (including rat) |
| Figure S11 | Evolutionary conservation of HOTAIR and XIST |
| Table S1 | Statistics of lncRNA catalogues across mammals |
| Table S2 | Statistics of tissue-specific protein-coding genes |
| Table S3 | Statistics of tissue-specific lncRNAs |
| Table S4 | Number of conserved lncRNA gene loci from pair-wise conservation search |
| Table S5 | Number of protein-coding genes conserved in different lineages |
| Table S6 | Number of lncRNAs conserved in different lineages |
| Table S7 | Conservation status of 14 evolutionarily conserved eye lincRNAs |
Note: The gene set annotations, sequences, tissue specificity and conservation annotations of
lncRNA genes can be downloaded at http://www.bioinfo.org/phyloNoncode/download.html
Figure S1. Exon and transcript length distributions for lncRNAs and protein-coding genes.
Shown are density plots of exon and transcript lengths. The x-axis shows log2-transformed exon or
transcript length; the y-axis shows density.
Figure S2. Gene expression distributions for lncRNAs and protein-coding genes. Shown are the
distributions of average expression abundance (log2-normalized FPKM values estimated by Cufflinks)
across six tissues (brain, cerebellum, heart, kidney, liver and testis) for lncRNAs (dotted line) and
protein-coding genes (solid black line).
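As a rough illustration of the per-gene summary plotted in Figure S2, the sketch below averages log2-transformed FPKM values over the six tissues. It assumes a pseudocount of 1 and that the log transform is applied before averaging; the caption specifies neither choice, and the helper name `mean_log2_fpkm` is hypothetical.

```python
import math

# The six tissues profiled in this study.
TISSUES = ("brain", "cerebellum", "heart", "kidney", "liver", "testis")


def mean_log2_fpkm(fpkm_by_tissue, pseudocount=1.0):
    """Average log2(FPKM + pseudocount) over the six tissues.

    Hypothetical helper: the pseudocount and the log-then-average order
    are assumptions, not details taken from the original analysis.
    """
    logged = [math.log2(fpkm_by_tissue[t] + pseudocount) for t in TISSUES]
    return sum(logged) / len(logged)


# Example with made-up FPKM values for a single gene.
example = {"brain": 3.0, "cerebellum": 1.0, "heart": 0.0,
           "kidney": 7.0, "liver": 0.0, "testis": 15.0}
print(round(mean_log2_fpkm(example), 4))  # 1.6667
```

Averaging on the log scale keeps a single very high tissue (often testis) from dominating the summary, which is one reason such a choice is common; whether it matches the original pipeline is not stated.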
Figure S3. Distributions of tissue specificity scores for lncRNAs and protein-coding genes.
Shown are distributions of maximal tissue specificity scores across six tissues for protein-coding
genes (black) and lncRNAs (red).
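Assuming the tissue specificity score used here is the Jensen-Shannon (JS) specificity score, defined for tissue t as 1 - sqrt(JSD(p, e_t)), where p is a gene's expression profile normalized to sum to 1 and e_t is a profile expressed only in tissue t, the maximal score across tissues can be sketched as follows. The function names are hypothetical, and the form of the input values (for example raw or log-transformed FPKM) is an assumption.

```python
import math


def entropy_bits(p):
    """Shannon entropy in bits; zero-probability terms contribute 0."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence with base-2 logs, so it lies in [0, 1]."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return entropy_bits(m) - (entropy_bits(p) + entropy_bits(q)) / 2


def max_js_specificity(expression):
    """Return (tissue index, score) for the tissue with the highest JS
    specificity. Assumes non-negative expression values, one per tissue."""
    total = sum(expression)
    if total <= 0:
        raise ValueError("profile has no expression in any tissue")
    p = [x / total for x in expression]
    scores = []
    for t in range(len(p)):
        ideal = [1.0 if i == t else 0.0 for i in range(len(p))]
        jsd = max(js_divergence(p, ideal), 0.0)  # clip tiny negative rounding error
        scores.append(1.0 - math.sqrt(jsd))
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


# A gene expressed only in testis (last of six tissues) scores 1.0;
# a gene with identical expression in all six tissues scores far below 0.9.
print(max_js_specificity([0, 0, 0, 0, 0, 5]))          # (5, 1.0)
print(max_js_specificity([1, 1, 1, 1, 1, 1])[1] > 0.9)  # False
```

Under this reading, a gene counted as tissue-specific in Tables S2 and S3 would be one whose maximal score exceeds 0.9.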
Figure S4. Number of human protein-coding genes found to be conserved through pair-wise
conservation search. For each target species, the bar shows the number of human protein-coding
genes with a conserved counterpart in that species, and the line across the bar marks how many of
these conserved protein-coding genes are expressed.
Figure S5. Circos plots of the lncRNA conservation atlas for all human chromosomes. The rings run
from human lncRNAs (outermost) to conserved lncRNAs in chicken (innermost). The degree of
conservation is indicated by color intensity; darker colors represent stronger conservation.
Figure S6. Mammalian protein-coding gene expression phylogenies. Neighbour-joining trees
are based on pair-wise distance matrices (1 - ρ, where ρ is Spearman’s correlation coefficient) for six
tissues (brain, cerebellum, heart, kidney, liver and testis). Bootstrap values (10061 mammal-conserved
protein-coding genes randomly sampled with replacement 1000 times) are indicated by circles:
white, ≥0.9; yellow, <0.9.
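The distance and bootstrap steps described in the Figure S6 caption can be sketched in plain Python: Spearman's ρ is computed as the Pearson correlation of average ranks, each pair of samples gets distance 1 - ρ, and each bootstrap replicate resamples genes with replacement before recomputing the matrix. This is an illustrative sketch with hypothetical function names; building the neighbour-joining tree from each matrix and tallying bootstrap support are left to a phylogenetics package and are not shown.

```python
import math
import random


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # 0-based positions -> 1-based ranks
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return math.nan  # correlation is undefined for a constant profile
    return cov / (sx * sy)


def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))


def distance_matrix(profiles):
    """profiles: {sample name: expression values over the same ordered genes}.
    Returns (names, matrix) with matrix[i][j] = 1 - Spearman rho."""
    names = list(profiles)
    matrix = [[1.0 - spearman(profiles[a], profiles[b]) for b in names] for a in names]
    return names, matrix


def bootstrap_matrices(profiles, n_boot=1000, seed=0):
    """Yield one (names, matrix) pair per replicate, resampling genes with replacement."""
    rng = random.Random(seed)
    n_genes = len(next(iter(profiles.values())))
    for _ in range(n_boot):
        idx = [rng.randrange(n_genes) for _ in range(n_genes)]
        resampled = {name: [vals[i] for i in idx] for name, vals in profiles.items()}
        yield distance_matrix(resampled)


# Toy profiles over five genes (made-up values).
profiles = {
    "human_brain": [5.0, 1.0, 0.0, 3.0, 2.0],
    "chimp_brain": [4.0, 1.5, 0.2, 2.5, 2.0],
    "mouse_liver": [0.1, 6.0, 4.0, 0.3, 1.0],
}
names, d = distance_matrix(profiles)
```

Resampling genes rather than samples mirrors the caption's description (conserved genes sampled with replacement 1000 times); a seeded generator keeps replicates reproducible.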
Figure S7. Mammalian lncRNA expression phylogenies (without rat). Mammalian expression
phylogenetic trees are constructed from 10770 amniote-conserved lncRNAs. Bootstrap values are
indicated by circles: white, ≥0.9; yellow, <0.9.
Figure S8. Mammalian lncRNA expression phylogenies (including rat). Mammalian expression
phylogenetic trees are constructed from 10770 amniote-conserved lncRNAs. Bootstrap values are
indicated by circles: white, ≥0.9; yellow, <0.9.
Figure S9. Mammalian lincRNA expression phylogenies (without rat). Mammalian expression
phylogenetic trees are constructed from 4410 amniote-conserved lincRNAs. Bootstrap values are
indicated by circles: white, ≥0.9; yellow, <0.9.
Figure S10. Mammalian lincRNA expression phylogenies (including rat). Mammalian
expression phylogenetic trees are constructed from 4410 amniote-conserved lincRNAs. Bootstrap
values are indicated by circles: white, ≥0.9; yellow, <0.9.
Figure S11. Evolutionary conservation of HOTAIR and XIST. The HOTAIR gene (PNCG_HSA009366)
is widely present across mammals, as shown by its CPCC scores. The XIST gene
(PNCG_HSA042010) is present from human to opossum but absent in platypus and chicken.
Table S1. Statistics of lncRNA catalogues across mammals
| Species | Ensembl protein-coding genes | Ensembl pseudogenes | Ensembl non-lncRNAs | Ensembl lncRNA genes | Assembly: protein-coding genes fully and partially reconstructed | Assembly: novel lncRNAs | Combined lncRNA catalog | Expressed lncRNAs (combined catalog) |
|---|---|---|---|---|---|---|---|---|
| Human | 20279 | 634 | 7501 | 10214 | 16791 | 4211 | 42558 | 13709 |
| Chimp | 18459 | 394 | 5546 | 0 | 15812 | 9347 | 9347 | 9347 |
| Gorilla | 20918 | 1400 | 5295 | 0 | 16829 | 6517 | 6517 | 6517 |
| Orangutan | 20065 | 1023 | 5772 | 0 | 15317 | 4141 | 4141 | 4141 |
| Rhesus | 21024 | 1659 | 4857 | 0 | 16936 | 8094 | 8094 | 8094 |
| Mouse | 22367 | 544 | 4938 | 3080 | 17202 | 5086 | 25464 | 9728 |
| Rat | 22935 | 1417 | 4425 | 379 | 22802 | 16163 | 18330 | 16163 |
| Opossum | 19457 | 722 | 1750 | 0 | 15792 | 5964 | 5964 | 5964 |
| Platypus | 17796 | 488 | 3213 | 0 | 14600 | 7872 | 7872 | 7872 |
| Chicken | 16731 | 96 | 870 | 0 | 14294 | 6804 | 6804 | 6804 |
Note: The known annotations of protein-coding genes, pseudogenes, non-lncRNAs and lncRNAs are from the Ensembl
database. Non-lncRNAs are other noncoding RNAs (microRNAs, tRNAs, snoRNAs, rRNAs, snRNAs and scoRNAs).
Fully and partially reconstructed protein-coding genes are the genes that have a class code of =, c, j, e or o when
compared with the assembled transcripts using cuffcompare. Combined lncRNAs are the union of lncRNAs
identified from the RNA-seq data and from Ensembl (GENCODE, NONCODE and Human Body Map lncRNAs were
also included for human).
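The class-code rule in this note can be expressed as a simple filter. This sketch assumes each assembled transcript has already been matched to a reference gene and assigned a cuffcompare class code, and counts a reference protein-coding gene as fully or partially reconstructed if any of its matched transcripts carries one of the five listed codes; that per-gene "any transcript" rule and the helper name `reconstructed_genes` are assumptions.

```python
# cuffcompare class codes accepted as full or partial reconstruction
# (the five codes listed in the note): "=" complete intron-chain match,
# "c" contained, "j" shares at least one splice junction with a reference,
# "e" single-exon fragment overlapping a reference exon and intron,
# "o" generic exonic overlap with a reference transcript.
RECONSTRUCTED_CODES = frozenset("=cjeo")


def reconstructed_genes(matches):
    """matches: iterable of (reference_gene_id, class_code) pairs, one per
    assembled transcript. Returns the reference genes counted as reconstructed
    (hypothetical rule: at least one transcript with an accepted code)."""
    return {gene for gene, code in matches if code in RECONSTRUCTED_CODES}


matches = [("GENE1", "="), ("GENE2", "u"), ("GENE3", "j"),
           ("GENE1", "c"), ("GENE4", "x")]
print(sorted(reconstructed_genes(matches)))  # ['GENE1', 'GENE3']
```

Codes such as "u" (intergenic) or "x" (opposite-strand overlap) do not count toward reconstruction under this rule.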
Table S2. Statistics of tissue-specific protein-coding genes
| Species | Brain | Cerebellum | Heart | Kidney | Liver | Testis |
|---|---|---|---|---|---|---|
| Human | 168 | 78 | 122 | 233 | 250 | 1405 |
| Chimp | 142 | 84 | 113 | 272 | 281 | 1392 |
| Gorilla | 177 | 74 | 103 | 250 | 266 | 1341 |
| Orangutan | 393 | 129 | 186 | 404 | 389 | NA |
| Rhesus | 208 | 72 | 176 | 257 | 326 | 1623 |
| Mouse | 134 | 68 | 198 | 334 | 369 | 1860 |
| Opossum | 157 | 102 | 194 | 286 | 288 | 1419 |
| Platypus | 151 | 74 | 174 | 292 | 265 | 996 |
| Chicken | 156 | 56 | 127 | 210 | 227 | 1000 |
Note: Values are the numbers of protein-coding genes specifically expressed (JS score >0.9) in each tissue.
NA, not available.
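Table S3. Statistics of tissue-specific lncRNAs
| Species | Brain | Cerebellum | Heart | Kidney | Liver | Testis |
|---|---|---|---|---|---|---|
| Human | 137 | 127 | 99 | 213 | 214 | 2581 |
| Chimp | 106 | 110 | 95 | 249 | 145 | 2981 |
| Gorilla | 111 | 117 | 45 | 182 | 222 | 1737 |
| Orangutan | 150 | 114 | 125 | 213 | 285 | NA |
| Rhesus | 102 | 77 | 66 | 136 | 174 | 3453 |
| Mouse | 83 | 33 | 136 | 161 | 145 | 3114 |
| Opossum | 80 | 52 | 123 | 137 | 80 | 2114 |
| Platypus | 88 | 81 | 141 | 174 | 136 | 1487 |
| Chicken | 46 | 38 | 55 | 114 | 66 | 2405 |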
Note: Values are the numbers of lncRNAs specifically expressed (JS score >0.9) in each tissue. NA, not
available.
Table S4. Number of conserved lncRNA gene loci from pair-wise conservation search
| Species | Human | Chimp | Gorilla | Orangutan | Rhesus | Mouse | Opossum | Platypus | Chicken |
|---|---|---|---|---|---|---|---|---|---|
| Human | 42558 (13709) | 42172 (11198) | 41787 (10367) | 41827 (8212) | 41131 (9939) | 37528 (8379) | 29850 (6151) | 23304 (4675) | 12967 (2707) |
| Chimp | 9314 (3052) | 9347 (9347) | 9200 (2712) | 9265 (1664) | 8979 (2058) | 7834 (1144) | 6044 (764) | 4509 (557) | 3624 (499) |
| Gorilla | 6465 (2043) | 6429 (2043) | 6517 (6517) | 6442 (1012) | 6226 (1323) | 5374 (648) | 4227 (473) | 3354 (303) | 1703 (165) |
| Orangutan | 4098 (1232) | 4077 (1208) | 4038 (1177) | 4141 (4141) | 4024 (968) | 3510 (586) | 2359 (342) | 2463 (292) | 1377 (164) |
| Rhesus | 7931 (1534) | 7868 (1530) | 7814 (1373) | 7977 (852) | 8094 (8094) | 7387 (783) | 4030 (355) | 4241 (328) | 1908 (199) |
| Mouse | 23171 (4827) | 22718 (4587) | 22942 (4397) | 22673 (3984) | 23161 (4529) | 25464 (9728) | 14767 (2463) | 12343 (1955) | 8939 (1571) |
| Opossum | 3902 (507) | 3791 (486) | 3835 (450) | 3079 (311) | 2960 (391) | 3085 (320) | 5964 (5964) | 3385 (302) | 1358 (110) |
| Platypus | 5332 (575) | 5302 (552) | 5261 (530) | 5355 (457) | 5255 (584) | 4632 (439) | 5579 (542) | 7872 (7872) | 4008 (340) |
| Chicken | 2120 (140) | 3078 (184) | 2064 (134) | 2080 (108) | 1995 (141) | 2570 (155) | 2059 (107) | 3188 (153) | 6804 (6804) |
Note: The figures in brackets are the numbers of expressed lncRNA genes. A lncRNA locus was defined as
expressed when at least one transcript is located completely or partially (>80%) within it. The green-colored
figures (the diagonal) are the numbers of lncRNA genes of each species searched against the other species. The
black-colored figures are the numbers of these genes found in the counterpart species.
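The expression criterion in this note can be sketched as an interval-overlap test. The sketch assumes that loci and transcripts are single half-open genomic intervals on the same chromosome and strand (exon structure is ignored), and reads "partially (>80%)" as more than 80% of the transcript's length falling inside the locus; these interpretations and the function names are assumptions.

```python
def overlap_fraction(transcript, locus):
    """Fraction of a transcript's span lying inside a locus.
    Both arguments are (start, end) half-open intervals on the same
    chromosome and strand (an assumption of this sketch)."""
    t_start, t_end = transcript
    l_start, l_end = locus
    overlap = max(0, min(t_end, l_end) - max(t_start, l_start))
    return overlap / (t_end - t_start)


def locus_is_expressed(locus, transcripts, min_fraction=0.8):
    """True if any transcript lies completely (fraction 1.0) or more than
    min_fraction of its length within the locus."""
    return any(overlap_fraction(t, locus) > min_fraction for t in transcripts)


locus = (100, 200)
print(locus_is_expressed(locus, [(150, 250)]))             # False: 50% inside
print(locus_is_expressed(locus, [(150, 250), (90, 190)]))  # True: 90% inside
```

A transcript exactly at the 80% boundary does not qualify, because the note uses a strict ">80%" threshold.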
Table S5. Number of protein-coding genes conserved in different lineages
| | Primates conserved (human~rhesus) | Placentals conserved (human~mouse) | Placentals & marsupials conserved (human~opossum) | Mammals conserved (human~platypus) | Amniotes conserved (human~chicken) |
|---|---|---|---|---|---|
| Genomically conserved | 19635 | 19371 | 18125 | 16417 | 10061 |
| Conserved and expressed in at least one species of the lineage | 17050 | 17156 | 16623 | 15235 | 5616 |
| Conserved and expressed in all species of the lineage | 11277 | 10739 | 8238 | 5188 | 2526 |
Table S6. Number of lncRNAs conserved in different lineages
| | Primates conserved (human~rhesus) | Placentals conserved (human~mouse) | Placentals & marsupials conserved (human~opossum) | Mammals conserved (human~platypus) | Amniotes conserved (human~chicken) |
|---|---|---|---|---|---|
| Genomically conserved | 40160 | 36131 | 27156 | 20202 | 10770 |
| Conserved and expressed in at least one species of the lineage | 18434 | 18428 | 16579 | 13804 | 7715 |
| Conserved and expressed in all species of the lineage | 3944 | 3278 | 2039 | 1091 | 494 |
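Tables S5 and S6 can be read as set operations over per-species conservation results. The sketch below assumes that a lineage labelled "human~X" covers every species from human to X in the order used in Table S4 (rat is not included), that a gene is genomically conserved in a lineage when it has a conserved counterpart in every species of that lineage, and that expression is recorded per species for each human gene's counterpart. These definitions and all names in the code are assumptions for illustration.

```python
# Species order assumed from Table S4 (rat omitted).
SPECIES_ORDER = ["human", "chimp", "gorilla", "orangutan", "rhesus",
                 "mouse", "opossum", "platypus", "chicken"]

# Hypothetical mapping from lineage label to its most distant species.
LINEAGE_ENDPOINTS = {
    "Primates": "rhesus",
    "Placentals": "mouse",
    "Placentals & marsupials": "opossum",
    "Mammals": "platypus",
    "Amniotes": "chicken",
}


def lineage_counts(conserved, expressed, endpoint):
    """conserved[s]: human gene IDs with a conserved counterpart in species s
    (conserved["human"] holds all human genes).
    expressed[s]: human gene IDs whose counterpart in s is expressed.
    Returns (genomically conserved, expressed in >= 1 species, expressed in all)."""
    species = SPECIES_ORDER[: SPECIES_ORDER.index(endpoint) + 1]
    genomic = set.intersection(*(conserved[s] for s in species))
    in_any = genomic & set.union(*(expressed[s] for s in species))
    in_all = genomic & set.intersection(*(expressed[s] for s in species))
    return len(genomic), len(in_any), len(in_all)


# Toy data for the primate lineage (made-up gene IDs).
conserved = {"human": {"a", "b", "c", "d"}, "chimp": {"a", "b", "c"},
             "gorilla": {"a", "b", "c"}, "orangutan": {"a", "b"},
             "rhesus": {"a", "b", "d"}}
expressed = {"human": {"a"}, "chimp": {"a"}, "gorilla": {"a", "b"},
             "orangutan": {"a"}, "rhesus": {"a", "c"}}
print(lineage_counts(conserved, expressed, LINEAGE_ENDPOINTS["Primates"]))  # (2, 2, 1)
```

Because each broader lineage adds species to the intersection, the genomically conserved count can only stay the same or shrink as the lineage widens, which matches the first row of both tables.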
Table S7. Conservation status of 14 evolutionarily conserved eye lincRNAs
| Eye lincRNA | Ensembl ID | PhyloNoncode ID |
|---|---|---|
| Eyelinc1 | ENSMUST00000052354 | NONMMUG06030 |
| Eyelinc2 | ENSMUST00000072769 | NONMMUG15482 |
| Eyelinc3 | ENSMUST00000081739 | NONMMUG11409 |
| Eyelinc6 | ENSMUST00000136359 | NONMMUG21963 |
| Eyelinc9 | ENSMUST00000159347 | NONMMUG02305 |
| Eyelinc10 | ENSMUST00000164531 | NONMMUG22687 |
| Eyelinc11 | ENSMUST00000165677 | NONMMUG04644 |
| Eyelinc12 | ENSMUST00000165869 | NONMMUG12698 |
| Eyelinc13 | ENSMUST00000166167 | NONMMUG05017 |
| Eyelinc14 | ENSMUST00000166576 | NONMMUG00683 |
| Eyelinc15 | ENSMUST00000168221 | NONMMUG21420 |
| Eyelinc16 | ENSMUST00000169263 | NONMMUG09672 |
| Eyelinc17 | ENSMUST00000170823 | NONMMUG17470 |
| Eyelinc18 | ENSMUST00000171491 | NONMMUG06070 |
HSA
PTR
GGO
PPY
MML
MMU
RNO
MDO
OAN
GGA
1.0
1.0
1.0
1.0
1.0
1.0
1.0
1.0
0.0
1.0
1.0
1.0
1.0
1.0
1.0
1.0
0.1
0.1
0.0
1.0
1.0
0.9
0.9
1.0
1.0
1.0
0.4
0.4
0.4
1.0
1.0
0.5
0.9
1.0
1.0
1.0
0.4
-
0.0
1.0
0.3
0.3
0.3
0.3
0.3
0.3
0.0
0.0
0.0
0.3
1.0
1.0
1.0
1.0
1.0
-
0.9
0.0
0.0
1.0
0.0
0.0
0.0
1.0
0.3
1.0
0.0
0.0
0.0
0.0
1.0
1.0
0.8
0.8
1.0
-
0.9
0.3
0.8
1.0
0.0
0.0
0.0
0.0
0.0
1.0
0.0
0.0
0.0
0.0
1.0
1.0
1.0
1.0
1.0
1.0
0.8
0.8
-
1.0
0.2
0.0
0.2
0.2
0.0
0.3
0.1
0.0
0.0
0.2
1.0
1.0
1.0
1.0
0.0
1.0
0.0
0.9
1.0
1.0
1.0
0.0
1.0
0.0
1.0
1.0
0.0
0.0
0.0
1.0
1.0
1.0
1.0
1.0
1.0
1.0
0.0
0.0
-
1.0
Note: The table shows the conservation status of 14 eye lincRNAs reported in previous work. HSA to GGA are
species abbreviations (HSA, human; PTR, chimp; GGO, gorilla; PPY, orangutan; MML, rhesus; MMU, mouse;
RNO, rat; MDO, opossum; OAN, platypus; GGA, chicken). Values in the species columns are CPCC scores;
“-” denotes that no conserved counterpart was found in the target species.