Supplementary Table 1. Sequence data used.
Species                             # of mRNAs  # of ESTs
Oryza sativa                        34,887ᵃ     285,019
Oryza spp.ᵇ                         10          5,868
Triticum aestivum (wheat)           1,934       565,328
Zea mays (maize)                    10,754      415,235
Hordeum vulgare (barley)            1,006       391,861
Saccharum officinarum (sugarcane)   123         246,301
Sorghum bicolor (sorghum)           86          190,949
Arabidopsis thaliana (thale cress)  59,734      322,651
ᵃ 32,127 FLcDNAs are included.
ᵇ Oryza species other than O. sativa.
Note: rice full-length cDNAs (as of October 1, 2004) and other sequences (as of September 1,
2004) were retrieved from the International Nucleotide Sequence Databases.
Supplementary Table 2. Features of O. sativa and A. thaliana transcripts.
                                O. sativa with mRNAs  O. sativa predictions  A. thaliana with mRNAs
Number of exons                 106,447               33,937                 101,396
Mean exon number                5.19                  4.89                   5.40
Number of single exon loci      4,473 (21.8%)         1,235 (17.8%)          3,583 (19.1%)
Mean exon length (bp)           333                   292                    263
Mean first exon length (bp)     412                   379                    323
Mean internal exon length (bp)  177                   207                    159
Mean last exon length (bp)      648                   363                    476
Mean intron length (bp)         423                   490                    168
Mean mRNA length (bp)           1,728                 1,428                  1,428
Mean pre-mRNA length (bp)       3,501                 3,334                  2,160
Exon coverage on the genome     35.4 Mbp              9.9 Mbp                26.8 Mbp
Transcribed genomic regions     71.8 Mbp              23.1 Mbp               40.5 Mbp
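Several rows of Supplementary Table 2 are linked by simple arithmetic, which gives a quick way to check a transcription of the table. The Python sketch below is an illustration only and is not part of the original analysis. It assumes that the mean exon number is taken per locus (so that exons divided by mean exon number approximates the number of loci) and that the mean exon length is the exon coverage divided by the number of exons. The dictionary keys and the function name are hypothetical; the numbers are copied from the table.

```python
# Values copied from Supplementary Table 2. Key names are illustrative only.
TABLE2 = {
    "O. sativa (with mRNAs)": {
        "exons": 106_447, "mean_exon_number": 5.19,
        "single_exon_loci": 4_473, "single_exon_pct": 21.8,
        "mean_exon_length_bp": 333, "exon_coverage_mbp": 35.4,
    },
    "O. sativa (predictions)": {
        "exons": 33_937, "mean_exon_number": 4.89,
        "single_exon_loci": 1_235, "single_exon_pct": 17.8,
        "mean_exon_length_bp": 292, "exon_coverage_mbp": 9.9,
    },
    "A. thaliana (with mRNAs)": {
        "exons": 101_396, "mean_exon_number": 5.40,
        "single_exon_loci": 3_583, "single_exon_pct": 19.1,
        "mean_exon_length_bp": 263, "exon_coverage_mbp": 26.8,
    },
}


def derived_values(row):
    """Recompute values that should follow from other rows (assumed relations)."""
    # Assumption: "mean exon number" is an average per locus.
    loci = row["exons"] / row["mean_exon_number"]
    return {
        "loci": loci,
        "single_exon_pct": 100.0 * row["single_exon_loci"] / loci,
        # Assumption: mean exon length = exon coverage / number of exons.
        "mean_exon_length_bp": row["exon_coverage_mbp"] * 1e6 / row["exons"],
    }


for name, row in TABLE2.items():
    d = derived_values(row)
    print(
        f"{name}: ~{d['loci']:.0f} loci, "
        f"single-exon {d['single_exon_pct']:.1f}% "
        f"(reported {row['single_exon_pct']}%), "
        f"mean exon {d['mean_exon_length_bp']:.0f} bp "
        f"(reported {row['mean_exon_length_bp']} bp)"
    )
```

Under these assumptions the recomputed single-exon percentages agree with the reported ones after rounding, and the recomputed mean exon lengths fall within 2 bp of the reported values, which is consistent with the coverage figures being rounded to 0.1 Mbp.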
Supplementary Table 3. Classification of transposable elements in the genome and mRNAs.
Genome
                          TIGR code¹        Copy No.  Coverage (bp)  Coverage (%)²
Class I   Ty1-copia       TERT001           6,612     5,604,084      1.51
          Ty3-gypsy       TERT002           25,426    30,662,518     8.27
          LINE            TERT003           477       183,043        0.05
          p-SINE1         TEMT011           3,620     500,981        0.14
          Other class I   TERTOOT           21,264    18,680,085     5.04
Class II  Ac/Ds           TETN001           1,598     225,023        0.06
          CACTA, En/Spm   TETN002           17,651    15,060,940     4.06
          MULE            TETN003           3,700     799,682        0.22
          MLE             TETN004           326       78,309         0.02
          Stowaway        TEMT002           317       27,335         0.01
          Tourist         TEMT001           16,149    3,834,704      1.03
          Other class II  TETN005, TETNOOT  42,301    6,492,629      1.75
Other TE                                    130,256   21,721,885     5.86
Total                                       269,697   103,871,218    28.01
mRNA
                          TIGR code¹        Copy No.  Coverage (bp)  Coverage (%)³
Class I   Ty1-copia       TERT001           91        35,202         0.07
          Ty3-gypsy       TERT002           227       70,338         0.14
          LINE            TERT003           5         1,150          0.00
          p-SINE1         TEMT011           66        7,401          0.02
          Other class I   TERTOOT           224       100,203        0.20
Class II  Ac/Ds           TETN001           14        2,637          0.01
          CACTA, En/Spm   TETN002           105       22,732         0.05
          MULE            TETN003           52        13,544         0.03
          MLE             TETN004           46        5,690          0.01
          Stowaway        TEMT002           3         235            0.00
          Tourist         TEMT001           149       25,308         0.05
          Other class II  TETN005, TETNOOT  357       52,752         0.11
Other TE                                    1,264     165,199        0.34
Total                                       2,603     502,391        1.03
¹ For the TIGR codes, see http://www.tigr.org/tdb/e2k1/plant.repeats/repeat.code.shtml.
² Fraction in the genome.
³ Fraction in the total mRNAs.
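The footnotes define the two percentage columns as fractions of the genome and of the total mRNA sequence, but the table does not state those denominators. As a hedged illustration (not part of the original analysis), the Python sketch below back-calculates each denominator from the Total row and uses it to recompute a few per-class percentages. The implied totals are approximations, because they are derived from percentages rounded to two decimals, and all names in the code are hypothetical.

```python
# (coverage in bp, reported coverage in %) copied from Supplementary Table 3.
# Only a few rows are included here; the dictionary names are illustrative.
GENOME = {
    "Ty1-copia": (5_604_084, 1.51),
    "Ty3-gypsy": (30_662_518, 8.27),
    "Tourist": (3_834_704, 1.03),
    "Total": (103_871_218, 28.01),
}
MRNA = {
    "Ty3-gypsy": (70_338, 0.14),
    "Other TE": (165_199, 0.34),
    "Total": (502_391, 1.03),
}


def implied_denominator_bp(table):
    """Back-calculate the sequence length that the percentages refer to.

    Assumption: the 'Total' percentage is coverage / denominator * 100.
    The result is approximate because the percentage is rounded.
    """
    bp, pct = table["Total"]
    return bp / (pct / 100.0)


def recomputed_pct(table):
    """Recompute each row's percentage against the implied denominator."""
    denom = implied_denominator_bp(table)
    return {name: round(100.0 * bp / denom, 2) for name, (bp, _) in table.items()}


for label, table in (("genome", GENOME), ("mRNA", MRNA)):
    print(label, f"implied total ~{implied_denominator_bp(table) / 1e6:.1f} Mbp")
    for name, pct in recomputed_pct(table).items():
        print(f"  {name}: {pct}% (reported {table[name][1]}%)")
```

With the rows included here, the recomputed percentages match the reported ones, and the implied denominators come out at roughly 371 Mbp of genome sequence and roughly 49 Mbp of mRNA sequence. These are estimates derived from the table, not figures stated in it.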
Supplementary Table 4. Features of annotated non-protein-coding (np) RNAs.
Feature                 Multi-exon   Single-exon  Total
npRNA                   108 (100%)   23 (100%)    131 (100%)
Mean length (bp)        1186         965          N.A.*
Mean exon number        2.8          1.0          N.A.*
EST support             47 (43.5%)   5 (21.7%)    52 (39.7%)
Polyadenylation signal  18 (16.7%)   19 (82.6%)   37 (28.2%)
Genomic polyadenosine   2 (1.9%)     0 (0%)       2 (1.5%)
*Not available.
Supplementary Table 5. Putative rice antisense npRNAs and their sense genes.
as-npRNA      Chr  Sense gene    Sense gene description
(A) Antisense to known protein genes:
Os02g0180800  2    Os02g0180700  Cinnamoyl-CoA reductase (EC 1.2.1.44)
Os03g0118500  3    Os03g0118600  Dihydrodipicolinate reductase-like protein
Os03g0127100  3    Os03g0127200  NAM protein
Os05g0577000  5    Os05g0576900  PIN1-like auxin transport protein
Os06g0514700  6    Os06g0514600  Cyclophilin-RNA interacting protein
Os07g0653300  7    Os07g0653200  BLE2 protein
Os07g0654800  7    Os07g0654700  BLE2 protein
Os08g0103700  8    Os08g0103900  NAM-like protein
                   Os08g0103600  BTB/POZ domain containing protein
Os12g0114900  12   Os12g0115000  Lipid transfer protein LPT II
Os12g0132900  12   Os12g0133000  Major facilitator superfamily antiporter
Os07g0524300  7    Os07g0524400  Nucleolin (Protein C23)
(B) Antisense to domain-containing protein genes:
Os02g0684800  2    Os02g0684900  Zn-finger, FYVE type domain containing protein
Os01g0494300  1    Os01g0494400  Retrotransposon gag protein family protein
Os09g0429300  9    Os09g0429200  Ionotropic glutamate receptor family protein
Os08g0538100  8    Os08g0538200  Plant protein of unknown function family protein
Os06g0664000  6    Os06g0663900  Protein kinase domain containing protein
Os09g0471300  9    Os09g0471400  Protein kinase domain containing protein
Os10g0142700  10   Os10g0142600  Protein kinase domain containing protein
Os11g0173600  11   Os11g0173700  Protein kinase domain containing protein
Os05g0323400  5    Os05g0323300  BED finger domain containing protein
Os04g0588500  4    Os04g0588600  ABC transporter domain containing protein
Os09g0278900  9    Os09g0279000  ENT domain containing protein
Os11g0697100  11   Os11g0697200  Eukaryotic protein of unknown function DUF889 family protein
Os04g0172600  4    Os04g0172500  RNase H domain containing protein
Os01g0119800  1    Os01g0119700  Ubiquitin domain containing protein
Os06g0477600  6    Os06g0477500  Viral coat and capsid protein family protein
Os06g0555900  6    Os06g0556000  Amino acid carrier fragment
(C) Antisense to hypothetical protein genes:
Os01g0646400  1    Os01g0646500  Conserved hypothetical protein
Os03g0442800  3    Os03g0442900  Conserved hypothetical protein
Os06g0134200  6    Os06g0134100  Conserved hypothetical protein
Os11g0204500  11   Os11g0204400  Conserved hypothetical protein
Os12g0256600  12   Os12g0256500  Conserved hypothetical protein
Os01g0810700  1    Os01g0810600  Hypothetical protein
Os02g0228600  2    Os02g0228700  Hypothetical protein
Os02g0779500  2    Os02g0779600  Hypothetical protein
Os02g0792100  2    Os02g0792200  Hypothetical protein
Os02g0289300  2    Os02g0289400  Hypothetical protein (single-exon)
Os04g0308200  4    Os04g0308000  Hypothetical protein
Os05g0137800  5    Os05g0137900  Hypothetical protein
Os05g0294800  5    Os05g0294700  Hypothetical protein
Os05g0115200  5    Os05g0115300  Hypothetical protein
Os06g0516800  6    Os06g0516900  Hypothetical protein
Os07g0590700  7    Os07g0590800  Hypothetical protein
Os08g0384700  8    Os08g0384800  Hypothetical protein
Os08g0391300  8    Os08g0391200  Hypothetical protein
Os08g0555600  8    Os08g0555700  Hypothetical protein
Os09g0309900  9    Os09g0310000  Hypothetical protein
Os09g0321500  9    Os09g0321600  Hypothetical protein
Os09g0469500  9    Os09g0469600  Hypothetical protein
Os10g0479100  10   Os10g0479000  Hypothetical protein
Os11g0286500  11   Os11g0286400  Hypothetical protein
Os12g0255000  12   Os12g0255100  Hypothetical protein
Os12g0545600  12   Os12g0545500  Hypothetical protein
Os12g0199200  12   Os12g0199300  Hypothetical protein (single-exon)
Os04g0601200  4    Os04g0601300  Hypothetical protein
Supplementary Table 6. Isoacceptor tRNA gene copy number and the relative synonymous codon usage (RSCU).
Amino acid  Codon  Gene number  RSCU
Gly         GGU    0            0.80
            GGC    28           1.58
            GGA    10           0.78
            GGG    9            0.84
Val         GUU    17           0.94
            GUC    16           1.21
            GUA    4            0.40
            GUG    10           1.46
Lys         AAA    12           0.64
            AAG    20           1.36
Asn         AAU    0            0.89
            AAC    29           1.11
Gln         CAA    21           0.73
            CAG    10           1.27
His         CAU    0            0.90
            CAC    26           1.10
Glu         GAA    16           0.71
            GAG    25           1.29
Asp         GAU    1            0.94
            GAC    31           1.06
Tyr         UAU    2            0.76
            UAC    19           1.24
Cys         UGU    1            0.65
            UGC    17           1.35
Phe         UUU    0            0.72
            UUC    20           1.28
Ile         AUU    18           1.00
            AUC    0            1.39
            AUA    5            0.61
Met         AUG    56
Trp         UGG    18
Arg         AGA    12           0.93
            AGG    11           1.40
            CGU    24           0.61
            CGC    0            1.50
            CGA    4            0.49
            CGG    8            1.07
Leu         CUU    15           1.02
            CUC    0            1.76
            CUA    10           0.46
            CUG    9            1.46
            UUA    4            0.38
            UUG    19           0.93
Ser         AGU    0            0.66
            AGC    20           1.22
            UCU    12           0.97
            UCC    4            1.24
            UCA    17           0.97
            UCG    8            0.94
Thr         ACU    11           0.88
            ACC    8            1.24
            ACA    15           0.97
            ACG    5            0.91
Pro         CCU    14           0.94
            CCC    0            0.85
            CCA    14           0.99
            CCG    9            1.21
Ala         GCU    20           0.84
            GCC    1            1.31
            GCA    10           0.75
            GCG    12           1.10
Note. - Most abundant isoacceptor tRNAs and codons are written in boldface.
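The table reports RSCU without defining it. Under the usual definition, the RSCU of a codon is its observed count divided by the mean count of all codons that encode the same amino acid, so values above 1 indicate codons used more often than expected under uniform usage, and the values within a synonymous family sum to the family size (the four Gly values in the table, for example, sum to 4.00). A minimal Python sketch of that calculation follows. The codon counts in the usage example are made up for illustration and are not data from this study, and the function and variable names are hypothetical.

```python
from collections import defaultdict


def rscu(codon_counts, codon_to_aa):
    """Relative synonymous codon usage.

    RSCU(codon) = count(codon) / mean count over the codons that encode
    the same amino acid. Families with no observations are skipped.
    """
    family_total = defaultdict(int)
    family_size = defaultdict(int)
    for codon, aa in codon_to_aa.items():
        family_total[aa] += codon_counts.get(codon, 0)
        family_size[aa] += 1

    result = {}
    for codon, aa in codon_to_aa.items():
        if family_total[aa] == 0:
            continue  # RSCU is undefined when the amino acid never occurs
        mean_count = family_total[aa] / family_size[aa]
        result[codon] = codon_counts.get(codon, 0) / mean_count
    return result


# Hypothetical counts for the glycine family only (not data from the table).
GLY_FAMILY = {"GGU": "Gly", "GGC": "Gly", "GGA": "Gly", "GGG": "Gly"}
example_counts = {"GGU": 20, "GGC": 40, "GGA": 20, "GGG": 20}

values = rscu(example_counts, GLY_FAMILY)
for codon, value in values.items():
    print(codon, f"{value:.2f}")
```

In this made-up example GGC is used twice as often as each of the other three codons, so it is the only one whose RSCU exceeds 1. Applying the same function to real codon counts requires a full codon-to-amino-acid map in place of `GLY_FAMILY`.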
Supplementary Table 7. The top 40 InterPro hits in O. sativa and A. thaliana.
O. sativa
Rank  IPR ID     Name                                                               # of O. sativa proteins
1     IPR011009  Protein kinase-like                                                1277
2     IPR000719  Protein kinase                                                     1221
3     IPR002290  Serine/threonine protein kinase                                    1150
4     IPR001245  Tyrosine protein kinase                                            1114
5     IPR008271  Serine/threonine protein kinase, active site                       842
6     IPR001611  Leucine-rich repeat                                                666
7     IPR008941  TPR-like                                                           557
8     IPR001810  Cyclin-like F-box                                                  398
9     IPR002885  Pentatricopeptide repeat                                           391
10    IPR009057  Homeodomain-like                                                   365
11    IPR007090  Leucine-rich repeat, plant specific                                354
12    IPR001841  Zn-finger, RING                                                    351
13    IPR008938  ARM repeat fold                                                    322
14    IPR001128  Cytochrome P450                                                    303
15    IPR002182  NB-ARC                                                             291
16    IPR000767  Disease resistance protein                                         274
17    IPR008940  Protein prenyltransferase                                          273
18    IPR002401  E-class P450, group I                                              255
19    IPR000504  RNA-binding region RNP-1 (RNA recognition motif)                   249
20    IPR003593  AAA ATPase                                                         244
21    IPR000379  Esterase/lipase/thioesterase                                       237
22    IPR001680  WD-40 repeat                                                       233
23    IPR003591  Leucine-rich repeat, typical subtype                               233
24    IPR001005  Myb, DNA-binding                                                   229
25    IPR011046  WD40-like                                                          224
26    IPR002110  Ankyrin                                                            201
27    IPR010983  EF-Hand-like                                                       187
28    IPR009007  Peptidase aspartic                                                 169
29    IPR002048  Calcium-binding EF-hand                                            167
30    IPR001440  TPR repeat                                                         155
31    IPR010255  Haem peroxidase                                                    148
32    IPR002016  Haem peroxidase, plant/fungal/bacterial                            146
33    IPR002213  UDP-glucuronosyl/UDP-glucosyltransferase                           146
34    IPR008994  Nucleic acid-binding OB-fold                                       144
35    IPR001878  Zn-finger, CCHC type                                               141
36    IPR001092  Basic helix-loop-helix dimerisation region bHLH                    138
37    IPR003612  Plant lipid transfer/seed storage/trypsin-alpha amylase inhibitor  138
38    IPR001687  ATP/GTP-binding site motif A (P-loop)                              137
39    IPR001650  Helicase, C-terminal                                               134
40    IPR001410  DEAD/DEAH box helicase                                             132

A. thaliana
Rank  IPR ID     Name                                              # of A. thaliana proteins
1     IPR011009  Protein kinase-like                               1075
2     IPR000719  Protein kinase                                    1042
3     IPR002290  Serine/threonine protein kinase                   1008
4     IPR001245  Tyrosine protein kinase                           984
5     IPR008271  Serine/threonine protein kinase, active site      731
6     IPR001810  Cyclin-like F-box                                 606
7     IPR008941  TPR-like                                          603
8     IPR001611  Leucine-rich repeat                               539
9     IPR002885  Pentatricopeptide repeat                          463
10    IPR009057  Homeodomain-like                                  452
11    IPR001841  Zn-finger, RING                                   430
12    IPR008938  ARM repeat fold                                   364
13    IPR007090  Leucine-rich repeat, plant specific               329
14    IPR008940  Protein prenyltransferase                         308
15    IPR003593  AAA ATPase                                        306
16    IPR001005  Myb, DNA-binding                                  297
17    IPR006527  F-box protein interaction domain                  256
18    IPR000504  RNA-binding region RNP-1 (RNA recognition motif)  251
19    IPR001128  Cytochrome P450                                   246
20    IPR011046  WD40-like                                         237
21    IPR001680  WD-40 repeat                                      234
22    IPR008994  Nucleic acid-binding OB-fold                      224
23    IPR002401  E-class P450, group I                             222
24    IPR000379  Esterase/lipase/thioesterase                      217
25    IPR010983  EF-Hand-like                                      205
26    IPR011043  Galactose oxidase, central                        181
27    IPR003591  Leucine-rich repeat, typical subtype              172
28    IPR011011  FYVE/PHD zinc finger                              169
29    IPR002048  Calcium-binding EF-hand                           168
30    IPR011050  Pectin lyase-like                                 164
31    IPR000767  Disease resistance protein                        159
32    IPR001687  ATP/GTP-binding site motif A (P-loop)             155
33    IPR002182  NB-ARC                                            154
34    IPR001092  Basic helix-loop-helix dimerisation region bHLH   149
35    IPR001410  DEAD/DEAH box helicase                            149
36    IPR011424  C1-like                                           146
37    IPR001650  Helicase, C-terminal                              144
38    IPR002110  Ankyrin                                           142
39    IPR001440  TPR repeat                                        142
40    IPR006566  FBD                                               142
Supplementary Table 8. InterPro IDs of potential frequent hitters excluded from functional descriptions.
PS00001 (IPR000042)          N-glycosylation site
PS00002 (IPR002179)          Glycosaminoglycan attachment site
PS00003 (IPR002032)          Tyrosine sulfation site
PS00004 (IPR001833)          cAMP/cGMP-dependent protein kinase, phosphorylation site
PS00005 (IPR001495)          Protein kinase C, phosphorylation site
PS00006 (IPR000430)          Casein kinase II phosphorylation site
PS00007 (IPR000220)          Tyrosine kinase phosphorylation site
PS00008 (IPR000338)          N-myristoylation site
PS00009 (IPR000134)          Amidation site
PS00010 (IPR000152)          Aspartic acid and asparagine hydroxylation site
PS00015 (IPR001430)          Bipartite nuclear targeting sequence
PS00016 (IPR001918)          Cell attachment region
PS00029 (IPR002158)          Leucine zipper
PS50079 (IPR001472)          Bipartite nuclear localization signal
PS50099 (IPR000694)          Proline-rich region
PS50101/PS00017 (IPR001687)  ATP/GTP-binding site motif A
PR01217 (IPR002965)          Proline-rich extensin
PR00019/PF00560 (IPR001611)  Leucine-rich repeat