Supplementary Table 1. Sequence data used.

| Species | # of mRNAs | # of ESTs |
|---|---|---|
| Oryza sativa | 34,887^a | 285,019 |
| Oryza spp.^b | 10 | 5,868 |
| Triticum aestivum (wheat) | 1,934 | 565,328 |
| Zea mays (maize) | 10,754 | 415,235 |
| Hordeum vulgare (barley) | 1,006 | 391,861 |
| Saccharum officinarum (sugarcane) | 123 | 246,301 |
| Sorghum bicolor (sorghum) | 86 | 190,949 |
| Arabidopsis thaliana (thale cress) | 59,734 | 322,651 |

^a 32,127 FLcDNAs are included.
^b Oryza species other than O. sativa.
Note: Rice full-length cDNAs (as of October 1, 2004) and other sequences (as of September 1, 2004) were retrieved from the International Nucleotide Sequence Databases.

Supplementary Table 2. Features of O. sativa and A. thaliana transcripts.

| Feature | O. sativa, with mRNAs | O. sativa, predictions | A. thaliana, with mRNAs |
|---|---|---|---|
| Number of exons | 106,447 | 33,937 | 101,396 |
| Mean exon number | 5.19 | 4.89 | 5.40 |
| Number of single exon loci | 4,473 (21.8%) | 1,235 (17.8%) | 3,583 (19.1%) |
| Mean exon length (bp) | 333 | 292 | 263 |
| Mean first exon length (bp) | 412 | 379 | 323 |
| Mean internal exon length (bp) | 177 | 207 | 159 |
| Mean last exon length (bp) | 648 | 363 | 476 |
| Mean intron length (bp) | 423 | 490 | 168 |
| Mean mRNA length (bp) | 1,728 | 1,428 | 1,428 |
| Mean pre-mRNA length (bp) | 3,501 | 3,334 | 2,160 |
| Exon coverage on the genome | 35.4 Mbp | 9.9 Mbp | 26.8 Mbp |
| Transcribed genomic regions | 71.8 Mbp | 23.1 Mbp | 40.5 Mbp |

Supplementary Table 3. Classification of transposable elements in the genome and mRNAs.

Genome

| Class | Element | TIGR code^1 | Copy No. | Coverage (bp) | Coverage (%)^2 |
|---|---|---|---|---|---|
| Class I | Ty1-copia | TERT001 | 6,612 | 5,604,084 | 1.51 |
| Class I | Ty3-gypsy | TERT002 | 25,426 | 30,662,518 | 8.27 |
| Class I | LINE | TERT003 | 477 | 183,043 | 0.05 |
| Class I | p-SINE1 | TEMT011 | 3,620 | 500,981 | 0.14 |
| Class I | Other class I | TERTOOT | 21,264 | 18,680,085 | 5.04 |
| Class II | Ac/Ds | TETN001 | 1,598 | 225,023 | 0.06 |
| Class II | CACTA, En/Spm | TETN002 | 17,651 | 15,060,940 | 4.06 |
| Class II | MULE | TETN003 | 3,700 | 799,682 | 0.22 |
| Class II | MLE | TETN004 | 326 | 78,309 | 0.02 |
| Class II | Stowaway | TEMT002 | 317 | 27,335 | 0.01 |
| Class II | Tourist | TEMT001 | 16,149 | 3,834,704 | 1.03 |
| Class II | Other class II | TETN005, TETNOOT | 42,301 | 6,492,629 | 1.75 |
| | Other TE | | 130,256 | 21,721,885 | 5.86 |
| | Total | | 269,697 | 103,871,218 | 28.01 |

mRNA

| Class | Element | TIGR code^1 | Copy No. | Coverage (bp) | Coverage (%)^3 |
|---|---|---|---|---|---|
| Class I | Ty1-copia | TERT001 | 91 | 35,202 | 0.07 |
| Class I | Ty3-gypsy | TERT002 | 227 | 70,338 | 0.14 |
| Class I | LINE | TERT003 | 5 | 1,150 | 0.00 |
| Class I | p-SINE1 | TEMT011 | 66 | 7,401 | 0.02 |
| Class I | Other class I | TERTOOT | 224 | 100,203 | 0.20 |
| Class II | Ac/Ds | TETN001 | 14 | 2,637 | 0.01 |
| Class II | CACTA, En/Spm | TETN002 | 105 | 22,732 | 0.05 |
| Class II | MULE | TETN003 | 52 | 13,544 | 0.03 |
| Class II | MLE | TETN004 | 46 | 5,690 | 0.01 |
| Class II | Stowaway | TEMT002 | 3 | 235 | 0.00 |
| Class II | Tourist | TEMT001 | 149 | 25,308 | 0.05 |
| Class II | Other class II | TETN005, TETNOOT | 357 | 52,752 | 0.11 |
| | Other TE | | 1,264 | 165,199 | 0.34 |
| | Total | | 2,603 | 502,391 | 1.03 |

^1 For the TIGR codes, see http://www.tigr.org/tdb/e2k1/plant.repeats/repeat.code.shtml.
^2 Fraction in the genome.
^3 Fraction in the total mRNAs.

Supplementary Table 4. Features of annotated non-protein-coding (np) RNAs.

| Feature | Multi-exon | Single-exon | Total |
|---|---|---|---|
| npRNA | 108 (100%) | 23 (100%) | 131 (100%) |
| Mean length (bp) | 1186 | 965 | N.A.* |
| Mean exon number | 2.8 | 1.0 | N.A.* |
| EST support | 47 (43.5%) | 5 (21.7%) | 52 (39.7%) |
| Polyadenylation signal | 18 (16.7%) | 19 (82.6%) | 37 (28.2%) |
| Genomic polyadenosine | 2 (1.9%) | 0 (0%) | 2 (1.5%) |

*Not available.

Supplementary Table 5. Putative rice antisense npRNAs and their sense genes.
(A) Antisense to known protein genes:

| as-npRNA | Chr | Sense gene | Sense gene description |
|---|---|---|---|
| Os02g0180800 | 2 | Os02g0180700 | Cinnamoyl-CoA reductase (EC 1.2.1.44) |
| Os03g0118500 | 3 | Os03g0118600 | Dihydrodipicolinate reductase-like protein |
| Os03g0127100 | 3 | Os03g0127200 | NAM protein |
| Os05g0577000 | 5 | Os05g0576900 | PIN1-like auxin transport protein |
| Os06g0514700 | 6 | Os06g0514600 | Cyclophilin-RNA interacting protein |
| Os07g0653300 | 7 | Os07g0653200 | BLE2 protein |
| Os07g0654800 | 7 | Os07g0654700 | BLE2 protein |
| Os08g0103700 | 8 | Os08g0103900 | NAM-like protein |
| Os08g0103700 | 8 | Os08g0103600 | BTB/POZ domain containing protein |
| Os12g0114900 | 12 | Os12g0115000 | Lipid transfer protein LPT II |
| Os12g0132900 | 12 | Os12g0133000 | Major facilitator superfamily antiporter |
| Os07g0524300 | 7 | Os07g0524400 | Nucleolin (Protein C23) |

(B) Antisense to domain-containing protein genes:

| as-npRNA | Chr | Sense gene | Sense gene description |
|---|---|---|---|
| Os02g0684800 | 2 | Os02g0684900 | Zn-finger, FYVE type domain containing protein |
| Os01g0494300 | 1 | Os01g0494400 | Retrotransposon gag protein family protein |
| Os09g0429300 | 9 | Os09g0429200 | Ionotropic glutamate receptor family protein |
| Os08g0538100 | 8 | Os08g0538200 | Plant protein of unknown function family protein |
| Os06g0664000 | 6 | Os06g0663900 | Protein kinase domain containing protein |
| Os09g0471300 | 9 | Os09g0471400 | Protein kinase domain containing protein |
| Os10g0142700 | 10 | Os10g0142600 | Protein kinase domain containing protein |
| Os11g0173600 | 11 | Os11g0173700 | Protein kinase domain containing protein |
| Os05g0323400 | 5 | Os05g0323300 | BED finger domain containing protein |
| Os04g0588500 | 4 | Os04g0588600 | ABC transporter domain containing protein |
| Os09g0278900 | 9 | Os09g0279000 | ENT domain containing protein |
| Os11g0697100 | 11 | Os11g0697200 | Eukaryotic protein of unknown function DUF889 family protein |
| Os04g0172600 | 4 | Os04g0172500 | RNase H domain containing protein |
| Os01g0119800 | 1 | Os01g0119700 | Ubiquitin domain containing protein |
| Os06g0477600 | 6 | Os06g0477500 | Viral coat and capsid protein family protein |
| Os06g0555900 | 6 | Os06g0556000 | Amino acid carrier fragment |

(C) Antisense to hypothetical protein genes:

| as-npRNA | Chr | Sense gene | Sense gene description |
|---|---|---|---|
| Os01g0646400 | 1 | Os01g0646500 | Conserved hypothetical protein |
| Os03g0442800 | 3 | Os03g0442900 | Conserved hypothetical protein |
| Os06g0134200 | 6 | Os06g0134100 | Conserved hypothetical protein |
| Os11g0204500 | 11 | Os11g0204400 | Conserved hypothetical protein |
| Os12g0256600 | 12 | Os12g0256500 | Conserved hypothetical protein |
| Os01g0810700 | 1 | Os01g0810600 | Hypothetical protein |
| Os02g0228600 | 2 | Os02g0228700 | Hypothetical protein |
| Os02g0779500 | 2 | Os02g0779600 | Hypothetical protein |
| Os02g0792100 | 2 | Os02g0792200 | Hypothetical protein |
| Os02g0289300 | 2 | Os02g0289400 | Hypothetical protein (single-exon) |
| Os04g0308200 | 4 | Os04g0308000 | Hypothetical protein |
| Os05g0137800 | 5 | Os05g0137900 | Hypothetical protein |
| Os05g0294800 | 5 | Os05g0294700 | Hypothetical protein |
| Os05g0115200 | 5 | Os05g0115300 | Hypothetical protein |
| Os06g0516800 | 6 | Os06g0516900 | Hypothetical protein |
| Os07g0590700 | 7 | Os07g0590800 | Hypothetical protein |
| Os08g0384700 | 8 | Os08g0384800 | Hypothetical protein |
| Os08g0391300 | 8 | Os08g0391200 | Hypothetical protein |
| Os08g0555600 | 8 | Os08g0555700 | Hypothetical protein |
| Os09g0309900 | 9 | Os09g0310000 | Hypothetical protein |
| Os09g0321500 | 9 | Os09g0321600 | Hypothetical protein |
| Os09g0469500 | 9 | Os09g0469600 | Hypothetical protein |
| Os10g0479100 | 10 | Os10g0479000 | Hypothetical protein |
| Os11g0286500 | 11 | Os11g0286400 | Hypothetical protein |
| Os12g0255000 | 12 | Os12g0255100 | Hypothetical protein |
| Os12g0545600 | 12 | Os12g0545500 | Hypothetical protein |
| Os12g0199200 | 12 | Os12g0199300 | Hypothetical protein (single-exon) |
| Os04g0601200 | 4 | Os04g0601300 | Hypothetical protein |

Supplementary Table 6. Isoacceptor tRNA gene copy number and the relative synonymous codon usage (RSCU).
| Amino acid | Codon | Gene number | RSCU |
|---|---|---|---|
| Gly | GGU | 0 | 0.80 |
| | GGC | 28 | 1.58 |
| | GGA | 10 | 0.78 |
| | GGG | 9 | 0.84 |
| Val | GUU | 17 | 0.94 |
| | GUC | 16 | 1.21 |
| | GUA | 4 | 0.40 |
| | GUG | 10 | 1.46 |
| Lys | AAA | 12 | 0.64 |
| | AAG | 20 | 1.36 |
| Asn | AAU | 0 | 0.89 |
| | AAC | 29 | 1.11 |
| Gln | CAA | 21 | 0.73 |
| | CAG | 10 | 1.27 |
| His | CAU | 0 | 0.90 |
| | CAC | 26 | 1.10 |
| Glu | GAA | 16 | 0.71 |
| | GAG | 25 | 1.29 |
| Asp | GAU | 1 | 0.94 |
| | GAC | 31 | 1.06 |
| Tyr | UAU | 2 | 0.76 |
| | UAC | 19 | 1.24 |
| Cys | UGU | 1 | 0.65 |
| | UGC | 17 | 1.35 |
| Phe | UUU | 0 | 0.72 |
| | UUC | 20 | 1.28 |
| Ile | AUU | 18 | 1.00 |
| | AUC | 0 | 1.39 |
| | AUA | 5 | 0.61 |
| Met | AUG | 56 | |
| Trp | UGG | 18 | |
| Arg | AGA | 12 | 0.93 |
| | AGG | 11 | 1.40 |
| | CGU | 24 | 0.61 |
| | CGC | 0 | 1.50 |
| | CGA | 4 | 0.49 |
| | CGG | 8 | 1.07 |
| Leu | CUU | 15 | 1.02 |
| | CUC | 0 | 1.76 |
| | CUA | 10 | 0.46 |
| | CUG | 9 | 1.46 |
| | UUA | 4 | 0.38 |
| | UUG | 19 | 0.93 |
| Ser | AGU | 0 | 0.66 |
| | AGC | 20 | 1.22 |
| | UCU | 12 | 0.97 |
| | UCC | 4 | 1.24 |
| | UCA | 17 | 0.97 |
| | UCG | 8 | 0.94 |
| Thr | ACU | 11 | 0.88 |
| | ACC | 8 | 1.24 |
| | ACA | 15 | 0.97 |
| | ACG | 5 | 0.91 |
| Pro | CCU | 14 | 0.94 |
| | CCC | 0 | 0.85 |
| | CCA | 14 | 0.99 |
| | CCG | 9 | 1.21 |
| Ala | GCU | 20 | 0.84 |
| | GCC | 1 | 1.31 |
| | GCA | 10 | 0.75 |
| | GCG | 12 | 1.10 |

Note. - Most abundant isoacceptor tRNAs and codons are written in boldface.

Supplementary Table 7. The top 40 InterPro hits in O. sativa and A. thaliana.

| Rank | IPR ID | Name | # of O. sativa proteins |
|---|---|---|---|
| 1 | IPR011009 | Protein kinase-like | 1277 |
| 2 | IPR000719 | Protein kinase | 1221 |
| 3 | IPR002290 | Serine/threonine protein kinase | 1150 |
| 4 | IPR001245 | Tyrosine protein kinase | 1114 |
| 5 | IPR008271 | Serine/threonine protein kinase, active site | 842 |
| 6 | IPR001611 | Leucine-rich repeat | 666 |
| 7 | IPR008941 | TPR-like | 557 |
| 8 | IPR001810 | Cyclin-like F-box | 398 |
| 9 | IPR002885 | Pentatricopeptide repeat | 391 |
| 10 | IPR009057 | Homeodomain-like | 365 |
| 11 | IPR007090 | Leucine-rich repeat, plant specific | 354 |
| 12 | IPR001841 | Zn-finger, RING | 351 |
| 13 | IPR008938 | ARM repeat fold | 322 |
| 14 | IPR001128 | Cytochrome P450 | 303 |
| 15 | IPR002182 | NB-ARC | 291 |
| 16 | IPR000767 | Disease resistance protein | 274 |
| 17 | IPR008940 | Protein prenyltransferase | 273 |
| 18 | IPR002401 | E-class P450, group I | 255 |
| 19 | IPR000504 | RNA-binding region RNP-1 (RNA recognition motif) | 249 |
| 20 | IPR003593 | AAA ATPase | 244 |
| 21 | IPR000379 | Esterase/lipase/thioesterase | 237 |
| 22 | IPR001680 | WD-40 repeat | 233 |
| 23 | IPR003591 | Leucine-rich repeat, typical subtype | 233 |
| 24 | IPR001005 | Myb, DNA-binding | 229 |
| 25 | IPR011046 | WD40-like | 224 |
| 26 | IPR002110 | Ankyrin | 201 |
| 27 | IPR010983 | EF-Hand-like | 187 |
| 28 | IPR009007 | Peptidase aspartic | 169 |
| 29 | IPR002048 | Calcium-binding EF-hand | 167 |
| 30 | IPR001440 | TPR repeat | 155 |
| 31 | IPR010255 | Haem peroxidase | 148 |
| 32 | IPR002016 | Haem peroxidase, plant/fungal/bacterial | 146 |
| 33 | IPR002213 | UDP-glucuronosyl/UDP-glucosyltransferase | 146 |
| 34 | IPR008994 | Nucleic acid-binding OB-fold | 144 |
| 35 | IPR001878 | Zn-finger, CCHC type | 141 |
| 36 | IPR001092 | Basic helix-loop-helix dimerisation region bHLH | 138 |
| 37 | IPR003612 | Plant lipid transfer/seed storage/trypsin-alpha amylase inhibitor | 138 |
| 38 | IPR001687 | ATP/GTP-binding site motif A (P-loop) | 137 |
| 39 | IPR001650 | Helicase, C-terminal | 134 |
| 40 | IPR001410 | DEAD/DEAH box helicase | 132 |

| Rank | IPR ID | Name | # of A. thaliana proteins |
|---|---|---|---|
| 1 | IPR011009 | Protein kinase-like | 1075 |
| 2 | IPR000719 | Protein kinase | 1042 |
| 3 | IPR002290 | Serine/threonine protein kinase | 1008 |
| 4 | IPR001245 | Tyrosine protein kinase | 984 |
| 5 | IPR008271 | Serine/threonine protein kinase, active site | 731 |
| 6 | IPR001810 | Cyclin-like F-box | 606 |
| 7 | IPR008941 | TPR-like | 603 |
| 8 | IPR001611 | Leucine-rich repeat | 539 |
| 9 | IPR002885 | Pentatricopeptide repeat | 463 |
| 10 | IPR009057 | Homeodomain-like | 452 |
| 11 | IPR001841 | Zn-finger, RING | 430 |
| 12 | IPR008938 | ARM repeat fold | 364 |
| 13 | IPR007090 | Leucine-rich repeat, plant specific | 329 |
| 14 | IPR008940 | Protein prenyltransferase | 308 |
| 15 | IPR003593 | AAA ATPase | 306 |
| 16 | IPR001005 | Myb, DNA-binding | 297 |
| 17 | IPR006527 | F-box protein interaction domain | 256 |
| 18 | IPR000504 | RNA-binding region RNP-1 (RNA recognition motif) | 251 |
| 19 | IPR001128 | Cytochrome P450 | 246 |
| 20 | IPR011046 | WD40-like | 237 |
| 21 | IPR001680 | WD-40 repeat | 234 |
| 22 | IPR008994 | Nucleic acid-binding OB-fold | 224 |
| 23 | IPR002401 | E-class P450, group I | 222 |
| 24 | IPR000379 | Esterase/lipase/thioesterase | 217 |
| 25 | IPR010983 | EF-Hand-like | 205 |
| 26 | IPR011043 | Galactose oxidase, central | 181 |
| 27 | IPR003591 | Leucine-rich repeat, typical subtype | 172 |
| 28 | IPR011011 | FYVE/PHD zinc finger | 169 |
| 29 | IPR002048 | Calcium-binding EF-hand | 168 |
| 30 | IPR011050 | Pectin lyase-like | 164 |
| 31 | IPR000767 | Disease resistance protein | 159 |
| 32 | IPR001687 | ATP/GTP-binding site motif A (P-loop) | 155 |
| 33 | IPR002182 | NB-ARC | 154 |
| 34 | IPR001092 | Basic helix-loop-helix dimerisation region bHLH | 149 |
| 35 | IPR001410 | DEAD/DEAH box helicase | 149 |
| 36 | IPR011424 | C1-like | 146 |
| 37 | IPR001650 | Helicase, C-terminal | 144 |
| 38 | IPR002110 | Ankyrin | 142 |
| 39 | IPR001440 | TPR repeat | 142 |
| 40 | IPR006566 | FBD | 142 |

Supplementary Table 8. InterPro IDs of potential frequent hitters excluded from functional descriptions.

| Member database ID | InterPro ID | Name |
|---|---|---|
| PS00001 | IPR000042 | N-glycosylation site |
| PS00002 | IPR002179 | Glycosaminoglycan attachment site |
| PS00003 | IPR002032 | Tyrosine sulfation site |
| PS00004 | IPR001833 | cAMP/cGMP-dependent protein kinase, phosphorylation site |
| PS00005 | IPR001495 | Protein kinase C, phosphorylation site |
| PS00006 | IPR000430 | Casein kinase II phosphorylation site |
| PS00007 | IPR000220 | Tyrosine kinase phosphorylation site |
| PS00008 | IPR000338 | N-myristoylation site |
| PS00009 | IPR000134 | Amidation site |
| PS00010 | IPR000152 | Aspartic acid and asparagine hydroxylation site |
| PS00015 | IPR001430 | Bipartite nuclear targeting sequence |
| PS00016 | IPR001918 | Cell attachment region |
| PS00029 | IPR002158 | Leucine zipper |
| PS50079 | IPR001472 | Bipartite nuclear localization signal |
| PS50099 | IPR000694 | Proline-rich region |
| PS50101/PS00017 | IPR001687 | ATP/GTP-binding site motif A |
| PR01217 | IPR002965 | Proline-rich extensin |
| PR00019/PF00560 | IPR001611 | Leucine-rich repeat |
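
The RSCU column of Supplementary Table 6 is not accompanied by a formula in this supplement. Assuming the conventional definition (the observed count of a codon divided by the mean count over all synonymous codons for the same amino acid), the calculation can be sketched as follows. The function name `rscu` and the Lys codon counts are hypothetical, chosen only so that the output matches the Lys row of the table; they are not data from the study.

```python
from collections import defaultdict


def rscu(codon_counts, codon_to_aa):
    """Relative synonymous codon usage under the conventional definition.

    codon_counts: dict mapping codon -> observed count
    codon_to_aa:  dict mapping codon -> amino acid (defines synonymous sets)
    """
    # Group codons into synonymous families by amino acid.
    families = defaultdict(list)
    for codon, aa in codon_to_aa.items():
        families[aa].append(codon)

    values = {}
    for aa, codons in families.items():
        total = sum(codon_counts.get(c, 0) for c in codons)
        if total == 0:
            continue  # amino acid never observed; RSCU is undefined
        mean = total / len(codons)  # expected count under uniform usage
        for c in codons:
            values[c] = codon_counts.get(c, 0) / mean
    return values


# Hypothetical counts (assumption, not from the paper) for the two Lys codons.
lys_counts = {"AAA": 32, "AAG": 68}
lys_values = rscu(lys_counts, {"AAA": "Lys", "AAG": "Lys"})
print({codon: round(v, 2) for codon, v in lys_values.items()})
```

Under this definition the RSCU values within one amino acid sum to the number of its synonymous codons, which is consistent with the table (for example, 0.64 + 1.36 = 2 for Lys), and single-codon amino acids such as Met and Trp carry no informative value.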