SUPPLEMENTARY FIGURE LEGEND

ADDITIONAL MATERIALS
Additional Materials 2: Arabidopsis 3’ UTR, 5’ UTR and upstream ORF datasets (HTML).
We used the annotation of the NCBI 2005 assembly of Arabidopsis thaliana. From the
annotation we extracted the position information for the CDS and mRNA exons of each
annotated gene. By comparing these two types of data (CDS and mRNA), we defined the
lengths of the 3' and 5' UTRs and the positions of introns.
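The comparison of CDS and mRNA exon coordinates can be illustrated with a minimal sketch. It is not the script used to build the datasets; the function name spliced_utr_lengths, the 1-based inclusive coordinates and the explicit strand argument are assumptions made for illustration. Whether the stop codon counts as part of the CDS depends on the annotation format, so the 3' UTR length may differ by 3 nt between conventions.

```python
def spliced_utr_lengths(exons, cds, strand="+"):
    """Return (5' UTR length, 3' UTR length) of the spliced transcript.

    exons, cds: lists of (start, end) genomic intervals, 1-based inclusive
    (an assumed convention). Introns are excluded automatically because
    only exonic bases outside the CDS span are counted.
    """
    cds_start = min(s for s, _ in cds)
    cds_end = max(e for _, e in cds)

    # Exonic bases lying to the left of the first CDS base.
    left = sum(min(e, cds_start - 1) - s + 1 for s, e in exons if s < cds_start)
    # Exonic bases lying to the right of the last CDS base.
    right = sum(e - max(s, cds_end + 1) + 1 for s, e in exons if e > cds_end)

    # On the minus strand the genomic left side is the 3' end.
    return (left, right) if strand == "+" else (right, left)


# Hypothetical gene: three mRNA exons, CDS ending inside the second exon.
exons = [(1, 100), (201, 300), (401, 500)]
cds = [(51, 100), (201, 250)]
print(spliced_utr_lengths(exons, cds, "+"))
```

In this hypothetical example the 3' UTR is split by one intron (between positions 300 and 401), and its spliced length is the sum of the two exonic pieces.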
In the HTML tables, UTR lengths are calculated for spliced UTRs. The position of an intron
is given as the first base of the spliced UTR downstream of the intron. The 3’ UTR dataset
lists Arabidopsis genes according to the length of their 3’ UTR.
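The intron position convention can be sketched as follows. This is an illustration only: the function name intron_positions and the input layout are hypothetical, and positions are assumed to be counted from 1 along the spliced UTR.

```python
def intron_positions(utr_segments):
    """Return intron positions on a spliced UTR.

    utr_segments: exonic pieces of one UTR as (start, end) intervals,
    1-based inclusive, listed in transcript (5' to 3') order.
    Each intron is reported as the 1-based index, on the spliced UTR,
    of the first base downstream of that intron.
    """
    positions = []
    offset = 0  # number of spliced UTR bases seen so far
    for start, end in utr_segments[:-1]:
        offset += abs(end - start) + 1
        positions.append(offset + 1)
    return positions


# Hypothetical 3' UTR made of a 50 nt piece and a 100 nt piece.
print(intron_positions([(251, 300), (401, 500)]))
```

A UTR consisting of a single segment contains no intron and yields an empty list.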
Number of annotated Arabidopsis genes: 26519
Number of annotated Arabidopsis genes having a 5’ UTR, a 3’ UTR, or both: 18494
Number of annotated genes that contain an intron in the 3’ UTR: 693 (81 contain more than
one intron)
Number of annotated genes that have an intron > 54 nt downstream of the stop codon: 257
Arabidopsis thaliana upstream ORF (uORF) dataset. A uORF is defined as an ORF of at least
10 amino acids located in the 5’ UTR. uORF positions are defined on spliced
(intron-subtracted) 5’ UTR sequences.
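A uORF search of this kind might look like the sketch below. Several details are assumptions not stated in the legend: a uORF is taken to start at ATG and end at an in-frame stop codon inside the 5’ UTR, the amino acid count includes the initiator methionine and excludes the stop codon, and reported positions are 1-based on the spliced 5’ UTR. The function name find_uorfs is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_uorfs(utr5, min_aa=10):
    """Return (start, end, n_aa) for each uORF in a spliced 5' UTR sequence.

    start and end are 1-based positions on the spliced UTR; end is the
    last base of the stop codon. ORFs that run past the end of the UTR
    without an in-frame stop are not reported (an assumed rule).
    """
    seq = utr5.upper().replace("U", "T")
    uorfs = []
    for start in range(len(seq) - 2):
        if seq[start:start + 3] != "ATG":
            continue
        # Walk codon by codon until the first in-frame stop.
        for pos in range(start + 3, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                n_aa = (pos - start) // 3
                if n_aa >= min_aa:
                    uorfs.append((start + 1, pos + 3, n_aa))
                break
    return uorfs


# Hypothetical 5' UTR: ATG, nine GCT codons, then a TAA stop.
print(find_uorfs("CC" + "ATG" + "GCT" * 9 + "TAA" + "GG"))
```

In-frame ATG codons inside a longer uORF are reported as separate, overlapping entries here; a real dataset may merge them.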
Additional Materials 3: Distribution of introns in different eukaryotes. Note that, except
for the very 5’ and 3’ regions of the coding sequences, plant introns are distributed evenly
within the coding region.