tpj13014-sup-0017-Legends

LEGENDS FOR SUPPORTING INFORMATION
Figure S1. Effect of counting method on transcript abundance estimates for simulated
maize RNAseq reads and the relationship with transcript size. Alignments were
performed requiring a unique best alignment. Graphs were scaled to show the majority
of the distribution and exclude extreme outliers (N indicates number of data points per
graph). (a) Relationship between HTSeq counts and Cufflinks fragments per kilobase of
exon model per million fragments mapped (FPKM) values. (b) Relationship between
HTSeq-derived FPKM values and Cufflinks FPKM values. (c) Relationship between
HTSeq-derived FPKM values and Cufflinks FPKM values for transcripts greater than 300
nucleotides.
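As an illustration of how an FPKM value can be derived from a raw HTSeq count, the
sketch below applies the standard FPKM definition. The function name, argument names,
and example numbers are hypothetical and are not taken from the study; in particular,
the exon model length and the total number of mapped fragments are assumed to be
supplied by the caller.

```python
def fpkm_from_count(count, exon_length_nt, total_mapped_fragments):
    """Convert a raw fragment count to FPKM.

    FPKM = count * 1e9 / (exon model length in nt * total mapped fragments).
    All argument names are illustrative assumptions.
    """
    if exon_length_nt <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and total mapped fragments must be positive")
    return count * 1e9 / (exon_length_nt * total_mapped_fragments)


# Hypothetical example: 500 fragments on a 2,000 nt exon model,
# with 10 million fragments mapped in total.
example_fpkm = fpkm_from_count(500, 2000, 10_000_000)
```

Because the count is divided by the exon model length, short transcripts can receive
large FPKM values from only a few fragments, which is one reason to examine
transcripts above a length threshold separately, as in panel (c).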
Figure S2. Genome browser view of example Arabidopsis non-flagged and flagged genes
from simulated RNAseq reads. Alignments were performed requiring a unique
alignment. Table inset provides the gene(s), observed/expected values, and the flagged
status for each panel. (a) Example of a gene with an acceptable observed/expected
value. (b) Example of overlapping genes with acceptable observed/expected values. (c)
Example of overlapping genes with exons overlapping on the same strand of DNA. One
gene maintains an acceptable observed/expected value, whereas one gene does not. (d)
Example of a multigene family based on OrthoMCL grouping. In this case, the genes
share 100% sequence identity and have unacceptable observed/expected values.
Figure S3. Percent deviation from expected values for simulated RNAseq reads in seven
species. Alignments were performed requiring a unique best alignment (U) and allowing
multiple alignments (M). On this scale, values less than 100% are undercounted, 100% is
perfect counting, and values greater than 100% are overcounted.
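The scale described for Figure S3 can be sketched as the observed count expressed as
a percentage of the expected count. The function names and the example counts below
are hypothetical, chosen only to illustrate the scale.

```python
def percent_of_expected(observed, expected):
    """Observed count as a percentage of the expected count (100 = perfect)."""
    if expected <= 0:
        raise ValueError("expected count must be positive")
    return 100.0 * observed / expected


def counting_status(percent):
    """Label a value on the Figure S3 scale."""
    if percent < 100.0:
        return "undercounted"
    if percent > 100.0:
        return "overcounted"
    return "perfect"


# Hypothetical gene: 80 reads counted where 100 were simulated.
status = counting_status(percent_of_expected(80, 100))
```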
Figure S4. Effect of overlapping gene models on transcript abundance estimates. (a)
Diagram showing how reads aligning to overlapping gene models are treated with
default HTSeq counting parameters, implying the genes are on the same strand and
counting based on stranded data. Reads that contain an “x” in the diagram are
disregarded during counting. (b) The number of genes in each species that have
overlapping exons on the same strand with greater than 20% overlap or less than or
equal to 20% overlap. (c) The relationship between observed count divided by expected
count and percent of exon overlap for genes with at least one exonic nucleotide
overlapping an exon of another gene model on the same strand for unique alignments. The red box
indicates genes for which percent overlap is not the only attribute that contributed to
observed values that deviated from expected values.
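One way to compute a percent exon overlap for a pair of same-strand genes is sketched
below. This is an assumption about the calculation, not the method used in the study:
percent overlap is taken here as the share of a gene's exonic nucleotides that also
fall within exons of the other gene, with exons given as half-open (start, end)
intervals. All names and coordinates are hypothetical.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def percent_exon_overlap(exons_a, exons_b):
    """Percent of gene A's exonic nucleotides shared with gene B's exons.

    Both genes are assumed to lie on the same strand.
    """
    a = merge_intervals(exons_a)
    b = merge_intervals(exons_b)
    total = sum(end - start for start, end in a)
    if total == 0:
        raise ValueError("gene A has no exonic nucleotides")
    shared = sum(
        max(0, min(a_end, b_end) - max(a_start, b_start))
        for a_start, a_end in a
        for b_start, b_end in b
    )
    return 100.0 * shared / total


def overlap_class(percent, threshold=20.0):
    """Assign the two overlap categories shown in panel (b)."""
    return "greater than 20%" if percent > threshold else "less than or equal to 20%"


# Hypothetical genes: A has two 100 nt exons; B overlaps 50 nt of A's second exon.
gene_a = [(0, 100), (200, 300)]
gene_b = [(250, 400)]
overlap_a = percent_exon_overlap(gene_a, gene_b)
```

Note that the value is not symmetric under this definition: the same 50 shared
nucleotides make up a larger share of gene B's 150 exonic nucleotides than of gene A's 200.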
Figure S5. Conservation of flagged genes across species. (a) Venn diagram of flagged
genes in each grass species. Syntenic genes across species were previously determined
(Schnable et al., 2012). Genes present in subgenome 1 of maize and with a syntenic
ortholog in both rice and Brachypodium were selected for this analysis. The genes
flagged in each species were identified and used to generate the Venn
diagram. (b) Enriched GO terms for biological processes among the flagged genes of five
of the species with simulated RNAseq reads. A yellow box denotes the terms enriched for each species, with
summary metrics listed in the upper left.
Figure S6. Distribution of average expression values for a maize developmental
expression atlas categorized by transcript length. FPKM values were previously
calculated (Stelpflug et al.).