tpj13014-sup-0017-Legends

LEGENDS FOR SUPPORTING INFORMATION

Figure S1. Effect of counting method on transcript abundance estimates for maize simulated RNAseq reads and the relationship with transcript size. Alignments were performed requiring a unique best alignment. Graphs were scaled to show the majority of the distribution and exclude extreme outliers (N indicates the number of data points per graph). (a) Relationship between HTSeq counts and Cufflinks fragments per kilobase of exon model per million fragments mapped (FPKM) values. (b) Relationship between HTSeq-derived FPKM values and Cufflinks FPKM values. (c) Relationship between HTSeq-derived FPKM values and Cufflinks FPKM values for transcripts greater than 300 nucleotides.

Figure S2. Genome browser view of example Arabidopsis non-flagged and flagged genes from simulated RNAseq reads. Alignments were performed requiring a unique alignment. The table inset provides the gene(s), observed/expected values, and the flagged status for each panel. (a) Example of a gene with an acceptable observed/expected value. (b) Example of overlapping genes with acceptable observed/expected values. (c) Example of overlapping genes with exons overlapping on the same strand of DNA. One gene maintains an acceptable observed/expected value, whereas the other does not. (d) Example of a multigene family based on OrthoMCL grouping. In this case the genes share 100% sequence identity and have unacceptable observed/expected values.

Figure S3. Percent deviation from expected values for simulated RNAseq reads in seven species. Alignments were performed requiring a unique best alignment (U) and allowing multiple alignments (M). On this scale, values less than 100% indicate undercounting, 100% indicates perfect counting, and values greater than 100% indicate overcounting.

Figure S4. Effect of overlapping gene models on transcript abundance estimates. (a) Diagram showing how reads aligning to overlapping gene models are treated with default HTSeq counting parameters, assuming that the genes are on the same strand and that counting is based on stranded data. Reads that contain an “x” in the diagram are disregarded during counting. (b) The number of genes in each species that have overlapping exons on the same strand with greater than 20% overlap or with less than or equal to 20% overlap. (c) The relationship between observed count divided by expected count and percent exon overlap, for genes with at least one nucleotide overlapping an exon of another gene model on the same strand, using unique alignments. The red box indicates genes for which percent overlap is not the only attribute that contributed to observed values deviating from expected values.

Figure S5. Conservation of flagged genes across species. (a) Venn diagram of flagged genes in each grass species. Syntenic genes across species were previously determined (Schnable et al., 2012). Genes present in subgenome 1 of maize and with a syntenic ortholog in both rice and Brachypodium were selected for this analysis. The subset of genes flagged in each species was identified and used to generate the Venn diagram. (b) Enriched GO terms for biological processes among the flagged genes of five of the species with simulated RNAseq reads. A yellow box denotes the terms enriched for each species, with summary metrics listed in the upper left.

Figure S6. Distribution of average expression values for a maize developmental expression atlas categorized by transcript length. FPKM values were previously calculated (Stelpflug et al.).
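
The two quantities that recur in these legends, count-derived FPKM (Figure S1) and observed counts expressed as a percentage of expected counts (Figure S3), can be illustrated with a minimal sketch. It assumes the standard FPKM definition and a simple observed/expected ratio; the legends do not state the exact calculation the authors used, and the function names and example numbers below are hypothetical.

```python
def fpkm(count, length_nt, total_mapped_fragments):
    """Standard FPKM: fragments per kilobase of exon model per million
    fragments mapped. Assumed definition, not taken from the legends."""
    if length_nt <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and total mapped fragments must be positive")
    # count / (length / 1e3) / (total / 1e6) simplifies to the line below
    return count * 1e9 / (length_nt * total_mapped_fragments)


def percent_of_expected(observed, expected):
    """Observed count as a percentage of the expected (simulated) count.
    Below 100 means undercounted, above 100 means overcounted."""
    if expected <= 0:
        raise ValueError("expected count must be positive")
    return 100.0 * observed / expected


# Hypothetical values, chosen only to make the arithmetic easy to follow
example_fpkm = fpkm(count=500, length_nt=2000, total_mapped_fragments=10_000_000)
example_pct = percent_of_expected(observed=80, expected=100)
print(example_fpkm, example_pct)
```

Because the transcript length sits in the FPKM denominator, a fixed counting error shifts the FPKM of a short transcript more than that of a long one, which is one reason to examine transcripts above a length cutoff separately, as in Figure S1(c).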
