Group Populus
• Petra van Berkel
• Casper Gerritsen
• Astri Herlino
• Brian Lavrijssen

Dataset of S. cerevisiae
• Data generated by Nookaew et al (2012)
• Two conditions: glucose excess (batch) and glucose limited (chemostat)
• 3 biological replicates per condition
• RNA-seq data: 12 files
• 3 sets of paired-end reads per condition

Pipeline for differential gene expression analysis

TopHat – Cufflinks analysis
• Protocols based on Trapnell et al (2012)
• 75% of reads mapped
• Plots based on Cuffdiff gene expression output

Cuffdiff output
• 5800 genes with FPKM values
• Q-value threshold based on Nookaew et al (2012)

Data Summary

Filter                                  Q-value < 0.05   Q-value < 1e-5
Significant differentially expressed    2560             1293
FPKM > 0 and value_2 > 0                2554             1292
log2(fold change) > 1                   735              516
log2(fold change) < -1                  510              410
log2(fold change) > 3                   177              151
log2(fold change) < -2                  44               33

Validation of TopHat - Cufflinks
• Validation of selection: using Excel
• Literature study: Boer et al (2003)
  • Influence of C, N, P and S limitation
  • Microarray analysis
  > 68 out of 151 significantly upregulated genes
  > 9 out of 33 significantly downregulated genes
• More or less the same genes are found in other papers

Expression network up
• Upregulated genes
• mrnet method in R
• Number of nodes = 57
• Number of edges = 1560

Expression network down
• Downregulated genes
• mrnet method in R
• Number of nodes = 33
• Number of edges = 513

GO Terms and GO Enrichment
• R version 2.15.0 (2012-03-30)
• Packages:
  • biomaRt: Ensembl gene 69, S. cerevisiae EF3
  • org.Sc.sgd.db
  • GOstats
  • Rgraphviz
• GO enrichment: 8419 genes in the universe (org.Sc.sgdPMID2ORF)
• Threshold: p-value < 1e-4

GO Terms

Downregulated: 32 genes
• 29 genes with 208 GO terms (3 genes are not annotated)

Gene   GO ID                                                        Description
HXT3   GO:0006810, GO:0016020, GO:0016021, GO:0005215, GO:0055085   Low-affinity glucose transporter
HXT4   GO:0006810, GO:0055085, GO:0022891, GO:0005215, GO:0022857   High-affinity glucose transporter

Upregulated: 133 genes
• 113 genes with 855 GO terms (20 genes are not annotated)

Gene   GO ID                                                        Description
RGI2   -                                                            Protein of unknown function involved in energy metabolism under respiratory conditions
SPG4   -                                                            Protein required for survival at high temperature during stationary phase
JEN1   GO:0097079, GO:0015355, GO:0022857, GO:0016021, GO:0034219   Monocarboxylate/proton symporter of the plasma membrane

GO Enrichment

Downregulated
• Biological process: not found

Upregulated

GOBPID       Pvalue     OddsRatio   ExpCount   Count   Size   Term
GO:0055114   2.02E-10   4.98        7.66       29      415    oxidation-reduction process
GO:0072329   2.41E-10   33.95       0.46       9       23     monocarboxylic acid catabolic process
GO:0006091   1.70E-09   6.00        4.40       21      221    generation of precursor metabolites and energy
GO:0006099   3.75E-09   22.61       0.60       9       30     tricarboxylic acid cycle
GO:0009109   3.75E-09   22.61       0.60       9       30     coenzyme catabolic process

Biological process of upregulated genes
• Validation: Yeast genome database
• Problem: not well annotated, because biomaRt was not updated to Ensembl gene 70, S. cerevisiae EF4

Top 100
• gffread: make the transcripts FASTA file
• Determine the top 100 highest and lowest expressed genes for the two conditions
• R: order the Cuffdiff output on FPKM value (4 files)
• Take out the genes with FPKM = 0
• Top genes: G3P dehydrogenase, F16P aldolase, ribosomal subunit protein
• Bottom genes: dubious transcript, retrotransposon, etc.
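The top 100 selection described above was done in R; the group's script is not shown. As a rough illustration only, the same ranking step can be sketched in Python. The column names `gene_id`, `value_1` and `value_2` follow the standard Cuffdiff `gene_exp.diff` layout; the function name `rank_by_fpkm`, the gene names and the FPKM values are made up for this example.

```python
import csv
import io


def rank_by_fpkm(rows, column, n=100):
    """Return (top, bottom): gene IDs of the n highest and n lowest FPKM values.

    Genes with FPKM = 0 in `column` are removed before ranking.
    Both lists are ordered from high to low FPKM.
    """
    expressed = [r for r in rows if float(r[column]) > 0]
    ranked = sorted(expressed, key=lambda r: float(r[column]), reverse=True)
    top = [r["gene_id"] for r in ranked[:n]]
    bottom = [r["gene_id"] for r in ranked[-n:]]
    return top, bottom


# Hypothetical miniature table (not real data); value_1 and value_2 stand for
# the FPKM of the two conditions, as in Cuffdiff output.
EXAMPLE = (
    "gene_id\tvalue_1\tvalue_2\n"
    "geneA\t120.5\t30.0\n"
    "geneB\t0\t15.2\n"
    "geneC\t3.1\t0\n"
    "geneD\t45.0\t210.7\n"
    "geneE\t0.4\t1.9\n"
)

rows = list(csv.DictReader(io.StringIO(EXAMPLE), delimiter="\t"))

# One call per condition gives the four gene lists (high/low x two conditions).
top_1, bottom_1 = rank_by_fpkm(rows, "value_1", n=2)
top_2, bottom_2 = rank_by_fpkm(rows, "value_2", n=2)
print(top_1, bottom_1)
print(top_2, bottom_2)
```

With a real `gene_exp.diff` file, the same function could be applied to rows read with `csv.DictReader(open(path), delimiter="\t")` and `n=100`.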
GC-content & transcript length
• Determine GC-content and transcript length
• Import the top 100 genes files
• For each file, look up the genes from the top 100 file in transcripts.fa and count the GC content and the transcript length

Group                            Length    GC
Highly expressed in batch        515.19    0.43
Lowly expressed in batch         831.46    0.41
Highly expressed in chemostat    556.65    0.43
Lowly expressed in chemostat     727.29    0.41

• Short sequence length, mainly in highly expressed genes, gives an unrealistic view of codon usage and intron length
• These are often ribosomal subunit proteins

Intron length
• Genes.gtf as input
• Create an index file
• Look for the interesting genes
• Print them to an output file
• Calculate the average

File               Mean intron length
introns_hi1.out    429.455
introns_hi2.out    440.125
introns_low1.out   60.6667
introns_low2.out   43.5

Codon usage
• Method (Perl script):
  • Input: the top high and low expressed genes
  • Build gene ID list and codons list and retrieve sequences
  • Count codon usage and calculate RSCU and average RSCU

Conclusion
• The up- and downregulated genes are involved in carbon metabolism
• Highly expressed genes are involved in carbon metabolism or are ribosomal subunit proteins
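The GC-content, transcript length and codon usage steps above were carried out with the group's own scripts (Perl for codon usage), which are not shown. The Python sketch below only illustrates the calculations. It makes three assumptions: sequences are coding sequences in frame from the first base, the standard genetic code applies, and RSCU is computed on counts pooled over all sequences of a group (the slides do not say how the "average RSCU" was formed). RSCU is taken as the observed count of a codon divided by the mean count of all synonymous codons for the same amino acid. All function names and the two test sequences are hypothetical.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def gc_content(seq):
    """Fraction of G and C bases in one sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def mean_length_and_gc(seqs):
    """Mean sequence length and mean GC fraction over a group of sequences."""
    mean_len = sum(len(s) for s in seqs) / len(seqs)
    mean_gc = sum(gc_content(s) for s in seqs) / len(seqs)
    return mean_len, mean_gc


def rscu(seqs):
    """RSCU per codon, from codon counts pooled over all sequences.

    Amino acids that never occur in the input are left out of the result.
    """
    counts = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(0, len(s) - len(s) % 3, 3):
            codon = s[i:i + 3]
            if codon in CODON_TABLE:  # skips codons with N or other symbols
                counts[codon] += 1
    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)
    values = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue
        expected = total / len(codons)  # count under equal synonymous usage
        for c in codons:
            values[c] = counts[c] / expected
    return values


# Two made-up in-frame sequences standing in for one top 100 group.
seqs = ["ATGGCTGCTGCCTAA", "ATGGCTTAA"]
mean_len, mean_gc = mean_length_and_gc(seqs)
values = rscu(seqs)
print(mean_len, round(mean_gc, 3))
print({codon: round(v, 2) for codon, v in values.items() if v > 0})
```

An RSCU of 1 means a codon is used as often as expected under equal synonymous usage; values above 1 mark preferred codons. As the slides note, short sequences give few codons per amino acid, so the values for such groups are noisy.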