EnrichNet: network-based gene set enrichment analysis
Presenter: Lu Liu

The problem: Functional Interpretation
• Identify and assess functional associations between an experimentally derived gene/protein set and well-known gene/protein sets

Agenda
• Related research
• The method
• The evaluation
• The results
• The conclusion

Related Research
• Over-representation analysis (ORA)
• Gene set enrichment analysis (GSEA)
• Modular enrichment analysis (MEA)

Limitations
• ORA tends to have low discriminative power
• Functional information from the interaction network is disregarded
• Genes/proteins with missing annotations are ignored
• Tissue-specific gene/protein set association is often infeasible

General workflow
• Input: gene/protein list (>= 10 entries), a database of interest (KEGG etc.)
• Processing: gene mapping, score the distance with random walk with restart (RWR), compare scores with a background model
• Output: a pathways/processes ranking table, visualization of sub-networks

Input

Output

The method
[Figure: RWR is run from the Input Gene Set and yields a vector of node scores for each pathway, Pathway 1 through Pathway N]

Algorithm for distance score

Relate scores to a background model
• Scores are discretized into equal-sized bins
• Quantify each pathway's deviation from the average

Evaluation method
• Compare with ORA: 5 datasets and 2 reference gene sets from the literature
  1. Select the 100 most differentially expressed genes (DEGs)
  2. Get association scores from EnrichNet and ORA
  3. Compute a running-sum statistic for all gene sets
• Use the consensus of GSEA-derived (SAM-GS, GAGE) pathway rankings as the external benchmark pathway ranking

The results: EnrichNet vs ORA

The results: Xd-score vs Q-value

The results: comparative validation

Protein–protein interaction sub-networks (largest connected components) for target and reference set pairs with small overlap, predicted to be functionally associated by EnrichNet: (a) gastric cancer mutated genes (blue) and genes/proteins from the BioCarta pathway ‘Role of Erk5 in Neuronal Survival’ (magenta; the shared genes are shown in green); (b) bladder cancer mutated genes (blue) and genes/proteins from the Gene Ontology term ‘Tyrosine phosphorylation of Stat3’ (GO:0042503, magenta; the only shared gene, NF2, is shown in green).

Protein–protein interaction sub-network (largest connected component) for the PD gene set (blue) and genes/proteins from the GO term ‘Regulation of interleukin-6 biosynthetic process’ (GO:0045408, magenta; the only shared gene, IL1B, is shown in green).

The results: tissue specificity
• EnrichNet does not require additional gene expression measurement data
• Brain tissue: Xd-scores are over-represented
• Non-brain tissue: the center of the Xd-score distribution is significantly lower

The conclusion
• EnrichNet sometimes has more discriminative power than ORA when the target set and pathway set have a large overlap
• EnrichNet can identify novel functional associations through direct and indirect molecular interactions when the target set and pathway set have little overlap
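
The "score the distance with RWR" step in the general workflow can be illustrated with a minimal sketch of a random walk with restart on a small undirected network. The function name, the restart probability of 0.5, and the convergence tolerance are assumptions chosen for illustration; the slides do not state the parameters EnrichNet uses.

```python
import numpy as np

def random_walk_with_restart(adjacency, seed_nodes, restart_prob=0.5,
                             tol=1e-10, max_iter=1000):
    """Steady-state visiting probabilities of a walk that restarts at seed_nodes.

    restart_prob, tol and max_iter are illustrative assumptions.
    """
    adjacency = np.asarray(adjacency, dtype=float)
    n = adjacency.shape[0]

    # Column-normalise so each column holds the transition probabilities
    # out of one node; isolated nodes keep an all-zero column.
    col_sums = adjacency.sum(axis=0)
    col_sums[col_sums == 0] = 1.0
    transition = adjacency / col_sums

    # The walker starts (and restarts) uniformly on the input gene set.
    p0 = np.zeros(n)
    p0[list(seed_nodes)] = 1.0 / len(seed_nodes)

    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1.0 - restart_prob) * (transition @ p) + restart_prob * p0
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p

# Toy network: a path 0 - 1 - 2 - 3 with gene 0 as the input set.
path = np.array([[0, 1, 0, 0],
                 [1, 0, 1, 0],
                 [0, 1, 0, 1],
                 [0, 0, 1, 0]])
scores = random_walk_with_restart(path, seed_nodes=[0])
print(scores)  # nodes closer to the seed receive higher scores
```

In this sketch, the scores of the nodes that belong to a pathway would form that pathway's score vector, matching the per-pathway vectors shown on the method slide.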
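
The background-model step (equal-sized bins, then each pathway's deviation from the average) can be sketched as follows. This is a hypothetical Xd-style score, not the formula from the slides: it assumes distances scaled to [0, 1], 10 bins, and a weight of 1 / (bin upper edge × number of bins) so that deviations in the low-distance bins count more. All names and the weighting are assumptions.

```python
import numpy as np

def xd_style_scores(pathway_distances, n_bins=10):
    """Weighted deviation of each pathway's binned distances from the average.

    pathway_distances maps a pathway name to distances in [0, 1].
    The bin count and the weighting scheme are illustrative assumptions.
    """
    edges = np.linspace(0.0, 1.0, n_bins + 1)  # equal-sized bins

    # Fraction of each pathway's nodes falling into each distance bin.
    histograms = {}
    for name, distances in pathway_distances.items():
        counts, _ = np.histogram(np.clip(distances, 0.0, 1.0), bins=edges)
        if counts.sum() == 0:
            raise ValueError(f"pathway {name!r} has no distance scores")
        histograms[name] = counts / counts.sum()

    # Background model: the average histogram over all pathways.
    background = np.mean(list(histograms.values()), axis=0)

    # Assumed weighting: low-distance bins contribute more to the score.
    weights = 1.0 / (edges[1:] * n_bins)

    return {name: float(np.sum((hist - background) * weights))
            for name, hist in histograms.items()}

example = xd_style_scores({
    "near_pathway": [0.05, 0.05, 0.15],
    "far_pathway": [0.85, 0.95, 0.95],
})
print(example)  # the pathway closer to the input set scores higher
```

A positive score here means the pathway has more nodes at small distances from the input gene set than the average pathway; a negative score means fewer.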
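
For comparison, the ORA baseline is commonly computed as an upper-tail hypergeometric probability (equivalent to a one-sided Fisher's exact test) on the overlap between the target set and a pathway. The slides do not say which ORA test was used in the evaluation, so the sketch below is an assumption, and the function and parameter names are hypothetical.

```python
from math import comb

def ora_p_value(n_universe, n_pathway, n_target, n_overlap):
    """P(overlap >= n_overlap) when n_target genes are drawn at random
    from n_universe genes, of which n_pathway belong to the pathway."""
    max_overlap = min(n_pathway, n_target)
    total = comb(n_universe, n_target)
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_target - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total

# Illustrative numbers only: 100 target genes, a 100-gene pathway,
# 5 shared genes, and a universe of 20000 genes.
print(ora_p_value(20000, 100, 100, 5))
```

Because this test only sees the size of the overlap, it cannot score a pathway that shares no genes with the target set, which is the gap the network-based distance scores are meant to address.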