EnrichNet: network-based gene set enrichment analysis

Presenter: Lu Liu
The problem: Functional interpretation
• Identify and assess functional associations
between an experimentally derived
gene/protein set and well-known
gene/protein sets
Agenda
• Related research
• The method
• The Evaluation
• The results
• The conclusion
Related Research
• Over-representation analysis (ORA)
• Gene set enrichment analysis (GSEA)
• Modular enrichment analysis (MEA)
Limitations
• ORA tends to have low discriminative power
• Functional information from interaction networks is
disregarded
• Genes/proteins with missing annotations are ignored
• Tissue-specific gene/protein set association analysis is often
infeasible
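To illustrate why ORA ignores network information, here is a minimal sketch of an over-representation test based on the hypergeometric distribution. The function name `ora_p_value` and all set sizes are hypothetical and chosen only for illustration. The p-value depends solely on the size of the overlap, so a pathway with no shared genes always gets p = 1, however close it lies to the target set in the interaction network.

```python
from math import comb


def ora_p_value(n_universe, n_pathway, n_target, n_overlap):
    """Upper-tail hypergeometric p-value P(X >= n_overlap).

    n_universe: number of annotated genes in the background
    n_pathway:  number of genes in the reference pathway
    n_target:   number of genes in the experimental target set
    n_overlap:  number of genes shared by target set and pathway
    """
    total = comb(n_universe, n_target)
    upper = min(n_pathway, n_target)
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_target - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical sizes: 1000 background genes, 40 pathway genes, 100 target genes.
for overlap in (0, 5, 10):
    print(overlap, ora_p_value(1000, 40, 100, overlap))
```

Only the four counts enter the calculation; which genes interact with which is never used.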
General workflow
• Input
a gene/protein list (>= 10 genes/proteins) and a database of interest
(KEGG etc.)
• Processing
gene mapping, distance scoring with RWR (random walk with restart),
comparison of the scores with a background model
• Output
a pathway/process ranking table and a visualization
of sub-networks
Input
Output
The method
[Figure: network diagram with node scores. The input gene set is scored by RWR against Pathway 1 … Pathway N.]
Algorithm for distance score
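The distance score is based on a random walk with restart (RWR). Below is a minimal sketch of RWR on a small undirected graph, assuming a connected network given as a nested dictionary of edge weights. The function name, the toy graph, and the restart probability of 0.5 are illustrative assumptions, not values taken from the slides.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Return the stationary visiting probability of every node.

    adjacency: dict mapping node -> dict of neighbor -> edge weight
               (assumed symmetric and connected)
    seeds:     set of nodes the walker restarts from (e.g. the input gene set)
    restart:   probability of jumping back to a seed at each step (assumed value)
    """
    nodes = sorted(adjacency)
    out_weight = {u: sum(adjacency[u].values()) for u in nodes}
    # Restart distribution: uniform over the seed nodes.
    p0 = {u: (1.0 / len(seeds) if u in seeds else 0.0) for u in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {u: restart * p0[u] for u in nodes}
        for u in nodes:
            if out_weight[u] == 0:
                continue  # isolated node: nothing to propagate
            share = (1.0 - restart) * p[u] / out_weight[u]
            for v, w in adjacency[u].items():
                nxt[v] += share * w
        change = sum(abs(nxt[u] - p[u]) for u in nodes)
        p = nxt
        if change < tol:
            break
    return p


# Toy path graph A - B - C - D with the single seed gene A.
toy_graph = {
    "A": {"B": 1.0},
    "B": {"A": 1.0, "C": 1.0},
    "C": {"B": 1.0, "D": 1.0},
    "D": {"C": 1.0},
}
rwr_scores = random_walk_with_restart(toy_graph, seeds={"A"})
for node in sorted(rwr_scores):
    print(node, round(rwr_scores[node], 4))
```

Nodes closer to the seed receive higher visiting probabilities, which is what lets the scores act as a network proximity measure between the input gene set and the genes of each pathway.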
Relate scores to a background model
• Distance scores are discretized into equal-sized bins
• Quantify each pathway’s deviation from the
average
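The two steps above can be sketched as follows. This is an illustrative approximation, not the published Xd-score formula: the linearly decreasing bin weights, the function names, and the example distances are assumptions. Each pathway's distance scores are binned into equal-width bins, and the resulting histogram is compared with the average histogram over all pathways.

```python
def bin_histogram(scores, n_bins, lo=0.0, hi=1.0):
    """Fraction of scores in each of n_bins equal-width bins on [lo, hi]."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for s in scores:
        idx = min(int((s - lo) / width), n_bins - 1)  # s == hi goes to the last bin
        counts[idx] += 1
    return [c / len(scores) for c in counts]


def deviation_score(pathway_hist, background_hist):
    """Weighted sum of per-bin differences from the background.

    Bins are ordered from short to long distances; the weights decrease
    linearly so that an excess of short distances raises the score
    (assumed weighting, for illustration only).
    """
    n_bins = len(pathway_hist)
    weights = [(n_bins - i) / n_bins for i in range(n_bins)]
    return sum(w * (p - b) for w, p, b in zip(weights, pathway_hist, background_hist))


# Hypothetical distance scores (0 = close to the input gene set, 1 = far away).
pathway_distances = {
    "Pathway 1": [0.05, 0.10, 0.15, 0.60],
    "Pathway N": [0.70, 0.80, 0.90, 0.95],
}
histograms = {name: bin_histogram(d, n_bins=4) for name, d in pathway_distances.items()}
# Background model: the average histogram over all pathways.
background = [
    sum(h[i] for h in histograms.values()) / len(histograms) for i in range(4)
]
for name, hist in histograms.items():
    print(name, deviation_score(hist, background))
```

A pathway whose genes lie mostly in the short-distance bins scores above the background average, and one whose genes lie mostly far away scores below it.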
Evaluation method
• Comparison with ORA on
5 datasets and 2 reference gene sets from the
literature
1. Select the 100 most differentially expressed genes (DEGs)
2. Compute the association scores of EnrichNet and ORA
3. Compute a running-sum statistic for all gene sets
• The consensus of the GSEA-derived (SAM-GS, GAGE)
pathway rankings serves as the external benchmark pathway
ranking
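The slides do not spell out the exact running-sum statistic, so the following is a sketch of a Kolmogorov-Smirnov-style variant under stated assumptions. It walks down a method's pathway ranking, steps up for pathways that the benchmark marks as relevant and down otherwise, and reports the largest deviation from zero. The function name and the pathway labels are hypothetical.

```python
def running_sum_statistic(ranked_pathways, benchmark_hits):
    """Largest signed deviation of a hit/miss running sum along a ranking.

    ranked_pathways: pathways ordered from strongest to weakest association
    benchmark_hits:  set of pathways the benchmark ranking treats as relevant
    """
    n_hits = sum(1 for p in ranked_pathways if p in benchmark_hits)
    n_miss = len(ranked_pathways) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("ranking must contain both hits and misses")
    step_up = 1.0 / n_hits      # the steps sum to +1 over all hits
    step_down = 1.0 / n_miss    # and to -1 over all misses
    running = 0.0
    extreme = 0.0
    for p in ranked_pathways:
        running += step_up if p in benchmark_hits else -step_down
        if abs(running) > abs(extreme):
            extreme = running
    return extreme


# Hypothetical example: the benchmark considers p1 and p2 relevant.
benchmark = {"p1", "p2"}
print(running_sum_statistic(["p1", "p2", "p3", "p4"], benchmark))
print(running_sum_statistic(["p4", "p3", "p2", "p1"], benchmark))
```

A ranking that places the benchmark pathways near the top yields a large positive value, so the statistic can be used to compare how well the EnrichNet and ORA rankings agree with the benchmark.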
The results: EnrichNet vs. ORA
The results: Xd-score vs. Q-value
The results: comparative validation
Protein–protein interaction sub-networks (largest connected components) for target and
reference set pairs with small overlap, predicted to be functionally associated by EnrichNet:
(a) gastric cancer mutated genes (blue) and genes/proteins from the BioCarta pathway ‘Role
of Erk5 in Neuronal Survival’ (magenta, the shared genes are shown in green); (b) bladder
cancer mutated genes (blue) and genes/proteins from Gene Ontology term ‘Tyrosine
phosphorylation of Stat3’ (GO:0042503, magenta; the only shared gene NF2 is shown in
green).
Protein–protein interaction sub-network (largest connected component) for the PD gene set
(blue) and genes/proteins from GO term ‘Regulation of interleukin-6 biosynthetic process’
(magenta, GO:0045408; the only shared gene IL1B is shown in green).
The results: tissue specificity
• EnrichNet does not require additional gene
expression measurement data
• Brain tissue: Xd-scores over-represented
• Non-brain tissue: center of the Xd-score
distribution is significantly lower
The conclusion
• EnrichNet sometimes has more discriminative
power when the target and pathway sets have
large overlaps
• EnrichNet can identify novel functional
associations through direct and indirect
molecular interactions when the target and
pathway sets have little overlap