2014-3-20

Outline

 Review of major computational approaches to facilitate biological interpretation of high-throughput microarray and RNA-Seq experiments.
Input: Microarray / RNA-Seq
DEG: Differentially Expressed Genes
Gene Set-Wise Differential Expression Analysis
Co-expression / clustering
Differential Co-Expression Analysis
Gene of interest, gene list, gene pair or gene list pair
FAA: Functional Annotation Analysis: Gene Ontology (GO) or pathway analysis
Gene list with annotations
Visualization, semantic assembling and knowledge learning: concept lattice analysis (BioLattice)
Glossary

 FAA: Functional Annotation Analysis
 GO: Gene Ontology
 Pathway
 DEG: Differentially Expressed Genes
 GSEA: Gene Set Enrichment Analysis
 Biological Interpretation and Biological Semantics
 Concept lattice analysis
Pathway and Ontology-Based Analysis

 GO and biological pathway-based analysis:
 one of the most powerful methods for inferring the biological meaning of expression changes
 Input: a list of genes obtained by:
 differential expression analysis
 co-expression analysis (or clustering)

 Attributes that can be used for FAA:
 transcription factor binding
 clinical phenotypes, such as disease associations
 MeSH (Medical Subject Headings) terms
 microRNA binding sites
 protein family memberships
 chromosomal bands, etc.
 GO terms
 biological pathways

 Features may have their own ontological structures
 For example, GO is structured as a DAG (Directed Acyclic Graph)

 DEGs:
 Three common techniques for obtaining DEGs:
 t-test
 Wilcoxon's rank sum test
 ANOVA
 Note: the multiple-hypothesis-testing problem must be properly managed, because one test is performed per gene
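
The slide does not say how the multiple-testing problem should be handled. One widely used option is the Benjamini-Hochberg procedure, which controls the false discovery rate. The sketch below is a minimal pure-Python version; the gene names and p-values are made up for illustration, and the 0.05 cut-off is an assumption, not a recommendation from the slides.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that monotonicity can be enforced.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-gene p-values, e.g. from a t-test, rank sum test or ANOVA.
p_values = {"geneA": 0.001, "geneB": 0.008, "geneC": 0.039,
            "geneD": 0.041, "geneE": 0.60}
q_values = dict(zip(p_values, benjamini_hochberg(list(p_values.values()))))

# Genes kept as DEGs at an (assumed) false discovery rate of 5%.
degs = [gene for gene, q in q_values.items() if q < 0.05]
```

In this toy input, geneC and geneD have raw p-values below 0.05 but are no longer selected after adjustment, which is the effect the note on the slide warns about.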

 Co-expression analysis
 groups genes with similar expression profiles together and keeps dissimilar ones apart
 returns genes that are assumed to be co-regulated
 Clustering algorithms:
 hierarchical-tree clustering
 partitional clustering
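
As an illustration of the hierarchical option, the sketch below clusters a few genes with average linkage, using 1 minus the Pearson correlation as the distance. The distance measure, the linkage rule, the gene names and the expression values are all assumptions chosen for the example; the slides do not prescribe them.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient between two equally long profiles."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def hierarchical_clusters(profiles, n_clusters):
    """Average-linkage agglomerative clustering with distance 1 - r."""
    clusters = [[gene] for gene in profiles]

    def linkage(c1, c2):
        return mean(1 - pearson(profiles[g], profiles[h]) for g in c1 for h in c2)

    while len(clusters) > n_clusters:
        # Find the two closest clusters and merge them.
        pairs = [(a, b) for a in range(len(clusters))
                 for b in range(a + 1, len(clusters))]
        i, j = min(pairs, key=lambda ab: linkage(clusters[ab[0]], clusters[ab[1]]))
        merged = clusters.pop(j)  # j > i, so index i stays valid
        clusters[i].extend(merged)
    return clusters


# Hypothetical expression profiles over four samples.
profiles = {
    "g1": [1, 2, 3, 4],
    "g2": [2, 4, 6, 8.5],
    "g3": [4, 3, 2, 1],
    "g4": [8, 6, 4.5, 2],
}
clusters = hierarchical_clusters(profiles, n_clusters=2)
```

Here the two rising profiles and the two falling profiles end up in separate clusters. The pairwise search is quadratic per merge, which is fine for a sketch but not for genome-scale data.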

 Pathways are powerful resources for understanding shared biological processes
 E.g., KEGG, MetaCyc and BioCarta (signaling pathways)

 MetaCyc:
 an experimentally determined, non-redundant metabolic pathway database
 the largest such collection, containing over 1400 metabolic pathways

 Ontology / GO:
 provides a shared understanding of a certain domain of information
 controlled vocabularies
 DAG structure with the three GO vocabularies:
 Molecular Function (MF)
 Cellular Component (CC)
 Biological Process (BP)

 Other common ontologies and annotation resources:
 MIPS: an integrated source of protein properties across a variety of complete genomes
 MeSH: clinical vocabulary, including disease names
 OMIM (Online Mendelian Inheritance in Man)
 UMLS (Unified Medical Language System)
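 GO enrichment test:
 For example, if 20% of the genes in a gene list are annotated with the GO term ‘apoptosis’ while only 1% of the genes in the whole human genome fall into this functional category, the term is over-represented in the list; a statistical test decides whether the difference is significant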

 Common statistical tests:
 chi-square test
 binomial test
 hypergeometric test

 hypergeometric test:
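
The formula from this slide did not survive in the transcript. In its usual form, the one-sided hypergeometric p-value is P(X >= k) = sum over x from k to min(K, n) of C(K, x) * C(N - K, n - x) / C(N, n). The symbols are notation chosen here, not taken from the slide: N is the number of background genes, K the number of background genes carrying the annotation, n the size of the gene list and k the number of annotated genes in the list. A minimal sketch with made-up counts:

```python
from math import comb


def hypergeom_enrichment_p(N, K, n, k):
    """P(X >= k) when n genes are drawn from N, of which K carry the term."""
    upper = min(K, n)
    hits = sum(comb(K, x) * comb(N - K, n - x) for x in range(k, upper + 1))
    return hits / comb(N, n)


# Hypothetical counts mirroring a "20% versus 1%" situation:
# 20,000 background genes, 200 of them annotated with the term (1%),
# a gene list of 100 genes, 20 of them annotated (20%).
p = hypergeom_enrichment_p(N=20000, K=200, n=100, k=20)
```

Exact integer arithmetic via `math.comb` avoids overflow and rounding problems for large counts. The result still has to be corrected for the number of terms tested.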

 Pitfalls to avoid when using the hypergeometric test:
 The choice of background has a substantial impact on the result. Candidates:
 all genes having at least one GO annotation
 all genes ever known in genome databases
 all genes on the microarray
 GO has a hierarchical (tree or graph) structure, while the hypergeometric test assumes independence of categories

 Common tools:
 DAVID
 ArrayXPath
 Pathway Miner
 EASE
 GOFish
 GOTree, etc.
Gene Set-Wise Differential Expression Analysis


 Evaluates coordinated differential expression of gene groups
 Gene Set Enrichment Analysis (GSEA):
 the first method developed in this category
 evaluates, for each pre-defined gene set, whether it is significantly associated with the phenotypic classes

 Difference between FAA and GSEA:
 FAA: finds over-represented GO terms in a gene list of interest
 GSEA: starts from pre-defined gene sets and tests how their expression changes under different conditions

 Advantages of gene set-wise differential expression analysis:
 identifies modest but coordinated changes in gene expression that might be missed by conventional ‘individual gene-wise’ differential expression analysis (many tiny expression changes can collectively create a big change)
 straightforward biological interpretation, because the gene sets are defined by biological knowledge

 The Enrichment Score (ES) is calculated by walking down the ranked gene list L, in which N genes are ordered according to their correlation with the phenotype. At each position i, it evaluates the fraction of genes in S ("hits"), weighted by their correlation, and the fraction of genes not in S ("misses") encountered up to i. ES is the maximum deviation from zero of the difference between the two fractions.
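
The running-sum calculation can be sketched as follows. This follows the commonly described GSEA formulation, in which hits are weighted by |r|^p and normalized by their total weight, and each miss counts 1 / (N - N_H); the exponent p, the function name and the toy data are assumptions for illustration, and the permutation step used to assess significance is not shown.

```python
def enrichment_score(ranked_genes, correlations, gene_set, p=1.0):
    """Running-sum enrichment score of gene_set over a ranked gene list.

    ranked_genes and correlations are parallel lists, already sorted by
    correlation with the phenotype (most positive first).
    """
    in_set = [gene in gene_set for gene in ranked_genes]
    n_misses = len(ranked_genes) - sum(in_set)
    # Normalizer for the weighted hits: sum of |r|^p over the genes in the set.
    hit_norm = sum(abs(r) ** p for r, hit in zip(correlations, in_set) if hit)

    running, best = 0.0, 0.0
    for r, hit in zip(correlations, in_set):
        if hit:
            running += abs(r) ** p / hit_norm
        else:
            running -= 1.0 / n_misses
        # Keep the largest deviation from zero, including its sign.
        if abs(running) > abs(best):
            best = running
    return best


# Hypothetical ranked list of six genes and their correlations.
ranked = ["g1", "g2", "g3", "g4", "g5", "g6"]
corr = [0.9, 0.7, 0.3, -0.1, -0.5, -0.8]
es_top = enrichment_score(ranked, corr, {"g1", "g2"})
es_bottom = enrichment_score(ranked, corr, {"g5", "g6"})
```

A set concentrated at the top of the list gets a score close to +1, and one concentrated at the bottom gets a score close to -1. The sketch assumes the gene set overlaps the ranked list and does not cover it completely.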

 Typical gene sets:
 regulatory-motif sets
 function-related sets
 disease-related sets
 Database:
 MSigDB:
 6769 gene sets
 classified into five different collections
 has some interesting extensions
Differential Co-Expression Analysis


 Co-expression analysis:
 determines the degree of co-expression of a cluster of genes under a certain condition
 Differential co-expression analysis:
 determines the degree of co-expression difference of a gene pair or a gene cluster across different conditions

 Three major types:
 (a) differential co-expression of gene cluster(s)
 (b) gene pair-wise differential co-expression
 (c) differential co-expression of paired gene sets

 Type (a): identify gene cluster(s) that are differentially co-expressed between two conditions
 Let conditions and genes be denoted by J and I, respectively. The mean squared residual of the model is used as a measure of the co-expression of the genes.
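
The formula itself is missing from the transcript. A common form of the mean squared residue takes each entry, subtracts its gene mean and its condition mean, adds the overall mean, squares the result and averages over all entries. The sketch below assumes that form and a normalization by |I| * |J|; the slide's own formula may normalize differently, and the matrices are made up.

```python
def mean_squared_residue(matrix):
    """Mean squared residue of a genes x conditions sub-matrix.

    matrix[i][j] is the expression of gene i (in I) under condition j (in J).
    """
    n_genes, n_conds = len(matrix), len(matrix[0])
    row_means = [sum(row) / n_conds for row in matrix]
    col_means = [sum(row[j] for row in matrix) / n_genes for j in range(n_conds)]
    overall = sum(row_means) / n_genes
    total = 0.0
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            # Residual after removing the gene effect and the condition effect.
            residue = value - row_means[i] - col_means[j] + overall
            total += residue ** 2
    return total / (n_genes * n_conds)


# Hypothetical cluster: genes move in parallel across conditions (coherent).
coherent = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
# Hypothetical cluster: the two genes move in opposite directions.
incoherent = [[1, 2, 3], [3, 2, 1]]

score_coherent = mean_squared_residue(coherent)
score_incoherent = mean_squared_residue(incoherent)
```

A score near zero means the genes follow a shared additive pattern. Comparing the score of the same gene cluster between two conditions gives a simple indication of differential co-expression.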

 Type (b): identify differentially co-expressed gene pairs
 Techniques:
 F-statistic
 a meta-analytic approach

 Note that the identification of differentially co-expressed gene clusters or gene pairs usually does not use pre-defined gene sets or pairs.
 Thus, the interpretation of the results may be improved by ontology- and pathway-based annotation analysis.

 Type (c): the dCoxS (differential co-expression of gene sets) algorithm identifies gene set pairs that are differentially co-expressed across different conditions
 Biological pathways can be used as pre-defined gene sets, and the differential co-expression of biological pathway pairs between conditions is analyzed.

 Type (c) cont.
 To measure the expression similarity between paired gene sets under the same condition, dCoxS defines the interaction score (IS) as the correlation coefficient between the sample-wise entropies.
 IS can always be obtained, even when the two pathways contain different numbers of genes, because it uses only sample-wise distances.
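
The idea can be sketched as follows: for each gene set, compute one distance per pair of samples, then correlate the two resulting distance vectors. The slide speaks of sample-wise entropies; plain Euclidean distance is used below only as a simple stand-in, so this is not the dCoxS measure itself. Function names and data are hypothetical.

```python
from itertools import combinations
from math import dist, sqrt


def sample_distances(gene_set_expr):
    """Distances between every pair of samples within one gene set.

    gene_set_expr is a genes x samples matrix for a single condition.
    """
    n_samples = len(gene_set_expr[0])
    # Sample s is described by its expression values over the set's genes.
    samples = [[row[s] for row in gene_set_expr] for s in range(n_samples)]
    return [dist(a, b) for a, b in combinations(samples, 2)]


def pearson(x, y):
    """Pearson correlation coefficient between two equally long vectors."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def interaction_score(set1_expr, set2_expr):
    """Correlation between the sample-wise distance vectors of two gene sets."""
    return pearson(sample_distances(set1_expr), sample_distances(set2_expr))


# Hypothetical pathways measured on the same four samples:
# pathway_a has two genes, pathway_b has three.
pathway_a = [[1, 2, 4, 7], [0, 1, 3, 3]]
pathway_b = [[2, 4, 8, 14], [0, 2, 6, 6], [5, 5, 5, 5]]
score = interaction_score(pathway_a, pathway_b)
```

Both distance vectors have one entry per sample pair, so the correlation is defined even though the pathways differ in size. Comparing the score between conditions is the differential step and is not shown here.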

Biological Interpretation and Biological Semantics


 Biomedical semantics provides rich descriptions for biomedical domain knowledge.
 Motivation for biological semantics:
 GO-based annotation analysis has limitations:
 the result is typically a long, unordered list of annotations
 most analysis tools evaluate only one cluster at a time
 reading the massive annotation lists is time-consuming
 the lists are hard to assemble manually
 many annotations are redundant

 Introducing BioLattice:
 a mathematical framework based on concept lattice analysis
 organizes traditional clusters and their associated annotations into a lattice of concepts
 provides a graphical summary
 considers gene expression clusters as objects and annotations as attributes
 Thus, complex relations among clusters and annotations are clarified, ordered and visualized.
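
To make the object/attribute view concrete, the sketch below enumerates the formal concepts of a tiny context in which clusters are objects and annotations are attributes. A concept pairs a set of clusters (the extent) with exactly the annotations they all share (the intent). This is a naive enumeration for illustration only, not the BioLattice implementation; cluster names and annotations are made up.

```python
from itertools import combinations


def formal_concepts(context):
    """Enumerate (extent, intent) pairs of a small formal context.

    context maps each object (cluster) to its set of attributes (annotations).
    """
    all_attributes = frozenset().union(*context.values())
    # Every intent is an intersection of object attribute sets,
    # or the full attribute set (the intent of the empty extent).
    intents = {all_attributes}
    for size in range(1, len(context) + 1):
        for objs in combinations(context, size):
            shared = frozenset.intersection(*(frozenset(context[o]) for o in objs))
            intents.add(shared)
    concepts = []
    for intent in intents:
        # The extent is every object that carries all attributes of the intent.
        extent = frozenset(o for o, attrs in context.items() if intent <= attrs)
        concepts.append((extent, intent))
    # Most general concepts (largest extents) first.
    return sorted(concepts, key=lambda c: (-len(c[0]), sorted(c[1])))


# Hypothetical clusters and their enriched annotations.
context = {
    "cluster1": {"apoptosis", "cell cycle"},
    "cluster2": {"apoptosis", "immune response"},
    "cluster3": {"apoptosis", "cell cycle", "DNA repair"},
}
concepts = formal_concepts(context)
```

Ordering the concepts by extent size already hints at the lattice: the top concept holds the annotation shared by all clusters, and more specific concepts sit below it. The enumeration is exponential in the number of clusters, so it is only suitable for small examples.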

 Another advantage of BioLattice is that heterogeneous biological knowledge resources can be added

 Tool to construct BioLattice:
 The Ganter algorithm: http://www.snubi.org/software/biolattice/
Conclusion

 Review of major computational approaches to facilitate biological interpretation of high-throughput microarray and RNA-Seq experiments.