Meta Analysis and
Differential Network Analysis
with Applications in Mouse
Expression Data
Steve Horvath
Outline
• Standard differential expression analysis
• Statistical power studies
• Important network concepts
• Single versus differential network analysis
• Differential network construction
Standard (gene based) differential
expression analysis
• Many software packages and R functions calculate T tests,
p-values, false discovery rates, fold changes, etc.
• WGCNA R functions:
– For a binary trait (e.g. case control status), use
standardScreeningBinaryTrait
– For a numeric trait (e.g. body weight), use
standardScreeningNumericTrait
– For a right censored time variable, use
standardScreeningCensoredTime
metaAnalysis R function in the
WGCNA R package
helpfile metaAnalysis
Stouffer Z statistics from metaAnalysis
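As a rough illustration of what a Stouffer Z statistic is, the sketch below combines per-data-set Z statistics as Z = sum(w_i * Z_i) / sqrt(sum(w_i^2)). It is a minimal Python sketch, not the code of the WGCNA metaAnalysis function; the function names and the choice of weights (equal by default) are assumptions made for illustration.

```python
import math

def stouffer_z(z_scores, weights=None):
    """Combine per-study Z statistics with Stouffer's (optionally weighted) method."""
    if weights is None:
        weights = [1.0] * len(z_scores)  # assumption: equal weights by default
    if len(weights) != len(z_scores):
        raise ValueError("z_scores and weights must have the same length")
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    denominator = math.sqrt(sum(w * w for w in weights))
    return numerator / denominator

def two_sided_p(z):
    """Two-sided p-value of a standard normal Z statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))

# Two data sets that each give Z = 2 for the same gene
combined = stouffer_z([2.0, 2.0])
print(combined, two_sided_p(combined))
```

A common alternative is to weight each data set by the square root of its sample size, which can be passed in through the weights argument.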
Ranking based metaAnalysis statistics
Combine several gene rankings
using the rankPvalue function
Statistical Power Studies
Statistical power calculations
According to Google Scholar, it
had been cited 11,708 times (July 2013).
Network concept
=network statistics
Network=Adjacency Matrix
• A network can be represented by an adjacency
matrix, A=[aij], that encodes whether/how a
pair of nodes is connected.
– A is a symmetric matrix with entries in [0,1]
– For unweighted network, entries are 1 or 0
depending on whether or not 2 nodes are adjacent
(connected)
– For weighted networks, the adjacency matrix reports
the connection strength between node pairs
– Our convention: diagonal elements of A are all 1.
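The conventions above can be checked mechanically. The following is a minimal Python sketch (the function name is hypothetical and the matrix is a plain list of lists) that tests whether a matrix is symmetric, has entries in [0,1], and has a unit diagonal.

```python
def is_valid_adjacency(a, tol=1e-12):
    """Return True if a is square, symmetric, has entries in [0, 1] and a unit diagonal."""
    n = len(a)
    for i in range(n):
        if len(a[i]) != n:
            return False  # not square
        if abs(a[i][i] - 1.0) > tol:
            return False  # diagonal convention violated
        for j in range(n):
            if not (0.0 <= a[i][j] <= 1.0):
                return False  # entry outside [0, 1]
            if abs(a[i][j] - a[j][i]) > tol:
                return False  # not symmetric
    return True

# A small weighted network with three nodes
weighted = [
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.3],
    [0.1, 0.3, 1.0],
]
print(is_valid_adjacency(weighted))
```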
Motivational example I:
Pair-wise relationships between genes across different
mouse tissues and genders
Challenge:
Develop simple
descriptive measures
that describe the
patterns.
Solution:
The following network
concepts are useful:
density, centralization,
clustering coefficient,
heterogeneity
Motivational example (continued)
Challenge: Find a simple measure for describing the
relationship between gene significance and connectivity
Solution: network concept called hub gene significance
Background
• Network concepts are also known as
network statistics or network indices
– Examples: connectivity (degree), clustering
coefficient, topological overlap, etc
• Network concepts underlie network
language and systems biological
modeling.
• Dozens of potentially useful network
concepts are known from graph theory.
Review of some
fundamental network concepts which
are defined for all networks (not just
co-expression networks)
Horvath 2011 Weighted Network Analysis. Springer Book.
Hardcover ISBN: 978-1-4419-8818-8
Dong Horvath 2007 Understanding network concepts in
modules BMC Syst Biol
Horvath Dong (2008) Geometric Interpretation of Gene
Co-expression network analysis. PLoS Comput Biol
Connectivity
• Node connectivity = row sum of the adjacency
matrix
– For unweighted networks=number of direct neighbors
– For weighted networks= sum of connection strengths
to other nodes
$$\text{Connectivity}_i = k_i = \sum_{j \neq i} a_{ij}$$
Scaled connectivity:
$$K_i = \frac{k_i}{\max(k)}$$
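The connectivity and scaled connectivity can be sketched in a few lines of Python. This is an illustration only, with hypothetical function names; the sum skips the diagonal, as in the definition of the connectivity.

```python
def connectivity(a):
    """k_i: sum of the i-th row of the adjacency matrix, excluding the diagonal."""
    n = len(a)
    return [sum(a[i][j] for j in range(n) if j != i) for i in range(n)]

def scaled_connectivity(a):
    """K_i = k_i / max(k)."""
    k = connectivity(a)
    k_max = max(k)
    return [k_i / k_max for k_i in k]

# Unweighted star network: node 0 is connected to nodes 1, 2 and 3
star = [
    [1, 1, 1, 1],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 1],
]
print(connectivity(star))
print(scaled_connectivity(star))
```

In the unweighted case the connectivity is simply the number of direct neighbors, so the hub of the star has k = 3 and every leaf has k = 1.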
Density
• Density= mean adjacency
• Highly related to mean connectivity
$$\text{Density} = \frac{\sum_i \sum_{j \neq i} a_{ij}}{n(n-1)} = \frac{\text{mean}(k)}{n-1}$$
where n is the number of network nodes.
Centralization
$$\text{Centralization} = \frac{n}{n-2}\left(\frac{\max(k)}{n-1} - \text{Density}\right) \approx \frac{\max(k)}{n-1} - \text{Density}$$
= 1 if the network has a star topology
= 0 if all nodes have the same connectivity
[Figure: two example networks. One has Centralization = 1 because it has a star topology; the other has Centralization = 0 because all nodes have the same connectivity of 2.]
Heterogeneity
• Heterogeneity: coefficient of variation of the
connectivity
• Highly heterogeneous networks exhibit hubs
$$\text{Heterogeneity} = \frac{\sqrt{\text{variance}(k)}}{\text{mean}(k)}$$
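Density, centralization and heterogeneity all follow from the connectivity vector, which the Python sketch below illustrates on a star network and a ring network. Function names are hypothetical. Two assumptions are made: the variance is the population variance (dividing by n), and centralization is computed with the exact n/(n-2) form, which needs n > 2.

```python
import math

def connectivity(a):
    """Row sums of the adjacency matrix, excluding the diagonal."""
    n = len(a)
    return [sum(a[i][j] for j in range(n) if j != i) for i in range(n)]

def density(a):
    """Mean off-diagonal adjacency, i.e. mean(k) / (n - 1)."""
    n = len(a)
    k = connectivity(a)
    return (sum(k) / n) / (n - 1)

def centralization(a):
    """n/(n-2) * (max(k)/(n-1) - Density); requires n > 2."""
    n = len(a)
    k = connectivity(a)
    return n / (n - 2) * (max(k) / (n - 1) - density(a))

def heterogeneity(a):
    """Coefficient of variation of the connectivity (population variance assumed)."""
    k = connectivity(a)
    mean_k = sum(k) / len(k)
    var_k = sum((x - mean_k) ** 2 for x in k) / len(k)
    return math.sqrt(var_k) / mean_k

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def star(n):
    """Node 0 is connected to every other node."""
    a = identity(n)
    for j in range(1, n):
        a[0][j] = a[j][0] = 1.0
    return a

def ring(n):
    """Every node is connected to its two neighbors on a cycle."""
    a = identity(n)
    for i in range(n):
        j = (i + 1) % n
        a[i][j] = a[j][i] = 1.0
    return a

for name, net in (("star", star(5)), ("ring", ring(5))):
    print(name, density(net), centralization(net), heterogeneity(net))
```

The star network reaches the maximum centralization, while the ring, in which every node has connectivity 2, has zero centralization and zero heterogeneity.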
Clustering Coefficient
Measures the cliquishness of a particular node
« A node is cliquish if its neighbors know each other »
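$$\text{ClusterCoef}_i = \frac{\sum_{l \neq i} \sum_{m \neq i,l} a_{il}\, a_{lm}\, a_{mi}}{\left(\sum_{l \neq i} a_{il}\right)^2 - \sum_{l \neq i} a_{il}^2}$$
[Figure: example networks showing the clustering coefficient of the black node; in one example the clustering coefficient = 1.]
This generalizes directly to weighted networks (Zhang and Horvath 2005).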
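The clustering coefficient formula works unchanged for unweighted and weighted adjacency matrices. The Python sketch below is a direct, unoptimized transcription. The function name is hypothetical, and returning 0 when the denominator vanishes (a node with fewer than two neighbors) is an assumption, not a convention taken from the slides.

```python
def clustering_coefficient(a, i):
    """Clustering coefficient of node i for an adjacency matrix with entries in [0, 1]."""
    n = len(a)
    others = [l for l in range(n) if l != i]
    # Numerator: sum over ordered pairs (l, m) of distinct neighbors of a_il * a_lm * a_mi
    numerator = sum(
        a[i][l] * a[l][m] * a[m][i]
        for l in others
        for m in others
        if m != l
    )
    row_sum = sum(a[i][l] for l in others)
    row_sum_sq = sum(a[i][l] ** 2 for l in others)
    denominator = row_sum ** 2 - row_sum_sq
    if denominator <= 0:
        return 0.0  # assumption: undefined case is reported as 0
    return numerator / denominator

# Complete network on 4 nodes: all neighbors of every node know each other
complete = [[1.0] * 4 for _ in range(4)]
# Star network: the neighbors of the hub (node 0) are not connected to each other
star = [
    [1.0, 1.0, 1.0, 1.0],
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0],
    [1.0, 0.0, 0.0, 1.0],
]
print(clustering_coefficient(complete, 0))
print(clustering_coefficient(star, 0))
```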
The topological overlap dissimilarity is used
as input of hierarchical clustering
$$\text{TOM}_{ij} = \frac{\sum_{u \neq i,j} a_{iu}\, a_{uj} + a_{ij}}{\min(k_i, k_j) + 1 - a_{ij}}$$
$$\text{DistTOM}_{ij} = 1 - \text{TOM}_{ij}$$
• Generalized in Zhang and Horvath (2005) to the case of weighted networks
• Generalized in Li and Horvath (2006) to multiple nodes
• Generalized in Yip and Horvath (2007) to higher order interactions
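The topological overlap and its dissimilarity can be sketched with the plain-Python functions below. This is an illustration with hypothetical function names, not the WGCNA implementation; setting the diagonal of the overlap matrix to 1 is an assumption.

```python
def connectivity(a):
    """Row sums of the adjacency matrix, excluding the diagonal."""
    n = len(a)
    return [sum(a[i][j] for j in range(n) if j != i) for i in range(n)]

def topological_overlap(a):
    """TOM_ij = (sum_u a_iu * a_uj + a_ij) / (min(k_i, k_j) + 1 - a_ij), with u != i, j."""
    n = len(a)
    k = connectivity(a)
    tom = [[1.0] * n for _ in range(n)]  # assumption: diagonal set to 1
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            shared = sum(a[i][u] * a[u][j] for u in range(n) if u != i and u != j)
            tom[i][j] = (shared + a[i][j]) / (min(k[i], k[j]) + 1.0 - a[i][j])
    return tom

def tom_dissimilarity(a):
    """DistTOM_ij = 1 - TOM_ij, the input for hierarchical clustering."""
    return [[1.0 - t for t in row] for row in topological_overlap(a)]

# Star network: leaves 1 and 2 are not adjacent but share the hub as a neighbor
star = [
    [1.0, 1.0, 1.0, 1.0],
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0],
    [1.0, 0.0, 0.0, 1.0],
]
tom = topological_overlap(star)
print(tom[1][2], tom[0][1])
```

Two leaves of the star are not directly connected, yet they have a nonzero overlap because they share the hub as a common neighbor. This is what makes the measure more robust than the raw adjacency for clustering.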
Network Significance
• Defined as average gene significance
• We often refer to the network significance of a
module network as module significance.
$$\text{NetworkSignif} = \frac{\sum_i GS_i}{n}$$
Maximum adjacency ratio
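The slide gives only the title of this concept. The Python sketch below assumes the definition MAR_i = sum_{j≠i} a_ij^2 / sum_{j≠i} a_ij, as described in Horvath and Dong (2008); it should be checked against that paper before use. The function name and the value returned for isolated nodes (0) are assumptions.

```python
def maximum_adjacency_ratio(a):
    """MAR_i = sum_j a_ij^2 / sum_j a_ij over j != i (assumed definition)."""
    n = len(a)
    mar = []
    for i in range(n):
        row = [a[i][j] for j in range(n) if j != i]
        total = sum(row)
        if total == 0:
            mar.append(0.0)  # assumption: isolated nodes get MAR = 0
        else:
            mar.append(sum(x * x for x in row) / total)
    return mar

# Node 0 has two moderate connections; node 1 has one strong and one moderate connection
weighted = [
    [1.0, 0.5, 0.5],
    [0.5, 1.0, 1.0],
    [0.5, 1.0, 1.0],
]
print(maximum_adjacency_ratio(weighted))
```

Under this definition the ratio is only informative in weighted networks: in an unweighted network every connected node has MAR = 1, because the squared entries equal the entries themselves.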
Network concepts for comparing two networks
Differential network concepts
• Node specific statistics:
– Diff.ClusterCoef(i) = CC1(i) – CC2(i)
– Diff.Mar(i)= MAR1(i) – MAR2(i)
• Global statistics
– Diff.MeanClusterCoef = Mean.CC1 – Mean.CC2
– Diff.MeanConnectivity = Mean.k1 – Mean.k2
– Diff.MeanMAR = Mean.MAR1 – Mean.MAR2
– Diff.MeanKME=Mean.KME
– Diff.Density=Density1 – Density2
– can be calculated via the modulePreservation function
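As an illustration of the global statistics above, the Python sketch below computes two of them for a pair of networks defined on the same nodes. It is not the modulePreservation function. The function names are hypothetical, and the dictionary keys simply reuse the labels from the list.

```python
def connectivity(a):
    """Row sums of the adjacency matrix, excluding the diagonal."""
    n = len(a)
    return [sum(a[i][j] for j in range(n) if j != i) for i in range(n)]

def mean_connectivity(a):
    k = connectivity(a)
    return sum(k) / len(k)

def density(a):
    """Mean off-diagonal adjacency, i.e. mean(k) / (n - 1)."""
    return mean_connectivity(a) / (len(a) - 1)

def differential_concepts(a1, a2):
    """Global differences between two networks on the same node set (network 1 minus network 2)."""
    if len(a1) != len(a2):
        raise ValueError("both networks must have the same nodes")
    return {
        "Diff.MeanConnectivity": mean_connectivity(a1) - mean_connectivity(a2),
        "Diff.Density": density(a1) - density(a2),
    }

# Network 1: fully connected; network 2: same nodes with much weaker connections
net1 = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
net2 = [[1.0, 0.2, 0.2], [0.2, 1.0, 0.2], [0.2, 0.2, 1.0]]
print(differential_concepts(net1, net2))
```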
Measuring the similarity
between two networks
R code for computing network
concepts
R code, help file
Data analysis strategies
Single network analysis versus
differential network analysis
Goals of Single Network
Analysis
• Identifying genetic pathways (modules)
• Finding key drivers (hub genes)
• Modeling the relationships between:
– Transcriptome
– Clinical traits / Phenotypes
– Genetic marker data
Single Network WGCNA
Validation set 1
Validation set 2
1 gene co-expression network
Multiple data sets may be used for validation
Goals of Differential Network Analysis
• Uncover differences in modules and
connectivity in different data sets
– Ex: Human versus chimpanzee brains
(Oldham et al. 2006)
• Differing topology in multiple networks
reveals genes/pathways that are wired
differently in different sample populations
Fuller TF, Ghazalpour A, Aten JE, Drake TA, Lusis AJ, …(2007) "Weighted Gene Co-expression Network
Analysis Strategies Applied to Mouse Weight", Mamm Genome. 18(6):463-472
Oldham MC, …Geschwind DH (2006) Conservation and evolution of gene coexpression networks in
human and chimpanzee brains. Proc Natl Acad Sci U S A 103, 17973-17978.
Differential Network WGCNA
NETWORK 1
NETWORK 2
2+ gene co-expression networks
Identify genes and pathways that are:
1. Differentially expressed
2. Differentially wired
BxH Mouse Data from AJ Lusis
• Single network analysis of female BxH mice revealed a
weight-related module (Ghazalpour et al. 2006)
• Samples: networks were constructed from mice at the extremes
of the weight spectrum (135 females in total):
– Network 1: 30 leanest mice
– Network 2: 30 heaviest mice
• Transcripts: used the 3421 most connected and varying
transcripts
Ghazalpour A, Doss S, Zhang B, Wang S, Plaisier C, Castellanos R, Brozell A, Schadt EE, Drake TA, Lusis AJ, Horvath S
(2006) Integrating genetic and network analysis to characterize genes related to mouse weight. PLoS genetics 2, e130
Methods
• Compute comparison metrics
– Difference in expression: t-test statistic
– Difference in connectivity: DiffK
• Identify significantly different genes/pathways
– Permutation test
• Functional analysis of significant genes/pathways
– DAVID database
– Primary literature
Computing Comparison Metrics
DIFFERENTIAL EXPRESSION
t-test statistic computed for each gene, t(i)
DIFFERENTIAL CONNECTIVITY
$$K_1(i) = \frac{k_1(i)}{\max(k_1)}, \qquad K_2(i) = \frac{k_2(i)}{\max(k_2)}$$
DiffK(i): difference in normalized
connectivities for each gene:
$$\text{DiffK}(i) = K_1(i) - K_2(i)$$
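Both comparison metrics are easy to sketch in Python. The slides do not say which form of the t-test was used, so the Welch (unequal variance) statistic below is an assumption, and the function names are hypothetical.

```python
import math
import statistics

def diff_k(k1, k2):
    """DiffK(i) = k1(i)/max(k1) - k2(i)/max(k2) for connectivity vectors k1 and k2."""
    max1, max2 = max(k1), max(k2)
    return [x / max1 - y / max2 for x, y in zip(k1, k2)]

def welch_t(x, y):
    """Welch t statistic comparing the mean expression of one gene in two groups (assumed test)."""
    var_x = statistics.variance(x)  # sample variance
    var_y = statistics.variance(y)
    standard_error = math.sqrt(var_x / len(x) + var_y / len(y))
    return (statistics.mean(x) - statistics.mean(y)) / standard_error

# Connectivities of two genes in network 1 and in network 2
print(diff_k([2.0, 4.0], [4.0, 2.0]))
# Expression values of one gene in the two groups of mice (made-up numbers)
print(welch_t([1.0, 2.0, 3.0], [2.0, 3.0, 4.0]))
```

Because each connectivity is divided by the maximum in its own network, DiffK always lies between -1 and 1, which makes a fixed threshold meaningful across data sets.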
Sector Plot
We visualize the comparison metrics via a
sector plot:
• x-axis: DiffK
• y-axis: t statistics
We establish sector boundaries to identify
regions of differentially expressed and/or
differentially connected genes
• |t| = 1.96 corresponding to p = 0.05
• |DiffK| = 0.4
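The sector boundaries amount to a two-way classification of each gene. The Python sketch below uses hypothetical labels rather than the sector numbers of the plot, which are not defined in the text; treating the boundaries as strict inequalities is also an assumption.

```python
def classify_gene(t, diffk, t_cut=1.96, k_cut=0.4):
    """Return (expression label, connectivity label) for one gene.

    The labels are hypothetical and do not correspond to numbered sectors.
    """
    if t > t_cut:
        expression = "t_high"
    elif t < -t_cut:
        expression = "t_low"
    else:
        expression = "t_ns"
    if diffk > k_cut:
        wiring = "diffk_high"
    elif diffk < -k_cut:
        wiring = "diffk_low"
    else:
        wiring = "diffk_ns"
    return expression, wiring

def count_sectors(t_values, diffk_values):
    """Count the genes that fall into each (expression, connectivity) region."""
    counts = {}
    for t, dk in zip(t_values, diffk_values):
        key = classify_gene(t, dk)
        counts[key] = counts.get(key, 0) + 1
    return counts

print(count_sectors([2.5, 0.1, -3.0, 0.3], [0.5, 0.6, 0.0, 0.1]))
```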
Permutation test:
Identifying significant sectors
no.perms: number of permutations
For each sector j, we compare the number of genes in the unpermuted
and permuted sectors ($n_{obs}^j$ and $n_{perm}^j$):
$$p_j = \frac{\#\,\text{times}\left(n_{obs}^j \le n_{perm}^j\right) + 1}{\text{no.perms} + 1}$$
[Figure: PERMUTE, with panels NETWORK 1 and NETWORK 2.]
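The permutation p-value for one sector can be sketched as follows. The Python function assumes that a permutation counts against the observed result when the permuted sector holds at least as many genes as the observed sector. The function name is hypothetical, and the permuted counts would come from recomputing t and DiffK after permuting the samples between the two networks.

```python
def sector_p_value(n_obs, n_perm_counts):
    """(number of permutations with n_perm >= n_obs, plus 1) / (no.perms + 1)."""
    at_least_as_extreme = sum(1 for n_perm in n_perm_counts if n_perm >= n_obs)
    return (at_least_as_extreme + 1) / (len(n_perm_counts) + 1)

# Toy example: 4 permutations, 2 of which reach the observed count of 5 genes
print(sector_p_value(5, [5, 6, 1, 0]))

# If no permutation out of 999 reaches the observed count, the p-value is 1/1000
print(sector_p_value(63, [0] * 999))
```

Adding 1 to the numerator and the denominator keeps the estimate away from zero, so the smallest attainable p-value is 1 / (no.perms + 1).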

Sector Plot Results
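[Figure: sector plot annotated with permutation p-values. Recoverable sector labels: 0.001, 0.001, 0.001 and 0.01; four sectors are marked X.]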
Functional Analysis
SECTOR 3
High t statistic
High DiffK
Yellow module in lean
Grey in obese
(63 genes)
SECTOR 5
Low t statistic
High DiffK
(28 genes)
Genes in these sectors have higher connectivity in lean than in
obese mice: pathways potentially dysregulated in obesity
Sector 3:
Functional Analysis Results
DAVID Database
• “Extracellular”:
– extracellular region (38% of genes, p = 1.8 × 10^-4)
– extracellular space (34% of genes, p = 5.7 × 10^-4)
• signaling (36% of genes, p = 5.4 × 10^-4)
• cell adhesion (16% of genes, p = 7.7 × 10^-4)
• glycoproteins (34% of genes, p = 1.6 × 10^-3)
• 12 terms for epidermal growth factor or its related proteins
– EGF-like 1 (8.2% of genes, p = 8.7 × 10^-4)
– EGF-like 3 (6.6% of genes, p = 1.6 × 10^-3)
– EGF-like 2 (6.6% of genes, p = 6.0 × 10^-3)
– EGF (8.2% of genes, p = 0.013)
– EGF_CA (6.6% of genes, p = 0.015)
Sector 3:
Functional Analysis Results
Primary Literature
• Results supported by a study on EGF
levels in mice (Kurachi et al. 1993)
– EGF found to be increased in obese mice
– Obesity was reversed in these mice by:
• Administration of anti-EGF
• Sialoadenectomy
Kurachi H, Adachi H, Ohtsuka S, Morishige K, Amemiya K, Keno Y, Shimomura I, Tokunaga K, Miyake A,
Matsuzawa Y, et al. (1993) Involvement of epidermal growth factor in inducing obesity in ovariectomized mice. The
American journal of physiology 265, E323-331
Sector 5:
Functional Analysis Results
DAVID Database
• Enzyme inhibitor activity (p = 2.9 × 10^-3)*
• Protease inhibitor activity (p = 6.0 × 10^-3)
• Endopeptidase inhibitor activity (p = 6.0 × 10^-3)
• Dephosphorylation (p = 0.012)
• Protein amino acid dephosphorylation (p = 0.012)
• Serine-type endopeptidase inhibitor activity (p = 0.042)
* p values shown are corrected using Bonferroni correction
Sector 5:
Functional Analysis Results
Primary Literature
Itih1 and Itih3
• Enriched for all categories shown previously
• Located near a QTL for hyperinsulinemia (Almind and Kahn
2004)
• Itih3 identified as a gene candidate for obesity-related traits
based on differential expression in murine hypothalamus
(Bischof and Wevrick 2005)
Serpina3n and Serpina10
• Enriched for enzyme inhibitor, protease inhibitor, and endopeptidase
inhibitor
• Serpina10, or Protein Z-dependent protease inhibitor (ZPI), has been
found to be associated with venous thrombosis (Van de Water et al.
2004)
Almind K, Kahn CR (2004) Genetic determinants of energy expenditure and insulin resistance in diet-induced obesity in mice. Diabetes 53, 3274-3285
Bischof JM, Wevrick R (2005) Genome-wide analysis of gene transcription in the hypothalamus. Physiological genomics 22, 191-196
Van de Water N, Tan T, Ashton F, O'Grady A, Day T, Browett P, Ockelford P, Harper P (2004) Mutations within the protein Z-dependent protease inhibitor gene are associated with venous thromboembolic disease: a new form of thrombophilia. British Journal of Haematology 127, 190-194
Discussion
• If applicable, always report findings from a standard
differential expression analysis as well.
• A host of network concepts exists for describing the
network topology.
• Relatively few people use differential network analysis,
which may reflect the fact that large sample sizes are
needed.
– A large sample size is needed to compare two
correlation coefficients
• To check whether a module is preserved in another
network use the modulePreservation function.
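To see why comparing two correlation coefficients needs a large sample size, the Python sketch below applies the standard Fisher z-transformation test for two correlations from independent samples. This test is not taken from the slides; it is added here as an illustration, and the function name and the example correlations are made up.

```python
import math

def compare_correlations(r1, n1, r2, n2):
    """Fisher z test for the difference of two independent correlation coefficients.

    Returns the Z statistic and its two-sided p-value.
    """
    z1 = math.atanh(r1)  # Fisher transformation of each correlation
    z2 = math.atanh(r2)
    standard_error = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / standard_error
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p

# A sizeable difference (0.8 versus 0.5) with 30 samples per network
print(compare_correlations(0.8, 30, 0.5, 30))
# A smaller difference (0.6 versus 0.5) with the same sample sizes
print(compare_correlations(0.6, 30, 0.5, 30))
```

With 30 samples per group, even a difference as large as 0.8 versus 0.5 is only borderline at the 0.05 level, and a difference of 0.1 is far from significant.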
Acknowledgements
HORVATH LAB
Dissertation work of
Tova Fuller
Jun Dong
Peter Langfelder
Mouse data collaboration
LUSIS LAB
Jake Lusis
Anatole Ghazalpour
Thomas Drake
An R tutorial may be found at:
http://www.genetics.ucla.edu/labs/horvath/CoexpressionNetwork/DifferentialNetworkAnalysis