Reconstructing regulatory networks and signalling pathways from post-genomic data

Dirk Husmeier
Biomathematics & Statistics Scotland
Edinburgh, UK
October 2008
[Diagram: postgenomics, high-throughput data, machine learning, statistical methods]
Overview
• Relevance networks
• Partial correlation analysis
• Graphical Gaussian models
• Differential equation models
• Bayesian networks
Relevance networks
(Butte and Kohane, 2000)
1. Choose a measure of association A(.,.)
2. Define a threshold value tA
3. For all pairs of domain variables (X,Y), compute their association A(X,Y)
4. Connect those variables (X,Y) by an undirected edge whose association A(X,Y) exceeds the predefined threshold value tA
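The four steps above can be sketched in a few lines. This is a minimal illustration, assuming the absolute Pearson correlation as the association measure A and a hand-picked threshold; the function name `relevance_network` and the toy data are hypothetical and not taken from the slides.

```python
import numpy as np

def relevance_network(data, threshold):
    """Return undirected edges (i, j) whose |Pearson correlation| exceeds threshold.

    data: array of shape (n_samples, n_variables).
    """
    # Step 3: association A(X, Y) for all pairs (assumed here: |correlation|).
    assoc = np.abs(np.corrcoef(data, rowvar=False))
    n_vars = assoc.shape[0]
    # Step 4: keep the pairs whose association exceeds the threshold t_A.
    return [(i, j)
            for i in range(n_vars)
            for j in range(i + 1, n_vars)
            if assoc[i, j] > threshold]

# Toy data: variable 1 is a noisy copy of variable 0, variable 2 is unrelated.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
data = np.column_stack([x, x + 0.1 * rng.normal(size=200), rng.normal(size=200)])
edges = relevance_network(data, threshold=0.8)
```

In practice the threshold would be set by bootstrapping or a randomization test rather than by hand.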
Association scores
Threshold
Bootstrapping or randomization test
[Figure: a strong correlation σ12 between nodes 1 and 2 can arise from a ‘direct interaction’, from a ‘common regulator’ X, or from an ‘indirect interaction’ via X]
Shortcomings
Pairwise associations do not take the context of the system into consideration
[Figure: direct interaction, common regulator, indirect interaction, co-regulation]
Overview
• Relevance networks
• Partial correlation analysis
• Graphical Gaussian models
• Differential equation models
• Bayesian networks
Distinguish between direct and indirect interactions
[Figure: direct interaction, common regulator, indirect interaction, co-regulation]
A and B have a low partial correlation
Identify candidate causal genes within the eQTL confidence interval around a marker by (partial) gene expression correlation analysis
Correlation → Partial correlation
Network reconstruction, part 1
• For each gene included in the gene list of an eQTL confidence interval → compute the correlation coefficient with the gene expression profile of the gene affected by the eQTL.
• Test for significant departure from zero via a t-test with Bonferroni correction (threshold p-value: 0.05/n, n: number of genes in the eQTL confidence interval).
• If significant: Identify the gene with the most significant correlation coefficient → Gene 1.
Network reconstruction, part 2
• Compute first-order partial correlation coefficients between the other genes and the gene affected by the eQTL, conditional on Gene 1.
• Test for significant departure from zero via a t-test with Bonferroni correction (threshold p-value: 0.05/(n-1), n: number of genes in the eQTL confidence interval).
• If significant: Identify the gene with the most significant partial correlation coefficient → Gene 2.
Network reconstruction, part 3
• Compute second-order partial correlation coefficients between the other genes and the gene affected by the eQTL, conditional on Genes 1 & 2.
• Test for significant departure from zero via a t-test with Bonferroni correction (threshold p-value: 0.05/(n-2), n: number of genes in the eQTL confidence interval).
• If significant: Identify the gene with the most significant partial correlation coefficient → Gene 3.
• And so on …
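The iterative procedure in parts 1 to 3 can be sketched as a greedy loop. This is an illustrative sketch, not the original implementation: it assumes partial correlations are obtained by regressing out the already selected genes, and that the t-test for a k-th order partial correlation uses N-2-k degrees of freedom. The names `partial_corr` and `select_candidates` and the toy data are hypothetical.

```python
import numpy as np
from scipy import stats

def partial_corr(x, y, cond):
    """Correlation of x and y after regressing both on the columns in cond."""
    design = np.column_stack([np.ones(len(x))] + list(cond))
    rx = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    ry = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    return float(rx @ ry / np.sqrt((rx @ rx) * (ry @ ry)))

def select_candidates(target, candidates, alpha=0.05):
    """Greedy selection; returns the chosen candidate column indices in order."""
    n_obs, chosen = len(target), []
    remaining = list(range(candidates.shape[1]))
    while remaining:
        k = len(chosen)                      # order of the partial correlation
        dof = n_obs - 2 - k
        if dof < 1:
            break
        cond = [candidates[:, c] for c in chosen]
        r = np.array([partial_corr(target, candidates[:, g], cond)
                      for g in remaining])
        t = r * np.sqrt(dof / (1.0 - r**2))
        p = 2.0 * stats.t.sf(np.abs(t), dof)
        best = int(np.argmin(p))
        # Bonferroni threshold: alpha/n, then alpha/(n-1), alpha/(n-2), ...
        if p[best] >= alpha / len(remaining):
            break
        chosen.append(remaining.pop(best))
    return chosen

# Toy data: the target depends on genes 0 and 2, gene 1 is unrelated.
rng = np.random.default_rng(3)
n_obs = 100
genes = rng.normal(size=(n_obs, 3))
target = genes[:, 0] + 0.5 * genes[:, 2] + 0.5 * rng.normal(size=n_obs)
chosen = select_candidates(target, genes)
```

The loop stops as soon as the best remaining gene fails the corrected test, which mirrors the "If significant" condition in each part.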
Shortcomings
• Iterative, heuristic piecemeal approach
• No conditioning on the whole system, but
on a set of pre-selected genes
Overview
• Relevance networks
• Partial correlation analysis
• Graphical Gaussian models
• Differential equation models
• Bayesian networks
Graphical Gaussian Models
$$\pi_{ij} = \frac{-(\Sigma^{-1})_{ij}}{\sqrt{(\Sigma^{-1})_{ii}\,(\Sigma^{-1})_{jj}}}$$
Σ⁻¹: inverse of the covariance matrix
π_ij: partial correlation, i.e. correlation conditional on all other domain variables, Corr(X1,X2|X3,…,Xn)
[Figure: a strong partial correlation π12 between nodes 1 and 2 indicates a direct interaction]
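As a small numerical illustration of the partial correlation formula, the sketch below inverts a covariance matrix and rescales the off-diagonal entries. The chain X1 → X2 → X3 used as toy data is an assumption made for this example only; the function name `partial_correlations` is hypothetical.

```python
import numpy as np

def partial_correlations(cov):
    """pi_ij = -(cov^-1)_ij / sqrt((cov^-1)_ii * (cov^-1)_jj); diagonal set to 1."""
    precision = np.linalg.inv(cov)
    d = np.sqrt(np.diag(precision))
    pcor = -precision / np.outer(d, d)
    np.fill_diagonal(pcor, 1.0)
    return pcor

# Toy chain X1 -> X2 -> X3: X1 and X3 interact only indirectly, via X2.
rng = np.random.default_rng(1)
x1 = rng.normal(size=5000)
x2 = x1 + rng.normal(size=5000)
x3 = x2 + rng.normal(size=5000)
data = np.column_stack([x1, x2, x3])

corr = np.corrcoef(data, rowvar=False)                   # standard correlations
pcor = partial_correlations(np.cov(data, rowvar=False))  # partial correlations
```

Here `corr[0, 2]` is clearly non-zero while `pcor[0, 2]` is close to zero, so only the direct links X1–X2 and X2–X3 survive.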
Problem: #observations < #variables
⇒ Covariance matrix is singular
Shrinkage estimation and the
lemma of Ledoit-Wolf
Summary of the GGM algorithm, part 1
• Partial correlations, as opposed to standard correlations, capture
the influence of the whole system. Mathematically, this is the
correlation between two nodes conditional on the rest of the
system.
• The partial correlations can be computed from the inverse of the
covariance matrix.
• The true covariance matrix is usually unknown → approximated
by the empirical covariance matrix, estimated from the data.
• Empirical covariance matrix → over-fitting problem, can be ill-conditioned or rank-deficient (singular) → inversion impossible.
• Regularization: add the identity matrix, weighted by some
constant, to the empirical covariance matrix → matrix non-singular. Possible problem: bias.
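The regularization step can be illustrated as follows. This sketch assumes a convex combination of the empirical covariance matrix and the identity with a fixed, hand-chosen weight; it does not implement the Ledoit-Wolf choice of the optimal weight. The function name `shrink_covariance` and the weight 0.2 are hypothetical.

```python
import numpy as np

def shrink_covariance(data, weight):
    """(1 - weight) * empirical covariance + weight * identity, weight in (0, 1]."""
    emp = np.cov(data, rowvar=False)
    return (1.0 - weight) * emp + weight * np.eye(emp.shape[0])

# Small-sample setting: 10 observations of 30 variables.
rng = np.random.default_rng(2)
data = rng.normal(size=(10, 30))

emp = np.cov(data, rowvar=False)
rank_emp = np.linalg.matrix_rank(emp)         # rank-deficient, so not invertible
reg = shrink_covariance(data, weight=0.2)
min_eig_reg = np.linalg.eigvalsh(reg).min()   # bounded away from zero
```

The smallest eigenvalue of the regularized matrix is at least the weight, so it can be inverted; the price is a bias towards the identity.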
Summary of the GGM algorithm, part 2
• Set the trade-off parameter so as to minimize the
expected difference between the (unknown) true
covariance matrix and the estimated matrix.
• Statistical decision theory: closed-form expression for
the optimal trade-off parameter.
• Catch: this expression depends on expectation values
with respect to other data sets generated from the same
processes (parallel statistical universes). Cannot be
computed in practice.
• Heuristics: replace expectation values by the actually
observed values or use bootstrapping.
Correlation graph,
150 leading edges from
Arabidopsis thaliana
Partial correlation graph
(CIG), 150 leading edges from
Arabidopsis thaliana
• How do we choose the threshold?
• Can we control the false discovery rate
(FDR)?
Small-sample setting
• Covariance matrix: singular or ill-conditioned → standard partial correlation
estimates cannot be used.
• Empirical finding: sampling distributions
have the same functional form as the
theoretical result.
• Number of degrees of freedom κ no longer
a function of N and G, but needs to be
estimated from the data.
Mixture distribution
Parameter estimation
Control of the FDR
Collaboration with plant biologists at SCRI
Genome-wide microarray experiments for 9 mutant strains of Pba, measured in triplicate
Scale-free networks
GGMs with varying threshold values
Number of nodes versus degree of connectivity on a log-log plot
Steps 1-4: Standard CIG algorithm
Shortcoming
Pairwise interactions conditional on the whole system, but:
no proper scoring of the whole network
[Figure: direct interaction, common regulator, indirect interaction, co-regulation]
P(A,B)=P(A)·P(B)
But: P(A,B|C)≠P(A|C)·P(B|C)
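A toy case makes the two relations concrete. The sketch below assumes a hypothetical co-regulation model in which A and B are independent fair coins and C = A XOR B; this model is chosen purely for illustration and is not taken from the slides.

```python
from itertools import product

# Hypothetical toy model: A and B are independent fair coins, C = A XOR B.
joint = {}
for a, b in product([0, 1], repeat=2):
    joint[(a, b, a ^ b)] = 0.25

def prob(**fixed):
    """Probability that the named variables (A, B, C) take the given values."""
    names = ("A", "B", "C")
    return sum(p for state, p in joint.items()
               if all(state[names.index(k)] == v for k, v in fixed.items()))

# Marginally independent: P(A,B) = P(A) * P(B)
p_ab = prob(A=1, B=1)
p_a_p_b = prob(A=1) * prob(B=1)

# Conditionally dependent: P(A,B|C) != P(A|C) * P(B|C)
p_ab_given_c = prob(A=1, B=1, C=0) / prob(C=0)
p_a_given_c = prob(A=1, C=0) / prob(C=0)
p_b_given_c = prob(B=1, C=0) / prob(C=0)
```

Conditioning on the common target C therefore induces a dependence between A and B that is absent marginally.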
Example
Overview
• Relevance networks
• Partial correlation analysis
• Graphical Gaussian models
• Differential equation models
• Bayesian networks
Description of the system with ODEs
Alternative pathways
Model selection under the assumption that
all the kinetic rate parameters are known
• Given a time series of gene expression
profiles x(1) … x(t) … x(T)
• For all time points t=1 … T: Numerically
integrate the ODEs to obtain the predictor
z(t) for x(t) from x(t-1).
• Find the optimal network structure that
minimizes the deviation Σ [z(t)-x(t)]²
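The one-step-ahead scoring described above can be sketched as follows. This is a minimal illustration under strong assumptions: linear kinetics dx/dt = A·x, known rate parameters, a hand-written Runge-Kutta integrator, and two hypothetical candidate structures for a two-gene system. None of these specifics come from the slides.

```python
import numpy as np

def rk4_step(f, x, dt, n_sub=10):
    """Integrate dx/dt = f(x) over an interval dt with classical Runge-Kutta."""
    h = dt / n_sub
    for _ in range(n_sub):
        k1 = f(x)
        k2 = f(x + 0.5 * h * k1)
        k3 = f(x + 0.5 * h * k2)
        k4 = f(x + h * k3)
        x = x + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    return x

def deviation(series, f, dt):
    """Sum over t of [z(t) - x(t)]^2, with z(t) predicted from x(t-1)."""
    return sum(float(np.sum((rk4_step(f, series[t - 1], dt) - series[t]) ** 2))
               for t in range(1, len(series)))

# Two hypothetical candidate structures with known rate parameters.
A_true = np.array([[-1.0, 0.0], [0.8, -0.5]])   # gene 1 activates gene 2
A_alt = np.array([[-1.0, 0.0], [0.0, -0.5]])    # no interaction

# Simulate a noisy time series from the first structure.
dt = 0.2
series = [np.array([1.0, 0.0])]
for _ in range(20):
    series.append(rk4_step(lambda x: A_true @ x, series[-1], dt))
rng = np.random.default_rng(6)
series = np.array(series) + 0.01 * rng.normal(size=(21, 2))

dev_true = deviation(series, lambda x: A_true @ x, dt)
dev_alt = deviation(series, lambda x: A_alt @ x, dt)
```

With known parameters, the structure that generated the data yields the smaller deviation.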
Model selection for unknown parameters: naïve approach
• Given a time series of gene expression profiles x(1) … x(t) … x(T)
• For all time points t=1 … T: Numerically integrate the ODEs to obtain the predictor z(t) for x(t) from x(t-1).
• Find the optimal network structure and the parameters that minimize the deviation Σ [z(t)-x(t)]² → Overfitting: this approach will always select the most complex model
Bayesian model selection
Select the model
with the highest
posterior probability:
This requires an integration over the
whole parameter space:
Illustration: Marginal likelihood
and Ockham factor
Numerical integration by
sampling from the prior
Model:
Parameters:
Problem: Extremely poor
convergence in high dimensions
Prior distribution
Likelihood function
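Sampling from the prior estimates the marginal likelihood P(D|M) as the average of the likelihood over prior draws. The sketch below assumes a hypothetical one-parameter model (unit-variance Gaussian observations with a Gaussian prior on the mean), chosen because its marginal likelihood is known in closed form. In one dimension the estimate converges well; the poor convergence noted on the slide appears as the number of parameters grows.

```python
import numpy as np

rng = np.random.default_rng(4)
prior_sd = 3.0                                   # assumed prior: theta ~ N(0, 3^2)
data = rng.normal(loc=1.0, scale=1.0, size=5)    # assumed model: x_i ~ N(theta, 1)

def log_likelihood(theta):
    """log P(D | theta) for every entry of theta (array of parameter values)."""
    theta = np.atleast_1d(theta)
    sq = ((data[None, :] - theta[:, None]) ** 2).sum(axis=1)
    return -0.5 * len(data) * np.log(2 * np.pi) - 0.5 * sq

def exact_log_evidence():
    """Closed-form log P(D | M) for this conjugate Gaussian model."""
    n, s, ss = len(data), data.sum(), (data ** 2).sum()
    v = prior_sd ** 2
    return (-0.5 * n * np.log(2 * np.pi) - 0.5 * np.log(1 + n * v)
            - 0.5 * (ss - v * s ** 2 / (1 + n * v)))

# Monte Carlo: P(D|M) is approximately the mean likelihood over prior samples.
theta = rng.normal(0.0, prior_sd, size=200_000)
log_evidence_mc = np.logaddexp.reduce(log_likelihood(theta)) - np.log(len(theta))
log_evidence_exact = exact_log_evidence()
```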
Numerical integration by
sampling from the posterior
Model:
Parameters:
Problem: Poor convergence in high
dimensions and instability
Prior distribution
Sampling from
the peaks
Likelihood function
Posterior distribution
Main contributions to the integral from the valleys
Importance sampling
Arbitrary (possibly unnormalized) distribution
sampled from
Illustration of annealed
importance sampling
Posterior distribution
Prior distribution
Taken from the MSc thesis by Ben Calderhead,
Annealed importance sampling
Gradually transform the prior (0) into the posterior (1)
Prior
Posterior
How to sample from
?
Generate samples from a Markov chain with a transition
kernel T that leaves
invariant. A sufficient
condition for this to hold is the condition of detailed balance:
Annealed importance sampling
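A compact sketch of annealed importance sampling is given below. It assumes intermediate distributions of the form prior × likelihood^β with β rising from 0 (prior) to 1 (posterior), and a random-walk Metropolis kernel, which satisfies detailed balance with respect to each intermediate distribution. The toy Gaussian model, the temperature schedule and the step size are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(5)
prior_sd = 3.0                                    # assumed prior: theta ~ N(0, 3^2)
data = rng.normal(loc=1.0, scale=1.0, size=20)    # assumed model: x_i ~ N(theta, 1)
n = len(data)

def log_lik(theta):
    return (-0.5 * n * np.log(2 * np.pi)
            - 0.5 * ((data[None, :] - theta[:, None]) ** 2).sum(axis=1))

def log_prior(theta):
    return -0.5 * np.log(2 * np.pi * prior_sd ** 2) - 0.5 * (theta / prior_sd) ** 2

def exact_log_evidence():
    v, s, ss = prior_sd ** 2, data.sum(), (data ** 2).sum()
    return (-0.5 * n * np.log(2 * np.pi) - 0.5 * np.log(1 + n * v)
            - 0.5 * (ss - v * s ** 2 / (1 + n * v)))

def ais_log_evidence(n_chains=2000, n_temps=200, step=0.5):
    # Schedule with finer steps near the prior (a design choice, not prescribed).
    betas = np.linspace(0.0, 1.0, n_temps + 1) ** 4
    theta = rng.normal(0.0, prior_sd, size=n_chains)   # start from the prior
    log_w = np.zeros(n_chains)
    for b_prev, b in zip(betas[:-1], betas[1:]):
        # Importance weight update for moving from beta = b_prev to beta = b.
        log_w += (b - b_prev) * log_lik(theta)
        # One Metropolis step that leaves prior * likelihood^b invariant.
        prop = theta + step * rng.normal(size=n_chains)
        log_ratio = (log_prior(prop) + b * log_lik(prop)
                     - log_prior(theta) - b * log_lik(theta))
        accept = np.log(rng.uniform(size=n_chains)) < log_ratio
        theta = np.where(accept, prop, theta)
    # The marginal likelihood is estimated by the mean importance weight.
    return np.logaddexp.reduce(log_w) - np.log(n_chains)

ais_estimate = ais_log_evidence()
ais_exact = exact_log_evidence()
```

For an ODE model, every likelihood evaluation in the loop would require a numerical solution of the ODEs, which is what makes the procedure expensive.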
Marginal likelihoods for the
alternative pathways
Shortcomings of the method
• Each step of the AIS procedure requires a
numerical solution of the ODEs.
• Computationally extremely expensive
• Applicable only for selecting a pathway out
of a small set of candidate pathways
• Not yet applicable for learning novel
network structures “from scratch”.
Overview
• Relevance networks
• Partial correlation analysis
• Graphical Gaussian models
• Differential equation models
• Bayesian networks
Bayesian networks
• Marriage between graph theory and probability theory.
• Directed acyclic graph (DAG) representing conditional independence relations.
• It is possible to score a network in light of the data: P(D|M), D: data, M: network structure.
• We can infer how well a particular network explains the observed data.
[Figure: DAG with nodes A, B, C, D, E, F connected by directed edges]
$$P(A,B,C,D,E,F) = P(A)\cdot P(B\mid A)\cdot P(C\mid A)\cdot P(D\mid B,C)\cdot P(E\mid D)\cdot P(F\mid C,D)$$
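The factorization can be checked numerically. The sketch below encodes the parent sets read off the formula and uses a made-up conditional probability rule for binary nodes; the rule in `p_one` is purely hypothetical and serves only to show that the product of local conditionals defines a valid joint distribution.

```python
from itertools import product

# Parent sets taken from the factorization of P(A, B, C, D, E, F).
parents = {"A": [], "B": ["A"], "C": ["A"],
           "D": ["B", "C"], "E": ["D"], "F": ["C", "D"]}

def p_one(node, parent_values):
    """Hypothetical CPD: P(node = 1 | parents) rises with the number of active parents."""
    return (1 + sum(parent_values)) / (2 + len(parent_values))

def joint_probability(assignment):
    """Product of the local conditional probabilities for one full assignment."""
    prob = 1.0
    for node, pa in parents.items():
        p1 = p_one(node, [assignment[q] for q in pa])
        prob *= p1 if assignment[node] == 1 else 1.0 - p1
    return prob

# Summing over all 2^6 binary assignments must give 1.
nodes = list(parents)
total = sum(joint_probability(dict(zip(nodes, values)))
            for values in product([0, 1], repeat=len(nodes)))
```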
Model
Parameters
Multinomial CPD
Linear Gaussian CPD
Bayes
net
ODE
model
Learning Bayesian networks
P(M|D) = P(D|M) P(M) / Z
M: Network structure. D: Data
Madigan & York (1995), Giudici & Castelo (2003)
Easy to compute for
Bayesian nets
Computationally very expensive
for ODE models
Ongoing research: Improve mixing and
convergence of the MCMC simulations
Causal structure of pathways
Bayes
net
ODE
model
Bayesian networks versus
causal networks
Bayesian networks represent conditional (in)dependence
relations - not necessarily causal interactions.
• Hidden variables
• Equivalence classes
[Figure: true causal graph over nodes A, B and C, and the same graph with node A unknown (hidden)]
Equivalent network structures
Equivalence classes
Break symmetry of equivalence classes
Interventional data
A and B are correlated: A → B or B → A?
Inhibition of A:
• If A → B: down-regulation of B
• If B → A: no effect on B
$$P(D \mid M) = \prod_{i=1}^{n} P\big(X_i = D_i \mid \mathrm{pa}[X_i] = D_{\mathrm{pa}[X_i]}\big) \;\rightarrow\; \prod_{i=1}^{n} P\big(X_i = D_i^{-\{i\}} \mid \mathrm{pa}[X_i] = D_{\mathrm{pa}[X_i]}^{-\{i\}}\big)$$
How can we model feedback loops?
Not a valid DAG
Unfolding in time →
Dynamic Bayesian network
Valid DAG
• Homogeneous Markov chain
• We need time series data
Performance comparison
Bayesian networks versus
Graphical Gaussian models
Score based versus constraint based
inference
Evaluation
on the Raf signalling pathway
[Figure legend: receptor molecules, cell membrane, activation, inhibition, interaction in signalling pathway, phosphorylated protein]
From Sachs et al Science 2005
Evaluation:
Raf signalling pathway
• Cellular signalling network of 11 phosphorylated proteins and phospholipids in human immune system cells
• Deregulation → carcinogenesis
• Extensively studied in the literature → gold standard network
Flow cytometry data
• Intracellular multicolour flow cytometry experiments: concentrations of 11 proteins
• 5400 cells have been measured under 9 different cellular conditions (cues)
• Downsampling to 100 instances (5 separate subsets): indicative of microarray experiments
Microarray example
• Spellman et al (1998): cell cycle, 73 samples
• Tu et al (2005): metabolic cycle, 36 samples
[Figure: expression heat maps, genes versus time]
In addition to experimental data:
Comparison with simulated data
Raf pathway
Comparison with simulated data 2
Steady-state approximation
[Figure: the “true” network compared with the reconstructed network, with the ranking of edges, and with the thresholded network]
Performance evaluation:
ROC curves
•We use the Area Under the Receiver Operating
Characteristic Curve (AUC).
AUC=0.5
AUC=1
0.5<AUC<1
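The AUC for a ranked list of edges can be computed without drawing the curve, because it equals the probability that a randomly chosen true edge scores higher than a randomly chosen non-edge. The sketch below assumes edge scores and a gold-standard labelling are available as flat lists; the function name `roc_auc` and the example scores are hypothetical.

```python
import numpy as np

def roc_auc(scores, is_true_edge):
    """Area under the ROC curve via pairwise comparison (ties count one half)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(is_true_edge, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

# Perfect ranking: both true edges score above both non-edges.
auc_perfect = roc_auc([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0])
# Imperfect ranking: one true edge is ranked below a non-edge.
auc_mixed = roc_auc([0.9, 0.2, 0.8, 0.1], [1, 1, 0, 0])
```

A random ranking gives an expected AUC of 0.5 and a perfect ranking gives 1.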
Alternative performance evaluation:
True positive (TP) scores
BN
GGM
RN
5 FP counts
Directed graph evaluation - DGE
Undirected graph evaluation - UGE
Synthetic data, observations
Synthetic data, interventions
Cytometry data, observations
Cytometry data, interventions
Complications with real data
• Putative negative feedback loops:
Can we trust our gold-standard network?
• Interventions might not be “ideal” owing
to negative feedback loops.
Disputed structure of the
gold-standard network
Regulation of Raf-1 by Direct
Feedback Phosphorylation. Molecular
Cell, Vol. 17, 2005 Dougherty et al
Stabilisation
through negative
feedback loops
Inhibition
Conclusions 1
• BNs and GGMs outperform RNs, most
notably on Gaussian data.
• No significant difference between BNs and
GGMs on observational data.
• For interventional data, BNs clearly
outperform GGMs and RNs, especially
when taking the edge direction (DGE
score) rather than just the skeleton (UGE
score) into account.
Conclusions 2
Performance on synthetic data better
than on real data.
• Real data: more complex
• Real interventions might not be ideal
• Possible errors in the gold-standard
network
Deviation between the network G and the prior knowledge B: “Energy”
• Graph: entries ∈ {0,1}
• Prior knowledge: entries ∈ [0,1]
Prior distribution over networks
Hyperparameter
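One way to make the energy and the prior concrete is sketched below. The formulas on this slide did not survive extraction, so the sketch assumes a commonly used form: the energy is the sum of absolute differences between the entries of B and G, and the prior over networks is proportional to exp(−β · energy) with hyperparameter β. Both formulas, the function names and the two-node example are assumptions, not the slide's own definitions.

```python
import numpy as np

def energy(graph, prior_knowledge):
    """Assumed energy: sum over all edges of |B_ij - G_ij|."""
    return float(np.abs(prior_knowledge - graph).sum())

def log_prior_unnormalised(graph, prior_knowledge, beta):
    """Assumed Gibbs form: log P(G | beta) = -beta * E(G) - log Z(beta)."""
    return -beta * energy(graph, prior_knowledge)

# Hypothetical two-node example: prior knowledge favours the edge 0 -> 1.
B = np.array([[0.0, 0.9],
              [0.1, 0.0]])
G_agree = np.array([[0, 1], [0, 0]])      # contains 0 -> 1
G_disagree = np.array([[0, 0], [1, 0]])   # contains 1 -> 0 instead

e_agree = energy(G_agree, B)
e_disagree = energy(G_disagree, B)
```

Under this form, β = 0 makes all networks equally probable a priori, and larger β concentrates the prior on networks that agree with B.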
Multiple sources of prior knowledge
Sample networks and hyperparameters from
the posterior distribution
Bayesian networks
with two sources of prior knowledge
[Figure: the data and two sources of prior knowledge (Source 1, Source 2) enter BNs + MCMC, which returns the recovered networks and the trade-off parameters β1 and β2]
Sample networks and hyperparameters from
the posterior distribution with MCMC
Proposal probabilities
Metropolis-Hastings scheme
Prior distribution
Energy of a network
Rewriting the energy
Approximation of the partition function
Partition function of an ideal gas
Evaluation
on the Raf regulatory network
From Sachs et al Science 2005
Prior knowledge from KEGG
Prior distribution
[Figure: Raf network annotated with prior knowledge values in [0,1] derived from KEGG]
Data and prior knowledge
+ KEGG
+ Random
Evaluation
• Can the method automatically
evaluate how useful the different
sources of prior knowledge are?
• Do we get an improvement in the
regulatory network reconstruction?
• Is this improvement optimal?
Sampled values of the
hyperparameters
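Bayesian networks
with two sources of prior knowledge
[Figure: the data and two sources of prior knowledge (Random, KEGG) enter BNs + MCMC, which returns the recovered networks and the trade-off parameters β1 (Random) and β2 (KEGG)]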
Flow cytometry data
and KEGG
Learning the trade-off hyperparameter
Mean and standard deviation of the sampled trade-off parameter
• Repeat MCMC simulations for a large set of fixed hyperparameters β
• Obtain AUC scores for each value of β
• Compare with the proposed scheme in which β is automatically inferred.
[Figure: monolithic versus individual treatment of the data sets]
Propose a compromise between the two previous ways of combining the data
[Figure: network M* linked to networks M1, M2, ..., MI, with data sets D1, D2, ..., DI, via hyperparameters β1, β2, ..., βI]
BGe or BDe
Ideal gas approximation
MCMC
Result 1: Successful detection of
corrupted data sets
[Figure: M* linked to M1, M2, ..., MI via β1, β2, ..., βI; one value is marked 0]
Result 2: Improved network
reconstruction accuracy
Summary
• Relevance networks
• Partial correlation analysis
• Graphical Gaussian models
• Differential equation models
• Bayesian networks
Thank you!
Any questions?