A Tempering Algorithm For Large-sample Network Inference
D. Barker¹, S. Hill¹ and S. Mukherjee²
¹ Complexity Science DTC, ² Dept. of Statistics
University of Warwick, England, CV4 7AL
MIR@W - 7 March 2011
Background
• Networks (e.g. of genes, proteins, metabolites) are an important notion in current biology.
• Probabilistic Graphical Models (PGMs) are a key approach.
• RTK is an example of a signalling network.
(Weinberg 2007; Yarden & Sliwkowski 2001)
Bayesian Networks
• Stochastic models where a graph is used to describe probabilistic relationships between components.
• Graph specifies conditional independence statements.
• In some frameworks the graph must be directed and acyclic (DAG).
Special cases include HMMs, Bayesian Networks (BNs), Dynamic BNs.
[Figure: example network with nodes A–N]
Structural Inference
Interested in inference of the graph G given some observed data X.
The posterior probability over graphs G is given by Bayes' Theorem:
P(G|X) ∝ P(X|G)P(G)
This has closed form, up to the proportionality constant, for certain choices of underlying model.
Maximising P(G|X) can have robustness problems: if the posterior has several highly scoring graphs, how do we choose between them?
• For this reason we use model averaging.
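As a concrete illustration of such a graph score, here is a minimal sketch in Python. The model in the talk has a closed-form marginal likelihood P(X|G); the BIC-penalised Gaussian node score below is only an assumed stand-in, chosen because it factorises over (node, parent-set) pairs in the same way.

```python
import numpy as np

def node_log_score(X, child, parents):
    """BIC-penalised Gaussian log-likelihood of one node given its parents.
    A stand-in for the closed-form marginal likelihood used in the talk; it
    illustrates that the score factorises over (node, parent-set) pairs."""
    n = X.shape[0]
    y = X[:, child]
    Z = np.column_stack([np.ones(n)] + [X[:, q] for q in parents])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta
    sigma2 = max(resid @ resid / n, 1e-12)          # MLE of the noise variance
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1.0)
    return loglik - 0.5 * Z.shape[1] * np.log(n)    # BIC complexity penalty

def graph_log_score(X, adj):
    """Unnormalised log posterior log P(X|G) + log P(G) for a DAG with adjacency
    matrix adj (adj[i, j] = 1 means an edge i -> j); with a uniform prior the
    prior term is a constant and can be dropped."""
    p = adj.shape[0]
    return sum(node_log_score(X, j, np.flatnonzero(adj[:, j])) for j in range(p))
```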
Model Averaging
The probability E(e) of seeing an edge e, averaged over all graphs G, is more robust (see the sketch below).
• Edges which repeatedly appear in likely graphs have high E(e).
Computing the proportionality constant requires enumeration of the whole p-node DAG space 𝒢.
• 𝒢 grows super-exponentially with p.
Thus we must use MCMC to estimate the posterior probabilities P(G|X).
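For very small p the average can be computed exactly; a minimal sketch, assuming the graphs and their unnormalised log scores (e.g. from the stand-in score above) have already been enumerated:

```python
import numpy as np

def edge_posteriors(graphs, log_scores):
    """Posterior edge probabilities E(e) by averaging over an enumerated set of
    DAGs.  `graphs` is a list of p x p 0/1 adjacency matrices, `log_scores` the
    corresponding unnormalised log posteriors log P(X|G) + log P(G).
    Only feasible for small p, since the DAG space grows super-exponentially."""
    log_scores = np.asarray(log_scores, dtype=float)
    w = np.exp(log_scores - log_scores.max())       # stabilise before normalising
    w /= w.sum()                                    # w[k] = P(G_k | X)
    return sum(wk * G for wk, G in zip(w, graphs))  # E(e_ij) = sum_G P(G|X) G_ij
```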
Monte Carlo
• Move around 𝒢 by performing elementary moves on the current graph G: edge addition, deletion and reversal.
• Accept or reject the new graph G′ based on the MH acceptance probability (for uniform priors; a sketch of one step follows below):
α = P(X|G′) |η(G)| / ( P(X|G) |η(G′)| )
• The neighbourhood η(G) is the set of all graphs reachable from G by a single move.
• This sampler is called MC³ (Madigan & York 1995).
The estimate of the posterior probability is given by
P̂(G|X) = (1/t_max) Σ_{t=1}^{t_max} I( g^(t) = G )
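A minimal sketch of one such step, assuming a `log_score(G)` callable that returns log P(X|G) + log P(G) (for instance a closure around the stand-in score above; with a uniform prior this reduces to the α on the slide). The neighbourhood is enumerated explicitly, which is fine for small p; a real implementation would maintain it incrementally.

```python
import numpy as np

def is_dag(adj):
    """Acyclicity check: repeatedly strip nodes with no incoming edges (Kahn)."""
    A = adj.astype(bool)
    alive = np.ones(A.shape[0], dtype=bool)
    while alive.any():
        roots = alive & ~A[alive].any(axis=0)   # alive nodes with no alive parents
        if not roots.any():
            return False                        # only cycles remain
        alive &= ~roots
    return True

def neighbours(adj):
    """All DAGs reachable from adj by one edge addition, deletion or reversal."""
    p = adj.shape[0]
    out = []
    for i in range(p):
        for j in range(p):
            if i == j:
                continue
            if adj[i, j]:                           # edge i -> j present
                D = adj.copy(); D[i, j] = 0         # deletion always leaves a DAG
                out.append(D)
                R = adj.copy(); R[i, j], R[j, i] = 0, 1   # reversal
                if is_dag(R):
                    out.append(R)
            elif not adj[j, i]:                     # addition of i -> j
                A = adj.copy(); A[i, j] = 1
                if is_dag(A):
                    out.append(A)
    return out

def mc3_step(adj, log_score, rng):
    """One MC^3 (structure MH) step with a uniform proposal over eta(G)."""
    nbrs = neighbours(adj)
    prop = nbrs[rng.integers(len(nbrs))]
    # alpha = P(X|G') |eta(G)| / ( P(X|G) |eta(G')| ), on the log scale
    log_alpha = (log_score(prop) - log_score(adj)
                 + np.log(len(nbrs)) - np.log(len(neighbours(prop))))
    return prop if np.log(rng.random()) < log_alpha else adj
```

Averaging indicator functions of the sampled graphs g^(t), or simply averaging their adjacency matrices, then gives the Monte Carlo estimates of P(G|X) and E(e).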
Sample Size
Having more data is clearly a good thing.
• High-throughput experiments, FACS, social science, etc.
Caution! In certain situations a large sample size N can cause problems: convergence to the correct stationary distribution can be slow.
[Figure: mean entropy ⟨H[p]⟩ of the posterior against number of data N, for a 4-node system]
Motivation
• Posterior for a p = 4 node system with two different sample sizes, N = 5 and N = 10.
• Posterior mass concentrates on a few highly likely graphs.
• If these are hard to get between, Markov chain mixing is slow.
Note: As N → ∞ we pick out all graphs from the correct data-generating class.
[Figure: posterior P(G|X) plotted over graphs G for the two sample sizes]
Tempering
• Aim is to allow the Markov chain (MC) to move between high scoring graphs.
• Natural idea is to use tempering.
• Couple higher-temperature MCs to the one with the desired posterior.
Temperature analogy achieved by raising the posterior score to the power β = 1/T:
P(G|X)^β ∝ ( P(X|G)P(G) )^β
Tempering Scheme
Set up m MCs at temperatures T₁, ..., Tₘ.
MCs at higher temperature can explore the space more freely.
• Each chain is simulated using the standard MH scheme.
Every iteration, with probability p_swap, swap graphs between randomly chosen neighbouring chains i and j.
• Accept the swap with probability ρ.
Swapping probability (a sketch of the swap move follows below):
ρ = [ P(X|G_j)P(G_j) ]^{β_i} [ P(X|G_i)P(G_i) ]^{β_j} / ( [ P(X|G_i)P(G_i) ]^{β_i} [ P(X|G_j)P(G_j) ]^{β_j} )
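A minimal sketch of the swap move, assuming each chain has just performed its own tempered MH update (acceptance raised to the power β_i, e.g. an `mc3_step` with the score multiplied by β_i, not shown here) and a `log_score(G)` callable returning log P(X|G) + log P(G):

```python
import numpy as np

def pt_swap(states, betas, log_score, rng, p_swap=0.1):
    """One parallel-tempering swap attempt over m coupled chains.
    `states` holds the current graph of each chain and `betas` the inverse
    temperatures beta_i = 1/T_i.  With probability p_swap, propose exchanging
    the graphs of a randomly chosen neighbouring pair of chains."""
    if rng.random() < p_swap:
        i = int(rng.integers(len(states) - 1))      # neighbouring pair (i, i+1)
        si, sj = log_score(states[i]), log_score(states[i + 1])
        # log rho = (beta_i - beta_{i+1}) * (log score_{i+1} - log score_i),
        # which is the slide's ratio rho on the log scale
        if np.log(rng.random()) < (betas[i] - betas[i + 1]) * (sj - si):
            states[i], states[i + 1] = states[i + 1], states[i]
    return states
```

Only the chain at β = 1 (T = 1) samples from the desired posterior; the hotter chains exist purely to ferry graphs between posterior modes.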
Simulation
First we examine performance on synthetic data generated from the known network shown earlier.
Data are generated using
• A ∼ N(0, σ) for parent nodes.
• C ∼ N(A + B + γAB, σ) for child nodes (with parents A and B).
Since we know the underlying graph from which the data were generated, we can draw ROC curves...
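A minimal sketch of this generating scheme (the σ and γ defaults are illustrative only, not values stated in the talk):

```python
import numpy as np

def topological_order(adj):
    """Order nodes so every parent precedes its children (assumes a DAG)."""
    order, remaining = [], set(range(adj.shape[0]))
    while remaining:
        roots = [j for j in remaining if not any(adj[i, j] for i in remaining)]
        if not roots:
            raise ValueError("graph contains a cycle")
        order.extend(roots)
        remaining -= set(roots)
    return order

def simulate(adj, n, sigma=1.0, gamma=0.5, rng=None):
    """Draw n samples: parent-less nodes are N(0, sigma); a child node gets the
    sum of its parents plus gamma * (pairwise product of parents) interaction
    terms, plus N(0, sigma) noise.  adj[i, j] = 1 means an edge i -> j."""
    rng = rng or np.random.default_rng()
    p = adj.shape[0]
    X = np.zeros((n, p))
    for j in topological_order(adj):
        parents = np.flatnonzero(adj[:, j])
        mean = X[:, parents].sum(axis=1)
        for a in range(len(parents)):            # interaction terms, e.g. gamma*A*B
            for b in range(a + 1, len(parents)):
                mean = mean + gamma * X[:, parents[a]] * X[:, parents[b]]
        X[:, j] = mean + sigma * rng.standard_normal(n)
    return X
```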
ROC Curves
Curves are parametrised by a threshold t; keep in the output graph all edges with E(e) > t.
[Figure: ROC curve (number of true positives vs number of false positives) for N = 5000, comparing PT, MH and CB]
Tempering has picked up fewer false positive edges compared to standard MC³ for the same number of true edges. (Xie & Geng, JMLR 2008)
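A minimal sketch of how such a curve can be traced from the estimated edge probabilities and the known generating graph (both taken here as p x p matrices, with self-loops ignored):

```python
import numpy as np

def roc_points(edge_prob, true_adj):
    """Sweep the threshold t: for each t keep every edge with E(e) > t and count
    true/false positives against the known generating graph."""
    p = true_adj.shape[0]
    off_diag = ~np.eye(p, dtype=bool)
    probs = edge_prob[off_diag]
    truth = true_adj[off_diag].astype(bool)
    fps, tps = [], []
    for t in np.unique(np.concatenate(([0.0, 1.0], probs))):
        keep = probs > t
        tps.append(int(np.sum(keep & truth)))
        fps.append(int(np.sum(keep & ~truth)))
    return np.array(fps), np.array(tps)
```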
ROC Curves
[Figure: ROC curves (number of true positives vs number of false positives) for N = 500 and N = 5000, comparing PT, MH and CB]
Proteomic Data
We examine here the application to inferring the underlying DBN from a set of proteomic data.
Due to a certain factorisation for DBNs we can calculate exact edge probabilities.
• Gives us a gold-standard comparison!
We examine (see the sketch below):
• The correlation ρ between the exact and MC-estimated edge probabilities.
• The normalised sum difference s between the exact and MC-estimated edge probabilities.
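A minimal sketch of these two comparison metrics; the precise definition of s is not spelled out on the slide, so the mean absolute difference over edges is used here as an assumed form:

```python
import numpy as np

def edge_prob_metrics(exact, estimated):
    """Agreement between exact and MC-estimated edge probabilities: the Pearson
    correlation rho, and a normalised sum difference s (assumed here to be the
    mean absolute difference over edges)."""
    exact = np.asarray(exact, dtype=float).ravel()
    estimated = np.asarray(estimated, dtype=float).ravel()
    rho = np.corrcoef(exact, estimated)[0, 1]
    s = np.abs(exact - estimated).mean()
    return rho, s
```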
Edge Probabilities
[Figure: mean correlation ⟨ρ⟩ and mean normalised sum difference ⟨s⟩ between exact and MC-estimated edge probabilities, against the number of MC steps (MCS), for PT and MH]
T = 1.0, 1.25, 1.5, 1.75, 2.0 and p_swap = 0.1, averaged over 4 runs.
Edge Probabilities
If we look at the individual edge probabilities we see better performance (closer to x = y) for tempering:
[Figure: estimated E(e) against exact E(e), for MC³ and MC⁴]
The toughest edges to infer are significantly better estimated by using tempering.
Conclusions
• As sample size increases, posterior mass can concentrate around several graphs that are hard to move between.
• Widely employed MCMC schemes can fail to estimate edges properly in these increasingly common situations.
• Counter this by using higher-temperature chains coupled to the desired posterior: parallel tempering (PT).
• Important for drawing robust conclusions from data in a wide range of fields.
Acknowledgements
I would like to thank my co-authors
• Sach Mukherjee,
• Steven Hill,
as well as
• R. Goudie
• M. Nicodemi
• N. Podd
for useful discussions, and EPSRC for funding...
And finally, thank you for listening.