A Tempering Algorithm For Large-sample Network Inference
D. Barker¹, S. Hill¹ and S. Mukherjee²
¹ Complexity Science DTC, ² Dept. of Statistics
University of Warwick, England CV4 7AL
MIR@W - 7 March 2011

Background
• Networks (e.g. of genes, proteins, metabolites) are an important notion in current biology.
• Probabilistic Graphical Models (PGMs) are a key approach.
• The RTK signalling network is an example. (Weinberg 2007, Yarden & Sliwkowski 2001)

Bayesian Networks
• Stochastic models in which a graph is used to describe the probabilistic relationships between components.
[Figure: example directed graph on nodes A–N]
• The graph specifies conditional independence statements.
• In some frameworks the graph must be directed and acyclic (a DAG). Special cases include HMMs, Bayesian Networks (BNs) and Dynamic BNs.

Structural Inference
We are interested in inference of the graph G given some observed data X. The posterior probability over graphs G is given by Bayes' theorem:

  P(G|X) ∝ P(X|G)P(G)

This has closed form, up to the proportionality constant, for certain choices of underlying model. Maximising P(G|X) can have robustness problems: if the posterior has several highly scoring graphs, how do we choose between them?
• For this reason we use model averaging.

Model Averaging
The probability E(e) of seeing an edge e, averaged over all graphs G, is more robust:

  E(e) = Σ_G I(e ∈ G) P(G|X)

• Edges which repeatedly appear in likely graphs have high E(e).
Knowledge of the proportionality constant requires enumeration of the whole p-node DAG space G.
• G grows super-exponentially with p.
Thus we must use MCMC to estimate the posterior probabilities P(G|X).

Monte Carlo
• Move around G by performing elementary moves on the current graph G: edge addition, deletion and reversal.
• Accept or reject the new graph G′ based on the MH acceptance probability (for uniform priors):

  α = P(X|G′) |η(G)| / ( P(X|G) |η(G′)| )

The neighbourhood η(G) is the set of all graphs reachable from G in one move. This scheme is called MC3 (Madigan & York 1995). The estimate of the posterior probability is

  P̂(G|X) = (1/t_max) Σ_{t=1}^{t_max} I(g^(t) = G)

(A minimal code sketch of this sampler follows the Motivation slide below.)

Sample Size
Having more data is clearly a good thing.
• High-throughput experiments, FACS, social science, etc.
Caution! In certain situations a large sample size N can cause problems: convergence to the correct stationary distribution can be slow.
[Figure: mean entropy ⟨H[p]⟩ against number of data N, for a 4-node system]

Motivation
• Posterior for a p = 4 node system with two different sample sizes, N = 5 and N = 10.
• Posterior mass concentrates on a few highly likely graphs.
• If these are hard to get between, Markov chain mixing is slow.
[Figure: posterior P(G|X) over graphs G at the two sample sizes]
Note: as N → ∞ we pick out all graphs from the correct data-generating class.
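To make the MC3 scheme concrete, here is a minimal sketch in Python. It is only a sketch: `log_score` stands in for the model-specific log marginal likelihood log P(X|G) (assumed available in closed form, as on the Structural Inference slide), the graph prior is taken as uniform, and all function names and the toy target are illustrative rather than the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def is_dag(adj):
    # A directed graph is acyclic iff trace(A^k) = 0 for k = 1..p,
    # since trace(A^k) counts closed walks (cycles) of length k.
    p = len(adj)
    power = adj.astype(float)
    for _ in range(p):
        if np.trace(power) > 0:
            return False
        power = power @ adj
    return True

def neighbours(adj):
    # eta(G): all DAGs reachable by one edge addition, deletion or reversal.
    out, p = [], len(adj)
    for i in range(p):
        for j in range(p):
            if i == j:
                continue
            g = adj.copy()
            if adj[i, j]:                 # existing edge i -> j
                g[i, j] = 0
                out.append(g.copy())      # deletion always keeps a DAG
                g[j, i] = 1
                if is_dag(g):
                    out.append(g)         # reversal
            elif not adj[j, i]:
                g[i, j] = 1
                if is_dag(g):
                    out.append(g)         # addition
    return out

def mc3(log_score, adj, n_iter):
    # Single-chain MC3 (Madigan & York 1995): propose a uniform
    # neighbour G' and accept with probability
    #   min(1, P(X|G') |eta(G)| / (P(X|G) |eta(G')|)),
    # accumulating edge frequencies for the model-averaged E(e).
    edge_freq = np.zeros_like(adj, dtype=float)
    cur_nbrs, cur_lp = neighbours(adj), log_score(adj)
    for _ in range(n_iter):
        prop = cur_nbrs[rng.integers(len(cur_nbrs))]
        prop_nbrs, prop_lp = neighbours(prop), log_score(prop)
        log_alpha = (prop_lp - cur_lp
                     + np.log(len(cur_nbrs)) - np.log(len(prop_nbrs)))
        if np.log(rng.random()) < log_alpha:
            adj, cur_nbrs, cur_lp = prop, prop_nbrs, prop_lp
        edge_freq += adj
    return edge_freq / n_iter             # estimated E(e) for each edge

# Toy usage with a stand-in score that favours a fixed target DAG.
target = np.zeros((4, 4), dtype=int)
target[0, 1] = target[1, 2] = 1
log_score = lambda g: -5.0 * np.abs(g - target).sum()
print(np.round(mc3(log_score, np.zeros((4, 4), dtype=int), 2000), 2))
```

Note that the neighbourhood size |η(G)| enters the acceptance ratio because the proposal (uniform over neighbours) is not symmetric between graphs with differently sized neighbourhoods.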
Tempering
• The aim is to allow the Markov chain (MC) to move between high-scoring graphs.
• A natural idea is to use tempering.
• Couple higher-temperature MCs to the one with the desired posterior.
The temperature analogy is achieved by raising the posterior score to the power β = 1/T:

  P(G|X)^β ∝ ( P(X|G)P(G) )^β

Tempering Scheme
Set up m MCs at temperatures T1, ..., Tm. MCs at higher temperatures can explore the space more freely.
• Each chain is simulated using the MH scheme above.
• Every iteration, with probability p_swap, propose swapping graphs between randomly chosen neighbouring chains i and j.
• Accept the swap with probability

  ρ = ( P(X|G_j)P(G_j) )^{β_i} ( P(X|G_i)P(G_i) )^{β_j} / [ ( P(X|G_i)P(G_i) )^{β_i} ( P(X|G_j)P(G_j) )^{β_j} ]

(A code sketch of this swap step, in the same style as the earlier one, follows the final slide.)

Simulation
First we examine performance on synthetic data generated from the known network shown earlier. Data are generated using
• A ∼ N(0, σ) for parent nodes;
• C ∼ N(A + B + γAB, σ) for child nodes (here with parents A and B).
Since we know the underlying graph from which the data were generated, we can draw ROC curves...

ROC Curves
Curves are parametrised by a threshold t: keep in the output graph all edges with E(e) > t.
[Figure: ROC curve (true positives against false positives) for PT, MH and CB at N = 5000]
Tempering picks up fewer false-positive edges than standard MC3 for the same number of true edges. (Xie & Geng, JMLR 2008)

ROC Curves
[Figure: four ROC panels (true positives against false positives) comparing PT, MH and CB at N = 500 and N = 5000]

Proteomic Data
We examine here the application of inferring the underlying DBN from a set of proteomic data. Due to a certain factorisation for DBNs, we can calculate exact edge probabilities.
• This gives us a gold-standard comparison!
We examine:
• the correlation ρ between the exact and MC-estimated edge probabilities;
• the normalised sum difference s between the exact and MC-estimated edge probabilities.

Edge Probabilities
[Figure: ⟨ρ⟩ and ⟨s⟩ against number of MC steps (MCS, ×10⁵) for PT and MH]
T = 1.0, 1.25, 1.5, 1.75, 2.0 and p_swap = 0.1, averaged over 4 runs.

Edge Probabilities
If we look at the individual edge probabilities, we see better performance (closer to the line x = y) for tempering:
[Figure: estimated E(e) against exact E(e) for MC3 (left) and MC4 (right)]
The toughest edges to infer are significantly better estimated by using tempering.

Conclusions
• As sample size increases, posterior mass can concentrate around several graphs that are hard to move between.
• Widely employed MCMC schemes can fail to estimate edge probabilities properly in these increasingly common situations.
• We counter this by coupling higher-temperature chains to the desired posterior: parallel tempering (PT).
• This matters for drawing robust conclusions from data in a wide range of fields.

Acknowledgements
I would like to thank my co-authors
• Sach Mukherjee,
• Steven Hill,
as well as
• R. Goudie
• M. Nicodemi
• N. Podd
for useful discussions, and EPSRC for funding...
And finally, thank you for listening.
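As promised on the Tempering Scheme slide, here is a minimal sketch of the replica-swap step, in the same Python setting as the MC3 sketch. It assumes each chain tracks the log posterior score log(P(X|G)P(G)) of its current graph; `maybe_swap`, the dummy values and the default p_swap are illustrative, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(1)

def maybe_swap(graphs, log_posts, betas, p_swap=0.1):
    # One swap attempt between a random pair of neighbouring chains.
    # graphs[k], log_posts[k] are chain k's current graph and its
    # log(P(X|G)P(G)); betas[k] = 1/T_k, with betas[0] = 1 the target.
    # In log form the slide's acceptance ratio rho reduces to
    #   log rho = (beta_i - beta_j) * (log_post_j - log_post_i).
    if rng.random() >= p_swap:
        return
    i = rng.integers(len(graphs) - 1)     # neighbouring pair (i, i+1)
    j = i + 1
    log_rho = (betas[i] - betas[j]) * (log_posts[j] - log_posts[i])
    if np.log(rng.random()) < log_rho:
        graphs[i], graphs[j] = graphs[j], graphs[i]
        log_posts[i], log_posts[j] = log_posts[j], log_posts[i]

# Toy usage: five chains at the talk's temperatures, with dummy scores.
betas = [1.0 / t for t in (1.0, 1.25, 1.5, 1.75, 2.0)]
graphs = [f"G{k}" for k in range(5)]
log_posts = [-10.0, -9.0, -12.0, -8.0, -11.0]
for _ in range(1000):
    maybe_swap(graphs, log_posts, betas)
print(graphs, log_posts)
```

A swap never changes the set of graphs held across chains, only which temperature each is attached to, so the cold chain (β = 1) still targets the desired posterior while inheriting moves made at high temperature.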