Non-linear/non-homogeneous dynamic Bayesian networks
Dirk Husmeier

Dynamic Bayesian network
[Figure: dynamic Bayesian network]

Model and parameters
• Model with parameters θ
• The integral over the parameters is analytically tractable!
• BDe: UAI 1994
• BGe: UAI 1995

BDe: Nonlinear discretized model
[Diagram: P with activator P1 and repressor P2; two cases, activation and inhibition]
• Allow for noise: probabilities
• Conditional multinomial distribution

BGe: Linear model
[A] = w1[P1] + w2[P2] + w3[P3] + w4[P4] + noise
[Diagram: P1, P2, P3 and P4 point to A with weights w1, w2, w3 and w4]

Pros and cons of the two models
Linear Gaussian model                  | Multinomial model
• Restriction to linear processes      | • Nonlinear model
• Original data → no information loss  | • Discretization → information loss

Can we get an approximate nonlinear model without data discretization?
[Plot: y against x]
Idea: piecewise linear model

Example: 4 genes, 10 time points
       t1    t2    t3    t4    t5    t6    t7    t8    t9    t10
X(1)   X1,1  X1,2  X1,3  X1,4  X1,5  X1,6  X1,7  X1,8  X1,9  X1,10
X(2)   X2,1  X2,2  X2,3  X2,4  X2,5  X2,6  X2,7  X2,8  X2,9  X2,10
X(3)   X3,1  X3,2  X3,3  X3,4  X3,5  X3,6  X3,7  X3,8  X3,9  X3,10
X(4)   X4,1  X4,2  X4,3  X4,4  X4,5  X4,6  X4,7  X4,8  X4,9  X4,10

Learning with MCMC
• θ: parameters
• h: allocation vector
• k: number of components (here: 3)
• Parameters fixed: complexity of marginalization is k*m
• Parameters not fixed: complexity of marginalization is k^m
• Allocations fixed: parameters can be integrated out

Viral challenge and immune activation of macrophages
Collaboration with DPM (Division of Pathway Medicine, Edinburgh University)
• Cells: macrophages
• Treatment: interferon gamma (IFNγ)
• Infection: cytomegalovirus (CMV)
• 12 hour time course measuring total RNA, 30 min sampling (time axis: 0 to 12 hours)
• 72 Agilent arrays
• 25 samples per group:
  • Infection with CMV
  • Pre-treatment with IFNγ
  • IFNγ + CMV
• Analysis: clustering; time series statistical analysis (using EDGE)

Posterior probability of the number of components (top) and co-allocation of two time points to the same component (bottom)
[Figure panels: infection, treatment, infection+treatment. White = 1, black = 0]

Literature → "Known" interactions between three cytokines: IRF1, IRF2 and IRF3
[Diagram: network over IRF1, IRF2 and IRF3]

Evaluation: Average marginal posterior probabilities of the edges versus non-edges
• Sample of high-scoring networks
• Feature extraction, e.g. marginal posterior probabilities of the edges
• Possible outcomes: high-confident edge, high-confident non-edge, uncertainty about edges
[Three figures: network over IRF1, IRF2 and IRF3, each with the average edge score and the average non-edge score]

Gold standard known → Posterior probabilities of true interactions
[Figure: network over IRF1, IRF2 and IRF3; panels: new method, homogeneous model]

Circadian regulation in Arabidopsis thaliana

Circadian rhythms in Arabidopsis thaliana
Collaboration with the Institute of Molecular Plant Sciences at Edinburgh University (Andrew Millar's group)
2 time series, T20 and T28, of microarray gene expression data from Arabidopsis thaliana.
- Focus on 9 circadian genes: LHY, CCA1, TOC1, ELF4, ELF3, GI, PRR9, PRR5, and PRR3
- Both time series measured under constant light condition at 13 time points: 0h, 2h,…, 24h, 26h
- Plants entrained with different light:dark cycles, 10h:10h (T20) and 14h:14h (T28)

Gene expression time series plots (Arabidopsis data T20 and T28)
[Figure panels: T28, T20]

Posterior probability of the number of components?
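The two summaries shown for the macrophage data (posterior probability of the number of components, and co-allocation of two time points) can both be estimated from sampled allocation vectors. The following is a minimal sketch, assuming the MCMC output is available as a list of allocation vectors; the function names and the toy samples are hypothetical and not taken from the slides.

```python
from collections import Counter


def num_components_posterior(allocation_samples):
    """Relative frequency of the number of occupied components per sample."""
    counts = Counter(len(set(h)) for h in allocation_samples)
    total = len(allocation_samples)
    return {k: counts[k] / total for k in sorted(counts)}


def coallocation_matrix(allocation_samples):
    """Entry [i][j]: fraction of samples that put time points i and j
    into the same component (1 = always together, 0 = never)."""
    m = len(allocation_samples[0])
    total = len(allocation_samples)
    counts = [[0] * m for _ in range(m)]
    for h in allocation_samples:
        for i in range(m):
            for j in range(m):
                if h[i] == h[j]:
                    counts[i][j] += 1
    return [[c / total for c in row] for row in counts]


# Hypothetical MCMC output: four allocation vectors over 5 time points.
samples = [
    [0, 0, 0, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 0, 0, 0],
    [0, 0, 1, 1, 2],
]
posterior_k = num_components_posterior(samples)
coalloc = coallocation_matrix(samples)
```

Plotting `coalloc` as a grey-scale image gives the kind of white = 1, black = 0 heat map described on the slide.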
Posterior probability of the number of components
[Figure: posterior probability of the number of components]

Predicted network
• Blue: activation
• Red: inhibition
• Black: mixture
• Three different line widths: thin = PP>0.5, medium = PP>0.75, fat = PP>0.9

Cogs of the Plant Clockwork
Review: Rob McClung, Plant Cell 2006
Two major gene classes:
• Morning genes, e.g. LHY, CCA1 …
• … repress evening genes, e.g. TOC1, ELF3, ELF4, GI, LUX …
• … which activate LHY and CCA1

Circadian genes in Arabidopsis thaliana, network learned from two time series over 13 time points
[Network over ELF3, CCA1, LHY, PRR9, GI, PRR5, ELF4, TOC1 and PRR3, with "false negatives" and "false positives" marked]
• True positives (TP) = 8
• False positives (FP) = 13
• False negatives (FN) = 5
• True negatives (TN) = 9² - 8 - 13 - 5 = 55
• Sensitivity = TP/[TP+FN] = 8/13 ≈ 62%
• Specificity = TN/[TN+FP] = 55/68 ≈ 81%

Overview of the plant clock model
Locke et al. Mol. Syst. Biol. 2006
[Diagram: morning components LHY/CCA1 and PRR9/PRR7; evening components Y (GI), X and TOC1. A second version of the diagram marks four interactions with "Yes"]

Allocation sampler versus change-point process
Advances in Bioinformatics, in press

Heterogeneous DBN
• θ: parameters
• h: allocation vector
• k: number of components (here: 3)
• Two variants: change-point process, free allocation

Example: 4 genes, 10 time points
[Data matrix: 4 genes X(1) to X(4) in rows, 10 time points t1 to t10 in columns, entries Xi,j. Shown for free allocation and for the changepoint process]

Standard dynamic Bayesian network: homogeneous model
[Same 4 × 10 data matrix, one component]

Our new model: heterogeneous dynamic Bayesian network. Here: 2 components
[Same 4 × 10 data matrix, 2 components]

Our new model: heterogeneous dynamic Bayesian network. Here: 3 components
[Same 4 × 10 data matrix, 3 components]

Allocation sampler versus change-point process
Allocation sampler:
• More flexibility, unrestricted mixture model
• Not restricted to time series
• Higher computational costs
Change-point process:
• Incorporates plausible prior knowledge for time series
• Reduced complexity
• Less universal, not applicable to static data

Can we get an approximate nonlinear model without data discretization?
Idea: piecewise linear model
[Plot: y against the horizontal axis. Allocation sampler: x → changepoint process: t]

Change-point model versus free allocation: Arabidopsis thaliana (13 time points)
[Figure panels: free allocation, changepoint model]

Change-point model versus free allocation: Drosophila melanogaster (67 time points)
[Figure panels: free allocation, change-point process]
The difference is not only related to the complexity and MCMC convergence, but is intrinsic to the prior.

Prior probability of assigning two time points to the same component. White = 1, black = 0.
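The claim that the difference is intrinsic to the prior can be illustrated by Monte Carlo. The sketch below estimates the prior co-allocation probability under two simplified priors. Several choices here are assumptions rather than the settings of the talk: the number of components k is held fixed, the free-allocation prior draws component weights from a flat Dirichlet and then assigns labels independently, and the change-point prior takes the change-points as even-numbered order statistics of uniformly drawn positions. All function names are hypothetical.

```python
import random


def sample_free_allocation(m, k, rng):
    """Assumed free-allocation prior: flat Dirichlet weights, i.i.d. labels."""
    gammas = [rng.gammavariate(1.0, 1.0) for _ in range(k)]
    total = sum(gammas)
    weights = [g / total for g in gammas]
    return rng.choices(range(k), weights=weights, k=m)


def sample_changepoint_allocation(m, k, rng):
    """Change-point prior: the k-1 change-points are the even-numbered order
    statistics of 2(k-1)+1 distinct positions drawn uniformly from 1..m-1."""
    n_cp = k - 1
    draws = sorted(rng.sample(range(1, m), 2 * n_cp + 1))
    changepoints = draws[1::2]  # 2nd, 4th, ... order statistics
    labels, segment = [], 0
    for t in range(m):
        if segment < n_cp and t >= changepoints[segment]:
            segment += 1  # time point t starts the next segment
        labels.append(segment)
    return labels


def prior_coallocation(sampler, m, k, n_draws, seed=0):
    """Monte Carlo estimate of P(time points i and j share a component)."""
    rng = random.Random(seed)
    counts = [[0] * m for _ in range(m)]
    for _ in range(n_draws):
        h = sampler(m, k, rng)
        for i in range(m):
            for j in range(m):
                counts[i][j] += h[i] == h[j]
    return [[c / n_draws for c in row] for row in counts]


free = prior_coallocation(sample_free_allocation, m=13, k=3, n_draws=2000)
cp = prior_coallocation(sample_changepoint_allocation, m=13, k=3, n_draws=2000)
```

Under these assumptions the free-allocation matrix is flat off the diagonal, because the labels are exchangeable, whereas the change-point matrix decays with the distance between the two time points.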
[Figure panels for the prior co-allocation probability: allocation sampler, change-point process]

Allocation sampler
[Prior on the allocation vector; annotated terms:]
• Allocation vector
• Total number of data points
• Number of components
• Number of data points assigned to the kth component

Change-point process: even-numbered order statistics

Change-point process
• Reallocation of a change-point: prior probability ratio
• Birth of a new change-point (new changepoint): see Peter Green, Biometrika (1995)

Insertion of a change-point: K=1 → K=2
Prior probability ratio, allocation vector
[Figure panels: allocation sampler, change-point model]
• Top: change-point location j=m/2 fixed, sample size variable
• Bottom: sample size m fixed, change-point location variable

Change-point model versus free allocation: Drosophila melanogaster (67 time points)
[Figure panels: free allocation, change-point process]

Allocation sampler applied to the macrophage gene expression time series
[Figure panels: infection, treatment, infection+treatment; rows: allocation sampler, change-point process]

Morphogenesis in Drosophila melanogaster
• Gene expression measurements over 66 time steps of 4028 genes (Arbeitman et al., Science, 2002).
• Selection of 11 genes involved in muscle development. Zhao et al. (2006), Bioinformatics 22

Heterogeneous dynamic Bayesian network: Plausible segmentation?
[Data matrix: 4 genes X(1) to X(4) in rows, 10 time points t1 to t10 in columns, entries Xi,j]

[Two plots: posterior probability against the number of components, 1 to 8]

Four stages of the Drosophila life cycle: embryo → larva → pupa → adult
[Figures with a time axis]
Morphogenetic transitions:
• Embryo → larva
• Larva → pupa
• Pupa → adult
The gene expression program governing the transition to adult morphology is active well before the fly emerges from the pupa.
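The change-point moves named earlier (reallocation and birth of a change-point, plus the matching death move) can be sketched as a proposal function. This is an illustrative sketch of the proposal step only: the acceptance probability, which would combine the prior probability ratio with the likelihood and proposal ratios as in Green (1995), is deliberately left out. The function name, the representation of change-points as positions 1..m-1, and the uniform choice among moves are assumptions.

```python
import random


def propose_move(changepoints, m, rng):
    """Propose a birth, death or reallocation move on a sorted list of
    change-point positions in 1..m-1 (requires m >= 2).
    Returns (move_name, proposed_changepoints)."""
    free_positions = [t for t in range(1, m) if t not in changepoints]
    moves = []
    if free_positions:
        moves.append("birth")
    if changepoints:
        moves.extend(["death", "reallocation"])
    move = rng.choice(moves)
    new = list(changepoints)
    if move == "birth":
        # insert a new change-point at an unoccupied position
        new.append(rng.choice(free_positions))
    elif move == "death":
        # remove one existing change-point
        new.remove(rng.choice(new))
    else:
        # shift one change-point, keeping it between its two neighbours
        idx = rng.randrange(len(new))
        lower = new[idx - 1] + 1 if idx > 0 else 1
        upper = new[idx + 1] - 1 if idx < len(new) - 1 else m - 1
        candidates = [t for t in range(lower, upper + 1) if t != new[idx]]
        if candidates:  # otherwise the proposal leaves the state unchanged
            new[idx] = rng.choice(candidates)
    return move, sorted(new)


# Hypothetical state: 13 time points with change-points at positions 4 and 9.
rng = random.Random(0)
move, proposal = propose_move([4, 9], m=13, rng=rng)
```

In a full sampler each proposal would be accepted or rejected before the next move is drawn, and network-structure moves would be interleaved with these change-point moves.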
Change-point model versus free allocation: Drosophila melanogaster (67 time points)
[Figure panels: free allocation, change-point process]

Node-specific changepoints
NIPS 2009

Standard dynamic Bayesian network: homogeneous model
[Data matrix: 4 genes X(1) to X(4) in rows, 10 time points t1 to t10 in columns, entries Xi,j]

Heterogeneous dynamic Bayesian network
[Same 4 × 10 data matrix, segmented jointly for all genes]

Heterogeneous dynamic Bayesian network with node-specific breakpoints
[Same 4 × 10 data matrix, segmented separately for each gene]

MCMC scheme
• Moves that change the network structure
• Moves that act on change-points: reallocation, birth and death moves

Avoiding spurious feedback loops
[Figure panels: BGe, new model]

Marginal posterior probability for edges
[Figure panels: BGe, new model]

Application to macrophage gene expression data
[Figure panels: BGe, new model]

Comparative evaluation
• Networks for simulating data
• Generating synthetic data; noise: Normal distribution
• AUC scores: new model versus models for comparison

Application to Arabidopsis thaliana
4 different gene expression time series

Dynamic programming
Joint work with Marco Grzegorczyk

Two priors for changepoints
• Prior for the number of changepoints, and a conditional prior on their positions
• Point process prior:
  • Probability for the first changepoint
  • Probability for the time between two successive changepoints
  • Distribution function for the distance between two successive changepoints
  • Negative binomial distribution

Definitions
• Assumes parameters can be integrated out in closed form
• Prior
• Recursion
• Proof
• Reminder …

Summary
• Point process prior: definition, recursion, sampling of changepoints
• Prior for the number of changepoints, and a conditional prior on their positions: definition, recursion, sampling of changepoints

Gibbs sampling procedure
• P(changepoints|network,data) → dynamic programming
• P(network|changepoints,data)

Gene expression profiles from Arabidopsis thaliana

Flexible network structure with regularization
ICML 2010
• Comparison: integration of prior knowledge
• Prior distribution
• Partition function
• Ignoring the fan-in restriction → [expression in terms of the number of genes]

Drosophila melanogaster: Expression of 11 muscle development genes over 66 time points
Fixed structure, flexible parameters
[Figure with a time axis]
Morphogenetic transitions:
• Embryo → larva
• Larva → pupa
• Pupa → adult
The gene expression program governing the transition to adult morphology is active well before the fly emerges from the pupa.

Transition probabilities: flexible structure with regularization
Morphogenetic transitions: embryo → larva, larva → pupa, pupa → adult
Comparison with:
• Dondelinger, Lèbre & Husmeier
• Ahmed & Xing

Simulation study
Frank Dondelinger, Sophie Lèbre, Dirk Husmeier: ICML 2010

Synthetic simulation study
• Information sharing between adjacent segments
• No information sharing between adjacent segments

Thank you! Any questions?
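Supplementary sketch for the dynamic programming slides: with a point process prior and segment parameters that integrate out in closed form, the sum over all change-point configurations can be computed by a backward recursion, and change-points can then be sampled forward. The code below is a toy illustration, not the model of the talk. It assumes a geometric segment-length prior (the simplest special case of the negative binomial) and a one-dimensional Gaussian segment score with unit noise variance and a N(0, tau2) prior on the mean. All names and the data are hypothetical.

```python
import itertools
import math
import random


def segment_marginal(y, t, s, tau2=1.0):
    """Marginal likelihood of y[t:s] under y_i ~ N(mu, 1), mu ~ N(0, tau2),
    with mu integrated out in closed form."""
    seg = y[t:s]
    n, s1, s2 = len(seg), sum(seg), sum(v * v for v in seg)
    log_ml = (-0.5 * n * math.log(2 * math.pi) - 0.5 * math.log(1 + n * tau2)
              - 0.5 * (s2 - tau2 * s1 * s1 / (1 + n * tau2)))
    return math.exp(log_ml)


def length_pmf(d, p):
    return p * (1 - p) ** (d - 1)   # P(segment length = d), geometric


def length_survival(d, p):
    return (1 - p) ** (d - 1)       # P(segment length >= d)


def backward_recursion(y, p):
    """Q[t] = P(y[t:] | a segment starts at t), summed over all later change-points."""
    n = len(y)
    Q = [0.0] * (n + 1)
    for t in range(n - 1, -1, -1):
        Q[t] = segment_marginal(y, t, n) * length_survival(n - t, p)  # no further change-point
        for s in range(t + 1, n):                                     # next segment starts at s
            Q[t] += segment_marginal(y, t, s) * length_pmf(s - t, p) * Q[s]
    return Q


def sample_changepoints(y, p, Q, rng):
    """Forward simulation of segment start points from P(change-points | data)."""
    n, t, starts = len(y), 0, []
    while t < n:
        weights = [segment_marginal(y, t, s) * length_pmf(s - t, p) * Q[s]
                   for s in range(t + 1, n)]
        weights.append(segment_marginal(y, t, n) * length_survival(n - t, p))
        s = rng.choices(range(t + 1, n + 1), weights=weights)[0]  # s == n: stop
        if s < n:
            starts.append(s)
        t = s
    return starts


def evidence_by_enumeration(y, p):
    """Brute-force check: sum prior times likelihood over every configuration."""
    n, total = len(y), 0.0
    for r in range(n):
        for cps in itertools.combinations(range(1, n), r):
            bounds = [0, *cps, n]
            term = 1.0
            for a, b in zip(bounds[:-1], bounds[1:]):
                prior = length_survival(b - a, p) if b == n else length_pmf(b - a, p)
                term *= segment_marginal(y, a, b) * prior
            total += term
    return total


y = [0.1, -0.2, 0.0, 3.1, 2.9, 3.3]  # hypothetical series with a jump after the third point
Q = backward_recursion(y, p=0.2)
starts = sample_changepoints(y, 0.2, Q, random.Random(0))
```

The recursion needs a number of segment scores that grows quadratically in the series length, whereas the enumeration grows exponentially; `Q[0]` and `evidence_by_enumeration(y, 0.2)` should agree up to rounding error.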