Gaussian process inference in differential equations

Magnus Rattray
Machine Learning and Optimization Group, University of Manchester
June 15th, 2010
Joint work with Neil Lawrence

Differential equation models

Differential equations are a very popular way to model transcriptional regulation and other cellular processes:

dx/dt = F(x, θ),   x(t = 0) = x₀

Model-based inference and learning are useful in a number of contexts, e.g.
(1) Inference: can we infer the action of unobserved chemical species?
(2) Learning: can we learn the model parameters θ from data?
(3) Model selection: can we identify the best hypothesis F(x, θ)?
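As a minimal sketch of the general setting, the model dx/dt = F(x, θ) can be integrated numerically. The particular F below (constant production B with linear decay D) and its parameter values are invented for illustration only:

```python
import numpy as np
from scipy.integrate import solve_ivp

# A simple instance of dx/dt = F(x, theta): constant production B, linear decay D.
# B and D are illustrative values, not taken from the talk.
def F(t, x, B, D):
    return B - D * x

B, D = 2.0, 0.5
sol = solve_ivp(F, t_span=(0.0, 10.0), y0=[0.0], args=(B, D))

# The steady state of dx/dt = B - D*x is x* = B/D = 4, which the
# trajectory approaches by t = 10.
x_end = sol.y[0, -1]
print(x_end)
```

Once F and θ are fixed, the same solver call yields trajectories for inference or likelihood evaluation.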
Differential equation model of activation

Linear Activation Model (Barenco et al., Genome Biology 2006):

dx_j(t)/dt = B_j + S_j f(t) − D_j x_j(t)

x_j(t) – concentration of gene j’s mRNA
f(t) – concentration of active transcription factor (TF)
Model parameters: baseline B_j, sensitivity S_j and decay D_j

Problem 1: how do we fit the model when f(t) is not observed?
Problem 2: how do we deal with the fact that f(t) does not appear on the left-hand side? (the system is “open”)

Why use a model-based approach?

Co-regulated genes can differ greatly in their expression profiles.

[Figure: expression time courses for two co-regulated genes, probes 210764_s_at (CYR61) and 204748_at (PTGS2)]

Simple clustering cannot be relied on to identify co-regulated genes.
The concentration of phosphorylated TF in the nucleus is difficult to measure.
A model-based inference approach is useful.
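A small simulation illustrates why clustering can fail: two genes driven by the same f(t) but with different kinetics produce quite different profiles. The pulse-shaped f(t) and the kinetic parameters below are invented for the sketch:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Shared TF activity profile (invented for illustration): a transient pulse.
f = lambda t: np.exp(-0.5 * (t - 3.0) ** 2)

def rhs(t, x, B, S, D):
    # Linear activation model: dx_j/dt = B_j + S_j f(t) - D_j x_j
    return B + S * f(t) - D * x

t_eval = np.linspace(0.0, 10.0, 101)

# Two "co-regulated" genes with different (made-up) kinetic parameters.
profiles = {}
for name, (B, S, D) in {"fast": (0.0, 1.0, 2.0), "slow": (0.0, 1.0, 0.05)}.items():
    sol = solve_ivp(rhs, (0.0, 10.0), [0.0], args=(B, S, D), t_eval=t_eval, rtol=1e-8)
    profiles[name] = sol.y[0]

# Despite sharing f(t), the profiles differ: the fast-decay gene tracks the
# pulse and relaxes, while the slow-decay gene accumulates and peaks later.
print(profiles["fast"].argmax(), profiles["slow"].argmax())
```

Both trajectories are responses to the same f(t), yet their shapes and peak times differ enough that distance-based clustering would not group them.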
Representing f(t) as a Gaussian Process

We need a way to represent the TF concentration f(t).

A Gaussian Process (GP) is a distribution over functions,

f(t) ~ GP(m(t), k(t, t′))

where

m(t) = E[f(t)]
k(t, t′) = E[(f(t) − m(t))(f(t′) − m(t′))]

It is the functional analogue of the Gaussian distribution, and like the Gaussian it has some useful properties for inference.

From a Gaussian distribution to a Gaussian Process

[Figure: samples f_n from a 25-dimensional Gaussian distribution, plotted against the index n, together with the covariance matrix]
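The 25-dimensional illustration can be reproduced directly: draw from N(0, K) with a covariance that correlates nearby indices, and the samples look like smooth functions. The grid, length-scale, and seed below are arbitrary choices for the sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 25
t = np.linspace(0.0, 1.0, n)

# Squared-exponential covariance: nearby points are strongly correlated.
l = 0.2
K = np.exp(-((t[:, None] - t[None, :]) ** 2) / l**2)

# Draw samples from N(0, K); the jitter keeps the Cholesky factor stable.
L = np.linalg.cholesky(K + 1e-9 * np.eye(n))
samples = L @ rng.standard_normal((n, 3))

# Because neighbouring coordinates are correlated, consecutive entries of
# each sample change only a little - the samples "follow a line".
max_step = np.abs(np.diff(samples, axis=0)).max()
print(max_step)
```

Shrinking l decorrelates the coordinates and the samples revert to looking like independent noise, which is the point of the slide's comparison.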
By making nearby points correlated, the samples appear to follow a smooth line.

Samples from a Gaussian Process

[Figure: sample functions drawn from a GP with squared-exponential covariance]

k_{f,f}(t, t′) = exp(−(t − t′)² / l²)

The parameter l determines the time-scale for changes in f(t).

Linear activation model

Recall the linear model

dx_j(t)/dt = B_j + S_j f(t) − D_j x_j(t).

This differential equation can be solved for x_j(t) as

x_j(t) = B_j/D_j + S_j ∫₀ᵗ e^{−D_j(t−u)} f(u) du.

Note: this is a linear operation on f(t).

Covariance function

Any linear operation on a GP yields a related GP:

f(t) ~ GP(0, k_{f,f}(t, t′))  ⟹  x_j(t) ~ GP(B_j/D_j, k_{x_j,x_j}(t, t′))

with covariance (i = j) and cross-covariances (i ≠ j) between genes:

k_{x_i,x_j}(t, t′) = S_i S_j ∫₀ᵗ ∫₀^{t′} e^{−D_i(t−u) − D_j(t′−u′)} k_{f,f}(u, u′) du du′

and cross-covariance between x_j(t) and f(t):

k_{x_j,f}(t, t′) = S_j ∫₀ᵗ e^{−D_j(t−u)} k_{f,f}(u, t′) du.
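These covariance integrals rarely need to be trusted on faith: they can be checked by numerical quadrature. A sketch for the gene-TF cross-covariance, with an illustrative length-scale and made-up kinetic parameters S and D:

```python
import numpy as np
from scipy.integrate import quad

# Squared-exponential covariance of the latent GP (length-scale l is arbitrary).
l = 1.0
kff = lambda u, tp: np.exp(-((u - tp) ** 2) / l**2)

def kxf(t, tp, S=1.0, D=0.5):
    # Cross-covariance between x_j(t) and f(t'):
    #   k_{x_j,f}(t, t') = S_j * int_0^t exp(-D_j (t - u)) k_ff(u, t') du
    # S and D are illustrative values, not parameters from the talk.
    val, _ = quad(lambda u: np.exp(-D * (t - u)) * kff(u, tp), 0.0, t)
    return S * val

# Sanity checks: the integral vanishes at t = 0 and is positive for t > 0.
print(kxf(0.0, 1.0), kxf(2.0, 1.0))
```

The same quadrature approach (a double integral) evaluates k_{x_i,x_j}; for the squared-exponential kernel these integrals also have closed forms in terms of the error function.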
Inferring the transcription factor concentration f(t)

Under the linear model, f and x are jointly Gaussian:

[f; x] ~ GP([0; B/D], [K_ff, K_fx; K_xf, K_xx])

Bayesian GP regression gives the predicted process as p(f | x) ~ GP(⟨f⟩_post, K_ff^post), where

⟨f⟩_post = K_fx K_xx⁻¹ (x − B/D)
K_ff^post = K_ff − K_fx K_xx⁻¹ K_xf

Note: in practice x is only observed, with noise, at a small number of times:

y_jt = x_j(t) + η_jt

Artificial example – inferring f(t) from noiseless data

6 data points from 3 genes, zero-noise observations, and known kinetic parameters:

        B_j          S_j   D_j
j = 1   0.0          1.0   1.0
j = 2   7.5 × 10⁻²   0.4   5 × 10⁻²
j = 3   2.5 × 10⁻³   0.4   1 × 10⁻³

[Figure: inferred f(t) with credible intervals, reconstructed from the noiseless data]

Parameter learning

A likelihood function for the model parameters θ = {B_j, S_j, D_j}_{j=1}^d and time-scale l is obtained by integrating out the latent function f:

L(θ, l) = ∫ ( ∏_{t=1}^T p(y_t | f(t), θ) ) p(f | l) df

Parameters are obtained by maximum likelihood or Bayesian MCMC.

Artificial example – inferring f(t) and learning parameters

[Figure: inferred f(t) and estimated parameters B_j, S_j, D_j with error bars for five genes]

p53, data from Barenco et al. Genome Biology 2006

[Figure: inferred p53 activity f(t) and a target-gene expression profile x(t)]

Learning parameters and inferring f(t) from training genes.
Good correspondence with protein data from Western blots.

Elk-1, data from Amit et al. Nature Genetics 2007

[Figure: inferred Elk-1 activity over time and fits to five training genes]

Learning parameters and inferring f(t) from training genes.

Elk-1 target ranking

[Figure: model fits for a predicted target gene and a predicted non-target gene]

The likelihood can be used for ranking putative targets; examples of a good and a bad fit are shown above.

Nonlinear response models

Consider the following modification to the model:
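The GP regression step can be sketched on a discretized grid: approximate the convolution integral by a lower-triangular weight matrix W, so x ≈ B/D + S·W f, and apply the posterior-mean formula above. All parameter values are invented, and a single gene is assumed:

```python
import numpy as np

rng = np.random.default_rng(1)

# Grid, kernel, and invented kinetic parameters for a single gene.
n = 60
u = np.linspace(0.0, 10.0, n)
du = u[1] - u[0]
l, B, S, D, noise = 2.0, 0.5, 1.0, 0.4, 0.05

Kff = np.exp(-((u[:, None] - u[None, :]) ** 2) / l**2)

# Discretized linear operator: x(t_m) ~ B/D + S * sum_i exp(-D(t_m-u_i)) f(u_i) du
W = np.tril(np.exp(-D * (u[:, None] - u[None, :]))) * du

# Simulate a "true" f and the corresponding noisy observations y.
Lchol = np.linalg.cholesky(Kff + 1e-8 * np.eye(n))
f_true = Lchol @ rng.standard_normal(n)
y = B / D + S * (W @ f_true) + noise * rng.standard_normal(n)

# GP regression: posterior mean of f given the x-observations,
#   <f>_post = K_fx K_xx^{-1} (y - B/D)
Kxx = S**2 * W @ Kff @ W.T + noise**2 * np.eye(n)
Kfx = S * Kff @ W.T
f_post = Kfx @ np.linalg.solve(Kxx, y - B / D)

rel_err = np.sqrt(np.mean((f_post - f_true) ** 2) / np.mean(f_true**2))
print(rel_err)
```

With dense low-noise observations the posterior mean tracks the simulated f(t) closely; with only a handful of observed times, as in the real experiments, the posterior covariance K_ff − K_fx K_xx⁻¹ K_xf quantifies the remaining uncertainty.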
dx_j(t)/dt = B_j + S_j g_j(f(t)) − D_j x_j(t),

where g_j(·) is a non-linear function. The differential equation can still be solved,

x_j(t) = B_j/D_j + S_j ∫₀ᵗ e^{−D_j(t−u)} g_j(f(u)) du,

but is no longer linear in f(t).

Approximate inference

1. Laplace approximation to the required integral:

p(f | x) ≈ GP(f̂, A⁻¹) ∝ exp( −½ (f − f̂)ᵀ A (f − f̂) )

where f̂ = argmax_f p(f | x) and A = −∇∇ log p(f | x) |_{f = f̂}.

2. MCMC: sampling f(t) on a dense grid requires well-designed Metropolis–Hastings moves (Titsias et al. NIPS 2009).

Michaelis-Menten kinetics

The Michaelis-Menten activation model uses the following non-linearity:

g_j(f(t)) = e^{f(t)} / (γ_j + e^{f(t)}),

where the GP f(t) now models the log of the TF activity.

[Figure: inferred log TF activity f(t) under the Michaelis-Menten model]

Repression model

We can use an analogous model of repression:

g_j(f(t)) = 1 / (γ_j + e^{f(t)})

Recall the solution to the ODE (with an additional transient term):

x_j(t) = α_j e^{−D_j t} + B_j/D_j + S_j ∫₀ᵗ e^{−D_j(t−u)} g_j(f(u)) du

Results for the repressor LexA

gene id   B_j       D_j    S_j    α_j     γ_j
umuC      3 × 10⁶   0.01   0.78   1.72    1.06
dinI      0.12      0.16   0.39   0.05    0.81
recN      5.94      1.00   3.68   −5.74   1.10

[Figure: inferred LexA activity f(t) and fitted expression profiles x(t)]

Summary

We can use a model to infer the concentration of quantities that are difficult to measure: hidden variables.
Application to ranking putative targets of a phosphorylated TF.
The general principle can be applied to any “open” biochemical system.
Learning requires integrating out the hidden variable to derive the likelihood function.
Model selection requires further integrating
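The two response functions are easy to evaluate directly; a minimal sketch, with an illustrative value of γ, checks that activation increases and saturates while repression decreases:

```python
import numpy as np

# Michaelis-Menten-style activation and repression responses, applied to the
# log-activity GP f(t); gamma is an illustrative constant, not a fitted value.
gamma = 1.0
g_act = lambda f: np.exp(f) / (gamma + np.exp(f))
g_rep = lambda f: 1.0 / (gamma + np.exp(f))

f = np.linspace(-5.0, 5.0, 201)

# Activation is monotonically increasing and saturates below 1;
# repression is monotonically decreasing towards 0.
print(np.all(np.diff(g_act(f)) > 0), np.all(np.diff(g_rep(f)) < 0))
```

Because g_j is bounded, the model's response saturates for large TF activity, and working with the log-activity keeps the GP unconstrained while the concentration e^{f(t)} stays positive.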
out the model parameters to derive the marginal likelihood.

• Honkela et al. Proc. Natl. Acad. Sci. USA 107(17), 7793–7798 (2010).
• Gao et al. Bioinformatics 24(16), i70–i75 (2008).
• Lawrence, Rattray, Gao, Titsias. “Gaussian processes for missing species in biochemical systems” in Learning and Inference in Computational Systems Biology (MIT Press, Cambridge, MA, 2010).

Advertisements

Two post-doc positions are available to work on gene regulatory network inference – see my homepage: http://www.cs.man.ac.uk/~magnus