Stochastic modelling Bayesian parameter inference Dynamic Bayesian network inference Summary and conclusion

advertisement
Stochastic modelling
Bayesian parameter inference
Dynamic Bayesian network inference
Summary and conclusion
Stochastic modelling and Bayesian inference for
biochemical network dynamics
Darren Wilkinson
tinyurl.com/darrenjw
School of Mathematics & Statistics and
Centre for Integrated Systems Biology of Ageing & Nutrition
(CISBAN)
Newcastle University, UK
Warwick EPSRC Symposium 2009
Challenges in Scientific Computing
30th June–3rd July 2009
Darren Wilkinson — 30/6/09 Warwick EPSRC Symposium
Modelling and inference for biochemical network dynamics
Stochastic modelling
Bayesian parameter inference
Dynamic Bayesian network inference
Summary and conclusion
Introduction
Stochastic modelling of p53 oscillations
Stochastic chemical kinetics
Computational infrastructure
Overview
Stochastic modelling in systems biology
Quick guide to stochastic chemical kinetics
Bayesian parameter inference for stochastic kinetic models
MCMC for the chemical Langevin equation (CLE)
Sequential likelihood-free MCMC for Markov process models
Inference for network structure from HTP data
Darren Wilkinson — 30/6/09 Warwick EPSRC Symposium
Modelling and inference for biochemical network dynamics
Stochastic modelling
Bayesian parameter inference
Dynamic Bayesian network inference
Summary and conclusion
Introduction
Stochastic modelling of p53 oscillations
Stochastic chemical kinetics
Computational infrastructure
Pathways to senescence
The mammalian group within CISBAN is interested in cell
ageing and many aspects of the processes which lead to
cellular senescence, and study this using immortalised human
and rodent cell lines
DNA damage and repair processes are one important
component of this large and complex system, and therefore
molecules involved in damage signalling and repair are of
direct interest
Considerable interest in the role of p53 (“the guardian of the
genome”) in this context, and the development of models for
p53 regulation
p53 has many important functions, but of most relevance to
this discussion is its ability to activate DNA repair proteins in
response to DNA damage
Darren Wilkinson — 30/6/09 Warwick EPSRC Symposium
Modelling and inference for biochemical network dynamics
Stochastic modelling
Bayesian parameter inference
Dynamic Bayesian network inference
Summary and conclusion
Introduction
Stochastic modelling of p53 oscillations
Stochastic chemical kinetics
Computational infrastructure
Single cell fluorescence microscopy
Loading movie...
Darren Wilkinson — 30/6/09 Warwick EPSRC Symposium
Modelling and inference for biochemical network dynamics
Stochastic modelling
Bayesian parameter inference
Dynamic Bayesian network inference
Summary and conclusion
Introduction
Stochastic modelling of p53 oscillations
Stochastic chemical kinetics
Computational infrastructure
Single cell fluorescence microscopy
p53-CFP and Mdm2-YFP
p53/Mdm2 oscillations subsequent to gamma irradiation
Darren Wilkinson — 30/6/09 Warwick EPSRC Symposium
Modelling and inference for biochemical network dynamics
Stochastic modelling
Bayesian parameter inference
Dynamic Bayesian network inference
Summary and conclusion
Introduction
Stochastic modelling of p53 oscillations
Stochastic chemical kinetics
Computational infrastructure
Single cell time course data
M3C8
30
20
flourescence
10
40
30
−10
10
0
20
flourescence
50
40
60
M3C4
5
10
15
20
25
30
35
0
20
time (hours)
M3C10
M3C1
30
40
30
40
60
flourescence
10
20
40
60
50
40
30
20
flourescence
10
time (hours)
80
0
0
10
20
30
40
time (hours)
0
10
20
time (hours)
Geva-Zatorsky et al (2006), Mol. Sys. Bio. [Uri Alon’s lab]
Darren Wilkinson — 30/6/09 Warwick EPSRC Symposium
Modelling and inference for biochemical network dynamics
Stochastic modelling
Bayesian parameter inference
Dynamic Bayesian network inference
Summary and conclusion
Introduction
Stochastic modelling of p53 oscillations
Stochastic chemical kinetics
Computational infrastructure
Stochastic chemical kinetics
u species: X1 , . . . , Xu , and v reactions: R1 , . . . , Rv
Ri : pi1 X1 + · · · + piu Xu −→ qi1 X1 + · · · + qiu Xu , i = 1, . . . , v
In matrix form: PX −→ QX (P and Q are sparse)
S = (Q − P)0 is the stoichiometry matrix of the system
Xjt : # molecules of Xj at time t. Xt = (X1t , . . . , Xut )0
Reaction Ri has hazard (or rate law, or propensity) hi (Xt , ci ),
where ci is a rate parameter, c = (c1 , . . . , cv )0 ,
h(Xt , c) = (h1 (Xt , c1 ), . . . , hv (Xt , cv ))0 and the system evolves
as a Markov jump process
For mass-action stochastic kinetics,
u Y
Xjt
hi (Xt , ci ) = ci
, i = 1, . . . , v
pij
j=1
Darren Wilkinson — 30/6/09 Warwick EPSRC Symposium
Modelling and inference for biochemical network dynamics
Stochastic modelling
Bayesian parameter inference
Dynamic Bayesian network inference
Summary and conclusion
Introduction
Stochastic modelling of p53 oscillations
Stochastic chemical kinetics
Computational infrastructure
Time change representation
Rit : # reactions of type Ri in (0, t], Rt = (R1t , . . . , Rvt )0
Xt − X0 = SRt (state updating equation)
For i = 1, . . . , v , Ni (t) are the count functions for
independent unit Poisson processes, so
Z t
hi (Xτ , ci )dτ
Rit = Ni
0
Putting N(t1 , . . . , tv ) =(N1 (t1 ), . . . , Nv (tv ))0 , we can write
Rt
Rt = N 0 h(Xτ , c)dτ to get:
Time-change representation of the Markov jump process
Z t
Xt − X0 = S N
h(Xτ , c)dτ
0
Darren Wilkinson — 30/6/09 Warwick EPSRC Symposium
Modelling and inference for biochemical network dynamics
Stochastic modelling
Bayesian parameter inference
Dynamic Bayesian network inference
Summary and conclusion
Introduction
Stochastic modelling of p53 oscillations
Stochastic chemical kinetics
Computational infrastructure
The Gillespie algorithm
1
Initialise the system at t = 0 with rate constants c1 , c2 , . . . , cv and
initial numbers of molecules for each species, x = (x1 , x2 , . . . , xu )0 .
2
For each i = 1, 2, . . . , v , calculate hi (x, ci ) based on the current
state, x.
Pv
Calculate h0 (x, c) ≡ i=1 hi (x, ci ), the combined reaction hazard.
3
4
Simulate time to next event, τ , as an Exp(h0 (x, c)) random
quantity, and put t := t + τ .
5
Simulate the reaction index, j, as a discrete random quantity with
probabilities hi (x, ci ) / h0 (x, c), i = 1, 2, . . . , v .
6
Update x according to reaction j. That is, put x := x + S (j) , where
S (j) denotes the jth column of the stoichiometry matrix S.
7
Output x and t.
8
If t < Tmax , return to step 2.
Darren Wilkinson — 30/6/09 Warwick EPSRC Symposium
Modelling and inference for biochemical network dynamics
Stochastic modelling
Bayesian parameter inference
Dynamic Bayesian network inference
Summary and conclusion
Introduction
Stochastic modelling of p53 oscillations
Stochastic chemical kinetics
Computational infrastructure
Modelling large biological systems
BASIS — Biology of Ageing e-Science Integration and Simulation
BBSRC Bioinformatics and e-Science grant, and UK e-Science
GRID pilot project, now incorporated into CISBAN —
http://www.basis.ncl.ac.uk/
Modelling large complex systems with many interacting
components (with emphasis on biological mechanisms relating
to ageing)
SBML model database
Discrete stochastic simulation service running on a large
cluster (and a results database)
Distributed computing infrastructure for routine use (web
portal and web-service interface for GRID computing)
Darren Wilkinson — 30/6/09 Warwick EPSRC Symposium
Modelling and inference for biochemical network dynamics
Stochastic modelling
Bayesian parameter inference
Dynamic Bayesian network inference
Summary and conclusion
Introduction
Stochastic modelling of p53 oscillations
Stochastic chemical kinetics
Computational infrastructure
Stochastic kinetic model
Discrete stochastic kinetic model developed at Newcastle (by
Carole Proctor) for the key biomolecular interactions between
p53, Mdm2 and their response to DNA damage induced by
irradiation
More complex than a simple Lotka-Volterra system (17
species and 20 reactions), but essentially the same regulatory
feedback mechanism (Mdm2 synthesis depends on the level of
free p53, and Mdm2 encourages degradation of p53)
Some information about most kinetic parameters, but
considerable uncertainty for several — ideal for a Bayesian
analysis
Darren Wilkinson — 30/6/09 Warwick EPSRC Symposium
Modelling and inference for biochemical network dynamics
Stochastic modelling
Bayesian parameter inference
Dynamic Bayesian network inference
Summary and conclusion
Introduction
Stochastic modelling of p53 oscillations
Stochastic chemical kinetics
Computational infrastructure
300
Molecules
100
100
200
200
Molecules
300
400
400
500
Model structure and sample output
0
5
10
15
20
25
30
35
Time
Darren Wilkinson — 30/6/09 Warwick EPSRC Symposium
0
5
10
15
20
25
30
35
Time
Modelling and inference for biochemical network dynamics
Stochastic modelling
Bayesian parameter inference
Dynamic Bayesian network inference
Summary and conclusion
Introduction
Stochastic modelling of p53 oscillations
Stochastic chemical kinetics
Computational infrastructure
Benefits of stochastic modelling
Several (essentially) deterministic models have been proposed
Only a stochastic model can mimic the behaviour of single
cells (observed individually, or at the level of a cell population
model, using FACS) and the average behaviour of the cell
population (using time-course microarrays — single peak)
Many possible sources of heterogeneity in the cell population
(though genetic differences should be minimal) - eg. cell size,
cell cycle phase
This discrete molecular-level model shows that intrinsic
stochasticity in gene expression is sufficient to explain the
observed heterogeneity (but does not rule out other sources),
and requires no artificial modelling devices such as time-delays
Darren Wilkinson — 30/6/09 Warwick EPSRC Symposium
Modelling and inference for biochemical network dynamics
Stochastic modelling
Bayesian parameter inference
Dynamic Bayesian network inference
Summary and conclusion
MCMC algorithms and the CLE
Sequential algorithms for stochastic model calibration
Inference for the p53 model
Sequential algorithms and CaliBayes
Bayesian inference
Tuning model parameters so that output from the model “better
matches” experimental data is a standard optimisation problem,
but is problematic and unsatisfactory for a number of reasons:
Defining an appropriate “objective function” is not
straightforward if the model is stochastic or the measurement
error has a complex structure (not IID Gaussian)
The statistical concept of likelihood provides the “correct”
way of measuring the evidence in favour of a set of model
parameters, but typically requires computationally intensive
Monte Carlo procedures for evaluation in complex settings
Simple optimisation of the likelihood (the maximum likelihood
approach) is also unsatisfactory, as there are typically many
parameter combinations with very similar likelihoods (and the
likelihood surface is typically multi-modal, making global
optimisation difficult)
Darren Wilkinson — 30/6/09 Warwick EPSRC Symposium
Modelling and inference for biochemical network dynamics
Stochastic modelling
Bayesian parameter inference
Dynamic Bayesian network inference
Summary and conclusion
MCMC algorithms and the CLE
Sequential algorithms for stochastic model calibration
Inference for the p53 model
Sequential algorithms and CaliBayes
Markov chain Monte Carlo (MCMC)
Additionally, likelihood ignores any existing information known
about likely parameter values a priori, which can be very
useful for regularising the inference problem — better to base
inference on the posterior distribution
MCMC algorithms can be used to explore plausible regions of
parameter space in accordance with the posterior distribution
— these provide rich information
eg. rather than simple point estimates for parameter values,
can get plausible ranges of values, together with information
on parameter identifiability and confounding
MCMC algorithms are computationally intensive, but given
that evaluation of the likelihood is typically computationally
intensive anyway, nothing to lose and everything to gain by
doing a Bayesian analysis
Darren Wilkinson — 30/6/09 Warwick EPSRC Symposium
Modelling and inference for biochemical network dynamics
Stochastic modelling
Bayesian parameter inference
Dynamic Bayesian network inference
Summary and conclusion
MCMC algorithms and the CLE
Sequential algorithms for stochastic model calibration
Inference for the p53 model
Sequential algorithms and CaliBayes
Bayesian inference for stochastic models
Bayesian inference techniques can be used to estimate the
parameters of non-linear stochastic process models from data
As well as giving insight into plausible parameter values and
the extent to which these are identified by the data, they also
allow one to asses the extent to which the stochastic model
fits the data at all
Ultimately, predictive quantitative statements can be made
about the behaviour of individual cells and cell populations
under a range of experimental conditions
Darren Wilkinson — 30/6/09 Warwick EPSRC Symposium
Modelling and inference for biochemical network dynamics
Stochastic modelling
Bayesian parameter inference
Dynamic Bayesian network inference
Summary and conclusion
MCMC algorithms and the CLE
Sequential algorithms for stochastic model calibration
Inference for the p53 model
Sequential algorithms and CaliBayes
Bayesian inference
In principle it is possible to carry out rigorous Bayesian
statistical inference for the parameters of stochastic kinetic
models
Fairly detailed experimental data are required — eg.
quantitative single-cell time-course data derived from live-cell
imaging
The standard procedure uses GFP labelling of key reporter
proteins together with time-lapse confocal microscopy, but
other approaches are also possible
Global MCMC algorithms for exact inference for the true
discrete model (Boys, W, Kirkwood 2008) do not scale well to
problems of realistic size and complexity, due to the difficulty
of efficiently exploring large complex integer lattice state
spaces
Darren Wilkinson — 30/6/09 Warwick EPSRC Symposium
Modelling and inference for biochemical network dynamics
Stochastic modelling
Bayesian parameter inference
Dynamic Bayesian network inference
Summary and conclusion
MCMC algorithms and the CLE
Sequential algorithms for stochastic model calibration
Inference for the p53 model
Sequential algorithms and CaliBayes
The chemical Langevin equation (CLE)
The CLE is a diffusion approximation to the true Markov
jump process
Start with the time change representation
Z t
h(Xτ , c)dτ
Xt − X0 = S N
0
and approximate Ni (t) ' t + Wi (t), where Wi (t) is an
independent Wiener process for each i
Substituting in and using a little stochastic calculus gives:
The CLE as an Itò‚ SDE:
dXt = Sh(Xt , c) dt +
p
S diag{h(Xt , c)}S 0 dWt
Darren Wilkinson — 30/6/09 Warwick EPSRC Symposium
Modelling and inference for biochemical network dynamics
Stochastic modelling
Bayesian parameter inference
Dynamic Bayesian network inference
Summary and conclusion
MCMC algorithms and the CLE
Sequential algorithms for stochastic model calibration
Inference for the p53 model
Sequential algorithms and CaliBayes
MCMC-based Bayesian inference for the CLE
Inference for a non-linear multivariate stochastic differential
equation model observed partially, at discrete times and most
likely with error
This also turns out to be a rather challenging problem, due to
the intractability of the discrete-time transition densities, but
it is possible to develop computationally intensive MCMC
algorithms that are very effective (Golightly & W, 05, 06a,
06b, 08, 09)
However, the global MCMC algorithms (05, 08, 09) are very
computationally intensive, rely on the CLE being a reasonable
approximation, and are non-trivial to adapt to realistic
scenarios (mutiple data sets on different species and different
model variants) — sequential MCMC algorithms (06a, 06b)
are more flexible, and are not limited to the CLE
Darren Wilkinson — 30/6/09 Warwick EPSRC Symposium
Modelling and inference for biochemical network dynamics
Stochastic modelling
Bayesian parameter inference
Dynamic Bayesian network inference
Summary and conclusion
MCMC algorithms and the CLE
Sequential algorithms for stochastic model calibration
Inference for the p53 model
Sequential algorithms and CaliBayes
MCMC-based fully Bayesian inference for fast computer
models
Before worrying about the issues associated with slow
simulators, it is worth thinking about the issues involved in
calibrating fast deterministic and stochastic simulators, based
only on the ability to forward-simulate from the model
In this case it is often possible to construct MCMC algorithms
for fully Bayesian inference using the ideas of likelihood-free
MCMC (Marjoram et al 2003)
Here an MCMC scheme is developed exploiting forward
simulation from the model, and this causes problematic
likelihood terms to drop out of the M-H acceptance
probabilities
Darren Wilkinson — 30/6/09 Warwick EPSRC Symposium
Modelling and inference for biochemical network dynamics
Stochastic modelling
Bayesian parameter inference
Dynamic Bayesian network inference
Summary and conclusion
MCMC algorithms and the CLE
Sequential algorithms for stochastic model calibration
Inference for the p53 model
Sequential algorithms and CaliBayes
Generic problem
Model parameters: c
(Stochastic) model output: x
(Noisy and/or partial) data: D
For simplicity suppose that c ⊥⊥ D|x (but can be relaxed)
We wish to treat the model as a “black box”, which can only
be forward-simulated
We are thinking about data relating to a single realisation of
the model (so no need to explicitly treat initial conditions),
but replicate runs and multiple conditions can be handled
sequentially (as will become clear)
Darren Wilkinson — 30/6/09 Warwick EPSRC Symposium
Modelling and inference for biochemical network dynamics
Stochastic modelling
Bayesian parameter inference
Dynamic Bayesian network inference
Summary and conclusion
MCMC algorithms and the CLE
Sequential algorithms for stochastic model calibration
Inference for the p53 model
Sequential algorithms and CaliBayes
MCMC-based Bayesian inference
Target: π(c|D)
Specify a “measurement error model”, π(D|x) — eg. just a
product of Gaussian or t densities
Generic MCMC scheme:
Propose c ? ∼ f (c ? |c)
Accept with probability min{1, A}, where
A=
π(c ? ) f (c|c ? ) π(D|c ? )
×
×
π(c)
f (c ? |c)
π(D|c)
π(D|c) is the “marginal likelihood” (or “observed data
likelihood”, or...)
Darren Wilkinson — 30/6/09 Warwick EPSRC Symposium
Modelling and inference for biochemical network dynamics
Stochastic modelling
Bayesian parameter inference
Dynamic Bayesian network inference
Summary and conclusion
MCMC algorithms and the CLE
Sequential algorithms for stochastic model calibration
Inference for the p53 model
Sequential algorithms and CaliBayes
Special case: deterministic model
Deterministic function g (·) such that x = g (c)
Then
π(D|c) = π(D|c, g (c))
= π(D|c, x)
= π(D|x)
Here π(D|x) is just the “measurement error model” — eg.
simple product of Gaussian or t densities
This setup is somewhat simplistic for the deterministic case,
but we are really more concerned with the stochastic case...
Darren Wilkinson — 30/6/09 Warwick EPSRC Symposium
Modelling and inference for biochemical network dynamics
Stochastic modelling
Bayesian parameter inference
Dynamic Bayesian network inference
Summary and conclusion
MCMC algorithms and the CLE
Sequential algorithms for stochastic model calibration
Inference for the p53 model
Sequential algorithms and CaliBayes
Stochastic model
Can’t get at the marginal likelihood directly, so make the
target π(c, x|D), where x is the “true” simulator output which
led to the observed data...
Clear that we can marginalise out x if necessary, but typically
of inferential interest anyway
Use ideas from “likelihood-free MCMC” (Marjoram et al,
2003)
Propose (c ? , x? ) ∼ f (c ? |c)π(x? |c ? ), so that x? is a forward
simulation from the (stochastic) model based on the proposed
new c ?
π(c ? ) f (c|c ? ) π(D|x? )
A=
×
×
π(c)
f (c ? |c)
π(D|x)
Darren Wilkinson — 30/6/09 Warwick EPSRC Symposium
Modelling and inference for biochemical network dynamics
Stochastic modelling
Bayesian parameter inference
Dynamic Bayesian network inference
Summary and conclusion
MCMC algorithms and the CLE
Sequential algorithms for stochastic model calibration
Inference for the p53 model
Sequential algorithms and CaliBayes
“Likelihood-free” MCMC
Again π(D|x) is a simple measurement error model...
Crucially, because the proposal exploits a forward simulation,
the acceptance probability does not depend on the likelihood
of the simulator output — important for complex stochastic
models
This scheme is completely general, and works very well
provided that |D| is small
Problem: If |D| is large, the MCMC scheme will mix very
poorly (very low acceptance rates)
Solution: Exploit the Markovian structure of the process, and
adopt a sequential approach, updating one (or a small number
of) observation(s) at a time...
Darren Wilkinson — 30/6/09 Warwick EPSRC Symposium
Modelling and inference for biochemical network dynamics
Stochastic modelling
Bayesian parameter inference
Dynamic Bayesian network inference
Summary and conclusion
MCMC algorithms and the CLE
Sequential algorithms for stochastic model calibration
Inference for the p53 model
Sequential algorithms and CaliBayes
Sequential likelihood-free algorithm
Data Dt = {d1 , . . . , dt }, D ≡ Dn . Sample paths
xt ≡ {xs |t − 1 < s ≤ t}, t = 2, 3, . . . , n, so that
x ≡ {x2 , . . . , xn }.
1
2
Assume at time t we have a (large) sample from π(c, xt |Dt )
(for time 0, initialise with sample from prior)
Run an MCMC algorithm which constructs a proposal in two
stages:
1
2
3
First sample (c ? , xt? ) ∼ π(c, xt |Dt ) by picking at random and
perturbing slightly (sampling from the kernel density estimate)
Next sample x?t+1 by forward simulation from π(x?t+1 |c ? , xt? )
?
Accept/reject (c ? , xt+1
) with
A=
3
?
π(dt+1 |xt+1
)
π(dt+1 |xt+1 )
Output π(c, xt+1 |Dt+1 ), put t : = t + 1, return to step 2.
Darren Wilkinson — 30/6/09 Warwick EPSRC Symposium
Modelling and inference for biochemical network dynamics
Stochastic modelling
Bayesian parameter inference
Dynamic Bayesian network inference
Summary and conclusion
MCMC algorithms and the CLE
Sequential algorithms for stochastic model calibration
Inference for the p53 model
Sequential algorithms and CaliBayes
Advantages of the sequential algorithm
In the presence of measurement error, the sequential
likelihood-free scheme is effective, and is much simpler than a
more efficient MCMC approach
The likelihood-free approach is easier to tailor to non-standard
models and data
The essential problem is that of calibration of complex
stochastic computer models
For slow stochastic models, there is considerable interest in
developing fast emulators and embedding these into MCMC
algorithms (as millions of forward-simulations from the model
will typically be required)
Darren Wilkinson — 30/6/09 Warwick EPSRC Symposium
Modelling and inference for biochemical network dynamics
Stochastic modelling
Bayesian parameter inference
Dynamic Bayesian network inference
Summary and conclusion
MCMC algorithms and the CLE
Sequential algorithms for stochastic model calibration
Inference for the p53 model
Sequential algorithms and CaliBayes
Building emulators for slow simulators
Use Gaussian process regression to build an emulator of a slow
deterministic simulator
Obtain runs on a carefully constructed set of design points
(eg. a Latin hypercube) — easy to exploit parallel computing
hardware here
For a stochastic simulator, many approaches are possible
(Mixtures of) Dirichlet processes (and related constructs) are
potentially quite flexible
Can also model output parametrically (say, Gaussian), with
parameters modelled by (independent) Gaussian processes
Will typically want more than one run per design point, in
order to be able to estimate distribution
Darren Wilkinson — 30/6/09 Warwick EPSRC Symposium
Modelling and inference for biochemical network dynamics
Stochastic modelling
Bayesian parameter inference
Dynamic Bayesian network inference
Summary and conclusion
MCMC algorithms and the CLE
Sequential algorithms for stochastic model calibration
Inference for the p53 model
Sequential algorithms and CaliBayes
Parameter inference for the p53 model
−16.6
−9.3
−9.0
−8.7
−13.5990
(e)
(f)
(g)
2.086
−8.9950
log(kdephosp53)
200
Density
0
100
100 150
Density
50
0
200
0
2.082
−8.9958
300
log(kphosp53)
400
log(kinactATM)
3000
Density
−13.6010
log(kactATM)
log(kphosMdm2)
(d)
0 1000
Density
0
400
4
3
0
1
2
Density
3
2
Density
1
0
−16.9
Density
(c)
800 1200
(b)
4
(a)
−8.995 −8.980
log(kdephosMdm2)
Darren Wilkinson — 30/6/09 Warwick EPSRC Symposium
−13.820
−13.810
log(krepair)
Modelling and inference for biochemical network dynamics
Stochastic modelling
Bayesian parameter inference
Dynamic Bayesian network inference
Summary and conclusion
MCMC algorithms and the CLE
Sequential algorithms for stochastic model calibration
Inference for the p53 model
Sequential algorithms and CaliBayes
Posterior correlations for the p53 model
log(kdephosp53)
log(kdephosMdm2)
0.7
0.8
0.5
0.6
0.1
0.2
−0.1
0
−0.3
−0.2
−0.5
−0.4
−0.7
−0.6
−0.9
−0.8
−1
log(krepair)
log(kdephosMdm2)
log(kphosMdm2)
log(kdephosp53)
0.3
0.4
log(kphosp53)
log(kinactATM)
0.9
1
log(kactATM)
log(kinactATM)
log(kactATM)
log(kphosp53)
log(kphosMdm2)
log(krepair)
Correlation
Darren Wilkinson — 30/6/09 Warwick EPSRC Symposium
Modelling and inference for biochemical network dynamics
Stochastic modelling
Bayesian parameter inference
Dynamic Bayesian network inference
Summary and conclusion
MCMC algorithms and the CLE
Sequential algorithms for stochastic model calibration
Inference for the p53 model
Sequential algorithms and CaliBayes
Predictive fit for the p53 model
2
3
4
10
20
30
p53
40
0
0
10
20
30
Time (hours)
Time (hours)
Cell 5
Cell 6
Cell 7
10
20
30
Time (hours)
0
2
4
6
8
Time (hours)
200
40
p53
50 100
p53
0
0
50 100
50 100
p53
200
200
300
300
300
Time (hours)
0
0
50
50 100
0
0
100 150 200 250
200
p53
p53
100
0
50
100
0
1
Cell 4
300
Cell 3
200
300
200
p53
Cell 2
300
Cell 1
0
5 10
20
Time (hours)
Darren Wilkinson — 30/6/09 Warwick EPSRC Symposium
30
0
5
10
20
Time (hours)
Modelling and inference for biochemical network dynamics
Stochastic modelling
Bayesian parameter inference
Dynamic Bayesian network inference
Summary and conclusion
MCMC algorithms and the CLE
Sequential algorithms for stochastic model calibration
Inference for the p53 model
Sequential algorithms and CaliBayes
Why sequential rather than global MCMC?
Can develop global algorithms for single time course (single
cell), but need to condition on data from multiple time
courses (multiple cells) and multiple model variants
In principle this could be handled by developing a hierarchical
model framework, but this will be extremely difficult and
time-consuming in practice
Alternatively, can use the sequential MCMC methods
previously described — it is then easy to handle multiple cells
by taking the posterior distribution from one cell as the prior
distribution for the next
Model variants (such as gene knockouts) can be handled
similarly
Darren Wilkinson — 30/6/09 Warwick EPSRC Symposium
Modelling and inference for biochemical network dynamics
Stochastic modelling
Bayesian parameter inference
Dynamic Bayesian network inference
Summary and conclusion
MCMC algorithms and the CLE
Sequential algorithms for stochastic model calibration
Inference for the p53 model
Sequential algorithms and CaliBayes
Calibration of complex simulation models
CaliBayes — Integration of GRID-based post-genomic data
resources through Bayesian calibration of biological simulators
BBSRC Bioinformatics and e-Science II project
http://www.calibayes.ncl.ac.uk/
Bayesian model calibration is concerned with the problem of
parameter estimation, model validation, design and analysis
based only on the ability to forward simulate from the model
It is particularly appropriate for slow and/or complex models
and/or data, where likelihood-based methods are
computationally infeasible
Provides a flexible and generic framework for parameter
inference problems in Systems Biology
Darren Wilkinson — 30/6/09 Warwick EPSRC Symposium
Modelling and inference for biochemical network dynamics
Stochastic modelling
Bayesian parameter inference
Dynamic Bayesian network inference
Summary and conclusion
MCMC algorithms and the CLE
Sequential algorithms for stochastic model calibration
Inference for the p53 model
Sequential algorithms and CaliBayes
CaliBayes service-oriented architecture
CaliBayes simulator interface — a standard SOAP
web-services interface to an SBML-compliant simulator (eg.
BASIS, COPASI, etc.). Could be Gillespie, Langevin,
deterministic, hybrid, or an emulator...
CaliBayes calibration engine — the main back-end
computational service implementing the Bayesian sequential
MCMC algorithm for model calibration based on updating a
single block.
CaliBayes data integrator — the main user-level calibration
service. This service allows the calibration of a model based
on multiple time series, which may consist of measurements of
different species or other model components, and at different
time points.
Darren Wilkinson — 30/6/09 Warwick EPSRC Symposium
Modelling and inference for biochemical network dynamics
Stochastic modelling
Bayesian parameter inference
Dynamic Bayesian network inference
Summary and conclusion
High-throughput data
Top-down models
Sparse VAR(1) models
High throughput (HTP) data
Although we would prefer to use high-resolution single-cell
time course data for all of our statistical modelling, such data
is difficult to obtain in a high throughput (HTP) fashion for
large numbers of proteins
We therefore wish to integrate HTP data into our modelling
approach. Such data is usually of lower resolution and
possessing relatively poor dynamic range, but provides
(simultaneous) measurement of very large numbers of
biological features
HTP data is potentially useful for uncovering network
structure
Darren Wilkinson — 30/6/09 Warwick EPSRC Symposium
Modelling and inference for biochemical network dynamics
Stochastic modelling
Bayesian parameter inference
Dynamic Bayesian network inference
Summary and conclusion
High-throughput data
Top-down models
Sparse VAR(1) models
Time course microarray data
Color Key
−4
−2
0
2
4
Value
Darren Wilkinson — 30/6/09 Warwick EPSRC Symposium
m4
m3
m2
m1
w4
m0
w3
w2
w1
w0
YHR087W
DCS1
YMR196W
YNL200C
APE2
SOL4
YOL083W
YBR108W
ECM4
CRG1
PYK2
YDR379C−A
UIP3
GAD1
TPS2
YBR139W
TFS1
ATG19
PGM2
YKL133C
KIN82
SRT1
YOR020W−A
YNL115C
YPR098C
EMI2
PTP1
RBK1
IRC24
NGL3
CYC7
YJR008W
FYV10
DAP1
ATG8
YDL027C
EMP46
YKL162C
MEF2
CIT1
UGA2
MPM1
AVO2
GSM1
HYR1
YLR297W
ROM1
GSP2
NTH1
YKL151C
SIA1
STF1
YGR250C
PCD1
IPK1
YBR053C
ATH1
HOR7
YDC1
XKS1
YGR149W
AYR1
YBR085C−A
IKS1
TGL2
TPK1
GPD1
YGR146C
YBR056W
HXK1
YJL070C
YPC1
LAP4
ATG9
AFR1
RNY1
PNC1
UBC8
ALG14
YNL305C
MRP8
YER053C−A
YEF1
GAL7
YPL119C−A
ETR1
MIG2
GLK1
HFD1
YLR149C
YPR015C
PFK26
SDS24
UGX2
FBP26
SPO12
FIS1
ALG13
ARA1
YJL163C
GGA1
YLR345W
YNL092W
FSH3
YEL047C
UBP11
YOL048C
YGR053C
YBR204C
PEX30
LPP1
ATG7
HUG1
RNR3
RNR2
RAD51
YOR114W
DUN1
RNR4
PLM2
GYP5
YNL058C
SHC1
YPS6
VPS73
YJR098C
YET1
GAL3
YDL124W
GDE1
PTP2
PEP4
MMR1
GTT1
SLT2
YGL157W
TMA10
INH1
YJL132W
YJR096W
PDR10
SYM1
QCR10
ICL2
GRX1
TSL1
HSP26
ALD2
KTR2
PRX1
FUN14
PLB1
PIG1
COS111
GDH3
YCL033C
GPM2
HOR2
COX5B
YJL068C
YHR140W
GRE3
RAD16
YBR013C
UBI4
FMP12
YLR001C
DIN7
GPH1
DDR2
YKL107W
YGL258W−A
GND2
BNA2
MIP6
THI13
DIT1
GPG1
CIT3
YGP1
YLR356W
YER158C
YKL091C
STF2
RCN2
PEX27
COS4
PRB1
YOR152C
RRD2
DCS2
PUT1
PAI3
OM45
EDC2
MPC54
SSE2
CAT8
MSN4
SOM1
AMS1
DMC1
XBP1
ATO2
IML2
NDE2
SRL3
YDL057W
YPK2
FMP46
YGR153W
NCA3
FMP48
MTH1
SSP1
TRX3
CYT1
QCR2
MSC1
YFL054C
SDH4
PCK1
IDP2
DAN4
STE6
YJL133C−A
GDB1
NDE1
PRM8
GCY1
GOR1
SPR3
YFR017C
PIC2
YAK1
QNQ1
YBR285W
HPA2
YKL161C
YDR018C
YNR014W
SMF1
YJL045W
AGX1
NQM1
EDS1
CDA1
MOH1
DAN2
YFR026C
SUE1
HBT1
YMR085W
HSP31
NRG2
OSW2
CAT2
PHM7
SNZ3
YGR067C
HXT9
YFR012W−A
PES4
SPS19
UIP4
YLR414C
YPL230W
RMD5
COS12
HBN1
AGP3
FMP33
GLC3
TPS1
YOR227W
MCR1
SOD2
YBR284W
FRT2
YIR014W
YGR127W
YJL016W
YLR177W
YOL114C
SKM1
GSY1
SDP1
TMA17
ATG22
PET10
YHR138C
YKR011C
PHM8
SAF1
PPE1
YIL055C
HSP42
PRY1
PAU3
UBC5
YJL144W
YKL121W
STB2
YSP3
OPI3
YPR157W
YMR262W
TPO2
YET2
YGR066C
SSK22
YNL134C
ECM38
YOR215C
YHL044W
YBL039W−A
TPO4
YDL199C
YCL049C
ESC8
YML131W
YSW1
PSY3
YBL101W−A
DOG2
YPR117W
BOP2
YOR192C−C
YML002W
YCL021W−A
FMO1
CLN3
SPO1
DOG2
WSC3
YOR289W
DBP1
PDH1
YRO2
ADR1
YBL005W−A
LSC2
YOR052C
YDR357C
STP4
SDH1
SDH2
ENO1
HAP4
YBR147W
WSC4
YJL052W
YJL052W
YJL052W
TDH1
YGL081W
GIP2
FMP16
CTT1
GSC2
DAL1
DIA3
CAR2
GRE1
HSP12
IRC15
GSY2
YCL073C
DSF1
PRM5
YLR050C
YTP1
SMF3
HMX1
VMR1
YPL088W
RTA1
GUP2
POG1
MAG1
YLR108C
PIR3
BAG7
FLO1
SKS1
SPI1
YOR062C
YPL014W
SGA1
FLO10
YEL073C
YNR034W−A
YDR391C
YIL024C
LEE1
TSA2
YOL162W
PAU2
YOL163W
SRX1
YGL010W
GTT2
NRG1
DUR1,2
RTS3
YOR338W
MEI4
FMP40
YOR378W
MTL1
CWP1
TIP1
SER3
GAP1
TIS11
RGM1
HXT4
YPS3
CRH1
PST1
RME1
YLR194C
YJL213W
ARO10
IZH4
FRE4
YBR184W
FRE7
SSU1
ATF1
SCW10
CLB6
YOL019W
YLR049C
YPL158C
ASH1
SIC1
NIS1
DSE3
PCL9
PIR1
HPF1
YIL169C
PEX21
YIL169C
YOR316C−A
PET122
HO
ISR1
BUD9
SCW11
AMN1
DSE4
YNL046W
EGT2
PHM6
YNL024C
HXT2
DBP2
YOL014W
PLB2
MET6
MET2
MET14
SRP40
MMP1
FIT1
MET16
FCY21
U46493−1
HHT1
HTB1
HHF1
HHT2
HTA1
CLN2
YNL217W
MNN1
CLN1
YOX1
SVS1
AXL2
CSI2
PCL1
YPL264C
PRM1
AGA1
SUN4
HMS2
TOF2
CRR1
PRY2
YNL300W
CTS1
FAA3
PRY3
DSE1
DSE2
HAP3
FMP23
TMT1
SNO1
SNZ1
ARG4
YGL117W
MDG1
ARG8
CUP1−1
STB6
ATR1
ICY2
ADH5
KAR1
YOR390W
YPL279C
DED1
TGS1
ACO2
YAH1
OPT1
YJR111C
HIS4
YKL061W
NHP6B
NHP6B
MET22
GGC1
HOM3
CPA2
TRP4
ECM40
ORT1
PPM2
MET10
NMA1
GFD2
ABF1
YDR222W
STR3
ARG80
YAR068W
YAR066W
YKR041W
SRL1
PHO4
SFG1
YHP1
HTZ1
IXR1
HTA2
HTB2
HTA2
HEM3
FSH1
HGH1
AIR1
YBR141C
AAH1
LIA1
YJR124C
TRM1
YEL048C
NPL6
MSH1
AFFX−Sc−YER148W
UTP20
MTR4
YEL018W
BRE1
RAP1
YNL097C−A
AAC3
TBF1
YBR028C
TOS1
DAD3
YGL101W
MTW1
HEK2
RSC4
PAM18
SET2
REB1
HIS6
RPC17
YEL018W
RPB9
NNF1
STH1
AAT1
YLR042C
TSC10
YBR016W
PAP2
ANB1
SMD1
RSN1
TOA2
YMC2
NHP6A
ECM22
HTL1
YMR230W−A
LEU9
ARP9
DLT1
YGR272C
YDL063C
YGR271C−A
DHR2
FAL1
ARX1
RRP1
RKI1
TIF4631
MAK16
RPC31
DUS4
YBL028C
DBP8
MAK21
URA7
SOF1
PWP2
ALB1
IMP4
FAF1
NOP9
NOP13
YIL127C
NOP53
NOP4
RRP17
YLR363W−A
LRP1
ECM1
RRP5
RRP8
FYV7
MPP10
NOP7
HCA4
UTP23
NMD3
ATO3
RRP45
RSA3
Modelling and inference for biochemical network dynamics
Stochastic modelling
Bayesian parameter inference
Dynamic Bayesian network inference
Summary and conclusion
High-throughput data
Top-down models
Sparse VAR(1) models
The sparse VAR(1) model approximates the CLE
We have already seen how the true Markov jump process can
be approximated by the CLE
We can go further and linearise the CLE to get a multivariate
Gaussian Ornstein-Uhlenbeck (OU) process
This OU process can be time-discretised exactly to give a
VAR(1) model with sparse auto-regressive matrix (the sparsity
of this matrix derives from the sparsity of the stoichiometry
matrix of the CLE)
This suggests that the sparse VAR(1) model might be a good
top-down model for inferring the underlying structure of
biochemical networks from dynamic HTP data
Darren Wilkinson — 30/6/09 Warwick EPSRC Symposium
Modelling and inference for biochemical network dynamics
Stochastic modelling
Bayesian parameter inference
Dynamic Bayesian network inference
Summary and conclusion
High-throughput data
Top-down models
Sparse VAR(1) models
Sparse VAR(1) model
Observe a p-dimensional vector Xt , at each of n time points,
t = 1, . . . , n (with p >> n)
Xt+1 = µ + A(Xt − µ) + t ,
t ∼ N(0, V )
The p × p matrix A is assumed to be sparse (ie. most
elements are expected to be exactly zero)
Sparsity can be modelled in many ways. Simplest:
Pr(aij 6= 0) = π, ∀i, j,
aij |aij 6= 0 ∼ N(0, σ 2 ), ∀i, j
The non-zero structure of A can be associated with a graph
(network) of dynamic interactions (non-zero aij implies arc
from node j to node i)
Darren Wilkinson — 30/6/09 Warwick EPSRC Symposium
Modelling and inference for biochemical network dynamics
Stochastic modelling
Bayesian parameter inference
Dynamic Bayesian network inference
Summary and conclusion
High-throughput data
Top-down models
Sparse VAR(1) models
Inference for model parameters and structure from data
Can get a point estimate for the network structure by
computing a shrinkage estimate of A and then thresholding
(Opgen-Rhein & Strimmer, 2007)
Can also use Bayesian MCMC methods to explore the space
of plausible interaction graphs
MCMC methods allow computation of useful quantities such
as Pr(aij 6= 0|D)
Inference for graphs is a hard problem...
Darren Wilkinson — 30/6/09 Warwick EPSRC Symposium
Modelling and inference for biochemical network dynamics
Stochastic modelling
Bayesian parameter inference
Dynamic Bayesian network inference
Summary and conclusion
High-throughput data
Top-down models
Sparse VAR(1) models
MCMC for sparse VAR(1) models
RJ-MCMC algorithm to explore both graphical structure and
model parameters (auto-regressive coefficients, mean vector,
variance components) — routine to develop and implement,
but exhibits poor mixing in high-dimensional settings
Conditional on the graphical structure, possible (but messy)
to develop a variational algorithm which gives an approximate
marginal log-likelihood for the model after a few iterations —
can embed this in a very simple MCMC algorithm to explore
just the graphical structure
Even this algorithm mixes poorly for large p (say, p > 200),
2
but there are 2p graphs, after all...
Could probably get reasonable speed-up by using (parallel)
sparse matrix algorithms
Darren Wilkinson — 30/6/09 Warwick EPSRC Symposium
Modelling and inference for biochemical network dynamics
Stochastic modelling
Bayesian parameter inference
Dynamic Bayesian network inference
Summary and conclusion
High-throughput data
Top-down models
Sparse VAR(1) models
1.0
Connectivity matrix for the yeast data
0.0
0.2
0.4
0.6
0.8
V1
V2
V3
V4
V5
V6
V7
V8
V9
V10
V11
V12
V13
V14
V15
V16
V17
V18
V19
V20
V21
V22
V23
V24
V25
V26
V27
V28
V29
V30
V31
V32
V33
V34
V35
V36
V37
V38
V39
V40
V41
V42
V43
V44
V45
V46
V47
V48
V49
V50
V51
V52
V53
V54
V55
V56
V57
V58
V59
V60
V61
V62
V63
V64
V65
V66
V67
V68
V69
V70
V71
V72
V73
V74
V75
V76
V77
V78
V79
V80
V81
V82
V83
V84
V85
V86
V87
V88
V89
V90
V91
V92
V93
V94
V95
V96
V97
V98
V99
V100
V101
V102
V103
V104
V105
V106
V107
V108
V109
V110
V111
V112
V113
V114
V115
V116
V117
V118
V119
V120
V121
V122
V123
V124
V125
V126
V127
V128
V129
V130
V131
V132
V133
V134
V135
V136
V137
V138
V139
V140
V141
V142
V143
V144
V145
V146
V147
V148
V149
V150
V151
V152
V153
V154
V155
V156
V157
V158
V159
V160
V161
V162
V163
V164
V165
V166
V167
V168
V169
V170
V171
V172
V173
V174
V175
V176
V177
V178
V179
V180
V181
V182
V183
V1
V5
V9
V14
V19
V24
V29
V34
V39
V44
V49
V54
V59
V64
V69
V74
V79
V84
V89
V94
V99
Darren Wilkinson — 30/6/09 Warwick EPSRC Symposium
V105
V111
V117
V123
V129
V135
V141
V147
V153
V159
V165
V171
V177
V183
Modelling and inference for biochemical network dynamics
Stochastic modelling
Bayesian parameter inference
Dynamic Bayesian network inference
Summary and conclusion
High-throughput data
Top-down models
Sparse VAR(1) models
Inferred network for the yeast data
Darren Wilkinson — 30/6/09 Warwick EPSRC Symposium
Modelling and inference for biochemical network dynamics
Stochastic modelling
Bayesian parameter inference
Dynamic Bayesian network inference
Summary and conclusion
Summary
Acknowledgements
References
Summary
Stochastic models are useful in many areas of systems biology,
due to intrinsic stochasticity of intra-cellular processes, but are
especially relevant in the context of modelling damage and
repair processes associated with ageing
Fitting stochastic models to data is challenging due primarily
to the difficulty of evaluating the likelihood of the data for a
given parameter set
Bayesian methods can be used for parameter estimation, and
provide much richer information than other approaches
It is possible to develop inferential algorithms which rely only
on the ability to forward simulate from the model
For slow simulation models, it can be useful to develop fast
emulators of the process to be used in place of the full model
Darren Wilkinson — 30/6/09 Warwick EPSRC Symposium
Modelling and inference for biochemical network dynamics
Stochastic modelling
Bayesian parameter inference
Dynamic Bayesian network inference
Summary and conclusion
Summary
Acknowledgements
References
Acknowledgements
Stochastic kinetic models: Richard Boys, Leendert Hamoen,
Tom Kirkwood, Colin Gillespie, Andrew Golightly, Conor
Lawless, Pete Milner, Daryl Shanley, Carole Proctor, Wan Ng,
Jan-Willem Veening (funding from BBSRC, EPSRC, MRC,
DTI, Unilever)
Likelihood-free modelling: Richard Boys, Jeff Chen, Colin
Gillespie, Daniel Henderson, Eryk Wolski, Jake Wu (funding
from BBSRC)
Inference for diffusions: Andrew Golightly (funding from
EPSRC)
Sparse VAR(1) modelling: Richard Boys, Colin Gillespie,
Amanda Greenall, Adrian Houghton, Guiyuan Lei, David
Lydall (funding from BBSRC and EPSRC)
Darren Wilkinson — 30/6/09 Warwick EPSRC Symposium
Modelling and inference for biochemical network dynamics
Stochastic modelling
Bayesian parameter inference
Dynamic Bayesian network inference
Summary and conclusion
Summary
Acknowledgements
References
More acknowledgements...
Funding: BBSRC, EPSRC, Newcastle University (CISBAN)
People: The work described here was contributed to by more
or less everyone associated with CISBAN:
www.cisban.ac.uk
Darren Wilkinson — 30/6/09 Warwick EPSRC Symposium
Modelling and inference for biochemical network dynamics
Stochastic modelling
Bayesian parameter inference
Dynamic Bayesian network inference
Summary and conclusion
Summary
Acknowledgements
References
Boys, R. J., D. J. Wilkinson and T. B. L. Kirkwood (2008) Bayesian inference
for a discretely observed stochastic kinetic model. Statistics and
Computing, 18(2), 125–135.
Golightly, A. and D. J. Wilkinson (2008) Bayesian inference for nonlinear
multivariate diffusion models observed with error. Computational Statistics and
Data Analysis, 52(3), 1674–1693.
Henderson, D. A., Boys, R. J., Krishnan, K. J., Lawless, C., Wilkinson, D. J.
(2009) Bayesian emulation and calibration of a stochastic computer model of
mitochondrial DNA deletions in substantia nigra neurons, Journal of the
American Statistical Association. 104(485):76-87.
Henderson, D. A., Boys, R. J., Proctor, C. J., Wilkinson, D. J. (2009) Linking
systems biology models to data: a stochastic kinetic model of p53 oscillations,
A. O’Hagan and M. West (eds.) Handbook of Applied Bayesian Analysis, Oxford
University Press, in press.
Wilkinson, D. J. (2009) Stochastic modelling for quantitative description of
heterogeneous biological systems, Nature Reviews Genetics. 10(2):122-133.
Wilkinson, D. J. (2006) Stochastic Modelling for Systems Biology. Chapman &
Hall/CRC Press.
tinyurl.com/darrenjw
Darren Wilkinson — 30/6/09 Warwick EPSRC Symposium
Modelling and inference for biochemical network dynamics
Download