Partial Approximation of the Master Equation by the Fokker

advertisement
Partial Approximation of the Master Equation
by the Fokker-Planck Equation
Paul Sjöberg
Division of Scientific Computing, Department of Information Technology,
Uppsala University, P.O. Box 337, SE-751 05 Uppsala, Sweden
Paul.Sjoberg@it.uu.se
Abstract. The chemical master equation (CME) describes the probability for each internal state of the cell or rather the states of a model
of the cell. The number of states grows exponentially with the number of chemical species in the model, since each species corresponds to
one dimension in the state space. The CME can be approximated by a
Fokker-Planck equation (FPE), which can be solved numerically cheaper
than the CME. The FPE approximation of the full CME is not always
appropriate, while it can be suitable for a subspace in the state space. In
order to exploit the lower cost of the FPE approximation a method for
splitting the state space in two subspaces where one is approximated by
the FPE and one remains unapproximated is presented. A biologically
relevant problem in four dimensions is solved as an example.
1
Introduction
Chemical reactions are often accurately described with a system of ordinary
differential equations that is called the reaction-rate equations. Each equation
accounts for the change in concentration for a chemical species. This system of
equations describes a completely deterministic evolution of the chemical reactor.
The reaction-rate equations assume large number of molecules and that the
system is close to chemical equilibrium. These are reasonable assumptions in
chemical engineering, but not in molecular biology. When the reactor volume
is small with low copy numbers of some of the reacting molecules such a deterministic model is not always good enough. The discreteness of reactions and
molecules introduces a noise that cannot be disregarded [20]. Even if copy numbers are comparatively large there can be large fluctuations [5].
Stochastic models describe the system in a probabilistic way, where the probability for every possible state of the system is considered. This is equivalent
to determining the frequency of each state in an ensemble of cells, while the
reaction-rate equations is a model for the evolution of the mean state in the
ensemble. The theoretical importance of stochastic models has been known for
long time (see for instance the review [17]), but in recent years the experimental
results have been gathering as well [1], [6], [8] and as experimental methods are
developed to examine single molecules in single cells [3], [21] more experiments
can be expected.
B. Kågström et al. (Eds.): PARA 2006, LNCS 4699, pp. 637–646, 2007.
c Springer-Verlag Berlin Heidelberg 2007
638
P. Sjöberg
The chemical master equation (CME) is used to describe the time-evolution
of the probability density function of the state of the model. It is expensive to
solve numerically. This paper describes a splitting of the CME into one part that
is approximated by a Fokker-Planck equation (FPE) and one that is not.
2
Stochastic Biochemical Models
The physical foundation for the master equation as a stochastic model of biochemical reactions requires some assumptions to be valid [11]. The basic assumption for our model is that the system is well approximated by a Markov process,
that is a stochastic process with no memory. It is only the current state that
may affect the probability of the next reaction.
Definition 1. Let the system contain N chemical species Xi , i = 1 . . . N and
M reaction channels Rν , ν = 1 . . . M . Denote the number of Xi -molecules by
xi ∈ [0, ∞). Let the state of the system be determined by x = (x1 , x2 , . . . , xN )T .
Define cν δt as the probability for reaction Rν between a given set of reactant
molecules in the next time interval δt. Also define hν (x) as the number of such
reactive sets and the propensity of reaction Rν , wν (x) = hν (x)cν .
We assume that the biological cell is a homogeneous mixture of the reactive
molecules and some solvent. This is necessary due to the basic assumption since
there is no spatial component in the state variable and the process has no memory. The rationale for the well-stirred assumption is that the reactor volume can
be considered to be well-stirred if each reactive molecule collides a large number
of times with inert molecules between subsequent reaction events. The result
is that there is an equal probability for any two molecules to collide, which is
necessary for a reaction to take place.
A reaction is specified by a reaction propensity wν (x), which is defined as
the probability for reaction Rν per unit time, and stoichiometric coefficients
s = (s1 , s2 , . . . , sN )T determining the change in molecule numbers as the reaction
is fired. A state change can be written
x → x +s.
The propensity depends on the probability of collision and the probability for reaction given that a collision has occurred. The latter probability depends on the
velocity distribution of the reactant molecules (hence the temperature), the activation energy of the reaction, the probability of the molecules to be positioned
in a reactive orientation and so on.
The probability for collision of molecules is naturally dependent on the volume
of the reactor. Cells grow and divide which makes the propensities time dependent and introduce additional noise due to the random splitting of molecules
between daughter cells. This complication is not difficult to include in the presented framework. Here a constant volume assumption is made to avoid these
technicalities.
Partial Approximation of the Master Equation
3
639
Methods
3.1
The Chemical Master Equation
Consider a system as defined in Definition 1. Let −nν denote the stoichiometric
coefficients of reaction Rν . Summing over all reactions we now can write the
CME [20]:
M
M
dp(x, t) =
wν (x + nν )p(x + nν , t) −
wν (x)p(x, t) .
dt
ν=1
ν=1
(1)
The CME states that the change in probability for state x is simply the probability to reach state x from any other state x+nν (first sum), minus the probability
to leave state x for any other state (second sum).
The size of the state space grows exponentially with the number of molecular
species in the model. To compute probability density functions for larger and
larger biochemical models, the aim must be to allow solution of CMEs with
state spaces of higher and higher dimension. In that perspective, the impact of
parallelization is modest. It will add one or two dimensions to the set of solvable
problems. For larger problems it is necessary to use approximations.
3.2
The Fokker-Planck Approximation
For many problems the discrete CME has too many degrees of freedom to be
computationally tractable, even for rather low dimensions. If it is replaced by a
continuous approximation the discretization of the approximation has substantially fewer degrees of freedom.
By Taylor expansion of the CME and truncation after the second order terms
the following FPE is obtained [20]
⎞
N N
2
∂p(x, t)
∂(wν (x)p(x, t))
nνi nνj ∂ (wν (x)p(x, t)) ⎠
⎝
=
nνi
+
∂t
∂x
2
∂xi ∂xj
i
ν=1
i=1
i=1 j=1
M
⎛
N
≡
M
Aν p , (2)
ν=1
where nνi is the i:th element of nν .
3.3
Merging the Discrete CME and the Continuous FPE
Consider a CME for a system as described in Definition 1. Let X be the space of
all physical states (i.e. fulfilling xi ≥ 0 and other restrictions due to the model).
Divide the variables into two subsets X and Y such that span X ∪ Y = X so
that Y is suitable for FPE approximation, but X is not. Split the stoichiometric
coefficients accordingly so that μν and ην are vectors containing the elements of
nν corresponding to the variables in X and Y , respecetively.
640
P. Sjöberg
The CME (1) for a splitting where x ∈ span X and y ∈ span Y is now written
∂p
(x, y, t) =
wν (x + μν , y + ην )p(x + μν , y + ην ) −
wν (x, y)p(x, y, t) .
∂t
ν
ν
Introduce qν (x, t) = wν (x)p(x, t) and apply the FPE approximation to the variables in Y
∂p
(x, y, t) =
qν (x + μν , y + ην , t) − qν (x, y, t)
∂t
ν
qν (x + μν , y + ην , t) − qν (x + μν , y, t) + qν (x + μν , y, t) − qν (x, y, t)
=
ν
=
ν
⎛
⎞
2
∂qν
∂ qν ⎠
1
⎝
ηνi
+
ηνi ηνj
(x + μν , y, t)+
∂y
2
∂y
i
i ∂yj
i
i
j
qν (x + μν , y, t) − qν (x, y, t) .
(3)
ν
For example, if the sets X and Y have one member each, x ∈ X and y ∈ Y
where x ∈ {0, 1, . . . , k}, and maxν |μν | = 1 then (3) can be written as the
following sum of block-diagonal matrices (zero elements not written)
⎛
⎛
⎞
⎞
p(0, y)
Aν,0
⎜p(1, y)⎟
⎟
⎜
Aν,1
∂p
⎜
⎜
⎟
⎟
p = ⎜ . ⎟,
=
⎜
⎟p
.
..
∂t
⎝ .. ⎠
⎝
⎠
ν,μν =0
Aν,k
p(k, y)
⎛
⎞
−wν (0, y) Aν,1 + wν (1, y)
⎜
⎟
−wν (1, y) Aν,2 + wν (2, y)
⎟
⎜
⎜
⎟
..
..
+
⎜
⎟p
.
.
⎜
⎟
ν,μν =1 ⎝
−wν (k − 1, y) Aν,k + wν (k, y)⎠
−wν (k, y)
⎛
⎞
−wν (0, y)
⎟
⎜
−wν (1, y)
⎜Aν,0 + wν (0, y)
⎟
+
⎜
⎟p,
.
.
..
..
⎝
⎠
ν,μν =−1
Aν,k−1 + wν (k − 1, y) −wν (k, y)
where Aν,i is the FPE operator from (2) for the y-variable and constant x = i.
Since negative molecule numbers are nonsense, wν (0, y) = 0 if μν = 1. The
numerical boundary at the truncation of the state space at k can be chosen
more freely, but here wν (k, y) = 0 if μν = −1, assuming the probability at this
boundary is zero.
3.4
Discretization of the Fokker-Planck Equation
The FPE is discretized and solved on a grid that is considerably coarser than
the state space [4]. A finite volume method (see [18]) is used to discretize the
Partial Approximation of the Master Equation
641
state space in the subspace where the master equation is approximated with an
FPE. Reflecting barrier boundary conditions [7] are used which prescribe that
there is no probability current over the boundary.
3.5
The Stochastic Simulation Algorithm (SSA)
The well-known stochastic simulation algorithm (SSA) was proposed by Gillespie
in 1976 [10] for simulation of stochastic trajectories through the state space of
chemical reaction networks. Since then several improvements, approximations
and extensions [2] [9], [12], [16], have been made to the original algorithm that we
summarize here. SSA can be used to solve the unapproximated master equation
and is relevant for comparison to the FPE-method with respect to accuracy and
efficiency.
SSA was not designed for the purpose of solving the CME, which at the
time seemed “virtually intractable, both analytically and numerically” [10]. That
statement is not out of date for the vast majority of CME problems, but numerical solution for small reaction networks has become feasible.
Define the system as in Definition 1. The probability that the next reaction
is fired in the interval (t + τ, t + τ + δt) and is of type Rλ is
p(τ, λ)δt = wλ exp(−
M
wν τ ).
ν
SSA samples p(τ, λ) in order to take a statistically correct Monte Carlo step.
The algorithm for generating a trajectory is:
1. Initialize the state x, compute reaction propensities wν (x), ν = 1 . . . M and
set t = 0.
2. Generate a sample (τ, λ) from the distribution p(τ, λ)
3. Increase time by τ and change the state according to reaction Rλ
4. Store population and check for termination, otherwise go to step 2
By simulating an ensemble of trajectories the state probability distribution can
be estimated and a solution to the master equation is obtained. The error of
SSA can be estimated in a statistical sense. With a certain probability the error
is within the prescribed error bound [18].
3.6
Computational Efficiency
The convergence rate of solution by the FPE approximation is derived in [18].
For a certain error the computational work for the SSA is
WSSA () = CSSA −2 ,
while the work for the FPE is
WF P E () = CF P E −( r + s ) ,
N
1
642
P. Sjöberg
where CSSA and CF P E are independent of , N is the dimension of the problem
and r and s are the order of accuracy of the space- and time-discretizations,
respectively. For some systems, numerical solution of the FPE approximation
can be much more efficient than SSA [18]. If, like here, second order accurate
discretizations schemes are used in time and space, less work is needed for FPE
when N < 3 and a small . For the hybrid method N is the number of variables
in Y .
High order schemes have a great potential of allowing solution of larger problems, but the dimensionality of the feasible problems will still be low compared
to the actual number of dimensions in most molecular biological models. There
are no principal impediments for using higher order schemes.
4
An Example Problem
To demonstrate a problem where the partial approximation is useful, consider a
gene which is regulated directly by its product. The gene products binds cooperatively at two sites, S1 and S2 in the regulatory region of the gene. There are five
molecular species in the model (notation within parenthesis): the gene product a metabolite (M ), the gene with S1 and S2 unoccupied (DNA), the gene with S1
occupied by an M -molecule (DNAM ), the gene with both S1 and S2 occupied by
M -molecules (DNA2M ) and mRNA (RNA). The number of molecules is denoted
by the token for the corresponding species in lowercase characters, i.e. m, dna,
dnaM , dna2M and rna. Figure 1 shows the reactions of the model. A metabo-
M
RN A
M
M
S1 S2
M
M
M
MM
S1 S2
S1 S2
Fig. 1. The reaction scheme of the example
lite bound to S1 activates transcription while a metabolite bound to S2 blocks
RNA polymerase and shuts down mRNA production. Cooperativity is necessary
for metabolite binding to S2 and so strong that a metabolite bound to S1 will not
Partial Approximation of the Master Equation
643
unbind while S2 is occupied. The metabolite and mRNA are actively degraded.
This example is a simplified version of the transcription regulation example in
[13].
No reactions change the number of gene copies. There are exactly two copies
during the entire simulation, i.e. the probability for dna + dnaM + dna2M =
2 is 1. Therefore the state space is actually a four-dimensional surface in the
five-dimensional state space and we can reduce the state description to x̃ =
(dna, dnaM , rna, m)T and substitute dna2M = 2 − dna − dnaM in the propensity
functions. Table 1 lists the reactions of the reduced model. The symbol ∅ denotes
that no molecules are created in the reaction.
Table 1. The reactions of the example system
Reaction
Stoichiometric Coeff. Propensity
w1
RNA −−→
RNA + M
(0, 0, 0, 1)T
w1 = 0.05 · rna
w2
M −−→ ∅
(0, 0, 0, −1)T
w2 = 0.001 · m
w3
DNAM −−→
RNA + DNAM
(0, 0, 1, 0)T
w3 = 0.1 · dnaM
w4
RNA −−→
0
(0, 0, −1, 0)T
w4 = 0.005 · rna
w5
DNA + M −−→
DNAM
(−1, 1, 0, −1)T
w5 = 0.02 · dna · m
w6
DNAM −−→
DNA + M
(1, −1, 0, 1)T
w6 = 0.5 · dna
w7
DNAM + M −−→
DNA2M
(0, −1, 0, −1)T
w7 = 2 · 10−4 · dnaM · m
w8
T
DNA2M −−→ DNAM + M
(0, 1, 0, 1)
w8 = 1 · 10−11 · (2 − dna − dnaM )
5
Experiments
The example in Section 4 was simulated for 35 minutes (approximately one cell
generation time) using SSA and the hybrid approach. The two variables dna and
dnaM were represented by the CME part (X) and rna and m were represented
by the FPE part of the hybrid method. The FPE part (Y ) was discretized in
the rna × m-plane using 20 × 60 cells of equal length in the m-dimension and
of increasing length in the RNA-dimension. Let hk be the length of the k:th
cell in the RN A-dimension. The step lengths were determined by hk = (1 +
θ) hk−1 , i = 2 . . . 20 and h1 = 0.5. For time integration of (3) the implict second
order backward differentiation formula scheme (BDF-2) [14] was used. The time
step was chosen adaptively according to [15] with an error tolerance, measured in
the L1 -norm, of one percent in each time step. The system of equations that arise
in each time step in BDF-2 is solved using BiCGSTAB [19]. The implicit method
is suitable for handling the ubiquitous stiffness of molecular biological systems.
As initial data, a normal distribution in the rna × m-plane was truncated at
the boundaries of the computational domain and rescaled. The mean E(x) and
variance V (x) of the normal distribution before truncation was E(x) = (1, 5)T
and V (x) = (1, 1)T . The probability for being in a state where dna = 2 and
dnaM = 0 was set to 1.
For SSA the simulation was used to compute mean and variance, but not
an approximation of the probability density function. The initial state of each
644
P. Sjöberg
trajectory was sampled from the initial distribution that was used for the CMEFPE-hybrid simulation.
5.1
Results
Due to numerical approximation errors the solution is slightly negative at some
parts of the state space. In order to calculate the mean and standard deviation
those negative values are set to zero. Figure 2 shows the means and standard
deviations for the hybrid method compared to what is obtained by SSA using
104 trajectories. Figure 3 shows a projection of the probability density function.
standard dev.
mean value
2
1.5
1
0.5
0
0
10
20
2
1.5
1
0.5
0
0
30 35
time (min)
standard dev.
mean value
0.8
0.6
0.4
0.2
0
0
10
20
0.4
0.2
0
0
standard dev.
mean value
6
4
2
6
4
2
standard dev.
mean value
10
20
30 35
time (min)
150
100
50
20
30 35
8
0
0
30 35
200
10
20
10
time (min)
0
0
10
time (min)
8
20
30 35
0.6
30 35
10
10
20
0.8
time (min)
0
0
10
time (min)
30 35
time (min)
200
150
100
50
0
0
10
20
30 35
time (min)
Fig. 2. Mean values (left) and standard deviations (right) for, from top to bottom,
dna, dnaM , rna and m. The hybrid solution (solid line) and the SSA solution (broken
line) is plotted for each case.
Isolines of
p̃(rna, m) =
p(dna, dnaM , rna, m)
dna,dnaM
are plotted at four different times as indicated in the figure.
Partial Approximation of the Master Equation
t = 1.1 min
645
t = 4.1 min
20
20
rna
25
rna
25
15
15
10
10
5
5
50
100
150
200
250
300
m
t = 7.3 min
350
400
20
20
100
50
100
150
200
150
200
250
300
350
400
250
300
350
400
m
t = 23 min
rna
25
rna
25
50
15
15
10
10
5
5
50
100
150
200
m
250
300
350
400
m
Fig. 3. Snapshots of the numerical solution projected on the rna × m-plane
6
Conclusion
The splitting of the state space in two subspaces where one subspace is suitable
for approximation with the chemical master equation and one is not, extends
the range of problems that can solved numerically using the FPE approximation
of the master equation.
The main reason for introducing this hybrid method is that the FPE often
is a good approximation for some, but not all, dimensions in the state space.
Typically genes and gene configurations will not be well approximated. Such
dimensions need to be resolved by more grid points than the actual number of
states in that dimension. Furthermore the error in the FPE approximation is
bounded by the third derivatives [18]. For low copy numbers the probability of a
single dominating state is high resulting in probability peaks that cannot be well
approximated due to large third derivatives. There is also a point in avoiding
approximations when they are not needed.
Acknowledgements
I want to thank Per Lötstedt and Johan Elf for helpful discussions. This work was
funded by the Swedish Research Council, the Swedish National Graduate School
in Scientific Computing and the Swedish Foundation for Strategic Research.
646
P. Sjöberg
References
1. Blake, W.J., Kærn, M., Cantor, C.R., Collins, J.J.: Noise in eukaryotic gene expression. Nature 422, 633–637 (2003)
2. Bratsun, D., Volfson, D., Tsimring, L.S., Hasty, J.: Delay-induced stochastic oscillations in gene regulation. Proc. Natl. Acad. Sci. USA 102(41), 14593–14598
(2005)
3. Cai, L., Friedman, N., Xie, X.S.: Stochastic protein expression in individual cells
at the single molecule level. Nature 440, 358–362 (2006)
4. Elf, J., Lötstedt, P., Sjöberg, P.: Problems of high dimensionality in molecular
biology. In: Hackbusch, W. (ed.) High-dimensional problems - Numerical treatment
and applications, Proceedings of the 19th GAMM-Seminar, Leipzig, pp. 21–30
(2003), available at http://www.mis.mpg.de/conferences/gamm/2003/
5. Elf, J., Paulsson, J., Berg, O.G., Ehrenberg, M.: Near-critical phenomena in intracellular metabolite pools. Biophys. J. 84, 154–170 (2003)
6. Elowitz, M.B., Levine, A.J., Siggia, E.D., Swain, P.S.: Stochastic gene expression
in a single cell. Science 297, 1183–1186 (2002)
7. Gardiner, C.W.: Handbook of Stochastic Methods, 2nd edn. Springer, Heidelberg
(1985)
8. Gardner, T.S., Cantor, C.R., Collins, J.J.: Construction of a genetic toggle switch
in Escherichia coli. Nature 403, 339–342 (2000)
9. Gibson, M.A., Bruck, J.: Efficient exact stochastic simulation of chemical systems
with many species and many channels. J. Phys. Chem. 104, 1876–1889 (2000)
10. Gillespie, D.T.: A general method for numerically simulating the stochastic time
evolution of coupled chemical reactions. J. Comput. Phys. 22, 403–434 (1976)
11. Gillespie, D.T.: A rigorous derivation of the chemical master equation. Physica
A 188, 404–425 (1992)
12. Gillespie, D.T.: Approximate accelerated stochastic simulation of chemically reacting systems. J. Chem. Phys. 115(4), 1716 (2001)
13. Goutsias, J.: Quasiequilibrium approximation of fast reaction kinetics in stochastic
biochemical systems. J. Chem. Phys. 184102 (2005)
14. Hairer, E., Nørsett, S.P., Wanner, G.: Solving Ordinary Differential Equations,
Nonstiff Problems, 2nd edn. Springer, Heidelberg (1993)
15. Lötstedt, P., Söderberg, S., Ramage, A., Hemmingsson-Frändén, L.: Implicit solution of hyperbolic equations with space-time adaptivity. BIT 42, 134–158 (2002)
16. Lu, T., Volfson, D., Tsimring, L., Hasty, J.: Cellular growth and division in the
Gillespie algorithm. Syst. Biol. 1, 121–128 (2004)
17. McQuarrie, D.A.: Stochastic approach to chemical kinetics. J. Appl. Prob. 4, 413–
478 (1967)
18. Sjöberg, P., Lötstedt, P., Elf, J.: Fokker-Planck approximation of the master equation in molecular biology. Comput. Visual. Sci. 2006 (accepted for publication)
19. van der Vorst, H.A.: BiCGSTAB: A fast and smoothly converging variant of the
Bi-CG for the solution of nonsymmetric linear systems. SIAM J. Sci and Stat.
Comp 13, 631–644 (1992)
20. van Kampen, N.G.: Stochastic Processes in Physics and Chemistry, 2nd edn. Elsevier, Amsterdam (1992)
21. Yu, J., Xiao, J., Ren, X., Lao, K., Xie, X.S.: Probing gene expression in live cells,
one protein molecule at a time. Science 311, 1600 (2006)
Download