Partial Approximation of the Master Equation by the Fokker-Planck Equation

Paul Sjöberg
Division of Scientific Computing, Department of Information Technology, Uppsala University, P.O. Box 337, SE-751 05 Uppsala, Sweden
Paul.Sjoberg@it.uu.se

Abstract. The chemical master equation (CME) describes the probability of each internal state of the cell, or rather of each state of a model of the cell. The number of states grows exponentially with the number of chemical species in the model, since each species corresponds to one dimension in the state space. The CME can be approximated by a Fokker-Planck equation (FPE), which is cheaper to solve numerically than the CME. The FPE approximation of the full CME is not always appropriate, but it can be suitable for a subspace of the state space. In order to exploit the lower cost of the FPE approximation, a method is presented for splitting the state space into two subspaces, where one is approximated by the FPE and one remains unapproximated. A biologically relevant problem in four dimensions is solved as an example.

1 Introduction

Chemical reactions are often accurately described by a system of ordinary differential equations called the reaction-rate equations. Each equation accounts for the change in concentration of one chemical species. This system of equations describes a completely deterministic evolution of the chemical reactor. The reaction-rate equations assume a large number of molecules and that the system is close to chemical equilibrium. These are reasonable assumptions in chemical engineering, but not in molecular biology. When the reactor volume is small and the copy numbers of some of the reacting molecules are low, such a deterministic model is not always good enough. The discreteness of reactions and molecules introduces noise that cannot be disregarded [20]. Even if copy numbers are comparatively large there can be large fluctuations [5].
Stochastic models describe the system in a probabilistic way, where the probability of every possible state of the system is considered. This is equivalent to determining the frequency of each state in an ensemble of cells, while the reaction-rate equations are a model for the evolution of the mean state in the ensemble. The theoretical importance of stochastic models has been known for a long time (see for instance the review [17]), but in recent years experimental results have been accumulating as well [1], [6], [8], and as experimental methods are developed to examine single molecules in single cells [3], [21], more experiments can be expected.

B. Kågström et al. (Eds.): PARA 2006, LNCS 4699, pp. 637–646, 2007. © Springer-Verlag Berlin Heidelberg 2007

The chemical master equation (CME) is used to describe the time evolution of the probability density function of the state of the model. It is expensive to solve numerically. This paper describes a splitting of the CME into one part that is approximated by a Fokker-Planck equation (FPE) and one that is not.

2 Stochastic Biochemical Models

The physical foundation for the master equation as a stochastic model of biochemical reactions requires some assumptions to be valid [11]. The basic assumption for our model is that the system is well approximated by a Markov process, that is, a stochastic process with no memory. Only the current state may affect the probability of the next reaction.

Definition 1. Let the system contain $N$ chemical species $X_i$, $i = 1, \ldots, N$, and $M$ reaction channels $R_\nu$, $\nu = 1, \ldots, M$. Denote the number of $X_i$-molecules by $x_i \in [0, \infty)$. Let the state of the system be determined by $x = (x_1, x_2, \ldots, x_N)^T$. Define $c_\nu \delta t$ as the probability that a given set of reactant molecules reacts according to $R_\nu$ in the next time interval $\delta t$. Also define $h_\nu(x)$ as the number of such reactive sets, and the propensity of reaction $R_\nu$ as $w_\nu(x) = h_\nu(x) c_\nu$.
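The propensity in Definition 1 can be illustrated with a minimal sketch. It assumes that the reactive sets counted by $h_\nu(x)$ are the distinct combinations of reactant molecules, which is the usual convention but is not spelled out in the definition. The function names, the dictionary encoding of the reactants and the numerical values are hypothetical and chosen only for illustration.

```python
from math import comb

def reactive_sets(x, reactants):
    """h_nu(x): number of distinct reactant combinations in state x.

    x         -- tuple of molecule counts, one entry per species
    reactants -- dict mapping species index -> molecules consumed
                 (hypothetical encoding chosen for this sketch)
    """
    h = 1
    for species, count in reactants.items():
        # Number of ways to pick `count` molecules of this species.
        h *= comb(x[species], count)
    return h

def propensity(x, reactants, c):
    """w_nu(x) = h_nu(x) * c_nu, as in Definition 1."""
    return c * reactive_sets(x, reactants)

# Bimolecular reaction A + B -> ... with 2 A-molecules and 10 B-molecules:
# 2 * 10 = 20 reactive pairs, so w = 20 * c.
w_pair = propensity((2, 10), {0: 1, 1: 1}, c=0.02)

# Dimerization A + A -> ... with 10 A-molecules: C(10, 2) = 45 reactive pairs.
h_dimer = reactive_sets((10,), {0: 2})
```

With this convention a reaction whose reactants are missing automatically gets zero propensity, because `comb(0, 1)` is zero.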
We assume that the biological cell is a homogeneous mixture of the reactive molecules and some solvent. This is necessary due to the basic assumption, since there is no spatial component in the state variable and the process has no memory. The rationale for the well-stirred assumption is that the reactor volume can be considered well-stirred if each reactive molecule collides a large number of times with inert molecules between subsequent reaction events. The result is that any two molecules are equally likely to collide, and a collision is necessary for a reaction to take place.

A reaction is specified by a reaction propensity $w_\nu(x)$, which is defined as the probability for reaction $R_\nu$ per unit time, and stoichiometric coefficients $s = (s_1, s_2, \ldots, s_N)^T$ determining the change in molecule numbers when the reaction fires. A state change can be written $x \to x + s$.

The propensity depends on the probability of collision and the probability of reaction given that a collision has occurred. The latter probability depends on the velocity distribution of the reactant molecules (hence the temperature), the activation energy of the reaction, the probability that the molecules are positioned in a reactive orientation, and so on. The probability of collision of molecules naturally depends on the volume of the reactor.

Cells grow and divide, which makes the propensities time dependent and introduces additional noise due to the random splitting of molecules between daughter cells. This complication is not difficult to include in the presented framework. Here a constant volume assumption is made to avoid these technicalities.

3 Methods

3.1 The Chemical Master Equation

Consider a system as defined in Definition 1. Let $-n_\nu$ denote the stoichiometric coefficients of reaction $R_\nu$. Summing over all reactions, we can now write the CME [20]:

$$\frac{dp(x,t)}{dt} = \sum_{\nu=1}^{M} w_\nu(x+n_\nu)\,p(x+n_\nu,t) - \sum_{\nu=1}^{M} w_\nu(x)\,p(x,t). \tag{1}$$

The CME states that the change in probability for state $x$ is simply the probability to reach state $x$ from any other state $x + n_\nu$ (first sum), minus the probability to leave state $x$ for any other state (second sum).

The size of the state space grows exponentially with the number of molecular species in the model. To compute probability density functions for larger and larger biochemical models, the aim must be to allow solution of CMEs with state spaces of higher and higher dimension. In that perspective, the impact of parallelization is modest. It will add one or two dimensions to the set of solvable problems. For larger problems it is necessary to use approximations.

3.2 The Fokker-Planck Approximation

For many problems the discrete CME has too many degrees of freedom to be computationally tractable, even for rather low dimensions. If it is replaced by a continuous approximation, the discretization of the approximation has substantially fewer degrees of freedom. By Taylor expansion of the CME and truncation after the second-order terms, the following FPE is obtained [20]:

$$\frac{\partial p(x,t)}{\partial t} = \sum_{\nu=1}^{M} \left( \sum_{i=1}^{N} n_{\nu i} \frac{\partial \big(w_\nu(x)\,p(x,t)\big)}{\partial x_i} + \sum_{i=1}^{N} \sum_{j=1}^{N} \frac{n_{\nu i} n_{\nu j}}{2} \frac{\partial^2 \big(w_\nu(x)\,p(x,t)\big)}{\partial x_i \partial x_j} \right) \equiv \sum_{\nu=1}^{M} A_\nu p, \tag{2}$$

where $n_{\nu i}$ is the $i$:th element of $n_\nu$.

3.3 Merging the Discrete CME and the Continuous FPE

Consider a CME for a system as described in Definition 1. Let $\mathcal{X}$ be the space of all physical states (i.e. fulfilling $x_i \geq 0$ and other restrictions due to the model). Divide the variables into two subsets $X$ and $Y$ such that $\mathrm{span}\, X \cup Y = \mathcal{X}$, so that $Y$ is suitable for FPE approximation, but $X$ is not. Split the stoichiometric coefficients accordingly so that $\mu_\nu$ and $\eta_\nu$ are vectors containing the elements of $n_\nu$ corresponding to the variables in $X$ and $Y$, respectively.

The CME (1) for a splitting where $x \in \mathrm{span}\, X$ and $y \in \mathrm{span}\, Y$ is now written

$$\frac{\partial p(x,y,t)}{\partial t} = \sum_{\nu} w_\nu(x+\mu_\nu, y+\eta_\nu)\,p(x+\mu_\nu, y+\eta_\nu, t) - \sum_{\nu} w_\nu(x,y)\,p(x,y,t).$$

Introduce $q_\nu(x,y,t) = w_\nu(x,y)\,p(x,y,t)$ and apply the FPE approximation to the variables in $Y$:

$$\begin{aligned}
\frac{\partial p(x,y,t)}{\partial t} &= \sum_{\nu} \Big( q_\nu(x+\mu_\nu, y+\eta_\nu, t) - q_\nu(x,y,t) \Big) \\
&= \sum_{\nu} \Big( q_\nu(x+\mu_\nu, y+\eta_\nu, t) - q_\nu(x+\mu_\nu, y, t) \Big) + \sum_{\nu} \Big( q_\nu(x+\mu_\nu, y, t) - q_\nu(x,y,t) \Big) \\
&\approx \sum_{\nu} \left( \sum_{i} \eta_{\nu i} \frac{\partial q_\nu}{\partial y_i} + \frac{1}{2} \sum_{i} \sum_{j} \eta_{\nu i} \eta_{\nu j} \frac{\partial^2 q_\nu}{\partial y_i \partial y_j} \right)(x+\mu_\nu, y, t) + \sum_{\nu} \Big( q_\nu(x+\mu_\nu, y, t) - q_\nu(x,y,t) \Big).
\end{aligned} \tag{3}$$

For example, if the sets $X$ and $Y$ have one member each, $x \in X$ and $y \in Y$ where $x \in \{0, 1, \ldots, k\}$, and $\max_\nu |\mu_\nu| = 1$, then (3) can be written as the following sum of block matrices (zero elements not written):

$$\mathbf{p} = \begin{pmatrix} p(0,y) \\ p(1,y) \\ \vdots \\ p(k,y) \end{pmatrix},$$

$$\begin{aligned}
\frac{\partial \mathbf{p}}{\partial t} ={}& \sum_{\nu,\ \mu_\nu = 0}
\begin{pmatrix}
A_{\nu,0} & & & \\
 & A_{\nu,1} & & \\
 & & \ddots & \\
 & & & A_{\nu,k}
\end{pmatrix} \mathbf{p} \\
&+ \sum_{\nu,\ \mu_\nu = 1}
\begin{pmatrix}
-w_\nu(0,y) & A_{\nu,1} + w_\nu(1,y) & & & \\
 & -w_\nu(1,y) & A_{\nu,2} + w_\nu(2,y) & & \\
 & & \ddots & \ddots & \\
 & & & -w_\nu(k-1,y) & A_{\nu,k} + w_\nu(k,y) \\
 & & & & -w_\nu(k,y)
\end{pmatrix} \mathbf{p} \\
&+ \sum_{\nu,\ \mu_\nu = -1}
\begin{pmatrix}
-w_\nu(0,y) & & & \\
A_{\nu,0} + w_\nu(0,y) & -w_\nu(1,y) & & \\
 & \ddots & \ddots & \\
 & & A_{\nu,k-1} + w_\nu(k-1,y) & -w_\nu(k,y)
\end{pmatrix} \mathbf{p},
\end{aligned}$$

where $A_{\nu,i}$ is the FPE operator from (2) for the $y$-variable and constant $x = i$. Since negative molecule numbers are nonsense, $w_\nu(0,y) = 0$ if $\mu_\nu = 1$. The numerical boundary at the truncation of the state space at $k$ can be chosen more freely, but here $w_\nu(k,y) = 0$ if $\mu_\nu = -1$, assuming the probability at this boundary is zero.

3.4 Discretization of the Fokker-Planck Equation

The FPE is discretized and solved on a grid that is considerably coarser than the state space [4]. A finite volume method (see [18]) is used to discretize the state space in the subspace where the master equation is approximated with an FPE. Reflecting barrier boundary conditions [7] are used, which prescribe that there is no probability current over the boundary.

3.5 The Stochastic Simulation Algorithm (SSA)

The well-known stochastic simulation algorithm (SSA) was proposed by Gillespie in 1976 [10] for simulation of stochastic trajectories through the state space of chemical reaction networks.
Since then several improvements, approximations and extensions [2], [9], [12], [16] have been made to the original algorithm, which we summarize here. SSA can be used to solve the unapproximated master equation and is relevant for comparison with the FPE method with respect to accuracy and efficiency. SSA was not designed for the purpose of solving the CME, which at the time seemed “virtually intractable, both analytically and numerically” [10]. That statement is not out of date for the vast majority of CME problems, but numerical solution for small reaction networks has become feasible.

Define the system as in Definition 1. The probability that the next reaction is fired in the interval $(t + \tau, t + \tau + \delta t)$ and is of type $R_\lambda$ is

$$p(\tau, \lambda)\,\delta t = w_\lambda \exp\left(-\sum_{\nu=1}^{M} w_\nu \tau\right) \delta t.$$

SSA samples $p(\tau, \lambda)$ in order to take a statistically correct Monte Carlo step. The algorithm for generating a trajectory is:

1. Initialize the state $x$, compute the reaction propensities $w_\nu(x)$, $\nu = 1, \ldots, M$, and set $t = 0$.
2. Generate a sample $(\tau, \lambda)$ from the distribution $p(\tau, \lambda)$.
3. Increase time by $\tau$ and change the state according to reaction $R_\lambda$.
4. Store the population and check for termination; otherwise go to step 2.

By simulating an ensemble of trajectories, the state probability distribution can be estimated and a solution to the master equation is obtained. The error of SSA can be estimated in a statistical sense: with a certain probability the error is within the prescribed error bound [18].

3.6 Computational Efficiency

The convergence rate of solution by the FPE approximation is derived in [18]. For a certain error $\epsilon$ the computational work for the SSA is

$$W_{SSA}(\epsilon) = C_{SSA}\,\epsilon^{-2},$$

while the work for the FPE is

$$W_{FPE}(\epsilon) = C_{FPE}\,\epsilon^{-\left(\frac{N}{r} + \frac{1}{s}\right)},$$

where $C_{SSA}$ and $C_{FPE}$ are independent of $\epsilon$, $N$ is the dimension of the problem, and $r$ and $s$ are the orders of accuracy of the space and time discretizations, respectively.
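As a worked reading of these two work estimates, writing $\epsilon$ for the prescribed error, the FPE is asymptotically cheaper than SSA for small $\epsilon$ whenever its exponent is the smaller one. The short derivation below makes that condition explicit. The fourth-order case $r = s = 4$ is a hypothetical choice used only for illustration, and the constants $C_{SSA}$ and $C_{FPE}$ are ignored.

```latex
% FPE exponent smaller than SSA exponent:
\frac{N}{r} + \frac{1}{s} < 2
\quad\Longleftrightarrow\quad
N < r\left(2 - \frac{1}{s}\right).

% Second-order schemes, r = s = 2:
N < 2\left(2 - \tfrac{1}{2}\right) = 3.

% Hypothetical fourth-order schemes, r = s = 4:
N < 4\left(2 - \tfrac{1}{4}\right) = 7.
```

The bound grows with the order of the spatial scheme, so higher-order discretizations enlarge the range of dimensions where the FPE exponent is favorable. For a fixed, moderate $\epsilon$ the constants still matter.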
For some systems, numerical solution of the FPE approximation can be much more efficient than SSA [18]. If, as here, second-order accurate discretization schemes are used in time and space, less work is needed for the FPE than for SSA when $N < 3$ and $\epsilon$ is small. For the hybrid method, $N$ is the number of variables in $Y$. Higher-order schemes have great potential for allowing solution of larger problems, but the dimensionality of the feasible problems will still be low compared to the actual number of dimensions in most molecular biological models. There are no impediments in principle to using higher-order schemes.

4 An Example Problem

To demonstrate a problem where the partial approximation is useful, consider a gene that is regulated directly by its product. The gene product binds cooperatively at two sites, S1 and S2, in the regulatory region of the gene. There are five molecular species in the model (notation within parentheses): the gene product, a metabolite (M); the gene with S1 and S2 unoccupied (DNA); the gene with S1 occupied by an M-molecule (DNAM); the gene with both S1 and S2 occupied by M-molecules (DNA2M); and mRNA (RNA). The number of molecules is denoted by the token for the corresponding species in lowercase characters, i.e. m, dna, dnaM, dna2M and rna. Figure 1 shows the reactions of the model.

[Fig. 1. The reaction scheme of the example]

A metabolite bound to S1 activates transcription, while a metabolite bound to S2 blocks RNA polymerase and shuts down mRNA production. Cooperativity is necessary for metabolite binding to S2 and is so strong that a metabolite bound to S1 will not unbind while S2 is occupied. The metabolite and mRNA are actively degraded. This example is a simplified version of the transcription regulation example in [13].

No reactions change the number of gene copies. There are exactly two copies during the entire simulation, i.e.
the probability for dna + dnaM + dna2M = 2 is 1. Therefore the state space is actually a four-dimensional surface in the five-dimensional state space, and we can reduce the state description to $\tilde{x}$ = (dna, dnaM, rna, m)^T and substitute dna2M = 2 − dna − dnaM in the propensity functions. Table 1 lists the reactions of the reduced model. The symbol ∅ denotes that no molecules are created in the reaction.

Table 1. The reactions of the example system

ν   Reaction                 Stoichiometric coeff.   Propensity
1   RNA → RNA + M            (0, 0, 0, 1)^T          w1 = 0.05 · rna
2   M → ∅                    (0, 0, 0, −1)^T         w2 = 0.001 · m
3   DNAM → RNA + DNAM        (0, 0, 1, 0)^T          w3 = 0.1 · dnaM
4   RNA → ∅                  (0, 0, −1, 0)^T         w4 = 0.005 · rna
5   DNA + M → DNAM           (−1, 1, 0, −1)^T        w5 = 0.02 · dna · m
6   DNAM → DNA + M           (1, −1, 0, 1)^T         w6 = 0.5 · dnaM
7   DNAM + M → DNA2M         (0, −1, 0, −1)^T        w7 = 2 · 10^−4 · dnaM · m
8   DNA2M → DNAM + M         (0, 1, 0, 1)^T          w8 = 1 · 10^−11 · (2 − dna − dnaM)

5 Experiments

The example in Section 4 was simulated for 35 minutes (approximately one cell generation time) using SSA and the hybrid approach. The two variables dna and dnaM were represented by the CME part ($X$), and rna and m were represented by the FPE part ($Y$) of the hybrid method.

The FPE part ($Y$) was discretized in the rna × m-plane using 20 × 60 cells, of equal length in the m-dimension and of increasing length in the rna-dimension. Let $h_k$ be the length of the $k$:th cell in the rna-dimension. The step lengths were determined by $h_k = (1 + \theta) h_{k-1}$, $k = 2, \ldots, 20$, and $h_1 = 0.5$.

For time integration of (3) the implicit second-order backward differentiation formula scheme (BDF-2) [14] was used. The time step was chosen adaptively according to [15] with an error tolerance, measured in the $L_1$-norm, of one percent in each time step. The system of equations that arises in each time step in BDF-2 is solved using BiCGSTAB [19]. The implicit method is suitable for handling the ubiquitous stiffness of molecular biological systems.
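The SSA side of this experiment can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the implementation used for the reported results. The rate constants and stoichiometric vectors are those of Table 1. The code takes w6 proportional to dnaM, the reactant of reaction 6, as Definition 1 implies. The fixed initial state (2, 0, 1, 5) replaces sampling from the truncated normal initial distribution, and `t_end` is measured in the time unit of the rate constants, which the text does not state. The names `STOICH`, `propensities` and `ssa_trajectory` are hypothetical.

```python
import random

# State x = (dna, dnaM, rna, m); one stoichiometric vector per reaction.
STOICH = [
    (0, 0, 0, 1),    # 1: RNA -> RNA + M
    (0, 0, 0, -1),   # 2: M -> (nothing)
    (0, 0, 1, 0),    # 3: DNAM -> RNA + DNAM
    (0, 0, -1, 0),   # 4: RNA -> (nothing)
    (-1, 1, 0, -1),  # 5: DNA + M -> DNAM
    (1, -1, 0, 1),   # 6: DNAM -> DNA + M
    (0, -1, 0, -1),  # 7: DNAM + M -> DNA2M
    (0, 1, 0, 1),    # 8: DNA2M -> DNAM + M
]

def propensities(x):
    """Propensities w_1..w_8 of the reduced model."""
    dna, dnaM, rna, m = x
    dna2M = 2 - dna - dnaM  # two gene copies are conserved
    return [
        0.05 * rna,
        0.001 * m,
        0.1 * dnaM,
        0.005 * rna,
        0.02 * dna * m,
        0.5 * dnaM,         # assumption: proportional to the reactant DNAM
        2e-4 * dnaM * m,
        1e-11 * dna2M,
    ]

def ssa_trajectory(x0, t_end, rng):
    """Simulate one trajectory up to t_end and return the final state."""
    x, t = tuple(x0), 0.0
    while True:
        w = propensities(x)
        w_total = sum(w)
        if w_total <= 0.0:
            break                        # no reaction can fire
        tau = rng.expovariate(w_total)   # waiting time to the next reaction
        if t + tau > t_end:
            break
        t += tau
        # Pick reaction lam with probability w[lam] / w_total.
        u = rng.random() * w_total
        acc = 0.0
        for lam, w_lam in enumerate(w):
            acc += w_lam
            if u < acc:
                break
        x = tuple(xi + si for xi, si in zip(x, STOICH[lam]))
    return x

# Small ensemble; the mean of m is estimated from the final states.
rng = random.Random(1)
finals = [ssa_trajectory((2, 0, 1, 5), 100.0, rng) for _ in range(20)]
mean_m = sum(x[3] for x in finals) / len(finals)
```

Means and variances at intermediate times would be obtained by recording the state at fixed output times inside the loop. The sketch returns only the final state to stay short.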
As initial data, a normal distribution in the rna × m-plane was truncated at the boundaries of the computational domain and rescaled. The mean $E(x)$ and variance $V(x)$ of the normal distribution before truncation were $E(x) = (1, 5)^T$ and $V(x) = (1, 1)^T$. The probability of being in a state where dna = 2 and dnaM = 0 was set to 1.

For SSA the simulation was used to compute mean and variance, but not an approximation of the probability density function. The initial state of each trajectory was sampled from the initial distribution that was used for the CME-FPE hybrid simulation.

5.1 Results

Due to numerical approximation errors the solution is slightly negative in some parts of the state space. In order to calculate the mean and standard deviation, those negative values are set to zero. Figure 2 shows the means and standard deviations for the hybrid method compared to those obtained by SSA using $10^4$ trajectories. Figure 3 shows a projection of the probability density function.

[Fig. 2. Mean values (left) and standard deviations (right) for, from top to bottom, dna, dnaM, rna and m, plotted against time (min) from 0 to 35. The hybrid solution (solid line) and the SSA solution (broken line) are plotted for each case.]

Isolines of

$$\tilde{p}(rna, m) = \sum_{dna,\,dnaM} p(dna, dnaM, rna, m)$$

are plotted at four different times as indicated in the figure.
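The projection onto the rna × m-plane amounts to summing the solution over the discrete gene states. A minimal sketch follows, assuming the hybrid solution is stored as one two-dimensional array of cell probabilities per pair (dna, dnaM). This storage layout, the function name `project_onto_rna_m` and the tiny 2 × 3 grid are hypothetical and serve only to show the summation.

```python
def project_onto_rna_m(p):
    """Sum p(dna, dnaM, rna, m) over all (dna, dnaM).

    p maps each gene state (dna, dnaM) to a 2-D list of probabilities
    indexed as [rna_cell][m_cell].
    """
    blocks = list(p.values())
    n_rna, n_m = len(blocks[0]), len(blocks[0][0])
    return [[sum(b[i][j] for b in blocks) for j in range(n_m)]
            for i in range(n_rna)]

# Gene states allowed by dna + dnaM + dna2M = 2 (six combinations).
gene_states = [(dna, dnaM) for dna in range(3) for dnaM in range(3 - dna)]

# Uniform test distribution on a 2 x 3 grid in the rna x m-plane.
uniform = 1.0 / (len(gene_states) * 2 * 3)
p = {g: [[uniform] * 3 for _ in range(2)] for g in gene_states}

p_tilde = project_onto_rna_m(p)
total = sum(sum(row) for row in p_tilde)  # total probability is preserved
```

Because the projection only adds probabilities, the total probability of the projected distribution equals that of the full distribution.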
[Fig. 3. Snapshots of the numerical solution projected on the rna × m-plane at t = 1.1 min, t = 4.1 min, t = 7.3 min and t = 23 min (horizontal axis m, vertical axis rna)]

6 Conclusion

The splitting of the state space into two subspaces, where one subspace is suitable for approximation with the Fokker-Planck equation and one is not, extends the range of problems that can be solved numerically using the FPE approximation of the master equation.

The main reason for introducing this hybrid method is that the FPE is often a good approximation for some, but not all, dimensions in the state space. Typically genes and gene configurations will not be well approximated. In an FPE discretization, such dimensions would need to be resolved by more grid points than the actual number of states in that dimension. Furthermore, the error in the FPE approximation is bounded by the third derivatives [18]. For low copy numbers the probability of a single dominating state is high, resulting in probability peaks that cannot be well approximated due to large third derivatives. There is also a point in avoiding approximations when they are not needed.

Acknowledgements

I want to thank Per Lötstedt and Johan Elf for helpful discussions. This work was funded by the Swedish Research Council, the Swedish National Graduate School in Scientific Computing and the Swedish Foundation for Strategic Research.

References

1. Blake, W.J., Kærn, M., Cantor, C.R., Collins, J.J.: Noise in eukaryotic gene expression. Nature 422, 633–637 (2003)
2. Bratsun, D., Volfson, D., Tsimring, L.S., Hasty, J.: Delay-induced stochastic oscillations in gene regulation. Proc. Natl. Acad. Sci. USA 102(41), 14593–14598 (2005)
3. Cai, L., Friedman, N., Xie, X.S.: Stochastic protein expression in individual cells at the single molecule level.
Nature 440, 358–362 (2006)
4. Elf, J., Lötstedt, P., Sjöberg, P.: Problems of high dimensionality in molecular biology. In: Hackbusch, W. (ed.) High-dimensional problems - Numerical treatment and applications, Proceedings of the 19th GAMM-Seminar, Leipzig, pp. 21–30 (2003), available at http://www.mis.mpg.de/conferences/gamm/2003/
5. Elf, J., Paulsson, J., Berg, O.G., Ehrenberg, M.: Near-critical phenomena in intracellular metabolite pools. Biophys. J. 84, 154–170 (2003)
6. Elowitz, M.B., Levine, A.J., Siggia, E.D., Swain, P.S.: Stochastic gene expression in a single cell. Science 297, 1183–1186 (2002)
7. Gardiner, C.W.: Handbook of Stochastic Methods, 2nd edn. Springer, Heidelberg (1985)
8. Gardner, T.S., Cantor, C.R., Collins, J.J.: Construction of a genetic toggle switch in Escherichia coli. Nature 403, 339–342 (2000)
9. Gibson, M.A., Bruck, J.: Efficient exact stochastic simulation of chemical systems with many species and many channels. J. Phys. Chem. 104, 1876–1889 (2000)
10. Gillespie, D.T.: A general method for numerically simulating the stochastic time evolution of coupled chemical reactions. J. Comput. Phys. 22, 403–434 (1976)
11. Gillespie, D.T.: A rigorous derivation of the chemical master equation. Physica A 188, 404–425 (1992)
12. Gillespie, D.T.: Approximate accelerated stochastic simulation of chemically reacting systems. J. Chem. Phys. 115(4), 1716 (2001)
13. Goutsias, J.: Quasiequilibrium approximation of fast reaction kinetics in stochastic biochemical systems. J. Chem. Phys. 184102 (2005)
14. Hairer, E., Nørsett, S.P., Wanner, G.: Solving Ordinary Differential Equations, Nonstiff Problems, 2nd edn. Springer, Heidelberg (1993)
15. Lötstedt, P., Söderberg, S., Ramage, A., Hemmingsson-Frändén, L.: Implicit solution of hyperbolic equations with space-time adaptivity. BIT 42, 134–158 (2002)
16. Lu, T., Volfson, D., Tsimring, L., Hasty, J.: Cellular growth and division in the Gillespie algorithm. Syst. Biol. 1, 121–128 (2004)
17.
McQuarrie, D.A.: Stochastic approach to chemical kinetics. J. Appl. Prob. 4, 413–478 (1967)
18. Sjöberg, P., Lötstedt, P., Elf, J.: Fokker-Planck approximation of the master equation in molecular biology. Comput. Visual. Sci. 2006 (accepted for publication)
19. van der Vorst, H.A.: BiCGSTAB: A fast and smoothly converging variant of the Bi-CG for the solution of nonsymmetric linear systems. SIAM J. Sci. and Stat. Comp. 13, 631–644 (1992)
20. van Kampen, N.G.: Stochastic Processes in Physics and Chemistry, 2nd edn. Elsevier, Amsterdam (1992)
21. Yu, J., Xiao, J., Ren, X., Lao, K., Xie, X.S.: Probing gene expression in live cells, one protein molecule at a time. Science 311, 1600 (2006)