Stochastic Gene Expression: Modeling, Analysis, and Identification

Mustafa Khammash
Center for Control, Dynamics and Computation (CCDC)
University of California, Santa Barbara

Stochastic Influences on Phenotype

[Figures: fingerprints of identical twins; Cc, the first cloned cat, and her genetic mother (J. Raser and E. O'Shea, Science, 2005); clonal variability in gene expression (Elowitz et al., Science, 2002); genetically identical cells that are piliated or unpiliated]

Modeling Gene Expression

  DNA --kr--> mRNA --kp--> protein
  mRNA --γr--> φ,   protein --γp--> φ

Deterministic model: concentrations evolve according to mass-action rate equations.

Stochastic model:
• The probability that a single mRNA is transcribed in time dt is kr dt.
• The probability that a single mRNA is degraded in time dt is (#mRNA) · γr dt.

Fluctuations at Small Copy Numbers

[Figure: sample paths over 5000 s; the mRNA copy number fluctuates between roughly 0 and 25 molecules, the protein copy number between roughly 0 and 800]

Cv = coefficient of variation = standard deviation / mean

Mass-Action Models Are Inadequate

[Figure: number of molecules of P versus time; after ks is reduced by 50%, the stochastic mean settles well above the deterministic steady state]

• The stochastic mean value differs from the deterministic steady state.
• Noise enhances the signal! (Johan Paulsson, Otto G. Berg, and Måns Ehrenberg, PNAS 2000)
Stochastic Modeling: A Simple Example

  DNA --k--> mRNA --γ--> φ

Transcription: the probability that an mRNA is transcribed in time dt is k dt.
Degradation: the probability that a given mRNA is degraded in time dt is γ dt, so with n copies present the total degradation probability is nγ dt.

The mRNA copy number N(t) is a random variable, evolving on the birth-death chain

  0 ⇄ 1 ⇄ 2 ⇄ ··· ⇄ n−1 ⇄ n ⇄ n+1 ⇄ ···

with birth rate k out of every state and death rate nγ out of state n.

Key Question: find P(n, t), the probability that N(t) = n.

Over a small interval [t, t+dt):

  P(n, t+dt) = P(n−1, t) · k dt              {N(t) = n−1 and an mRNA is created in [t, t+dt)}
             + P(n+1, t) · (n+1)γ dt         {N(t) = n+1 and an mRNA is degraded in [t, t+dt)}
             + P(n, t) · (1 − k dt)(1 − nγ dt)   {N(t) = n and no mRNA is created or degraded in [t, t+dt)}

so that

  P(n, t+dt) − P(n, t) = P(n−1, t) k dt + P(n+1, t)(n+1)γ dt − P(n, t)(k + nγ) dt + O(dt²).

Dividing by dt and taking the limit as dt → 0 gives:

The Chemical Master Equation

  d/dt P(n, t) = k P(n−1, t) + (n+1)γ P(n+1, t) − (k + nγ) P(n, t)

mRNA Stationary Distribution

We look for a stationary distribution P(n, t) = p(n) for all t. The stationary solution satisfies d/dt P(n, t) = 0, so from the master equation:

  (k + nγ) p(n) = k p(n−1) + (n+1)γ p(n+1)

Writing this out for successive n:

  n = 0:  k p(0) = γ p(1)
  n = 1:  k p(1) = 2γ p(2)
  n = 2:  k p(2) = 3γ p(3)
  ...
          k p(n−1) = nγ p(n)

Solving the recursion and normalizing gives the Poisson distribution:

  p(n) = e^(−a) aⁿ / n!,   where a = k/γ.

[Figure: stationary Poisson distribution with a = 3; probability versus mRNA count n = 0, ..., 9]

Formulation of Stochastic Chemical Kinetics

Consider molecules reacting in a volume Ω.

Key Assumptions:
• (Well-Mixed) The probability of finding any given molecule in a region dΩ is dΩ/Ω.
• (Thermal Equilibrium) The molecules move due to thermal energy, and the reaction volume is held at a constant temperature T.
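The detailed-balance recursion k p(n−1) = nγ p(n) derived above can be checked numerically. The following sketch (with a = k/γ = 3, as in the figure) builds the stationary distribution from the recursion and compares it with the Poisson pmf; the truncation level N = 30 is an arbitrary choice.

```python
import math

k, gamma = 3.0, 1.0          # birth rate k, degradation rate gamma
a = k / gamma                # Poisson parameter a = k/gamma

# Build the stationary distribution from the detailed-balance recursion
# k*p(n-1) = n*gamma*p(n)  =>  p(n) = (k/(n*gamma)) * p(n-1)
N = 30
p = [1.0]
for n in range(1, N + 1):
    p.append(p[-1] * k / (n * gamma))
Z = sum(p)                   # normalize over the truncated range
p = [x / Z for x in p]

# Compare with the Poisson pmf e^{-a} a^n / n!
poisson = [math.exp(-a) * a**n / math.factorial(n) for n in range(N + 1)]
err = max(abs(x - y) for x, y in zip(p, poisson))
print(err)                   # tiny (truncation error only)
```

The agreement is exact up to the negligible Poisson tail mass beyond n = 30.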
Each velocity component of a molecule of mass m follows a Maxwell-Boltzmann distribution:

  f_vx(v) = f_vy(v) = f_vz(v) = sqrt( m / (2π k_B T) ) · e^(−m v² / (2 k_B T))

• (Population) The state is the copy-number vector X(t) = [X1(t), ..., XN(t)]ᵀ (an integer-valued random process).

• (M reactions) The system's state can change through any one of M reaction channels Rµ, µ ∈ {1, 2, ..., M}.

  Example:
    R1: φ → S1
    R2: S1 + S2 → S1
    R3: S1 → φ

  [Figure: state transitions on the lattice of (population of S1, population of S2)]

• (State transition) An Rµ reaction causes a state transition from x to x + sµ. In the example:

    s1 = [1; 0],   s2 = [0; −1],   s3 = [−1; 0]

  Stoichiometry matrix: S = [s1 s2 ... sM].

• (Transition probability) The probability that an Rµ reaction occurs in the next dt time units is wµ(x) dt. In the example:

    w1(x) = c1,   w2(x) = c2 x1 x2,   w3(x) = c3 x1.

Characterizing X(t)

X(t) is a continuous-time, discrete-state Markov chain.

Sample path representation (random time change):

  X(t) = X(0) + Σ_{k=1}^{M} s_k Y_k( ∫₀ᵗ w_k(X(s)) ds ),

where the Y_k[·] are independent unit-rate Poisson processes.

The Chemical Master Equation (forward Kolmogorov equation):

  d/dt p(x, t) = −p(x, t) Σ_k w_k(x) + Σ_k p(x − s_k, t) w_k(x − s_k),

where p(x, t) := Prob(X(t) = x).

From Stochastic to Deterministic

Define the concentration process X^Ω(t) = X(t)/Ω. Question: how does X^Ω(t) relate to the deterministic solution Φ(t)?

Fact: Let Φ(t) be the deterministic solution to the reaction rate equations dΦ/dt = S f(Φ), Φ(0) = Φ0, and let X^Ω(t) be the stochastic representation of the same chemical system with X^Ω(0) = Φ0. Then for every t ≥ 0:

  lim_{Ω→∞} sup_{s≤t} | X^Ω(s) − Φ(s) | = 0   a.s.

Simulation and Analysis Tools
• Sample path computations
• Moment computations
• SDE approximations
• Density computations

1. Sample Path Computations

Gillespie's Stochastic Simulation Algorithm: to each of the reactions {R1, ..., RM} we associate a random variable τi, the time to the next firing of reaction Ri.

Fact 0: τi is exponentially distributed with parameter wi.

We define two new random variables:

  τ = min_i {τi}        (time to the next reaction)
  µ = argmin_i {τi}     (index of the next reaction)

Fact 1: τ is exponentially distributed with parameter Σ_i wi.
Fact 2: P(µ = k) = w_k / Σ_i wi.
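Facts 1 and 2 are easy to verify by Monte Carlo. The sketch below uses three assumed propensity values w = (2, 3, 5); the minimum of the exponential times should have mean 1/Σwi = 0.1, and reaction i should win with probability wi/Σwi.

```python
import random

random.seed(0)
w = [2.0, 3.0, 5.0]            # example propensities (assumed values)
W = sum(w)                     # total propensity, here 10
n = 200_000
mean_tau = 0.0
counts = [0] * len(w)
for _ in range(n):
    taus = [random.expovariate(wi) for wi in w]   # Fact 0: tau_i ~ Exp(w_i)
    i = min(range(len(w)), key=taus.__getitem__)  # index of the minimum
    mean_tau += taus[i]
    counts[i] += 1
mean_tau /= n
freqs = [c / n for c in counts]
print(mean_tau)                # close to 1/W = 0.1       (Fact 1)
print(freqs)                   # close to [0.2, 0.3, 0.5] (Fact 2)
```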
Stochastic Simulation Algorithm

• Step 0: Initialize the time t and the state population x.

• Step 1: Draw a sample τ from the distribution of τ. The cumulative distribution of τ is F(t) = 1 − exp(−Σ_k w_k t), so with r1 ∈ U([0, 1]):

    τ = (1 / Σ_k w_k) · log( 1 / (1 − r1) )

  (In practice log(1/r1) works equally well, since r1 and 1 − r1 have the same distribution.)

• Step 2: Draw a sample µ from the distribution of µ: with r2 ∈ U([0, 1]), choose the smallest µ such that

    Σ_{i=1}^{µ} w_i ≥ r2 · Σ_k w_k.

• Step 3: Update the time, t ← t + τ, and the state, x ← x + sµ.

Stochastic Simulation Algorithm: Matlab code

  clear all
  t = 0; tstop = 2000;                       %% specify initial and final times
  x = [0; 0];                                %% specify initial conditions
  S = [1 -1 0 0; 0 0 1 -1];                  %% specify stoichiometry
  w = inline('[10, 1*x(1), 10*x(1), 1*x(2)]','x');   %% specify propensity functions
  while t < tstop
      a = w(x); w0 = sum(a);                 % compute the propensities and their sum
      t = t + 1/w0*log(1/rand);              % update the time of the next reaction
      if t <= tstop
          r2w0 = rand*w0;                    % second random number times the propensity sum
          i = 1;                             % initialize the reaction counter
          while sum(a(1:i)) < r2w0           % increment counter until sum(a(1:i)) exceeds r2w0
              i = i + 1;
          end
          x = x + S(:,i);                    % update the configuration
      end
  end

2. Moment Computations

Let w(x) = [w1(x), ..., wM(x)]ᵀ be the vector of propensity functions.

Moment dynamics:

  d/dt E[X] = S E[w(X)]
  d/dt E[XXᵀ] = S E[w(X)Xᵀ] + E[Xwᵀ(X)]Sᵀ + S diag(E[w(X)]) Sᵀ

• Affine propensities: closed moment equations.
• Quadratic propensities: not generally closed. Closure approximations include:
  – Mass fluctuation kinetics (Gomez-Uribe, Verghese)
  – Derivative matching (Singh, Hespanha)

Affine Propensity

Suppose the propensity function is affine:

  w(x) = W x + w0,   (W is M × N, w0 is M × 1).

Then E[w(X)] = W E[X] + w0, and E[w(X)Xᵀ] = W E[XXᵀ] + w0 E[Xᵀ].
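As an aside on simulation, the Matlab SSA script above translates directly to Python. This is a sketch with the same rates (kr = 10, γr = 1, kp = 10, γp = 1); it additionally tracks the time-averaged mRNA count, which should be close to the stationary mean kr/γr = 10 derived for the birth-death model.

```python
import random

random.seed(1)
# Same model as the Matlab script (rows of S: mRNA, protein)
S = [[1, -1, 0, 0],
     [0,  0, 1, -1]]

def propensities(x):
    return [10.0, 1.0 * x[0], 10.0 * x[0], 1.0 * x[1]]

t, tstop = 0.0, 2000.0
x = [0, 0]
area = 0.0                               # integral of the mRNA count over time
while True:
    a = propensities(x)
    w0 = sum(a)
    tau = random.expovariate(w0)         # Step 1: time to the next reaction
    if t + tau > tstop:
        area += x[0] * (tstop - t)
        break
    area += x[0] * tau
    t += tau
    r2w0 = random.random() * w0          # Step 2: pick reaction i w.p. a[i]/w0
    i, acc = 0, a[0]
    while acc < r2w0:
        i += 1
        acc += a[i]
    x = [x[k] + S[k][i] for k in range(2)]   # Step 3: update the state

avg_mrna = area / tstop
print(avg_mrna)                          # close to kr/gamma_r = 10
```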
This gives us the moment equations:

First moment:

  d/dt E[X] = S W E[X] + S w0

Second moment:

  d/dt E[XXᵀ] = S W E[XXᵀ] + E[XXᵀ] Wᵀ Sᵀ + S w0 E[Xᵀ] + E[X] w0ᵀ Sᵀ + S diag(W E[X] + w0) Sᵀ

These are linear ordinary differential equations and can be easily solved!

Application to Gene Expression

Reactants: X1(t) is the number of mRNAs; X2(t) is the number of proteins.

Reactions:
  R1: φ --kr--> mRNA
  R2: mRNA --γr--> φ
  R3: mRNA --kp--> mRNA + protein
  R4: protein --γp--> φ

Stoichiometry and propensity:

  S = [ 1 −1 0  0 ;
        0  0 1 −1 ]

  w(X) = [kr; γr X1; kp X1; γp X2] = W X + w0,

  with  W = [ 0 0; γr 0; kp 0; 0 γp ]  and  w0 = [kr; 0; 0; 0].

Steady-State Moments

  A = S W = [ −γr 0; kp −γp ],   S w0 = [kr; 0]

  X̄ = −A⁻¹ S w0 = [ kr/γr ; kp kr/(γp γr) ]

Steady-State Covariance

  BBᵀ = S diag(W X̄ + w0) Sᵀ = [ 2kr 0; 0 2kp kr/γr ]

The steady-state covariance Σ̄ solves the Lyapunov equation

  A Σ̄ + Σ̄ Aᵀ + BBᵀ = 0,

which can be solved algebraically:

  Σ̄ = [ kr/γr                  kp kr/(γr(γr+γp))  ;
        kp kr/(γr(γr+γp))      (kp kr/(γp γr))(1 + kp/(γr+γp)) ]

3. SDE Approximation

Let X^Ω(t) := X(t)/Ω and write

  X^Ω(t) = Φ0(t) + (1/√Ω) V^Ω(t),

where Φ0(t) solves the deterministic reaction rate equation dΦ/dt = S f(Φ).

Linear Noise Approximation: V^Ω(t) → V(t) as Ω → ∞, where

  dV(t) = A(t) V(t) dt + B(t) dW_t,
  A(t) = d[S f]/dΦ (Φ0(t)),   B(t) := S sqrt( diag[f(Φ0(t))] ),

so that X^Ω(t) ≈ Φ0(t) + (1/√Ω) V(t).

Linear Noise Approximation: Stationary Case

Multiplying X^Ω(t) ≈ Φ̄ + (1/√Ω) V(t) by Ω, we get

  X(t) ≈ Ω Φ̄  +  √Ω V(t)
        (deterministic)  (zero-mean stochastic)

so E[X(t)] = Ω Φ̄. The process Y(t) = √Ω V(t) satisfies the linear SDE Ẏ = A Y + √Ω B ω, with ω white Gaussian noise, and its steady-state covariance Σ̄ solves

  A Σ̄ + Σ̄ Aᵀ + Ω BBᵀ = 0.
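The steady-state covariance is the solution of a Lyapunov equation, which is linear in Σ̄ and can be solved by vectorization. A numpy sketch with assumed rates kr = 10, γr = 1, kp = 10, γp = 1 recovers the closed-form Σ̄ for the gene expression example:

```python
import numpy as np

kr, gr, kp, gp = 10.0, 1.0, 10.0, 1.0      # assumed example rates
A = np.array([[-gr, 0.0],
              [kp, -gp]])                  # A = S W
BBT = np.diag([2 * kr, 2 * kp * kr / gr])  # S diag(W xbar + w0) S^T

# A Sigma + Sigma A^T + BBT = 0 is linear in Sigma: vectorize with Kronecker products
I = np.eye(2)
M = np.kron(A, I) + np.kron(I, A)          # acts on the row-major vec(Sigma)
Sigma = np.linalg.solve(M, -BBT.flatten()).reshape(2, 2)

# Closed-form entries
s11 = kr / gr                                        # 10
s12 = kp * kr / (gr * (gr + gp))                     # 50
s22 = (kp * kr / (gp * gr)) * (1 + kp / (gr + gp))   # 600
print(Sigma)                               # [[10. 50.] [50. 600.]]
```

The same vectorization works for any dimension; the unique solution exists whenever no two eigenvalues of A sum to zero.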
4. Density Computations

Stack the probabilities p(x, t) into a probability density state vector P(t). The Chemical Master Equation (CME) can then be written in matrix form:

  Ṗ(t) = A · P(t)

The Finite State Projection Approach

• A finite subset of the state space is appropriately chosen.
• The remaining (infinite) states are projected onto a single absorbing state.
• Only transitions into the removed states are retained.

The projected (finite) system can be solved exactly!

Finite Projection Bounds

Theorem [Projection Error Bounds]. Consider any Markov process described by the forward Kolmogorov equation Ṗ(X; t) = A · P(X; t). If, for an indexing set J,

  1ᵀ exp(A_J T) P(X_J; 0) ≥ 1 − ε,

then for all t ∈ [0, T]:

  | [ P(X_J; t) ; P(X_J'; t) ] − [ exp(A_J t) P(X_J; 0) ; 0 ] | < ε.

Munsky B. and Khammash M., Journal of Chemical Physics, 2006.

Applications of FSP
• Feedback analysis
• Synthetic switch analysis
• Epigenetic switch analysis
• System identification

Application: Noise Attenuation through Feedback

Replace the constant transcription rate kr with the negative feedback law k0 − k1 · (#protein).

[Figure: stationary protein distributions for k1 = 0.2, 0.1, 0.05, 0, −0.05 with kp = 10 and γp = γr = 1; stronger feedback gives a tighter distribution]

  Mean:  µ*_p

  Variance without feedback:  ( b/(1+η) + 1 ) µ*_p

  Variance with feedback:  ( (1−φ)/(1+bφ) · b/(1+η) + 1 ) µ*_p

where φ = k1/γp, b = kp/γr, η = γp/γr  (Thattai, van Oudenaarden).

Since (1−φ)/(1+bφ) < 1, the protein variance is always smaller with negative feedback!
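The FSP recipe can be sketched on the earlier birth-death example: keep states 0, ..., N−1, let the birth transition out of state N−1 leak into the (implicit) absorbing state, solve the projected CME with a matrix exponential, and read off the error bound ε = 1 − 1ᵀ p(T). The rates and truncation level below are assumed values; scipy.linalg.expm does the exponential.

```python
import math
import numpy as np
from scipy.linalg import expm

k, g = 3.0, 1.0              # birth/death rates, as in the Poisson example
N = 20                       # keep states 0..N-1; lump all higher states
A = np.zeros((N, N))
for n in range(N):
    A[n, n] = -(k + n * g)   # total outflow from state n (incl. flow out of the projection)
    if n + 1 < N:
        A[n + 1, n] = k      # birth n -> n+1 stays inside the projection
    if n > 0:
        A[n - 1, n] = n * g  # death n -> n-1

p0 = np.zeros(N); p0[0] = 1.0   # start with zero molecules
T = 10.0
pT = expm(A * T) @ p0           # solve the projected CME exactly
err_bound = 1.0 - pT.sum()      # FSP error bound epsilon from the theorem

# By t = 10 the solution is essentially stationary: compare with Poisson(a), a = k/g
a = k / g
poisson = np.array([math.exp(-a) * a**n / math.factorial(n) for n in range(N)])
print(err_bound)                # tiny leaked mass
```

Because mass only leaves the projection, pT is a guaranteed lower bound on the true probabilities, accurate to within err_bound in total.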
Analysis of Stochastic Switches

A genetic toggle switch: two repressors, u and v, each repressing the other's promoter. Gardner, et al., Nature 403, 339-342 (2000).

v inhibits the production of u:
  a1(u, v) = α1 / (1 + v^β),   ν1 = [1; 0]

u inhibits the production of v:
  a3(u, v) = α2 / (1 + u^γ),   ν3 = [0; 1]

u and v degrade exponentially:
  a2(u, v) = u,   ν2 = [−1; 0]
  a4(u, v) = v,   ν4 = [0; −1]

Parameters: α1 = 50, β = 2.5, α2 = 16, γ = 1, u(0) = v(0) = 0.

Modeling of a DAM Epigenetic Switch using FSP

The presented section contains unpublished data and is not included in the online version.

Using Noise to Identify Model Parameters

The presented section contains unpublished data and is not included in the online version.

Conclusions

• Fluctuations may be very important
  • Cell variability
  • Cell fate decisions
• Some tools are available
  • Monte Carlo simulations (SSA and variants)
  • Moment approximation methods
  • Linear noise approximation (Van Kampen)
  • Finite State Projection
• Cellular noise reveals network parameters and enables model identification
  • Stationary moments are not sufficient for full identifiability
  • A small number of transient measurements of noise is sufficient for identifiability
  • The Finite State Projection allows the use of master equation solutions for identification
  • Cellular noise (process noise) vs. measurement noise (output noise)

Acknowledgement

• Brian Munsky, UCSB (now at LANL): FSP, Pap switch, and identification with noise
• David Low, UCSB: Pap switch
• Brooke Trinh, UCSB: Pap switch, identification with noise