Multistate Modeling and Simulation for Regulatory Networks
Zhen Liu, Clifford A. Shaffer, Umme Juka Mobassera, Layne T. Watson, and Yang Cao
Department of Computer Science
Program in Genetics, Bioinformatics, and Computational Biology
Virginia Tech

Goal: Modeling the Cell Cycle (John Tyson)
[Figure: cell cycle phases G1, S (DNA replication), G2, M (mitosis)]

Regulatory Network Modeling
- Model using a series of chemical reactions.
- The actors are proteins (“chemical species”) whose interaction rates are modeled by rate laws.
- Species are created, consumed, combined.
- Populations can rise and fall, under the control of other species.
- Loops and cycles.

Decomposition of Models
- Modelers find it natural to divide into “bundles” of reactions.

Multistate Phosphorylation Motif
- Blocks relate to naturally occurring motifs.
- Example: antagonistic interaction between Clb2 and Cdh1, with Cdc14 as the control variable driving phosphorylation of Cdh1.
- Forms a bi-stable switch.

Multistate Version
- The reality is more complex, as a protein can undergo multiple levels of phosphorylation, which can affect the behavior of the larger system.

Multistate Modeling
- Equations on chemical species with multiple states, related in some meaningful way.
- Expressing as single-state equations would require dozens of reactions.

JigCell Model Builder Support

Problems
- Complications arise from the potential combinatorial explosion of states in complexes.
- Example: Two multistate species each with 10 states could form complexes with potentially 100 states.
    A{i} + B{j} -> AB{i,j}
- This presents challenges to simulation.

Stochastic Simulation
- Reaction models have often been modeled using ODEs
  - Track concentrations of chemical species
- ODE models cannot account for stochastic effects
  - Small numbers for some species (RNA)
  - Variations in inputs => Differing outputs
  - Simulation ensemble => Distribution

Gillespie’s SSA (1)
- N molecular species $\{S_1, \dots, S_N\}$.
- M reaction channels $\{R_1, \dots, R_M\}$.
- For reaction channel $R_j$:
  - Propensity function $a_j$
  - State change vector $v_j = (v_{1,j}, \dots, v_{N,j})$
- $a_j(x)\,dt$ gives the probability that one $R_j$ reaction will occur in the next infinitesimal time interval, given state vector $x$.

Gillespie’s SSA (2)
- Select two random numbers $r_1$ and $r_2$, uniform on (0, 1).
- Let $a_0(x)$ be the sum of all the reaction propensities on state vector $x$.
- Time for the next reaction to occur is $t + \tau$, where
    $\tau = \frac{1}{a_0(x)} \ln\left(\frac{1}{r_1}\right)$

Gillespie’s SSA (3)
- Index $j$ for the next reaction is given by the smallest integer satisfying
    $\sum_{l=1}^{j} a_l(x) > r_2\, a_0(x)$
- System state is updated after each reaction, including populations and propensities.
- Observations:
  - A population-based simulation
  - SSA calculates propensities for reactions

Rule-Based Modeling
- A rule defines how a molecular particle reacts with other particles
    $A_{\mathrm{open},?,?} + B \xrightarrow{k} AB_{,?,?}$
- Subscripts describe the matching configurations for binding sites
- Convenient for representation
- Updating propensities of rules faster(?) than updating propensities of reactions

Network-Free Algorithm (1) (Sneddon et al. 2008)
- Alternative to turning rules into collections of reactions and performing SSA.
- Conceptually similar to SSA, but:
  - Calculate propensities for rules.
  - Particle based (not population based)
  - Keep list of particles associated with each rule

Network-Free Algorithm (2)
- Simulation loop:
  1. Calculate propensity for each rule (cheaper than SSA)
  2. Calculate rule and time of next event
  3. Select particles from associated list
  4. Update the particle lists as necessary (major expense)

Population-Based NFA (PNFA) (Our first contribution)
- Modification to NFA: (go back to) using populations for single-state species
- Hybrid particle/population approach
- Attempts to cut down on the size of the lists associated with the rules
- Can be viewed as an optimization to NFA; at worst it degrades to NFA

Full-Scale SSA (FSSSA) (1) (Our second contribution)
- Use populations even for multi-state species
- Should work well unless there is a small population spread across many states
- Can view as a more direct conversion of SSA to rules (pure population-based approach)

Full-Scale SSA (FSSSA) (2)
- For each species, store an array of populations (one for each state)
  - Might be a sparse array
- Store with each rule the population count for all associated reactants

Full-Scale SSA (FSSSA) (3)
- Simulation loop:
  1. Calculate propensity for each rule (cheaper than SSA)
  2. Calculate rule and time of next event
  3. Select a state for each reactant from the population array
  4. Update populations of affected species (states) and population counts for associated rules (might require modifying arrays)

Comparisons: Selection
- SSA does linear search through reactions
- NFA, PNFA do linear search through rules, then select qualifying objects from associated reactant lists
- FSSSA does linear search through rules, only needs to search state lists (populations)

Comparisons: Update
- SSA updates populations of some reaction’s reactants and products
- NFA must create/destroy molecule objects, and update associated rule lists
- PNFA same, but does little work on single-state species populations
- FSSSA updates sparse matrix info.
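
The FSSSA loop described on the slides above can be illustrated with a minimal sketch. It is not the authors' implementation: the two-rule model (one multistate species whose state index moves up or down by one), the rate constants `k_up` and `k_down`, and the names `select_index` and `simulate_fsssa_sketch` are all assumptions made for illustration. For brevity the sketch recomputes each rule's reactant count on every step instead of storing and updating it with the rule, and it uses a dense list rather than a sparse array.

```python
import math
import random


def select_index(weights, threshold):
    """Return the smallest index whose cumulative weight exceeds threshold."""
    running = 0.0
    for index, weight in enumerate(weights):
        running += weight
        if running > threshold:
            return index
    # Round-off guard: fall back to the last entry with positive weight.
    for index in range(len(weights) - 1, -1, -1):
        if weights[index] > 0:
            return index
    raise ValueError("all weights are zero")


def simulate_fsssa_sketch(populations, k_up, k_down, t_end, rng):
    """Population-based rule simulation of one multistate species.

    populations[i] is the number of molecules in state i (hypothetical model).
    Rule 0: A{i} -> A{i+1} for every state below the top one.
    Rule 1: A{i} -> A{i-1} for every state above state 0.
    """
    pops = list(populations)
    top = len(pops) - 1
    t = 0.0
    while True:
        # Per-state reactant populations that qualify for each rule.
        up_weights = pops[:top] + [0]
        down_weights = [0] + pops[1:]
        n_up, n_down = sum(up_weights), sum(down_weights)

        # Step 1: one propensity per rule, not per expanded reaction.
        propensities = [k_up * n_up, k_down * n_down]
        a0 = sum(propensities)
        if a0 == 0:
            break

        # Step 2: time and rule of the next event, as in the SSA slides.
        r1 = 1.0 - rng.random()  # lies in (0, 1], so log(1 / r1) is defined
        r2 = rng.random()
        tau = (1.0 / a0) * math.log(1.0 / r1)
        if t + tau > t_end:
            break
        t += tau
        rule = select_index(propensities, r2 * a0)

        # Step 3: pick the reactant state in proportion to its population.
        # Step 4: update the populations of the affected states.
        if rule == 0:
            state = select_index(up_weights, rng.random() * n_up)
            pops[state] -= 1
            pops[state + 1] += 1
        else:
            state = select_index(down_weights, rng.random() * n_down)
            pops[state] -= 1
            pops[state - 1] += 1
    return t, pops


# Example run with made-up parameters: 100 molecules, four states.
final_time, final_pops = simulate_fsssa_sketch(
    [100, 0, 0, 0], k_up=1.0, k_down=0.5, t_end=5.0, rng=random.Random(1)
)
print(final_time, final_pops)
```

Because states are chosen by weight from the population array, no per-molecule objects or per-rule particle lists are created, which is the contrast with NFA and PNFA drawn in the comparison slides.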
Bi-stable Switch Model
- Reaction-based form: 12 species, 44 reactions
- Rule-based form: 1 single-state species, 1 multi-state species, 7 rules
- Non-zero populations in each state

Simulation Times: Switch

Algorithm  Total CPU Time  Propensity Update  Reactant Selection  System Update  Other
SSA        115             72.0               30.6                5.3            7.1
NFA        341             11.1               34.0                286.0          9.9
PNFA       246             9.9                26.2                200.8          9.1
FSSSA      117             9.2                32.4                66.2           9.2

Cell Cycle Model
- Reaction-based form: 58 species, 185 reactions
- Rule-based form: 17 single-state species, 6 multi-state species, 64 rules
- Half the states have zero population
- Observation: Affecting one multi-state species affects only a smaller fraction of all the rules

Simulation Times: Cell Cycle

Algorithm  Total CPU Time  Propensity Update  Reactant Selection  System Update  Other
SSA        171             143.3              23.5                1.4            2.8
NFA        133             36.4               20.4                72.5           3.7
PNFA       113             34.0               17.6                58.6           2.8
FSSSA      64              32.8               18.2                10.5           2.5

Simulation Quality (1)

Simulation Quality (2)
- This graph shows the distribution of population for Clb2, one of the species in the cell cycle model.
- The significance is that it indicates that each simulation algorithm gives approximately the same ensemble of outputs.

Complexity Analysis