Chapter 1

Network Dynamics for Systems Biology

Eric Mjolsness
Institute for Genomics and Bioinformatics, and Department of Computer Science
University of California, Irvine
emj@uci.edu

Abstract

Networks in biological systems form a complex graph structure in which dynamics is local. This applies to the dynamics of both graph node variables, such as concentrations, and graph link variables, such as evolving or cell-state-specific regulatory relationships. Unlike most models of physical systems, the forms of the equations commonly used to model cellular network dynamics are very diverse. This is due to diversity both in biological mechanisms, e.g. those associated with membership in metabolic, signaling, transcriptional or mechanical networks, and in the common modeling approaches to each mechanism. Plausible models for these networks can be classified according to (a) their network structure and (b) their choice of node dynamics for each major biological mechanism. Such a taxonomy can be important, for example, in representing signal transduction networks that involve the interaction of protein complex networks and transcriptional regulation networks.

1. Introduction

A regulatory network can be formalized as a labeled bipartite graph, in which “reaction” nodes alternate with molecule or “reactant” nodes to which they are connected by directed links. Both types of nodes, as well as the directed role-membership links, have a hierarchy of possible types (e.g. phosphorylation reactions, performed by kinase molecules, playing the role of enzyme) as well as an individual identity. One advantage of formalizing regulatory networks as labeled graphs is that simulation software design can follow a well-defined schema based on such a formalization. Another is that one can define probability distributions on the graphs for use in machine learning via statistical inference.
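As a rough illustration of the labeled bipartite graph described above, the following Python sketch stores typed reaction and reactant nodes together with role-labeled directed links, and rejects any link that does not alternate between the two kinds of node. The class names, field names, and the MAPK example entries are hypothetical choices made for this sketch; they are not taken from Cellerator or any other package.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    name: str         # individual identity of the node
    kind: str         # "reaction" or "reactant"
    type_path: tuple  # type hierarchy, most general type first


@dataclass
class RegulatoryGraph:
    nodes: dict = field(default_factory=dict)
    links: list = field(default_factory=list)  # (source, target, role) triples

    def add_node(self, node):
        self.nodes[node.name] = node

    def add_link(self, source, target, role):
        # Bipartite constraint: a link must join a reaction and a reactant.
        a, b = self.nodes[source], self.nodes[target]
        if a.kind == b.kind:
            raise ValueError("links must join a reaction node and a reactant node")
        self.links.append((source, target, role))


# Hypothetical example: a kinase phosphorylating its substrate.
g = RegulatoryGraph()
g.add_node(Node("MAPKK", "reactant", ("molecule", "protein", "kinase")))
g.add_node(Node("MAPK", "reactant", ("molecule", "protein")))
g.add_node(Node("MAPK-P", "reactant", ("molecule", "protein")))
g.add_node(Node("phos1", "reaction",
                ("reaction", "state modification", "phosphorylation")))
g.add_link("MAPKK", "phos1", "enzyme")
g.add_link("MAPK", "phos1", "substrate")
g.add_link("phos1", "MAPK-P", "product")
```

Keeping the role on the link rather than on the node lets the same molecule act as an enzyme in one reaction and as a substrate in another, which is one possible reading of the role-membership links in the text.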
Plausible models for biological regulatory networks can be classified according to (a) their network structure and (b) their choice of node dynamics for each major biological mechanism.

2. Node Dynamics

Two major classifications are possible for local regulatory dynamics at a reactant or reaction node: according to biological mechanism, and according to mathematical model. Neither is a refinement of the other. For example, single-substrate catalysis can be modeled with mass action dynamics or with the Michaelis-Menten approximation. The mass action catalytic model can in turn be used as a simplified model of a wide variety of mechanisms, from protein state modifications such as phosphorylation to receptor-mediated transport across membranes. This basic duality between biological and mathematical reaction hierarchies, and the need for explicit mappings between them, may be reflected in class hierarchies for systems biology modeling software such as Cellerator [Shapiro et al. 2003] and a forthcoming pathway modeling database, “Sigmoid”, which marshals regulatory interactions for use in Cellerator.

In view of this distinction, network dynamic models can be built by first specifying the biological mechanism involved in a reaction that enters into the dynamics for a node, and then choosing one of a limited set of plausible mathematical models for that mechanism. The libraries of known biological mechanisms and mathematical reaction models can each be put into a specialization (inheritance) hierarchy or directed acyclic graph, with a consistent set of cross-hierarchy pointers showing plausible translations from biology to mathematics.

Separate biological mechanisms include top-level categories such as transcriptional regulation, metabolic synthesis and degradation reactions, protein state modification including phosphorylation (e.g. in MAPK signal transduction pathway models) and ubiquitination, and protein complex aggregation and disaggregation reactions (e.g.
in the NF-κB signal transduction pathway models).

A collection of mathematical reaction models that can be used to build node dynamics is provided by many cell simulation packages, including Cellerator [Shapiro et al. 2003]. A top-level classification of useful enzymatic reaction models alone would include a steadily growing subset of a growing cross product space of alternatives such as: {mass action, Michaelis-Menten style approximations using separation of time scales} × {single-substrate, multi-substrate reactions} × {uninhibited, noncompetitive inhibition, competitive inhibition, uncompetitive inhibition} [Yang et al. 2004] × {allosteric (often modeled with versions of the Monod-Wyman-Changeux (MWC) simplified model, e.g. for threonine deaminase in E. coli metabolism), nonallosteric} × {elementary reactions, compound reactions made of many sub-reactions} × {deterministic, stochastic (i.e. Langevin differential equation, stochastic master equation [Gillespie 1977], particle level simulations as in MCell [Stiles et al. 1996])}. The combinatorial profusion of reasonable modeling alternatives that should be available favors automatic model generation and flexible algebraic model representations such as those provided in Cellerator and supported in Systems Biology Markup Language (SBML) Level 2.

As an example of this rich space of mathematical reaction models, the MWC model may be generalized to incorporate multiple activators and inhibitors. For enzyme E, substrates {S}, activators {A}, inhibitors {I}, and product P, the Generalized MWC reaction may be denoted

$$\{S_1, S_2, \ldots\} \;\underset{A_1, A_2, \ldots,\; I_1, I_2, \ldots}{\overset{E}{\Longrightarrow}}\; P.$$

If all concentrations are normalized by their corresponding $K_M$ constants, the kinetics for production of the final product may be derived from statistical mechanics and written as

$$\frac{dP}{dt} = K_D\, E\, \frac{\prod_k S_k (1+S_k)^{n-1} \prod_k (1+A_k)^{n} + L \prod_k c S_k (1+cS_k)^{n-1} \prod_k (1+I_k)^{n}}{\prod_k (1+S_k)^{n} \prod_k (1+A_k)^{n} + L \prod_k (1+cS_k)^{n} \prod_k (1+I_k)^{n}} \qquad (1)$$

On the other hand, for reactions involving multistate complexes with largely unknown internal dynamics, such as eukaryotic transcriptional regulation, a somewhat different hierarchy of modeling alternatives is available [Gibson and Mjolsness 2001], including neural network style phenomenological models, MWC style near-equilibrium statistical mechanics models, and compound reaction schemes built out of elementary (directional) catalytic reactions. From the mathematical point of view, a key difference is the need to describe the occupation of a very limited number of copies of a particular binding site, perhaps just one or two copies per cell in the case of particular transcription factor binding sites. This can still be done with deterministic modeling of an occupancy probability, provided the shorter of the occupied and unoccupied time intervals is short enough, compared to other time scales, to be integrated over when its effect on downstream processes such as transcription is modeled.

3. Network statistics and dynamics

Much attention has rightfully been paid to the connectivity statistics of regulatory networks. Examples include degree distributions and the prevalence of network motifs represented as subgraphs [Ziv et al. 2003, Milo et al. 2002]. A wide variety of such network statistics can be represented using Boltzmann distributions defined on adjacency matrices for graphs [Newman 2003]. For example, the energy function governing degree distributions can be taken as

$$E_{\mathrm{degree}}(G) = \sum_{i,a} \mu_i G_{ia} + \sum_i f\Big(\sum_a G_{ia}\Big), \qquad G = G^T, \qquad (2)$$

where G is a symmetric adjacency matrix.
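To make the degree energy of equation (2) concrete, the sketch below evaluates an energy of the form E = sum_i mu_i d_i + sum_i f(d_i), where d_i is the degree of node i in a symmetric 0/1 adjacency matrix. The sign convention, the parameter names `mu` and `f`, and the quadratic penalty used in the example are assumptions made for illustration only.

```python
def degrees(G):
    """Row sums of a 0/1 adjacency matrix given as a list of lists."""
    return [sum(row) for row in G]


def degree_energy(G, mu, f):
    """Assumed form: E = sum_i mu[i] * d_i + sum_i f(d_i), with G symmetric."""
    n = len(G)
    if any(G[i][j] != G[j][i] for i in range(n) for j in range(n)):
        raise ValueError("G must be symmetric")
    d = degrees(G)
    linear = sum(m * di for m, di in zip(mu, d))  # per-node linear degree term
    shaped = sum(f(di) for di in d)               # term shaping the degree distribution
    return linear + shaped


# Hypothetical example: a three-node star, with a penalty favoring degree 1.
G = [[0, 1, 1],
     [1, 0, 0],
     [1, 0, 0]]
mu = [0.5, 0.5, 0.5]
E = degree_energy(G, mu, lambda d: (d - 1) ** 2)
```

Under a Boltzmann distribution proportional to exp(-E), a choice of f like this one would make graphs whose nodes mostly have degree 1 more probable; other choices of f would favor other degree distributions.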
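The resulting statistical system can then be simulated using auxiliary variables, as shown in this calculation of the partition function:

$$\begin{aligned}
Z(\mu) &= \sum_{G \,|\, G = G^T,\; G \in \{0,1\}} \exp\Big[-\sum_i \mu_i \sum_j G_{ij} - \sum_i f\Big(\sum_j G_{ij}\Big)\Big] \\
&= \sum_{G \,|\, G = G^T,\; G \in \{0,1\}} \int dd_1 \cdots \int dd_N \prod_{i=1}^{N} \delta\Big(d_i - \sum_j G_{ij}\Big) \exp\Big(-\sum_i \mu_i d_i\Big) \exp\Big(-\sum_i f(d_i)\Big) \\
&= \sum_{G \,|\, G = G^T,\; G \in \{0,1\}} \int d\lambda_1 \cdots \int d\lambda_N \exp\Big[-\sum_i \mu_i \sum_j G_{ij} - \sum_i \lambda_i \sum_j G_{ij}\Big] \prod_{i=1}^{N} \hat\phi(\lambda_i),
\end{aligned} \qquad (3)$$

where the last step represents each factor as $\exp(-f(d_i)) = \int d\lambda_i \exp(-\lambda_i d_i)\, \hat\phi(\lambda_i)$ for a weight function $\hat\phi$.

Note that this partition function can also be written, using $z_i = \exp(-\mu_i)$, $c_i = \exp(-\lambda_i)$, $z_0 = c_0 = 1$, and with $\phi(c_i)$ denoting the weight function $\hat\phi$ re-expressed in the variable $c_i$, as

$$\begin{aligned}
Z(z) &= \sum_{G \,|\, G = G^T,\; G \in \{0,1\}} \int d\lambda_1 \cdots \int d\lambda_N \prod_{i=1}^{N} (c_i z_i)^{\sum_j G_{ij}} \prod_{i=1}^{N} \hat\phi(\lambda_i) \\
&= \sum_{G \,|\, G = G^T,\; G \in \{0,1\}} \int_0^\infty dc_1 \cdots \int_0^\infty dc_N \prod_{i=1}^{N} (c_i z_i)^{\sum_j G_{ij}} \prod_{i=1}^{N} \phi(c_i) \\
&= \sum_{G \,|\, G = G^T,\; G \in \{0,1\}} \int_0^\infty dc_1 \cdots \int_0^\infty dc_N \prod_{i=1}^{N} (c_i z_i)^{G_{ii}} \prod_{1 \le i < j \le N} (c_i c_j z_i z_j)^{G_{ij}} \prod_{i=1}^{N} \phi(c_i) \\
&= \int_0^\infty dc_1 \cdots \int_0^\infty dc_N \prod_{i=1}^{N} (1 + c_i z_i) \prod_{1 \le i < j \le N} (1 + c_i c_j z_i z_j) \prod_{i=1}^{N} \phi(c_i) \\
&= \int_0^\infty dc_1 \cdots \int_0^\infty dc_N \prod_{0 \le i < j \le N} (1 + c_i c_j z_i z_j) \prod_{i=1}^{N} \phi(c_i).
\end{aligned} \qquad (4)$$

Thus, we derive the efficiently sampled distribution on G given by distributions on the $c_i$ followed by conditionally independent Bernoulli distributions on the elements of G (writing $G_{0j}$ for the diagonal element $G_{jj}$):

$$\begin{aligned}
Z(z)\, \Pr(G) &= \int_0^\infty dc_1 \cdots \int_0^\infty dc_N \prod_{0 \le i < j \le N} (c_i c_j z_i z_j)^{G_{ij}} \prod_{i=1}^{N} \phi(c_i) \\
&= \int_0^\infty dc_1 \cdots \int_0^\infty dc_N \prod_{0 \le i < j \le N} \frac{(c_i c_j z_i z_j)^{G_{ij}}}{1 + c_i c_j z_i z_j} \; \prod_{0 \le i < j \le N} (1 + c_i c_j z_i z_j) \prod_{i=1}^{N} \phi(c_i).
\end{aligned} \qquad (5)$$

Further progress in learning and exploiting such patterns on graph structure may require disaggregation of the networks and their statistics by reaction and/or reactant high-level node type. Then network graph statistics may be formalized in terms of a Boltzmann distribution on adjacency matrices for the network connections of a given type signature, as well as by coarse-scale graphs giving overall connectivity probabilities between nodes of different types such as genes, RNAs, proteins, and a combinatorial explosion of complexes and their states [Mjolsness 2004].

Other open frontiers in network connectivity modeling include the sculpting of effective network topology and dynamics on an intermediate time scale by relatively infrequent but fast switching events in the full network dynamics, and its application to the description of a generally narrowing range of effective regulatory networks during developmental time in a multicellular organism.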
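The second stage of the sampling scheme behind equation (5), in which links are drawn as conditionally independent Bernoulli variables once the auxiliary variables c_i are fixed, might be sketched as follows. The link probability w / (1 + w) with w = c_i c_j z_i z_j is the form assumed here. The gamma distribution used to draw the c_i is purely a placeholder for whatever node-level distribution applies, the function names are hypothetical, and self-links are omitted for brevity.

```python
import random


def link_probability(ci, cj, zi, zj):
    """Assumed Bernoulli parameter w / (1 + w), with w = ci * cj * zi * zj."""
    w = ci * cj * zi * zj
    return w / (1.0 + w)


def sample_symmetric_graph(c, z, rng):
    """Draw a symmetric 0/1 adjacency matrix, one independent draw per pair i < j."""
    n = len(c)
    G = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < link_probability(c[i], c[j], z[i], z[j]):
                G[i][j] = G[j][i] = 1  # mirror the entry to keep G = G^T
    return G


rng = random.Random(0)
# Placeholder stage one: draw one auxiliary variable per node.
c = [rng.gammavariate(2.0, 0.5) for _ in range(6)]
z = [1.0] * 6
G = sample_symmetric_graph(c, z, rng)
```

Because each pair is sampled independently given the c_i, the cost of drawing one graph is proportional to the number of node pairs, with no Markov chain over G required.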
Molecular machines may be described in terms of highly correlated probability distributions, following the MWC example. Phylogenetic relationships between networks are also open to statistically based dynamic network modeling.

4. Labeled Graph Notations for Regulatory Models

The foregoing observations about node and network level dynamics may be formalized with probabilistic models on graph structure and node state variables, described using a concise graph notation [Mjolsness 2004]. For example, the modulation of the dependency of one random variable $x_i$ on another random variable $y_j$ by a graph element $G_{ij}$, whose degree structure is controlled by $E_{\mathrm{degree}}$ of equation (2) above, could be diagrammed as in Figure 1.

[Figure 1: diagram with nodes y, x and G, index nodes j and i, a dependency labeled “gated Prob”, and the constraint E_degree(G).]

Figure 1. Constraint nodes (hexagonal) and index nodes (square) for the sparse graph adjacency matrix-valued random variable G, which gates a conditional dependency between random variables $x_i$ and $y_j$.

Variable-structure relationships can provide the topology within which node dynamics are local, for example in a recurrent neural network or other feedback-capable model of transcriptional and other biological regulatory networks [Mjolsness, Sharp and Reinitz 1991]. One key to this application is to introduce a (boxed) index node “t” for time in an indexed probabilistic diagram for a probabilistic model such as that shown in Figure 1. This t index corresponds to loop unrolling in a conventional neural network. We can also introduce internal vector indices a (not shown) for the state variables (e.g. concentrations) v.

[Figure 2: diagram with time index nodes t' and t, state variables v, graph G, and index nodes j and i.]

Figure 2. Objects with node dynamics. The state vector v of object j at time t influences the change in state variable v in object i at time t', if they are connected by graph G. Tangency of G to the conditional dependence arrow from v to v is another way to express the dependency “gating” relationship of Figure 1.
The return arrow from v to v allows the change of state sampled at time t to be added into the state vector at time t+1. The resulting graph of dependencies is still a Directed Acyclic Graph.

A generic energy function for dynamics, local to the network, can be represented for example as:

$$E = C_1 \sum_{t,i,j,a} G_{ija} \Big[u_{ia}(t) - F_{\tau(i)\tau(j)}\big(\mathbf{v}_j(t),\, p_{ija}\big)\Big]^2 + C_2 \sum_{t,i,a} u_{ia}(t)^2 + C_3 \sum_{t,i} \Big[\frac{\mathbf{v}_i(t+\Delta t) - \mathbf{v}_i(t)}{\Delta t} - \sum_a g_{ia}\big(u_{ia}(t)\big)\Big]^2$$

where $G_{ij} = \sum_a G_{ija}$, $\tau(i)$ represents the type of node i, and a represents a reaction impinging on network player i. The forms of the reaction kinetic functions F are diverse and can be generated, for example, by a cellular model-generation program such as Cellerator [Shapiro et al. 2003].

With implicit graph prior probability distributions, every network diagram like Figure 2 corresponds to a probability distribution over networks including their dynamics. Large amounts of data mapping cell states to changes in state variables, i.e. time derivatives, may be collected and fit to such models. The data can include zero derivatives observed in stationary states. Expected temporal behaviors of high-probability dynamical networks in such a distribution could be characterized in many ways, including their approximability by simpler networks, including but not limited to those exhibiting attractor dynamics such as fixed points and limit cycles.

5. Discussion

Network dynamics for biological regulatory networks are richer than current models allow, due to the variety of different types for object nodes (molecules and other reaction participants) and process nodes (for example reactions). Generative probabilistic models can be developed to express the richness of such networks both in their local node dynamics and in their connection patterns.

Acknowledgements

Thanks for discussions with Chris Hart, Barbara Wold, Pierre Baldi, Chin-Rang Yang, Bruce Shapiro, and Lucas Scharenbroich. Support was provided by the National Institute for General Medical Sciences BISTI program, grant number GM069013.
References

Michael Gibson and Eric Mjolsness, “Modeling the Activity of Single Genes”, in Computational Methods in Molecular Biology, eds. J. M. Bower and H. Bolouri, MIT Press, 2001.

D. T. Gillespie, “Exact stochastic simulation of coupled chemical reactions”, J. Phys. Chem. 81:2340-2361, 1977.

R. Milo, S. Shen-Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii, and U. Alon, “Network motifs: simple building blocks of complex networks”, Science 298(5594):824-827, 2002.

Eric Mjolsness, “Labeled Graph Notations for Graphical Models: Extended Report”, UCI ICS Technical Report #04-03, March 2004.

M. E. J. Newman, “The structure and function of complex networks”, SIAM Review 45:167-256, 2003.

Bruce E. Shapiro, Andre Levchenko, Elliot M. Meyerowitz, Barbara J. Wold, and Eric D. Mjolsness, “Cellerator: extending a computer algebra system to include biochemical arrows for signal transduction simulations”, Bioinformatics 19(5):677-678, 2003.

J. R. Stiles, L. Van Helden, T. M. Bartol, E. E. Salpeter, and M. M. Salpeter, “Miniature endplate current rise times less than 100 microseconds from improved dual recordings can be modeled with passive acetylcholine diffusion from a synaptic vesicle”, PNAS 93(12):5747-5752, 1996.

Etay Ziv, Robin Koytcheff, and Chris Wiggins, “Novel systematic discovery of statistically significant network features”, preprint, arXiv Condensed Matter, abstract cond-mat/0306610, 2003.