Chapter 1
Network Dynamics for Systems Biology
Eric Mjolsness
Institute for Genomics and Bioinformatics, and
Department of Computer Science
University of California, Irvine
emj@uci.edu
Abstract
Networks in biological systems form a complex graph structure in which dynamics is
local. This applies to the dynamics of both graph node variables, such as concentrations,
and graph link variables, such as evolving or cell-state-specific regulatory relationships.
Unlike most models of physical systems, the forms of the equations commonly used to
model cellular network dynamics are very diverse. This is due to diversity both in
biological mechanisms, e.g. those associated with membership in metabolic, signaling,
transcriptional or mechanical networks, and in the common modeling approaches to each
mechanism. Plausible models for these networks can be classified according to (a) their
network structure and (b) their choice of node dynamics for each major biological
mechanism. Such a taxonomy can be important, for example, in representing signal
transduction networks that involve the interaction of protein complex and transcriptional
regulation networks.
1. Introduction
A regulatory network can be formalized as a labeled bipartite graph, in which “reaction”
nodes alternate with molecule or “reactant” nodes to which they are connected by
directed links. Both types of nodes, as well as the directed links that denote role membership, have a
hierarchy of possible types (e.g. phosphorylation reactions, performed by kinase
molecules playing the role of enzyme) as well as an individual identity. One advantage
of formalizing regulatory networks as labeled graphs is that simulation software design
can follow a well-defined schema based on such a formalization. Another is that one can
define probability distributions on the graphs for use in machine learning via statistical
inference. Plausible models for biological regulatory networks can be classified
according to (a) their network structure and (b) their choice of node dynamics for each
major biological mechanism.
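A labeled bipartite graph of this kind can be sketched as a small data structure. The Python below is a minimal illustration under stated assumptions, not the schema of any particular simulator: the type names, the parent-pointer encoding of the type hierarchies, and the classes `Node` and `RegulatoryGraph` are hypothetical. Every node carries an individual identity and a hierarchical type, every directed link carries a role, and links that would join two reaction nodes or two reactant nodes are rejected.

```python
from dataclasses import dataclass, field

# Hypothetical type hierarchy: each type maps to its parent type (None marks a root).
NODE_TYPES = {
    "reactant": None, "molecule": "reactant", "protein": "molecule", "kinase": "protein",
    "reaction": None, "modification": "reaction", "phosphorylation": "modification",
}
# Hypothetical hierarchy of roles that a directed link can express.
ROLE_TYPES = {"role": None, "input": "role", "enzyme": "input",
              "substrate": "input", "product": "role"}


def is_a(t, ancestor, hierarchy):
    """True if type t equals ancestor or specializes it."""
    while t is not None:
        if t == ancestor:
            return True
        t = hierarchy[t]
    return False


@dataclass
class Node:
    name: str  # individual identity, e.g. "MEK"
    kind: str  # position in NODE_TYPES, e.g. "kinase"


@dataclass
class RegulatoryGraph:
    nodes: dict = field(default_factory=dict)
    links: list = field(default_factory=list)  # (source, target, role) triples

    def add(self, name, kind):
        self.nodes[name] = Node(name, kind)

    def link(self, src, dst, role):
        if role not in ROLE_TYPES:
            raise ValueError(f"unknown role {role!r}")
        # Bipartite constraint: exactly one endpoint must be a reaction node.
        is_rxn = [is_a(self.nodes[n].kind, "reaction", NODE_TYPES) for n in (src, dst)]
        if is_rxn[0] == is_rxn[1]:
            raise ValueError("a link must join a reaction node to a reactant node")
        self.links.append((src, dst, role))


# Usage: one phosphorylation reaction in which a kinase plays the enzyme role.
g = RegulatoryGraph()
for name, kind in [("MEK", "kinase"), ("ERK", "protein"),
                   ("ERK-P", "protein"), ("r1", "phosphorylation")]:
    g.add(name, kind)
g.link("MEK", "r1", "enzyme")
g.link("ERK", "r1", "substrate")
g.link("r1", "ERK-P", "product")
```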
2. Node Dynamics
Two major classifications are possible for local regulatory dynamics at a reactant or
reaction node: according to biological mechanism, and according to mathematical model.
Neither is a refinement of the other. For example, single-substrate catalysis can be
modeled with mass action dynamics or with the Michaelis-Menten approximation. The
mass action catalytic model can in turn be used as a simplified model of a wide variety of
mechanisms, from protein state modifications such as phosphorylation to receptor-mediated transport across membranes. This basic duality between biological and
mathematical reaction hierarchies, and the need for explicit mappings between them, may
be reflected in class hierarchies for systems biology modeling software such as Cellerator
[Shapiro et al. 2003] and a forthcoming pathway modeling database, “Sigmoid”, which
marshals regulatory interactions for use in Cellerator.
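To make the distinction between biological mechanism and mathematical model concrete, the sketch below models the same single-substrate catalysis, E + S ⇌ C → E + P, in two ways: with explicit mass-action terms, and with the Michaelis-Menten rate V_max S/(K_M + S) obtained by separation of time scales. The rate constants, initial conditions, function names and forward-Euler integration are illustrative assumptions, and K_M is taken in the Briggs-Haldane form (k_-1 + k_2)/k_1. With enzyme scarce relative to substrate, the two models yield nearly the same amount of product.

```python
def mass_action(s0, e_tot, k1, km1, k2, t_end, dt=1e-3):
    """E + S <-> C -> E + P with explicit mass-action terms (forward Euler)."""
    s, c, p = s0, 0.0, 0.0
    for _ in range(int(round(t_end / dt))):
        e = e_tot - c                      # free enzyme from conservation
        bind, unbind, cat = k1 * e * s, km1 * c, k2 * c
        s += dt * (unbind - bind)
        c += dt * (bind - unbind - cat)
        p += dt * cat
    return p


def michaelis_menten(s0, e_tot, k1, km1, k2, t_end, dt=1e-3):
    """Quasi-steady-state reduction: dP/dt = Vmax S / (Km + S)."""
    vmax, km = k2 * e_tot, (km1 + k2) / k1
    s, p = s0, 0.0
    for _ in range(int(round(t_end / dt))):
        v = vmax * s / (km + s)
        s -= dt * v
        p += dt * v
    return p


# Hypothetical parameters: enzyme is 100-fold scarcer than substrate.
params = dict(s0=1.0, e_tot=0.01, k1=10.0, km1=1.0, k2=1.0, t_end=50.0)
p_ma = mass_action(**params)
p_mm = michaelis_menten(**params)
```

The Michaelis-Menten version drops the complex C as a state variable, which is exactly the kind of mechanism-preserving change of mathematical model described above.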
In view of this distinction, network dynamic models can be built by first specifying the
biological mechanism involved in a reaction that enters into the dynamics for a node, and
then choosing one of a limited set of plausible mathematical models for that mechanism.
The libraries of known biological mechanisms and mathematical reaction models can
each be put into a specialization (inheritance) hierarchy or directed acyclic graph, with a
consistent set of cross-hierarchy pointers showing plausible translations from biology to
mathematics. Separate biological mechanisms include top-level categories such as
transcriptional regulation, metabolic synthesis and degradation reactions, protein state
modification including phosphorylation (e.g. in MAPK signal transduction pathway
models) and ubiquitination, protein complex aggregation and disaggregation reactions
(e.g. in the NF-κB signal transduction pathway models).
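One way to hold the two libraries and their cross-hierarchy pointers is as a pair of directed acyclic graphs plus a translation table. In the hypothetical sketch below, each category lists its parents, translations attached to a general mechanism are inherited by its specializations, and `candidate_models` returns the plausible mathematical models for a mechanism. The entries are placeholders drawn from the mechanisms and model families named in this chapter, not a curated library.

```python
# Hypothetical fragments of the two hierarchies: child -> list of parents (DAGs allowed).
BIO = {
    "mechanism": [],
    "protein state modification": ["mechanism"],
    "phosphorylation": ["protein state modification"],
    "ubiquitination": ["protein state modification"],
    "transcriptional regulation": ["mechanism"],
    "complex aggregation": ["mechanism"],
}
MATH = {
    "rate law": [],
    "mass action": ["rate law"],
    "Michaelis-Menten": ["rate law"],
    "MWC": ["rate law"],
    "neural-network style": ["rate law"],
    "compound mass-action scheme": ["mass action"],
}
# Cross-hierarchy pointers: plausible translations from biology to mathematics.
TRANSLATIONS = {
    "protein state modification": ["mass action", "Michaelis-Menten"],
    "transcriptional regulation": ["neural-network style", "MWC", "compound mass-action scheme"],
    "complex aggregation": ["mass action"],
}
# Every pointer must land on a node of the mathematical hierarchy.
assert all(t in MATH for targets in TRANSLATIONS.values() for t in targets)


def ancestors(node, dag):
    """The node itself plus every more general category above it."""
    seen, stack = [], [node]
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.append(n)
            stack.extend(dag[n])
    return seen


def candidate_models(mechanism):
    """Models offered for a mechanism, inherited from any of its ancestors."""
    models = []
    for m in ancestors(mechanism, BIO):
        for model in TRANSLATIONS.get(m, []):
            if model not in models:
                models.append(model)
    return models
```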
A collection of mathematical reaction models that can be used to build node dynamics is
provided by many cell simulation packages including Cellerator [Shapiro et al. 2003]. A
top-level classification of useful enzymatic reaction models alone would include a
steadily growing subset of a growing cross-product space of alternatives such as: {mass
action, Michaelis-Menten style approximations using separation of time scales} x
{single-substrate, multi-substrate reactions} x {uninhibited, noncompetitive inhibition,
competitive inhibition, uncompetitive inhibition} [Yang et al. 2004] x {allosteric (often
modeled with versions of the Monod-Wyman-Changeux (MWC) simplified model, e.g.
for threonine deaminase in E. coli metabolism), nonallosteric} x {elementary reactions,
compound reactions made of many sub-reactions} x {deterministic, stochastic Langevin
differential equation, stochastic master equation [Gillespie 1977], particle-level
simulations as in MCell [Stiles et al. 1996]}. The combinatorial profusion of reasonable
modeling alternatives that should be available favors automatic model generation and
flexible algebraic model representations such as provided in Cellerator and supported in
Systems Biology Markup Language (SBML) Level 2.
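The size of this cross-product space is easy to tabulate. The sketch below enumerates every combination of the six groups of alternatives listed above (the short labels are our own). Only a subset of these combinations will be biologically sensible, which is part of why automatic model generation is attractive.

```python
from itertools import product

# One entry per group of alternatives in the text; labels are abbreviations.
AXES = {
    "approximation": ["mass action", "Michaelis-Menten style"],
    "substrates": ["single-substrate", "multi-substrate"],
    "inhibition": ["uninhibited", "noncompetitive", "competitive", "uncompetitive"],
    "allostery": ["allosteric (MWC)", "nonallosteric"],
    "composition": ["elementary", "compound"],
    "stochasticity": ["deterministic", "Langevin SDE", "master equation", "particle level"],
}


def model_space(axes):
    """Yield one dict per point of the cross product, keyed by group name."""
    names = list(axes)
    for combo in product(*(axes[n] for n in names)):
        yield dict(zip(names, combo))


space = list(model_space(AXES))
```

With 2, 2, 4, 2, 2 and 4 alternatives per group, `space` holds 256 candidate reaction-model types before any biological filtering.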
As an example of this rich space of mathematical reaction models, the MWC model may
be generalized to incorporate multiple activators and inhibitors. For enzyme E, substrates
{S}, activators {A}, inhibitors {I}, and product P, the Generalized MWC reaction may be
denoted
$$
S_1, S_2, \ldots \xrightarrow[A_1, A_2, \ldots,\; I_1, I_2, \ldots]{E} P.
$$
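If all concentrations are normalized by their corresponding K_M constants, the kinetics for
production of the final product may be derived from statistical mechanics and written as

$$
\frac{dP}{dt} = K E \,
\frac{\prod_k S_k (1+S_k)^{n-1} \prod_k (1+A_k)^n + L \prod_k c S_k (1+c S_k)^{n-1} \prod_k (1+I_k)^n}
     {\prod_k (1+S_k)^n \prod_k (1+A_k)^n + L \prod_k (1+c S_k)^n \prod_k (1+I_k)^n}
\tag{1}
$$

where, as in the MWC model, n is the number of binding sites per enzyme, L is the allosteric
constant relating the unliganded T (tense) and R (relaxed) states, c is the ratio of substrate
dissociation constants in the R and T states, and K is a catalytic rate constant.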
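Equation (1) can be evaluated directly. The function below is a sketch that assumes normalized concentrations as stated above and treats K and E as plain numbers; the function and parameter names are our own. With L = 0 (no T state) and a single substrate it reduces to the normalized Michaelis-Menten form S/(1 + S), and when c < 1 inhibitors lower the rate while activators raise it.

```python
from math import prod


def generalized_mwc_rate(K, E, S, A=(), I=(), n=1, L=0.0, c=1.0):
    """dP/dt from Eq. (1). S, A, I are sequences of substrate, activator and inhibitor
    concentrations already divided by their K_M constants."""
    r_weight = prod((1 + a) ** n for a in A)          # activator factors multiply the R terms
    t_weight = L * prod((1 + i) ** n for i in I)      # inhibitor factors multiply the T terms
    numerator = (r_weight * prod(s * (1 + s) ** (n - 1) for s in S)
                 + t_weight * prod(c * s * (1 + c * s) ** (n - 1) for s in S))
    denominator = (r_weight * prod((1 + s) ** n for s in S)
                   + t_weight * prod((1 + c * s) ** n for s in S))
    return K * E * numerator / denominator


# Usage with hypothetical parameters: a 2-site enzyme whose T state binds substrate weakly.
base = generalized_mwc_rate(1.0, 1.0, [1.0], n=2, L=10.0, c=0.1)
inhibited = generalized_mwc_rate(1.0, 1.0, [1.0], I=[5.0], n=2, L=10.0, c=0.1)
activated = generalized_mwc_rate(1.0, 1.0, [1.0], A=[5.0], n=2, L=10.0, c=0.1)
```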
On the other hand, for reactions involving multistate complexes with largely unknown
internal dynamics, such as eukaryotic transcriptional regulation, a somewhat different
hierarchy of modeling alternatives is available [Gibson and Mjolsness 2001], including
neural network style phenomenological models, MWC style near-equilibrium statistical
mechanics models, and compound reaction schemes built out of elementary
(directional) catalytic reactions. From the mathematical point of view, a key difference is
the need to describe the occupation of a very limited number of copies of a particular
binding site, perhaps just one or two copies per cell in the case of particular transcription
factor binding sites. This can still be done with deterministic modeling of an occupancy
probability, if the shorter of the occupied and unoccupied time intervals is short
enough compared to other time scales to be integrated over when its effect on downstream
processes such as transcription is modeled.
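The condition for deterministic occupancy modeling can be checked with a small stochastic simulation. The sketch below assumes a single site that binds with a pseudo-first-order rate k_on (a binding constant times the transcription factor concentration) and unbinds with rate k_off. It simulates the exact switching process with the Gillespie method and compares the fraction of time occupied with the steady state of the deterministic occupancy equation dp/dt = k_on (1 − p) − k_off p, namely k_on/(k_on + k_off). The rates and run length are arbitrary.

```python
import random


def simulate_site(k_on, k_off, t_end, seed=0):
    """Exact (Gillespie) simulation of one site switching between unoccupied (0) and
    occupied (1). Returns the fraction of [0, t_end] spent occupied."""
    rng = random.Random(seed)
    t, state, occupied = 0.0, 0, 0.0
    while True:
        rate = k_on if state == 0 else k_off   # only one possible transition per state
        dwell = rng.expovariate(rate)
        if t + dwell >= t_end:                 # truncate the last dwell at t_end
            if state == 1:
                occupied += t_end - t
            break
        if state == 1:
            occupied += dwell
        t += dwell
        state = 1 - state
    return occupied / t_end


k_on, k_off = 2.0, 3.0          # hypothetical rates, in inverse time units
p_sim = simulate_site(k_on, k_off, t_end=10_000.0, seed=1)
p_det = k_on / (k_on + k_off)   # steady state of dp/dt = k_on (1 - p) - k_off p
```

Over a window much longer than both dwell times the two numbers agree closely, which is the regime in which the single-copy site can be replaced by a deterministic occupancy probability.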
3. Network statistics and dynamics
Much attention has rightfully been paid to the connectivity statistics of regulatory
networks. Examples include degree distributions and the prevalence of network motifs
represented as subgraphs [Ziv et al. 2003, Milo et al. 2002]. A wide variety of such
network statistics can be represented using Boltzmann distributions defined on adjacency
matrices for graphs [Newman 2003]. For example, the energy function governing degree
distributions can be taken as
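$$
E_{\mathrm{degree}}(G) = \sum_{ia} \mu_i G_{ia} + \sum_i f\Big(\sum_a G_{ia}\Big), \qquad G = G^T,
\tag{2}
$$

where G is a symmetric adjacency matrix, μ_i is a parameter attached to node i, and f is a
function of the degree ∑_a G_ia of node i. The resulting statistical system can then be
simulated using auxiliary variables, as shown in this calculation of the partition function: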
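$$
\begin{aligned}
Z &= \sum_{\substack{G = G^T \\ G_{ij} \in \{0,1\}}} \exp\Big(-\sum_i \mu_i \sum_j G_{ij} - \sum_i f\Big(\sum_j G_{ij}\Big)\Big) \\
  &= \sum_{G} \int d\sigma_1 \cdots d\sigma_N \prod_i \delta\Big(\sigma_i - \sum_j G_{ij}\Big) \exp\Big(-\sum_i \mu_i \sigma_i\Big) \exp\Big(-\sum_i f(\sigma_i)\Big) \\
  &= \sum_{G} \int d\sigma_1 \cdots d\sigma_N \prod_i \delta\Big(\sigma_i - \sum_j G_{ij}\Big) \exp\Big(-\sum_i \mu_i \sigma_i\Big) \prod_i \int_0^\infty d\lambda_i \exp\big(-\lambda_i \sigma_i - \phi(\lambda_i)\big) \\
  &= \sum_{G} \int_0^\infty d\lambda_1 \cdots d\lambda_N \exp\Big(-\sum_i \mu_i \sum_j G_{ij} - \sum_i \lambda_i \sum_j G_{ij} - \sum_i \phi(\lambda_i)\Big)
\end{aligned}
\tag{3}
$$

where the σ_i are auxiliary degree variables and φ is defined by writing e^{−f(σ)} as a Laplace
transform, e^{−f(σ)} = ∫_0^∞ dλ e^{−λσ − φ(λ)}.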
Note that this partition function can also be written, using
z_i = exp(−μ_i), c_i = exp(−λ_i), and z_0 = c_0 = 1,
as

$$
\begin{aligned}
Z(z) &= \sum_{G} \int_0^\infty d\lambda_1 \cdots d\lambda_N \prod_{i=1}^N (c_i z_i)^{\sum_j G_{ij}} \exp\Big(-\sum_i \phi(\lambda_i)\Big) \\
&= \sum_{G} \int_0^1 dc_1 \cdots dc_N \prod_{i=1}^N (c_i z_i)^{\sum_j G_{ij}} \prod_{i=1}^N \psi(c_i) \\
&= \sum_{G} \int_0^1 dc_1 \cdots dc_N \prod_{i=1}^N (c_i z_i)^{G_{ii}} \prod_{1 \le i < j \le N} (c_i c_j z_i z_j)^{G_{ij}} \prod_{i=1}^N \psi(c_i) \\
&= \int_0^1 dc_1 \cdots \int_0^1 dc_N \prod_{i=1}^N (1 + c_i z_i) \prod_{1 \le i < j \le N} (1 + c_i c_j z_i z_j) \prod_{i=1}^N \psi(c_i) \\
&= \int_0^1 dc_1 \cdots \int_0^1 dc_N \prod_{0 \le i < j \le N} (1 + c_i c_j z_i z_j) \prod_{i=1}^N \psi(c_i)
\end{aligned}
\tag{4}
$$

where ψ(c) = e^{−φ(−ln c)}/c results from the change of variables λ_i = −ln c_i, and the extra
index 0 (with z_0 = c_0 = 1) folds the diagonal factors (1 + c_i z_i) into the product over pairs.
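The key step in equation (4), summing over symmetric G once the exponent is linear in G, can be verified by brute force on a tiny graph. The sketch below evaluates E_degree of equation (2) with per-node coefficients μ_i (any f can be passed in), enumerates all symmetric 0/1 matrices on three nodes, and compares the sum with the product over pairs 0 ≤ i < j ≤ N, using w_i = c_i z_i = exp(−μ_i − λ_i) for one fixed choice of the auxiliary λ_i and with f set to zero. The numerical values are arbitrary, and enumeration is feasible only for very small N.

```python
from itertools import product
from math import exp, prod


def symmetric_graphs(n):
    """Yield every symmetric 0/1 adjacency matrix on n nodes (self-loops allowed)."""
    pairs = [(i, j) for i in range(n) for j in range(i, n)]
    for bits in product((0, 1), repeat=len(pairs)):
        g = [[0] * n for _ in range(n)]
        for (i, j), b in zip(pairs, bits):
            g[i][j] = g[j][i] = b
        yield g


def e_degree(g, mu, f):
    """Eq. (2): sum_{ia} mu_i G_ia + sum_i f(sum_a G_ia)."""
    deg = [sum(row) for row in g]
    return sum(m * d for m, d in zip(mu, deg)) + sum(f(d) for d in deg)


def brute_force_z(mu, f):
    """Partition function by explicit enumeration of all symmetric G."""
    return sum(exp(-e_degree(g, mu, f)) for g in symmetric_graphs(len(mu)))


def factorized_z(w):
    """prod_{0 <= i < j <= N} (1 + w_i w_j) with w_0 = 1; pairs (0, i) carry the self-loops."""
    w = [1.0] + list(w)
    return prod(1 + w[i] * w[j] for i in range(len(w)) for j in range(i + 1, len(w)))


def no_f(d):
    return 0.0                        # f = 0 isolates the summation step


mu = [0.3, -0.5, 1.2]                 # hypothetical node coefficients
lam = [0.7, 0.1, 2.0]                 # one fixed value of the auxiliary variables
lhs = brute_force_z([m + l for m, l in zip(mu, lam)], no_f)
rhs = factorized_z([exp(-l) * exp(-m) for m, l in zip(mu, lam)])   # w_i = c_i z_i
```

With a nonlinear f the sum over G does not factorize in this way, which is why the auxiliary variables are introduced.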





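Thus, we derive an efficiently sampled distribution on G, given by a distribution on the
auxiliary variables c_i (the bracketed factor, which integrates to one by equation (4))
followed by conditionally independent Bernoulli distributions on the elements of G, where
G_0i stands for the diagonal element G_ii:

$$
\begin{aligned}
\Pr(G) &= \frac{1}{Z(z)} \int_0^1 dc_1 \cdots \int_0^1 dc_N \prod_{0 \le i < j \le N} (c_i c_j z_i z_j)^{G_{ij}} \prod_{i=1}^N \psi(c_i) \\
&= \int_0^1 dc_1 \cdots \int_0^1 dc_N \left[ \frac{1}{Z(z)} \prod_{0 \le i < j \le N} (1 + c_i c_j z_i z_j) \prod_{i=1}^N \psi(c_i) \right] \prod_{0 \le i < j \le N} \frac{(c_i c_j z_i z_j)^{G_{ij}}}{1 + c_i c_j z_i z_j}
\end{aligned}
\tag{5}
$$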
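The second stage of equation (5) is straightforward to sample. The sketch below draws G given fixed values of the auxiliary variables c_i and of z_i = exp(−μ_i): each pair 0 ≤ i < j ≤ N is an independent Bernoulli variable with probability c_i c_j z_i z_j/(1 + c_i c_j z_i z_j), and pairs involving the extra index 0 become diagonal entries. The first stage, drawing the c_i from their marginal distribution, is not shown; it would need a separate sampler (for example Markov chain Monte Carlo), and that choice is an assumption here. The numeric values are arbitrary.

```python
import random


def edge_probabilities(c, z):
    """Bernoulli parameters for every pair 0 <= i < j <= N, with c_0 = z_0 = 1."""
    w = [1.0] + [ci * zi for ci, zi in zip(c, z)]
    return {(i, j): w[i] * w[j] / (1.0 + w[i] * w[j])
            for i in range(len(w)) for j in range(i + 1, len(w))}


def sample_graph_given_c(c, z, rng):
    """Independent Bernoulli draws assembled into a symmetric 0-based matrix G.
    A pair (0, j) is the diagonal entry G_jj (nodes are 1-based in the equation)."""
    n = len(z)
    g = [[0] * n for _ in range(n)]
    for (i, j), p in edge_probabilities(c, z).items():
        if rng.random() < p:
            if i == 0:
                g[j - 1][j - 1] = 1
            else:
                g[i - 1][j - 1] = g[j - 1][i - 1] = 1
    return g


rng = random.Random(0)
c, z = [0.5, 0.9, 0.2], [2.0, 1.0, 3.0]   # hypothetical auxiliary values and z_i
samples = [sample_graph_given_c(c, z, rng) for _ in range(20_000)]
```

Here w = (1, 1.0, 0.9, 0.6), so the first node's self-loop appears with probability 0.5 and the link between the first two nodes with probability 0.9/1.9.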
Further progress in learning and exploiting such patterns on graph structure may require
disaggregation of the networks and their statistics by reaction and/or reactant high-level
node type. Then network graph statistics may be formalized in terms of a Boltzmann
distribution on adjacency matrices for the network connections of a given type signature,
as well as by coarse-scale graphs giving overall connectivity probabilities between nodes
of different types such as genes, RNAs, proteins, and a combinatorial explosion of
complexes and their states [Mjolsness 2004].
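A coarse-scale graph of this kind can also be used generatively. The sketch below treats type-to-type connectivity probabilities as independent link probabilities, in the manner of a stochastic block model; this is a simplification of the Boltzmann formulation above, and the node types and probability values are invented for illustration.

```python
import random

# Hypothetical coarse-scale connectivity: probability of a directed link from a node of
# the first type to a node of the second type. Type pairs not listed never connect.
TYPE_LINK_PROB = {
    ("gene", "RNA"): 0.9,
    ("RNA", "protein"): 0.8,
    ("protein", "gene"): 0.05,
    ("protein", "protein"): 0.02,
}


def sample_typed_graph(node_types, rng):
    """Draw each directed link u -> v independently with its type-pair probability."""
    edges = set()
    for u, tu in node_types.items():
        for v, tv in node_types.items():
            if u != v and rng.random() < TYPE_LINK_PROB.get((tu, tv), 0.0):
                edges.add((u, v))
    return edges


nodes = {"g1": "gene", "g2": "gene", "r1": "RNA", "r2": "RNA",
         "p1": "protein", "p2": "protein"}
edges = sample_typed_graph(nodes, random.Random(0))
```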
Other open frontiers in network connectivity modeling include the sculpting of
effective network topology and dynamics on an intermediate time scale by relatively
infrequent but fast switching events in the full network dynamics, and its application to
the description of a generally narrowing range of effective regulatory networks during
developmental time in a multicellular organism. Molecular machines may be described
in terms of highly correlated probability distributions, following the MWC example.
Phylogenetic relationships between networks are also open to statistically based dynamic
network modeling.
4. Labeled Graph Notations for Regulatory Models
The foregoing observations about node and network level dynamics may be formalized
with probabilistic models on graph structure and node state variables, described using a
concise graph notation [Mjolsness 2004]. For example, the modulation of the dependency of
one random variable x_i on another random variable y_j by a graph element G_ij, where the
degree structure of G is controlled by E_degree of equation (2) above, could be diagrammed as in
Figure 1.
[Figure 1 diagram: variables y_j and x_i joined by a “gated Prob” constraint gated by G_ij, with an E_degree(G) constraint on G.]
Figure 1. Constraint nodes (hexagonal) and index nodes (square) for the sparse graph adjacency-matrix-valued random variable G, which gates a conditional dependency between random variables x_i and y_j.
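The gating relationship of Figure 1 can be illustrated with a concrete conditional distribution. The chapter does not fix one; the sketch below assumes a linear-Gaussian form, x_i ~ Normal(∑_j G_ij W_ij y_j, σ), with hypothetical weights W_ij and noise level σ, so that y_j affects x_i only where G_ij = 1.

```python
import random


def sample_x_given_y(y, G, W, noise_sd, rng):
    """Hypothetical linear-Gaussian gating: x_i ~ Normal(sum_j G_ij W_ij y_j, noise_sd).
    y_j can influence x_i only where the gate G_ij equals 1."""
    return [rng.gauss(sum(G[i][j] * W[i][j] * y[j] for j in range(len(y))), noise_sd)
            for i in range(len(G))]


G = [[1, 0], [0, 0]]                 # only the dependency of x_0 on y_0 is switched on
W = [[2.0, 5.0], [5.0, 5.0]]         # weights behind closed gates have no effect
rng = random.Random(0)
x_a = sample_x_given_y([1.0, 3.0], G, W, 0.0, rng)
x_b = sample_x_given_y([1.0, -7.0], G, W, 0.0, rng)   # change only an ungated input
```

With the noise level set to zero the outputs are the conditional means, [2.0, 0.0] in both calls, because y_1 sits behind closed gates.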
Variable-structure relationships can provide the topology within which node dynamics
are local, for example in a recurrent neural network or other feedback-capable model of
transcriptional and other biological regulatory networks [Mjolsness, Sharp and Reinitz
1991].
One key to this application is to introduce a (boxed) index node “t” for time in an indexed
probabilistic diagram for a probabilistic model such as that shown in Figure 1. This t index
corresponds to loop unrolling in a conventional neural network. We can also introduce
internal vector indices a (not shown) for the state variables (e.g. concentrations) v.
[Figure 2 diagram: state nodes v at times t and t', index nodes i and j, and graph G.]
Figure 2. Objects with node dynamics. The state vector v of object j at time t influences the
change in state variable v of object i at time t' if they are connected by graph G. Tangency of G
to the conditional dependence arrow from v to the change in v is another way to express the
dependency “gating” relationship of Figure 1.
The return arrow from the change in v back to v allows the change of state sampled at time t to
be added into the state vector at time t+1. The resulting graph of dependencies is still a directed
acyclic graph.
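Unrolling in time can be sketched as a discrete update in which the change for node i is computed only from nodes j with G_ij = 1 and is then added back into the state for the next step. The sigmoid form below, with hypothetical weights T_ij, thresholds h_i, maximal rates R_i and decay rates d_i, is a common choice for recurrent neural-network style regulatory models but is an assumption here, as are the forward-Euler step and all numeric values.

```python
from math import exp


def sigmoid(x):
    return 1.0 / (1.0 + exp(-x))


def unroll(v0, G, T, h, R, decay, dt, steps):
    """Return the states at steps 0..steps. At each step the change of node i uses only
    nodes j with G[i][j] == 1, and is then added into the state for the next step."""
    n = len(v0)
    traj = [list(v0)]
    for _ in range(steps):
        v = traj[-1]
        dv = [R[i] * sigmoid(sum(G[i][j] * T[i][j] * v[j] for j in range(n)) + h[i])
              - decay[i] * v[i] for i in range(n)]
        traj.append([v[i] + dt * dv[i] for i in range(n)])
    return traj


G = [[0, 0], [1, 0]]            # node 1 listens to node 0; node 0 has no inputs
T = [[0.0, 0.0], [2.0, 0.0]]
params = dict(G=G, T=T, h=[0.0, -1.0], R=[1.0, 1.0], decay=[1.0, 1.0], dt=0.1, steps=500)
traj_a = unroll([0.0, 0.0], **params)
traj_b = unroll([0.0, 5.0], **params)   # different start for node 1 only
```

Because node 0 has no incoming links, its trajectory is identical in both runs and settles at the fixed point R_0 sigmoid(h_0)/d_0 = 0.5, illustrating that the dynamics is local to the graph.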
A generic energy function for dynamics, local to the network, can be represented, for
example, as

$$
\begin{aligned}
E = {} & C_1 \sum_t \sum_{ija} G_{ija} \Big[ u_{ia}(t) - F_{\tau(i)\tau(j)}\big(v_j(t), p_{ija}\big) \Big]^2 + C_2 \sum_{t,i,a} u_{ia}(t)^2 \\
& + C_3 \sum_t \sum_i \Big[ \frac{v_i(t+\Delta t) - v_i(t)}{\Delta t} - \sum_a g_{ia} u_{ia}(t) \Big]^2,
\end{aligned}
$$

where G_ij = ∑_a G_ija, τ(i) represents the type of node i, and a represents a reaction impinging on network
player i. The forms of reaction kinetic functions F are diverse and can be generated, for
example, by a cellular model-generation program such as Cellerator [Shapiro et al. 2003].
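The energy function can be evaluated for a given trajectory. In the sketch below, which is an illustration rather than a fitting procedure, u_ia(t) is read as the rate of reaction a at node i, g_ia as the weight with which that reaction enters dv_i/dt, and F as a dictionary of kinetic functions keyed by node-type pairs; these readings, the nested-list layout and all names are assumptions. For a trajectory generated exactly by the model, the C_1 and C_3 terms vanish and only the C_2 penalty remains.

```python
def energy(v, u, G, F, p, g, node_type, C1, C2, C3, dt):
    """Evaluate E. v[t][i]: state of node i at step t (one more step than u);
    u[t][i][a]: rate of reaction a at node i; G[i][j][a] in {0, 1};
    F[(type_i, type_j)](v_j, p): kinetic function; p[i][j][a]: its parameters;
    g[i][a]: weight of reaction a in dv_i/dt."""
    steps, n, n_rxn = len(u), len(G), len(G[0][0])
    # Summing only where G_ija = 1 is the same as weighting each squared term by G_ija.
    e1 = sum((u[t][i][a] - F[(node_type[i], node_type[j])](v[t][j], p[i][j][a])) ** 2
             for t in range(steps) for i in range(n) for j in range(n)
             for a in range(n_rxn) if G[i][j][a])
    e2 = sum(u[t][i][a] ** 2 for t in range(steps) for i in range(n) for a in range(n_rxn))
    e3 = sum(((v[t + 1][i] - v[t][i]) / dt - sum(g[i][a] * u[t][i][a] for a in range(n_rxn))) ** 2
             for t in range(steps) for i in range(n))
    return C1 * e1 + C2 * e2 + C3 * e3


# Usage: an exact forward-Euler trajectory of first-order decay for one protein node.
dt = 0.1
v, u = [[1.0]], []
for _ in range(5):
    rate = -0.5 * v[-1][0]
    u.append([[rate]])
    v.append([v[-1][0] + dt * rate])

decay = {("protein", "protein"): lambda vj, k: -k * vj}   # hypothetical kinetic function
common = dict(G=[[[1]]], F=decay, p=[[[0.5]]], g=[[1.0]], node_type=["protein"], dt=dt)
e_fit = energy(v, u, C1=1.0, C2=0.0, C3=1.0, **common)
e_reg = energy(v, u, C1=0.0, C2=1.0, C3=0.0, **common)
```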
With implicit graph prior probability distributions, every network diagram like Figure 2
corresponds to a probability distribution over networks including their dynamics. Large
amounts of data mapping cell states to changes in state variables, i.e. time derivatives,
may be collected and fit to such models. The data can include zero derivatives observed
in stationary states. Expected temporal behaviors of high-probability dynamical networks
in such a distribution could be characterized in many ways, for example by their
approximability by simpler networks, including but not limited to those exhibiting
attractor dynamics such as fixed points and limit cycles.
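Fitting node dynamics to (state, derivative) data can be illustrated in the simplest case. The sketch below fits a hypothetical one-variable rate law dv/dt = a − b v by ordinary least squares. One of the data points is a stationary state with zero derivative, and the fitted fixed point a/b is an example of the attractor behavior mentioned above. The model form and the data are invented for illustration.

```python
def fit_linear_rate_law(states, derivatives):
    """Ordinary least squares for the hypothetical rate law dv/dt = a - b*v."""
    n = len(states)
    mean_v = sum(states) / n
    mean_d = sum(derivatives) / n
    cov = sum((v - mean_v) * (d - mean_d) for v, d in zip(states, derivatives))
    var = sum((v - mean_v) ** 2 for v in states)
    slope = cov / var                  # slope of dv/dt against v, equal to -b
    return mean_d - slope * mean_v, -slope


# Observed (state, derivative) pairs; v = 4 is a stationary state, so its derivative is 0.
states = [0.0, 1.0, 2.0, 4.0, 6.0]
derivs = [2.0, 1.5, 1.0, 0.0, -1.0]
a, b = fit_linear_rate_law(states, derivs)
fixed_point = a / b
```

The noiseless data recover a = 2 and b = 0.5, so the fitted fixed point coincides with the observed stationary state at v = 4.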
5. Discussion
Network dynamics for biological regulatory networks are richer than current models
allow, due to the variety of different types for object nodes (molecules and other reaction
participants) and process nodes (for example reactions). Generative probabilistic models
can be developed to express the richness of such networks both in their local node
dynamics and in their connection patterns.
Acknowledgements
Thanks for discussions with Chris Hart, Barbara Wold, Pierre Baldi, Chin-Rang Yang, Bruce
Shapiro, and Lucas Scharenbroich. Support was provided by the National Institute of General
Medical Sciences BISTI program, grant number GM069013.
References
Michael Gibson and Eric Mjolsness, “Modeling the Activity of Single Genes”, in Computational Methods in Molecular Biology, eds. J. M. Bower and H. Bolouri, MIT Press, 2001.
D. T. Gillespie, “Exact stochastic simulation of coupled chemical reactions”, J. Phys. Chem. 81:2340-2361, 1977.
R. Milo, S. Shen-Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii, and U. Alon, “Network motifs: simple building blocks of complex networks”, Science 298(5594):824-827, 2002.
Eric Mjolsness, “Labeled Graph Notations for Graphical Models: Extended Report”, UCI ICS Technical Report #04-03, March 2004.
M. E. J. Newman, “The structure and function of complex networks”, SIAM Review 45:167-256, 2003.
Bruce E. Shapiro, Andre Levchenko, Elliot M. Meyerowitz, Barbara J. Wold, and Eric D. Mjolsness, “Cellerator: extending a computer algebra system to include biochemical arrows for signal transduction simulations”, Bioinformatics 19(5):677-678, 2003.
J. R. Stiles, L. Van Helden, T. M. Bartol, E. E. Salpeter, and M. M. Salpeter, “Miniature endplate current rise times less than 100 microseconds from improved dual recordings can be modeled with passive acetylcholine diffusion from a synaptic vesicle”, PNAS 93(12):5747-5752, 1996.
Etay Ziv, Robin Koytcheff, and Chris Wiggins, “Novel systematic discovery of statistically significant network features”, preprint, arXiv:cond-mat/0306610.