Structural and Behavioral Properties of Biochemical Networks

Herbert M. Sauro
University of Washington
Seattle, WA

Ambrosius Publishing

Copyright © 2010 Herbert M. Sauro. All rights reserved.
Draft Edition v0.6, first upload (January, 2011)
Published by Ambrosius Publishing
www.sysbioBooks.com

Typeset using LaTeX 2ε, TikZ, PGFPlots, WinEdt and Math Time Professional 2 Fonts

Limit of Liability/Disclaimer of Warranty: While the author has used his best efforts in preparing this book, he makes no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaims any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the author nor publisher shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

Printed in the United States of America.

Front-Cover: Protein images from RCSB Protein Data Bank and David Goodsell © (www.pdb.org).

Contents

1 Quantitative Models
    1.1 Different Kinds of Model
    1.2 Desirable Attributes
    1.3 Variables and Parameters
    1.4 Dimensions and Units
    1.5 Model Approximations
    1.6 Types of Mathematical Models
    Exercises
2 Graphs and Networks
    2.1 Introduction to Graph Theory
    2.2 Example Network: Protein-protein Networks
    2.3 Stoichiometric Networks
3 Stoichiometric Networks
    3.1 Introduction
    3.2 Stoichiometry Matrix
    3.3 Mass-Balance Equations
    3.4 The System Equation
    Exercises
4 Flux Balance Laws
    4.1 Flux Balance Laws
    4.2 Determined Systems
    4.3 Flux Balance Analysis
    4.4 Isotopic Flux Measurements
    Exercises
5 Steady State Flux Patterns
    5.1 The Null Space
    5.2 Elementary Flux Modes
    5.3 Definition of a Pathway
    5.4 Maximum Yield Predictions
    5.5 Engineering a Pathway
    Exercises
6 Species Conservation Laws
    6.1 Moiety Conserved Cycles
    6.2 Basic Theory
    6.3 Computational Approaches
    6.4 Summary
    6.5 Behavioral Consequences
    6.6 Advanced Theory
    6.7 Numerical Methods
    6.8 Design of Simulation Software
    Exercises
    Math Exercises
References
History

1 Quantitative Models

The Oxford English Dictionary defines a model in the following way:

“A simplified or idealized description or conception of a particular system, situation, or process, often in mathematical terms, that is put forward as a basis for theoretical or empirical understanding, or for calculations, predictions, etc.”

This definition embodies a number of critical features that define a model. Probably the most important is that a model represents an idealized description, a simplification, of a real-world process. This is important because it allows us to comprehend the essential features of a complex process without being burdened and overwhelmed by unnecessary detail. Models are therefore not replicas of reality; they are, by design, approximations.

1.1 Different Kinds of Model

There are many different ways of approximating reality, including mathematical as well as non-mathematical approaches. In biology a very common non-mathematical way to represent cellular networks is by cartoons. Such cartoons distill into a very concise form the results of thousands of experiments undertaken to delineate the pathways responsible for mass and signal flow. Cartoon models are useful for giving a quick snapshot of a given process; however, they have limitations as reasoning tools. To increase their reasoning power, cartoons can be converted into mathematical models. Of these there are many forms, ranging from simple graph interaction maps to sophisticated dynamic temporal and spatial models.

Models also come in two different forms, which might be called conceptual and concrete. Concrete models are proposed models of some particular real system, for example a metabolic pathway such as glycolysis or a signaling network such as the MAPK pathway. Such models are used to predict the behavior of the real pathway so that the assumptions we used in constructing the model can be tested.
Alternatively, we can also build conceptual models. These models are thought experiments; they allow us to investigate the properties of hypothetical networks. Conceptual models serve as test-beds for investigating basic principles of network design. A typical thought experiment might involve investigating how a particular enzyme in a linear pathway affects metabolite levels. Conceptual models can also serve as a means to develop new hypotheses concerning the global properties of pathways. One hypothesis might concern the prediction that the control of flux in a linear pathway without regulation is always located on the enzymatic steps near the start of the pathway. Conceptual models also allow us to abstract and study generic pathway motifs. For example, what are the distinguishing behavioral properties of a branched pathway compared to a cyclic pathway? What properties does negative feedback, compared to positive feedback, confer on the dynamics of a pathway?

Many of the examples we will use in this course will be conceptual. Conceptual models are not intended to describe actual biological systems but are instead employed to aid reasoning about particular aspects of biological networks. Conceptual models are important because understanding them makes it much easier to understand concrete models.

1.2 Desirable Attributes

What makes a good model? There is a range of properties that a good model should have, but probably the most important are accuracy, predictability and falsifiability.

- A model is considered accurate if it is able to describe current experimental observations, that is, a model should be able to reproduce the current state of knowledge.
- A predictive model should be able to generate insight and/or predictions that are beyond current knowledge. Without this ability a model is considerably less useful; some would even suggest useless.
- Finally, a model should be falsifiable. Since no model can ever be proven to be correct, the only rigorous way to test a model is to try to refute it, that is, to show through some experimental test that the model is insufficient. For example, the statement "RNA is never transcribed into DNA" can be falsified simply by finding one instance where it happens (e.g. the life cycle of HIV).

A more common means of testing a model is through ‘validation’. This simply means testing whether a prediction made by the model is correct. When the model correctly makes a prediction, the model is not falsified; instead, our confidence that the model will be able to make further correct predictions is increased. Many models are of this kind: they have never been falsified but we have high confidence in them.

There are other attributes of a model that are desirable but not essential; these include being parsimonious and selective. A parsimonious model is a model that is as simple as possible, but no simpler. This is related to Occam's razor, which states that “Entities should not be multiplied beyond necessity” and argues that, given competing and equally good models, the simplest is preferred. Finally, since no model can represent everything in a given problem, a model must be selective and represent those things most relevant to the task at hand.

1.3 Variables and Parameters

In both numerical simulations and experiments it is impossible and unnecessary to deal with the entire universe. Instead we select a region of interest: an organ, a tissue, a cell, a segment of a pathway or even a single enzyme. The region of interest will have a boundary that marks the division between the system and the surrounding environment. The choice of boundary is important: it should not be so small that the interesting behavior is no longer observable, but it must not be so large that the study becomes difficult to carry out.
In any study we therefore divide quantities into two broad groups, variables and parameters.

Intensive and Extensive Properties. In science a distinction is made between physical quantities termed intensive and extensive. An intensive property is a physical quantity whose value does not depend on the size of the system. Examples include pressure, density, concentration and temperature. An extensive property is a physical quantity whose value does depend on the size of the system; examples include mass, volume, energy and entropy.

A variable is a quantity that changes during the course of a simulation or experiment. For example, the level of a phosphorylated protein, the level of mRNA, the concentration of a metabolite or the voltage across a membrane are all examples of variables. Variables are also called state variables because they determine the state of the system. If one quantity depends on the other, we call the first quantity the dependent variable; it is the controlled value. The other quantity, the cause, is called the independent variable. Often independent variables are also called parameters.

The main characteristic of a parameter is that it is not a function of the dependent variables and in many cases is under the control of the experimenter. Examples of parameters include kinetic rate constants, equilibrium constants, a clamped voltage or a constant external molecular species that supplies a pathway. Some of these parameters can be controlled by the experimenter, for example an external concentration, while others, such as an equilibrium constant, may be very difficult or even impossible to change.

Figure 1.1: System and Environment: $S_1, S_2, S_i, \ldots$ are state variables that will change during the evolution of the system; $B_1, B_2, B_i, \ldots$ are boundary variables that are clamped to certain values by the observer. The exchange arrows represent the exchange of mass between the environment and the system.

Parameters are also divided into two groups: external variables, such as boundary species or clamped voltages, and internal parameters, such as kinetic constants. External concentrations are often called boundary species because they are at the boundary of the system and the external environment. In an experiment, boundary species are clamped by some kind of buffering mechanism. The buffering mechanism can simply be a large external reservoir so that any exchange of mass between the system and the external environment has a negligible effect on the external concentration. Alternatively, there may be active mechanisms maintaining an external concentration. A classic example of active maintenance of an external variable is the voltage clamp used in electrophysiology. Finally, the external concentrations may simply be slow moving compared to the timescale of the model so that, over the study period, the external concentrations change very little. A typical example of the latter is the study of a metabolic response over a timescale that is shorter than gene expression. This permits a modeler or experimentalist to study a metabolic pathway without considering the effect of changes in gene expression.

Figure 1.2 illustrates a simplified model of glycolysis. The corresponding Table 1.1 lists the various variables and parameters that have been identified in the model. Glucose and ethanol are assumed to be boundary variables, that is, they are controlled by the observer. This can be arranged by supplying glucose from a large volume compartment so that when it is consumed by the pathway there is only a negligible change in its concentration. Likewise we assume that ethanol is discharged into a large volume.
In this highly simplified model we also make a possibly unreasonable assumption that NAD and NADH do not change appreciably during the study. Such choices are necessary when building a model. To justify this assumption we would need to carry out experiments to ascertain what actual changes occur in the NAD/NADH couple. We make an additional assumption about ATP. Since glycolysis is ostensibly the pathway for generating ATP, some way to simulate ATP consumption is necessary. This is achieved by including a single step that hydrolyzes ATP to ADP, even though we know that ATP consumption is a complex process involving many separate reactions. The response of the pathway to changing ATP demand can be simulated by perturbing the ATP demand step.

Figure 1.2: A simplified glycolytic pathway. Many reactions have been condensed and ATP consumption has been simplified to a single process, $\mathrm{ATP} \rightarrow \mathrm{ADP} + \mathrm{P_i}$.

State Variables    System Parameters    Boundary Variables
F-16-BisP          Kinetic Constants    Glucose
G3P                Enzyme Activities    Ethanol
Pyruvate           Volume               NAD
ATP                Temperature          NADH
ADP                                     Pi

Table 1.1: Variables and parameters for the simplified glycolytic model (Figure 1.2). We assume that glucose and ethanol are clamped by the observer using large volume sinks. We assume that during the period of study the concentrations of NAD and NADH remain essentially unchanged. This latter assumption may be unreasonable and would need to be justified experimentally. F-16-BisP = Fructose-1,6-bisphosphate; G3P = Glyceraldehyde-3-Phosphate; Pi = Phosphate.

1.4 Dimensions and Units

Variables and parameters that go into a model will be expressed in some standard unit of measurement. In science the recognized standard units are the SI units.
These include units such as the meter for length, the kilogram for mass, the second for time, the joule for energy, the kelvin for temperature and the mole for amount. The mole is of particular importance because it is a means to measure the number of particles of a substance irrespective of the mass of the substance itself. Thus 1 mole of glucose is the same amount as 1 mole of the enzyme glucose-6-phosphate isomerase even though the mass of each type of molecule is quite different. The actual number of particles in 1 mole is defined as the number of atoms in 12 grams of carbon-12, which has been determined empirically to be $6.0221415 \times 10^{23}$. This definition means that 1 mole of a substance will have a mass in grams equal to the molecular weight of the substance. This makes it easy to calculate the number of moles using the following relation:

$$ \text{moles} = \frac{\text{mass}}{\text{molecular weight}} $$

The concentration of a substance is expressed in moles per unit volume and is usually termed the molarity. Thus a 1 molar solution means 1 mol of substance in 1 litre of volume.

Dimensional analysis is a simple but effective method for uncovering mistakes when formulating kinetic models. This is particularly true for concrete models, where one is dealing with actual quantities and kinetic constants. Conceptual models are more forgiving and don't usually require the same level of attention because they are much simpler.

Amounts of substance are usually expressed in moles and concentrations in moles per unit volume (mol l$^{-1}$). Reaction rates can be expressed either in amounts or in concentrations per unit time depending on the context (mol t$^{-1}$, mol l$^{-1}$ t$^{-1}$). Rate constants are expressed in differing units depending on the form of the rate law. The rate constants in simple first-order kinetics are expressed per unit time (t$^{-1}$), while in second-order reactions the rate constant is expressed per concentration per unit time (mol$^{-1}$ l t$^{-1}$).
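The unit bookkeeping for rate constants can be sketched in a few lines of Python. This is only an illustrative sketch, not a method from the text: units are stored as dictionaries mapping a base symbol to its exponent, and the helper names `combine` and `invert` are hypothetical. It assumes rates are expressed in concentration per unit time and that the rate laws are simple mass-action forms, $v = k_1 S$ and $v = k_2 A B$.

```python
from collections import Counter

def combine(*units):
    """Multiply units together by adding the exponents of each base symbol."""
    total = Counter()
    for unit in units:
        for symbol, power in unit.items():
            total[symbol] += power
    # Drop symbols whose exponents cancelled to zero.
    return {s: p for s, p in total.items() if p != 0}

def invert(unit):
    """Reciprocal of a unit: negate every exponent."""
    return {s: -p for s, p in unit.items()}

# Base quantities used in the text (symbols are illustrative labels).
CONC = {"mol": 1, "l": -1}           # concentration: mol l^-1
PER_TIME = {"t": -1}                 # per unit time: t^-1
RATE = combine(CONC, PER_TIME)       # reaction rate: mol l^-1 t^-1

# First order, v = k1 * S, so k1 has the units of v / S.
k_first = combine(RATE, invert(CONC))

# Second order, v = k2 * A * B, so k2 has the units of v / (A * B).
k_second = combine(RATE, invert(CONC), invert(CONC))

print("first order :", k_first)
print("second order:", k_second)
```

Under these assumptions the first-order constant reduces to t$^{-1}$ and the second-order constant to mol$^{-1}$ l t$^{-1}$, that is, per concentration per unit time.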
In dimensional analysis, the left- and right-hand sides of an expression must have the same units (or dimensions). There are certain rules for combining units when checking their consistency. Only like units can be added or subtracted. Thus the expression $S + k_1$ cannot be summed because the units of $S$ are likely to be mol l$^{-1}$ and the units of $k_1$, t$^{-1}$. Even something as innocent looking as $1 + S$ can be troublesome because $S$ has units of concentration but the constant value “1” is unit-less. Quantities with different units can be multiplied or divided, with the units for the overall expression computed using the laws of exponents and treating the unit symbols as variables.

Example 1.1
Determine the overall units for the expression $k_1 S / K_m$, where the units for each variable are $k_1$: t$^{-1}$ l; $S$: mol; and $K_m$: mol l$^{-1}$.

We first write out the expression in terms of the individual units:

$$ \frac{\mathrm{t}^{-1}\,\mathrm{l}\;\mathrm{mol}}{\mathrm{mol}\;\mathrm{l}^{-1}} $$

By treating the symbols as algebraic variables we can see that the symbol mol will cancel, and using the rules of exponents we can move the l$^{-1}$ term from the denominator to the numerator to yield:

$$ \mathrm{t}^{-1}\,\mathrm{l}^{2} $$

In exponentials such as $\exp x$, the exponent term must be dimensionless, or at least the expression should resolve to a dimensionless quantity. Thus, for a rate constant $k$ with units of t$^{-1}$, $\exp(k t)$ is permissible but $\exp(k)$ is not. Trigonometric functions will always resolve to dimensionless quantities because the argument will be an angle, which can always be expressed as a ratio of lengths that by necessity have the same dimension.

1.5 Model Approximations

By their very nature, models involve making assumptions and approximations. The best modelers are those individuals who can make the most shrewd and viable approximations without compromising the accuracy of the model's predictions. In many cases it is only through direct experience that a modeler will learn which approximations are best to make.
There are, however, some kinds of approximations which are useful in most problems:

- Neglecting small effects.
- Assuming that the environment is unchanged by the system.
- Replacing complex subsystems with lumped or aggregate laws.
- Assuming simple linear cause-effect relationships, which is sometimes possible even though the underlying process is complex.
- Assuming that the physical characteristics of the system do not change with time.
- Neglecting noise and uncertainty.

Neglecting small effects. Neglecting small effects includes such things as changes in the local ionic strength during a catalytic reaction or small changes in pH or the volume of a cell. Sometimes, however, these small effects can be important.

Replacing complex subsystems with lumped or aggregate laws. Lumping subsystems is a commonly used technique in simplifying cellular models. The most important of these are aggregate rate laws, such as Michaelis-Menten or Hill-like equations to model cooperativity. Sometimes entire sequences of reactions can be replaced with a single rate law. However, care must be taken when selecting these approximations. In particular, aggregate rate laws do not model the effects of substrate sequestration, which in some systems, such as protein networks, can critically affect the behavior. In addition, replacing entire sequences of enzymes with one lumped rate law does not model the delay in the transmission of perturbations caused by the sequence of enzymes.

Assuming simple linear cause-effect relationships. In some cases it is possible to assume a linear cause-effect relationship between an enzyme reaction rate and the substrate concentration. This is especially true when the substrate concentration is well below the $K_m$ of the enzyme. Another common approximation is to assume that the rate of degradation of a protein is first-order even though degradation involves a highly complex process.
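The low-substrate case can be illustrated numerically. The sketch below assumes the standard irreversible Michaelis-Menten rate law, $v = V_{max} S / (K_m + S)$, and its linear approximation $v \approx (V_{max}/K_m)\, S$. The function names and the parameter values are illustrative choices, not taken from the text.

```python
def michaelis_menten(s, vmax, km):
    """Irreversible Michaelis-Menten rate: v = Vmax * S / (Km + S)."""
    return vmax * s / (km + s)

def linear_approx(s, vmax, km):
    """First-order approximation, intended for S much smaller than Km."""
    return (vmax / km) * s

def relative_error(s, vmax, km):
    """Relative overestimate of the rate made by the linear form."""
    exact = michaelis_menten(s, vmax, km)
    return (linear_approx(s, vmax, km) - exact) / exact

# Illustrative parameter values (assumed, arbitrary units).
VMAX, KM = 10.0, 2.0

for s in (0.02, 0.2, 2.0):
    print(f"S/Km = {s / KM:5.2f}   "
          f"exact = {michaelis_menten(s, VMAX, KM):7.4f}   "
          f"linear = {linear_approx(s, VMAX, KM):7.4f}   "
          f"relative error = {relative_error(s, VMAX, KM):6.3f}")
```

For this rate law the relative error of the linear form works out algebraically to $S/K_m$, so the approximation is within about 1% when the substrate is at one hundredth of the $K_m$ but overestimates the rate by 100% when $S = K_m$.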
Linear approximations can greatly simplify analytical studies of a model and in some cases they can also simplify numerical analysis.

Physical characteristics do not change with time. A modeler will often assume that the physical characteristics of a system do not change, for example the volume of a cell, the values of the rate constants or the temperature of the system. In many cases such approximations are quite reasonable.

Neglecting noise and uncertainty. Most models make two important approximations. The first is that noise in the system is either negligible or unimportant. In many non-biological systems such an approximation might be reasonable. However, biological systems operate at the molecular level. As a result, biological systems are susceptible to noise generated from thermal effects as a result of molecular collisions. For many systems the large number of particles ensures that the noise generated in this way is insignificant and in many cases can be safely ignored. For some systems, however, such as prokaryotic organisms, the number of particles in some of the cellular systems is very small. In such cases the effect of noise can be significant and therefore must be included as part of the model.

1.6 Types of Mathematical Models

Models can be divided up into a number of broad groups. The most notable include:

- Nonlinear and Linear Models
- Discrete and Continuous
- Deterministic and Stochastic

Deterministic Models

A deterministic model is one where the state of the system at any time is determined entirely by the initial conditions. This implies that the parameters and variables of the model are not subject to random fluctuations. Repeated runs of a deterministic model with the same initial conditions will yield identical results. There are a number of approaches to building deterministic models in cellular biology.
The most common is the use of ordinary differential equations to describe the rate of change of molecular species in time. Such models are the primary focus of this book. Other researchers have examined the use of Boolean models and models based on partial differential equations.

Stochastic Models

Another important class of model that is frequently used in building cellular models is the stochastic model. The deterministic model based on ordinary differential equations assumes a continuum of values for concentrations. This clearly ignores the fact that biological processes are a result of particulate interactions at the molecular level and, strictly speaking, concentrations should be described by discrete values. However, because we often deal with systems containing tens of thousands of particles, we assume that we can describe concentration as a continuous variable. For systems where the particulate number is very low, of the order of tens of particles, the use of a continuum measure is unreasonable. However, an additional and more important problem arises when dealing with low particulate numbers. At low concentrations, Brownian motion becomes a significant factor in determining reaction rates such that when a molecule binds or is transformed, the time at which the event occurs becomes a statistical property. As a result of these factors, systems containing low particulate numbers are better modeled using a stochastic approach.

Exercises

1. Choose from the following options. A model is:
   (a) an attempt to form an exact replica of reality.
   (b) something that bears no resemblance to the real system.
   (c) a simplification of the real world.
2. List the three most desirable attributes of a model.
3. When we “validate” a model which of the following do we most likely mean:
   (a) We show that the model represents the truth about the real system.
   (b) We increase our confidence in the model's predictive power.
   (c) We prove that the model is correct.

2 Graphs and Networks

2.1 Introduction to Graph Theory

Mathematically, a graph is described by a set of nodes (often called vertices) and a set of edges that connect the nodes. Apart from their theoretical interest to mathematicians, there are many real-world problems that can be represented as graphs. For example, the links between web sites, an ecological food web or a set of protein interactions are common examples of systems which can be represented by graphs. In many cases, the nodes of a graph represent physical entities such as web sites, organisms or proteins, and edges represent the relationships between the nodes. A typical example of a graph in cell biology is the protein interaction graph, where the nodes represent proteins in a cell and an edge between two nodes represents a physical interaction between two proteins. Such graphs are called protein-protein interaction graphs. In practice graphs are commonly represented using either lists or matrices (Fig. 2.1), with lists being the most concise.

Fig. 2.1 illustrates a small graph and its corresponding symmetrical adjacency matrix. If an undirected graph has $n$ nodes, then the adjacency matrix will be a symmetrical $n \times n$ matrix. The rows and columns of the matrix correspond to the nodes in the graph. An intersection of a row and column (i.e. between two nodes) is marked by a one if a connection is present; otherwise it is marked by a zero. The number of edges that are incident on a particular node is called the degree, $k$, of the node.

Adjacency Matrix

     1  2  3  4  5  6
1    0  1  0  1  0  0
2    1  0  1  1  0  1
3    0  1  0  0  0  1
4    1  1  0  0  1  0
5    0  0  0  1  0  0
6    0  1  1  0  0  0

Adjacency List

1: (2, 4)
2: (1, 3, 4, 6)
3: (2, 6)
4: (1, 2, 5)
5: (4)
6: (2, 3)

Figure 2.1: Equivalent representations of a graph, an adjacency matrix and an adjacency list. Note that the adjacency matrix is symmetrical, i.e. $A = A^T$.

Many published graphs are undirected, that is, an edge joining two nodes has no specific direction. This means it is not possible with undirected graphs to indicate flows, direction or specific dependency information. Visually an undirected edge is simply a straight line, while a directed edge is usually depicted as a line with an arrow head, with the direction of the arrow head indicating the direction of dependence (Fig. 2.2). Graphs can also be annotated, that is, both the edges and nodes can be labeled with additional information, for example whether a particular protein is essential. Graphs that have annotated edges are also called weighted graphs.

Figure 2.2: Simple undirected and directed graphs.

Interaction graphs are useful for a number of reasons. First, they provide a formal way to represent biological knowledge on a large scale. Once represented formally, such graphs can be visualized (Chapter 2) to give an overview of structure and connectivity. In addition, various measures, such as the degree distribution, clustering coefficient, path length or centrality, can be employed to characterize the graphs [37, 1, 38].

2.2 Example Network: Protein-protein Networks

Work on uncovering protein networks has been ongoing since the 1950s and considerable detail has accumulated on many different pathways across different organisms. More recently, high-throughput techniques have been employed to describe large protein interaction maps. In this work, an interaction is defined if two proteins, A and B, are known to associate. Such information, however, generally ignores stoichiometry, mass conservation and kinetics; hence such networks will be termed non-stoichiometric networks.

Traditional methods, though laborious [13, 11], have been used extensively to gain detailed knowledge on phosphorylation sites, protein structure, the nature of membrane receptors and the constitution and function of protein complexes. More recently, high-throughput methods, though more coarse-grained, have been used to elucidate large swaths of protein-protein interaction networks. For example, in yeast, large-scale studies have identified approximately 500 different protein complexes [19, 33] and their relationships to each other.

A popular high-throughput technique that has been used to uncover protein-protein interaction networks is the yeast two-hybrid method [18, 44], but other methods such as phage display [59, 22] and particularly affinity purification and mass spectrometry have also been employed [19, 33]. The yeast two-hybrid method is based on the idea that eukaryotic transcriptional activators consist of two domains, a DNA binding domain (DB) and an activation domain (AD). The activation domain is responsible for recruiting the RNA polymerase to begin transcription. What is remarkable is that the two domains do not have to be covalently linked in order to function correctly but merely need to be in close proximity. It is this property that is the basis of the yeast two-hybrid method.

Let us assume that we need to know whether two proteins, X and Y, interact with each other. In the two-hybrid method, protein X is fused with the DB domain (known as the bait protein) and the second protein, Y, is fused with the AD domain (known as the prey protein). These two fused proteins are now expressed in yeast, and if the two proteins, X and Y, interact in some way they will also bring the DB and AD domains close to each other, resulting in an active transcriptional activator. If the gene downstream of the DNA binding sequence is a reporter gene, then the interaction of X and Y can be detected.
A common reporter gene is the lacZ gene which codes for ˇ-galactosidase and which produces a blue coloring in Yeast colonies through the metabolism of exogenously supplied X-gal (5-bromo-4-chloro-3-indolyl-ˇ-D-galactoside). There are some caveats with the Yeast two-hybrid method. Although two proteins may be observed to interact, the protein in their natural setting may not be expressed at the same time or may be expressed but in different compartments. In addition using the method to identify interactions between non-yeast proteins may be invalid because of the alien environ- 2.2. EXAMPLE NETWORK: PROTEIN-PROTEIN NETWORKS 17 ment in the yeast cell. As with many high-through-put methods caution is advised when interpreting the data. Wild Type AD BD Reporter Gene Bait BD-Bait BD Reporter Gene Prey AD AD-Prey Reporter Gene Bait BD-AD Prey AD BD Reporter Gene Figure 2.3: Yeast two-hybrid. The wild-type transcription fact is composed of two domains, BD and AD. Both are essential for transcription. Two fusion proteins are made, BD-Bait and AD-Prey. Bait and Prey are two proteins under investigation. If the two protein, Bait and Prey interact bringing BD and AD together resulting in a viable transcription fact that can be used to express a reporter gene. Using techniques such as Yeast two-hybrid, one of the first interaction graphs to be published was the protein interaction graph of Saccharomyces cerevisiae [62, 28]. Subsequent analysis of this map was conducted by Jeong et al. [29] and included 1870 proteins nodes and 2240 interaction edges. Such graphs give a birds-eye view of protein interactions (Fig. 2.4). 18 REVIEWS CHAPTER 2. GRAPHS AND NETWORKS Figure 2 | Yeast protein interaction network. A map of protein–protein interactions18 in Saccharomyces cerevisiae, which is based on early yeast two-hybrid measurements23, illustrates mathematical properties of random networks14. 
Figure 2.4: The poster child of interaction networks, one of the early Yeast protein interaction networks generated from yeast two-hybrid measurements. Each node represents a protein and each edge an interaction. In addition, the graph nodes have been annotated so that red nodes indicate a lethal phenotypic effect if removed, green non-lethal, orange slow growth and yellow unknown. Adapted from Barabási and Oltvai [5] but originally published in arXiv and Nature [29].

2.3 Stoichiometric Networks

Almost all cellular processes involve some kind of chemical process such as binding or unbinding, often in stoichiometric amounts. In addition, such processes have direction and show conservation of mass. None of these properties are captured by the simple interaction graphs. In fact there has been some criticism [2, 34] that simple graph models fail to capture the most important aspects of biological networks and as a result lead to misleading or unimportant conclusions.

To illustrate the difficulties in representing stoichiometric networks using simple undirected graphs, consider how one might go about representing a metabolic network. The problem lies in deciding whether a node should be a reaction or a substance, and thereby what an edge should be. In the literature various approaches have been taken under the headings of substance, reaction and the less commonly used enzyme graphs [66, 12, 25]. A substance graph is constructed from nodes that correspond to the substances, and an edge exists between two substances, A and B, if there exists a reaction in which one substance is a substrate and the other a product. A reaction graph is one where each node corresponds to a reaction, and an edge exists between two reactions if there exists a substance that is produced by one and consumed by another. Finally, an enzyme graph is one where nodes represent enzymes, and an edge exists if two enzymes catalyze reactions that share a substance.

One troubling property of both substance and reaction graphs is that they are not unique, that is, different reaction schemes can generate identical substance and reaction graphs. These representations are therefore lossy. A further problem with these representations is whether to include linking substrates such as ATP and NAD. These substances can cross-link distant pathways, and their presence or absence from a graph can have profound effects on the structural characteristics of the resulting graph [66]. In the end the choice in these matters seems at times to be ad hoc, and one wonders whether there is any sensible approach to representing cellular networks in a meaningful way using simple graphs. The same arguments apply equally to both protein and gene networks, although there is little discussion of these limitations in the literature.
Bipartite Graphs

An improved graph model for representing cellular networks is the directed bipartite graph. Whereas a simple graph is made from one kind of node, a directed bipartite graph is made from two different kinds of nodes joined by simple directed edges. An important constraint on bipartite graphs is that like nodes cannot be connected by an edge. As a result, bipartite graphs can easily describe cellular networks by representing substances as one node type and reactions as the other node type. The constraint that like nodes cannot be connected works well in this case because it makes no sense to connect two reactions together or to connect two substances together, but it does make sense to connect a substance to a reaction.

Formally, a bipartite graph is a graph whose nodes are separated into two disjoint (no overlap) sets, U and V, such that every edge connects a node in U to a node in V. Fig. 2.6 illustrates both an undirected and a directed bipartite graph.

Hypergraphs

In textbooks on biochemistry, cellular pathways have almost always been represented using directed hypergraphs (Fig. 2.7). Hypergraphs are graphs where the edges, now called hyperedges, can have more than two end points; recall that in a simple graph, edges have at most two end points. Hypergraphs are clearly similar to bipartite graphs. Whereas in bipartite graphs the reaction node is explicit, in hypergraphs the reaction is replaced by a hyperedge, but both graph types can be considered equivalent.

Stoichiometric networks can be represented easily using either hypergraphs or bipartite graphs. The edges in each case can be weighted to specify the stoichiometry, while directedness can be used to indicate the direction that signifies the positive reaction rate.

To depict a bimolecular reaction such as $A + B \rightarrow C$ as a bipartite graph, four nodes are used: one node to depict the reaction, called a reaction node, and three other nodes to represent the species A, B and C, called the species nodes (Fig. 2.8). In order to indicate the positive reaction rate, the bipartite graph is directed. The individual stoichiometries can be attached to the edges (called labeled edges) while the reaction rate law can be attached to the reaction node.

In textbooks hypergraphs generally predominate as they tend to follow a visual convention used by chemists and biochemists. Occasionally bipartite graphs are also used; most notably, the KEGG database uses bipartite graphs to display metabolic pathways (Fig. 2.10).

Figure 2.5: A Small Protein-Protein Interaction Map. This image was taken from the STRING web site (Search Tool for the Retrieval of Interacting Genes/Proteins, http://string.embl.de/). The image displays a small segment of the protein interaction map centered around LEU3, the transcription factor that regulates genes involved in leucine and other branched chain amino acid biosynthesis. The number of lines between nodes indicates the number of lines of evidence that support the interaction. Of interest are the genes shown in green (LEU1, LEU2, LEU4, ILV2, ILV3) related to leucine and valine biosynthesis, with strong evidence supporting that claim. Such maps provide a useful snapshot of potential interactions and may highlight relationships that were not previously noted.

Figure 2.6: Bipartite graph containing two kinds of node (panels: undirected bipartite graph and directed bipartite graph).

Figure 2.7: A directed hypergraph, commonly used in biochemistry textbooks to depict metabolic and signaling pathways.

Figure 2.8: Bipartite graph (a) and hypergraph (b) representing a bimolecular reaction, $A + B \rightarrow C$.
The bipartite graph (a) contains two kinds of nodes: one type of node represents the molecular species (A, B, and C) while the other node type represents the reaction. In a bipartite graph, like nodes cannot connect to each other, that is, species nodes can only connect to reaction nodes. The hypergraph (b) uses hyperedges, which have multiple end points.

Figure 2.9: Five different representations for the same reaction scheme: (a) reaction scheme, (b) hypergraph, (c) bipartite graph, (d) substance graph, (e) reaction graph. Note that the substance and reaction graphs are not unique and similar graphs could be generated from different reaction schemes.

Figure 2.10: The glycolysis pathway, depicted here from the KEGG database, is a bipartite graph. The two kinds of nodes in this graph represent metabolites (e.g. pyruvate) and reactions, for example the reaction catalyzed by pyruvate kinase. The reaction nodes are represented as rectangles containing the Enzyme Commission number of the enzyme, and the metabolites by unfilled circles.

3 Stoichiometric Networks

3.1 Introduction

Stoichiometry refers to the molar proportions of reactants and products in a chemical reaction. Given a hypothetical reaction such as:

\[ 3A + 4B \rightarrow 2C + D \]

with reactants A and B and products C and D, the stoichiometry is indicated by the number of participating reactant and product molecules. Thus the stoichiometry for A is three, for B, four, for C, two and for D, one. See Chapter 1 of "Introduction to Kinetics for Systems Biology" (www.sysbiobooks.com) for a more detailed discussion of stoichiometry.

3.2 Stoichiometry Matrix

When describing multiple reactions in a network, it is convenient to represent the stoichiometries in a compact form called the stoichiometry matrix, $N$.
This matrix is an $m$ row by $n$ column matrix, where $m$ is the number of species and $n$ the number of reactions. The columns of the stoichiometry matrix correspond to the distinct chemical reactions in the network, the rows to the molecular species, one row per species. Thus the intersection of a row and column in the matrix indicates whether a certain species takes part in a particular reaction or not and, according to the sign of the element, whether there is a net loss or gain of substance and, by the magnitude, the relative quantity of substance that takes part in that reaction. The elements of the stoichiometry matrix thus concern the relative mole amounts of chemical species that react in a particular reaction; they do not concern the rate of reaction.

For example, consider the simple chain of reactions which has five molecular species and four reactions. The four reactions are labeled $v_1$ to $v_4$.

\[ S_1 \xrightarrow{v_1} S_2 \xrightarrow{v_2} S_3 \xrightarrow{v_3} S_4 \xrightarrow{v_4} S_5 \]

The stoichiometry matrix for this simple system is given by:

\[
N = \begin{bmatrix}
-1 & 0 & 0 & 0 \\
1 & -1 & 0 & 0 \\
0 & 1 & -1 & 0 \\
0 & 0 & 1 & -1 \\
0 & 0 & 0 & 1
\end{bmatrix}
\]

where the columns correspond to $v_1, \ldots, v_4$ and the rows to $S_1, \ldots, S_5$.

Entries in the stoichiometry matrix are computed as follows. Given a species $S_i$ and reaction $v_j$, the corresponding entry in the stoichiometry matrix at $ij$ (row $i$ and column $j$) is given by the total stoichiometry for $S_i$ on the product side minus the total stoichiometry of $S_i$ on the reactant side. Thus, considering species $S_1$ in reaction $v_1$, we note that the total stoichiometry on the product side is zero (no $S_1$ molecules are formed on the product side) and the total stoichiometry of $S_1$ on the reactant side is $+1$. Subtracting one from the other, $(0 - 1)$, we obtain $-1$, which is entered into the stoichiometry matrix. This rather long-winded approach to computing stoichiometries avoids errors that arise when a species occurs as both reactant and product. For example, for a more complex reaction such as:

\[ 3A \rightarrow 2A + B \]

the stoichiometry entry for A is $(2 - 3)$, that is $-1$, because the stoichiometry for A on the product side is 2 and on the reactant side is 3.

3.3 Mass-Balance Equations

According to the law of conservation of mass, any observed net change in the amount of a species must be due to the difference between the inward and outward flows from the species pool (Figure 3.1).

\[ \frac{dX}{dt} = \sum \text{Inflows} - \sum \text{Outflows} \]

Figure 3.1: Mass Balance: The rate of change in species X is equal to the difference between the sum of the inflows and the sum of the outflows.

The equations which describe such flows are called the mass-balance equations and are central to building mathematical models of cellular networks:

\[ \frac{dS_i}{dt} = \sum \text{Inflows} - \sum \text{Outflows} \tag{3.1} \]

A reaction rate which contributes to a flow is given by the term $c_{ij} v_j$, where $c_{ij}$ is the stoichiometric coefficient and $v_j$ the reaction rate. Therefore the balance equation can be written (under constant volume conditions) in a more formal way as:

\[ \frac{dS_i}{dt} = \sum_j c_{ij} v_j \tag{3.2} \]

that is, the sum over all flows into and out of a particular species pool. In the equation, $S_i$ is the concentration of species $i$, $c_{ij}$ is the stoichiometric coefficient for species $i$ with respect to reaction $j$, and $v_j$ is the rate of reaction $j$. Stoichiometric coefficients for reactants are negative and for products, positive.

Consider a simple linear chain of reactions from $S_1$ to $S_5$ shown in Figure 3.2. The mass-balance equations for this simple system can be written as shown in equation 3.3.

\[ S_1 \xrightarrow{v_1} S_2 \xrightarrow{v_2} S_3 \xrightarrow{v_3} S_4 \xrightarrow{v_4} S_5 \]

Figure 3.2: Simple Straight Chain Pathway.

\[
\begin{aligned}
\frac{dS_1}{dt} &= -v_1 & \frac{dS_2}{dt} &= v_1 - v_2 \\
\frac{dS_3}{dt} &= v_2 - v_3 & \frac{dS_4}{dt} &= v_3 - v_4 \\
\frac{dS_5}{dt} &= v_4
\end{aligned}
\tag{3.3}
\]

Each species in the network is assigned a mass-balance equation which accounts for the flows into and out of the species pool. For a branched system such as the following:

Figure 3.3: Multi-Branched Pathway ($v_1$ feeds $S_1$; $v_2$ drains $S_1$; $v_3$ converts $S_1$ to $S_2$; $v_4$ and $v_5$ drain $S_2$).
the mass-balance equations are given by:

\[
\begin{aligned}
\frac{dS_1}{dt} &= v_1 - v_2 - v_3 \\
\frac{dS_2}{dt} &= v_3 - v_4 - v_5
\end{aligned}
\]

Finally consider a more complex pathway such as:

\[
\begin{aligned}
A + X &\xrightarrow{v_1} 2X \\
X + Y &\xrightarrow{v_2} Z \\
Z &\xrightarrow{v_3} Y + B
\end{aligned}
\]

This example is more subtle because we must be careful to take into account the stoichiometry change between the reactant and product side in the first reaction ($v_1$). In reaction $v_1$, the overall stoichiometry for X is $+1$ because two X molecules are made for every one consumed. Taking this into account, the rate of change of species X can be written as:

\[ \frac{dX}{dt} = -v_1 + 2v_1 - v_2 \]

or more simply as $v_1 - v_2$. The full set of mass-balance equations can therefore be written as:

\[
\begin{aligned}
\frac{dA}{dt} &= -v_1 & \frac{dX}{dt} &= v_1 - v_2 \\
\frac{dY}{dt} &= v_3 - v_2 & \frac{dZ}{dt} &= v_2 - v_3 \\
\frac{dB}{dt} &= v_3
\end{aligned}
\]

It is therefore fairly straightforward to derive the balance equations from a visual inspection of the network. Many software tools exist that will assist in this effort by converting network diagrams, either represented visually on a computer screen or provided as a text file listing the reactions in the network, into the corresponding balance equations.

3.4 The System Equation

Equation 3.2, which describes the mass-balance equation, can be reexpressed in terms of the stoichiometry matrix to form the system equation:

\[ \frac{dS}{dt} = N v \tag{3.4} \]

where $N$ is the $m \times n$ stoichiometry matrix and $v$ is the $n$ dimensional rate vector, whose $i$th component gives the rate of reaction $i$ as a function of the species concentrations.

Looking again at the model depicting the simple chain of reactions, the system equation can be written down as:

\[
\frac{dS}{dt} = N v =
\begin{bmatrix}
-1 & 0 & 0 & 0 \\
1 & -1 & 0 & 0 \\
0 & 1 & -1 & 0 \\
0 & 0 & 1 & -1 \\
0 & 0 & 0 & 1
\end{bmatrix}
\begin{bmatrix} v_1 \\ v_2 \\ v_3 \\ v_4 \end{bmatrix}
\]

If the stoichiometry matrix is multiplied into the rate vector, the mass-balance equations shown earlier (3.3) are recovered.

As already stated, the stoichiometry matrix represents the connectivity of the network and contains information on the network's structural characteristics.
These characteristics fall into two groups: relationships among the species and relationships among the reaction rates. Each will be considered in turn. Let us first consider the relationships among the species, that is, relationships between the rows of the stoichiometry matrix.

Exercises

1. Explain the difference between the terms: stoichiometric amount, stoichiometric coefficient, rate of change ($dX/dt$) and reaction rate ($v_i$).

2. Determine the stoichiometric amount and stoichiometric coefficient for each species in the following reactions:

\[
\begin{aligned}
A &\rightarrow B \\
A + B &\rightarrow C \\
A &\rightarrow B + C \\
2A &\rightarrow B \\
3A + 4B &\rightarrow 2C + D \\
A + B &\rightarrow A + C \\
A + 2B &\rightarrow 3B + C
\end{aligned}
\]

3. Derive the set of differential equations for the following model in terms of the rates of reaction, $v_1$, $v_2$ and $v_3$:

\[
\begin{aligned}
A &\xrightarrow{v_1} 2B \\
B &\xrightarrow{v_2} 2C \\
C &\xrightarrow{v_3}
\end{aligned}
\]

4. Derive the set of differential equations for the following model in terms of the rates of reaction, $v_1$, $v_2$ and $v_3$:

\[
\begin{aligned}
A &\xrightarrow{v_1} B \\
2B + C &\xrightarrow{v_2} B + D \\
D &\xrightarrow{v_3} C + A
\end{aligned}
\]

5. Write out the stoichiometry matrix for the networks in questions 3 and 4.

6. Derive the stoichiometry matrix for each of the following networks. In addition write out the mass-balance equations in each case.

(a) [Network diagram involving species A, B, C, D and reactions $v_1$ to $v_4$; not recoverable from the source.]

(b) [Network diagram involving species A, B, C and reactions $v_1$ to $v_4$; not recoverable from the source.]

(c)

\[
\begin{aligned}
A + X &\xrightarrow{v_1} B + Y & B + X &\xrightarrow{v_2} Y \\
B &\xrightarrow{v_3} C & C + X &\xrightarrow{v_4} D + Y \\
D + Y &\xrightarrow{v_5} X & X &\xrightarrow{v_6} Y \\
X + W &\xrightarrow{v_7} 2Y & 2Y &\xrightarrow{v_8} X + W
\end{aligned}
\]

7. A gene $G_1$ expresses a protein $p_1$ at a rate $v_1$. $p_1$ forms a tetramer (4 subunits), called $p_{14}$, at a rate $v_2$. The tetramer negatively regulates a gene $G_2$. $p_1$ degrades at a rate $v_3$. $G_2$ expresses a protein, $p_2$, at a rate $v_9$. $p_2$ is cleaved by an enzyme at a rate $v_4$ to form two protein domains, $p_{21}$ and $p_{22}$. $p_{21}$ degrades at a rate $v_5$. Gene $G_3$ expresses a protein, $p_3$, at a rate $v_6$. $p_3$ binds to $p_{22}$, forming an active complex, $p_4$, at a rate $v_{10}$, which can bind to gene $G_1$ and activate $G_1$. $p_4$ degrades at a rate $v_7$. Finally, $p_{21}$ can form a dead-end complex, $p_5$, with $p_4$ at a rate $v_8$.
(a) Draw the network represented in the description given above.

(b) Write out the differential equation for each protein species in the network in terms of $v_1, v_2, \ldots$

(c) Write out the stoichiometry matrix for the network.

4 Flux Balance Laws

The study of metabolism, that is, the chemical reactions that are involved in breaking down nutrients and building up more complex molecules, was one of the earliest topics of study in biochemistry. Glycolysis, which concerns the breakdown of glucose into pyruvate, was one of the first metabolic pathways to be investigated during the early part of the 20th century. In the period since, numerous other pathways have been uncovered. One of the most widely studied organisms, E. coli, has been shown at last count to have at least 918 enzymes catalyzing a wide range of metabolic functions [31].

In any particular pathway, enzymes catalyze the conversion of substances from one form to another. The rate of conversion is often called the flux, which is simply another word for a reaction rate but refers specifically to the reaction rate through a pathway. Figure 4.1 shows a simplified metabolic map from Corynebacterium glutamicum [43]. The numbers next to the reaction steps indicate the flux through each step and show how the flow of mass through the different metabolic pathways is distributed. The network topology has a significant bearing on how flux is distributed through a pathway. This chapter will focus on a number of areas related to this topic.

Figure 4.1: Metabolic Map of Corynebacterium glutamicum central metabolism adapted from [43].
4.1 Flux Balance Laws

The steady state of a system is defined as the state in which the rates of change of all species are zero:

\[ N v = 0 \]

In addition, to distinguish the steady state from thermodynamic equilibrium, it is also assumed that at steady state there is a net flow of mass across the system boundaries of the network.

Box 2.2 Steady State - Recap

The steady state is defined when all $dS_i/dt$ are equal to zero while one or more reaction rates are non-zero:

\[ \frac{dS}{dt} = N v = 0, \qquad v_i \neq 0 \]

By way of illustration, let us look at the very simple branched pathway shown in Figure 4.2. The stoichiometry matrix for this pathway is:

\[ N = \begin{bmatrix} 1 & -1 & -1 \end{bmatrix} \]

and the balance equation at steady state is given by:

\[ \begin{bmatrix} 1 & -1 & -1 \end{bmatrix} \begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix} = 0 \]

The mass-balance equation for this system at steady state is given simply by $v_1 - v_2 - v_3 = 0$.

Figure 4.2: Simple branched pathway ($v_1$ feeds species $S$; $v_2$ and $v_3$ drain it).

Flux Distributions

A common need of metabolic engineers is to know the flux distribution throughout a reaction network. One approach to obtaining this information is to measure every individual flux in the network. This can be done, at least in principle, by measuring the consumption or turnover rates of all the metabolites in the network. The easiest rates to measure are those of the reaction steps that connect directly to the external environment. Such steps might be involved in nutrient and oxygen consumption, or in carbon dioxide, ethanol or biomass production, all quantities that can be measured experimentally. However, the internal fluxes that are deep inside the metabolic network are much more difficult to measure, although the use of $^{13}$C-labeled substrates has made such measurements more accessible. In practice it is extremely difficult to measure every reaction rate directly; instead, the steady-state balance equations can be exploited to reduce the number of necessary flux measurements.
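As a minimal sketch of the steady-state condition, the following Python fragment tests whether a candidate rate vector satisfies $Nv = 0$ while at least one rate is non-zero. The stoichiometry matrix is that of the simple branched pathway in Figure 4.2; the function names `net_rates` and `is_steady_state`, the tolerance, and the numerical rate values are illustrative assumptions rather than anything prescribed by the text.

```python
def net_rates(N, v):
    """Return the product N v, i.e. the rate of change of each species."""
    return [sum(n_ij * v_j for n_ij, v_j in zip(row, v)) for row in N]


def is_steady_state(N, v, tol=1e-9):
    """True when every dS_i/dt is zero and at least one rate is non-zero."""
    balanced = all(abs(rate) <= tol for rate in net_rates(N, v))
    flowing = any(abs(v_j) > tol for v_j in v)
    return balanced and flowing


# Simple branched pathway: one species S, reactions v1 (in), v2 and v3 (out).
N = [[1, -1, -1]]

# Hypothetical rate vectors, chosen only for illustration.
balanced_case = is_steady_state(N, [3.0, 2.0, 1.0])    # v1 = v2 + v3
unbalanced_case = is_steady_state(N, [3.0, 2.0, 2.0])  # S would be depleted
no_flow_case = is_steady_state(N, [0.0, 0.0, 0.0])     # no net flow of mass

print(balanced_case, unbalanced_case, no_flow_case)
```

The third case is rejected on purpose: all rates of change are zero, but with no flow through the system the state does not meet the definition of steady state used here.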
To illustrate, the balance equation for the simple branched pathway shows us that only two rates actually need be measured because the third can be computed. For example, if $v_1$ and $v_2$ were measured, the third rate, $v_3$, could be calculated from the balance equation $v_3 = v_1 - v_2$, taking note that the pathway must be in steady state. For an experimentalist this is a great benefit because it reduces the number of measurements that need to be made.

One of the practical aims of flux balance analysis is to devise methods that allow all the fluxes in a pathway to be determined with the minimum effort. To devise such methods, however, a number of questions need to be answered. For example, is there a minimum number of fluxes that can be measured experimentally to fully determine all fluxes in a pathway? In the simple branched pathway (Figure 4.2) a minimum of two fluxes was required. Alternatively, it may not be possible to measure even the minimum number; in such cases, can a best estimate for the flux distribution in a pathway be computed? The following sections will consider approaches to answering all these questions, particularly for arbitrary networks where systematic approaches are required.

4.2 Determined Systems

Consider the more complicated pathway shown in Figure 4.3.

Figure 4.3: Complex branched pathway.

The stoichiometry matrix for this pathway is:

\[
N = \begin{bmatrix}
1 & -1 & 0 & 1 & 0 & 0 \\
0 & 1 & -1 & 0 & 1 & 0 \\
0 & 0 & 0 & -1 & -1 & 1
\end{bmatrix}
\tag{4.1}
\]

where the columns correspond to $v_1, \ldots, v_6$ and the rows to $S_1$, $S_2$ and $S_3$. This corresponds to the following three balance equations:

\[
\begin{aligned}
v_1 - v_2 + v_4 &= 0 \\
v_2 - v_3 + v_5 &= 0 \\
v_6 - v_4 - v_5 &= 0
\end{aligned}
\]

Assume we wish to determine all the fluxes through this simple pathway. Is there a minimum number of fluxes we can measure, from which we can compute the remainder? Since there are three equations and six unknowns, at least three of the fluxes must be measured so that the number of unknowns can be reduced to three.
However, of the six, which three fluxes should be measured? For example, measuring $v_1$, $v_2$ and $v_4$ will not help because it is not possible to compute the others from these fluxes. The problem arises because there are dependencies among the columns of the stoichiometry matrix.

In order to answer this question, let us divide the fluxes into two groups: call one the measured fluxes ($J_M$) and the other the computed fluxes ($J_C$). The computed fluxes will be calculated from some combination of the measured fluxes. Consider the system equation at steady state:

\[ N v = 0 \]

Let us apply row reduction to the system equation until $N$ is in reduced echelon form (see Box 3.2). Since the right-hand side is zero, it remains unchanged in the process. These operations lead to:

\[ \begin{bmatrix} I & M \\ 0 & 0 \end{bmatrix} v = 0 \tag{4.2} \]

The process is likely to result in column as well as row exchanges, and as a result the linearly independent columns will move to the left partition, forming the identity matrix, and the linearly dependent columns will be found in the partition corresponding to $M$. Let us partition the $v$ vector to correspond to the partitioning in the echelon matrix, so that:

\[ \begin{bmatrix} I & M \\ 0 & 0 \end{bmatrix} \begin{bmatrix} \mathbf{v}_1 \\ \mathbf{v}_2 \end{bmatrix} = 0 \]

which when multiplied out gives $\mathbf{v}_1 = -M \mathbf{v}_2$. This tells us that the flux terms in the $\mathbf{v}_1$ partition correspond to the computed fluxes, $J_C$, and those in $\mathbf{v}_2$ to the measured fluxes, $J_M$, that is, $J_C = -M J_M$. This relation describes a set of computed fluxes, $J_C$, as a function of a set of measured fluxes, $J_M$, via a transformation matrix, $-M$. To follow conventional notation, the term $-M$ will be renamed $K_0$ (that is, $-M = K_0$) so that

\[ J_C = K_0 J_M \tag{4.3} \]

and equation 4.2 can be reexpressed as:

\[ \begin{bmatrix} I & -K_0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} J_C \\ J_M \end{bmatrix} = 0 \tag{4.4} \]

Returning to the example shown in Figure 4.3, let us apply a series of elementary operations to the stoichiometry matrix to bring it to its reduced echelon form (Equation 4.4):

1. Start with the stoichiometry matrix.

\[
\begin{bmatrix}
1 & -1 & 0 & 1 & 0 & 0 \\
0 & 1 & -1 & 0 & 1 & 0 \\
0 & 0 & 0 & -1 & -1 & 1
\end{bmatrix}
\]

2. Multiply the 3rd row by $-1$.

\[
\begin{bmatrix}
1 & -1 & 0 & 1 & 0 & 0 \\
0 & 1 & -1 & 0 & 1 & 0 \\
0 & 0 & 0 & 1 & 1 & -1
\end{bmatrix}
\]

3. Add the 2nd row to the 1st row.

\[
\begin{bmatrix}
1 & 0 & -1 & 1 & 1 & 0 \\
0 & 1 & -1 & 0 & 1 & 0 \\
0 & 0 & 0 & 1 & 1 & -1
\end{bmatrix}
\]

4. Add the 3rd row times $-1$ to the 1st row.

\[
\begin{bmatrix}
1 & 0 & -1 & 0 & 0 & 1 \\
0 & 1 & -1 & 0 & 1 & 0 \\
0 & 0 & 0 & 1 & 1 & -1
\end{bmatrix}
\]

5. And finally, exchange the 3rd and 4th columns.

\[
\begin{bmatrix}
1 & 0 & 0 & -1 & 0 & 1 \\
0 & 1 & 0 & -1 & 1 & 0 \\
0 & 0 & 1 & 0 & 1 & -1
\end{bmatrix}
\]

Box 2.1 Echelon Forms - Recap

There are two kinds of matrices that one frequently encounters in the study of linear equations. These are the row echelon and reduced echelon forms. Both matrices are generated when solving sets of linear equations. The row echelon form is derived using forward elimination and the reduced echelon form by Gauss-Jordan elimination. A row echelon matrix is defined as having the following characteristics:

1. All rows that consist entirely of zeros are at the bottom of the matrix.
2. In each non-zero row, the first non-zero entry is a 1, the leading one.
3. The leading 1 in each row is to the right of all leading 1's above it. This means there will be zeros below each leading 1.

The following three matrices are examples of row echelon forms:

\[
\begin{bmatrix} 1 & 4 & 3 & 0 \\ 0 & 0 & 1 & 7 \\ 0 & 0 & 0 & 0 \end{bmatrix}
\qquad
\begin{bmatrix} 1 & 5 & 3 & 0 \\ 0 & 1 & 7 & 2 \\ 0 & 0 & 0 & 1 \end{bmatrix}
\qquad
\begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\]

The reduced echelon form has one additional characteristic:

4. Each column that contains a leading one has zeros above and below it.

The following three matrices are examples of reduced echelon forms:

\[
\begin{bmatrix} 1 & 0 & 4 & 0 \\ 0 & 1 & 1 & 7 \\ 0 & 0 & 0 & 0 \end{bmatrix}
\qquad
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}
\qquad
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\]

Sometimes the columns of a reduced echelon matrix can be ordered such that each leading one is immediately to the right of the leading one above it. This will ensure that the leading 1's form an identity matrix at the front of the matrix. The reduced echelon form will therefore have the following general block structure:

\[ \begin{bmatrix} I & A \\ 0 & 0 \end{bmatrix} \]

It is always possible to reduce any matrix to its echelon or reduced echelon form by an appropriate choice of elementary operations. The function rref() implemented in many math applications will generate a reduced row echelon form.

These operations lead to the following reduced echelon matrix, with columns ordered $v_1, v_2, v_4, v_3, v_5, v_6$ (the leading ones form the identity block on the left):

\[
\text{Reduced Echelon} =
\begin{bmatrix}
1 & 0 & 0 & -1 & 0 & 1 \\
0 & 1 & 0 & -1 & 1 & 0 \\
0 & 0 & 1 & 0 & 1 & -1
\end{bmatrix}
\tag{4.5}
\]

Note that during the reduction, the third and fourth columns were exchanged. The partition that holds the identity matrix marks the computed fluxes, and the right-hand partition, which holds the $-K_0$ matrix, marks the measured fluxes. Thus the computed fluxes correspond to the independent columns and the measured fluxes to the dependent columns. If we extract the $K_0$ partition, equation 4.3 can be used to relate the computed to the measured fluxes as follows:

\[
\begin{bmatrix} v_1 \\ v_2 \\ v_4 \end{bmatrix}
=
\begin{bmatrix}
1 & 0 & -1 \\
1 & -1 & 0 \\
0 & -1 & 1
\end{bmatrix}
\begin{bmatrix} v_3 \\ v_5 \\ v_6 \end{bmatrix}
\tag{4.6}
\]

Or

\[
\begin{aligned}
v_1 &= v_3 - v_6 \\
v_2 &= v_3 - v_5 \\
v_4 &= v_6 - v_5
\end{aligned}
\]

This shows that in principle only $v_3$, $v_5$ and $v_6$ need be measured, from which all remaining rates can be calculated. A visual inspection of the pathway in Figure 4.3 will reveal this to be true: $v_4$ can be computed from $v_5$ and $v_6$; $v_2$ can be computed from $v_5$ and $v_3$; and lastly, $v_1$ can be computed from $v_2$ and $v_4$. Software tools such as PySCeS [39] can be used to automatically compute the $K_0$ matrix along with an appropriately reordered stoichiometry matrix.

In summary, the method outlined above enables us to derive the minimum set of fluxes to measure in order to determine all fluxes in an arbitrary pathway.

Linear Algebra of Determined Systems

An alternative but related approach to deriving the computed from the measured fluxes is as follows. Let us assume we can reorder the columns of the stoichiometry matrix so that all the dependent columns are moved to the left side of the matrix and the independent columns are moved to the right side of the matrix. Note this is the opposite order to the columns in equations 4.5 and 4.2.
Furthermore, let us also assume that the rows have been reordered so that the independent rows are moved to the top and the dependent rows to the bottom of the matrix. These prerequisites mean that the stoichiometry matrix has the partitioned structure shown in Figure 4.4. The partition $N_R$ represents the set of independent species, and at steady state:

\[
N_R \begin{bmatrix} J_M \\ J_C \end{bmatrix} = 0
\]

$N_R$ can be partitioned as shown in Figure 4.4:

\[
\begin{bmatrix} N_{DC} & N_{IC} \end{bmatrix} \begin{bmatrix} J_M \\ J_C \end{bmatrix} = 0
\]

where $N_{DC}$ represents the set of linearly dependent columns and $N_{IC}$ the set of linearly independent columns. To reemphasize, the order of the computed and measured fluxes is exchanged compared to that shown in equation 4.4.

Figure 4.4: Partitioned Stoichiometry Matrix: $n$ = number of reactions; $m$ = number of species; $N_{DC}$ = partition of linearly dependent columns; $N_{IC}$ = partition of linearly independent columns; $N_R$ = reduced stoichiometry matrix; $N_0$ = partition of linearly dependent rows.

Multiplying out this equation gives $N_{DC} J_M + N_{IC} J_C = 0$. This equation can be rearranged and both sides multiplied by the inverse of $N_{IC}$ to obtain:

\[
J_C = -(N_{IC})^{-1} N_{DC} \, J_M
\tag{4.7}
\]

This result gives us a relationship between the computed and measured fluxes. The term $-(N_{IC})^{-1} N_{DC}$ can be replaced by $K_0$, so that $J_C = K_0 J_M$. This equation is identical to equation 4.3 but offers an alternative approach to computing $K_0$ and is the method often cited in the literature [60, 14]. The inverse of $N_{IC}$ is guaranteed to exist because $N_{IC}$ is square and all its rows and columns are, by construction, linearly independent.

The equation $K_0 = -(N_{IC})^{-1} N_{DC}$ can be rearranged into the following form:

\[
\begin{bmatrix} N_{DC} & N_{IC} \end{bmatrix} \begin{bmatrix} I \\ K_0 \end{bmatrix} = 0
\tag{4.8}
\]

or more simply:

\[
N_R K = 0
\tag{4.9}
\]

This shows that the $K_0$ matrix is related to the null space of the reordered stoichiometry matrix. We will return to the interpretation of equation 4.9 in the next chapter.
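As a minimal sketch of equations 4.7 to 4.9, the following Python fragment computes $K_0 = -(N_{IC})^{-1} N_{DC}$ with exact fractions and checks that $[N_{DC}\ N_{IC}]\,[I;\ K_0] = 0$. It uses the network of Figure 4.5, which is treated in example b) below, with the signs of the stoichiometry matrix taken from the balance equations given there and with $v_5$, $v_8$ and $v_9$ as the measured fluxes. The helper names `solve` and `columns` are illustrative and not part of any package mentioned in the text.

```python
from fractions import Fraction

def solve(A, B):
    """Return X such that A X = B, by Gauss-Jordan elimination on [A | B]."""
    n = len(A)
    aug = [[Fraction(x) for x in ra] + [Fraction(x) for x in rb]
           for ra, rb in zip(A, B)]
    for col in range(n):
        # Find a row at or below the diagonal with a non-zero pivot and swap it up.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        # Clear the pivot column in every other row.
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def columns(M, idx):
    """Extract the (1-based) columns listed in idx, in that order."""
    return [[row[j - 1] for j in idx] for row in M]

# Stoichiometry matrix of Figure 4.5: rows A..F, columns v1..v9.
N = [
    [1, -1, -1,  0,  0,  0,  0,  0,  0],  # A
    [0,  1,  0, -1,  0, -1,  0,  0,  0],  # B
    [0,  0,  1,  0,  0,  1, -1,  0,  0],  # C
    [0,  0,  0,  2,  0,  0,  1, -1,  0],  # D
    [0,  0,  0,  1, -1,  0,  0,  0,  0],  # E
    [0,  0,  0,  0,  0,  1,  0,  0, -1],  # F
]

measured = [5, 8, 9]           # dependent columns, N_DC
computed = [6, 2, 3, 4, 1, 7]  # independent columns, N_IC

N_DC = columns(N, measured)
N_IC = columns(N, computed)

# Equation 4.7: K0 = -(N_IC)^-1 N_DC
K0 = [[-x for x in row] for row in solve(N_IC, N_DC)]

# Equations 4.8 / 4.9: [N_DC N_IC] [I; K0] = 0
identity = [[int(i == j) for j in range(len(measured))]
            for i in range(len(measured))]
K = identity + K0
N_R = [dc + ic for dc, ic in zip(N_DC, N_IC)]
residual = [[sum(a * b for a, b in zip(row, col)) for col in zip(*K)]
            for row in N_R]

for flux, row in zip(computed, K0):
    print(f"v{flux} =", " + ".join(f"({c})*v{m}" for c, m in zip(row, measured)))
```

Exact rational arithmetic is used only to keep the small example free of rounding noise; for realistic networks a numerical linear algebra library would be the more natural choice.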
Examples

The following examples illustrate the application of equation 4.7.

a) Consider the branched pathway shown in Figure 4.3. The columns of the stoichiometry matrix can be reordered so that the linearly dependent columns ($N_{DC}$) are first, followed by the linearly independent columns ($N_{IC}$). Row reduction to the reduced echelon form (equation 4.4) can be used to determine which are the linearly independent and dependent columns (equation 4.5). In the stoichiometry matrix below, the partitions of equation 4.5 have been exchanged so that the linearly dependent columns ($v_3$, $v_5$, $v_6$) are first, followed by the linearly independent columns ($v_1$, $v_2$, $v_4$):

\[
N =
\begin{array}{rrrrrr}
v_3 & v_5 & v_6 & v_1 & v_2 & v_4 \\ \hline
0 & 0 & 0 & 1 & -1 & 1 \\
-1 & 1 & 0 & 0 & 1 & 0 \\
0 & -1 & 1 & 0 & 0 & -1
\end{array}
\]

From the reordered matrix, the $N_{DC}$ and $N_{IC}$ partitions can be extracted, from which the dependency relations can be derived by applying equation 4.7:

\[
K_0 = -
\begin{bmatrix} 1 & -1 & 1 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{bmatrix}^{-1}
\begin{bmatrix} 0 & 0 & 0 \\ -1 & 1 & 0 \\ 0 & -1 & 1 \end{bmatrix}
=
\begin{bmatrix} 1 & 0 & -1 \\ 1 & -1 & 0 \\ 0 & -1 & 1 \end{bmatrix}
\]

The derived $K_0$ corresponds to the same result found in equation 4.6.

b) A more complex example of a pathway is shown in Figure 4.5.

Figure 4.5: Complex Network incorporating two input fluxes and two output fluxes, coupled internally by multiple branches and one reaction that exhibits non-unity stoichiometry ($v_4$).

The stoichiometry matrix for this network is given by:

\[
N =
\begin{array}{c|rrrrrrrrr}
 & v_1 & v_2 & v_3 & v_4 & v_5 & v_6 & v_7 & v_8 & v_9 \\ \hline
A & 1 & -1 & -1 & 0 & 0 & 0 & 0 & 0 & 0 \\
B & 0 & 1 & 0 & -1 & 0 & -1 & 0 & 0 & 0 \\
C & 0 & 0 & 1 & 0 & 0 & 1 & -1 & 0 & 0 \\
D & 0 & 0 & 0 & 2 & 0 & 0 & 1 & -1 & 0 \\
E & 0 & 0 & 0 & 1 & -1 & 0 & 0 & 0 & 0 \\
F & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & -1
\end{array}
\]

and the balance equations by:

\[
\begin{aligned}
v_1 - v_2 - v_3 &= 0 \\
v_2 - v_4 - v_6 &= 0 \\
v_3 + v_6 - v_7 &= 0 \\
2 v_4 + v_7 - v_8 &= 0 \\
v_5 - v_4 &= 0 \\
v_6 - v_9 &= 0
\end{aligned}
\]

Let us reorder the columns of the stoichiometry matrix so that the linearly dependent columns are on the left and the linearly independent columns are on the right (Figure 4.4). Note that there are no dependent rows in the network so that there is no $N_0$ partition in the reordered matrix.
Reordering can be accomplished by carrying out a row reduction on the matrix to reduced echelon form (equation 4.2) and recording the column changes in the stoichiometry matrix. Note that the partitions must be exchanged to match the structure shown in equation 5.1. The simplest reordering is given by the following stoichiometry matrix:

\[
N =
\begin{array}{c|rrrrrrrrr}
 & v_7 & v_8 & v_9 & v_1 & v_2 & v_3 & v_4 & v_5 & v_6 \\ \hline
A & 0 & 0 & 0 & 1 & -1 & -1 & 0 & 0 & 0 \\
B & 0 & 0 & 0 & 0 & 1 & 0 & -1 & 0 & -1 \\
C & -1 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 1 \\
D & 1 & -1 & 0 & 0 & 0 & 0 & 2 & 0 & 0 \\
E & 0 & 0 & 0 & 0 & 0 & 0 & 1 & -1 & 0 \\
F & 0 & 0 & -1 & 0 & 0 & 0 & 0 & 0 & 1
\end{array}
\]

The $K_0$ matrix can be computed from the null space (4.9) of this reordered matrix:

\[
K =
\begin{array}{c|rrr}
v_7 & 1 & 0 & 0 \\
v_8 & 0 & 1 & 0 \\
v_9 & 0 & 0 & 1 \\
v_1 & 0.5 & 0.5 & 0 \\
v_2 & -0.5 & 0.5 & 1 \\
v_3 & 1 & 0 & -1 \\
v_4 & -0.5 & 0.5 & 0 \\
v_5 & -0.5 & 0.5 & 0 \\
v_6 & 0 & 0 & 1
\end{array}
\qquad
K_0 =
\begin{array}{c|rrr}
v_1 & 0.5 & 0.5 & 0 \\
v_2 & -0.5 & 0.5 & 1 \\
v_3 & 1 & 0 & -1 \\
v_4 & -0.5 & 0.5 & 0 \\
v_5 & -0.5 & 0.5 & 0 \\
v_6 & 0 & 0 & 1
\end{array}
\]

From the $K_0$ matrix the relation between the measured and computed fluxes can be determined. From the reordering of the stoichiometry matrix it should be apparent that the measured fluxes are $v_7$, $v_8$, and $v_9$, that is, a minimum of three fluxes must be measured in order to fully determine the remainder. Of the three measured fluxes, $v_7$ is the most problematic because it is an internal flux which would not be easy to determine experimentally. It is however possible to persuade the algorithm to select particular measured fluxes by assigning the rightmost columns of the stoichiometry matrix to those fluxes which are considered the easiest to measure. Edge fluxes should be moved to the right-hand edge of the stoichiometry matrix prior to carrying out the row reduction; this will ensure that a solution that uses the edge fluxes can be derived, if one exists. It is however not guaranteed that the measured fluxes will be dominated by the edge fluxes, especially if it is not possible to determine the internal fluxes from the edge fluxes.
This is true in the case of the complex branched model, Figure 4.3, where knowing the input and output edge fluxes is not sufficient to determine all the internal fluxes. In the case of the more complex pathway, Figure 4.5, it is possible to move three of the edge fluxes ($v_5$, $v_8$ and $v_9$) to the right-hand side of the stoichiometry matrix. These three fluxes are sufficient to calculate all fluxes inside the pathway. With the partitions exchanged as before, the stoichiometry matrix can be reordered as follows:

\[
N =
\begin{array}{c|rrrrrrrrr}
 & v_5 & v_8 & v_9 & v_6 & v_2 & v_3 & v_4 & v_1 & v_7 \\ \hline
A & 0 & 0 & 0 & 0 & -1 & -1 & 0 & 1 & 0 \\
B & 0 & 0 & 0 & -1 & 1 & 0 & -1 & 0 & 0 \\
C & 0 & 0 & 0 & 1 & 0 & 1 & 0 & 0 & -1 \\
D & 0 & -1 & 0 & 0 & 0 & 0 & 2 & 0 & 1 \\
E & -1 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
F & 0 & 0 & -1 & 1 & 0 & 0 & 0 & 0 & 0
\end{array}
\]

which yields the following $K_0$ matrix from the null space:

\[
K_0 =
\begin{array}{c|rrr}
 & v_5 & v_8 & v_9 \\ \hline
v_6 & 0 & 0 & 1 \\
v_2 & 1 & 0 & 1 \\
v_3 & -2 & 1 & -1 \\
v_4 & 1 & 0 & 0 \\
v_1 & -1 & 1 & 0 \\
v_7 & -2 & 1 & 0
\end{array}
\]

In turn this gives the dependency equations using equation 4.3:

\[
\begin{aligned}
v_6 &= v_9 \\
v_2 &= v_5 + v_9 \\
v_3 &= v_8 - v_9 - 2 v_5 \\
v_4 &= v_5 \\
v_1 &= v_8 - v_5 \\
v_7 &= v_8 - 2 v_5
\end{aligned}
\]

In summary, measuring only $v_5$, $v_8$ and $v_9$ allows us to completely determine all the fluxes in the network. Unfortunately, in real systems the internal structure of the network will be much more complex and will include many more degrees of freedom. This means that in many cases there will be insufficient information to fully determine the internal fluxes. Such cases are called underdetermined systems and alternative strategies must be used to gain access to the unknown fluxes. Two common strategies for the study of underdetermined systems are flux balance analysis and metabolic flux analysis. Flux balance analysis relies on linear programming while metabolic flux analysis uses $^{13}$C-labeled substrates to estimate fluxes.

4.3 Flux Balance Analysis

The previous section described how one can determine the set of computed and measured fluxes and how to calculate one set from the other. It assumed that it was possible to measure all the measured fluxes.
However, it is often very difficult experimentally to measure all the required fluxes. In this situation, the problem becomes underdetermined and alternative strategies are required to determine the fluxes in a pathway. One method is to use linear programming. By its nature, linear programming only gives an estimate of the fluxes, and predictions based on linear programming should be supported by additional measurements; nevertheless the approach has proved to be popular in the metabolic community [41].

Linear Programming

Linear programming has its origins in the 1940s, when its development was motivated by a wartime need to solve complex planning problems. Once developed, the method was rapidly taken up by many private industries as a means to determine the optimal allocation of a finite set of resources given an objective and a set of constraints. Applications in industry cover a wide range of areas, including airline crew scheduling, stock and bond portfolio selection, and oil refining and blending. In the last few decades linear programming has also been employed as a means to estimate the optimal allocation of fluxes in a metabolic pathway [67, 68, 16, 32].

Linear programming is an optimization method that requires two inputs: a linear objective function, which is generally a weighted sum of measurable elements from a metabolic model, and a set of linear constraints. The maximizing linear programming problem can be expressed by the relations shown in equation 4.10.

\[
\begin{aligned}
\text{Maximize:}\quad & Z = c_1 x_1 + c_2 x_2 + \cdots = c^T x \\
\text{Subject to:}\quad & a_{11} x_1 + a_{12} x_2 + \cdots + a_{1n} x_n \le b_1 \\
& a_{21} x_1 + a_{22} x_2 + \cdots + a_{2n} x_n \le b_2 \\
& \qquad \vdots \\
& a_{m1} x_1 + a_{m2} x_2 + \cdots + a_{mn} x_n \le b_m \\
\text{Or:}\quad & A x \le b \\
\text{where all:}\quad & x_i \ge 0
\end{aligned}
\tag{4.10}
\]

There are a number of algorithms that can be used to solve linear programming problems, but by far the most popular is the simplex method (not to be confused with the simplex method developed by Nelder and Mead for solving nonlinear optimization problems). The simplex method can be motivated by a simple example.

Consider a pharmaceutical company that manufactures two drugs, say x and y, from two genetically engineered organisms, A and B. Let us assume that organism A can produce at most 4 kg of drug x per day and organism B at most 2 kg of drug y per day. Let us also assume that the factory can only process a total of 5 kg of any drug per day due to packaging equipment limitations. If the company can make a profit of $100 per kg for drug x and a profit of $150 per kg for drug y, what is the optimal rate at which each drug should be manufactured in order to maximize profit?

This problem is sufficiently small that it can be easily solved manually. To maximize profit, it would be prudent to first produce the maximum amount of the most profitable drug, y, then to use whatever spare capacity remains in the packaging department to manufacture drug x. This would mean producing 2 kg per day of drug y, which leaves 3 kg of capacity in the packaging department to produce 3 kg per day of drug x. Therefore the total profit for this scenario is $2 \times 150 + 3 \times 100 = \$600$.

The problem of drug manufacture allocation can be easily expressed as a linear programming problem. The objective is to maximize profit, that is:

\[
\text{Maximize:}\quad Z = \$100\,x + \$150\,y
\]

The constraints on the problem can also be easily expressed. For example, the quantity of drug manufactured cannot be negative, that is:

\[
x \ge 0 \quad \text{and} \quad y \ge 0
\]

In addition, the problem states that a maximum of 4 kg of x can be manufactured per day and a maximum of 2 kg of y per day, that is:

\[
x \le 4 \quad \text{and} \quad y \le 2
\]

Finally, the packaging department can only process a maximum of 5 kg per day, that is:

\[
x + y \le 5
\]

This problem can be reexpressed in graphical form as shown in Figure 4.6.
The figure plots all the linear constraints that define the problem, including $x \le 4$, $y \le 2$ and $x + y \le 5$. The limit of the objective function is indicated by the hashed line. Points where two or more constraints intersect are called cornerpoints or vertices. Figure 4.7 illustrates the feasible solution bounded by the constraints and the maximum value of the objective function.

Figure 4.6: Linear Programming: Constraints displayed as edges on a graph for the drug manufacturing problem.

The simplex method works by traversing the cornerpoints one by one. The method first starts at one of the cornerpoints, say cornerpoint 1, and then attempts to move to an adjacent cornerpoint which yields a better value for the objective function. If the method is unable to move to a better objective function it stops and reports the last cornerpoint as the optimal solution. For example, the value of the objective function at cornerpoint 1 is $400. An adjacent cornerpoint is cornerpoint 2. The value of the objective function at this point is $550. Since the objective function at the new cornerpoint is larger, the method moves to this cornerpoint. From the second cornerpoint the method moves to the next adjacent cornerpoint, cornerpoint 3. The value of the objective function at cornerpoint 3 is $600. This again is larger than the value at cornerpoint 2. Once again, the method moves to the next adjacent cornerpoint, cornerpoint 4. The value of the objective function at cornerpoint 4 is $300, which is less than the value at cornerpoint 3. Since there are no other cornerpoints to traverse, the method stops and assigns the optimal value of $600 to cornerpoint 3.

When a single point is located it represents a unique solution. However, it is possible for optimum solutions to lie on a line that joins two cornerpoints, that is, two cornerpoints yield the same value for the objective function.
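The cornerpoint values quoted above can be reproduced with a short brute-force sketch. It is not the simplex method itself: it simply intersects every pair of constraint boundaries of the drug manufacturing problem, keeps the intersections that satisfy all constraints, and evaluates the profit at each. The function names are illustrative choices.

```python
from fractions import Fraction
from itertools import combinations

# Each constraint is written as a*x + b*y <= c.
constraints = [
    (1, 0, 4),    # x <= 4
    (0, 1, 2),    # y <= 2
    (1, 1, 5),    # x + y <= 5
    (-1, 0, 0),   # x >= 0
    (0, -1, 0),   # y >= 0
]

def profit(x, y):
    """Objective function Z = 100 x + 150 y (dollars per day)."""
    return 100 * x + 150 * y

def feasible(x, y):
    return all(a * x + b * y <= c for a, b, c in constraints)

def cornerpoints():
    """Intersect every pair of constraint boundaries and keep the feasible points."""
    points = set()
    for (a1, b1, c1), (a2, b2, c2) in combinations(constraints, 2):
        det = a1 * b2 - a2 * b1
        if det == 0:
            continue  # parallel boundaries never intersect
        x = Fraction(c1 * b2 - c2 * b1, det)  # Cramer's rule
        y = Fraction(a1 * c2 - a2 * c1, det)
        if feasible(x, y):
            points.add((x, y))
    return sorted(points)

points = cornerpoints()
for x, y in points:
    print(f"x = {x}, y = {y}, profit = ${profit(x, y)}")

best = max(points, key=lambda p: profit(*p))
print("optimum:", best, "profit:", profit(*best))
```

Enumerating every vertex is only practical for tiny problems such as this one; the number of cornerpoints grows rapidly with the number of variables and constraints, which is why the simplex method visits only a sequence of improving cornerpoints.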
In higher dimensions, optima may lie on hyperplanes connecting multiple cornerpoints. In such situations the solution is termed degenerate because there are now an infinite number of optimal solutions, and other non-quantifiable criteria may be used to judge the 'best' solution. For example, a degenerate solution may indicate that two different combinations of drug x and y are equally profitable. However, one of the drugs may have toxicity issues, in which case the optimum with the lowest level of this drug is better.

Another important aspect that Figure 4.7 illustrates is what would happen to the optimal solution if the constraints change. This question leads to the idea of sensitivity and what are called shadow prices. A shadow price is the change in the optimal value of the objective function when the right-hand side of a constraint is changed by one unit. For example, what would happen to the optimal solution if the manufacture of drug y were to be increased from 2 to 3 kg per day? Sensitivity analysis can answer these questions, provide additional information on interpreting the optimal solutions, and gauge how robust the solutions are to the constraints and/or objective function.

The drug manufacturing example was a relatively simple problem and could be solved without recourse to the simplex method. For problems with more variables the number of cornerpoints rises considerably. In addition, rather than being simple two-dimensional problems, real problems are invariably high-dimensional. Linear programming is therefore rarely done by hand; instead software is employed to find solutions. Given the popularity of linear programming in general, there is a very wide range of software tools available, including well-known tools such as Excel, Matlab and Mathematica, or more specialized tools such as LINDO (http://www.lindo.com) or CPLEX (http://www.ilog.com). However, there is also a wide range of equally good open source alternatives. Probably the most notable of these are the GNU Linear Programming Kit (GLPK) and, better still, the lp_solve library by Peter Notebaert. lp_solve is notable for a number of reasons: its licence (LGPL) has fewer restrictions, and there are language bindings that allow lp_solve to be easily called from many different computer languages, including for example Java, Delphi, C#, Matlab, Excel, Python and SciLab. Both GLPK and lp_solve have very active community forums. One enterprising individual (Henri Gourvest) has written an excellent graphical front end to lp_solve, called the LPSolve IDE. This front end makes it very easy to specify the objective function and constraints and solve the linear programming problem with the press of a single button. Further discussion of LPSolve IDE will be given in the next section.

Figure 4.7: Linear Programming: Area within the confinement of the constraints is marked as the feasible region. All potential solutions to the problem reside in this region. Linear programming attempts to locate the optimum solution within this region given an objective function. The simplex method moves from cornerpoint (vertex) to cornerpoint searching for the maximum value of the objective function. In this problem, the third cornerpoint indicates the optimal solution.

Objective Functions

The choice of objective function is critical for the linear programming approach to be effective and there has been much discussion in the literature on what a suitable objective function might be for biological systems. For example, one of the earliest reported efforts to use linear programming in metabolic modeling was by Fell and Small [16].
These authors investigated fat synthesis in adipose tissue and used a variety of objective functions, including minimizing the amount of glucose used per triacylglycerol formed or maximizing the generation of NADH from the pentose pathway. The authors subsequently used the model to study how the efficiency of conversion was affected by the availability of ATP.

One of the early attempts to determine the flux distribution in E. coli was conducted by Palsson's group [53, 64, 65]. An objective function used in this work involved maximizing the production of biomass, the assumption being that growing single-celled organisms have been selected for growth (unlike cells in a multicellular organism, where the objective function is more obscure). In order to relate biomass to a metabolic map, the authors obtained data [27, 65] that described how 1 gram of E. coli biomass was derived from various metabolic precursors and cofactors (see Table 4.1). The objective function used to optimize the flux distribution was then defined as the sum of all the fluxes that produce each of the precursors, weighted by the amount of precursor required. Thus, a suitable objective function may be written as:

\[
Z = 41.257\, v_{ATP} - 3.547\, v_{NADH} + 18.225\, v_{NADPH} + 0.205\, v_{G6P} + \ldots
\]

    Metabolite    Demand (mmol)
    ATP           41.2570
    NADH          -3.5470
    NADPH         18.2250
    G6P            0.2050
    F6P            0.0709
    R5P            0.8977
    E4P            0.3610
    T3P            0.1290
    3PG            1.4960
    PEP            0.5191
    PYR            2.8328
    AcCoA          3.7478
    OAA            1.7867
    AKG            1.0789

Table 4.1: Number of mmoles of precursors and cofactors that are required to yield 1 gram of biomass of E. coli [27, 65].

The use of this objective function yielded results which overestimated the experimentally determined glucose yield. This suggested that the stoichiometry model was missing an important component.
In order to correct the discrepancy, the authors introduced ATP maintenance into the calculation, since cells use energy not just to achieve growth but also to maintain non-growth functions such as transmembrane gradients and cellular motility. The addition of ATP maintenance into the calculation yielded better estimates for glucose yield.

Another, quite different, example of an objective function relates to the flux balance analysis of the mycolic acid pathway in Mycobacterium tuberculosis. In the work by Raman et al. [46], the authors selected an objective function based on maximizing the different proportions of mycolates that make up the cell wall. Given that cell wall composition is important to the structural integrity of the cell wall, optimal production of mycolates would appear to be an appropriate optimum for the organism to achieve.

With the objective function set, linear programming then requires a set of linear constraints that will restrict the limits of the objective function and allow one to find the maximum.

Flux Balance Constraints

In addition to an objective function, linear programming also requires a set of constraints to limit the scope of the solution space. Of these, the most important group are the steady state constraints on the pathway, that is $N v = 0$. There is one restriction on the steady state constraints: all rates must be non-negative. This means that reversible reactions must be split into their separate forward and reverse reactions.

In addition to the steady state constraints, other constraints can be added to the mix. The most common of these are constraints on the values of the external fluxes. Such fluxes, which might include nutrient uptake or oxygen consumption, will most likely be known and will contribute an important source of constraints on the model. Other constraints include thermodynamic and capacity constraints.
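The requirement that reversible reactions be split into forward and reverse steps can be sketched as follows. This is a minimal illustration on a hypothetical two-species chain, not a network from the text; the function name `split_reversible` is likewise an assumption. Each reversible column of the stoichiometry matrix is followed by a copy with the opposite sign, so that both the forward and the reverse rate can be constrained to be non-negative.

```python
def split_reversible(N, reversible):
    """Return a stoichiometry matrix in which each reversible column j is
    followed by an extra column holding its negation (the reverse reaction)."""
    split = []
    for row in N:
        new_row = []
        for j, coeff in enumerate(row):
            new_row.append(coeff)
            if j in reversible:
                new_row.append(-coeff)
        split.append(new_row)
    return split

# Hypothetical chain:  -> S1 <-> S2 ->   (columns: v1, v2, v3; v2 is reversible)
N = [
    [1, -1,  0],  # S1
    [0,  1, -1],  # S2
]

# Columns after splitting: v1, v2_forward, v2_reverse, v3
N_split = split_reversible(N, reversible={1})

# A steady state with a net flux of 2 through the reversible step,
# carried by a forward rate of 5 and a reverse rate of 3.
v = [2, 5, 3, 2]
steady = [sum(n * x for n, x in zip(row, v)) for row in N_split]
print(N_split)
print(steady)
```

The net flux through the original reaction is recovered as the forward rate minus the reverse rate, here 5 - 3 = 2.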
Capacity constraints impose upper bounds on a flux ($0 \le v_i \le b_i$). Such limits can be set by the $V_{max}$ of the enzyme catalyzing the reaction. Sometimes lower bounds may also be set, so that in general capacity constraints are set with the inequality ($a_i \le v_i \le b_i$). In addition, some reaction steps may be absent altogether under specific growth conditions due to catabolite repression; the rates through such reactions can be constrained to zero. Thermodynamic constraints are more difficult to set and require the use of plausible ranges for metabolite levels. Thermodynamic constraints attempt to impose flux directions that are consistent with changes in the Gibbs free energy across each reaction, which naturally requires knowledge of metabolite levels (ref). Finally, measured values will sometimes be available for internal fluxes. Such reactions have specific rates, which can be added to the list of model constraints.

Through a judicious use of constraints it is possible to reduce the solution space and thus improve the reliability of the optimized solution. In summary, a linear programming problem for estimating the fluxes in a metabolic pathway takes the form:

\[
\begin{aligned}
\text{Maximize:}\quad & Z = c_i v_i + c_j v_j + \cdots \\
\text{Subject to:}\quad & N v = 0 \\
\text{where:}\quad & v \ge 0
\end{aligned}
\tag{4.11}
\]

Example

Consider again the network shown in Figure 4.5. Let us assume that only $v_5$ and $v_1$ have been measured. Clearly there is insufficient information to compute the remaining fluxes in the pathway without recourse to linear programming. To solve the problem using linear programming, an objective function and a set of constraints will be required. For illustration, the model will be optimized for maximum production of biomass and, for the sake of argument, let us assume that fluxes $v_8$ and $v_9$ contribute to biomass. The objective can then be some weighted sum of the fluxes that contribute to biomass, that is $Z = c_1 v_8 + c_2 v_9$.
As for the constraints, the most important are the steady state conditions on each of the nodes in the network. In this case the steady state constraints are:

\[
\begin{aligned}
v_1 - v_2 - v_3 &= 0 \\
v_2 - v_6 - v_4 &= 0 \\
v_3 + v_6 - v_7 &= 0 \\
2 v_4 - v_8 + v_7 &= 0 \\
v_5 - v_4 &= 0 \\
v_6 - v_9 &= 0
\end{aligned}
\]

Two other constraints are the measured fluxes $v_1$ and $v_5$. For illustration assume that $v_1 = 10$ flux units and $v_5 = 6$ flux units. This sets up the problem. Figure 4.8 shows a screenshot of the LPSolve IDE software where the problem has been set up. The following code illustrates the problem expressed in the script language used by LPSolve.

    /* Objective function */
    max: 0.5*v9 + 0.75*v8;

    /* Steady State Constraints */
    v1 - v2 - v3 = 0;      /* A */
    v2 - v4 - v6 = 0;      /* B */
    v3 + v6 - v7 = 0;      /* C */
    v6 - v9 = 0;           /* F */
    v5 - v4 = 0;           /* E */
    2 v4 - v8 + v7 = 0;    /* D */

    /* Known Flux Constraints */
    v1 = 10;
    v5 = 6;
    v3 >= 1;

Running this script through LPSolve (click the green go button in the tool bar) yields the following computed optimal solution:

\[
v_1 = 10;\; v_2 = 9;\; v_3 = 1;\; v_4 = 6;\; v_5 = 6;\; v_6 = 3;\; v_7 = 4;\; v_8 = 16;\; v_9 = 3
\]

The maximum value of the objective function, the weighted flux exiting at $v_8$ and $v_9$, was 13.5 units.

Figure 4.8: LPSolve IDE used to model a simple metabolic model problem.

4.4 Isotopic Flux Measurements

In the previous section linear programming and its application to flux balance analysis was described as a method for estimating fluxes in underdetermined systems. The method carried with it a number of assumptions, one in particular being the choice of objective function, which can in some systems be difficult to describe or justify. In addition, flux balance analysis has difficulties in estimating the fluxes in certain cases without more information. In particular the flux in parallel pathways, metabolic cycles such as futile cycles, and cofactor-linked cycles cannot always be resolved by the method (see Figure 4.9).
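The difficulty with parallel pathways can be seen in a small sketch. The network below is hypothetical (an uptake flux, two parallel reactions a1 and a2 with identical stoichiometry, and an output flux) and is only meant to illustrate the point; the uptake value of 10 units is likewise an assumption. Every split of the flux between a1 and a2 satisfies the steady state constraints and gives the same value of the objective, so linear programming has no basis for choosing between them.

```python
# Hypothetical network:  v_in -> S -> (a1 or a2, in parallel) -> P -> v_out
# Columns: v_in, a1, a2, v_out
N = [
    [1, -1, -1,  0],  # S
    [0,  1,  1, -1],  # P
]

def residual(N, v):
    """Evaluate N v; a zero vector means the fluxes are at steady state."""
    return [sum(n * x for n, x in zip(row, v)) for row in N]

def objective(v):
    """Objective: flux leaving the network through v_out."""
    return v[3]

uptake = 10  # assumed measured uptake flux

# Different ways of sharing the same total flux between the parallel branches.
candidates = [[uptake, a1, uptake - a1, uptake] for a1 in (0, 3, 5, 10)]

for v in candidates:
    print(v, "N v =", residual(N, v), "Z =", objective(v))
```

All four candidate flux vectors are feasible and optimal, which is the kind of ambiguity that additional measurements, such as isotopic labeling data, are needed to resolve.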
For this reason, other more experimentally based approaches have been devised to try to gather data on fluxes more directly. The most important approach by far is the use of isotopic tracer techniques, often referred to as metabolic flux analysis or MFA. The method proceeds in two phases, one experimental and the other computational. The computational analysis is very important because the data analysis is complex, owing to the size of the data sets and the resulting combinatorial expansion of the system equations. Let us first consider the experimental phase.

Figure 4.9: Typical situations where linear programming based flux balance analysis cannot resolve fluxes: a) Parallel pathways; b) Metabolic cycles; c) Pathways with closed cofactor cycles.

Table 4.2: Isotopes commonly used in biological research.

    Common Isotope    Rare Stable Isotope    Radioactive Isotope
    1H                2H (0.02%)             3H
    12C               13C (1.1%)             14C
    14N               15N (0.37%)            13N
    16O               18O (0.04%)            11O

Isotopes are atoms that have the same number of protons but differ in the number of neutrons. For example, carbon has three naturally occurring isotopes: the common and stable $^{12}$C (6 protons and 6 neutrons), the stable and relatively uncommon (~1%) $^{13}$C (6 protons and 7 neutrons), and trace amounts of radioactive $^{14}$C (6 protons and 8 neutrons), Table 4.2. In practice a given substrate, such as glucose, will be labeled, that is, one or more of the atoms in the glucose molecule will be replaced by a different isotope. For example, the $^{12}$C at position one might be replaced with an atom of $^{13}$C. In this case the glucose is referred to as [1-$^{13}$C]glucose to distinguish it from natural glucose.

The main advantage of using isotopes is that they can be measured, that is, in a mixture of labeled and unlabeled glucose it is possible to distinguish between the two molecules. The way labeled molecules are identified depends on whether radioactive or stable isotopes are used. Radioactive isotopes can be identified by their decay emissions, for example β decay in $^{14}$C and $^{3}$H, by using scintillation counters. The advantage of using radioactive isotopes is their great sensitivity. However, they are also difficult to handle due to the radiation hazard. Stable isotopes can be identified by measuring the difference in mass between labeled and unlabeled molecules using mass spectrometry combined with gas chromatography (GC/MS). Gas chromatography is used to separate the initial mixture of compounds based on differential equilibration between a gas and a solid phase. Once separated, each compound is fed into the mass spectrometer where it is broken into fragments by an electron beam. The fragments, now charged, are first accelerated in an electric field and then travel through a magnetic field on a circular path. The path that an individual fragment actually takes will depend on its charge and mass. The end result is a mass spectrum which records the relative proportion of the different fragments that were detected. If similar fragments contain different isotopes then different peaks will emerge in the spectrum and the proportion of the different labeled compounds can be determined. The introduction of high performance GC/MS in the last 10 years or so has revolutionized metabolic flux analysis and it is now probably the preferred choice for estimating fluxes.

The basis for MFA is that when a labeled substrate is fed to an organism, the labeled atoms distribute themselves throughout the chemical composition of the organism. In microbial studies, commonly used substrates include specifically labeled glucose such as [1-$^{13}$C]glucose, uniformly labeled glucose ([U-$^{13}$C]glucose) or labeled amino acids.
Once administered, the labeled molecules are metabolized by the organism and, through various metabolic processes, the atoms in the labeled substrate are rearranged by separation and recombination of molecular fragments. In addition, some labeled isotope is either lost as metabolic waste, for example CO$_2$, or incorporated into biomass. Assuming no further changes take place and the substrate is constantly applied, the distribution of the isotopes will reach what is called isotopic steady state. This can occur quite rapidly, in about an hour. Once in isotopic steady state, GC/MS or NMR is used to determine how the label has been distributed in the various metabolites of interest. This is the raw data that is used to determine the fluxes through the various pathways. In order to understand the process of generating fluxes from the isotopic data a number of terms must first be defined and understood.

Isotopomer

One of the most important concepts in MFA is the isotopomer. Consider a molecule of alanine, which has three carbon atoms; there are eight different ways to label a three-carbon alanine molecule, Figure 4.10. As label enters the metabolic pathways from an external source there is the potential for the label to partition itself into every possible isotopomer. In general, for a molecule with $n$ potentially labeled atoms there will be $2^n$ different isotopomers; for example, alanine with three atoms has $2^3 = 8$ possible isotopomers. Most often it is the relative mole fraction of isotopomers for a given molecular type that is considered, and the vector that holds the fractional contribution of each isotopomer is usually called the isotopomer distribution vector, or IDV.

Figure 4.10: Alanine is a three-carbon amino acid. If alanine were labeled with $^{13}$C, there would be eight possible different labeling patterns. These different labeled forms are called isotopomers. For a molecule with $n$ potentially labeled atoms, there will be $2^n$ possible isotopomers.

Mass Distribution Vector

Another useful concept is the mass distribution vector, often abbreviated to MDV in the literature. An element from the mass distribution vector gives the proportion of mass in a group of isotopomers of the same mass. For $n$ potentially labeled atoms in a molecule there will be $n + 1$ elements in the MDV. The $+1$ element corresponds to the fully unlabeled molecule. Figure 4.11 illustrates the relationship between the IDV and MDV measures. The key reason for considering these two different descriptions is that the MDVs are measurable while the IDVs are on the whole more difficult to obtain experimentally, although a careful study of the fragmentation patterns from the mass spectrometry can sometimes give information on the IDV itself. In addition, NMR can also be used to gain some information on the relative distribution of specific isotopomers, but the MDVs are the primary experimental data.

[Figure 4.11 data: isotopomer fractions (IDV) 9%, 5%, 23%, 5%, 36%, 4%, 9%, 9%; mass distribution vector (MDV) 9%, 33%, 49%, 9%; carbon positions C1, C2, C3.]

Figure 4.11: This figure illustrates the relationship between the isotopomer fractions (IDV) and the mass distribution vector (MDV). The example uses a three-carbon molecule of which there are eight possible isotopomers. Each isotopomer makes up a fraction of the total, for example the unlabeled molecule is 9% of the total. To compute the mass distributions, we collect all isotopomers having the same number of labeled atoms; for example, the 2nd, 3rd and 4th isotopomers have one labeled atom each, therefore this group constitutes a particular element in the MDV, in this case 33%.

Figure 4.12 shows a simple hypothetical network that illustrates three ways to view such a network: as a stoichiometric network, as an atom transition network and as an isotopomer network.
The stoichiometric network, a), is the simplest and most familiar, with six species and five connecting reactions. If we assume that the species A, B, E and F contain two atoms that could potentially be labeled, and species C and D contain one such atom each, then b) in Figure 4.12 shows the species with their atomic structure explicitly given, hence the atom transition network.

Figure 4.12: Label distribution in a simple network (reactions v1 to v5): a) Stoichiometry network, b) Atom transition network, c) Isotopomer network. Figure adapted with permission from Weitzel et al. [69], BioMed Central.

A number of assumptions are invoked in order for the subsequent analysis to be valid. The most important is that the system is at steady state, that is, both the fluxes and the isotopic distribution are steady. Some of the fluxes in the system can be measured directly; for example, most of the external fluxes such as substrate uptake, product formation and biomass formation are known. What is left are the intracellular fluxes, and it is these that will be estimated from the isotopic data.

The second phase in MFA is the computational effort. This is a fairly sophisticated and computationally intensive procedure. Here we describe the basic approach, but many refinements have been introduced in recent years [70, 74, 69]. The essential idea behind the computational phase is the construction of a set of differential equations that describe the time evolution of the isotopomer distribution vector. These equations include two kinds of terms: fluxes and elements of the isotopomer distribution vector. The equations are used to predict the steady state levels of the various isotopomers, or more precisely the fractional distribution of the isotopomers at steady state. The nature of these equations will be described more fully later; for now let us designate the isotopomer distribution vector with the symbol $p$, so that the set of differential equations can be written as:

\[ \frac{dp}{dt} = f(p, J) \]

At steady state the left-hand side is zero and the isotopomer distribution can be written, at least in principle, as a function of the fluxes, $J$:

\[ p = g(J) \]

We say in principle because the equations will tend to be non-linear, rendering an analytical solution difficult if not impossible to obtain; instead, numerical methods are used to find the solution, $p$. Once a solution has been found, the vector $p$ is compared to the real measurements and a difference computed. The procedure then makes small adjustments to the flux values and the steady state equations are solved again to obtain a new $p$ vector. If the difference between the new values and the measured values is small, the flux values are accepted; otherwise the fluxes are adjusted again and the procedure is repeated. The actual strategy for adjusting the fluxes will be described later, but what we have is an iterative procedure where the flux values are adjusted until the computed values of the isotopomers match the measured values. The procedure just outlined is of course a classic optimization problem, and many strategies exist for adjusting the flux values at each iteration, including gradient search methods such as Levenberg-Marquardt or, better still, evolutionary algorithms [54, 75], which are less likely to fail to converge. In practice the measured values for the isotopomer distribution are not usually available; instead the model values are converted to mass distributions and it is these that are compared to the measured mass distributions.

One can imagine that in a large network, particularly where the metabolites have many potentially labeled atoms (say six or more carbon atoms), the number of isotopomers can become very large, with a corresponding increase in the number of model differential equations.
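The predict, compare and adjust loop described above can be illustrated with a deliberately tiny example. The sketch below is not taken from the text and is not how a real MFA package works: it assumes a single metabolite pool fed by two inflows, one fully labeled and one unlabeled, with a known total flux, so that the steady state labeled fraction has a closed form. The single unknown flux is then adjusted by a simple ternary search instead of Levenberg-Marquardt or an evolutionary algorithm. All names and numbers are hypothetical.

```python
def predicted_labeled_fraction(v_a, v_total, frac_a=1.0, frac_b=0.0):
    """Steady state labeled fraction of a pool fed by two inflows.
    v_a carries label fraction frac_a; the remainder, v_total - v_a,
    carries label fraction frac_b (toy model, assumed for illustration)."""
    v_b = v_total - v_a
    return (v_a * frac_a + v_b * frac_b) / v_total

def fit_flux(measured, v_total, tol=1e-9):
    """Adjust v_a until the predicted fraction matches the measurement.
    Uses ternary search on the squared error, which is unimodal here."""
    def error(v_a):
        return (predicted_labeled_fraction(v_a, v_total) - measured) ** 2

    lo, hi = 0.0, v_total
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if error(m1) < error(m2):
            hi = m2   # the minimum lies to the left of m2
        else:
            lo = m1   # the minimum lies to the right of m1
    return (lo + hi) / 2

# Hypothetical measurement: 30% of the pool is labeled, total flux is 2.0.
v_a = fit_flux(measured=0.3, v_total=2.0)
print(round(v_a, 6))  # 0.6
```

In a realistic problem the prediction step is itself a numerical steady state solution of many isotopomer equations, and many fluxes are adjusted simultaneously.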
Large models can have thousands of differential equations that need to be solved at each iteration. The computational cost is therefore relatively high, although with the availability of cheap and powerful personal computers the issue is not as significant as it used to be.

One question remains, which relates to the exact nature of the model equations that are used to predict the isotopomers. Of all the steps required during the computational phase, generating the model equations is probably the most tedious and error prone, especially given the large number of equations involved. With this in mind, a number of authors have devised specialized software that can automate this phase and much else. Here a brief description of the equations themselves will be given.

What may not be obvious is that the model equations do not assume any kinetics for the reaction steps themselves; that is, there are no rates that depend on Michaelis-Menten rate laws or other more complicated functions. Instead, linear equations are devised that assume that the rate of reaction between two particular labeled molecules is a linear function of the isotopomer fractions. This is possible because the underlying metabolic state is assumed to be at steady state. In addition, the individual rates are simply scaled terms containing the fluxes.

Consider the system depicted in Figure 4.13. The overall reaction, A → B → C + D, is given in the upper panel. In the lower panel we see the individual species represented by their groups of isotopomers. For simplicity the species are assumed to contain only two potentially labeled carbon atoms. The first reaction, $v_1$, swaps the carbon atoms and the second reaction, $v_2$, dissociates the species into two one carbon units, C and D. The fractional distribution of isotopomers in the A species is given by $A_1$ and $A_2$, and in the B species by $B_1$ and $B_2$.
Note that in each case the following is also true: $A_1 + A_2 = 1$ and $B_1 + B_2 = 1$.

Figure 4.13: a) Overall reaction, in which A is converted to B by $v_1$ and B is converted to C and D by $v_2$. b) The same reactions in terms of isotopomers: $A_1$ is converted to $B_1$ at rate $v_1 A_1$ and $A_2$ to $B_2$ at rate $v_1 A_2$, while $B_1$ and $B_2$ are consumed at rates $v_2 B_1$ and $v_2 B_2$.

At steady state the flux from species A to B and from species B to C plus D is $v_1$ and $v_2$ respectively; these are the fluxes we would like to know. However, the isotopomer computational model considers each isotopomer transition as a separate reaction, such that the rate from one isotopomer to another is proportional to the fraction of the source isotopomer. For example, the rate of reaction from isotopomer $A_1$ to $B_1$ is the corresponding fraction of the overall rate, $v_1 A_1$. Likewise for the other isotopomers. For this system, the rate of change of the fractions $B_1$ and $B_2$ is then given by:

\[ \frac{dB_1}{dt} = v_1 A_1 - v_2 B_1 \]

\[ \frac{dB_2}{dt} = v_1 A_2 - v_2 B_2 \]

Note that these equations compute the rate of change of the fraction of isotopomers, not the absolute amount of isotopomers. This approach eliminates the need for a complex kinetic model, which would be extremely difficult to construct and suspect at best.

The computational effort required to estimate the fluxes is as formidable as the experimental effort, and for this reason a number of authors have devised software for the automatic construction and solution of the equations. One of the earliest and most comprehensive is the software tool 13C-FLUX (see footnote 1) by Wiechert [73], who was one of the pioneers in developing the current state of MFA [72, 71, 70]. Other tools of note include FluxSimulator from Binsl [6] and FiatFlux from [76].

There are many other details of MFA that have not been mentioned, and the area is still under rapid development, with an ever increasing number of researchers turning to the approach to estimate fluxes [47, 50, 35, 58].

Flux ratios, negligible isotopic mass effects. No need to fit external fluxes. Cumomers allow model equations to be solved analytically.
If carbon atoms are not mixed up then those fluxes cannot be easily determined. Statistics, sensitivity tests.

Footnote 1: see http://www.uni-siegen.de/fb11/simtec/software/13cflux/

Exercises

5 Steady State Flux Patterns

One of the interesting aspects of the stoichiometry matrix is how the columns of the matrix constrain flux patterns, particularly at steady state. In this chapter we will be looking at two approaches that help us understand these constraints. These related approaches involve examining the null space and the elementary modes of the stoichiometry matrix.

5.1 The Null Space

The null space of the stoichiometry matrix and its transpose provides important information on the structural constraints in a network. The null space was introduced briefly in the last chapter in the form of equation 4.9. In this chapter we will consider more closely its physical interpretation. Equations 5.1 and 5.2 are the null space equations that were introduced in the last chapter. The fact that the right-hand side is zero means that the null space vectors must indicate some particular aspect of the steady state.

\[ \begin{bmatrix} N_{DC} & N_{IC} \end{bmatrix} \begin{bmatrix} I \\ K_0 \end{bmatrix} = 0 \tag{5.1} \]

Box 5.1 The Null Space

Given a matrix equation of the form

\[ A x = 0 \]

where $A$ is an $m \times n$ matrix and $x$ is a column vector of $n$ elements, the set of all vectors $x$ that satisfy this equation is called the null space of $A$. The number of vectors required to fully describe the null space is called the dimension of the null space and is equal to the number of columns, $n$, minus the rank of the matrix, $\mathrm{rank}(A)$. These vectors form what is called a basis for the space, and linear combinations of these vectors can generate any other vector in the null space. In order to form a basis, the vectors must also be linearly independent. Many tools can compute a basis for the null space; for example, null(A, 'r') will compute a basis in Matlab, while NullSpace[A] can be used to compute a basis in Mathematica.
Equation 5.1 can be written more simply as:

\[ N_R K = 0 \tag{5.2} \]

Equation 5.2 is a homogeneous linear equation whose solutions are spanned by the columns of the $K$ matrix. The equation below illustrates the null space vectors for the complex branched pathway in Figure 4.3. The columns of $N_R$ and the rows of $K$ are ordered $v_3, v_5, v_6, v_1, v_2, v_4$, and the partitioning of $K$ into $I$ and $K_0$ is shown by a horizontal line.

\[
\begin{bmatrix}
0 & 0 & 0 & 1 & -1 & 1 \\
-1 & 1 & 0 & 0 & 1 & 0 \\
0 & -1 & 1 & 0 & 0 & -1
\end{bmatrix}
\left[
\begin{array}{rrr}
1 & 0 & 0 \\
0 & 1 & 0 \\
0 & 0 & 1 \\ \hline
1 & 0 & -1 \\
1 & -1 & 0 \\
0 & -1 & 1
\end{array}
\right]
= 0
\tag{5.3}
\]

The simplest interpretation of the $K$ matrix is that the vectors that make up $K$ represent possible steady state flow patterns in the network. In addition, any linear combination of the vectors is also a valid steady state flow pattern. Thus for the network shown in Figure 4.3, the null space can be shown to be:

\[
K =
\begin{bmatrix}
1 & 0 & 0 \\
0 & 1 & 0 \\
0 & 0 & 1 \\
1 & 0 & -1 \\
1 & -1 & 0 \\
0 & -1 & 1
\end{bmatrix}
\]

The null space contains three vectors which can be interpreted as flow patterns that satisfy the steady state condition. These flow patterns are shown in Figure 5.1 below.

Figure 5.1: Flow Patterns Based on the Null Space of the Stoichiometry Matrix. Panels a), b) and c) show the flow through S1, S2 and S3 corresponding to the first, second and third column of $K$ respectively, $[1\ 0\ 0\ 1\ 1\ 0]^T$, $[0\ 1\ 0\ 0\ {-1}\ {-1}]^T$ and $[0\ 0\ 1\ {-1}\ 0\ 1]^T$, with rows ordered $v_3, v_5, v_6, v_1, v_2, v_4$.

Any flow pattern in vivo is some linear combination of the three basic patterns indicated in the null space. For example, the following combination is a potential flow pattern:

\[
J =
\begin{bmatrix} 1.5 \\ 0.6 \\ 0.8 \\ 0.7 \\ 0.9 \\ 0.2 \end{bmatrix}
= 1.5 \begin{bmatrix} 1 \\ 0 \\ 0 \\ 1 \\ 1 \\ 0 \end{bmatrix}
+ 0.6 \begin{bmatrix} 0 \\ 1 \\ 0 \\ 0 \\ -1 \\ -1 \end{bmatrix}
+ 0.8 \begin{bmatrix} 0 \\ 0 \\ 1 \\ -1 \\ 0 \\ 1 \end{bmatrix}
\]

Figure 5.2: A Possible Flow Pattern Based on the Null Space. The fluxes shown are $v_1 = 0.7$, $v_2 = 0.9$, $v_3 = 1.5$, $v_4 = 0.2$, $v_5 = 0.6$ and $v_6 = 0.8$.

The one problem with this interpretation is that negative terms in the $K$ matrix indicate that the flow is in the opposite direction to that indicated in the network diagram. Such flows might be thermodynamically unlikely.
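The null space relations above are easy to check numerically. The following sketch uses plain Python with exact fractions. The entries of `N` are an assumption: they are a reduced stoichiometry matrix chosen to be consistent with the $K$ matrix and with the flows of Figure 5.2 (rows S1, S2, S3; columns ordered v3, v5, v6, v1, v2, v4), not a transcription of Figure 4.3. The helper `matmul` is hypothetical.

```python
from fractions import Fraction

# Assumed reduced stoichiometry matrix for the pathway of Figure 4.3.
# Rows: S1, S2, S3. Columns: v3, v5, v6, v1, v2, v4 (same order as rows of K).
N = [
    [ 0,  0, 0, 1, -1,  1],
    [-1,  1, 0, 0,  1,  0],
    [ 0, -1, 1, 0,  0, -1],
]

# Null space basis K = [I; K0]; each column is one flow pattern of Figure 5.1.
K = [
    [1,  0,  0],
    [0,  1,  0],
    [0,  0,  1],
    [1,  0, -1],
    [1, -1,  0],
    [0, -1,  1],
]

def matmul(a, b):
    """Plain-Python matrix product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

# Every column of K satisfies the steady state condition, so N K is all zeros.
NK = matmul(N, K)

# The flow pattern of Figure 5.2 as a combination of the three columns of K.
weights = [Fraction("1.5"), Fraction("0.6"), Fraction("0.8")]
J = [row[0] for row in matmul(K, [[w] for w in weights])]

# The combined pattern is itself a steady state: N J is all zeros.
NJ = [row[0] for row in matmul(N, [[j] for j in J])]

print(NK)
print([float(j) for j in J])  # [1.5, 0.6, 0.8, 0.7, 0.9, 0.2]
```

Exact fractions are used only so that the zero checks are not disturbed by floating point rounding.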
For example, pattern (c) in Figure 5.1 shows the reaction $v_1$ operating in the opposite direction to that indicated in the original Figure 4.3. It may be the case that $v_1$ is reversible, in which case pattern (c) in Figure 5.1 is a legitimate flow pattern. If, however, $v_1$ is irreversible, the flow pattern is not likely to occur in vivo. The use of elementary modes (see next section) eliminates this problem by forbidding patterns in which irreversible reactions run in reverse.

In summary, the null space vectors, and combinations thereof, can be interpreted as possible steady state flows through a given network. There is however another interpretation which is also very useful with respect to metabolic engineering. Let us consider the system equation again:

\[ N v = 0 \]

Let us assume that it is possible, by some means, to change the rates through the reactions such that the species levels remain unchanged but the flux changes. This will be justified in a later volume when we consider dynamics and control coefficients. In particular it can be shown that the unscaled concentration control coefficients and the null space are related by the expression:

\[ C^{s} K = 0 \tag{5.4} \]

where the elements of the $C^{s}$ matrix equal $dS_i/dE_j$, that is, how a given enzyme, $E_j$, affects the steady state concentration of a given molecular species, $S_i$. The equation tells us that perturbations in reaction rates that match the entries in a column of $K$ result in no change in concentrations. The same applies to linear combinations of the vectors in the $K$ matrix. With this in mind we can state that there is a set of perturbations, $\delta v$, that satisfies the following:

\[ N (v + \delta v) = 0 \]

which, since $N v = 0$, can be simplified to:

\[ N \delta v = 0 \]

This equation tells us that the perturbation vector $\delta v$ lies in the null space of $N$. That is, the null space vectors can be interpreted as sets of disturbances to the reaction rates that leave the steady state species levels unchanged but change the fluxes.
Such perturbations could be achieved by changing the level of gene expression at each reaction which has a non-zero entry in the $K$ matrix. For example, the first column of the $K$ matrix in Figure 5.1 is $[1\ 0\ 0\ 1\ 1\ 0]^T$. This means that changing the rates $v_3$, $v_1$ and $v_2$ by an amount $\delta v$ will leave the steady state concentrations of $S_1$, $S_2$ and $S_3$ unchanged but will increase the net flow from $v_1$ to $v_3$ by $\delta v$. In practice such changes might not be realizable, but in principle one could imagine changing the enzyme activities at $v_1$, $v_2$ and $v_3$ through changes in gene expression. Since enzyme activity is proportional to the concentration of enzyme, it must be true that proportional changes in an enzyme concentration, $E_i$, will lead to proportional changes in the reaction rate $v_i$, assuming other factors such as substrate and product concentrations remain unchanged. Since the latter condition can be guaranteed, to make a given relative change in $v_i$, $\delta v_i / v_i$, we need only make the same proportional change in $E_i$, that is:

\[ \frac{\delta E_i}{E_i} = \frac{\delta v_i}{v_i} \tag{5.5} \]

From the first column of $K$, we can state that $\delta v_1 = \delta v_2 = \delta v_3$, or equivalently:

\[ \frac{\delta v_1}{v_1} = \frac{\delta v_2}{v_2} \frac{v_2}{v_1} = \frac{\delta v_3}{v_3} \frac{v_3}{v_1} \]

or

\[ \frac{\delta E_1}{E_1} = \frac{\delta E_2}{E_2} \frac{v_2}{v_1} = \frac{\delta E_3}{E_3} \frac{v_3}{v_1} \]

In practice, if we change the activity of enzyme $E_1$ by a percentage, $\alpha$, then the percentage changes we must make in $E_2$ and $E_3$ will equal:

\[
\frac{\delta E_1}{E_1} = \alpha, \qquad
\frac{\delta E_2}{E_2} = \alpha \frac{v_1}{v_2}, \qquad
\frac{\delta E_3}{E_3} = \alpha \frac{v_1}{v_3}
\tag{5.6}
\]

This result indicates the changes in enzyme activity that are necessary in order to increase the flux through $v_1$, $v_2$ and $v_3$ while keeping all other fluxes and metabolite levels the same. It shows that the relative change required in each enzyme concentration is related to the proportion of flux that the particular step carries. Note that this result applies to large as well as small changes in enzyme concentrations.
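As a worked illustration of equation 5.6, the short Python sketch below computes the relative enzyme changes needed along $v_1$, $v_2$ and $v_3$ for a 10% increase in $E_1$, using the fluxes of Figure 5.2. The value of $\alpha$ and the function name `enzyme_changes` are assumptions chosen for the example.

```python
def enzyme_changes(alpha, fluxes, reference):
    """Relative enzyme changes dE_i/E_i = alpha * v_ref / v_i (equation 5.6)."""
    v_ref = fluxes[reference]
    return {name: alpha * v_ref / v for name, v in fluxes.items()}

# Fluxes along the first null space pattern, taken from Figure 5.2.
fluxes = {"v1": 0.7, "v2": 0.9, "v3": 1.5}

# Assumed example: raise E1 by 10% (alpha = 0.10).
changes = enzyme_changes(0.10, fluxes, "v1")

for name, rel in changes.items():
    delta_v = rel * fluxes[name]   # absolute flux change from equation 5.5
    print(name, round(rel, 4), round(delta_v, 4))
```

Each step ends up with the same absolute flux change, 0.07, which is exactly the condition $\delta v_1 = \delta v_2 = \delta v_3$; the steps that carry more flux need a smaller relative change in enzyme.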
The ability to alter fluxes independently of metabolite concentrations is a desirable goal in metabolic engineering because, when metabolite levels change, regulation is invoked, with unpredictable effects.

The null space basis has one disadvantage: the flow patterns that the basis admits may not necessarily be thermodynamically viable. In order to circumvent this problem a different approach, called elementary flux modes, was devised.

5.2 Elementary Flux Modes

A concept closely related to the null space of the stoichiometry matrix is the set of elementary flux modes. As previously discussed, the vectors in the null space of the stoichiometry matrix can be interpreted as steady state flow patterns in a network. However, one criticism is that the null space can admit patterns that are thermodynamically unlikely (see Figure 5.1). In addition, the set of null space vectors is not unique. Elementary flux modes avoid these issues.

Elementary flux modes are minimal realizable flow patterns through a network that can sustain a steady state. This means that elementary modes cannot be decomposed further into simpler pathways. Elementary flux modes provide a comprehensive description of all metabolic routes for a group of enzymes that are stoichiometrically and thermodynamically feasible [56]. As a result, metabolic pathways can be defined in terms of their elementary modes.

Mathematically, elementary modes are defined as follows. An elementary mode, $e_i$, is a vector of fluxes, $v_1, v_2, \ldots$, for which the following three conditions are met (Table 5.1).

In the following examples, all elementary modes were computed using Metatool via JDesigner.

Figure 5.3 shows the two elementary modes that exist for a simple branched pathway. All steps in Figure 5.3 are assumed to be irreversible. Let us show that each mode in this system satisfies the three conditions (Table 5.1).
1. The vector must satisfy $N e_i = 0$, that is, the steady state condition.
2. For all irreversible reactions, $v_i \geq 0$. This means that all flow patterns must use reactions that proceed in their most natural direction. This makes the pathway described by the elementary mode a thermodynamically feasible pathway.
3. The vector $e_i$ must be elementary, that is, it should not be possible to generate $e_i$ by combining two other vectors that satisfy the first and second requirements using the same set of enzymes that appear as non-zero entries in $e_i$. In other words, it should not be possible to decompose $e_i$ into two other pathways that can themselves sustain a steady state.

Table 5.1: Conditions necessary to define an Elementary Mode.

Figure 5.3: Elementary mode patterns, a) and b), in a simple branched pathway around S1, assuming irreversibility at all reaction steps. Highlighted reactions in bold signify steps that belong to the elementary mode.

The first condition is steady state, that is, for each mode $e_i$,

\[ N e_i = 0 \]

The two modes are given by:

\[
\begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}
\quad \text{and} \quad
\begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}
\tag{5.7}
\]

By substituting each of these vectors into $N e_i = 0$, it is easy to show that condition one is satisfied. For condition two we must ensure that all reactions that are irreversible have non-negative entries in the corresponding elements of the elementary modes. Since all three reactions in the branch are irreversible and all entries in the elementary modes are non-negative, condition two is satisfied.

Finally, to satisfy condition three we must ask whether we can decompose the two elementary modes into other paths that can sustain a steady state while using the same non-zero entries in the elementary mode. In this example it is impossible to decompose the elementary modes any further without disrupting the ability to sustain a steady state.
Therefore, with all three conditions satisfied, we can conclude that the two vectors given previously are elementary modes.

Like the basis for the null space, all possible flows through a network can be constructed from linear combinations of the elementary modes, that is:

\[ v = \sum_i \lambda_i e_i \tag{5.8} \]

where $\lambda_i \geq 0$, such that the entire space of flows through a network can be described. $\lambda_i$ must be greater than or equal to zero to ensure that irreversible steps are not inadvertently made to go in the reverse direction. For example, the following is a possible flow in the branched pathway:

\[
v = 2.5 \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}
+ 0.5 \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}
= \begin{bmatrix} 3.0 \\ 2.5 \\ 0.5 \end{bmatrix}
\]

If one of the outflow steps in the simple branched pathway is made reversible, an additional elementary mode becomes available that represents the flow between the two outflow branches (Figure 5.4). An additional mode emerges because with only the first two modes it is impossible to represent a flow between the two branches: the scaling factor, $\lambda_i$, cannot be negative, which would be required to reverse the flow.

Equation 5.4 indicated that if specific perturbations are made along the route indicated by a vector in the null space, then all species remain unchanged while the net flux increases. This equation can be extended to include elementary modes. If $E$ is the matrix of elementary modes, then since an elementary mode can be generated from a suitable combination of null space vectors (personal communication: Stefan Schuster), it must be true that:

\[ C^{s} E = 0 \tag{5.9} \]

This is an important result because it indicates that pathways represented by individual elementary modes can also be perturbed such that species levels remain unchanged, which has a significant bearing on metabolic engineering strategies.

Figure 5.4: Elementary mode patterns, a), b) and c), in a simple branched pathway around S1, assuming reversibility at one of the outflow branches.
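The first two conditions of Table 5.1 and the non-negative combination of equation 5.8 can be checked mechanically for the simple branched pathway. In the Python sketch below the stoichiometry `N = [[1, -1, -1]]` is an assumption ($v_1$ produces S1, while $v_2$ and $v_3$ consume it), as is the choice of $v_3$ as the reversible outflow in the last step; the function names are hypothetical.

```python
# Assumed stoichiometry of the simple branch: v1 produces S1, v2 and v3 consume it.
N = [[1, -1, -1]]

def is_steady_state(n, v):
    """Condition one: N v = 0 for every species (row of N)."""
    return all(sum(a * b for a, b in zip(row, v)) == 0 for row in n)

def respects_irreversibility(v, irreversible):
    """Condition two: irreversible steps must not carry negative flux."""
    return all(v[i] >= 0 for i in irreversible)

def combine(modes, weights):
    """Equation 5.8: a non-negative linear combination of elementary modes."""
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    return [sum(w * m[i] for w, m in zip(weights, modes))
            for i in range(len(modes[0]))]

e1, e2 = [1, 1, 0], [1, 0, 1]
all_irreversible = [0, 1, 2]          # indices of v1, v2, v3

v = combine([e1, e2], [2.5, 0.5])
print(v)                              # [3.0, 2.5, 0.5]
print(is_steady_state(N, v))          # True

# If v3 is taken to be reversible (an assumption), a third mode appears
# that carries flow from one outflow branch to the other.
e3 = [0, 1, -1]
print(is_steady_state(N, e3))                        # True
print(respects_irreversibility(e3, all_irreversible))  # False
print(respects_irreversibility(e3, [0, 1]))            # True
```

The last two lines show why the extra mode only exists in the reversible case: it satisfies the steady state condition either way, but violates condition two when every step is irreversible.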
Cyclic Branched Model

Figure 5.6 lists the elementary modes for a cyclic branched model. Whereas the null space vectors admit flow patterns which violate thermodynamic considerations, elementary modes do not. For example, in patterns (b) and (c) of Figure 5.1, reactions such as $v_1$ and $v_4$ run in the reverse direction. This also means that there are likely to be more elementary mode vectors than the dimension of the null space. Figure 5.6 illustrates four elementary modes when the first reaction is considered reversible. Figure 5.5, on the other hand, shows only three elementary modes when the first reaction is assumed to be irreversible.

Figure 5.5: Elementary mode patterns, a), b) and c), in a multi-branched pathway (species S1, S2, S3) assuming irreversibility at each reaction step.

Figure 5.6: Elementary mode patterns, a) to d), in a multi-branched pathway (species S1, S2, S3) assuming reversibility at the first reaction step.

Comment on Condition Three

Condition three in Table 5.1 requires further explanation. Condition three relates to the non-decomposability of an elementary mode and is partly what makes elementary modes interesting; the two other important features are uniqueness and thermodynamic plausibility. Decomposition implies that it is possible to represent a mode as a combination of two or more other modes. For example, a mode $e_1$ might be composed from two other modes, $e_2$ and $e_3$:

\[ e_1 = \lambda_1 e_2 + \lambda_2 e_3 \]

If a mode can be decomposed, does it mean that the mode is not an elementary mode? Condition three provides a rule to determine whether a decomposition means that a given mode is an elementary mode or not. If it is only possible to decompose a given mode by introducing enzymes that are not used in the mode, then the mode is elementary.
That is, is there more than one way to generate a pathway (i.e., something that can sustain a steady state) with the enzymes currently used in the mode? If so, then the mode is not elementary. To illustrate this subtle condition, consider the pathway shown in Figure 5.7.

Figure 5.7: Stylized Glycolytic Pathway (six numbered steps).

This pathway represents a stylized rendition of glycolysis. Two of the steps in the network are reversible, that is, steps three and six are reversible and correspond to triose phosphate isomerase and glycerol 3-phosphate dehydrogenase respectively.

The network has four elementary flux modes, which are shown in Figure 5.8. The elementary flux mode vectors are shown below, with one row per step:

\[
\begin{array}{rrrr}
e_1 & e_2 & e_3 & e_4 \\ \hline
0 & 1 & 1 & 1 \\
0 & 1 & 1 & 1 \\
-1 & 1 & -1 & 0 \\
1 & 0 & 2 & 1 \\
1 & 0 & 2 & 1 \\
-1 & 2 & 0 & 1
\end{array}
\tag{5.10}
\]

Figure 5.8: Stylized Glycolytic Pathway illustrating the four elementary flux modes, panels a) to d). Elementary modes are shown as bold arrows.

Note that it is possible to have negative entries in the set of elementary modes because they correspond to the reversible steps. Of interest is the observation that the fourth vector, $e_4 = [1\ 1\ 0\ 1\ 1\ 1]^T$ (where $T$ represents the transpose), can be formed from the sum of the first and second vectors (5.11). This suggests that the fourth vector is not an elementary mode.

\[
\underbrace{\begin{bmatrix} 1 \\ 1 \\ 0 \\ 1 \\ 1 \\ 1 \end{bmatrix}}_{e_4}
=
\underbrace{\begin{bmatrix} 0 \\ 0 \\ -1 \\ 1 \\ 1 \\ -1 \end{bmatrix}}_{e_1}
+
\underbrace{\begin{bmatrix} 1 \\ 1 \\ 1 \\ 0 \\ 0 \\ 2 \end{bmatrix}}_{e_2}
\tag{5.11}
\]

However, this decomposition only works because we have introduced a new enzyme, $E_3$ (triose phosphate isomerase), which is not used in the fourth vector. It is in fact not possible to decompose $e_4$ into pathways that can sustain the steady state with only the five steps, $E_1$, $E_2$, $E_4$, $E_5$ and $E_6$, used in the elementary mode.
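The check behind condition three can be made concrete by comparing the sets of steps (the supports) used by each vector. The Python sketch below does this for the decomposition in equation 5.11. The signs of the entries for the reversible steps three and six are an assumption, chosen so that the sum in equation 5.11 holds; the helper `support` is hypothetical.

```python
# Mode vectors over steps 1..6. Signs on the reversible steps (3 and 6)
# are assumed so that e4 = e1 + e2 holds, as in equation 5.11.
e1 = [0, 0, -1, 1, 1, -1]
e2 = [1, 1,  1, 0, 0,  2]
e4 = [1, 1,  0, 1, 1,  1]

def support(mode):
    """Set of step numbers (1-based) that carry non-zero flux in a mode."""
    return {i + 1 for i, x in enumerate(mode) if x != 0}

# The arithmetic decomposition is valid ...
total = [a + b for a, b in zip(e1, e2)]
print(total == e4)                      # True

# ... but the parts need a step that e4 itself does not use.
extra = (support(e1) | support(e2)) - support(e4)
print(sorted(support(e4)))              # [1, 2, 4, 5, 6]
print(sorted(extra))                    # [3]
```

Because step three is required by both parts but absent from $e_4$, the decomposition does not count against $e_4$ under condition three.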
We conclude therefore that $e_4$ is an elementary mode.

5.3 Definition of a Pathway

Unlike the basis of the null space, the set of elementary modes for a given network is unique (up to an arbitrary positive scaling factor). Given the fundamental nature of elementary modes, particularly their uniqueness and non-decomposability, they are a vehicle with which to define the notion of a pathway. That is, every elementary mode and every positive linear combination of elementary modes is, by definition, a pathway. A single elementary mode can therefore be thought of as an elementary pathway. Note that the set of elementary modes will change as the set of expressed enzymes changes during transitions from one cell state to another.

5.4 Maximum Yield Predictions

An important application of elementary modes is finding pathways that give the maximum molar yield, that is, the largest product/substrate rate ratio:

\[ \text{Yield} = \frac{\text{Synthesis Rate of Product}}{\text{Consumption Rate of Substrate}} \tag{5.12} \]

In many situations the biosynthesis of a product can be achieved by a number of different pathway routes, and the question then arises: which routes achieve the maximum yield of product relative to a given starting material? A very interesting property of elementary modes is that the set of elementary modes of a network contains the highest yielding pathways. The argument for this is as follows. Any flux distribution can be described as a non-negative linear combination of elementary modes (5.8), for example, $\lambda_1 e_1 + \lambda_2 e_2$. The yield of a given product from a given substrate is then a weighted average of the yields of the elementary modes that make up the pathway. However, a weighted average can never exceed the largest of the values being averaged, so the yield of the combined flux distribution can be no greater than the yield of the elementary mode in the set that has the highest yield.
Hence, given that elementary modes cannot be decomposed, the elementary modes must represent the highest yielding pathways.

Consider the network (from [61]) shown in Figure 5.9. The stoichiometry for the network is given below, where the positive direction of the reversible steps is taken to be $S_2 \to S_3$ for $v_6$ and $X_1 \to S_2$ for $v_8$:

\[
N =
\begin{array}{c|rrrrrrrrr}
 & v_1 & v_2 & v_3 & v_4 & v_5 & v_6 & v_7 & v_8 & v_9 \\ \hline
S_1 & 1 & -1 & 0 & 0 & -1 & 0 & 0 & 0 & 0 \\
S_2 & 0 & 0 & 0 & 0 & 1 & -1 & -1 & 1 & 0 \\
S_3 & 0 & 1 & -1 & 0 & 0 & 1 & 0 & 0 & 0 \\
S_4 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & -1 \\
S_5 & 0 & 0 & 1 & -1 & 0 & 0 & 2 & 0 & 0
\end{array}
\tag{5.13}
\]

Figure 5.9: Example Network to Illustrate Computation of Maximum Yields. $X_o$, $X_1$, $P$ and $Q$ are boundary species. Reactions $v_6$ and $v_8$ are reversible.

It is straightforward to show, using software such as JDesigner/Metatool, that the network in Figure 5.9 has eight elementary modes, labeled $EM_1$, $EM_2$, $EM_3$, $EM_4$, $EM_5$, $EM_6$, $EM_7$ and $EM_8$ (see Figure 5.10). Let us suppose that we are interested in maximizing the production of product $P$ from the feed substrate, $X_o$. Of the eight elementary modes, only six start with $X_o$. However, only four of these six result in the production of product $P$, that is, $EM_4$, $EM_6$, $EM_7$ and $EM_8$. The four elementary modes that connect $X_o$ to $P$ are given below:

\[
\begin{array}{c|rrrr}
 & EM_4 & EM_6 & EM_7 & EM_8 \\ \hline
v_1 & 1 & 1 & 1 & 1 \\
v_2 & 1 & 0 & 1 & 0 \\
v_3 & 1 & 0 & 0 & 1 \\
v_4 & 1 & 2 & 2 & 1 \\
v_5 & 0 & 1 & 0 & 1 \\
v_6 & 0 & 0 & -1 & 1 \\
v_7 & 0 & 1 & 1 & 0 \\
v_8 & 0 & 0 & 0 & 0 \\
v_9 & 1 & 0 & 0 & 1
\end{array}
\tag{5.14}
\]

EM     Yield
EM4    1
EM6    2
EM7    2
EM8    1

Table 5.2: Yield for each elementary mode that consumes input $X_o$ and produces product $P$.

The question now is which of these elementary modes achieves the highest yield? Equation 5.12 allows us to compute the yields for the elementary modes. Recall that the entries in the elementary mode vectors represent relative flux values, and since the yield equation is a ratio of fluxes we can use the entries in the elementary modes to compute the yield of each mode. For example, consider $EM_4$. The yield for this mode is given by $v_4 / v_1 = 1/1 = 1$.
Table 5.2 summarizes the yields for each of the four elementary modes. From the table it should be clear that two of the modes, $EM_6$ and $EM_7$, produce twice the yield of $EM_4$ and $EM_8$. From this information it would be logical to overexpress the enzymes along the $EM_6$ and $EM_7$ pathways. However, examination of $EM_6$ and $EM_7$ shows that $EM_6$ includes four enzymatic steps whereas $EM_7$ includes five enzymatic steps. We can therefore narrow down the choice further and suggest that $EM_6$, which has fewer steps, would be the initial target for engineering.

Having chosen the pathway to engineer, we now need to determine by how much each enzyme should be overexpressed. From equation 5.6 we know that in a branched system not every enzyme must be overexpressed by the same amount. Instead we must compute the relative overexpression of each enzyme from the known fluxes through the pathway (possibly computed using Flux Balance Analysis). We are also assured that, during this engineering, none of the metabolite levels will change.

Flux balance analysis using linear programming can also be used to compute pathways with the highest yields by suitable adjustment of the objective function. However, elementary modes provide a systematic approach to uncovering all high yielding pathways [55, 56]. Linear programming will sometimes inadvertently uncover pathways that represent elementary modes, and the work by Varma and Palsson [64, 65] on biomass yields in E. coli did just that.

Computing elementary modes efficiently is a non-trivial calculation, but a small number of tools are available. In particular METATOOL (4.3 series), developed by a number of authors including Thomas Pfeiffer, Stefan Schuster, Juan Carlos Nuno and Ferdinand Moldenhauer, is highly recommended. This tool has been incorporated into the Systems Biology Workbench and can be accessed via the JDesigner application.
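The yield comparison in this section can be reproduced with a short Python sketch. The mode vectors follow equation 5.14, with entries ordered $v_1$ to $v_9$; the sign of the $v_6$ entries is an assumption about which direction of that reversible step is counted as positive, and it does not affect the yields or the step counts. The function names are hypothetical.

```python
# Elementary modes connecting Xo to P; entries ordered v1..v9.
# The sign of the v6 entry (index 5) is an assumed direction convention.
modes = {
    "EM4": [1, 1, 1, 1, 0,  0, 0, 0, 1],
    "EM6": [1, 0, 0, 2, 1,  0, 1, 0, 0],
    "EM7": [1, 1, 0, 2, 0, -1, 1, 0, 0],
    "EM8": [1, 0, 1, 1, 1,  1, 0, 0, 1],
}

def molar_yield(mode, product=3, substrate=0):
    """Equation 5.12 with v4 (index 3) making P and v1 (index 0) consuming Xo."""
    return mode[product] / mode[substrate]

def n_steps(mode):
    """Number of enzymatic steps that carry flux in the mode."""
    return sum(1 for x in mode if x != 0)

# Rank by highest yield first, then by fewest steps.
ranked = sorted(modes, key=lambda k: (-molar_yield(modes[k]), n_steps(modes[k])))

for name in ranked:
    print(name, molar_yield(modes[name]), n_steps(modes[name]))
```

With this ranking $EM_6$ comes first (yield 2, four steps), followed by $EM_7$ (yield 2, five steps), matching the choice made in the text.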
5.5 Engineering a Pathway

Most approaches used to engineer metabolic pathways stem from an intuitive understanding of how metabolism operates. For example, to increase the output of some product it seems logical first to increase the level of the enzymes that are involved directly in the production of the product, and secondly to reduce the enzyme activities of those pathways that may divert flux away from the product pathway. This approach has been shown to work in certain cases [51]. However, the method can fail due to inadvertent changes in metabolite levels that cause metabolites to increase to toxic levels. In addition, changes to enzyme levels can also disrupt the levels of cofactors such as NAD, which has a global and disruptive impact on cellular metabolism. What is needed is a more systematic approach to engineering pathways. Two such approaches will be described here.

The simplest approach is to use flux balance analysis. In a flux balance model, one or more enzymatic steps can be eliminated to investigate the effect this has on the pathway of interest. However, it does not take into account the effects of regulation that may arise from changes in metabolite levels.

Here we describe the approach developed in this chapter that uses elementary modes as its basis. The strategy is as follows:

1. Enumerate all elementary modes in the metabolic network.
2. Find all modes which end at the desired product.
3. Select one of the modes in step two for engineering. The choice of mode will depend on the number of steps (which should be minimized), the costs involved in genetically engineering each step and the yield that the mode can deliver.
4. Use equation 5.5 and ?? to compute the degree of over-expression of each enzyme along the elementary mode.

In theory this strategy should work; however, there are a number of pitfalls.
These include the inability to up-regulate all the necessary enzymes and, secondly, the possibility of not being able to be precise enough when a particular enzyme needs to be up-regulated by a specific amount.

Exercises

Figure 5.10: Example Network to Illustrate Computation of Maximum Yields. Xo, X1, P and Q are boundary species. Reactions v6 and v8 are reversible. The network admits eight elementary modes (EM1 to EM8). Each mode is indicated in red (thickened reactions).

6 Species Conservation Laws

Many cell processes operate on different time scales. For example, metabolic processes tend to operate on a faster scale than protein synthesis and degradation. Such time scale differences have a number of implications for model builders, software designers and model behavior. In this chapter we will examine these aspects in relation to species conservation laws.

To introduce this topic consider a simple protein phosphorylation cycle such as the one shown in Figure 6.1. This shows a protein undergoing phosphorylation (upper limb) and dephosphorylation (lower limb) via a kinase and phosphatase respectively. The depiction in Figure 6.1 is, however, a simplification. The ATP used during phosphorylation is not shown, nor is the release of free phosphate during dephosphorylation. In addition, synthesis and degradation of the protein are also absent. In many cases we can leave these aspects out of the picture.
ATP, for instance, is held at a relatively constant level by strong homeostatic forces from metabolism, so that within the context of the cycle, changes in ATP are not something we need worry about. More interesting is that within the time scale of phosphorylation and dephosphorylation we can assume that the rate of protein synthesis and degradation is negligible. This assumption is more significant and leads to the emergence of a new property of the cycle called moiety conservation [49]. In chemistry a moiety is described as a subgroup of a larger molecule. In this case the moiety is a protein. During the interconversion between the phosphorylated and unphosphorylated protein, the amount of moiety (protein) remains constant.

Figure 6.1: Phosphorylation and Dephosphorylation Cycle forming a Moiety Conservation Cycle between Unphosphorylated (left species) and Phosphorylated protein (right species).

More abstractly we can draw a cycle in the following way (Figure 6.2), where S1 and S2 are the cycle species:

Figure 6.2: Simple conserved cycle where $S_1 + S_2 = \text{constant}$.

The two species, S1 and S2, are conserved because the total S1 + S2 remains constant over time (at least over a time scale shorter than protein synthesis and degradation). Such cycles are collectively called conserved cycles.

Protein signalling pathways abound with conserved cycles such as these, although many are more complex than this and may involve multiple phosphorylation reactions. In addition to protein networks, other pathways also possess conservation cycles. One of the earliest conservation cycles to be recognized was the adenosine triphosphate (ATP) cycle. ATP is a chain of three phosphate residues linked to a nucleoside adenosine group (Figure 6.3).

Figure 6.3: Adenosine Triphosphate: Three phosphate groups plus an adenosine subgroup.
The linkage between the phosphate groups involves unstable phosphoric acid anhydride bonds, and these can be cleaved by hydrolysis one at a time, leading in turn to the formation of adenosine diphosphate (ADP) and adenosine monophosphate (AMP) respectively. The hydrolysis provides much of the free energy to drive endergonic processes in the cell. Given the insatiable need for energy, there is a continual and rapid interconversion between ATP, ADP and AMP as energy is released or captured. One thing that is constant during these interconversions is the amount of adenosine group (Figure 6.4). That is, adenosine is a conserved moiety. Over longer time scales there is also the slower process of AMP degradation and biosynthesis via the purine nucleotide pathway, but for many models we assume that this process is negligible compared to ATP turnover by energy metabolism.

[Diagram: fast interconversion between ATP, ADP and AMP; slow degradation and synthesis of AMP.]

There are many other examples of conserved moieties, such as enzyme/enzyme-substrate complex, NAD/NADH, phosphate and coenzyme A. In all these cases the basic assumption is that the interconversion of the subgroups is rapid compared to their net synthesis and degradation. We should emphasize that in reality conserved moieties do not exist, since all molecular subgroups will at some point be subject to synthesis and degradation. However, over sufficiently short time scales, the sum total of these groups can be considered constant.

In this chapter we will consider conserved moieties in detail. In particular we will look at how to detect them in our models, what effect they have on model dynamics and how they influence the design of simulation software.

Figure 6.4: The adenosine moiety, indicated by the boxed molecular group, is conserved during the interconversion of ATP, ADP and AMP.
Moiety: A subgroup of a larger molecule.
Conserved Moiety: A subgroup whose interconversion through a sequence of reactions leaves it unchanged.

6.1 Moiety Conserved Cycles

Any chemical group that is preserved during a cyclic series of interconversions is called a conserved moiety. Examples of conserved moiety subgroups include species such as phosphate, acyl, nucleoside groups or covalently modifiable proteins. As a moiety gets redistributed through a network, the total amount of the moiety is constant and does not change during the time evolution of the system. For any particular subgroup, the total amount is determined solely by the initial conditions imposed on the model.

Figure 6.5: Conserved Moiety in a Cyclic Network. The blue species are modified as they traverse the reaction cycle, but the red subgroup (small circle) remains unchanged. This creates a conserved cycle, where the total number of moles of moiety (red subgroup) stays constant.

There are rare cases when a 'conservation' relationship arises out of a non-moiety cycle. This does not affect the mathematical analysis but only the physical interpretation of the relationship. For example, in Figure 6.6 the constraint $B - C = T$ applies even though there is no moiety involved.

Figure 6.6: Conservation due to stoichiometric matching. In this system, $B - C = \text{constant}$.

The presence of conserved moieties is an approximation introduced into a model. However, over the time scale in which the conservations hold, their existence can have a profound effect on the dynamic behavior of the model. For example, the hyperbolic response of a simple enzyme (in the form of enzyme conservation between E and ES), or the sigmoid behavior observed in protein signalling networks, is due in significant part to moiety conservation laws (see section ??).
Figure 6.7 illustrates the simplest possible network that displays a conserved moiety: the total mass, S1 + S2, is constant during the evolution of the network.

Figure 6.7: Simple conserved cycle. The dotted lines signify negligible levels of synthesis and degradation; therefore, over short time scales, $S_1 + S_2 = \text{constant}$.

The system equations for the simple conserved cycle are easily written down as:

$$\frac{dS_1}{dt} = v_1 - v_2 \qquad \frac{dS_2}{dt} = v_2 - v_1$$

From these equations it should be evident that the rate of appearance of S1 must equal the rate of disappearance of S2, that is $dS_1/dt = -dS_2/dt$. This means that whenever S1 changes, S2 must change in the opposite direction by exactly the same amount. During a simulation the sum of S1 and S2 will therefore remain unchanged.

Computationally we need only explicitly evaluate one of the differential equations, because the other species can be computed from the conservation relation. Whichever differential equation is chosen, the species left out must be computed algebraically using the conservation law. Therefore, the system can be reduced to one differential and one linear algebraic equation, compared to the two differential equations in the original formulation:

$$S_2 = T - S_1 \qquad \frac{dS_1}{dt} = v_1 - v_2$$

The term T in the algebraic equation refers to the total amount of S1 and S2. This value is computed from the initial amounts given to S1 and S2 at the start of a simulation.

Figure 6.8: Simulation of the simple cycle shown in Figure 6.7 (concentrations of S1 and S2 plotted against time). The total moiety remains constant at 10 concentration units. Model: S1 -> S2; k1*S1; S2 -> S1; k2*S2; S1 = 10; k1=0.1; k2=0.2

6.2 Basic Theory

The question we want to address here is how to determine whether a given network contains conserved cycles and, if so, what they are. The key to this question is the stoichiometry matrix, N. In the example shown in Figure 6.7 the stoichiometry matrix (rows S1, S2; columns v1, v2) is given by:

$$N = \begin{bmatrix} 1 & -1 \\ -1 & 1 \end{bmatrix}$$

The first thing to note is that, since either row can be derived from the other by multiplication by $-1$, the rows are linearly dependent (see Box 3.0) and the rank of the matrix is therefore 1 (see Box 3.1). It is these dependencies that appear as linear relationships between the rates of change, $dS/dt$.

Whenever a network exhibits conserved moieties, there will be dependencies among the rows of N, and the rank of N, rank(N), will be less than m, the number of rows of N. The rows of N can be rearranged so that the first rank(N) rows are linearly independent. The metabolites which correspond to these rows are called the independent species ($S_i$). The remaining $m - \text{rank}(N)$ rows correspond to the dependent species ($S_d$).

Box 3.0 Linear Dependence and Independence - Recap
One of the most important ideas in linear algebra is the concept of linear dependence and independence. Take three vectors, say $[1, -1, 2]$, $[3, 0, -1]$ and $[9, -3, 4]$. If we look at these vectors carefully it should be apparent that the third vector can be generated from a combination of the first two, that is $[9, -3, 4] = 3[1, -1, 2] + 2[3, 0, -1]$. Mathematically we say that these vectors are linearly dependent. In contrast, the vectors $[1, 1, 0]$, $[0, 1, 1]$ and $[0, 0, 1]$ are independent because no combination of two of them can generate the third. Mathematically we say that these vectors are linearly independent.

In the simple conserved cycle, Figure 6.7, there is one independent species, S1, and one dependent species, S2.

Example 6.1
Figure 6.5 illustrates a three species cycle. What is the conservation law for this pathway? The stoichiometry matrix for this system (rows S1, S2, S3; columns v1, v2, v3) is given by:

$$N = \begin{bmatrix} -1 & 0 & 1 \\ 1 & -1 & 0 \\ 0 & 1 & -1 \end{bmatrix} \tag{6.1}$$

Inspection reveals that the sum of the three rows is zero, meaning that

$$\frac{dS_1}{dt} + \frac{dS_2}{dt} + \frac{dS_3}{dt} = 0$$

or that the total S1 + S2 + S3 is constant.
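The row dependency used in Example 6.1 can also be checked mechanically. The following is a minimal Python sketch, not taken from the book: the helper name `rank` is hypothetical, and the matrix assumes the cycle runs S1 -> S2 -> S3 -> S1 (the orientation of Figure 6.5 is an assumption here). It sums the rows and counts the conservation laws as m - rank(N) using exact fractions.

```python
from fractions import Fraction

def rank(matrix):
    """Rank by forward elimination with exact arithmetic (hypothetical helper)."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    n_rows, n_cols = len(rows), len(rows[0])
    r = 0  # number of pivots found so far
    for col in range(n_cols):
        if r == n_rows:
            break
        # Find a row at or below position r with a nonzero entry in this column.
        pivot = next((i for i in range(r, n_rows) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        # Eliminate the entries below the pivot.
        for i in range(r + 1, n_rows):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

# Rows S1, S2, S3; columns v1, v2, v3 (assumed orientation S1 -> S2 -> S3 -> S1).
N3 = [[-1, 0, 1],
      [1, -1, 0],
      [0, 1, -1]]

row_sum = [sum(col) for col in zip(*N3)]   # sum of the three rows
n_laws = len(N3) - rank(N3)                # m - rank(N)

print("sum of rows:", row_sum)
print("rank:", rank(N3), "conservation laws:", n_laws)
```

With this matrix the row sum is the zero vector and m - rank(N) equals one, in line with a single conservation law for the three species cycle.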
There are no other relationships between the rows other than this one.

Example 6.2
A linear pathway has the following stoichiometry matrix:

$$N = \begin{bmatrix} 1 & -1 & 0 \\ 0 & 1 & -1 \end{bmatrix}$$

Does the pathway contain any conserved cycles? No. Neither row in the matrix can be derived from the other by a simple operation, so the rows are linearly independent and the pathway has no conserved cycles.

To illustrate this idea on a more complicated example, consider the pathway shown in Figure 6.9. This pathway includes four species, S1, S2, E and ES.

Figure 6.9: Linked Conserved Cycles. The network rendered on the right shows the moiety composition of the participating species.

The mass-balance equations of this model can be written down as:

$$\frac{dE}{dt} = v_2 - v_3 \qquad \frac{dES}{dt} = v_3 - v_2$$

$$\frac{dS_1}{dt} = v_2 - v_1 \qquad \frac{dS_2}{dt} = v_1 - v_3$$

A visual inspection of the mass-balance equations reveals the following two relationships:

$$\frac{dE}{dt} + \frac{dES}{dt} = 0 \qquad \frac{dES}{dt} + \frac{dS_1}{dt} + \frac{dS_2}{dt} = 0 \tag{6.2}$$

These relationships tell us that there are two conservation laws, $E + ES = T_1$ and $ES + S_1 + S_2 = T_2$. This means that given the amount of ES, the amount of E can be computed. In addition, given the amounts of ES and S1, the amount of S2 can be computed. Therefore ES and S1 can be designated the independent species and E and S2 the dependent species. What this means in practical terms is that in a modeling program only two differential equations need be solved instead of four. The reduced model equations will look like:

$$E = T_1 - ES \qquad S_2 = T_2 - S_1 - ES$$

$$\frac{dES}{dt} = v_3 - v_2 \qquad \frac{dS_1}{dt} = v_2 - v_1$$

where $T_1$ is the total amount of E type moiety and $T_2$ is the total amount of S type moiety.

Box 3.1 The Rank of a Matrix - Recap
Closely related to linear independence (Box 3.0) is the concept of Rank.
Consider the three vectors described in Box 3.0, $[1, -1, 2]$, $[3, 0, -1]$ and $[9, -3, 4]$, and stack them one atop the other to form a matrix:

$$\begin{bmatrix} 1 & -1 & 2 \\ 3 & 0 & -1 \\ 9 & -3 & 4 \end{bmatrix}$$

The Rank is simply the number of linearly independent vectors that make up the matrix. In this case the Rank is 2, because there are only two linearly independent row vectors in the matrix.

The stoichiometry matrix for the model in Figure 6.9 (rows S2, ES, S1, E; columns v1, v2, v3) is given by:

$$N = \begin{bmatrix} 1 & 0 & -1 \\ 0 & -1 & 1 \\ -1 & 1 & 0 \\ 0 & 1 & -1 \end{bmatrix} \tag{6.3}$$

Examining the stoichiometry matrix reveals conservation laws as relationships among the matrix rows. The 4th row (E) can be formed by multiplying the 2nd row (ES) by $-1$, and the 3rd row (S1) can be formed by multiplying the 1st row (S2) by $-1$ and adding it to the 4th row (E).

These simple examples show that it is possible to derive conservation laws by looking for dependencies among the rows of the stoichiometry matrix. For simple cases this can be done by inspection, but for large pathways this approach is not practical. Instead a more systematic theory for deriving the conservation laws must be developed.

6.3 Computational Approaches

There are a number of related methods for computing the conservation laws of a given pathway. Some are simple, such as the one about to be described, while others are more sophisticated and are used to determine the conservation laws in very large stoichiometry matrices.

The easiest method to derive conservation laws is to use row reduction [42, 10, 9]. This is based on forward elimination, which is the first part of Gaussian Elimination. Gaussian Elimination is a traditional way to solve simultaneous linear equations by eliminating one unknown at a time, and is a technique often taught in high school. Elimination is carried out by applying a series of simple manipulations called elementary operations.
These operations include interchanging two equations (exchange), multiplying an equation through by a nonzero number (scaling) and adding a multiple of one equation to another equation (replacement). In practice the equations are recast into matrix form so that the elementary operations are applied to the values in the matrix, where each row of the matrix represents an equation. Thus interchanging two equations is equivalent to swapping two rows in the matrix. The elementary operations are carried out on the matrix until a particular arrangement, called the echelon form, is established (see Box 3.3). Elementary operations are often represented in matrix form and are then called elementary matrices (see Box 3.2). Applying a particular elementary operation then becomes equivalent to multiplying by an elementary matrix.

The technique for finding conservation laws works as follows. Consider the network in Figure 6.9. The system equation for this network (rows S2, ES, S1, E) is:

$$\begin{bmatrix} 1 & 0 & -1 \\ 0 & -1 & 1 \\ -1 & 1 & 0 \\ 0 & 1 & -1 \end{bmatrix} \begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix} = \begin{bmatrix} dS_2/dt \\ dES/dt \\ dS_1/dt \\ dE/dt \end{bmatrix}$$

We will recast the equation in the following form, where an identity matrix has been added to the right-hand side:

$$N v = I \frac{dS}{dt}$$

Written out fully the system equation will look like:

$$\begin{bmatrix} 1 & 0 & -1 \\ 0 & -1 & 1 \\ -1 & 1 & 0 \\ 0 & 1 & -1 \end{bmatrix} \begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} dS_2/dt \\ dES/dt \\ dS_1/dt \\ dE/dt \end{bmatrix}$$

Let us now apply forward elimination to the stoichiometry matrix. To do this we apply a series of elementary operations to the left-hand side such that the stoichiometry matrix is reduced to echelon form. For consistency we apply the same set of elementary operations to the right-hand side, so that the identity matrix records whatever operations we carried out. This amounts to multiplying both sides by a set of elementary matrices. We only need to reduce the matrix to its row echelon form, not to its reduced echelon form.

Box 3.2 Elementary Matrices - Recap
Elementary matrix operations such as row exchange, row scaling or row replacement can be represented by simple matrices called elementary matrices, of Type I, II and III respectively. Elementary matrices can be constructed from the identity matrix. For example, a scaling operation can be represented by replacing one of the elements of the main diagonal of an identity matrix by the scaling factor. The following matrix represents a type II matrix which will scale the second row of a given matrix by the factor k:

$$\begin{bmatrix} 1 & 0 & 0 \\ 0 & k & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

Type I elementary matrices exchange two given rows in a given matrix. They are constructed from an identity matrix by exchanging the rows that correspond to the rows to be exchanged in the target matrix. The following type I matrix will exchange rows 2 and 3 in a target matrix:

$$\begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}$$

Type III elementary matrices add or subtract a multiple of a given row in a target matrix to another row in the same matrix. Type III matrices are constructed from an identity matrix where a single off-diagonal element is set to the multiplication factor, and the location of that element identifies the two rows to combine. If an elementary matrix adds to row i the row j multiplied by a factor $\alpha$, then entry $(i, j)$ of the identity matrix is set to $\alpha$. In the following example, the type III elementary matrix will subtract five times the 2nd row from the 3rd row:

$$\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -5 & 1 \end{bmatrix}$$

A particularly important property of elementary matrices is that they can all be inverted. In addition, pre-multiplying by an elementary matrix will modify the rows of a target matrix, while post-multiplying will operate on the columns.
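The three kinds of elementary matrix recapped in Box 3.2 are easy to try out. The sketch below is illustrative only: the function names (`exchange`, `scaling`, `replacement`, `matmul`) and the test matrix `A` are hypothetical and do not come from the book, and indices are 0-based. It builds each type from an identity matrix and shows that pre-multiplying acts on rows, post-multiplying acts on columns, and that the Type III operation is undone by the same matrix with the opposite factor.

```python
def identity(n):
    """n x n identity matrix as a list of lists."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def exchange(n, i, j):
    """Type I: swap rows i and j (0-based indices)."""
    e = identity(n)
    e[i], e[j] = e[j], e[i]
    return e

def scaling(n, i, k):
    """Type II: scale row i by the factor k."""
    e = identity(n)
    e[i][i] = k
    return e

def replacement(n, i, j, alpha):
    """Type III: add alpha times row j to row i, so entry (i, j) is alpha."""
    e = identity(n)
    e[i][j] = alpha
    return e

def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

# Arbitrary test matrix, chosen only for illustration.
A = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]

# Subtract five times the 2nd row from the 3rd row (0-based: i=2, j=1).
E3 = replacement(3, 2, 1, -5)

rows_changed = matmul(E3, A)   # pre-multiplication acts on the rows of A
cols_changed = matmul(A, E3)   # post-multiplication acts on the columns of A
undone = matmul(replacement(3, 2, 1, 5), E3)  # inverse of a Type III matrix

print(rows_changed)
print(cols_changed)
print(undone)
```

Here only the third row of `rows_changed` differs from `A`, only the second column of `cols_changed` differs from `A`, and `undone` is the identity matrix.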
Reducing a matrix to echelon form raises the possibility of generating zero rows in the matrix if there are dependencies among the rows (see Box 3.3). This being the case, the system equation after forward elimination can be expressed in the following way:

$$\begin{bmatrix} M \\ 0 \end{bmatrix} v = E \frac{dS}{dt} \tag{6.4}$$

where the identity matrix has been transformed into the matrix E, which represents the product of all elementary operations that were applied to the left-hand side. The left-hand side has itself been transformed into an echelon form, which is represented as a partitioned matrix. The E matrix can also be partitioned row-wise to match the partitioning in the echelon matrix, that is:

$$\begin{bmatrix} M \\ 0 \end{bmatrix} v = \begin{bmatrix} X \\ Y \end{bmatrix} \frac{dS}{dt} \tag{6.5}$$

Multiplying out the lower partition one obtains:

$$Y \frac{dS}{dt} = 0 \tag{6.6}$$

This general result is equivalent to the equations shown in 6.2, that is, 6.6 represents the set of conservation laws. Determining the conservation laws therefore involves reducing the stoichiometry matrix and extracting the lower portion of the modified identity matrix.

Let us now proceed with an example to illustrate this method. We will use the stoichiometry matrix from equation 6.3. For convenience the stoichiometry and identity matrices are placed next to each other in the following sequence of elementary operations. An elementary operation carried out on the stoichiometry matrix is simultaneously applied to the identity matrix.

1. Stoichiometry matrix on the left and identity matrix on the right.

$$\begin{bmatrix} 1 & 0 & -1 \\ 0 & -1 & 1 \\ -1 & 1 & 0 \\ 0 & 1 & -1 \end{bmatrix} \qquad \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$

2. Add the 1st row to the 3rd row to yield:

$$\begin{bmatrix} 1 & 0 & -1 \\ 0 & -1 & 1 \\ 0 & 1 & -1 \\ 0 & 1 & -1 \end{bmatrix} \qquad \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$

3. Add the 2nd row to the 3rd and 4th rows to yield:

$$\begin{bmatrix} 1 & 0 & -1 \\ 0 & -1 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix} \qquad \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 1 & 1 & 1 & 0 \\ 0 & 1 & 0 & 1 \end{bmatrix}$$

4. Multiply the 2nd row by $-1$ to yield the final echelon form:

$$\begin{bmatrix} 1 & 0 & -1 \\ 0 & 1 & -1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix} \qquad \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & -1 & 0 & 0 \\ 1 & 1 & 1 & 0 \\ 0 & 1 & 0 & 1 \end{bmatrix}$$

The final operation achieves the goal of reducing the stoichiometry matrix to an echelon form (in this case it happens to be a reduced echelon form). Note that the reduction has resulted in two zero rows appearing in the reduced stoichiometry matrix. These two rows correspond to the Y partition in equation 6.5. The lower two rows can be extracted from the right-hand matrix (what was once the identity matrix) to construct equation 6.6, thus

$$\begin{bmatrix} 1 & 1 & 1 & 0 \\ 0 & 1 & 0 & 1 \end{bmatrix} \begin{bmatrix} dS_2/dt \\ dES/dt \\ dS_1/dt \\ dE/dt \end{bmatrix} = 0$$

Or:

$$\frac{dS_2}{dt} + \frac{dES}{dt} + \frac{dS_1}{dt} = 0 \qquad \frac{dES}{dt} + \frac{dE}{dt} = 0$$

From the above equations the following conservation laws should be evident:

$$S_2 + ES + S_1 = T_1 \qquad ES + E = T_2 \tag{6.7}$$

In summary, the algorithm for deriving the conservation laws is as follows:

1. Apply elementary operations to the stoichiometry matrix until the matrix is reduced to its row echelon form. Simultaneously apply the elementary operations to an identity matrix. The size of the identity matrix should be equal to the number of rows in the stoichiometry matrix.
2. If there are zero rows at the bottom of the reduced stoichiometry matrix then there are conservation laws in the network, otherwise there are not. The number of conservation laws will be equal to the number of zero rows.
3. Extract the rows in the transformed identity matrix that correspond to the position of the zero rows in the reduced stoichiometry matrix. The extracted rows represent the conservation laws.

There are two points worth making when applying this algorithm. The first is that any row swaps made during the row reduction of the stoichiometry matrix will not translate to swaps in the names of the species on the right-hand side of the equation.
This means that when reading the conservation rows, the names on the columns are not changed by any row exchanges in the stoichiometry matrix. The second point is that when carrying out the elementary row operations, it is recommended to eliminate, whenever possible, terms below a leading entry by adding rather than subtracting. This helps ensure that entries in the transformed identity matrix remain positive and that the resulting conservation laws will be made up of positive terms. Sometimes adding will not be possible and subtractions will be necessary. This will result in negative terms appearing in the conservation laws, which may make them more difficult to interpret physically.

A useful strategy for avoiding negative terms in the conservation equations is to order the rows of the stoichiometry matrix such that any species that is likely to appear in more than one conservation relationship is placed at the bottom of the stoichiometry matrix. In the case of the previous example we would make sure that ES is located in the bottom row of the stoichiometry matrix. This ordering ensures that the independent species (top rows) are represented by the free variables and the dependent species (bottom rows) by the shared variables. The shared or dependent variables (i.e. complexes) will then be a function of the free variables, which is more likely to result in positive terms [52]. A more brute force method is to try all permutations of the matrix rows until a positive set of conservation laws is found. For small models (< 10 species) this approach is a viable option.

Although it is possible to reduce a stoichiometry matrix manually, it is far easier to use specialized math software such as Scilab, Octave, Matlab and Mathematica, or even advanced modern desktop calculators. Scilab, Octave and Matlab offer an rref() command for generating the reduced row echelon form (the Mathematica equivalent is RowReduce).
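For readers without access to these packages, the three-step algorithm summarized earlier can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the book's implementation: the function names `rref` and `conservation_laws` are hypothetical, exact fractions are used so that no rounding occurs, and the test matrices are the four-species network of Figure 6.9 in two different row orders.

```python
from fractions import Fraction

def rref(matrix):
    """Reduced row echelon form using exact fractions (hypothetical helper)."""
    m = [[Fraction(x) for x in row] for row in matrix]
    n_rows, n_cols = len(m), len(m[0])
    pivot_row = 0
    for col in range(n_cols):
        if pivot_row == n_rows:
            break
        # Exchange: bring a row with a nonzero entry in this column into place.
        pivot = next((r for r in range(pivot_row, n_rows) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[pivot_row], m[pivot] = m[pivot], m[pivot_row]
        # Scaling: make the leading entry equal to one.
        lead = m[pivot_row][col]
        m[pivot_row] = [x / lead for x in m[pivot_row]]
        # Replacement: clear every other entry in this column.
        for r in range(n_rows):
            if r != pivot_row and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[pivot_row])]
        pivot_row += 1
    return m

def conservation_laws(N):
    """Row reduce [N | I] and return the identity-side rows next to zero rows of N."""
    n_rows, n_cols = len(N), len(N[0])
    augmented = [list(row) + [int(i == j) for j in range(n_rows)]
                 for i, row in enumerate(N)]
    laws = []
    for row in rref(augmented):
        if all(x == 0 for x in row[:n_cols]):
            # Show whole numbers as plain integers for readability.
            laws.append([x.numerator if x.denominator == 1 else x
                         for x in row[n_cols:]])
    return laws

# Rows ordered S2, ES, S1, E (as in equation 6.3); columns v1, v2, v3.
N_a = [[1, 0, -1], [0, -1, 1], [-1, 1, 0], [0, 1, -1]]
# Same network with the shared species ES moved to the bottom: S2, S1, E, ES.
N_b = [[1, 0, -1], [-1, 1, 0], [0, 1, -1], [0, -1, 1]]

print(conservation_laws(N_a))  # [[1, 0, 1, -1], [0, 1, 0, 1]]
print(conservation_laws(N_b))  # [[1, 1, 0, 1], [0, 0, 1, 1]]
```

The first ordering gives S2 + S1 - E and ES + E, with one negative term. Moving ES to the bottom row gives S2 + S1 + ES and E + ES, which illustrates the row ordering strategy described above.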
The following examples will illustrate the use of the freely available Scilab application (www.scilab.org) to compute the conservation laws.

Example 6.3
Row reduction using Scilab/Matlab. Given the following stoichiometry matrix (rows S2, ES, S1, E), use Scilab functions to row reduce and extract the conservation laws.

$$N = \begin{bmatrix} 1 & 0 & -1 \\ 0 & -1 & 1 \\ -1 & 1 & 0 \\ 0 & 1 & -1 \end{bmatrix}$$

Enter the stoichiometry matrix into the software:

    -->n = [1 0 -1; 0 -1 1; -1 1 0; 0 1 -1];

Augment the matrix with the identity matrix. This will allow us to record row reduction operations in the identity matrix part of the augmented matrix.

    -->ni = [n, eye(4,4)]
     ni  =
        1.    0.  - 1.    1.    0.    0.    0.
        0.  - 1.    1.    0.    1.    0.    0.
      - 1.    1.    0.    0.    0.    1.    0.
        0.    1.  - 1.    0.    0.    0.    1.

Row reduce the augmented matrix:

    -->rni = rref (ni)
     rni  =
        1.    0.  - 1.    0.    0.  - 1.    1.
        0.    1.  - 1.    0.    0.    0.    1.
        0.    0.    0.    1.    0.    1.  - 1.
        0.    0.    0.    0.    1.    0.    1.

The left partition of the reduced matrix contains two zero rows, therefore there are two conservation laws. These laws correspond to the two bottom rows in the right partition. We extract the rows in the right partition to yield:

    -->c = rni(3:4,4:7)
     c  =
        1.    0.    1.  - 1.
        0.    1.    0.    1.

The species column order is the same as the species row order in the original matrix, that is S2, ES, S1 and E, therefore:

$$S_2 + S_1 - E = T_1 \qquad ES + E = T_2$$

Note the negative E term in the first conservation law. At first glance this does not appear to be the same set of conservation laws that were derived earlier. However, if we substitute E from the second equation into the first we recover the law $S_1 + S_2 + ES = T$ (with $T = T_1 + T_2$), showing us that the two sets are equivalent. To avoid negative terms appearing in the conservation laws, we can use the rule that all complex species (that is, shared species) such as ES be moved to the bottom of the matrix (see next example).

Example 6.4
Row reduction using Scilab/Matlab.
Given the following stoichiometry matrix (rows S2, S1, E, ES), use Scilab functions to row reduce and extract the conservation laws. In this example, the shared species ES has been moved to the bottom of the matrix.

$$N = \begin{bmatrix} 1 & 0 & -1 \\ -1 & 1 & 0 \\ 0 & 1 & -1 \\ 0 & -1 & 1 \end{bmatrix}$$

The reduced augmented matrix is now:

    -->rni = rref (ni)
     rni  =
        1.    0.  - 1.    0.  - 1.    0.  - 1.
        0.    1.  - 1.    0.    0.    0.  - 1.
        0.    0.    0.    1.    1.    0.    1.
        0.    0.    0.    0.    0.    1.    1.

Once again there are two zero rows, but this time the corresponding conservation laws all have positive entries, yielding the following equations:

$$S_2 + S_1 + ES = T_1 \qquad ES + E = T_2$$

The following Scilab/Matlab code will find the conservation laws for any stoichiometry matrix.

    // Compute Conservation Laws
    // -------------------------
    // Enter the stoichiometry matrix first
    n = [1 0 -1; 0 -1 1; -1 1 0; 0 1 -1];
    nRows = size(n, 1);
    // Create the augmented matrix
    ni = [n, eye(nRows,nRows)];
    // Carry out row reduction
    rni = rref (ni);
    r = rank (n);
    // Extract the conservation rows
    c = rni(r+1:nRows,size(n,2)+1:size(ni,2));
    // Display result
    c

Figure 6.10: General purpose Scilab/Matlab code to determine conservation laws using row reduction.

Row reduction of the augmented stoichiometry matrix is probably the easiest way to derive the conservation laws. The main advantages of this method are its simplicity and, significantly, the ability to direct the calculation by setting the order of rows in the initial stoichiometry matrix. However, it has one disadvantage, which is potential numerical instability for large systems. In particular, for large genomic style stoichiometry models [40] that involve many hundreds or even thousands of reactions and species, the method can suffer dramatic failures due to rounding errors during row reduction. In a subsequent section more robust methods will be described that rely on QR factorization [63] and Singular Value Decomposition (SVD).
The main disadvantage of these other methods is that sometimes, depending on the particular algorithm, the row order can not be easily prescribed. In any event there are some simple tests one can do to check that the computed conservation laws are correct, one such test will be described next. 114 CHAPTER 6. SPECIES CONSERVATION LAWS Null Space of N T To complete this section let us consider in more detail the algebraic nature of the Y partition in equation 6.6. The elementary matrix, E , reduced the stoichiometry matrix to a row echelon form, that is to: EN D M 0 (6.8) The E matrix corresponds to the same E matrix in equation 6.5, so that we can partition the elementary matrix, E row-wise into X and Y partitions (equation 6.5). X M N D Y 0 From which we can immediately see that: YN D 0 Taking the transpose we obtain N TY T D 0 The Y partition is therefore the null space of the transpose of the stoichiometry matrix (cf. ??). This is a significant result for a number of reasons. It gives a very concise definition of the conservation matrix but more importantly it opens up the possibility of using other computational approaches. The other point of interest is that this result can be used to test whether a set of conservation laws were correctly derived or not. To do this we simply multiply the transpose of N by the transpose of the conservation matrix Y and make sure the product equals zero. 6.3. COMPUTATIONAL APPROACHES 115 Many software packages such as Matlab, Scilab or Mathematica supply commands to compute the null space. This makes is easy to compute the conservation laws by simply computing the null space of the transpose of the stoichiometry matrix. For example the following session shows how we can use Scilab to compute the conservation laws for the example matrix we used in previous examples. -->N = [1 0 -1; -1 1 0; 0 1 -1; 0 -1 1] N = 1. 0. - 1. 0. - 1. 1. - 1. 1. 0. 0. 1. - 1. --> ns = kernel (N') ans = 0. 0.6324555 0. 
0.6324555 0.7071068 - 0.3162278 0.7071068 0.3162278 --> // Convert the orthonormal set --> // into a rational basis using rref -->rref (ns')' ans = 1. 0. 1. 0. 0. 1. 1. 1. The null space command in Scilab is kernel, in Matlab it is null and in Mathematical it is NullSpace. Like many null space commands implemented in mathematical software, the kernel command in Scilab has the drawback of generating an orthonormal set. In order to generate a rational basis we must row reduce the kernel, this results in a more interpretable set of conservation laws. In Matlab it is possible to use the modified null space command, null (N, 'r') which will automatically generate a rational basis (Neither Octave or Scilab support this format). Interestingly, Mathe- 116 CHAPTER 6. SPECIES CONSERVATION LAWS matica’s (7.0) null space function does generate a rational basis, however, the algorithm that Mathematica uses is unknown which raises its own issues. Given that we can now compute the conservation laws for arbitrary networks, the next question to consider is whether conservation laws have any behavioral consequences. 6.4 Summary Of particular interest is to compare these results with equations 6.14 and 6.15. Whereas the flux balance relationships are derived from the stoichiometry matrix, the moiety conservation laws are derived from the transpose of the stoichiometry matrix. Thus to summarize: Moiety Conservation Laws: NR L0 I D0 N0 Flux Balance Laws: I NDC NIC D0 K0 NT T D0 NR K D 0 6.5 Behavioral Consequences Conservation laws in general can have profound effects on the behavior of pathway models. Two broad categories can be described, constraints in the form of limiting changes to species and fluxes and behavioral enhancements in the form of new emergent behavior. As discussed by Eisenthal and Cornish-Bowden [15], many traditional drugs, for example pesticides and anti-pathogen agents, work by disrupting 6.5. 
either flux or metabolite levels to an extent that is harmful to the organism. This can be achieved by either reducing an important flux to unacceptably low levels or increasing the level of a metabolite to toxic proportions. Conservation constraints can impose hard limits on the extent to which a drug can influence species levels. This effect is separate from any kinetic constraints that may exist. Thus stoichiometric analysis is an important initial evaluation of whether manipulating a particular target might be effective or not. For an interesting example of these constraints in operation, the reader is referred to the work of Bakker et al. [3, 4] and also Cornish-Bowden, Eisenthal and Hofmeyr [8, 10].

More interesting is the ability of conservation cycles to enhance the behavioral properties of networks. We will now consider a series of example pathways where conservation laws can have a profound effect on behavior. We will start with a linear chain, a pathway that has no conservation laws but provides a useful reference case against which to compare subsequent examples.

Linear Chain

Consider a simple linear chain where the kinetics for each reaction follow simple first order mass-action kinetics. It is assumed that $X_o$ and $X_1$ are fixed.

\[ X_o \xrightarrow{\; v_1 = k_1 X_o \;} S_1 \xrightarrow{\; v_2 = k_2 S_1 \;} S_2 \xrightarrow{\; v_3 = k_3 S_2 \;} X_1 \]

We can investigate the steady state concentrations of $S_1$ and $S_2$ as a function of the rate constant, $k_1$. Figure 6.11 shows a typical steady state plot of $S_1$ and $S_2$ versus $k_1$. What is characteristic about this simulation is that the concentrations $S_1$ and $S_2$ show linear behavior in response to changes in $k_1$.

[Figure 6.11 plot: steady state concentration of $S_1$ and $S_2$ (vertical axis) versus $k_1$ (horizontal axis).]

Figure 6.11: Simulation of the simple linear chain as a function of $k_1$.
Model: Xo -> S1; k1*Xo; S1 -> S2; k2*S1; S2 -> X1; k3*S2; Xo = 1; k1=0.5; k2=1; k3=2

Simple Cycle with Linear Kinetics

Instead of a simple linear chain let us now consider a cycle such as the one shown in Figure 6.7. We will again assume that the kinetics governing each cycle arm are simple first order mass-action kinetics. If we plot the steady state concentration of $S_1$ and $S_2$ versus the kinetic constant $k_1$ we get the response curves shown in Figure 6.12. The responses for a cycle are quite different from those of a linear chain. The response curves are in fact hyperbolic. For example, $S_2$ rises linearly then levels off to 10 concentration units in the limit. What is happening here is that as $k_1$ increases more and more $S_1$ is converted to $S_2$, leading to a rise in $S_2$ and a fall in $S_1$. The limit is reached because there is only a limited amount of mass in the cycle. A simple conservation law has resulted in a change from linear to hyperbolic behavior even though the underlying kinetic laws are unchanged.

[Figure 6.12 plot: steady state concentration of $S_1$ and $S_2$ (vertical axis) versus $k_1$ (horizontal axis).]

Figure 6.12: Simulation of the simple cycle with linear kinetics. Plot shows the steady state concentration of each species as a function of $k_1$. Model: S1 -> S2; k1*S1; S2 -> S1; k2*S2; S1=10; k1=0.1; k2=0.4

Simple Cycle with Non-Linear Kinetics

If we now take the simple cycle model from the last section and replace the linear kinetics with non-linear kinetics, for example Michaelis-Menten kinetics on the forward and reverse arms, then additional changes in behavior will be observed. The response is now sigmoidal rather than hyperbolic. The reason for this is explained in Figure 6.14. The intersection points marked by a grey marker represent the corresponding steady state points ($v_1 = v_2$). A perpendicular dropped from these indicates the corresponding steady state concentration of $S_1$. If the activity of $v_1$ is increased by increasing $k_1$ by 20% then the $v_1$ curve moves up.
The left intersection point indicates how much the steady state concentration moves as a result, shown by $\Delta S$. The closer the steady state point is to the saturated point of the curve, the more the steady state will move. This shows that the response in $S_1$ can be very sensitive to changes in $k_1$. Because $k_1$ is a linear term in the rate law we could replace it with the concentration of the enzyme implied in the Michaelis-Menten law. In practice such a cycle could represent a phosphorylation/dephosphorylation cycle where the implied enzyme is now a kinase. The kinase in turn could be controlled by other processes so that changes in the kinase activity result in sigmoid (or switch-like) behavior in the cycle dynamics. In the literature such behavior is termed ultrasensitivity [20, 21] and has been observed experimentally [26].

[Figure 6.13 plot: steady state concentration of $S_1$ and $S_2$ (vertical axis) versus $k_1$ (horizontal axis, 0 to 1).]

Figure 6.13: Simulation of the simple cycle with non-linear kinetics illustrating sigmoid or ultrasensitive behavior. Model: S1 -> S2; k1*S1/(Km1+S1); S2 -> S1; k2*S2/(Km2+S2); S1=10; k1=0.1; Km1=0.5; k2=0.4; Km2=0.5

[Figure 6.14 plot: reaction rates $v_1$ and $v_2$ (vertical axis) versus $S_1$ (horizontal axis, 0 to 1), showing the $v_1$ curve for $k_1$ and for $k_1 + 20\%$, the $v_2$ curve, and the resulting shift $\Delta S$.]

Figure 6.14: Plot of the two cycle rates, $v_1$ and $v_2$, for the simple cycle with non-linear kinetics. Model: S1 -> S2; k1*S1/(Km1+S1); S2 -> S1; k2*S2/(Km2+S2); S1=1; k1=1; Km1=0.05; k2=1; Km2=0.05. The intersection points marked by a grey marker represent the steady state points ($v_1 = v_2$). See main text for explanation.

Dual Cycle

We can also consider double cycles such as the one shown in Figure 6.15. We can write out the stoichiometry matrix for the double cycle (rows $S_1$, $S_2$, $S_3$; columns $v_1$ to $v_4$) as:

\[ N = \begin{bmatrix} -1 & 1 & 0 & 0 \\ 1 & -1 & -1 & 1 \\ 0 & 0 & 1 & -1 \end{bmatrix} \]

From this it is possible to show that there is one conservation law given by the relation:
\[ S_1 + S_2 + S_3 = T \]

If we assume simple linear mass-action kinetics for each of the reactions, simulation will reveal that the concentration of $S_3$ shows sigmoid behavior with respect to the stimulus signal $S$. We can assume that the stimulus signal, $S$, operates on the rate constants $k_1$ and $k_3$ by the same factor, that is, an increase in $S$ by x% results in a change in $k_1$ and $k_3$ by x%. What is of interest is that we no longer need non-linear kinetics to generate sigmoidal behavior but can instead rely on only a small increase in the complexity of the conservation laws.

[Figure 6.15 diagram: $S_1$ and $S_2$ interconverted by $v_1$ and $v_2$; $S_2$ and $S_3$ interconverted by $v_3$ and $v_4$; the stimulus $S$ acts on $v_1$ and $v_3$.]

Figure 6.15: Two cycles in sequence. The rate laws for the steps are given by $v_1 = k_1 S_1$, $v_2 = k_2 S_2$, $v_3 = k_3 S_2$, $v_4 = k_4 S_3$. $S$ is the stimulus signal, which acts by increasing $k_1$ and $k_3$ by the same factor.

The Markevich Switch

The next example will illustrate a fairly complex set of interlinked conservation laws that leads to quite elaborate behavior. This system, first discovered by Kholodenko and co-workers, will be referred to as the Markevich Switch after the first author on the original paper [36].

The system involves a double cycle but with secondary sequestration effects occurring on the limbs. Figure 6.16 illustrates the full pathway. The model describes the conversion of $S_1$ to $S_3$ through two enzyme catalyzed reactions, $v_1$ and $v_2$. The individual catalytic cycles are made explicit in this model, that is, the binding of $S_1$ to enzyme $E_1$ to form a complex and its dissociation to form the product, $S_2$, are explicitly modeled. In addition there is the reverse conversion of $S_3$ back to $S_1$, again by a sequence of two enzyme catalyzed reactions, $v_3$ and $v_4$, again in explicit form. The stimulus, $S$, acts by adding more total $E_1$ to the upper limbs. This pathway has multiple conservation laws stemming from the two different enzymes and a separate substrate cycle.
These conservation laws include:

\[ S_1 + S_2 + S_3 + ES_1 + ES_2 + ES_3 + ES_4 = T_1 \tag{6.9} \]
\[ E_1 + ES_1 + ES_2 = T_2 \tag{6.10} \]
\[ E_2 + ES_3 + ES_4 = T_3 \tag{6.11} \]

Figure 6.17 illustrates graphically the three conservation laws. The behavior shown by the pathway is called bistability. That is, given a particular set of parameters, there exist three possible steady states, two stable and one unstable (sometimes called metastable). We can see this depicted in the steady state plot (Figure 6.18) that shows the concentration of $S_3$ versus total $E_1$ ($E_1 + ES_1 + ES_2$, equation 6.10). Over a certain range of total $E_1$, the curve shows three possible steady states: a high stable state, a low stable state and an intermediate unstable state (thin line in the graph). In principle the unstable state could be achieved and maintained indefinitely, but random fluctuations at the molecular level would move the network to one of the two stable steady states.

[Figure 6.16 diagram: species $S_1$, $S_2$, $S_3$; enzymes $E_1$, $E_2$; complexes $ES_1$ to $ES_4$; reactions $v_1$ to $v_4$; stimulus $S$.]

Figure 6.16: A complex interlinked set of conserved cycles that describes the Markevich switch [36]. $S$ controls the activity of the pathway by controlling the amount of total $E_1$.

The question is how does this come about? A major part of the answer lies in the constraints imposed by the conservation laws. Consider the following scenario. If the activity of the two forward limbs, $v_1$ and $v_2$, is increased, this will cause more $S_2$ and $S_3$ to be made. These changes have a number of consequences. To begin with, the additional $S_3$ will bind to more $E_2$ to form the complex $ES_3$. However, because $ES_3$ is linked by way of a conservation law (6.11) to the levels of $ES_4$ and $E_2$, these concentrations will decline. This effectively makes $S_3$ compete with $S_2$ for $E_2$. The result is that there is less $E_2$ to catalyze $v_4$, resulting in an effective inhibition of $v_4$ by $S_3$.
This kind of inhibition has been called apparent regulation because there is no direct molecular mechanism involved; it is simply an effect brought about by competitive sequestration. There are other factors in play here as well, for example the degree of saturation (see [36] for details); however, the constraints imposed by the conservation laws are critical to the observed bistability.

[Figure 6.17 diagram: three panels, numbered 1 to 3, each highlighting one conserved cycle of the pathway in Figure 6.16.]

Figure 6.17

Given that $S_2$ and $S_3$ have both increased, $S_1$ is likely to have decreased (6.9). If this is the case then there is less binding of $S_1$ to $E_1$. This results in a greater availability of $E_1$ which can be used to increase $v_2$. If we invert the logic here then we see that increases in $S_1$ will lead to decreases in $v_2$. This is another example of apparent regulation due to conservation law constraints, in this case equation (6.10). We can therefore redraw the pathway in a more simplified way as depicted in Figure 6.19. We can simplify this diagram even further by removing the central link, $S_2$, to give the diagram shown in Figure 6.20. This shows more clearly the opposing repression loops that surround the pathway.

In essence, what we have here is a toggle switch. Consider the states that can possibly exist in the pathway shown in Figure 6.20. If the concentration of $S_1$ is low then this relieves the inhibition on the forward limb, thus converting $S_1$ into $S_2$ and maintaining $S_1$ in the low state. $S_2$ is now at a higher concentration and its effect is to repress the lower limb. This state of affairs is therefore stable. If on the other hand we start $S_1$ at a high concentration, the reverse logic applies. The forward limb is now repressed, thus stabilizing $S_1$ at its high state. In contrast, $S_2$ must now be at a low concentration, where the repression it applies to the lower limb is released, thus stabilizing its low level.

[Figure 6.18 plot: $S_3$ (vertical axis, 0 to 0.8) versus total $E_1$ (horizontal axis, 0.1 to 0.6), with two turning points marked SN.]

Figure 6.18: Bifurcation plot illustrating bistability in the concentration of $S_3$ as a function of $E_1$. The symbol SN indicates a turning point, i.e. a change in stability. Thick lines represent stable branches and the thinner central line an unstable branch. Simulations were carried out with the Oscill8 Tool (oscill8.sf.net); the model was obtained from [45] as an SBML file via the BioModels Database (http://www.ebi.ac.uk/biomodels-main/).

[Figure 6.19 diagram: the double cycle $S_1$, $S_2$, $S_3$ with reactions $v_1$ to $v_4$ and two apparent regulatory loops.]

Figure 6.19: Two apparent regulatory loops in the Markevich pathway.

[Figure 6.20 diagram: $S_1$ and $S_3$ with opposing repression loops.]

Figure 6.20: A highly simplified version of the Markevich pathway showing the opposing repression loops that surround the pathway.

Sequestration Based Ultrasensitivity

To illustrate one last example where conservation laws contribute to new behavior, we will look at a very simple linear pathway where there is a dead-end leak caused by complex formation. The observed ultrasensitivity in response to a change in the stimulus signal originates from a combination of kinetic and conservation factors. Sigmoid behavior can be observed in both the free species, X, and the complex, XI. Saturation in the level of XI is due to a conservation law involving the I moiety. To achieve a saturating effect in X, the second step, $v_2$, should be modeled using a Michaelis-Menten rate law (itself based on a conservation law between free enzyme and enzyme-substrate complex) and the first step, $v_1$, should be reversible to ensure that a steady state exists at high stimulus levels (X would go to infinity otherwise). Figure 6.22 shows an example simulation that illustrates ultrasensitivity in a simple sequestration model.

6.6 Advanced Theory

In this section we will look at further aspects of conservation law analysis using a more formal approach.
In a later section we will also consider more advanced numerical methods for computing conservation laws.

[Figure 6.21 diagram: reactions $v_1$ to $v_4$ involving X, the inhibitor Inh and the complex XI.]

Figure 6.21: Simple Sequestration Model

[Figure 6.22 plot: concentration of XI and I (vertical axis) versus stimulus (horizontal axis, 0 to 50).]

Figure 6.22: Ultrasensitivity by Simple Sequestration: Xo -> X; stimulus*(k11*Xo - k12*X); X ->; k2*X/(X + Km); X + Inh -> XI; k3*X*Inh - k4*XI; Xo=1; k11=0.1; k12=0.5; k2=1; k3=0.5; k4=0.1; Inh=1; Km=0.001. Xo is fixed.

Let us begin by assuming that the rows of the stoichiometry matrix have been arranged so that the top $m_0$ rows are the independent rows and the bottom $m - m_0$ rows the dependent rows. If we designate the top rows with the symbol $N_R$ and the bottom rows with $N_0$ we can write the stoichiometry matrix as:

\[ N = \begin{bmatrix} N_R \\ N_0 \end{bmatrix} \]

where the submatrix $N_R$ is full rank, and each row of the submatrix $N_0$ is a linear combination of the rows of $N_R$. We can also reorder the columns of the stoichiometry matrix, of which there will also be $m_0$ independent columns (column and row ranks are equal). We will denote the partition of $N$ that contains the last $m_0$ columns the $N_C$ matrix. Finally we will designate the partition of $N$ that includes only the independent rows and columns the $N_{RC}$ matrix. The $N_{RC}$ matrix will be an $m_0 \times m_0$ square matrix, and it must be invertible because all its rows and columns are independent. The graphical depiction of this partitioning is given in Figure 6.23. If there are no conserved cycles in the network, then $\mathrm{rank}(N) = m$ (i.e. full rank) and $N$ equals $N_R$.

Following Reder [48], Ehlde [14] and Hofmeyr [24], we make the following construction. Since the rows of $N_0$ are linear combinations of the rows of $N_R$ we can define a link-zero matrix, $L_0$, which satisfies

\[ N_0 = L_0 N_R \tag{6.12} \]

$L_0$ will have dimensions $(m - m_0) \times m_0$.
We can combine $L_0$ with the identity matrix (of dimension $\mathrm{rank}(N)$) to form the $m \times m_0$ link matrix, $L$, thus:

\[ L = \begin{bmatrix} I \\ L_0 \end{bmatrix} \]

Using equation (6.12) and the link matrix we can write:

\[ N = \begin{bmatrix} N_R \\ N_0 \end{bmatrix} = \begin{bmatrix} I \\ L_0 \end{bmatrix} N_R = L N_R \]

For networks without conserved moieties $N$ has full rank and the $L$ matrix reduces to the identity matrix, $I$. If we delete the dependent columns of $N$ and $N_R$ we obtain:

\[ N_C = L N_{RC} \quad \text{or} \quad L = N_C N_{RC}^{-1} \]

[Figure 6.23 diagram: $N$ ($m$ rows, $n$ columns) partitioned into $N_R$ (top $m_0$ rows), $N_0$ (bottom rows), $N_C$ ($m_0$ columns) and $N_{RC}$ ($m_0 \times m_0$).]

Figure 6.23: Partitioning of the Stoichiometry Matrix into Four Fundamental Partitions.

By partitioning the stoichiometry matrix into a dependent and independent set we also partition the system equation. The full system equation which describes the dynamics of the network is thus:

\[ \frac{dS}{dt} = \begin{bmatrix} dS_i/dt \\ dS_d/dt \end{bmatrix} = \begin{bmatrix} I \\ L_0 \end{bmatrix} N_R v \]

where the terms $dS_i/dt$ and $dS_d/dt$ refer to the independent and dependent rates of change respectively. From the above equation, we see that

\[ \frac{dS_d}{dt} = L_0 \frac{dS_i}{dt} \]

Integrating this last equation, we find

\[ S_d(t) - S_d(0) = L_0 \left[ S_i(t) - S_i(0) \right] \]

for all time $t$. Introducing the constant vector $T = S_d(0) - L_0 S_i(0)$, we can write the above equation as

\[ \begin{bmatrix} -L_0 & I \end{bmatrix} \begin{bmatrix} S_i \\ S_d \end{bmatrix} = T \tag{6.13} \]

Recalling that $S = (S_i, S_d)$, we can express this concisely by introducing $\Gamma = [\,-L_0 \;\; I\,]$ and writing

\[ \Gamma S = T \]

We will call $\Gamma$ the conservation matrix; it is equivalent to the $Y$ matrix in equation 6.6. Each row of the conservation matrix relates to a particular conserved cycle and thus the number of rows indicates the number of conserved cycles in the network. The elements in a particular row indicate which metabolite species contribute to a particular cycle. The relationship $N_0 = L_0 N_R$ can be re-expressed in the following form:

\[ \begin{bmatrix} -L_0 & I \end{bmatrix} \begin{bmatrix} N_R \\ N_0 \end{bmatrix} = 0 \tag{6.14} \]

However, since the conservation matrix is $\Gamma = [\,-L_0 \;\; I\,]$, the above relation can be rewritten as $\Gamma N = 0$.
Taking the transpose of this gives us

\[ N^T \Gamma^T = 0 \tag{6.15} \]

We have already seen this equation in a previous section (6.3); it tells us that the conservation matrix is the null space of the transpose of the stoichiometry matrix. An equivalent way to state this is that the conservation matrix is the left null space of the stoichiometry matrix ($\Gamma N = 0$).

The significance of equation (6.15) is that there are many software tools that allow one to compute the null space very easily. For example Matlab, Mathematica, Maple, O-Matrix, Jarnac or Scilab can easily compute the null space of a matrix and thus derive the conservation laws. Some of these tools, however, for example Scilab and Matlab, return an orthonormal rather than a rational basis by default, so that a second stage is required, but this is easily accomplished with the command rref. Matlab has a variant on the null command, null (A, 'r'), which generates what is called a rational basis. In Scilab one would enter cm = rref (kernel (N')'). The final transpose that is applied is simply to reorientate the conservation matrix for better viewing. In Jarnac one would enter cm = tr (ns (tr (N))), and so on. One advantage of using Jarnac is that matrices are labeled with the reaction and species names, which allows the conservation matrix to be easily interpreted without having to manually identify the columns. In addition Jarnac can generate a labeled stoichiometry matrix directly from a model expressed in standard SBML.
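The left null space calculation can also be sketched without a numerical package. The following is a minimal illustration only, assuming Python with nothing beyond its standard library; the helper names `rref` and `conservation_matrix` are hypothetical and are not part of Scilab, Matlab or Jarnac. Rather than calling a null space routine, it row reduces the augmented matrix $[N \mid I]$ in exact rational arithmetic, which mirrors the $EN = [M; 0]$ construction of section 6.3: the rows whose $N$ part reduces to zero carry, in the identity part, a rational basis for the conservation matrix.

```python
from fractions import Fraction


def rref(rows):
    """Reduced row echelon form using exact rational arithmetic."""
    m = [[Fraction(x) for x in row] for row in rows]
    n_rows, n_cols = len(m), len(m[0])
    pivot_row = 0
    for col in range(n_cols):
        # Find a row at or below pivot_row with a nonzero entry in this column.
        pivot = next((r for r in range(pivot_row, n_rows) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[pivot_row], m[pivot] = m[pivot], m[pivot_row]
        scale = m[pivot_row][col]
        m[pivot_row] = [x / scale for x in m[pivot_row]]
        # Clear this column in every other row.
        for r in range(n_rows):
            if r != pivot_row and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[pivot_row])]
        pivot_row += 1
        if pivot_row == n_rows:
            break
    return m


def conservation_matrix(N):
    """Rows g with g N = 0, i.e. a rational basis of the left null space of N."""
    n_species, n_reactions = len(N), len(N[0])
    # Augment with the identity so the row operations are recorded: [N | I].
    augmented = [list(row) + [1 if i == j else 0 for j in range(n_species)]
                 for i, row in enumerate(N)]
    reduced = rref(augmented)
    # Rows whose N part is zero give the combinations of rows of N that vanish.
    return [row[n_reactions:] for row in reduced
            if all(x == 0 for x in row[:n_reactions])]


# Example matrix from the Scilab session in section 6.3.
N = [[1, 0, -1], [-1, 1, 0], [0, 1, -1], [0, -1, 1]]
gamma = conservation_matrix(N)
print([[str(x) for x in row] for row in gamma])  # rows [1, 1, 0, 1] and [0, 0, 1, 1]

# Check that gamma N = 0, the test described in section 6.3.
product = [[sum(g[i] * N[i][j] for i in range(len(N))) for j in range(len(N[0]))]
           for g in gamma]
print(all(x == 0 for row in product for x in row))
```

Because the arithmetic is exact, no tolerance is needed when deciding which rows are zero; for large networks a floating point method such as those in section 6.7 would be the more practical choice.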
Returning once again to the network shown in Figure 6.9, equation (6.13) can be rearranged so that the dependent species can be computed from the independent species, that is:

\[ S_d = L_0 S_i + T \tag{6.16} \]

The complete set of equations for this model is therefore:

\[ \begin{bmatrix} S_1 \\ E \end{bmatrix} = \begin{bmatrix} -1 & -1 \\ 0 & -1 \end{bmatrix} \begin{bmatrix} S_2 \\ ES \end{bmatrix} + \begin{bmatrix} T_1 \\ T_2 \end{bmatrix}, \qquad \begin{bmatrix} dS_2/dt \\ dES/dt \end{bmatrix} = \begin{bmatrix} 1 & 0 & -1 \\ 0 & -1 & 1 \end{bmatrix} \begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix} \tag{6.17} \]

Note that even though there appear to be four variables in this system, there are in fact only two independent variables, $\{ES, S_2\}$, and thus two differential equations and two linear constraints. When solving the system in time, only two differential equations need to be explicitly integrated.

Scaled L

In metabolic control analysis [30, 48, 17] the link matrix, $L$, plays a central role in formulating the sensitivities. In such cases the scaled version of $L$, denoted $\mathcal{L}$, is often used. $\mathcal{L}$ is defined as:

\[ \mathcal{L} = (D^{S})^{-1} L \, D^{S_I} \]

where $(D^{S})^{-1}$ is the diagonal matrix of the reciprocals of all the species concentrations and $D^{S_I}$ is the diagonal matrix of the independent species concentrations. For the previous example, $\mathcal{L}$ would be given by:

\[ \mathcal{L} = \begin{bmatrix} 1/S_2 & 0 & 0 & 0 \\ 0 & 1/ES & 0 & 0 \\ 0 & 0 & 1/S_1 & 0 \\ 0 & 0 & 0 & 1/E \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ -1 & -1 \\ 0 & -1 \end{bmatrix} \begin{bmatrix} S_2 & 0 \\ 0 & ES \end{bmatrix} \]

6.7 Numerical Methods

In a previous section (6.3), a simple method based on forward elimination was described that could be used to derive the conservation laws. This method has a number of advantages but for large matrices can be numerically unstable. In this section we will review alternative methods that, although not always as flexible as forward elimination, are well suited to the analysis of large matrices. These methods fall into two groups: three methods based on QR factorization and one method based on Singular Value Decomposition (SVD). The method based on SVD is the simplest and will be described first.

SVD

Singular Value Decomposition, or SVD, is a very useful method for decomposing a matrix into the four orthonormal fundamental subspaces.
These subspaces include the range and null space of the matrix and of its transpose. SVD is based on the following factorization:

\[ A = U S V^T \]

where $A$ is an $m \times n$ matrix of real numbers, $U$ is an $m \times m$ orthonormal matrix, $V$ is an $n \times n$ orthonormal matrix and $S$ an $m \times n$ diagonal matrix with entries $\sigma_1 \geq \sigma_2 \geq \ldots \geq \sigma_p$ where $p$ is the smaller of $m$ and $n$ ($p = \min\{m, n\}$). The numbers $\sigma_i$ are called the singular values and are non-negative. The columns of $U$ and $V$ form the left and right-hand singular vectors.

Of more interest here is the fact that the rows of $V^T$ which correspond to the zero singular values of $A$ form an orthonormal basis for the null space of $A$. Therefore one way to obtain the null space of a given matrix is to extract these lower rows from the $V^T$ matrix. The number of rows in $V^T$ that correspond to the null space vectors will equal $n - r$ where $r$ is the rank and $n$ the number of columns of $A$. If $A$ has full column rank ($r = n$) the null space contains only the zero vector.

Example 6.5

Obtain an estimate for the null space of the transpose of the following stoichiometry matrix using SVD. Since we will be working on the transpose, the null space vectors will represent the conservation laws.

\[ N = \begin{bmatrix} 1 & 0 & -1 \\ -1 & 1 & 0 \\ 0 & 1 & -1 \\ 0 & -1 & 1 \end{bmatrix} \]

Many math applications such as Scilab or Matlab have svd functions. Here we will use the svd function from Scilab.

    -->[U, S, V] = svd (N')
     V  =
      -0.316229  -0.707107   0.632456   0.
      -0.316229   0.707107   0.632456   0.
      -0.632456   0.        -0.316229   0.707107
       0.632456   0.         0.316229   0.707107
     S  =
       2.236068   0.          0.          0.
       0.         1.7320508   0.          0.
       0.         0.          1.587D-16   0.
     U  =
      -1.886D-16  -0.8164966   0.5773503
      -0.7071068   0.4082483   0.5773503
       0.7071068   0.4082483   0.5773503

We can extract the null space from $V^T$. Only two singular values are non-zero (to within numerical precision), so the rank is two and $n - r = 4 - 2 = 2$; we must therefore extract the bottom two rows of $V^T$. This gives us:

    -->Vt = V'
    -->kk = Vt(3:4,1:4)
     kk  =
       0.6324555   0.6324555  -0.3162278   0.3162278
       0.          0.          0.7071068   0.7071068

SVD returns an orthonormal basis; to generate a rational basis we apply row reduction to these two rows to yield:

    -->rref (kk)
     ans  =
       1.   1.   0.   1.
       0.   0.   1.   1.

The transpose of these two vectors is the null space of $N^T$. This can be confirmed by computing the product of $N^T$ with its null space and showing that the product equals zero:

\[ \begin{bmatrix} 1 & -1 & 0 & 0 \\ 0 & 1 & 1 & -1 \\ -1 & 0 & -1 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 1 & 0 \\ 0 & 1 \\ 1 & 1 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \\ 0 & 0 \end{bmatrix} \]

Because there are no row or column exchanges during SVD, the rows in the null space vectors correspond to the same rows in the original matrix, $N$. This makes it easy to identify the individual conservation entries in the conservation law vectors.

We can formalize the SVD algorithm using the following Scilab/Matlab code.

    // Use SVD to estimate conservation laws
    // Operate on the transpose of n
    [u, s, v] = svd (n');
    vt = v';
    r = rank (n);
    nRows = size(vt, 1);
    nCols = size(vt, 2);
    // Extract the bottom nRows-r orthonormal rows of vt
    orthogns = vt(r+1:nRows,1:nCols);
    // Row reduce, then transpose, to get a rational basis
    ratns = rref (orthogns)';
    // Display result
    ratns'
    // Confirm it is the null space, ns should equal 0
    ns = n'*ratns

Since there are no column or row exchanges during SVD, the order of the rows in the stoichiometry matrix can be used to influence the form of the final conservation laws. Just as with the row reduction technique, the order of rows in the stoichiometry matrix should be such that any shared species (i.e. species containing more than one moiety) are located as close to the bottom of the matrix as possible. This will ensure that negative terms will tend not to appear in the final conservation equations.

QR Factorization

The SVD method given in the last section is an excellent choice for determining the conservation laws. However, it has two downsides. The first is that it is far more computationally intensive than the simple row reduction technique described in 6.3.
The second problem with the SVD approach is the need to carry out a final Gauss-Jordan elimination to obtain a rational basis for the conservation laws. Depending on the size of the stoichiometry matrix, Gauss-Jordan elimination can be numerically unstable. Methods based on QR factorization have both excellent stability properties and are less computationally intensive than SVD.

The first QR method to describe is based on computing $L_0$. Any $m \times n$ matrix can be factored into a product of two matrices $Q$ and $R$ and a permutation matrix $P$:

\[ A P = Q R \]

$Q$ is an $m \times m$ orthogonal matrix, that is $Q^T Q = I$, $R$ is an $m \times n$ upper trapezoidal matrix and $P$ a permutation matrix. If $A$ is the transpose of the stoichiometry matrix, $N^T$, then the permutation matrix will reorder the columns of $N^T$ such that the independent columns are on the left and the dependent columns on the right. This is equivalent to reordering the rows in $N$. This partitioning can be written as follows, where $R$ has been partitioned to match the left side:

\[ Q^T \begin{bmatrix} N_R^T & N_0^T \end{bmatrix} = \begin{bmatrix} R_{11} & R_{12} \\ 0 & 0 \end{bmatrix} \]

Note that the permutation matrix has been absorbed into the reordered $N^T$ matrix during the reordering. If we multiply out the terms we obtain:

\[ \begin{bmatrix} R_{11} \\ 0 \end{bmatrix} = Q^T N_R^T, \qquad \begin{bmatrix} R_{12} \\ 0 \end{bmatrix} = Q^T N_0^T \]

Given that $N_0 = L_0 N_R$, and hence $N_0^T = N_R^T L_0^T$, the second expression can be rewritten as:

\[ \begin{bmatrix} R_{12} \\ 0 \end{bmatrix} = Q^T N_R^T L_0^T \]

so that

\[ \begin{bmatrix} R_{12} \\ 0 \end{bmatrix} = \begin{bmatrix} R_{11} \\ 0 \end{bmatrix} L_0^T \]

That is

\[ R_{12} = R_{11} L_0^T \]

Since the permutation matrix post-multiplies $N^T$, the columns are reordered; this is reflected in a column reordering in the $R$ matrix such that all independent columns are moved to the left and dependent columns to the right. Row reduction of the $R$ matrix to a reduced echelon form will therefore result in the left partition being transformed into the identity matrix, that is $R_{11} = I$. From this it follows that the reduced right partition is $R_{12} = L_0^T$, which is the result we seek.
\[ L_0 = R_{12}^T \]

By augmenting the $L_0$ matrix with an appropriately sized identity matrix we can use this method to generate conservation laws in the standard form, that is in the form $[\,-L_0 \;\; I\,]$. This also means that the rows of the stoichiometry matrix will have been reordered in the process, as determined by the permutation matrix obtained from the QR factorization. Therefore, unlike the row reduction technique or SVD, it is not possible to greatly influence the kind of conservation laws generated by presetting the row order of the stoichiometry matrix, although some flexibility still exists. It is still advantageous to make sure that all the shared species are in the bottom rows. The one potential problem with the method is the final Gauss-Jordan elimination; however, the reordering of the columns will make this less of an issue.

The Scilab/Matlab code below illustrates an implementation of this method. It is very important to note that the species labels attached to the columns of the conservation matrix are determined by the permutation matrix. This part of the calculation is not shown in the following code.

    // Use QR to estimate conservation laws via Lo
    // Operate on the transpose of n
    [qm, rm, p] = qr (n');
    nRows = size(n, 1);
    nCols = size(n, 2);
    mo = rank (n);   // number of independent rows
    m = size(n, 1);
    mmo = m - mo;    // number of dependent rows
    // Extract the top mo (non-zero) rows of R
    rt = rm(1:mo,1:nRows);
    // Row reduce to get a rational basis
    rrt = rref (rt);
    Lo = rrt(1:mo,mo+1:nRows)';
    // Display Lo
    Lo
    // Construct the conservation vectors and display
    cm = [-Lo eye(mmo,mmo)];
    cm

Example 6.6

Compute the $L_0$ matrix of the following stoichiometry matrix using QR factorization.

\[ N = \begin{bmatrix} 1 & 0 & -1 \\ -1 & 1 & 0 \\ 0 & 1 & -1 \\ 0 & -1 & 1 \end{bmatrix} \]

Many software tools offer standard QR factorization. In this example we use Scilab. QR factorization yields the following R matrix:

    R  =
       1.414217  -0.707107  -1.414217  -0.707107
       0.         1.224745   0.        -1.224745
       0.         0.         0.         0.
Since the rank of the stoichiometry matrix is 2, we extract the top two rows from R and carry out a complete row reduction (for example by using the rref() function) to yield:

    ans  =
       1.   0.  - 1.  - 1.
       0.   1.    0.  - 1.

The transpose of the $L_0$ matrix can be found in the top right corner starting at column $m_0 + 1$, where $m_0$ equals the number of independent rows in the original stoichiometry matrix. In this case $m_0$ equals 2, therefore the $L_0$ matrix (after transposition) is given by:

\[ L_0 = \begin{bmatrix} -1 & 0 \\ -1 & -1 \end{bmatrix} \]

We now combine the negative of this with the identity matrix to obtain the conservation vectors:

\[ \begin{bmatrix} 1 & 0 & 1 & 0 \\ 1 & 1 & 0 & 1 \end{bmatrix} \]

The only thing that remains is the species labeling for the conservation columns. These can be obtained from the original stoichiometry matrix and the permutation matrix, $P$. As returned by the QR factorization, $P$ is given by:

    P  =
       1.   0.   0.   0.
       0.   0.   1.   0.
       0.   1.   0.   0.
       0.   0.   0.   1.

and the original species order was $ES, E, S_1, S_2$. The permutation matrix shows that the new species order should be: $ES, S_1, E, S_2$.

The final QR method to consider is one based on rank revealing methods, sometimes called RRQR [7]. The algebra is described in a separate chapter but the method uses the following formula to estimate the null space:

\[ A P \begin{bmatrix} -R_{11}^{-1} R_{12} \\ I \end{bmatrix} = 0 \tag{6.18} \]

This approach is of interest because it generates a rational basis for the null space directly, owing to the identity matrix in the lower partition. The downside is that it requires an inversion of $R_{11}$, but since $R_{11}$ is triangular it is possible to exploit widely available and efficient routines for inverting such matrices.

Example 6.7

Use the RRQR based method to compute the null space for the transpose of the stoichiometry matrix:

\[ N = \begin{bmatrix} 1 & 0 & -1 \\ -1 & 1 & 0 \\ 0 & 1 & -1 \\ 0 & -1 & 1 \end{bmatrix} \]

From the last example we saw that QR factorization yielded the following R matrix:

    R  =
       1.414217  -0.707107  -1.414217  -0.707107
       0.         1.224745   0.        -1.224745
       0.         0.         0.         0.
Since the rank of the stoichiometry matrix is 2, we can partition R into the following submatrices:

\[ R_{11} = \begin{bmatrix} 1.414217 & -0.707107 \\ 0 & 1.224745 \end{bmatrix}, \qquad R_{12} = \begin{bmatrix} -1.414217 & -0.707107 \\ 0 & -1.224745 \end{bmatrix} \]

We now compute $-R_{11}^{-1} R_{12}$ to obtain:

\[ \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} \]

Combining this with an appropriately sized identity matrix gives the null space:

\[ \begin{bmatrix} 1 & 1 \\ 0 & 1 \\ 1 & 0 \\ 0 & 1 \end{bmatrix} \]

As with the previous method, we need to be aware of the permutation matrix as this will determine the labels that are associated with the rows of the null space.

There are also ways to obtain the conservation vectors via the $Q$ matrix and these are discussed in [63]. For a completely different approach to computing the conservation laws, the reader is referred to the work by Schuster and colleagues. In this work, convex analysis [57] is used to determine the conservation laws, primarily to generate conservation laws that contain (where possible) only positive entries.

Most modern simulation applications use either the simpler row reduction technique or, more commonly in recent years, the QR factorization technique based on estimating the $L_0$ matrix [63].

    Method          Advantages                             Disadvantages
    Row Reduction   a) Simple  b) Fast  c) Row order       Potential numerical instabilities
    SVD             a) Robust                              a) Expensive on large systems
                                                           b) Requires one final Gauss-Jordan step
    QR by L0        a) Robust  b) Faster than SVD          Requires one final Gauss-Jordan step
    QR by RRQR      a) Robust                              Row order
                    b) No Gauss-Jordan step required

Table 6.1: Comparison of different approaches to computing conservation laws.

6.8 Design of Simulation Software

One practical implication of moiety conservation concerns the design of software for simulation and analysis. Two issues arise: one concerns increasing simulation efficiency by reducing the number of differential equations and the second concerns numerical stability, which is improved by removing the dependent species from a model.
The rule to follow is that any metabolite likely to appear in more than one conservation relationship must be placed at the beginning of the DEC statement.

Reduced Systems

The first concern is straightforward: instead of solving the full set of system equations, many simulators solve the following reduced set:

\[ S_d = L_0 S_i + T \qquad \frac{dS_i}{dt} = N_R \, v(S_i, S_d) \tag{6.19} \]

In these equations, $S_i$ is the vector of independent species, $S_d$ the vector of dependent species, $L_0$ the link matrix, $T$ the total mass vector, $N_R$ the reduced stoichiometry matrix and $v$ the rate vector. This modified equation (6.19) constitutes the most general expression for a differential equation based temporal model [24, 23]. Equation 6.6 shows a typical reduced system. Note that in these equations the dependent species are first computed from the independent species. This is followed by the evaluation of the reduced set of differential equations. The order is crucial. The total amounts, $T$, can be computed at the start of a simulation by using equation 6.16 and the initial conditions. In multi-compartmental systems where the size of compartments may differ, it is important to sum the amounts, not the concentrations.

One obvious advantage of reducing the model is that it lessens the computational burden of solving the full set of differential equations. Many biochemical simulation packages will automatically check for moiety conservations and perform this simplification before performing any analysis of the system equations. This is especially important for large models. For example, in the E. coli model obtained from Palsson's web site at http://gcrg.ucsd.edu/organisms/ecoli.html, approximately five percent of the differential equations are redundant, that is, they can be safely eliminated from the model by using moiety conservation constraints.
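The order of evaluation in equation 6.19 can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: the model is a hypothetical two-species cycle in a single compartment of unit volume, with mass-action rates $v_1 = k_2 S_2$ (forming $S_1$) and $v_2 = k_1 S_1$ (consuming $S_1$), so that $S_1$ is the independent species, $L_0 = [-1]$ and $N_R = [1 \;\; -1]$. The function name, the parameter values and the fixed-step Runge-Kutta integrator are choices made for this sketch and are not taken from any simulation package.

```python
def simulate_reduced_cycle(k1, k2, s1_start, s2_start, t_end, steps):
    """Integrate only dS1/dt; S2 is recovered from the conservation law."""
    total = s1_start + s2_start      # T, computed once from the initial conditions
    link = -1.0                      # L0 for this cycle: S2 = -S1 + T

    def reduced_rate(s1):
        s2 = link * s1 + total       # step 1: dependent species from independent
        v1 = k2 * s2                 # S2 -> S1
        v2 = k1 * s1                 # S1 -> S2
        return v1 - v2               # step 2: N_R v, with N_R = [1, -1]

    h = t_end / steps
    s1 = s1_start
    for _ in range(steps):           # classical fourth-order Runge-Kutta step
        a = reduced_rate(s1)
        b = reduced_rate(s1 + 0.5 * h * a)
        c = reduced_rate(s1 + 0.5 * h * b)
        d = reduced_rate(s1 + h * c)
        s1 += h * (a + 2.0 * b + 2.0 * c + d) / 6.0
    return s1, link * s1 + total, total


s1_end, s2_end, total = simulate_reduced_cycle(
    k1=2.0, k2=1.0, s1_start=3.0, s2_start=0.0, t_end=10.0, steps=1000)
print(s1_end, s2_end, total)
```

For these parameter values the analytic steady state is $S_1 = k_2 T/(k_1 + k_2) = 1$ and $S_2 = 2$, and the simulation settles on it while integrating a single differential equation instead of two. Because $S_2$ is always computed from $S_1$ and $T$, the total cannot drift during the integration.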
Multicompartment Systems

Up to now we have not mentioned the fact that many models may include multiple compartments, that is, separate volume spaces where the movement of mass between volumes is via specific transporter proteins. The literature is not very clear or extensive in discussing the modeling of multicompartment systems; however, one crucial point to bear in mind when considering conservation laws that cross compartments is that the sum must be with respect to the total mass. For convenience, models will often assume a unit volume for a compartment such that any conserved cycles within the compartment are expressed as the sum of concentrations. In such situations it is easy to forget that what is actually conserved is mass, not concentration. In general a conservation law is therefore expressed in the form:

\[ \sum_i V_i S_i = T \]

where $V_i$ is the volume of the compartment in which species $S_i$ resides.

Numerical Stability

Although simplifying a model by eliminating the dependent species can offer speed improvements to simulations, the most important reason for model reduction is the gain in numerical stability. One of the most important metrics that arises often in the analysis of pathways (or any dynamical system for that matter) is the Jacobian matrix. The Jacobian matrix is an $m \times m$ matrix of partial derivatives of the rates of change with respect to the species, that is:

\[ J = \frac{\partial}{\partial S} \left( \frac{dS}{dt} \right) \]

For example, for a simple linear chain such as:

\[ X_o \xrightarrow{\; v_1 \;} S_1 \xrightarrow{\; v_2 \;} S_2 \xrightarrow{\; v_3 \;} X_1 \qquad v_1 = k_1 X_o, \quad v_2 = k_2 S_1, \quad v_3 = k_3 S_2 \]

the differential equations are given by:

\[ \frac{dS_1}{dt} = v_1 - v_2 \qquad \frac{dS_2}{dt} = v_2 - v_3 \]

The Jacobian matrix is then given by

\[ J = \begin{bmatrix} \dfrac{\partial (v_1 - v_2)}{\partial S_1} & \dfrac{\partial (v_1 - v_2)}{\partial S_2} \\[2ex] \dfrac{\partial (v_2 - v_3)}{\partial S_1} & \dfrac{\partial (v_2 - v_3)}{\partial S_2} \end{bmatrix} = \begin{bmatrix} -k_2 & 0 \\ k_2 & -k_3 \end{bmatrix} \]
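The Jacobian of the linear chain can also be estimated numerically, which is how a simulator would typically obtain it when analytic derivatives are not available. The Python sketch below uses forward differences; the function names, the step size and the parameter values ($X_o = 1$, $k_1 = 1$, $k_2 = 0.5$, $k_3 = 0.25$) are assumptions made for illustration only.

```python
def chain_rates_of_change(species, x_o=1.0, k1=1.0, k2=0.5, k3=0.25):
    """Return [dS1/dt, dS2/dt] for the chain Xo -> S1 -> S2 -> X1."""
    s1, s2 = species
    v1 = k1 * x_o
    v2 = k2 * s1
    v3 = k3 * s2
    return [v1 - v2, v2 - v3]


def numerical_jacobian(rhs, species, step=1e-6):
    """Forward-difference estimate of d(dS/dt)/dS, one column per species."""
    base = rhs(species)
    size = len(species)
    jacobian = [[0.0] * size for _ in range(size)]
    for col in range(size):
        bumped = list(species)
        bumped[col] += step              # perturb one species at a time
        shifted = rhs(bumped)
        for row in range(size):
            jacobian[row][col] = (shifted[row] - base[row]) / step
    return jacobian


# [2.0, 4.0] is the steady state for the parameter values chosen above.
J = numerical_jacobian(chain_rates_of_change, [2.0, 4.0])
determinant = J[0][0] * J[1][1] - J[0][1] * J[1][0]
for row in J:
    print(row)
print("determinant:", determinant)
```

Up to rounding, the estimate agrees with the analytic result, with $-k_2$ and $-k_3$ on the diagonal and $k_2$ below it. The determinant is $k_2 k_3$, which is nonzero whenever both rate constants are nonzero.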
The Jacobian is used in many ancillary calculations, for example, solving differential equations (particularly stiff equations), solving for the steady state, calculating sensitivities, frequency analysis, certain optimization algorithms and others. In many of these cases the calculation involves the inversion of the Jacobian. In the case of the linear pathway, there will always be an inverse so long as the rate constants are non-zero. However, if we consider a simple cycle such as the one shown below:

\[ S_1 \; \underset{v_1 = k_2 S_2}{\overset{v_2 = k_1 S_1}{\rightleftharpoons}} \; S_2 \]

then the Jacobian matrix is given by:

\[ J = \begin{bmatrix} \dfrac{\partial (v_1 - v_2)}{\partial S_1} & \dfrac{\partial (v_1 - v_2)}{\partial S_2} \\[2ex] \dfrac{\partial (v_2 - v_1)}{\partial S_1} & \dfrac{\partial (v_2 - v_1)}{\partial S_2} \end{bmatrix} = \begin{bmatrix} -k_1 & k_2 \\ k_1 & -k_2 \end{bmatrix} \]

This shows that the row dependencies in the stoichiometry matrix reappear as dependencies in the Jacobian: the second row is the negative of the first, so the determinant is zero. This means that the Jacobian cannot be inverted and any calculation that requires the inversion of the Jacobian will fail. The solution is to work with the reduced model; this eliminates the dependent species from the stoichiometry matrix, which in turn ensures that the Jacobian is once again invertible.

Exercises

1. The network depicted below has a single conservation law, $A + B + C + D = T$. Using the row reduction technique described in section 6.3, prove that this conservation law is true.

[Figure: network diagram involving species A, B, C and D and reactions v1 to v4]

2. Write a simple application in Matlab, Scilab or Octave to derive the conservation laws for an arbitrary stoichiometry matrix.

3. Using the application written in the previous question, determine the conservation laws for the following models and confirm that the conservation laws are true.

[Figure: network diagrams for models (a), (b) and (c), involving species A, B, C and D and reactions v1 to v4]

v1: A → B
v2: B + C → A + D
v3: D → C
4. Carry out a simulation that illustrates the high sensitivity seen in a simple conserved cycle that uses saturable Michaelis-Menten rate laws (see Figure 6.13).

Math Practice

1. Row reduce the following matrices to reduced echelon form.

[Four matrices; entries not legible in the source]

References

[1] R Albert. Scale-free networks in cell biology. J Cell Sci, 118(Pt 21):4947–4957, Nov 2005.
[2] Eric Alm and Adam P Arkin. Biological networks. Current Opinion in Structural Biology, 13(2):193–202, 2003.
[3] B. M. Bakker, P. A. M. Michels, F. R. Opperdoes, and H. V. Westerhoff. What controls glycolysis in bloodstream form Trypanosoma brucei. J. Biol. Chem., 274:14551–14559, 1999.
[4] B. M. Bakker, H. V. Westerhoff, F. R. Opperdoes, and P. A. M. Michels. Metabolic control analysis of glycolysis in trypanosomes as an approach to improve selectivity and effectiveness of drugs. Mol. Biochem. Parasitology, 106:1–10, 2000.
[5] A L Barabási and Z N Oltvai. Network biology: understanding the cell’s functional organization. Nat Rev Genet, 5(2):101–113, Feb 2004.
[6] Thomas W. Binsl, Katharine M Mullen, Ivo H.M. van Stokkum, Jaap Heringa, and Johannes H.G.M. van Beek. Fluxsimulator: An R package to simulate isotopomer distributions in metabolic networks. Journal of Statistical Software, 18(7):1–17, Jan 2007.
[7] T.F. Chan and P.C. Hansen. Some Applications of the Rank Revealing QR Factorization. SIAM Journal on Scientific and Statistical Computing, 13:727, 1992.
[8] A. Cornish-Bowden and R. Eisenthal. Computer simulation as a tool for studying metabolism and drug design. In A. Cornish-Bowden and M. L. Cardenas, editors, Technological and Medical Implications of Metabolic Control Analysis, pages 165–172. Kluwer Academic Publishers, Dordrecht, The Netherlands, 2000.
[9] A. Cornish-Bowden, J. Hofmeyr, and M. Cardenas. Stoicheiometric analysis in studies of metabolism.
Biochemical Society Transactions, 30:43–47, 2002.
[10] A. Cornish-Bowden and J.-H. S. Hofmeyr. The role of stoichiometric analysis in studies of metabolism: An example. J. theor. Biol, 216:179–191, 2002.
[11] Marjo de Graauw, editor. Phospho-Proteomics, volume 527 of Methods in Molecular Biology. Humana Press, 2009.
[12] Y. Deville, D. Gilbert, J. van Helden, and S.J. Wodak. An overview of data models for the analysis of biochemical pathways. Briefings in Bioinformatics, 4(3):246–259, 2003.
[13] R. C. Dickson and M. D. Mendenhall, editors. Signal Transduction Protocols, volume 284 of Methods in Molecular Biology. Humana Press, 2nd edition, 2004.
[14] M. Ehlde and G. Zacchi. A general formalism for metabolic control analysis. Chemical Engineering Science, 52:2599–2606(8), 1997.
[15] R. Eisenthal and A. Cornish-Bowden. Prospects for antiparasitic drugs: the case of Trypanosoma brucei, the causative agent of African sleeping sickness. Journal of Biological Chemistry, 273(10):5500–5505, 1998.
[16] D. A. Fell and J. R. Small. Fat synthesis in adipose tissue: an examination of stoichiometric constraints. Biochem. J., 238:781–786, 1986.
[17] D.A. Fell. Understanding the Control of Metabolism. Portland Press, London, 1997.
[18] S Fields and O Song. A novel genetic system to detect protein-protein interactions. Nature, 340(6230):245–246, 1989.
[19] Anne-Claude Gavin, Patrick Aloy, Paola Grandi, Roland Krause, Markus Boesche, Martina Marzioch, Christina Rau, Lars Juhl Jensen, Sonja Bastuck, Birgit Dümpelfeld, Angela Edelmann, Marie-Anne Heurtier, Verena Hoffman, Christian Hoefert, Karin Klein, Manuela Hudak, Anne-Marie Michon, Malgorzata Schelder, Markus Schirle, Marita Remor, Tatjana Rudi, Sean Hooper, Andreas Bauer, Tewis Bouwmeester, Georg Casari, Gerard Drewes, Gitte Neubauer, Jens M Rick, Bernhard Kuster, Peer Bork, Robert B Russell, and Giulio Superti-Furga. Proteome survey reveals modularity of the yeast cell machinery.
Nature, 440(7084):631–636, Mar 2006.
[20] A. Goldbeter and D. E. Koshland. An amplified sensitivity arising from covalent modification in biological systems. Proc. Natl. Acad. Sci, 78:6840–6844, 1981.
[21] A. Goldbeter and D. E. Koshland. Ultrasensitivity in biochemical systems controlled by covalent modification. Interplay between zero-order and multistep effects. J. Biol. Chem., 259:14441–7, 1984.
[22] C.S. Goodyear and G.J. Silverman. Phage-Display Methodology for the Study of Protein-Protein Interactions: Overview. Cold Spring Harbor Protocols, 2008(9), 2008.
[23] R. Heinrich and S Schuster. The Regulation of Cellular Systems. Chapman and Hall, 1996.
[24] J.-H. S. Hofmeyr. Metabolic control analysis in a nutshell. In Proceedings of the Second International Conference on Systems Biology. Caltech, 2001.
[25] AB Horne, TC Hodgman, HD Spence, and AR Dalby. Constructing an enzyme-centric view of metabolism. Bioinformatics, 20(13):2050–2055, 2004.
[26] C. F. Huang and J. E. Ferrell. Ultrasensitivity in the mitogen-activated protein kinase cascade. Proc. Natl. Acad. Sci, 93:10078–10083, 1996.
[27] J. L. Ingraham. Growth of the Bacterial Cell. Sinauer Associates Inc, 1983.
[28] T. Ito, T. Chiba, R. Ozawa, M. Yoshida, M. Hattori, and Y. Sakaki. A comprehensive two-hybrid analysis to explore the yeast protein interactome. Proc Natl Acad Sci U S A, 98(8):4569–4574, Apr 2001.
[29] H. Jeong, S. P. Mason, A. L. Barabási, and Z. N. Oltvai. Lethality and centrality in protein networks. Nature, 411(6833):41–42, May 2001.
[30] H. Kacser and J. A. Burns. The control of flux. In D. D. Davies, editor, Rate Control of Biological Processes, volume 27 of Symp. Soc. Exp. Biol., pages 65–104. Cambridge University Press, 1973.
[31] P. D. Karp, I. M. Keseler, A. Shearer, M. Latendresse, M. Krummenacker, S. M. Paley, I. Paulsen, J. Collado-Vides, S. Gama-Castro, M. Peralta-Gil, A. Santos-Zavaleta, M. I. Peñaloza-Spínola, C. Bonavides-Martinez, and J. Ingraham.
Multidimensional annotation of the Escherichia coli K-12 genome. Nucleic Acids Res, 35(22):7577–7590, 2007.
[32] K J Kauffman, P Prakash, and J S Edwards. Advances in flux balance analysis. Curr Opin Biotechnol, 14(5):491–496, Oct 2003.
[33] Nevan J Krogan, Gerard Cagney, Haiyuan Yu, Gouqing Zhong, Xinghua Guo, Alexandr Ignatchenko, Joyce Li, Shuye Pu, Nira Datta, Aaron P Tikuisis, Thanuja Punna, José M Peregrín-Alvarez, Michael Shales, Xin Zhang, Michael Davey, Mark D Robinson, Alberto Paccanaro, James E Bray, Anthony Sheung, Bryan Beattie, Dawn P Richards, Veronica Canadien, Atanas Lalev, Frank Mena, Peter Wong, Andrei Starostine, Myra M Canete, James Vlasblom, Samuel Wu, Chris Orsi, Sean R Collins, Shamanta Chandran, Robin Haw, Jennifer J Rilstone, Kiran Gandi, Natalie J Thompson, Gabe Musso, Peter St Onge, Shaun Ghanny, Mandy H Y Lam, Gareth Butland, Amin M Altaf-Ul, Shigehiko Kanaya, Ali Shilatifard, Erin O’Shea, Jonathan S Weissman, C. James Ingles, Timothy R Hughes, John Parkinson, Mark Gerstein, Shoshana J Wodak, Andrew Emili, and Jack F Greenblatt. Global landscape of protein complexes in the yeast Saccharomyces cerevisiae. Nature, 440(7084):637–643, Mar 2006.
[34] Vincent Lacroix, Ludovic Cottret, Patricia Thébault, and Marie-France Sagot. An introduction to metabolic networks and their structural analysis. Computational Biology and Bioinformatics, IEEE/ACM Transactions on, 5(4):594–617, 2008.
[35] I.G. Libourel and Y. Shachar-Hill. Metabolic Flux Analysis in Plants: From Intelligent Design to Rational Engineering. Annu Rev Plant Biol, pages 625–650, Feb 2008.
[36] N. I Markevich, J B Hoek, and B. N. Kholodenko. Signaling switches and bistability arising from multisite phosphorylation in protein kinase cascades. J. Cell Biol., 164:353–9, 2004.
[37] M. E. J. Newman. The structure and function of complex networks. SIAM Review, 45(2):167–256, 2003.
[38] M.E.J. Newman. Power laws, Pareto distributions and Zipf’s law.
Contemporary Physics, 46(5):323, 2005.
[39] B. G. Olivier, J. M. Rohwer, and J. H. Hofmeyr. Modelling cellular systems with PySCeS. Bioinformatics, 21:560–1, 2005.
[40] B. O. Palsson. Systems Biology: Properties of Reconstructed Networks. Cambridge University Press, 2007.
[41] J. A. Papin, N. D. Price, S. J. Wiback, D. A. Fell, and B. O. Palsson. Metabolic pathways in the post-genome era. Trends Biochem Sci, 28:250–8, 2003.
[42] D.J.M.J.R. PARK. Positive compositional algorithms in chemical reaction systems. Computers & chemistry, 12(2):175–188, 1988.
[43] S Petersen, A A de Graaf, L Eggeling, M Möllney, W Wiechert, and H Sahm. In vivo quantification of parallel and bidirectional fluxes in the anaplerosis of Corynebacterium glutamicum. J Biol Chem, 275(46):35932–35941, Nov 2000.
[44] E. Phizicky, P.I.H. Bastiaens, H. Zhu, M. Snyder, and S. Fields. Protein analysis on a proteomic scale. Nature, 422(6928):208–215, 2003.
[45] L. Qiao, R.B. Nachbar, I.G. Kevrekidis, S.Y. Shvartsman, and A. Asthagiri. Bistability and oscillations in the Huang-Ferrell model of MAPK signaling. PLoS Comput Biol, 3(9):e184, 2007.
[46] Karthik Raman, Preethi Rajagopalan, and Nagasuma Chandra. Flux balance analysis of mycolic acid pathway: targets for anti-tubercular drugs. PLoS Comput Biol, 1(5):e46, Oct 2005.
[47] R G Ratcliffe and Y Shachar-Hill. Measuring multiple fluxes through plant metabolic networks. Plant J, 45(4):490–511, Feb 2006.
[48] C. Reder. Metabolic control theory: A structural approach. J. Theor. Biol., 135:175–201, 1988.
[49] J. G. Reich and E. E. Selkov. Energy metabolism of the cell. Academic Press, London, 1981.
[50] R Rios-Estepa and B M Lange. Experimental and mathematical approaches to modeling plant metabolic networks. Phytochemistry, 68(16-18):2351–2374, Aug-Sep 2007.
[51] D K Ro, E M Paradise, M Ouellet, K J Fisher, K L Newman, J M Ndungu, K A Ho, R A Eachus, T S Ham, J Kirby, M C Chang, S T Withers, Y Shiba, R Sarpong, and J D Keasling.
Production of the antimalarial drug precursor artemisinic acid in engineered yeast. Nature, 440(7086):940–943, Apr 2006.
[52] H. M. Sauro and D. A. Fell. Scamp: A metabolic simulator and control analysis program. Mathl. Comput. Modelling, 15:15–28, 1991.
[53] J. M. Savinell and B. O. Palsson. Network analysis of intermediary metabolism using linear optimization. I. Development of mathematical formalism. J Theor Biol, 154(4):421–454, Feb 1992.
[54] K. Schmidt and SH Isaacs. An evolutionary algorithm for initial state and parameter estimation in complex biochemical models. Proceedings of the sixth international conference on computer applications in biotechnology, Garmisch-Partenkirchen, Germany, pages 239–242, 1995.
[55] S Schuster, T Dandekar, and D A Fell. Detection of elementary flux modes in biochemical networks: a promising tool for pathway analysis and metabolic engineering. Trends Biotechnol, 17(2):53–60, Feb 1999.
[56] S. Schuster, D. A. Fell, and T. Dandekar. A general definition of metabolic pathways useful for systematic organization and analysis of complex metabolic networks. Nature Biotechnology, 18:326–332, 2000.
[57] S. Schuster and T. Hofer. Determining all extreme semi-positive conservation relations in chemical reaction systems: a test criterion for conservativity. J. Chem. Soc. Faraday Trans., 87:2561–2566, 1991.
[58] J Schwender. Metabolic flux analysis as a tool in metabolic engineering of plants. Curr Opin Biotechnol, 19(2):131–137, Apr 2008.
[59] G P Smith. Filamentous fusion phage: novel expression vectors that display cloned antigens on the virion surface. Science, 228(4705):1315–1317, Jun 1985.
[60] G. N. Stephanopoulos and A. A. Aristidou. Metabolic Engineering: Principles and Methodologies. Academic Press, 1998.
[61] Cong Trinh, Aaron Wlaschin, and Friedrich Srienc. Elementary mode analysis: a useful metabolic pathway analysis tool for characterizing cellular metabolism.
Applied Microbiology and Biotechnology, 81:813–826, 2009. 10.1007/s00253-008-1770-1.
[62] P. Uetz, L. Giot, G. Cagney, T. A. Mansfield, R. S. Judson, J. R. Knight, D. Lockshon, V. Narayan, M. Srinivasan, P. Pochart, A. Qureshi-Emili, Y. Li, B. Godwin, D. Conover, T. Kalbfleisch, G. Vijayadamodar, M. Yang, M. Johnston, S. Fields, and J. M. Rothberg. A comprehensive analysis of protein-protein interactions in Saccharomyces cerevisiae. Nature, 403(6770):623–627, Feb 2000.
[63] R R Vallabhajosyula, V Chickarmane, and H M Sauro. Conservation analysis of large biochemical networks. Bioinformatics, 22(3):346–353, Feb 2006.
[64] A. Varma and B. O. Palsson. Metabolic capabilities of Escherichia coli: I. Synthesis of biosynthetic precursors and cofactors. Journal of Theoretical Biology, 165(4):477–502, 1993.
[65] A. Varma and B. O. Palsson. Metabolic capabilities of Escherichia coli: II. Optimal growth patterns. Journal of Theoretical Biology, 165(4):503–522, 1993.
[66] A. Wagner and D. A. Fell. The small world inside large metabolic networks. Proceedings of the Royal Society B: Biological Sciences, 268(1478):1803–1810, 2001.
[67] M. R. Watson. Metabolic maps for the Apple II. Biochem. Soc. Trans, 12(6):1093–1094, 1984.
[68] M. R. Watson. A discrete model of bacterial metabolism. Comput Appl Biosci, 2(1):23–27, 1986.
[69] M Weitzel, W Wiechert, and K Nöh. The topology of metabolic isotope labeling networks. BMC Bioinformatics, 8:315–315, 2007.
[70] W Wiechert. 13C metabolic flux analysis. Metab Eng, 3(3):195–206, Jul 2001.
[71] W. Wiechert. A gentle introduction to 13C metabolic flux analysis. Genet. Eng, 24, 2001.
[72] W Wiechert, M Möllney, N Isermann, M Wurzel, and A A de Graaf. Bidirectional reaction steps in metabolic networks: III. Explicit solution and analysis of isotopomer labeling systems. Biotechnol Bioeng, 66(2):69–85, 1999.
[73] W Wiechert, M Möllney, S Petersen, and A A de Graaf. A universal framework for 13C metabolic flux analysis.
Metab Eng, 3(3):265–283, Jul 2001.
[74] W Wiechert and K Nöh. From stationary to instationary metabolic flux analysis. Adv Biochem Eng Biotechnol, 92:145–172, 2005.
[75] J. Yang, S. Wongsa, V. Kadirkamanathan, S.A. Billings, and P.C. Wright. Metabolic Flux Estimation - A Self-Adaptive Evolutionary Algorithm with Singular Value Decomposition. IEEE/ACM Transactions on Computational Biology and Bioinformatics, pages 126–138, 2007.
[76] N Zamboni, E Fischer, and U Sauer. Fiatflux - a software for metabolic flux analysis from 13C-glucose experiments. BMC Bioinformatics, 6:209–209, 2005.

History

1. VERSION: 0.9
   Date: 2011-01-6
   Author(s): Herbert M. Sauro
   Title: Structural and Behavioral Properties of Biochemical Networks
   Modification(s): Initial Version