Metabolic Network

Shiyi Wang
Supervised by Dr Jamie Wood

Dissertation submitted for the MSc in Mathematics with Modern Applications, Department of Mathematics, University of York, UK. Submitted in 08/2011.

Abstract

Cellular metabolism comprises the physical and chemical processes that are essential for the maintenance of life. A metabolic network for a specific organism contains all metabolic reactions occurring within its living cells. With the rapid development of genomics and the success of various genome projects, biologists have deciphered the genome sequences of many organisms, and the metabolic networks of these organisms can be faithfully reconstructed from the available genome information. The analysis of metabolic networks has therefore become essential for further studies. Such analysis helps us to understand the topology and biological functions of different organisms, and hence enables us to exploit cellular metabolic processes in fermentation technology, the pharmaceutical industry and agriculture. Successful developments in these areas would not only benefit the economy, but also allow us to understand or even influence biological evolution.

The goal of this thesis is to study two methods of metabolic network flux analysis. First, the basic framework and properties of metabolic networks are reviewed. The study then focuses on methods for analyzing the flux distribution with mathematical models. The first method is flux balance analysis, a commonly used method for estimating the fluxes through a metabolic network. The network is modeled as a linear programming problem with stoichiometric constraints and solved by the simplex algorithm. The second method is elementary flux mode analysis.
Elementary flux mode analysis is a pathway analysis method in which each mode determines a pathway in the metabolic network. It predicts the flux through each pathway and how much each pathway contributes to the whole network. The algorithm for computing the elementary flux mode matrix is introduced, together with an improved method based on Shannon's maximum entropy principle. Finally, the two methods are applied to predict the flux distribution of the tricarboxylic acid cycle, the glyoxylate shunt and the adjacent amino acid network, with glucose and acetate as uptakes.

Content

Abstract
Content
1. Introduction
1.1 Background of metabolic network research
1.2 Basic properties of metabolic network
1.3 Metabolic network analysis
2. Methods of Metabolic network flux analysis
2.1 Glossary
2.2 Flux Balance Analysis
2.21 Introduction
2.22 Flux Balance Analysis procedure
2.23 Limits of FBA method
2.24 Extension of the FBA method
2.3 Linear programming
2.31 Introduction of linear programming and simplex algorithm
2.32 Basic concept of linear programming
2.33 Simplex algorithm
2.34 Numerical examples
2.4 Elementary Flux Modes Analysis
2.41 Introduction
2.42 Mathematics behind EFMs
2.43 Calculate EFM matrix
2.44 Calculation of flux in elementary modes
2.45 Shannon's MEP for elementary mode analysis
2.46 Summary
3. Numerical results
3.1 Flux balance analysis
3.2 Elementary flux modes analysis with Shannon's MEP
3.3 Comparison of two methods
4. Conclusion
Reference
Appendix

1. Introduction

1.1 Background of metabolic network research

With the success of the Human Genome Project over the past two decades, the focus of biological research has gradually shifted from the study of individual genes or proteins inside a cell toward the study of whole genomes. With more and better knowledge of genome sequences, the various omics fields, such as genomics, transcriptomics (mRNA) and proteomics, have received increasing attention and support. In these circumstances, the concept of the metabolome was proposed [1]. Metabolomics is a branch of systems biology which aims to identify and quantify all metabolites in an organism sample under specified living conditions. Strictly speaking, the metabolome refers to all metabolites in an organism or cell.

Metabolism is essential to the processes of life. It consists mainly of anabolism and catabolism. Anabolism refers to the process by which organisms convert nutrients absorbed from the external environment into their own components and store energy. Catabolism, on the other hand, refers to the process by which organisms break down their own components, release energy and excrete the end products of the decomposition.
These processes, together with the necessary enzymes, produce all the major constituents of the cell. A metabolic network is an abstract representation of cell metabolism that maps all biochemical reactions of a cell or organism onto a network: each metabolite is a node, and the reactions are the links or pathways that connect the nodes. The network reflects the interactions between all compounds as well as the enzymes involved in the metabolic processes.

Over the past decade, the whole genomes of hundreds of species have been sequenced, and genome annotation has improved significantly. This enables us to reconstruct a faithful metabolic network for those species from the available genome information. Metabolic network analysis is a successful way of predicting the metabolic phenotype of an organism from its metabolic genotype under particular conditions, which provides a better understanding of cellular metabolic processes and of the evolution of life. Predicting the functions of a metabolic network has therefore become one of the most important tasks nowadays [2-3].

1.2 Basic properties of metabolic network

1. Small-world

In 1998, Watts and Strogatz discovered the small-world property of networks: a network with a large clustering coefficient and a small average distance is said to be a small-world network [4]. Wagner and coworkers studied 287 reactions in E. coli and found that the average path length of the network is 3.8 [5]. Ma studied the metabolic networks of 80 organisms and found an overall average path length of 8.2 [6]. These studies show that metabolic networks have the small-world property. The short path lengths imply that a local perturbation of metabolites can spread to the whole network very quickly.

2. Scale-free

A scale-free network is a network whose node degrees follow a power-law distribution.
Salas and coworkers showed that metabolic networks are scale-free by studying different metabolic networks of animals, plants and microbes [7].

3. Robustness and redundancy

All organisms are capable of maintaining a dynamic equilibrium, and many can grow like the wild type even when certain genes are knocked out. Robustness is usually defined as the insensitivity of a system to parameter changes. In structural robustness analysis, the question is whether a cell can tolerate the loss or deactivation of an enzyme. Structural robustness and redundancy usually go together. Without redundancy, a cell would lose its general function when a pathway is cut or an enzyme is deactivated, because no alternative pathway is available. In many cells, however, there are parallel (redundant) pathways, which allow the cell to work normally when such events occur.

1.3 Metabolic network analysis

Over the past decade, with the growing interest in understanding biochemical networks, experimental techniques such as isotopic tracers have improved significantly, especially through the application of nuclear magnetic resonance technology to biological systems. However, these techniques are still either not powerful enough to determine the whole network or too expensive to conduct. Alternative estimation methods have therefore been proposed, such as metabolic network flux analysis. Metabolic network analysis predicts the metabolic phenotype, which gives a good picture of what is happening in an organism and how it works under different external conditions. Depending on the question asked, several network analysis methods are available that help us to understand the metabolic process, for example statistical clustering analysis and stoichiometric network analysis.
Table 1.31. Comparison of methods, adapted from "Network-based Metabolic Flux and Structure Analysis" [8].

Columns: constraints incorporated (stoichiometry, thermodynamics, reaction capacities, quasi steady state, optimality); computational demand; flux distributions; applications (optimal function, importance of reactions, robustness / flexibility, pathway lengths, correlated reactions, network functionality, operational pathways). For each approach, the values are listed in the order in which they appear in the table.

Flux Balance Analysis: constraints Yes, Yes, Yes, Yes, No; computational demand Low; flux distributions Single; applications No, Yes, (Yes), No, No, Yes, (Yes).
Elementary Flux Modes: constraints Yes, Yes, Yes, No, No; computational demand High; flux distributions All; applications Yes, Yes, Yes, Yes, Yes, Yes, Yes.
Metabolic Flux Analysis: constraints Yes, (Yes), Yes, (Yes), No; computational demand Low; flux distributions Single; applications No, No, (Yes), No, No, No, No.
Minimization of Metabolic Adjustment: constraints Yes, Yes, Yes, Yes, Yes; computational demand Medium; flux distributions Single; applications No, Yes, (Yes), No, No, Yes, (Yes).
Extreme Pathways: constraints Yes, Yes, Yes, No, No; computational demand High; flux distributions All; applications (Yes), (Yes), (Yes), (Yes), (Yes), (Yes), (Yes).
Graph theory: constraints Topology, Possible, No, No, No; computational demand Low; flux distributions None; applications No, No, (Yes), (No), (Yes), (Yes), (Yes).
Conservation relations: constraints Yes, No, No, No, No; computational demand Low; flux distributions None; applications No, No, No, No, No, No, No.
Null-space Analysis (via Kernel Matrix): constraints Yes, No, No, No, No; computational demand Low; flux distributions None; applications No, No, No, (Yes), No, (Yes), No.

In this thesis, the goal is to analyze the flux distribution of the metabolic network. The table above shows some of the available methods and their applications [8]. The purpose of studying the flux distribution of a metabolic network is to understand how the fluxes flow through the network, which in turn enables us to regulate them. This would allow us to achieve the following goals [9]:

1. Improvement of productivity: maximizing the yield of the desired product.
2. Elimination or reduction of by-products. In many processes, by-products consume available nutrients and make the product less pure, and in some cases even poisonous, which is very important in pharmacology.

The desire to understand these biological functions and to exploit them has attracted a lot of attention over the past decade, especially for industrial and medical purposes.
Flux balance analysis and elementary mode analysis are two of the most commonly used methods in metabolic network flux analysis; which one is chosen depends on the available resources and the demands of the study. Full details of these methods and their mathematical concepts are given in the next section.

2. Methods of Metabolic network flux analysis

2.1 Glossary:

1. Flux: the reaction rate of a certain reaction when the metabolic network is in quasi steady-state [9].

2. Reaction rate: the amount of chemical substrate consumed and the amount of product formed by a reaction per unit time.

3. Steady-state: the state in which the concentration of every metabolite does not change [9].

4. Stoichiometric matrix: the matrix containing the stoichiometric coefficients of every chemical in every reaction [9]. For a given network, all metabolites that take part in any reaction are entered in the stoichiometric matrix, where the rows represent the metabolites of the network and the columns correspond to the reactions.

Example (1): consider the simple network below [49].

Figure 2.11. A simple metabolic network.

The stoichiometric matrix can be read from the network:

      R1   R2   R3   R4   R5   R6   R7   R8   R9
A      1   -1   -1    0    0    0    0    0    0
B      0    1    0   -1    0   -2    0    0    0
C      0    0    1    0    0    1   -1    0    0
D      0    0    0    2    0    0    1   -1    0
E      0    0    0   -1    1    0    0    0    0
F      0    0    0    0    0    1    0    0   -1

Table 1.1. The stoichiometric matrix of the simple metabolic network example.

5. A mode is a flux vector $v \in \mathbb{R}^m$ that keeps the system in steady state, that is, a vector $v \ge 0$ such that $Sv = 0$, where $S$ is the stoichiometric matrix [28].

6. A mode $e \ne 0$ is an elementary mode if its support is minimal, that is, if there is no other mode $w \ne 0$ such that $R(w) \subsetneq R(e)$, where $R(e)$ is the set of reactions participating in mode $e$ [28].

7. The set of modes forms a cone, which is the intersection of the nullspace $Sv = 0$ with the positive orthant $v \ge 0$ [28].

8.
Irreversible reaction: a reaction in which the rate of the forward reaction is always so much higher than the rate of the reverse reaction that the latter is negligible [28].

2.2 Flux Balance Analysis.

2.21 Introduction

Table 2.211. Development of flux balance analysis.

The table above is adapted from the paper by Kenneth and coworkers [14]. More recently, Benyamin and coworkers published a paper on metabolite dilution flux balance analysis, which results in improved metabolic phenotype predictions [15].

With the growing interest in biochemical networks, flux balance analysis (FBA) has become one of the most popular methods in this research field and has been widely applied in practical studies. FBA is a constraint-based mathematical method for predicting the internal fluxes of large-scale metabolic networks. Detailed biochemical knowledge of the network, such as metabolite concentrations or enzyme kinetics, is not required, which makes the method easy to implement. FBA estimates the intracellular fluxes in a metabolic network by optimizing a specific objective function subject to stoichiometric constraints, and thereby predicts the growth rate of an organism or the production rate of a particular metabolite [10].

The core of a constraint-based method is, naturally, its constraints. There are two main kinds of constraints, steady-state constraints and physicochemical constraints [11], such as stoichiometric constraints, thermodynamic restrictions and maximum rate constraints. Stoichiometric constraints enforce the mass balance and the energy balance; thermodynamic restrictions limit the direction of the reactions; maximum rate constraints bound the capacity of a single enzyme [12]. Flux balance analysis is one of the most widely used constraint-based methods. It uses the stoichiometric constraints to require that the metabolic network is at steady state.
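As an illustration of the steady-state condition, the following minimal Python sketch checks whether a flux vector balances every metabolite of Example (1). The matrix is copied from Table 1.1 on the assumption that its C and D rows read 0 0 1 0 0 1 -1 0 0 and 0 0 0 2 0 0 1 -1 0; the two flux vectors and the function names are hypothetical and chosen only for illustration.

```python
# Stoichiometric matrix of Example (1): rows = metabolites A..F, columns = reactions R1..R9.
# The C and D rows follow the reading of Table 1.1 stated in the text above (an assumption).
S = [
    [1, -1, -1,  0, 0,  0,  0,  0,  0],  # A
    [0,  1,  0, -1, 0, -2,  0,  0,  0],  # B
    [0,  0,  1,  0, 0,  1, -1,  0,  0],  # C
    [0,  0,  0,  2, 0,  0,  1, -1,  0],  # D
    [0,  0,  0, -1, 1,  0,  0,  0,  0],  # E
    [0,  0,  0,  0, 0,  1,  0,  0, -1],  # F
]

def net_production(S, v):
    """Return S·v, the net production rate of each metabolite."""
    return [sum(s_ij * v_j for s_ij, v_j in zip(row, v)) for row in S]

def is_steady_state(S, v, tol=1e-9):
    """True if S·v = 0 (every metabolite balanced) and all fluxes are non-negative."""
    balanced = all(abs(rate) <= tol for rate in net_production(S, v))
    return balanced and all(flux >= 0 for flux in v)

# Hypothetical flux vectors for R1..R9 (illustrative values, not from the source).
v_balanced = [2, 1, 1, 1, 1, 0, 1, 3, 0]
v_unbalanced = [2, 1, 1, 1, 0, 0, 1, 3, 0]  # R5 switched off, so E is consumed but not supplied

print(is_steady_state(S, v_balanced))    # every metabolite is balanced
print(net_production(S, v_unbalanced))   # the non-zero entry marks the unbalanced metabolite
```

The second vector fails the check because reaction R4 consumes E while R5, the only reaction that produces it, carries no flux.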
Such constraints limit the solution space of all feasible fluxes; mathematically, they define a flux cone. The solution space can be restricted further by assigning a minimum and a maximum possible flux to each reaction (the lower bound is usually minus infinity for reversible reactions and zero for irreversible reactions). Once all constraints are imposed, they form a convex solution space which contains all feasible solutions of the steady-state equation [10]. Every feasible point in the solution space of a metabolic network can be reached by the system.

For a given metabolic network, if the number of independent balance equations equals the number of unknown fluxes, the system is determined and yields a unique flux distribution. A realistic metabolic network, however, can be very complicated: a cell or organism usually involves hundreds of metabolites and thousands of reactions, while only a few of the reaction rates can be determined experimentally. The experimentally determined data are far fewer than the total number of reactions in the network [13]. Therefore, in order to estimate the metabolic flux distribution, linear programming is applied to find an optimal solution for a chosen objective function.

The choice of the objective function plays a very important role in FBA. Typical objective functions include maximal biomass production (growth), maximal ATP production, minimal nutrient uptake, minimal Manhattan or Euclidean norm of the flux vector, and maximal production of particular products [14]. It has been shown that, with biomass production as the objective function, the FBA estimate of the internal flux distribution agrees well with experimental results [14].

2.22 Flux Balance Analysis procedure

1.
Genome-scale metabolic reconstruction

Before FBA can be used to analyze the fluxes of a metabolic network, the network has to be defined. FBA requires all reactions and metabolites that are included in the network; mathematical modeling is then used to predict the pathway fluxes. In living organisms, different environmental conditions regulate each pathway differently, but the FBA model always assumes that all reactions operate under the quasi steady state, in other words, that the concentration of every internal metabolite stays constant. The ideal starting point for the metabolic reconstruction is to identify all metabolic enzymes and metabolites, and then to record all reactions catalyzed by each enzyme [10]. Nowadays, much genomic network information can be found in the Kyoto Encyclopedia of Genes and Genomes (KEGG) database, which is developed by Kanehisa Laboratories, Kyoto University. This database greatly supports the reconstruction of well-studied biological networks [16].

2. Mathematically represent metabolic reactions and constraints.

Once all reactions are recorded, FBA converts them into a stoichiometric matrix S of size m×n for a network with m compounds and n reactions. Every row of this matrix represents one unique metabolite, and every column corresponds to one reaction. The entries in each column are the stoichiometric coefficients of the metabolites participating in that reaction: a negative coefficient for every metabolite consumed, a positive coefficient for every metabolite produced, and zero when the metabolite is not involved in the reaction. A biomass reaction, which contains all metabolites consumed during biomass production, is added to the reaction list. The biomass reaction is based on experimental measurements of biomass components [10].

                 Reactions
Metabolites      1    2    …    n    Biomass
A
B                Stoichiometric matrix S
…

Table 2.221. Stoichiometric matrix.

3.
Mass balance defines a system of linear equations.

The goal of FBA is to estimate the fluxes of the metabolic network under the pseudo-steady state. At steady state, the flux vector v satisfies Sv = 0, which defines a system of linear equations [10]:

\[ S \, (v_1, v_2, \ldots, v_n, v_{biomass})^T = 0 \]

Table 2.222. Stoichiometric constraints of the fluxes.

These linear equations express that the network is at quasi steady state.

4. Define objective function

We now have a linear system for the network which includes all components. In order to predict the fluxes, an objective function is used to define how much each reaction contributes to the objective. The objective function can be written as

\[ z = c_1 v_1 + c_2 v_2 + \cdots = c^T v, \]

where c is the vector of weights that represents the contribution of each reaction. When biomass production is the objective, the maximum or minimum of biomass production is sought and $v_{biomass}$ is the only term in the objective function. The vector c then has 1 at the position of the biomass reaction and zero elsewhere. The complete FBA model for the maximization of biomass production, constrained by the mass balances, is:

\[ \max\ Z = v_{biomass} \]
\[ \text{such that}\quad \sum_{j=1}^{N} S_{ij} v_j = 0, \quad i = 1, \ldots, M, \]
\[ v_j^{min} \le v_j \le v_j^{max}, \quad j = 1, \ldots, N, \]

where $v_{biomass}$ is the biomass flux, $S_{ij}$ is the stoichiometric coefficient of metabolite i in reaction j, $v_j$ is the flux of reaction j, M is the number of metabolites and N is the total number of reactions in the network [10].

5. Calculate fluxes that maximize objective function.

The model defined above determines the optimal flux distribution within the region of allowable fluxes. Because the objective function and the constraint equations are linear, the optimization can be carried out by linear programming. Full details are discussed below.
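The structure of the FBA model can be illustrated with a short Python sketch. It does not solve the linear program; it only checks whether candidate flux vectors satisfy the mass-balance and bound constraints and evaluates the objective $Z = c^T v$. The four-reaction toy network, its bounds and all names in the code are hypothetical and are not taken from the source.

```python
import math

# Hypothetical toy network (not from the source):
#   R1: -> A (uptake), R2: A -> B, R3: A -> (by-product), R4: B -> biomass
S = [
    [1, -1, -1,  0],  # metabolite A
    [0,  1,  0, -1],  # metabolite B
]
c = [0, 0, 0, 1]                             # only the biomass flux enters the objective
v_min = [0, 0, 0, 0]                         # all reactions irreversible
v_max = [10, math.inf, math.inf, math.inf]   # uptake limited to 10 (arbitrary units)

def is_feasible(S, v, v_min, v_max, tol=1e-9):
    """Check the FBA constraints: S·v = 0 and v_min <= v <= v_max."""
    balanced = all(abs(sum(s * x for s, x in zip(row, v))) <= tol for row in S)
    bounded = all(lo - tol <= x <= hi + tol for x, lo, hi in zip(v, v_min, v_max))
    return balanced and bounded

def objective(c, v):
    """Evaluate the objective Z = c^T v."""
    return sum(ci * vi for ci, vi in zip(c, v))

candidates = {
    "all flux to biomass": [10, 10, 0, 10],
    "split with by-product": [10, 6, 4, 6],
    "exceeds uptake bound": [12, 12, 0, 12],
    "violates mass balance": [10, 10, 0, 8],
}
for name, v in candidates.items():
    print(name, is_feasible(S, v, v_min, v_max), objective(c, v))
```

In this toy network the mass balances force $v_4 = v_2 \le v_1 \le 10$, so no feasible vector can exceed Z = 10, which the first candidate attains. For realistic networks the optimum has to be found by a linear programming method such as the simplex algorithm of section 2.3.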
2.23 Limits of FBA method.

The traditional FBA model optimizes a single objective over a feasible region. As stated above, the quality of the results obtained by FBA depends entirely on the constraints and on the choice of the objective function. The method has been applied widely in practice, but there are several cases which traditional FBA has difficulty solving [24].

1. As stated above, the FBA model assumes that all reactions operate under the quasi steady state, and only stoichiometric constraints are used. The regulation of reactions and pathways is therefore neglected. However, genes influence each other in very complex ways, and the outcome of FBA may be inaccurate or misleading when this important fact is neglected [24].

2. Parallel metabolic routes cannot be resolved. When a reaction is catalyzed by multiple enzymes, FBA can only estimate the total flux of the reaction rather than the specific flux due to each enzyme [24].

3. Reversible reactions: as in problem 2, only the net flux over both directions can be estimated, rather than the flux in each direction [24].

2.24 Extension of the FBA method.

FBA optimizes the behavior of a biochemical network with respect to specified objectives. In the past few years, there has been increasing interest in extending the FBA model by incorporating more biological knowledge.

1. Combine with different additional constraints.

Regulatory constraints were first imposed as Boolean logic operators by Covert and coworkers [17]. Unlike physicochemical constraints, regulatory constraints depend on the specific environment rather than on time, space and other fundamental restrictions. Using the Boolean logic approach, the regulatory constraints are evaluated from the initial condition of the cellular system, and the traditional FBA model is then solved. The outcomes are then used to re-evaluate the regulatory constraints.
This process can be repeated for as long as the allotted time allows [17].

In traditional FBA, thermodynamic constraints are only used to define the reversibility of a given reaction. However, reversibility depends on the intracellular conditions and may vary with changes in the external environment. Incorporating reaction thermodynamics into the FBA model, by using nonlinear constraints on the balance of chemical potential instead of simple stoichiometric constraints, would therefore give traditional FBA a stronger biological foundation. For E. coli metabolism, Beard found that combining energy balance analysis with FBA gave the same optimal growth rate but different flux distributions. This method also has a disadvantage, namely the complexity of the calculation [18,19].

2. Variability of objective functions.

In most applications, the objective function used in the FBA model is biomass production. However, this objective does not perform very well in predicting the behavior of cells with gene knockouts. To better understand the flux distribution of a mutant strain, the minimization of metabolic adjustment (MOMA) was developed on the basis of FBA. MOMA first calculates the flux distribution of the wild type and then uses quadratic programming to find the mutant flux distribution with the minimal Euclidean distance to the wild-type fluxes. This method successfully simulated the viability of E. coli and quantified the metabolic flux distribution after knockouts. Its effectiveness has been confirmed by comparison with experimental results; one example is increasing the production of lycopene in E. coli by identifying knockout targets [20].

Regulatory on/off minimization (ROOM) is another method for estimating the flux distribution of a mutant strain. It minimizes the number of significant flux changes. ROOM efficiently predicted the flux distribution of E.
coli after a particular gene knockout [21]. Both approaches are motivated by the assumption that, after a gene knockout, cells still behave similarly to the wild type [18].

3. Dynamic optimization

In the past few years, extended FBA methods which involve dynamics have been studied. This approach was used in the analysis of diauxic growth of E. coli on glucose, and the predictions match the experimental data qualitatively. This application showed that, for a large network, an instantaneous objective function gives a better prediction than a terminal-type objective function [18,23].

4. Predictive capability

OptKnock, a method based on a bi-level optimization problem, was used to predict the fluxes after gene knockouts. This approach simultaneously optimizes two objective functions, biomass growth and secretion of a desired metabolite [22]. The method was successfully applied in experiments on lactic acid production in E. coli [18].

2.3 Linear programming

2.31 Introduction of linear programming and simplex algorithm.

The French mathematician Joseph Fourier and the Belgian mathematician Charles de la Vallée Poussin first introduced the idea of linear programming, in 1832 and 1911 respectively. In 1939, the Russian mathematician Leonid Kantorovich published a book on solving economic problems with mathematical methods, in which the idea of linear programming was used. The method was later widely applied by the military during World War II. After the war, linear programming was made public and has been widely used in economics and industrial management ever since.

In 1947, the American scientist George B. Dantzig published the simplex method. Electronic computers came into use from 1945, which made it possible to solve linear programming problems with very complicated constraints. In 1979, the Russian mathematician Leonid Khachiyan developed the ellipsoid algorithm for linear programming, which showed for the first time that linear programming problems can be solved in polynomial time.
However, in almost all numerical tests on large-scale linear programming problems, the simplex algorithm performed better than the ellipsoid algorithm. In 1984, Narendra Karmarkar introduced the interior point method for solving linear programming problems. Karmarkar and several professors at the University of California compiled an open computing program, which showed that for many linear programming problems the interior point method is more efficient than the simplex algorithm. Although the interior point method has contributed greatly to both theory and practice, the simplex algorithm is still the most commonly used method because of its popularity and simplicity. Other methods have also been introduced since the 1980s, such as genetic algorithms, Ant Colony Optimization (ACO) and Tabu Search [25].

2.32 Basic concept of linear programming

A linear programming model consists of an objective function, decision variables and constraints. Its basic form can be written as follows [26]:

\[ \max\ z = c_1 x_1 + \cdots + c_n x_n \tag{1.1} \]

subject to

\[ \begin{aligned} a_{11} x_1 + \cdots + a_{1n} x_n &\ \{\le, =, \ge\}\ b_1, \\ a_{21} x_1 + \cdots + a_{2n} x_n &\ \{\le, =, \ge\}\ b_2, \\ &\ \ \vdots \\ a_{m1} x_1 + \cdots + a_{mn} x_n &\ \{\le, =, \ge\}\ b_m, \end{aligned} \tag{1.2} \]

and

\[ x_1, x_2, \ldots, x_n \ge 0. \tag{1.3} \]

In matrix form:

\[ \max\ z = c^T x, \tag{1.4} \]

subject to the constraints

\[ A x \ \{\le, =, \ge\}\ b, \tag{1.5} \]
\[ x \ge 0, \]

where

\[ A = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{m1} & \cdots & a_{mn} \end{pmatrix}, \]

$b = (b_1, b_2, \ldots, b_m)^T$ is an m-component column vector, $c = (c_1, c_2, \ldots, c_n)^T$ is an n-component column vector (so that $c^T$ is a row vector), and $x = (x_1, x_2, \ldots, x_n)^T$ is an n-component column vector [26]. A is called the constraint matrix, b the requirements vector, c the price vector and x the decision variable vector [26].

Restricting c and x to the components $c_B$ and $x_B$ associated with a basis B, the partitioned matrix form is [25]:

\[ \max\ z = c_B x_B \tag{1.7} \]
\[ B x_B = b \tag{1.8} \]
\[ x_B \ge 0 \]

where B is a nonsingular sub-matrix of A.

The vector $x = (x_1, x_2, \ldots, x_n)^T$ is a feasible solution if the conditions $Ax = b$, $x \ge 0$ are satisfied, and the region $D = \{x \mid Ax = b,\ x \ge 0\}$ is called the feasible region.
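If $x \in D$ and $x$ cannot be written as a convex combination of two other distinct points of $D$, then $x$ is an extreme point of the feasible region $D$. [25]

For equation (1.8), if $|\mathbf{B}| \ne 0$, then B is a basis matrix for the linear programming problem and its columns, $\mathbf{B} = (P_1, P_2, \dots, P_m)$, are called basis vectors. [25] The m variables associated with the m linearly independent column vectors $P_j$ are called basic variables. [25] If $\mathbf{x_B} \ge 0$, then $\mathbf{x_B}$ is called a basic feasible solution; if $\mathbf{x_B}$ also maximizes (or minimizes) the objective function z, it is called an optimal solution. [25] If one or more of the basic variables takes the value zero, the basic solution is said to be degenerate. [25]

Figure 2.321 Linear programming plot.

2.33 Simplex algorithm

As defined above, if $\mathbf{x_B}$ is a basic feasible solution of the standard linear programming problem, we have [27]:

$$\max\; z = \mathbf{c_B}\mathbf{x_B}, \qquad \mathbf{B}\mathbf{x_B} = \mathbf{b}, \qquad \mathbf{x_B} \ge \mathbf{0}.$$

Each column $\mathbf{a}_j$ of A can then be expressed as a linear combination of the columns $\mathbf{b}_1, \dots, \mathbf{b}_m$ of B [27]:

$$\mathbf{a}_j = \beta_{1,j}\mathbf{b}_1 + \dots + \beta_{m,j}\mathbf{b}_m = \sum_{i=1}^{m} \mathbf{b}_i \beta_{i,j} = \mathbf{B}\boldsymbol{\beta}_j,$$

where the $\boldsymbol{\beta}_j$ are known coefficients for the given basic feasible solution. A new basic feasible solution is obtained simply by changing one column of B. The new quantities are denoted with a bar. $\bar{\mathbf{B}}$ is formed from B by removing $\mathbf{b}_r$ and replacing it by $\mathbf{a}_k$ from A, where

$$\mathbf{a}_k = \beta_{1,k}\mathbf{b}_1 + \dots + \beta_{r,k}\mathbf{b}_r + \dots + \beta_{m,k}\mathbf{b}_m.$$

Starting from

$$\sum_{i=1}^{m} x_{Bi}\,\mathbf{b}_i = \mathbf{b}, \qquad (1)$$

solving the expression for $\mathbf{a}_k$ for $\mathbf{b}_r$ and substituting gives

$$\sum_{\substack{i=1\\ i \ne r}}^{m} \left( x_{Bi} - \frac{\beta_{i,k}}{\beta_{r,k}}\, x_{Br} \right) \mathbf{b}_i + \frac{x_{Br}}{\beta_{r,k}}\,\mathbf{a}_k = \mathbf{b}. \qquad (2)$$

The new solution is therefore

$$\bar{x}_{Bi} = x_{Bi} - \frac{\beta_{i,k}}{\beta_{r,k}}\, x_{Br} \ge 0, \qquad i = 1, \dots, m,\ i \ne r, \qquad (3)$$

and

$$\bar{x}_{Br} = \frac{x_{Br}}{\beta_{r,k}} \ge 0. \qquad (4)$$

If $x_{Br} > 0$, then $\beta_{r,k} > 0$, and (3) requires, for $i = 1, \dots, m$, $i \ne r$:

$$\text{either}\quad \beta_{i,k} \le 0, \qquad \text{or}\quad \beta_{i,k} > 0 \ \text{and}\ \frac{x_{Bi}}{\beta_{i,k}} \ge \frac{x_{Br}}{\beta_{r,k}}.$$

Hence the variable that leaves the basis can be chosen by:

$$\frac{x_{Br}}{\beta_{r,k}} = \min_i \left\{ \frac{x_{Bi}}{\beta_{i,k}} : \beta_{i,k} > 0 \right\}. \qquad (5)$$

With the new basic feasible solution it is possible to find $\bar{z}$. Writing $z_k = \sum_{i=1}^{m} c_{Bi}\beta_{i,k}$:

$$\bar{z} = \sum_{i=1}^{m} \bar{c}_{Bi}\,\bar{x}_{Bi} = \sum_{\substack{i=1\\ i \ne r}}^{m} c_{Bi}\left( x_{Bi} - \frac{\beta_{i,k}}{\beta_{r,k}}\, x_{Br} \right) + c_k\,\frac{x_{Br}}{\beta_{r,k}}$$

$$= \sum_{i=1}^{m} c_{Bi}\, x_{Bi} - \frac{x_{Br}}{\beta_{r,k}} \sum_{i=1}^{m} c_{Bi}\,\beta_{i,k} + c_k\,\frac{x_{Br}}{\beta_{r,k}} = z - \frac{x_{Br}}{\beta_{r,k}}\,(z_k - c_k). \qquad (6)$$

So $\bar{z} > z$ if and only if

$$z_k - c_k < 0 \quad\text{and}\quad \frac{x_{Br}}{\beta_{r,k}} > 0.$$

Finally, to calculate $\bar{\beta}_{i,j}$, we have:

$$\mathbf{a}_j = \sum_{\substack{i=1\\ i \ne r}}^{m} \beta_{i,j}\,\mathbf{b}_i + \beta_{r,j}\,\mathbf{b}_r \qquad\text{and}\qquad \mathbf{a}_k = \sum_{\substack{i=1\\ i \ne r}}^{m} \beta_{i,k}\,\mathbf{b}_i + \beta_{r,k}\,\mathbf{b}_r.$$

Eliminating $\mathbf{b}_r$:

$$\mathbf{a}_j = \sum_{\substack{i=1\\ i \ne r}}^{m} \left( \beta_{i,j} - \frac{\beta_{i,k}\,\beta_{r,j}}{\beta_{r,k}} \right) \mathbf{b}_i + \frac{\beta_{r,j}}{\beta_{r,k}}\,\mathbf{a}_k,$$

and, in terms of the new basis,

$$\mathbf{a}_j = \sum_{\substack{i=1\\ i \ne r}}^{m} \bar{\beta}_{i,j}\,\mathbf{b}_i + \bar{\beta}_{r,j}\,\mathbf{a}_k.$$

Thus:

$$\bar{\beta}_{i,j} = \beta_{i,j} - \frac{\beta_{i,k}\,\beta_{r,j}}{\beta_{r,k}}, \quad i \ne r, \qquad \bar{\beta}_{r,j} = \frac{\beta_{r,j}}{\beta_{r,k}}. \qquad (7)$$

Iterative steps:

1. Test for an optimal solution. If $z_j - c_j \ge 0$ for all j, then the solution is optimal.
2. If there exists at least one j for which $z_j - c_j < 0$, and $\beta_{i,j} > 0$ for at least one i for each of these j, then variable $x_k$ becomes basic, where k is chosen by the rule: $z_k - c_k = \min_j \{ z_j - c_j : z_j - c_j < 0,\ \beta_{i,j} > 0 \text{ for some } i = 1, \dots, m \}$.
3. If (2) holds, the variable $x_{Br}$ becomes non-basic, where r is chosen by equation (5).
4. Compute $\bar{x}_{Bi}$, $\bar{z}$, $\bar{\beta}_{i,j}$ and $\bar{z}_j - c_j$ for all i and j, from equations (3)-(7).

Repeat steps (2)-(4) until (1) holds.

The above description of the method follows "An Introduction to Linear Programming" by G.R. Walsh [27]; full details, including theorems, lemmas and explanations, are given in the book [27]. A numerical example is given below, with simple coefficients chosen to keep the calculation easy.

2.34 Numerical examples:

A linear programming problem solved by simplex tableaus. [27] Suppose we want to

Maximize: $Z = 2x_1 + 3x_2$,
Subject to: $x_1 + 2x_2 \le 4$, $\quad 2x_1 + x_2 \le 7$, $\quad x_1, x_2 \ge 0$.

Adding slack variables $x_3$, $x_4$, the constraint equations become:

$x_1 + 2x_2 + x_3 = 4$,
$2x_1 + x_2 + x_4 = 7$.

By setting $x_1$, $x_2$ equal to zero, we get a feasible solution: $x_1 = 0$, $x_2 = 0$, $x_3 = 4$, $x_4 = 7$.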
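Before working through the tableaus by hand, the iterative steps of section 2.33 can be sketched in code and applied to this problem. This is a minimal illustration rather than the implementation used in the thesis: the function name `simplex_max` is an assumption, the sketch only handles constraints of the form $\mathbf{A}\mathbf{x} \le \mathbf{b}$ with $\mathbf{b} \ge 0$ (so that the slack variables give a starting basis), and it does not guard against cycling in degenerate problems.

```python
from fractions import Fraction

def simplex_max(c, A, b):
    """Maximise c.x subject to A x <= b, x >= 0, assuming b >= 0.

    Tableau simplex with the slack variables as the starting basis.
    Returns (optimal x, optimal z) as exact fractions.
    """
    m, n = len(A), len(c)
    # Each row holds [beta_i1 ... beta_i,n+m | x_Bi]; slack columns form an identity.
    rows = [[Fraction(v) for v in A[i]]
            + [Fraction(int(i == j)) for j in range(m)]
            + [Fraction(b[i])] for i in range(m)]
    cost = [Fraction(v) for v in c] + [Fraction(0)] * m
    basis = [n + i for i in range(m)]      # column indices of the basic variables

    while True:
        # Reduced costs z_j - c_j for the current basis.
        reduced = [sum(cost[basis[i]] * rows[i][j] for i in range(m)) - cost[j]
                   for j in range(n + m)]
        k = min(range(n + m), key=lambda j: reduced[j])   # step 2: entering column
        if reduced[k] >= 0:                               # step 1: optimality test
            break
        # Step 3: ratio test of equation (5) selects the leaving row r.
        candidates = [i for i in range(m) if rows[i][k] > 0]
        if not candidates:
            raise ValueError("objective is unbounded")
        r = min(candidates, key=lambda i: rows[i][-1] / rows[i][k])
        # Step 4: pivot on beta_rk, i.e. equations (3), (4) and (7).
        pivot = rows[r][k]
        rows[r] = [v / pivot for v in rows[r]]
        for i in range(m):
            if i != r:
                factor = rows[i][k]
                rows[i] = [v - factor * p for v, p in zip(rows[i], rows[r])]
        basis[r] = k

    x = [Fraction(0)] * (n + m)
    for i in range(m):
        x[basis[i]] = rows[i][-1]
    z = sum(cost[j] * x[j] for j in range(n + m))
    return x[:n], z

x, z = simplex_max([2, 3], [[1, 2], [2, 1]], [4, 7])
print(x, z)
```

For the problem above the sketch performs two pivots and returns $x_1 = 10/3$, $x_2 = 1/3$ with $z = 23/3$, which can be compared with the hand calculation that follows.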
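The general layout of the tableau is:

c'                                   c1        c2        c3        c4
cB      basic variables   xB         x1        x2        x3        x4
cB1                       xB1        β11       β12       β13       β14
cB2                       xB2        β21       β22       β23       β24
zj − cj                   z          z1 − c1   z2 − c2   z3 − c3   z4 − c4

For the example, the first tableau is:

c'                                   2         3         0         0
cB              basic variables      xB        x1        x2        x3        x4
cB1 = c3 = 0    x3                   4         1         2*        1         0
cB2 = c4 = 0    x4                   7         2         1         0         1
zj − cj                              0         -2        -3        0         0

Equations used for the computation:

$$\bar{x}_{Bi} = x_{Bi} - \frac{\beta_{ik}}{\beta_{rk}}\, x_{Br} \ge 0, \quad i = 1, \dots, m,\ i \ne r, \qquad \bar{x}_{Br} = \frac{x_{Br}}{\beta_{rk}} \ge 0. \qquad (1.1)$$

$$\bar{\beta}_{ij} = \beta_{ij} - \frac{\beta_{ik}\,\beta_{rj}}{\beta_{rk}}, \quad i \ne r, \qquad \bar{\beta}_{rj} = \frac{\beta_{rj}}{\beta_{rk}}. \qquad (1.2)$$

$$\bar{z}_j - c_j = z_j - c_j - \frac{\beta_{rj}}{\beta_{rk}}\,(z_k - c_k). \qquad (1.3)$$

By the simplex rules, because $z_j - c_j < 0$ and $\beta_{ij} > 0$ for some i for each of these j, the variable that becomes basic is chosen by the rule:

$$z_k - c_k = \min_j \{ z_j - c_j : z_j - c_j < 0,\ \beta_{ij} > 0 \text{ for some } i = 1, \dots, m \}.$$

The original basic variable $x_{Br}$ becomes non-basic, where r is chosen by the rule:

$$\frac{x_{Br}}{\beta_{rk}} = \min_i \left\{ \frac{x_{Bi}}{\beta_{ik}} : \beta_{ik} > 0 \right\}.$$

By these rules $x_2$ becomes basic, since $z_2 - c_2 = -3$ is the most negative entry, and

$$\min\left\{ \frac{x_{B1}}{\beta_{12}}, \frac{x_{B2}}{\beta_{22}} \right\} = \min\left\{ \frac{4}{2}, \frac{7}{1} \right\} = 2.$$

Hence $x_{B1} = x_3$ becomes non-basic. The element of the tableau at the intersection of the $x_2$ column and the $x_3$ row, indicating the pair of variables to be exchanged, is called the pivot. The pivot $\beta_{rk} = \beta_{12} = 2$ (r = 1, k = 2) is denoted by an asterisk. The new values then follow from equations (1.1), (1.2) and (1.3).

From $\bar{\beta}_{rj} = \beta_{rj}/\beta_{rk}$:

$$\bar{\beta}_{11} = \frac{1}{2}, \quad \bar{\beta}_{12} = 1, \quad \bar{\beta}_{13} = \frac{1}{2}, \quad \bar{\beta}_{14} = 0.$$

From $\bar{\beta}_{ij} = \beta_{ij} - \beta_{ik}\beta_{rj}/\beta_{rk}$, $i \ne r$:

$$\bar{\beta}_{21} = \beta_{21} - \frac{\beta_{22}\,\beta_{11}}{\beta_{12}} = 2 - \frac{1}{2} = \frac{3}{2}, \quad \bar{\beta}_{22} = 0, \quad \bar{\beta}_{23} = -\frac{1}{2}, \quad \bar{\beta}_{24} = 1.$$

From (1.1):

$$\bar{x}_{B1} = \frac{x_{B1}}{\beta_{12}} = \frac{4}{2} = 2, \qquad \bar{x}_{B2} = x_{B2} - \frac{\beta_{22}}{\beta_{12}}\, x_{B1} = 7 - 4 \cdot \frac{1}{2} = 5.$$

From (1.3):

$$\bar{z}_1 - c_1 = (z_1 - c_1) - \frac{\beta_{11}}{\beta_{12}}\,(z_2 - c_2) = -2 - \frac{1}{2}\cdot(-3) = -\frac{1}{2},$$

$$\bar{z}_2 - c_2 = 0, \qquad \bar{z}_3 - c_3 = 0 - \frac{1}{2}\cdot(-3) = \frac{3}{2}, \qquad \bar{z}_4 - c_4 = 0.$$

Then the new tableau becomes:

c'                                   2         3         0         0
cB              basic variables      xB        x1        x2        x3        x4
cB1 = c2 = 3    x2                   2         1/2       1         1/2       0
cB2 = c4 = 0    x4                   5         3/2*      0         -1/2      1
zj − cj                              6         -1/2      0         3/2       0

For the second tableau, $x_1$ will become basic at the next iteration. The variable to become non-basic is determined from:

$$\min\left\{ \frac{2}{1/2}, \frac{5}{3/2} \right\} = \min\left\{ 4, \frac{10}{3} \right\} = \frac{10}{3}.$$

This shows that $x_{B2} = x_4 = 5$ becomes non-basic. Hence the pivot is $\beta_{rk} = \beta_{21} = 3/2$, and the new tableau is constructed by the same steps as above.

From $\bar{\beta}_{rj} = \beta_{rj}/\beta_{rk}$:

$$\bar{\beta}_{21} = 1, \quad \bar{\beta}_{22} = 0, \quad \bar{\beta}_{23} = -\frac{1}{3}, \quad \bar{\beta}_{24} = \frac{2}{3}.$$

From $\bar{\beta}_{ij} = \beta_{ij} - \beta_{ik}\beta_{rj}/\beta_{rk}$, $i \ne r$:

$$\bar{\beta}_{11} = \beta_{11} - \frac{\beta_{11}\,\beta_{21}}{\beta_{21}} = 0, \quad \bar{\beta}_{12} = 1, \quad \bar{\beta}_{13} = \frac{1}{2} + \frac{1}{6} = \frac{2}{3}, \quad \bar{\beta}_{14} = -\frac{1}{3}.$$

From (1.1):

$$\bar{x}_{B2} = \frac{x_{B2}}{\beta_{21}} = \frac{5}{3/2} = \frac{10}{3}, \qquad \bar{x}_{B1} = x_{B1} - \frac{\beta_{11}}{\beta_{21}}\, x_{B2} = 2 - 5 \cdot \frac{1}{3} = \frac{1}{3}.$$

From (1.3):

$$\bar{z}_1 - c_1 = (z_1 - c_1) - \frac{\beta_{21}}{\beta_{21}}\,(z_1 - c_1) = 0, \quad \bar{z}_2 - c_2 = 0, \quad \bar{z}_3 - c_3 = \frac{3}{2} - \frac{1}{6} = \frac{4}{3}, \quad \bar{z}_4 - c_4 = \frac{1}{3}.$$

Then the new tableau becomes:

c'                                   2         3         0         0
cB              basic variables      xB        x1        x2        x3        x4
cB1 = c2 = 3    x2                   1/3       0         1         2/3       -1/3
cB2 = c1 = 2    x1                   10/3      1         0         -1/3      2/3
zj − cj                              23/3      0         0         4/3       1/3

The above tableau is optimal, since $z_j - c_j \ge 0$ for all j. The optimal solution is:

$$x_1 = \frac{10}{3}, \quad x_2 = \frac{1}{3}, \quad z = \frac{23}{3}.$$

Linear programming problems in real life are usually much more complicated than the example above and are very difficult to calculate by hand, so computer software such as MATLAB is used to solve them.

Solve a linear programming problem to maximize $v_{biomass}$ subject to the constraints defined by example (1), supposing the uptake fluxes are evenly distributed between A and E:

Maximize: $F = v_{biomass}$,
Subject to:
$v_1 - v_2 - v_3 = 0$,
$v_2 - v_4 - 2v_6 = 0$,
$v_3 + v_6 - v_7 = 0$,
$2v_4 + v_7 - v_{biomass} = 0$,
$-v_4 + v_5 = 0$,
$v_6 - v_9 = 0$,
$0 \le v_i \le \infty$ for $i \ne 1, 5$, and $v_1 = v_5 = 0.5$.

The MATLAB code is given below. The objective vector f has the entry -1 for $v_{biomass}$ (the eighth flux), the variable data holds the 6×9 coefficient matrix of the six equality constraints above, and the uptake fluxes $v_1$ and $v_5$ are bounded above by 0.5, a bound which both reach at the optimum:

>> f=[0,0,0,0,0,0,0,-1,0];
>> b=[0;0;0;0;0;0];
>> ub=[0.5,Inf,Inf,Inf,0.5,Inf,Inf,Inf,Inf];
>> lb=[0,0,0,0,0,0,0,0,0];
>> simlp(f,data,b,lb,ub,[],6)

The solution is:

$v_1 = 0.5$, $v_2 = 0.5$, $v_3 = 0$, $v_4 = 0.5$, $v_5 = 0.5$, $v_6 = 0$, $v_7 = 0$, $v_{biomass} = 1$, $v_9 = 0$,
$F = v_{biomass} = 1$.

2.4 Elementary Flux Modes Analysis.

2.41 Introduction

Metabolic pathway analysis has been recognized as a central approach to discover and analyze the structure of a metabolic network.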
[28] This approach identifies the topology of cellular metabolism from the stoichiometric structure and the thermodynamic constraints of the reactions; kinetic parameters are not required for the model. It has been successfully applied to various organisms to investigate metabolic network structure, robustness, fragility, regulation and metabolic flux distribution. Metabolic pathway analysis is based on the first principle of mass conservation of the internal metabolites within a system. [28]

Elementary flux mode analysis is a pathway analysis that evaluates metabolism from the set of metabolic reactions of a given network. An elementary flux mode is a minimal subset of enzymes in a network that can operate at steady state with all irreversible reactions proceeding in the direction dictated by thermodynamics. The flux distributions in a cell or organism are nonnegative linear combinations of elementary modes. This approach identifies the pathways from an educt to a product. However, in a large-scale network the calculation of elementary modes sometimes faces a combinatorial explosion, so that splitting the whole network into several sub-networks is sometimes necessary [28]. The set of EFMs includes all pathways, each of which indicates a route from certain educts to products. The number of pathways reflects the sensitivity of the network, and has therefore been used to investigate network structure, robustness, fragility, regulation, metabolic flux vectors and rational strain design. [29] Klamt and coworkers calculated the elementary modes of an E. coli model with FluxAnalyzer; this model contains 89 metabolites and 110 reactions, and $4.85 \times 10^{13}$ elementary modes were obtained. [30]

Elementary flux mode analysis has become an important theoretical tool for systems biology, biotechnology and metabolic engineering. Some major applications of this method are listed below: [31]

1. Identification of pathways: the set of EFMs consists of all possible pathways.
2. Network flexibility: the number of EFMs is a rough measure of the network's flexibility to perform a certain function.
3. Identification of all pathways with optimal yield: consider the linear optimization problem in which all flux vectors with optimal product yield are to be identified, i.e. where the moles of product generated per mole of educt is maximal. Then one or several of the EFMs reach this optimum, and any optimal flux vector is a convex combination of these optimal EFMs.
4. Redundancy: Wilhelm and coworkers developed a new measure which studies the number of EFMs remaining after knocking out some enzymes. They used it to compare the metabolic networks of E. coli and human erythrocytes, and so to analyze theoretically the environmental viability of the two.
5. Importance of reactions: EFMs can be used to predict growth ability. If a reaction is involved in all growth-related EFMs, its deletion will make the related EFMs disappear.
6. Reaction correlations: EFMs can be used to analyze structural couplings between reactions, which can give hints about underlying regulatory circuits and so identify enzyme or reaction subsets.
7. Detection of thermodynamically infeasible cycles: EFMs representing internal cycles are infeasible by the laws of thermodynamics and thus reflect structural inconsistencies.
8. Pathway analysis can also be combined with regulatory rules and stoichiometric constraints to study the metabolic network.
9. Minimal cut sets: EFMs can be used to calculate the minimal cut sets, i.e. the minimal subsets of reactions in a metabolic network whose loss disables a certain function of the network. This property makes many other applications available, such as phenotype prediction, analysis of structural flexibility, metabolic network structure analysis and identification of drug targets.

The above points are quoted from the paper "Computation of elementary modes: a unifying framework and the new binary approach" by Julien Gagneur and Steffen Klamt [31].

2.42 Mathematics behind EFMs

Suppose a given metabolic network consists of m metabolites and n reactions. The stoichiometry matrix S is an m×n matrix. In such a network, each EFM is defined by a vector e composed of n elements, each describing the net rate of the corresponding reaction (i.e. the vector e is a flux distribution). Such a network can be represented mathematically by the following equation: [31,32,33]

$$\mathbf{S}\mathbf{e} = 0,$$

where three conditions must be satisfied:

1. Pseudo steady-state, which restricts Se to be equal to 0. This ensures that none of the metabolites is consumed or produced in the overall stoichiometry. [31]
2. Feasibility. There are two sets of reactions, the irreversible and the reversible reactions, and the irreversible reactions are thermodynamically feasible in only one direction. Thus the rate $e_i \ge 0$ if reaction i is irreversible. This demands that only thermodynamically realizable fluxes are contained in e. [31]
3. Non-decomposability. Let P(m) be the set of reactions that participate in elementary mode m. Then for all other modes n ≠ m it follows that P(n) is not a proper subset of P(m). [31]

Furthermore, if the above three conditions are satisfied, then all feasible steady-state flux distributions v can be described by a non-negative combination of the EFMs. Thus the whole model is:

$$P = \{ \mathbf{v} \in \mathbb{R}^q : \mathbf{N}\mathbf{v} = \mathbf{0} \ \text{and}\ v_i \ge 0,\ i \in Irrev \},$$

where N denotes the stoichiometry matrix (written S above) and q the number of reactions. P is a set of vectors that obey a finite set of homogeneous linear equations and inequalities and is therefore, by definition, a convex polyhedral cone.
For a vector $\mathbf{e} \in P$:

$$\begin{cases} \mathbf{S}\mathbf{e} = \mathbf{0} & \text{(quasi steady state)}\\ e_i \ge 0,\ i \in Irrev & \text{(thermodynamic feasibility)}\\ \text{for all } \mathbf{e}' \in P:\ P(\mathbf{e}') \subseteq P(\mathbf{e}) \Rightarrow \mathbf{e}' = \mathbf{0} \ \text{or}\ \mathbf{e}' = \mathbf{e} \ \text{or}\ \mathbf{e}' = -\mathbf{e} & \text{(elementarity)} \end{cases}$$

where $P(\mathbf{e})$ denotes the set of reactions participating in e and equality of modes is understood up to scaling. In other words, e is an EM if and only if it works at quasi steady state, is thermodynamically feasible and there is no other non-null flux vector (up to a scaling) that both satisfies these constraints and involves a proper subset of its participating reactions. Note that with this convention, reversible modes are here considered as two vectors of opposite directions. These descriptions are quoted from the paper "Computation of elementary modes: a unifying framework and the new binary approach" by Julien Gagneur and Steffen Klamt. [31]

2.43 Calculate EFM matrix.

Some basic properties of EFMs. [35] In the lemmas below, $S(\mathbf{V})$ denotes the set of positions at which the vector V has zero components.

Lemma 1. All vectors V fulfilling conditions 1 and 2 above either represent elementary modes or are positive linear combinations of vectors representing elementary modes,

$$\mathbf{V} = \sum_l \eta_l\, \mathbf{m}_l, \qquad \eta_l > 0,$$

where the $\eta_l$ are positive coefficients, the sum runs over at least two different indices l, and all $\mathbf{m}_l$ have zero components wherever V has zero components and include at least one additional zero each, $S(\mathbf{V}) \subset S(\mathbf{m}_l)$. All those $\mathbf{m}_l$ that enter the above equation represent reversible elementary modes if and only if V represents a reversible flux mode. [35]

Lemma 2. For any pair of vectors $\mathbf{V}^*$ and $\mathbf{V}^{**}$, with $\mathbf{V}^*$ representing an elementary flux mode and $\mathbf{V}^{**}$ representing a flux mode and having zero components wherever $\mathbf{V}^*$ has zero components, $S(\mathbf{V}^*) \subseteq S(\mathbf{V}^{**})$, the vector $\mathbf{V}^{**}$ represents either the same elementary mode as $\mathbf{V}^*$ or the same elementary mode as $-\mathbf{V}^*$, which implies $S(\mathbf{V}^*) = S(\mathbf{V}^{**})$. [35]

Lemma 3. For two reaction systems A and B differing only in that some reactions are reversible in B while being irreversible in A, all elementary modes of A are also elementary modes of B, which may involve additional elementary modes. [35]

Lemma 4. Any vector representing an elementary mode involves at least γ − 1 zero components, with γ = r − η denoting the dimension of the null-space of N, where N denotes the stoichiometry matrix, η is its rank and r is the number of reactions. [35]

The above lemmas are quoted from the paper "Reaction routes in biochemical reaction systems: Algebraic properties, validated calculation procedure and example from nucleotide metabolism" by S. Schuster, C. Hilgetag, J.H. Woods and D.A. Fell [35].

There may be hundreds of EFMs even for a small metabolic network, so they are usually computed with the help of computer software, such as the program EMPATH (John Woods, Oxford), the C program METATOOL (Pfeiffer et al., 1999) and the MAPLE program METAFLUX (Klaus Mauch, Stuttgart). In this section, the mathematical concept behind the computation is described using example (1). [34]

The algorithm basically seeks special solutions to a system of linear homogeneous equations and inequalities.

1. Start with the tableau:

$$\mathbf{T}^{(0)} = \begin{pmatrix} \mathbf{N}_{rev}^T & \mathbf{I} & \mathbf{0}\\ \mathbf{N}_{irr}^T & \mathbf{0} & \mathbf{I} \end{pmatrix},$$

where N represents the stoichiometric matrix of the network. For example (1) the stoichiometric matrix was already constructed, so the first tableau $\mathbf{T}^{(0)}$ is:

        A    B    C    D    E    F   |  R1   R2   R3   R4   R5   R6   R7   R8   R9
R1      1    0    0    0    0    0   |   1    0    0    0    0    0    0    0    0
R2     -1    1    0    0    0    0   |   0    1    0    0    0    0    0    0    0
R3     -1    0    1    0    0    0   |   0    0    1    0    0    0    0    0    0
R4      0   -1    0    2   -1    0   |   0    0    0    1    0    0    0    0    0
R5      0    0    0    0    1    0   |   0    0    0    0    1    0    0    0    0
R6      0   -2    1    0    0    1   |   0    0    0    0    0    1    0    0    0
R7      0    0   -1    1    0    0   |   0    0    0    0    0    0    1    0    0
R8      0    0    0   -1    0    0   |   0    0    0    0    0    0    0    1    0
R9      0    0    0    0    0   -1   |   0    0    0    0    0    0    0    0    1

This example is relatively easy because there is no reversible reaction in the network.

From the tableau $\mathbf{T}^{(j)}$ (j = 0, 1, 2, …, n − 1), the elements of which are denoted by $t^{(j)}_{i,h}$, calculate the next tableau $\mathbf{T}^{(j+1)} = (\mathbf{N}^{(j+1)T}\ \ \mathbf{M}^{(j+1)})$ (M denotes the flux mode part) in the following way: [34]

1. For each row of $\mathbf{M}^{(j)}$, determine the set $S(\mathbf{m}_i^{(j)})$ recording the positions of the zeroes.
2. First, each row of $\mathbf{T}^{(j)}$ with a zero in the (j+1)th column of $\mathbf{N}^{(j)T}$ is copied into $\mathbf{T}^{(j+1)}$. Then new rows formed by allowed linear combinations of pairs of rows of $\mathbf{T}^{(j)}$ consecutively go into $\mathbf{T}^{(j+1)}$ if they fulfill the conditions:

$$t^{(j)}_{i,j+1} \cdot t^{(j)}_{m,j+1} \ne 0,$$

which ensures that the (j+1)th column can be cancelled and that reversible reactions are combined in the right direction (rows of irreversible reactions may only be combined with positive coefficients, so their entries in this column must have opposite signs), and

$$S(\mathbf{m}_i^{(j)}) \cap S(\mathbf{m}_m^{(j)}) \nsubseteq S(\mathbf{m}_l^{(j+1)})$$

for the rows l already present in $\mathbf{T}^{(j+1)}$.

The above descriptions are quoted from the paper "Description of the algorithm for computing elementary flux modes" by Stefan Schuster, Thomas Dandekar and David Fell. Here $\mathbf{m}_i^{(j)}$ represents the i-th row of the right part of $\mathbf{T}^{(j)}$, and $S(\mathbf{m}_i^{(j)})$ is the set of zeroes in that row, which indicates the enzymes that are not used in the corresponding set of reactions.

In $\mathbf{T}^{(0)}$, the entries of the first column in the 4th, 5th, 6th, 7th, 8th and 9th rows are 0, so these rows can be copied into the next tableau without any combination. Combining the first row with the second and the third row respectively makes their first column equal to 0, and the results are put into the second tableau. NB: If there are reversible reactions, the result of combining two reversible reactions should be put into the reversible part of the next tableau, whereas a combination of a reversible and an irreversible reaction should be put into the irreversible part of the next tableau. [34]

By combining R1 with R2 and R3 respectively, we get tableau $\mathbf{T}^{(1)}$:

        A    B    C    D    E    F   |  R1   R2   R3   R4   R5   R6   R7   R8   R9
R1+R2   0    1    0    0    0    0   |   1    1    0    0    0    0    0    0    0
R1+R3   0    0    1    0    0    0   |   1    0    1    0    0    0    0    0    0
R4      0   -1    0    2   -1    0   |   0    0    0    1    0    0    0    0    0
R5      0    0    0    0    1    0   |   0    0    0    0    1    0    0    0    0
R6      0   -2    1    0    0    1   |   0    0    0    0    0    1    0    0    0
R7      0    0   -1    1    0    0   |   0    0    0    0    0    0    1    0    0
R8      0    0    0   -1    0    0   |   0    0    0    0    0    0    0    1    0
R9      0    0    0    0    0   -1   |   0    0    0    0    0    0    0    0    1

Combine the first row with the third row and the fifth row respectively.
This gives tableau $\mathbf{T}^{(2)}$:

                    A    B    C    D    E    F   |  R1   R2   R3   R4   R5   R6   R7   R8   R9
R1+R2+R4            0    0    0    2   -1    0   |   1    1    0    1    0    0    0    0    0
R1+R3               0    0    1    0    0    0   |   1    0    1    0    0    0    0    0    0
R5                  0    0    0    0    1    0   |   0    0    0    0    1    0    0    0    0
2R1+2R2+R6          0    0    1    0    0    1   |   2    2    0    0    0    1    0    0    0
R7                  0    0   -1    1    0    0   |   0    0    0    0    0    0    1    0    0
R8                  0    0    0   -1    0    0   |   0    0    0    0    0    0    0    1    0
R9                  0    0    0    0    0   -1   |   0    0    0    0    0    0    0    0    1

Applying the same algorithm, one column is zeroed at each step. Tableau $\mathbf{T}^{(3)}$:

                    A    B    C    D    E    F   |  R1   R2   R3   R4   R5   R6   R7   R8   R9
R1+R2+R4            0    0    0    2   -1    0   |   1    1    0    1    0    0    0    0    0
R1+R3+R7            0    0    0    1    0    0   |   1    0    1    0    0    0    1    0    0
R5                  0    0    0    0    1    0   |   0    0    0    0    1    0    0    0    0
2R1+2R2+R6+R7       0    0    0    1    0    1   |   2    2    0    0    0    1    1    0    0
R8                  0    0    0   -1    0    0   |   0    0    0    0    0    0    0    1    0
R9                  0    0    0    0    0   -1   |   0    0    0    0    0    0    0    0    1

Tableau $\mathbf{T}^{(4)}$:

                    A    B    C    D    E    F   |  R1   R2   R3   R4   R5   R6   R7   R8   R9
R1+R2+R4+2R8        0    0    0    0   -1    0   |   1    1    0    1    0    0    0    2    0
R1+R3+R7+R8         0    0    0    0    0    0   |   1    0    1    0    0    0    1    1    0
R5                  0    0    0    0    1    0   |   0    0    0    0    1    0    0    0    0
2R1+2R2+R6+R7+R8    0    0    0    0    0    1   |   2    2    0    0    0    1    1    1    0
R9                  0    0    0    0    0   -1   |   0    0    0    0    0    0    0    0    1

Tableau $\mathbf{T}^{(5)}$:

                    A    B    C    D    E    F   |  R1   R2   R3   R4   R5   R6   R7   R8   R9
R1+R2+R4+2R8+R5     0    0    0    0    0    0   |   1    1    0    1    1    0    0    2    0
R1+R3+R7+R8         0    0    0    0    0    0   |   1    0    1    0    0    0    1    1    0
2R1+2R2+R6+R7+R8    0    0    0    0    0    1   |   2    2    0    0    0    1    1    1    0
R9                  0    0    0    0    0   -1   |   0    0    0    0    0    0    0    0    1

Thus the final elementary modes are given by the right part of tableau $\mathbf{T}^{(6)}$:

                        R1   R2   R3   R4   R5   R6   R7   R8   R9
R1+R2+R4+2R8+R5          1    1    0    1    1    0    0    2    0
R1+R3+R7+R8              1    0    1    0    0    0    1    1    0
2R1+2R2+R6+R7+R8+R9      2    2    0    0    0    1    1    1    1

As stated above, there are a number of computer programs for evaluating the elementary flux modes of real-life networks. The "efmtool" package is used in this thesis; it is a Java-based program that has been integrated into MATLAB. [50] Only two inputs are required: the stoichiometric matrix and a vector that determines the reversibility of the reactions, with 1 for reversible reactions and 0 for irreversible reactions. Full details are given in Appendix A. The above example was also run with the program, which gives the same result.

2.44 Calculation of flux in elementary modes.

Now that we know how to evaluate the elementary flux modes matrix, the next step is to estimate the flux distribution.
Let the vector w of length m denote the elementary mode coefficients, which determine how much each elementary mode contributes to the whole network. At steady state, the flux carried by any particular reaction is the sum of the fluxes of that reaction in each elementary flux mode, multiplied by the corresponding coefficient. Denoting the fluxes as a row vector v, the mathematical representation of these fluxes is: [36]

$$v_i = \sum_{j=1}^{m} E_{ij}\, w_j, \qquad (1)$$

or in matrix form:

$$\mathbf{v} = \mathbf{E}\mathbf{w}. \qquad (2)$$

It is possible to gain some knowledge of the flux vector v through experiments, or by fixing certain reaction fluxes of interest, and these data can then be used to estimate the elementary mode coefficients w. Thus:

$$\mathbf{w} = f(\mathbf{E}, \mathbf{v_0}), \qquad (3)$$

where $\mathbf{v_0}$ is a vector of observed or determined fluxes and f is some function that evaluates w. These results can then be substituted back into equation (2), $\mathbf{v} = \mathbf{E}\mathbf{w}$, to estimate the unobserved fluxes. [36]

When E is not invertible, the problem can be solved with a pseudo-inverse, computed by the singular value decomposition discussed later. Denoting the generalized inverse matrix by $\mathbf{E}^{\#}$, the above equation can be rearranged: [36]

$$\mathbf{w} = \mathbf{E}^{\#}\mathbf{v}^T. \qquad (4)$$

As stated above, only some of the fluxes can be observed, so the vector v is divided into two components:

$$\mathbf{v} = (\mathbf{v_0}, \mathbf{v_x}),$$

where $\mathbf{v_0}$ are the observed fluxes and $\mathbf{v_x}$ are the non-observed fluxes. The same applies to E:

$$\mathbf{E} = \begin{pmatrix} \mathbf{E_0}\\ \mathbf{E_x} \end{pmatrix}.$$

So from the observed part we can calculate w:

$$\mathbf{w} = \mathbf{E_0}^{\#}\,\mathbf{v_0}^T, \qquad (5)$$

and hence estimate the unobserved fluxes:

$$\begin{pmatrix} \mathbf{v_0}^T\\ \mathbf{v_x}^T \end{pmatrix} = \begin{pmatrix} \mathbf{E_0}\\ \mathbf{E_x} \end{pmatrix} \mathbf{w}.$$

The above equation has the following properties: [36]

1. If the problem is an overdetermined system, w represents the least-squares solution of the problem.
2. If the system is exactly determined, $\mathbf{E_0}^{\#} = \mathbf{E_0}^{-1}$ and w is simply the solution of a set of linear simultaneous equations.
3. If the system is underdetermined, the most likely case in this context, equation (4) generates the minimum norm solution, i.e. it minimizes $\sum_i w_i^2$.

The above properties are quoted from the paper "A method for the determination of flux in elementary modes, and its application to Lactobacillus rhamnosus" by M.G. Poolman, K.V. Venakatesh, M.K. Pidcock and D.A. Fell [36].

As mentioned above, a pseudo-inverse can be used in place of the inverse of the elementary flux modes matrix. The mathematics underlying the pseudo-inverse is the singular value decomposition (SVD). The singular value decomposition factorizes a matrix into a product of three simpler matrices, which helps to study the underlying structure of the matrix. This method has been used in many fields, such as solving homogeneous linear equations, latent semantic analysis, clustering and feature extraction. [37]

Suppose we have an m×n matrix S. We have the following theorem. [37-39]

Let S be any real m×n matrix with rank r. Then we can write $\mathbf{S} = \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^T$, where U and V are orthogonal and Σ is an m×n diagonal matrix with r nonzero entries given by the positive square roots of the nonzero eigenvalues of $\mathbf{S}^T\mathbf{S}$ and $\mathbf{S}\mathbf{S}^T$, known as the singular values of S. The columns of U and V are called the left singular vectors and right singular vectors of S respectively. The matrix Σ takes the form:

$$\boldsymbol{\Sigma} = \begin{pmatrix} \delta_1 & 0 & \cdots & 0 & 0\\ 0 & \delta_2 & \cdots & 0 & 0\\ \vdots & \vdots & \ddots & \vdots & \vdots\\ 0 & 0 & \cdots & \delta_r & 0\\ 0 & 0 & \cdots & 0 & 0 \end{pmatrix},$$

and the $\delta_i$ are ordered as:

$$\delta_1 \ge \delta_2 \ge \delta_3 \ge \dots \ge \delta_r > 0, \qquad r \le \min(m, n).$$

An example of SVD. Suppose we have the 4×4 matrix S:

$$\mathbf{S} = \begin{pmatrix} 2 & -1 & 0 & 0\\ -1 & 2 & -1 & 0\\ 0 & -1 & 2 & -1\\ 0 & 0 & -1 & 2 \end{pmatrix}, \qquad \mathbf{S}\mathbf{S}^T = \begin{pmatrix} 5 & -4 & 1 & 0\\ -4 & 6 & -4 & 1\\ 1 & -4 & 6 & -4\\ 0 & 1 & -4 & 5 \end{pmatrix} = \mathbf{S}^T\mathbf{S}.$$

The eigenvalues and eigenvectors of $\mathbf{S}^T\mathbf{S}$ are:

$\lambda_1 = 13.09$, $\lambda_2 = 6.854$, $\lambda_3 = 1.91$, $\lambda_4 = 0.146$,

$e_1 = (0.372, -0.602, 0.602, -0.372)$,
$e_2 = (-0.602, 0.372, 0.372, -0.602)$,
$e_3 = (0.602, 0.372, -0.372, -0.602)$,
$e_4 = (-0.372, -0.602, -0.602, -0.372)$.

Thus, taking the square roots of the eigenvalues and using the eigenvectors as columns (since S is symmetric and positive definite, U and V coincide):

$$\boldsymbol{\Sigma} = \begin{pmatrix} 3.618 & 0 & 0 & 0\\ 0 & 2.618 & 0 & 0\\ 0 & 0 & 1.382 & 0\\ 0 & 0 & 0 & 0.382 \end{pmatrix}, \qquad \mathbf{U} = \mathbf{V} = \begin{pmatrix} 0.372 & -0.602 & 0.602 & -0.372\\ -0.602 & 0.372 & 0.372 & -0.602\\ 0.602 & 0.372 & -0.372 & -0.602\\ -0.372 & -0.602 & -0.602 & -0.372 \end{pmatrix}.$$

Then, up to rounding,

$$\mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^T = \begin{pmatrix} 2 & -1 & 0 & 0\\ -1 & 2 & -1 & 0\\ 0 & -1 & 2 & -1\\ 0 & 0 & -1 & 2 \end{pmatrix} = \mathbf{S}.$$

SVD for pseudo-inverse. [37-39]

The pseudo-inverse is a way to solve the following linear equation: [37-39]

$$\mathbf{S}\cdot\mathbf{a} = \mathbf{b}, \qquad \mathbf{a} \in \mathbb{R}^n;\ \mathbf{b} \in \mathbb{R}^m;\ \mathbf{S} \in \mathbb{R}^{m \times n}.$$

The general solution can be found from the equation:

$$\mathbf{a} = \mathbf{S}^{+}\mathbf{b}.$$

$\mathbf{S}^{+}$ is the pseudo-inverse matrix of S, the unique matrix with the following properties: [37-39]

$$\mathbf{S}\mathbf{S}^{+}\mathbf{S} = \mathbf{S}, \qquad \mathbf{S}^{+}\mathbf{S}\mathbf{S}^{+} = \mathbf{S}^{+}, \qquad (\mathbf{S}\mathbf{S}^{+})^T = \mathbf{S}\mathbf{S}^{+}, \qquad (\mathbf{S}^{+}\mathbf{S})^T = \mathbf{S}^{+}\mathbf{S}.$$

Sometimes the matrix S is not invertible, so the SVD is used to find $\mathbf{S}^{+}$:

$$\mathbf{S}^{+} = \mathbf{V}\boldsymbol{\Sigma}^{+}\mathbf{U}^T,$$

where the matrix $\boldsymbol{\Sigma}^{+}$ is equal to:

$$\boldsymbol{\Sigma}^{+} = \begin{pmatrix} 1/\delta_1 & 0 & \cdots & 0 & 0\\ 0 & 1/\delta_2 & \cdots & 0 & 0\\ \vdots & \vdots & \ddots & \vdots & \vdots\\ 0 & 0 & \cdots & 1/\delta_r & 0\\ 0 & 0 & \cdots & 0 & 0 \end{pmatrix}.$$

Using the same example as above, where S is invertible so that $\mathbf{S}^{+} = \mathbf{S}^{-1}$, we find:

$$\mathbf{S}^{-1} = \begin{pmatrix} 0.8 & 0.6 & 0.4 & 0.2\\ 0.6 & 1.2 & 0.8 & 0.4\\ 0.4 & 0.8 & 1.2 & 0.6\\ 0.2 & 0.4 & 0.6 & 0.8 \end{pmatrix},$$

and, with $\boldsymbol{\Sigma}^{+} = \mathrm{diag}(1/3.618,\ 1/2.618,\ 1/1.382,\ 1/0.382)$ and U, V as above,

$$\mathbf{V}\boldsymbol{\Sigma}^{+}\mathbf{U}^T = \begin{pmatrix} 0.8 & 0.6 & 0.4 & 0.2\\ 0.6 & 1.2 & 0.8 & 0.4\\ 0.4 & 0.8 & 1.2 & 0.6\\ 0.2 & 0.4 & 0.6 & 0.8 \end{pmatrix} = \mathbf{S}^{+} = \mathbf{S}^{-1}.$$

The above method can be used to find $\mathbf{E}^{\#}$ and so to calculate the flux vector in elementary flux mode analysis. Furthermore, singular value decomposition can be applied to decompose stoichiometric matrices to evaluate the flux vector. [40].
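The estimation of unobserved fluxes from observed ones can be illustrated with the three elementary modes found for example (1) in section 2.43. The sketch below is an illustration under stated assumptions, not the procedure used in the thesis: it takes $v_1$ and $v_5$ as the observed fluxes with the hypothetical values 3 and 1, and instead of an SVD it uses the closed form $\mathbf{E_0}^{\#} = \mathbf{E_0}^T(\mathbf{E_0}\mathbf{E_0}^T)^{-1}$, which equals the Moore-Penrose pseudo-inverse only when $\mathbf{E_0}$ has full row rank. The helper names `solve` and `min_norm_weights` are likewise assumptions.

```python
from fractions import Fraction

# Elementary mode matrix of example (1): rows are reactions v1..v9, columns are modes.
E = [[2, 1, 1], [2, 0, 1], [0, 1, 0], [0, 0, 1], [0, 0, 1],
     [1, 0, 0], [1, 1, 0], [1, 1, 2], [1, 0, 0]]

def solve(M, rhs):
    """Solve the square nonsingular system M y = rhs by Gauss-Jordan elimination."""
    n = len(M)
    aug = [[Fraction(v) for v in M[i]] + [Fraction(rhs[i])] for i in range(n)]
    for col in range(n):
        piv = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[piv] = aug[piv], aug[col]
        lead = aug[col][col]
        aug[col] = [v / lead for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * p for a, p in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]

def min_norm_weights(E0, v0):
    """Minimum-norm w with E0 w = v0, i.e. w = E0^T (E0 E0^T)^{-1} v0 (full row rank assumed)."""
    gram = [[sum(a * b for a, b in zip(r1, r2)) for r2 in E0] for r1 in E0]
    y = solve(gram, v0)
    return [sum(E0[i][j] * y[i] for i in range(len(E0))) for j in range(len(E0[0]))]

observed = [0, 4]            # indices of v1 and v5 (assumed to be the measured fluxes)
v0 = [3, 1]                  # hypothetical measured values
E0 = [E[i] for i in observed]

w = min_norm_weights(E0, v0)                               # mode coefficients
v = [sum(e * wj for e, wj in zip(row, w)) for row in E]    # all nine fluxes, v = E w
print(w)
print(v)
```

With these inputs the system is underdetermined (two equations, three coefficients), so this is case 3 of the properties listed in section 2.44: among all w that reproduce the observed fluxes, the one with the smallest $\sum_i w_i^2$ is returned.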
2.45 Shannon's MEP for elementary mode analysis

Apart from Poolman's Moore-Penrose generalized inverse method, there are a few other methods that can determine the metabolic flux vector from elementary flux modes. One example is the α-spectrum method [41]. This method defines which elementary modes can constitute the metabolic flux vector of a physiological state, and the available ranges of the elementary mode coefficients, by maximizing and minimizing each elementary mode coefficient. An elementary mode coefficient is a weighting factor which represents how much each elementary mode contributes to the whole network. This method only gives ranges of flux vectors, so in order to obtain a final solution the maximum and minimum of each elementary mode coefficient, found by linear programming constrained by experimentally determined fluxes, are averaged to give a mean value [42]. Based on this method, Kurata et al. further introduced a heuristic, non-mechanistic model that determines the range of flux vectors of mutants by incorporating into the model the enzymatic activities of the mutant relative to the wild type and its metabolic flux vectors. [41,43]. The optimization method is also close to the α-spectrum method, being based on linear programming and constrained by some experimentally determined exchange fluxes. Thus for both methods, the more experimentally determined fluxes or transcriptional regulatory constraints are available, the better the result. All the methods introduced above have been successfully applied to real-life problems: for example, the α-spectrum method has been applied to investigate the E. coli central metabolism (Wiback et al. 2004) and human red blood cell metabolism (Wiback et al. 2003), and Kurata's method has been applied to several Saccharomyces species under various growth conditions. [41]. However, they have one disadvantage in common, which is the lack of a physical or biological background behind the methods.
Quanyu Zhao and Hiroyuki Kurata proposed a method which uses the maximum entropy principle and nonlinear programming to optimize the elementary mode coefficients. The maximum entropy principle (MEP) is derived from Shannon's information theory and is widely used in physics, chemistry, and bioinformatics for gene expression and sequence analysis. [43].

Entropy is a core concept of thermodynamics and statistical physics. The law of entropy, also known as the second law of thermodynamics, is one of the greatest results of natural science in the 19th century. It was first proposed by Clausius in 1864. In 1948, C.E. Shannon introduced the concept into information theory, where entropy is regarded as a measure of the uncertainty of a random event, or of the amount of information. The uncertainty of a random event can thus be described through its probability distribution function. [44]

The maximum entropy principle was introduced by E.T. Jaynes. Its basic idea is that when only part of the information about a distribution is known, the distribution with the maximum entropy among those satisfying the known part should be chosen. In other words, the probability distribution that satisfies the known information is not unique; the one with the maximum entropy is the best choice, and no assumptions should be made about the unknown part. Under the constraints of the known information, the estimate of the unknown information is then the most indeterminate, or most random, estimate. Any other choice would presume additional constraints or assumptions that cannot be supported by the known information and would therefore distort the results. [44]

Shannon's entropy describes the uncertainty of a random event. Let the probability of a discrete random variable x taking the value $A_k$ be $P_k$, $k = 1, 2, \dots, N$, with $P_k > 0$ and $\sum_{k=1}^{N} P_k = 1$. Then the entropy is defined as:

$$H(x) = -\sum_{k=1}^{N} P_k \ln P_k. \qquad (1)$$

The flux distribution for elementary mode analysis is generally denoted as:

$$\mathbf{v} = \mathbf{P}\cdot\boldsymbol{\lambda}, \qquad (2)$$

where P is the elementary mode matrix, in which the rows represent the reactions and the columns correspond to the elementary modes, λ is the elementary mode coefficient vector and v is the flux vector. [42]. The elementary mode matrix P takes the form:

$$\mathbf{P} = \begin{pmatrix} e_{1,1} & e_{1,2} & \cdots & e_{1,m}\\ e_{2,1} & e_{2,2} & \cdots & e_{2,m}\\ \vdots & \vdots & & \vdots\\ e_{n,1} & e_{n,2} & \cdots & e_{n,m} \end{pmatrix}.$$

In Zhao and Kurata's Shannon's MEP method for elementary mode analysis, each elementary mode is regarded as a random event. As stated above, the elementary modes are the set of all possible pathways of a metabolic network, and each elementary mode, excluding internal loops, must contain an uptake reaction. Thus, from equation (2) above, the flux of a substrate uptake reaction satisfies: [42]

$$\frac{\sum_i e_{\text{substrate uptake},i}\cdot\lambda_i}{v_{\text{substrate uptake}}} = 1,$$

where $e_{\text{substrate uptake},i}$ is the element for the substrate uptake reaction in the i-th elementary mode in matrix P, and $v_{\text{substrate uptake}}$ is the flux of substrate uptake. This follows directly from equation (2): the flux is equal to the dot product of the elementary mode coefficient vector and the corresponding row of the elementary mode matrix. Then, based on equation (1), the probability of the i-th elementary mode in Shannon's entropy is given by: [42]

$$\rho_i = \frac{1}{v_{\text{substrate uptake}}}\, e_{\text{substrate uptake},i}\cdot\lambda_i. \qquad (3)$$

Thus the algorithm of Zhao and Kurata's method is defined as:

$$\max\; -\sum_{i=1}^{n_e} \rho_i \ln \rho_i, \qquad (4)$$

subject to

$$\mathbf{e_d}\cdot\boldsymbol{\lambda} = \mathbf{v_d}, \qquad (5)$$

$$\sum_{i=1}^{n_e} \rho_i = 1, \qquad (6)$$

$$\boldsymbol{\lambda} > 0, \qquad (7)$$

where $n_e$ is the total number of elementary modes, $\mathbf{v_d}$ is the vector of experimentally determined fluxes and $\mathbf{e_d}$ is the elementary mode sub-matrix that consists of the rows corresponding to the determined fluxes. It is clear that the objective function here (equation (4)) is nonlinear and is constrained by the experimentally determined fluxes.
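Equations (1) and (3) can be made concrete with a short numerical sketch. The uptake row `e_uptake` and the coefficient vector `lam` below are hypothetical values chosen only for illustration, and the function names are assumptions; the code evaluates the objective (4) for a given λ rather than optimizing it.

```python
import math

def mode_probabilities(e_uptake, lam):
    """Equation (3): rho_i = e_uptake_i * lambda_i / v_uptake."""
    # The uptake flux is the dot product of the uptake row with lambda (equation (2)).
    v_uptake = sum(e * l for e, l in zip(e_uptake, lam))
    return [e * l / v_uptake for e, l in zip(e_uptake, lam)]

def shannon_entropy(rho):
    """Equation (1): H = -sum(rho_i * ln(rho_i)), skipping zero probabilities."""
    return -sum(p * math.log(p) for p in rho if p > 0)

e_uptake = [2, 1, 2]        # hypothetical uptake entries of three elementary modes
lam = [0.5, 1.0, 1.0]       # hypothetical elementary mode coefficients

rho = mode_probabilities(e_uptake, lam)
print(rho, sum(rho))                       # the probabilities sum to one by construction
print(shannon_entropy(rho), math.log(3))   # entropy compared with the uniform upper bound
```

The entropy of any distribution over three modes is at most ln 3, reached when all modes are equally probable; the constraints (5) on the determined fluxes are what prevent the optimizer from simply choosing that uniform distribution.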
The result is the set of elementary mode coefficients that maximizes the entropy while satisfying the constraints; the flux distribution is then calculated from equation (2) [42]. This method has a disadvantage too, namely the computational complexity for a moderate or large-scale metabolic model. To obtain reliable estimates for such models, Zhao and Kurata proposed the MEP algorithm coupled with Lagrange multipliers. This method readily optimizes hundreds of thousands of elementary mode coefficients in large-scale networks under different types of environmental and genetic perturbations. [45].

Lagrange multiplier

The Lagrange multiplier method finds the maxima or minima of a function subject to constraints. It is named after Lagrange, one of the greatest mathematicians of the 18th century. [46,47]

Suppose we want to find:

$$\max\; f(x_1, x_2, \dots, x_n),$$

$$\text{subject to:}\quad \begin{cases} y_1(x_1, x_2, \dots, x_n) = c_1\\ y_2(x_1, x_2, \dots, x_n) = c_2\\ \qquad\vdots\\ y_m(x_1, x_2, \dots, x_n) = c_m \end{cases}$$

Lagrange introduced a new concept called the Lagrange multiplier (usually denoted by λ), and defined the Lagrange function as:

$$F(x_1, \dots, x_n) = f(x_1, \dots, x_n) + \lambda_1\,[\,y_1(x_1, \dots, x_n) - c_1\,] + \lambda_2\,[\,y_2(x_1, \dots, x_n) - c_2\,] + \dots + \lambda_m\,[\,y_m(x_1, \dots, x_n) - c_m\,].$$

By the first order conditions for a constrained optimum, optimizing over the $x_i$ implies $\partial F / \partial x_i = 0$. [48]. Thus we have:

$$\frac{\partial F}{\partial x_1} = 0 = \frac{\partial f}{\partial x_1} + \lambda_1 \frac{\partial y_1}{\partial x_1} + \lambda_2 \frac{\partial y_2}{\partial x_1} + \dots + \lambda_m \frac{\partial y_m}{\partial x_1}$$

$$\frac{\partial F}{\partial x_2} = 0 = \frac{\partial f}{\partial x_2} + \lambda_1 \frac{\partial y_1}{\partial x_2} + \lambda_2 \frac{\partial y_2}{\partial x_2} + \dots + \lambda_m \frac{\partial y_m}{\partial x_2}$$

$$\vdots$$

$$\frac{\partial F}{\partial x_n} = 0 = \frac{\partial f}{\partial x_n} + \lambda_1 \frac{\partial y_1}{\partial x_n} + \lambda_2 \frac{\partial y_2}{\partial x_n} + \dots + \lambda_m \frac{\partial y_m}{\partial x_n}$$

Solving the above n equations together with the m constraint equations gives the n + m unknowns $(x_1, x_2, \dots, x_n)$ and $(\lambda_1, \lambda_2, \dots, \lambda_m)$; this is simply the problem of solving a determined system of equations.
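[48]

Numerical example:

Find the constrained optimum of the following objective function:

$$f(x, y) = 2x^2 + xy - y,$$

subject to the constraint:

$$x + 2y = 3.$$

Solution: The Lagrange function is:

$$F = 2x^2 + xy - y + \lambda\,(x + 2y - 3).$$

Set the partial derivatives equal to zero:

$$\frac{\partial F}{\partial x} = 4x + y + \lambda = 0,$$

$$\frac{\partial F}{\partial y} = x - 1 + 2\lambda = 0,$$

$$\frac{\partial F}{\partial \lambda} = x + 2y - 3 = 0.$$

The second and third equations give $\lambda = (1 - x)/2$ and $y = (3 - x)/2$; substituting into the first gives $3x + 2 = 0$. Solving the above equations:

$$x = -\frac{2}{3}, \qquad y = \frac{11}{6}, \qquad \lambda = \frac{5}{6}.$$

Check this solution by substituting the result into the constraint equation:

$$x + 2y = -\frac{2}{3} + \frac{11}{3} = 3.$$

This gives the value of the objective function:

$$f(x, y) = 2x^2 + xy - y = \frac{8}{9} - \frac{11}{9} - \frac{11}{6} = -\frac{13}{6}.$$

The first order conditions only identify stationary points. Here, along the constraint, $f = \tfrac{3}{2}x^2 + 2x - \tfrac{3}{2}$, which is convex in x, so the stationary point found is the constrained minimum of f.

Lagrange multipliers coupled MEP algorithm.

As stated above, the goal is to find: [42,45]

$$\max\; -\sum_{i=1}^{n_e} \rho_i \ln \rho_i, \qquad (1)$$

subject to

$$\mathbf{e_d}\cdot\boldsymbol{\lambda} = \mathbf{v_d}, \qquad (2)$$

$$\sum_{i=1}^{n_e} \rho_i = 1, \qquad (3)$$

where

$$\rho_i = \frac{1}{v_{\text{substrate uptake}}}\, e_{\text{substrate uptake},i}\,\lambda_i. \qquad (4)$$

Rearranging equation (4):

$$\lambda_i = \frac{v_{\text{substrate uptake}}}{e_{\text{substrate uptake},i}}\,\rho_i. \qquad (5)$$

Substituting this into the first constraint equation: [45]

$$v_d = \sum_{i=1}^{n_e} e_{d,i}\, \frac{v_{\text{substrate uptake}}}{e_{\text{substrate uptake},i}}\,\rho_i, \qquad d = 1, \dots, n_d, \qquad (6)$$

where $n_d$ is the number of determined fluxes.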
[45] For easier notation, let:

\[ X_{d,i} = \begin{cases} e_{d,i} \, \dfrac{v_{\text{substrate uptake}}}{e_{\text{substrate uptake},i}} & \text{if } e_{\text{substrate uptake},i} \neq 0 \\[2mm] 0 & \text{if } e_{\text{substrate uptake},i} = 0 \end{cases} \qquad (7) \]

The constraint equation becomes [45]:

\[ \sum_{i=1}^{n_e} X_{d,i} \, \rho_i = v_d , \qquad d = 1, 2, \dots, n_d \qquad (8) \]

Writing $j$ for the index of the determined fluxes, the Lagrange function with Lagrange multipliers $\varphi_j$ can then be defined as:

\[ F(\rho_i, \varphi_j) = -\sum_{i=1}^{n_e} \rho_i \ln \rho_i - \varphi_0 \Big( \sum_{i=1}^{n_e} \rho_i - 1 \Big) - \sum_{j=1}^{n_d} \varphi_j \Big( \sum_{i=1}^{n_e} X_{j,i} \, \rho_i - v_j \Big) \qquad (9) \]

Setting the partial derivatives equal to zero:

\[ \frac{\partial F}{\partial \rho_i} = -1 - \ln \rho_i - \varphi_0 - \sum_{j=1}^{n_d} \varphi_j X_{j,i} = 0 , \qquad i = 1, 2, \dots, n_e \qquad (10) \]

Let $\varphi_0 = \ln Z - 1$. We then have:

\[ \ln \rho_i + \ln Z = -\sum_{j=1}^{n_d} \varphi_j X_{j,i} \]

\[ \ln (\rho_i Z) = -\sum_{j=1}^{n_d} \varphi_j X_{j,i} \]

\[ \rho_i = \frac{\exp \big( -\sum_{j=1}^{n_d} \varphi_j X_{j,i} \big)}{Z} \qquad (11) \]

By the second constraint, $\sum_{i=1}^{n_e} \rho_i = 1$, so:

\[ Z = \sum_{i=1}^{n_e} \exp \Big( -\sum_{j=1}^{n_d} \varphi_j X_{j,i} \Big) \]

and

\[ \rho_i = \frac{\exp \big( -\sum_{j=1}^{n_d} \varphi_j X_{j,i} \big)}{\sum_{k=1}^{n_e} \exp \big( -\sum_{j=1}^{n_d} \varphi_j X_{j,k} \big)} \qquad (12) \]

Thus, by the rearranged constraint (8) [45]:

\[ \frac{\sum_{i=1}^{n_e} X_{r,i} \exp \big( -\sum_{j=1}^{n_d} \varphi_j X_{j,i} \big)}{\sum_{i=1}^{n_e} \exp \big( -\sum_{j=1}^{n_d} \varphi_j X_{j,i} \big)} - v_r = 0 , \qquad r = 1, 2, \dots, n_d \qquad (13) \]

The method described above is taken from the paper "Use of maximum entropy principle with Lagrange multipliers extends the feasibility of elementary mode analysis" by Quanyu Zhao and Hiroyuki Kurata [45].

A worked example:

Example (1) is used to illustrate the above method. The network is a typical small metabolic network that has been used in many papers as an example for different methods. The network is defined as [49]:

Figure 2.451. Simple metabolic network.

The stoichiometric matrix was read from the network in section 2.1:

      v1   v2   v3   v4   v5   v6   v7   v8   v9
A      1   -1   -1    0    0    0    0    0    0
B      0    1    0   -1    0   -2    0    0    0
C      0    0    1    0    0    1   -1    0    0
D      0    0    0    2    0    0    1   -1    0
E      0    0    0   -1    1    0    0    0    0
F      0    0    0    0    0    1    0    0   -1

Table 2.451. Stoichiometric matrix of the simple metabolic network.

The elementary modes were calculated above:

      E1   E2   E3
v1     2    1    1
v2     2    0    1
v3     0    1    0
v4     0    0    1
v5     0    0    1
v6     1    0    0
v7     1    1    0
v8     1    1    2
v9     1    0    0

Table 2.451. Elementary flux modes matrix of the simple metabolic network.
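The two tables can be cross-checked directly: every elementary mode must satisfy the steady-state condition S·E = 0. The short Python sketch below (an illustration, not part of the thesis code) performs this check and forms the combined uptake row of v1 and v5 used in this example. The matrices are transcribed from Table 2.451 as read here, and the helper matmul is a hypothetical name.

```python
# Stoichiometric matrix: rows are metabolites A..F, columns are reactions v1..v9.
S = [
    [1, -1, -1,  0, 0,  0,  0,  0,  0],  # A
    [0,  1,  0, -1, 0, -2,  0,  0,  0],  # B
    [0,  0,  1,  0, 0,  1, -1,  0,  0],  # C
    [0,  0,  0,  2, 0,  0,  1, -1,  0],  # D
    [0,  0,  0, -1, 1,  0,  0,  0,  0],  # E
    [0,  0,  0,  0, 0,  1,  0,  0, -1],  # F
]

# Elementary flux modes: rows are reactions v1..v9, columns are E1..E3.
E = [
    [2, 1, 1],  # v1
    [2, 0, 1],  # v2
    [0, 1, 0],  # v3
    [0, 0, 1],  # v4
    [0, 0, 1],  # v5
    [1, 0, 0],  # v6
    [1, 1, 0],  # v7
    [1, 1, 2],  # v8
    [1, 0, 0],  # v9
]


def matmul(a, b):
    """Plain-Python matrix product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


# Steady-state check: each column of S*E should be the zero vector.
steady_state = matmul(S, E)

# Combined uptake row: the row of v1 plus the row of v5.
uptake_row = [a + b for a, b in zip(E[0], E[4])]

print("S*E =", steady_state)
print("uptake row =", uptake_row)
```

The combined uptake row gives the coefficients e_{u.p,i} by which equation (5) is applied to this network.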
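Thus we have:

\[ \begin{pmatrix} v_1 \\ v_5 \end{pmatrix} = \begin{pmatrix} 2 & 1 & 1 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} \lambda_1 \\ \lambda_2 \\ \lambda_3 \end{pmatrix} \]

\[ v_1 = 2\lambda_1 + \lambda_2 + \lambda_3 , \qquad v_5 = \lambda_3 \]

\[ v_{u.p} = v_1 + v_5 = 2\lambda_1 + \lambda_2 + 2\lambda_3 \qquad (\ast) \]

By equation (5) above, with uptake coefficients $e_{u.p} = (2, 1, 2)$:

\[ \lambda_1 = \frac{v_{u.p}}{e_{u.p,1}} \rho_1 = \frac{v_{u.p}}{2} \rho_1 , \qquad \lambda_2 = \frac{v_{u.p}}{e_{u.p,2}} \rho_2 = v_{u.p} \, \rho_2 , \qquad \lambda_3 = \frac{v_{u.p}}{2} \rho_3 = v_5 \]

Thus the new constraint is:

\[ v_{u.p} \, \rho_1 + v_{u.p} \, \rho_2 + v_5 = v_1 \]

The Lagrange function is:

\[ F(\rho_i, \varphi_j) = -\sum_{i=1}^{3} \rho_i \ln \rho_i - \varphi_1 (\rho_1 + \rho_2 + \rho_3 - 1) - \varphi_2 (v_{u.p} \, \rho_1 + v_{u.p} \, \rho_2 + v_5 - v_1) \]

\[ \frac{\partial F}{\partial \rho_1} = -1 - \ln \rho_1 - \varphi_1 - v_{u.p} \, \varphi_2 = 0 \]

\[ \frac{\partial F}{\partial \rho_2} = -1 - \ln \rho_2 - \varphi_1 - v_{u.p} \, \varphi_2 = 0 \]

\[ \frac{\partial F}{\partial \rho_3} = -1 - \ln \rho_3 - \varphi_1 = 0 \]

Thus:

\[ \rho_1 = \rho_2 = \exp(-1 - \varphi_1 - v_{u.p} \, \varphi_2) , \qquad \rho_3 = \exp(-1 - \varphi_1) = \frac{2 v_5}{v_{u.p}} \]

Substituting $\rho_1 = \rho_2$ into the new constraint:

\[ 2 v_{u.p} \exp(-1 - \varphi_1 - v_{u.p} \, \varphi_2) = v_1 - v_5 \]

\[ \rho_1 = \rho_2 = \exp(-1 - \varphi_1 - v_{u.p} \, \varphi_2) = \frac{v_1 - v_5}{2 v_{u.p}} \]

Checking the results:

\[ \rho_1 + \rho_2 + \rho_3 = \frac{v_1 - v_5}{2 v_{u.p}} + \frac{v_1 - v_5}{2 v_{u.p}} + \frac{2 v_5}{v_{u.p}} = \frac{2 v_1 - 2 v_5 + 4 v_5}{2 v_{u.p}} = \frac{2 (v_1 + v_5)}{2 v_{u.p}} = 1 \]

\[ v_{u.p} \, \rho_1 + v_{u.p} \, \rho_2 + v_5 = \frac{v_1 - v_5}{2} + \frac{v_1 - v_5}{2} + v_5 = v_1 \]

\[ \lambda_1 = \frac{v_{u.p}}{2} \cdot \frac{v_1 - v_5}{2 v_{u.p}} = \frac{v_1 - v_5}{4} , \qquad \lambda_2 = \frac{v_{u.p}}{1} \cdot \frac{v_1 - v_5}{2 v_{u.p}} = \frac{v_1 - v_5}{2} , \qquad \lambda_3 = \frac{v_{u.p}}{2} \rho_3 = v_5 \]

These values of $\lambda$ also satisfy the constraint $e_d \cdot \lambda = v_d$, since $2\lambda_1 + \lambda_2 + \lambda_3 = v_1$ and $\lambda_3 = v_5$.

In the original method there is only one uptake reaction, so only one row of the elementary mode matrix defines the uptake. Here, however, two reactions bring flux into the network, so an extra assumption is made that equation (∗) holds: the two uptake reactions v1 and v5 are independent, so their sum gives the total uptake flux, and the corresponding rows of the elementary mode matrix are independent as well, so their sum gives the total uptake row of the elementary mode matrix.

2.46 Summary

In this section, two of the most commonly used methods for metabolic network analysis have been introduced. The computation behind them can be very demanding for a large-scale network and is usually carried out with dedicated software, such as the COBRA toolbox in MATLAB for FBA, and METATOOL (written in C) and the MATLAB-based FluxAnalyzer for EFMs.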
The methods of metabolic network analysis have made great contributions to biological research, and every method has its advantages and disadvantages. To obtain the desired information, it is important to choose the "best fit" model for the given observations and the specific demands. A summary of the two methods introduced above is given in the following figure [32].

Figure 2.452. Stoichiometric modeling: flux and pathway analysis.

Figure 2.452 is taken from the paper "Design and analysis of amino acid supplementation in hepatocyte culture using in vitro experiment and mathematical modeling" by Hong Yang [32].

3. Numerical results

In this section the two methods studied above are applied to analyze the tricarboxylic acid cycle, the glyoxylate shunt and the adjacent amino acid network [28]. The network defined below is a simplified network that nevertheless incorporates the key central metabolic elements of the citric acid cycle.

The tricarboxylic acid (TCA) cycle, also known as the citric acid cycle, is at the core of the metabolism of living cells, especially in aerobic organisms. The TCA cycle appears in the metabolic pathways that decompose the major food groups, such as carbohydrates, lipids and proteins, into carbon dioxide and water to generate usable energy.

The glyoxylate shunt, also known as the glyoxylate cycle, is an alternative route in the network. This shunt ensures that the system remains functional when only simple carbon compounds are available as a carbon source. It is a two-step bypass of the TCA cycle catalyzed by two key enzymes (isocitrate lyase, Icl, and malate synthase, Mas).

How the network works with two different uptake resources, and how it reacts to changes in the proportion of the two resources, are analyzed in this section.

Figure 3.1 The tricarboxylic acid cycle, glyoxylate shunt and adjacent reactions. Abbreviations of metabolites and enzymes are given in Appendix B.

In the above network, glucose enters at PG and acetate enters at AcCoA.
Flux balance analysis and elementary modes analysis combined with MEP are applied to find:

1. How the biomass varies as the uptake flux shifts from PG to AcCoA.
2. The activation of the glyoxylate shunt.

To identify whether the glyoxylate shunt is active or not, there are two important pathways:

1. Bypass 1 of glyoxylate shunt: acetyl-CoA (AcCoA) + oxaloacetate (OAA) → citrate (Cit) → isocitrate (IsoCit) → glyoxylate (Gly) + acetyl-CoA (AcCoA) → malate (Mal) → oxaloacetate (OAA).
2. Bypass 2 of glyoxylate shunt: acetyl-CoA (AcCoA) + oxaloacetate (OAA) → citrate (Cit) → isocitrate (IsoCit) → succinate (Succ) → fumarate (Fum) → malate (Mal) → oxaloacetate (OAA).

The two key reactions of the glyoxylate shunt are: isocitrate (IsoCit) → glyoxylate (Gly) + succinate (Succ) and acetyl-CoA (AcCoA) + glyoxylate (Gly) → malate (Mal).

3.1 Flux balance analysis

1. Reconstruct the network.

No.  Enzyme      Reaction
1    Eno         PG↔PEP
2    Ppc         PEP→OAA
3    Pyk         PEP→Pyr
4    Pck         OAA→PEP
5    Pps         Pyr→PEP
6    AceEF       Pyr→AcCoA
7    GltA        OAA+AcCoA→Cit
8    Mdh         OAA↔Mal
9    Acn         Cit↔IsoCit
10   Icl         IsoCit→Gly+Succ
11   Mas         AcCoA+Gly→Mal
12   Fum         Mal↔Fum
13   Sdh         Succ↔Fum
14   SucCD       SucCoA↔Succ
15   SucAB       OG→SucCoA
16   Icd         IsoCit→OG
17   AspC        OAA+Glu↔OG+Asp
18   AspA        Asp→Fum
19   Gdh         OG↔Glu
20   IlvE/AvtA   OG+Ala↔Pyr+Glu
21   Eno         *→PG
22   EX_AcCoA    *→AcCoA
23   AspCon      Asp→AspCon
24   AlaCon      Ala→AlaCon
25   GluCon      Glu→GluCon
26   SucCoACon   SucCoA→SucCoACon
27   Biomass     Asp+Glu+Ala+SucCoA→

Table 3.11 Reactions read from figure 3.1.

Reactions 21 and 22 represent the two possible uptake reactions. The original network contains only 25 reactions [28]. In this thesis, however, the goal is to analyze the variation of the biomass and the activation of the glyoxylate shunt as the uptake flux shifts from glucose to acetate, so two extra reactions are added to the network: the biomass reaction and the uptake reaction of AcCoA.

2. Mathematically represent metabolic reactions and constraints.

Table 3.12 Stoichiometric matrix constructed from table 3.11.

3.
Define the constraints and objective function.

\[ \max F = v_{\text{biomass}} \]

such that

\[ \sum_{j=1}^{27} S_{ij} \, v_j = 0, \quad i \in M, \qquad v_j^{\min} \le v_j \le v_j^{\max}, \quad j \in N , \]

with

\[ v_j^{\min} = \begin{cases} 0 & \text{for irreversible reactions} \\ -\infty & \text{for reversible reactions} \end{cases} \qquad v_j^{\max} = \begin{cases} v_{\text{uptake}} & \text{for uptake reactions} \\ \infty & \text{otherwise.} \end{cases} \]

The fluxes are normalized, which means that if all the uptake flux comes from reaction 21, then $v_{\text{uptake},21} = 1$. Because we are interested in the variation of the biomass as the uptake flux shifts from glucose to acetate, the final step is to find the maximal biomass flux when a fraction x of the uptake enters as PG and the fraction 1 − x enters as AcCoA:

\[ v_{\text{biomass}} = x \, \text{PG} + (1 - x) \, \text{AcCoA}, \qquad x = 1, 0.99, 0.98, \dots, 0.01, 0. \qquad (1) \]

Since the fluxes are normalized, this means running the above calculation 101 times, once for each value of x, each time shifting 0.01 of the uptake flux from PG to AcCoA. The result is calculated with the Optimization Toolbox in MATLAB.

Results:

x =     1      0.8    0.6    0.4    0.2    0
R1     1.00   0.80   0.60   0.40   0.20   0.00
R2     0.60   0.60   0.42   0.24   0.07   0.00
R3     0.40   0.20   0.18   0.16   0.13   0.11
R4     0.00   0.00   0.00   0.00   0.00   0.11
R5     0.00   0.00   0.00   0.00   0.00   0.00
R6     0.20   0.00   0.00   0.00   0.00   0.00
R7     0.20   0.20   0.29   0.38   0.47   0.56
R8     0.20   0.20  -0.04  -0.29  -0.53  -0.78
R9     0.20   0.20   0.29   0.38   0.47   0.56
R10    0.00   0.00   0.11   0.22   0.33   0.44
R11    0.00   0.00   0.11   0.22   0.33   0.44
R12    0.20   0.20   0.07  -0.07  -0.20  -0.33
R13    0.20   0.20   0.07  -0.07  -0.20  -0.33
R14    0.20   0.20   0.18   0.16   0.13   0.11
R15    0.00   0.00   0.00   0.00   0.00   0.00
R16    0.20   0.20   0.18   0.16   0.13   0.11
R17    0.20   0.20   0.18   0.16   0.13   0.11
R18    0.00   0.00   0.00   0.00   0.00   0.00
R19    0.60   0.60   0.53   0.47   0.40   0.33
R20   -0.20  -0.20  -0.18  -0.16  -0.13  -0.11
R21    1.00   0.80   0.60   0.40   0.20   0.00
R22    0.00   0.20   0.40   0.60   0.80   1.00
R23    0.00   0.00   0.00   0.00   0.00   0.00
R24    0.00   0.00   0.00   0.00   0.00   0.00
R25    0.00   0.00   0.00   0.00   0.00   0.00
R26    0.00   0.00   0.00   0.00   0.00   0.00
R27    0.20   0.20   0.18   0.16   0.13   0.11

Table 3.13 Some of the results from MATLAB.

The MATLAB code for FBA is given in Appendix C. The full flux table is given in the supplementary file.

By plotting equation (1), we can see that the biomass generally decreases as the uptake flux shifts from glucose to acetate.
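The sweep over x described above amounts to solving one linear programme per value of x, with only the upper bounds of the two uptake reactions changing. The Python sketch below (an illustration, not the MATLAB code of Appendix C) builds the 101 pairs of bound vectors; the reversibility pattern is copied from the bounds listed in Appendix C, the names flux_bounds, x_grid and sweep are hypothetical, and no solver is called.

```python
import math

# 1 marks a reversible reaction (lower bound -inf), 0 an irreversible one,
# in the order of reactions 1..27 of Table 3.11.
REVERSIBLE = [1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1,
              0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0]
PG_UPTAKE, ACCOA_UPTAKE = 20, 21   # zero-based positions of reactions 21 and 22


def flux_bounds(x):
    """Lower and upper flux bounds when a fraction x of the uptake is PG."""
    lower = [-math.inf if rev else 0.0 for rev in REVERSIBLE]
    upper = [math.inf] * len(REVERSIBLE)
    upper[PG_UPTAKE] = x
    upper[ACCOA_UPTAKE] = round(1.0 - x, 2)
    return lower, upper


# x = 1, 0.99, ..., 0.01, 0: one linear programme per value of x.
x_grid = [round(1 - k / 100, 2) for k in range(101)]
sweep = [(x, *flux_bounds(x)) for x in x_grid]

print(len(sweep), "problems; first x =", sweep[0][0], " last x =", sweep[-1][0])
```

Each (lower, upper) pair would then be passed, together with the stoichiometric matrix and the objective of maximizing reaction 27, to a linear programming solver.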
When glucose is the only uptake, the flux to the end products is 20% of the total incoming flux. Once the acetate flux entering the network exceeds 20% of the total uptake, the biomass starts to decrease steadily. Finally, when only acetate flows into the network, the biomass reaches its lowest point, which is only approximately half of its value when only glucose flows in.

Figure 3.11. Biomass plot as uptake fluxes shift from PG to AcCoA by FBA. (Plot of biomass = xPG + (1-x)AcCoA against x.)

TCA cycle:

Figure 3.12. Fluxes of the reactions that enter the TCA cycle and produce AcCoA from glucose. (Plot of the fluxes of OAA→PEP, PEP→OAA, PEP→Pyr and Pyr→AcCoA against x.)

Figure 3.12 shows how the flux from glucose flows into the TCA cycle and produces AcCoA. When all the uptake flux is glucose, 60% of the flux from glucose flows to the TCA cycle and the rest flows to the reactions that produce AcCoA. The flux of the pathway that produces AcCoA decreases as the incoming flux of glucose decreases. When the glucose flux drops below 80% of the total uptake flux, no more AcCoA is produced by this pathway, and the flux into the TCA cycle starts to drop; it vanishes as soon as the incoming flux of glucose drops below 12%.

Glyoxylate shunt:

Figure 3.13. Two key reactions of activation of the glyoxylate shunt.

Figure 3.13 shows how the glyoxylate shunt is activated in this network for the given uptake flux patterns. In contrast to the TCA cycle, the glyoxylate shunt is inactive when not enough acetate flows into AcCoA. Once the glucose flux drops below 80%, in other words once the flux entering the network through AcCoA exceeds 20% of the total uptake flux, the glyoxylate shunt becomes active.

Summary.
From the above figures, flux balance analysis suggests that this system gives the maximum biomass when at least 80% of the uptake flux is glucose. Once the percentage drops below this point, the biomass starts to decrease, while the glyoxylate shunt becomes active. It also suggests another turning point, when the uptake flux is distributed as 12% glucose and 88% acetate. At this point no more flux flows into the TCA cycle from PG, as shown in figure 3.12, so the entire system relies on the glyoxylate shunt; for this distribution, however, the biomass is only approximately half of its value when the flux is distributed as 80% glucose and 20% acetate.

3.2 Elementary flux modes analysis with Shannon's MEP.

1. Determine the elementary flux modes.

Using the "efmtool" package [50] with the stoichiometric matrix defined above, 16 elementary flux modes were found for the original network and 42 for the adjusted network, in which two extra reactions were added. The code and the elementary flux modes matrix are given in Appendix D.
       E1  E2  E3  E4  E5  E6  E7  E8  E9 E10 E11 E12 E13 E14 E15 E16
R1      0   2   3   1   3   2   3   1   1   2   1   2   1   0   0   1
R2      0   0   0   1   1   1   0   1   0   1   1   0   0   0   1   0
R3      1   2   3   0   2   1   3   0   2   1   0   2   1   0   0   1
R4      0   0   0   0   0   0   0   0   1   0   0   0   0   0   1   0
R5      1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
R6      0   2   3   0   2   1   3   0   2   1   0   2   0   0   0   1
R7      0   1   2   0   1   1   2   0   1   1   0   1   0   0   0   1
R8      0  -2  -2   1   0   0  -2   0  -2   0   0  -1   0  -1   0  -1
R9      0   1   2   0   1   1   2   0   1   1   0   1   0   0   0   1
R10     0   1   1   0   1   0   1   0   1   0   0   1   0   0   0   0
R11     0   1   1   0   1   0   1   0   1   0   0   1   0   0   0   0
R12     0  -1  -1   1   1   0  -1   0  -1   0   0   0   0  -1   0  -1
R13     0   1   1  -1  -1   0   1   0   1   0  -1   0   0   0   0   1
R14     0   0   0  -1  -2   0   0   0   0   0  -1  -1   0   0   0   1
R15     0   0   1   0   0   1   0   0   0   0   0   0   0   0   0   1
R16     0   0   1   0   0   1   1   0   0   1   0   0   0   0   0   1
R17     0   1   0   0   0   0   0   1   0   0   1   0   0   1   0   0
R18     0   0   0   0   0   0   0   0   0   0   1   0   0   1   0   0
R19     0   1   0   0   0   0   1   1   0   1   1   0   1   1   0   0
R20     0   0   0   0   0   0   0   0   0   0   0   0  -1   0   0   0
R21     0   2   3   1   3   2   3   1   1   2   1   2   1   0   0   1
R22     0   1   0   0   0   0   0   1   0   0   0   0   0   0   0   0
R23     0   0   0   0   0   0   0   0   0   0   0   0   1   0   0   0
R24     0   0   0   0   0   0   1   0   0   1   0   0   0   0   0   0
R25     0   0   1   1   2   1   0   0   0   0   1   1   0   0   0   0
Order   2   7  16  10  11  13  14   3   6  12   8   9   4   5   1  15

Table 3.21. 16 elementary flux modes for the original network.

Table 3.21 shows the 16 elementary flux modes of the original network; the last row gives the corresponding order in the paper [28]. The matrix of 42 elementary flux modes will be used in this section.

Quanyu Zhao and Hiroyuki Kurata coded the algorithm of elementary flux modes analysis with Shannon's MEP method, as well as the method combined with Lagrange multipliers; the packages were given in the supplementary files of their original papers [42, 45, 51, 52]. This package is adapted in this thesis. The given metabolic network is relatively small, containing only 42 elementary modes, so the simple method (without Lagrange multipliers) is used.

In the papers of Quanyu Zhao and Hiroyuki Kurata [42, 45], the metabolic networks analyzed contain only one uptake reaction, and the code assumes this as well. To perform the analysis on the TCA cycle and glyoxylate shunt network, which contains two uptake reactions, the code was adjusted.
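The adjustment amounts to summing the rows of the two uptake reactions before the mode probabilities are formed, as in the line uptake=uptake1+uptake2 of Appendix E. The Python sketch below (an illustration under stated assumptions, not the adapted MATLAB package) shows the resulting entropy objective on the small network of section 2.45, whose uptake rows are (2, 1, 1) and (0, 0, 1). The flux values and all function names are hypothetical.

```python
import math

# Rows of the elementary mode matrix for the two uptake reactions (v1 and v5)
# of the simple network in section 2.45.
uptake1 = [2, 1, 1]
uptake2 = [0, 0, 1]
uptake = [a + b for a, b in zip(uptake1, uptake2)]   # combined uptake row


def mode_probabilities(coeffs, uptake_row, uptake_flux):
    """rho_i = e_uptake,i * lambda_i / v_uptake for every elementary mode."""
    return [e * c / uptake_flux for e, c in zip(uptake_row, coeffs)]


def entropy(rho):
    """Shannon entropy -sum(rho * ln rho); zero probabilities add nothing."""
    return -sum(p * math.log(p) for p in rho if p > 0)


def coefficients(v1, v5, t):
    """Mode coefficients that reproduce v1 and v5; t in [0, 1] splits the
    flux v1 - v5 between modes E1 and E2."""
    return [t * (v1 - v5) / 2, (1 - t) * (v1 - v5), v5]


v1, v5 = 0.75, 0.25            # illustrative fluxes, not measured data
uptake_flux = v1 + v5

# Scan the one remaining degree of freedom and keep the maximum-entropy split.
scan = {}
for step in range(11):
    t = step / 10
    rho = mode_probabilities(coefficients(v1, v5, t), uptake, uptake_flux)
    scan[t] = entropy(rho)
best_t = max(scan, key=scan.get)
best_rho = mode_probabilities(coefficients(v1, v5, best_t), uptake, uptake_flux)

print("best split t =", best_t, " rho =", best_rho)
```

In this small case the scan selects the even split t = 0.5, which corresponds to rho_1 = rho_2 as obtained analytically in the worked example of section 2.45.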
The new code is given in Appendix E.

Results:

Figure 3.21. The biomass plot as uptake fluxes shift from PG to AcCoA by MEP. (Plot of the biomass flux against x.)

Figure 3.21 shows the biomass as the uptake flux shifts from glucose to acetate. When glucose is the only uptake, the flux to the end products equals 18% of the total incoming flux. Once the acetate flux entering the network exceeds 36% of the total uptake, the biomass starts to decrease steadily. Finally, when only acetate flows into the network, the biomass reaches its lowest point, which is only approximately half of its value when glucose is the only uptake.

TCA cycle:

Figure 3.22. Fluxes of the reversible reaction that enter the TCA cycle and produce AcCoA from glucose by MEP.

Figure 3.22 shows how the flux from glucose flows through the TCA cycle and produces AcCoA. When all the uptake flux is glucose, 47.8% of the flux from glucose flows to the TCA cycle and 56.9% flows to the reactions that produce AcCoA. The flux of the pathway that produces AcCoA decreases as the incoming flux from glucose decreases. When the glucose flux drops below 64% of the total uptake flux, no more AcCoA is produced by this pathway, and the flux into the TCA cycle starts to drop, nearly reaching zero when all the uptake flux comes from AcCoA. The vertical line shows a turning point where the uptake flux is distributed as 64% glucose and 36% acetate.

Glyoxylate shunt:

Figure 3.23. Two key reactions of activation of the glyoxylate shunt by MEP. (Panels: IsoCit→Gly+Succ and AcCoA+Gly→Mal; flux against x.)
Figure 3.23 shows how the glyoxylate shunt is activated in this network for the given uptake flux patterns. The MEP method shows that the glyoxylate shunt is active even when 100% of the uptake flux comes from PG, although only a very small flux, approximately 9% of the total flux, appears in those reactions. Once the glucose flux drops below 64%, in other words once the flux entering the network through AcCoA exceeds 36% of the total uptake flux, the glyoxylate shunt becomes more active.

Summary.

From the above figures, the MEP method suggests that this system gives the maximum biomass when at least 64% of the uptake flux is glucose; once the percentage drops below this point, the biomass starts to decrease. The glyoxylate shunt is active no matter how the uptake flux is distributed. Moreover, there is always flux from PEP into oxaloacetate (OAA), which represents the activation of the TCA cycle. The TCA cycle and the glyoxylate shunt therefore always work together.

3.3 Comparison of two methods.

Both methods identify a turning point: for FBA the ratio of glucose to acetate is 8:2, and for MEP the ratio is 6.4:3.6. Once the acetate flux exceeds this point, the fluxes estimated by the two methods follow a similar trend for most reactions. All comparison figures are given in Appendix F. The comparisons of the biomass estimation and of the activation of the glyoxylate shunt are given below.

Figure 3.31. The predicted biomass results of FBA and MEP. (Plot of the biomass predicted by FBA and by MEP against x.)

Figure 3.31 shows the biomass predicted by FBA and MEP. Although the two methods suggest different maximum and minimum values of the biomass and different turning points, they both predict a decreasing trend once the ratio of the two uptake fluxes reaches a certain point, and the relative decrease is very similar too: the ratio of minimum to maximum biomass is 0.1111/0.2 = 0.5555 for FBA and 0.1/0.1818 = 0.5501 for MEP.
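The two ratios quoted above can be reproduced with a few lines of arithmetic. The Python sketch below (an illustration only) uses the minimum and maximum biomass values stated in the text and also reports the relative difference between the two ratios, a quantity that is not stated in the thesis and is added here for comparison.

```python
# Minimum and maximum biomass flux quoted in the text for each method.
fba_min, fba_max = 0.1111, 0.2
mep_min, mep_max = 0.1, 0.1818

fba_ratio = fba_min / fba_max          # fraction of biomass left at x = 0 (FBA)
mep_ratio = mep_min / mep_max          # fraction of biomass left at x = 0 (MEP)

# Relative difference between the two ratios, with respect to the FBA ratio.
relative_difference = abs(fba_ratio - mep_ratio) / fba_ratio

print(round(fba_ratio, 4), round(mep_ratio, 4), round(relative_difference, 4))
```

Both methods therefore predict that roughly 55% of the maximum biomass remains when acetate is the only uptake.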
Figure 3.32. The fluxes of the two key reactions of activation of the glyoxylate shunt predicted by the two methods. (Panels: IsoCit→Gly+Succ and AcCoA+Gly→Mal; FBA and MEP fluxes against x.)

The major difference between FBA and MEP is the following. Although both methods suggest that an increase of acetate is necessary for the activation of the glyoxylate shunt, FBA predicts no glyoxylate shunt activity when not enough acetate flows into AcCoA, whereas MEP suggests that the glyoxylate shunt is always active and becomes more active as more acetate flows into the network. Moreover, FBA suggests that once the glucose flux drops below 12% of the total uptake, no flux flows into the TCA cycle from phosphoenolpyruvate (PEP): as can be seen from figure 3.12, the flux of the reaction PEP→OAA is zero after that point, so the system relies entirely on the glyoxylate shunt. The MEP method, on the other hand, suggests that the TCA cycle and the glyoxylate shunt both operate at all times.

Figure 3.33. The flux of IsoCit→OG estimated by FBA and MEP.

Figure 3.34. The flux of Pyr→PEP estimated by FBA and MEP.

Figure 3.33 shows another difference in the flux trends estimated by FBA and MEP: where FBA predicts a decrease, MEP suggests a slight increase. Figure 3.34 shows that MEP predicts a small flux through a reaction for which FBA estimates none. This small flux ensures that it is always possible to generate OAA from PEP, even when no flux flows from PG to PEP.
It is usually believed that this reaction is responsible for the switch between the TCA cycle and the glyoxylate shunt [28]. Glutamate (Glu) synthesis requires both 2-oxoglutarate (OG) and oxaloacetate (OAA), the latter either from PEP or regenerated via glyoxylate, so it is sensible to assume that IsoCit→OG is always active, or even increases, to ensure that flux exists in the Pyr→PEP reaction as the flux from PEP to OAA decreases with the decreasing glucose uptake.

4. Conclusion

In this thesis, flux balance analysis and elementary flux modes analysis with Shannon's MEP were studied and applied to the analysis of a real metabolic network. Both methods predict a similar trend in the flux distribution, but give two very different results on the activation of the glyoxylate shunt.

Flux balance analysis is based on linear programming with stoichiometric constraints. It is easy to conduct, since complex biochemical and physical information such as kinetic parameters is not required, and it gives a reasonable estimate of the flux distribution; hence FBA is one of the most widely used methods in metabolic network flux analysis. However, the simple objective function and constraints assume that all metabolites are processed at their maximum rate, which may not always be true. For further studies, a major goal is therefore to construct the model with an objective function and constraints that reflect more of the biochemical background.

Elementary flux modes analysis performs better in pathway and network structure analysis. When combined with Shannon's MEP, it uses experimental data as constraints together with an objective function that is more strongly related to biochemistry, and it revealed some interesting behaviors of the metabolic network. However, this method has so far only been shown to be effective with one uptake reaction.
In this thesis the two uptake reactions were simply added together to form the total uptake reaction, which is too naive. This may be sensible when the two uptakes are totally independent, as in the example given in section 2.45. In the TCA cycle and glyoxylate shunt network, however, it is possible that PG and AcCoA regenerate one another, which could affect the estimation. A more mathematically or biochemically grounded procedure for obtaining the total uptake reaction from several sub-uptake reactions could therefore help to improve this method further.

References

[1] Fiehn O. Metabolomics - the link between genotypes and phenotypes. Plant Molecular Biology, 2002.
[2] Ma H, Zeng A P. Reconstruction of metabolic networks from genome data and analysis of their global structure for various organisms. Bioinformatics, 2003.
[3] Patil K R, Nielsen J. Uncovering transcriptional regulation of metabolism by using metabolic network topology. Proc Natl Acad Sci, 2005.
[4] Watts D J, Strogatz S H. Collective dynamics of 'small-world' networks. Nature, 1998.
[5] Wagner A, Fell D A. The small world inside large metabolic networks. Proceedings Biological Sciences, 2001.
[6] Ma H W, Zeng A P. Reconstruction of metabolic networks from genome data and analysis of their global structure for various organisms. Bioinformatics, 2003.
[7] Salas A, Yao Y G, Macaulay V, et al. A critical reassessment of the role of mitochondria in tumorigenesis. PLoS Med, 2005.
[8] Jiang D. Network-based Metabolic Flux and Structure Analysis. 2006.
[9] Peter van Nes. Metabolic network modeling. 2006.
[10] Jeffrey D Orth, Ines Thiele, Bernhard Palsson. What is flux balance analysis. Primer Computational Biology.
[11] Qian H, Beard D A. Thermodynamics of stoichiometric biochemical networks in living systems far from equilibrium. Biophysical Chemistry, 2005.
[12] Roded Sharan. Analysis of Biological Networks: Constraint-Based Modeling of Metabolic Networks. 2006.
[13] German J B, Hammock B D, Watkins S M. Metabolomics: building on a century of biochemistry to guide human health. Metabolomics, 2005.
[14] Kenneth J Kauffman, Purusharth Prakash, Jeremy S Edwards. Advances in flux balance analysis. 2003.
[15] Tomer Benyamini, Ori Folger, Eytan Ruppin, Tomer Shlomi. Flux balance analysis accounting for metabolite dilution.
[16] Kanehisa M, Goto S, Furumichi M, Tanabe M, Hirakawa M. KEGG for representation and analysis of molecular networks involving diseases and drugs. Nucleic Acids Res. 38, D355-D360 (2010).
[17] Covert M W, Schilling C H, Palsson B. Regulation of gene expression in flux balance models of metabolism. Journal of Theoretical Biology, 2001.
[18] Lee J M, Gianchandani E P, Papin J A. Flux balance analysis in the era of metabolomics. Bioinformatics, 2006.
[19] Beard D A, Liang S D, Qian H. Energy balance for analysis of complex metabolic networks. Biophysical Journal, 2002.
[20] Segre D, Vitkup D, Church G M. Analysis of optimality in natural and perturbed metabolic networks. Proc Natl Acad Sci USA, 2002.
[21] Shlomi T, Berkman O, Ruppin E. Regulatory on/off minimization of metabolic flux changes after genetic perturbations. Proc Natl Acad Sci USA, 2005.
[22] Burgard A P, Pharkya P, Maranas C D. Optknock: a bilevel programming framework for identifying gene knockout strategies for microbial strain optimization. Biotechnology and Bioengineering, 2003.
[23] R. Mahadevan, J.S. Edwards, F.J. Doyle III. Dynamic flux balance analysis for metabolic modeling.
[24] Ralf Steuer, Bjorn H. Junker. Computational Models of Metabolism: Stability and Regulation in Metabolic Networks.
[25] S.H. Zhou. Algorithm Research and Improve in Optimize design.
[26] F.X. Wang. Linear programming. 2002.
[27] G.R. Walsh. An introduction to linear programming.
[28] Stefan Schuster, Thomas Dandekar, David A. Fell. Detection of elementary flux modes in biochemical networks: a promising tool for pathway analysis and metabolic engineering.
[29] Cong T. Trinh, Aaron Wlaschin, Friedrich Srienc. Elementary Mode Analysis: A Useful Metabolic Pathway Analysis Tool for Characterizing Cellular Metabolism.
[30] Klamt S, Stelling J. Combinatorial complexity of pathway analysis in metabolic networks. Molecular Biology Reports, 2002.
[31] Julien Gagneur, Steffen Klamt. Computation of elementary modes: a unifying framework and the new binary approach.
[32] Hong Yang. Design and analysis of amino acid supplementation in hepatocyte culture using in vitro experiment and mathematical modeling.
[33] Steffen Klamt, Jorg Stelling. Two approaches for metabolic pathway analysis.
[34] Stefan Schuster, Thomas Dandekar, David Fell. Description of the algorithm for computing elementary flux modes.
[35] S. Schuster, C. Hilgetag, J.H. Woods, D.A. Fell. Reaction routes in biochemical reaction systems: Algebraic properties, validated calculation procedure and example from nucleotide metabolism. Mathematical Biology, 2002.
[36] M.G. Poolman, K.V. Venkatesh, M.K. Pidcock, D.A. Fell. A method for the determination of flux in elementary modes, and its application to Lactobacillus rhamnosus.
[37] Neil Muller, Lourenco Magaia, B.M. Herbst. Singular Value Decomposition, Eigenfaces, and 3D reconstructions.
[38] David Clark. A note on the pseudo-inverse.
[39] Zhihua Zhang. Lecture Notes 4: Examples of the SVD.
[40] Iman Famili, Bernhard O. Palsson. MBW: Singular value decomposition of stoichiometric matrices.
[41] Cong T. Trinh, Aaron Wlaschin, Friedrich Srienc. Elementary mode analysis: a useful metabolic pathway analysis tool for characterizing cellular metabolism.
[42] Quanyu Zhao, Hiroyuki Kurata. Maximum entropy decomposition of flux distribution at steady state to elementary modes.
[43] Hiroyuki Kurata, Quanyu Zhao, Ryuichi Okuda, Kazuyuki Shimizu. Integration of enzyme activities into metabolic flux distributions by elementary mode analysis.
[44] Bo Liu. Entropy and laser.
[45] Quanyu Zhao, Hiroyuki Kurata. Use of maximum entropy principle with Lagrange multipliers extends the feasibility of elementary mode analysis.
[46] Dan Klein. Lagrange Multipliers without Permanent Scarring.
[47] Com S 477/577 Lagrange Multipliers 2008, http://www.cs.iastate.edu/~cs577/handouts/lagrange-multiplier.pdf
[48] Xiaojun Chen. First order conditions for nonsmooth discretized constrained optimal control problems.
[49] http://bio.freelogy.org/wiki/User:JeremyZucker
[50] "efmtool" package, Computational systems biology, http://www.csb.ethz.ch/tools/efmtool
[51] Quanyu Zhao, Hiroyuki Kurata. "mep1" package, http://www.cadlive.jp/MetabolicEngineering/MEP/ManualMEP.htm
[52] Quanyu Zhao, Hiroyuki Kurata. Use of maximum entropy principle extends the feasibility of elementary mode analysis. http://www.cadlive.jp/MetabolicEngineering/MEPLM/InstructionMEPLM.htm

Appendix

Appendix A. Calculate the elementary flux modes matrix for the example metabolic network with the "efmtool" package.

The "efmtool" package is an EFM matrix calculation toolbox for MATLAB, which can be downloaded from http://www.csb.ethz.ch/tools/efmtool. First download and install the package, then set the MATLAB path to the location where the package is installed. Second, enter the stoichiometric matrix into an Excel or text file and import the data into MATLAB. Finally, run the following code:

>> stru.stoich=[data]; % read in the stoichiometric matrix
>> stru.reversibilities=[0,0,0,0,0,0,0,0,0]; % reversibility of each reaction: 1 for reversible, 0 for irreversible
>> mnet=CalculateFluxModes(stru)

The outcome:

2 1 1
2 1 0
0 0 1
0 1 0
0 1 0
1 0 0
1 0 1
1 2 1
1 0 0

This is the same as the hand-calculated result; the columns correspond to E1, E3 and E2 of the elementary flux modes matrix in section 2.45.

Appendix B. Abbreviations of metabolites and enzymes for the TCA cycle, glyoxylate shunt and adjacent reactions network.
Abbreviations of metabolites:
AcCoA -- acetyl-CoA;
Ala -- alanine;
Asp -- aspartate;
Cit -- citrate;
Fum -- fumarate;
Glu -- glutamate;
Gly -- glyoxylate;
IsoCit -- isocitrate;
Mal -- malate;
OAA -- oxaloacetate;
OG -- 2-oxoglutarate;
PEP -- phosphoenolpyruvate;
PG -- 2-phosphoglycerate;
Pyr -- pyruvate;
Succ -- succinate;
SucCoA -- succinyl-CoA.

Abbreviations of enzymes:
AceEF -- pyruvate dehydrogenase;
Acn -- aconitase;
AspA -- aspartase;
AspC -- aspartate aminotransferase;
Eno -- enolase;
Fum -- fumarase;
Gdh -- glutamate dehydrogenase;
GltA -- citrate synthase;
Icd -- isocitrate dehydrogenase (in E. coli with cofactors NADP/NADPH);
Icl -- isocitrate lyase;
IlvE/AvtA -- branched-chain amino acid aminotransferase/valine-pyruvate aminotransferase;
Mas -- malate synthase;
Mdh -- malate dehydrogenase;
Pck -- PEP carboxykinase (in E. coli with cofactors ADP/ATP);
Ppc -- PEP carboxylase;
Pps -- PEP synthetase;
Pyk -- pyruvate kinase;
Sdh -- succinate dehydrogenase;
SucAB -- 2-oxoglutarate dehydrogenase;
SucCD -- succinyl-CoA synthetase (in E. coli with cofactors ADP/ATP);
AlaCon, AspCon, GluCon and SucCoACon are consumption of alanine, aspartate, glutamate and succinyl-CoA, respectively.

Appendix C. MATLAB code for FBA to analyze the TCA cycle and adjacent reactions network.

First import the stoichiometric matrix shown in Table 3.12 into MATLAB. Then run the following code (shown here for x = 1, i.e. an upper bound of 1 on reaction 21 and 0 on reaction 22):

>> f=[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,-1]; % minimizing -v27 maximizes the biomass flux
>> b=zeros(16,1); % steady state: S*v = 0 for the 16 metabolites
>> lb=[-Inf,0,0,0,0,0,0,-Inf,-Inf,0,0,-Inf,-Inf,-Inf,0,0,-Inf,0,-Inf,-Inf,0,0,0,0,0,0,0]; % -Inf for reversible reactions
>> ub=[Inf,Inf,Inf,Inf,Inf,Inf,Inf,Inf,Inf,Inf,Inf,Inf,Inf,Inf,Inf,Inf,Inf,Inf,Inf,Inf,1,0,Inf,Inf,Inf,Inf,Inf]; % entries 21 and 22 bound the two uptake fluxes
>> simlp(f,data,b,lb,ub,[],16) % the first 16 constraints are equalities

Appendix D. The code for calculating the elementary flux modes matrix for the TCA cycle and glyoxylate shunt network, and the elementary flux modes matrix.
First, import the stoichiometric matrix shown in Table 3.12 into MATLAB as the variable data. Then run the following commands:

>> stru.stoich=[data]; % read in the stoichiometric matrix
>> stru.reversibilities=[1,0,0,0,0,0,0,1,1,0,0,1,1,1,0,0,1,0,1,1,0,0,0,0,0,0,0]; % reversibility of each reaction: 1 for reversible, 0 for irreversible
>> mnet=CalculateFluxModes(stru)

The elementary flux modes matrix:

Appendix E. “mep1” package code for elementary flux modes analysis with Shannon's MEP method.

This program was originally written by Quanyu Zhao and Hiroyuki Kurata and is given in the supplementary file of their original paper.

First download the package from the website: http://www.cadlive.jp/MetabolicEngineering/MEP/ManualMEP.htm
Install the package and add its folder to the MATLAB path.
Then rewrite the code as follows so that it can handle two uptake reactions.

function example
clear all
global uptakeflux uptake
disp('Please input the name of data file for the optimization')
disp('Make sure the date format is right')
alldata=input('The file name is; ','s')
datas=textread(alldata);
nr=datas(1,1);          % the number of reactions
nd=datas(1,2);          % the number of determined fluxes for optimization
uptakeflux=datas(1,3);  % the flux for the uptake reaction
% input the matrix A (rows of the elementary mode matrix for the determined fluxes)
for i=2:1:(nd+1)
    emmatrix(i-1,:)=datas(i,:);
end
[a1,a2]=size(emmatrix);
% input the vector b (values of the determined fluxes)
for i=1:1:nd
    fluxv(1,i)=datas(nd+2,i);
end
% input the rows of the elementary mode matrix for the two uptake reactions
uptake1=datas(nd+3,:);
uptake2=datas(nd+4,:);
uptake=uptake1+uptake2;
% input the elementary mode matrix
for i=(nd+4):1:(nd+3+nr)
    emall(i-nd-3,:)=datas(i,:);
end
x0=uptakeflux/a2*ones(a2,1);
lb=1e-9*ones(a2,1);
hb=100*ones(1,a2);
options=optimset('LargeScale','off');
[x,fval]=fmincon(@func,x0,[],[],emmatrix,fluxv,lb,hb,[],options)
fluxdistribution=emall*x;
disp('Flux distribution is: ')
disp(fluxdistribution)
save('outputEMC.dat','x','-ascii');
save('outputflux.dat','fluxdistribution','-ascii');
disp('Optimization is successful. You can close it now.')
pause

function f=func(y)
global uptakeflux uptake
f=0;
[u1,u2]=size(uptake);
for i=1:u2
    % only modes that use an uptake reaction contribute to the entropy term
    if uptake(1,i)~=0.0
        f=f-uptake(1,i)*y(i)/uptakeflux*log(uptake(1,i)/uptakeflux*y(i));
    end
    disp(f)
end

The program requires an input text file containing all information about the metabolic network, such as the number of reactions, the number of determined fluxes and the elementary flux modes matrix. Full details of how to construct the input file are given in the supplementary file, which can be found on the website above. To run the program, type “mep1” in the MATLAB command window and enter the name of the text file when prompted.

Appendix F. Comparison table of FBA results and MEP results.

[Figures: a series of charts comparing the FBA and MEP predictions, each plotting flux against x for x between 0 and 1. The first chart, titled "FBA vs MEP", shows the biomass flux. The remaining charts show the fluxes of the following reactions: PEP→Pyr; PEP→OAA; Pyr→AcCoA; OAA+AcCoA→Cit; Cit↔IsoCit; OAA↔Mal; Pyr→PEP; OAA→PEP; AcCoA+Gly→Mal; Succ↔Fum; Mal↔Fum; OG→SucCoA; SucCoA↔Succ; IsoCit→Gly+Succ; OG↔Glu; IsoCit→OG; OAA+Glu↔OG+Asp; OG+Ala↔Pyr+Glu.]
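The FBA values compared in this appendix come from a linear programme of the kind set up in Appendix C. As a rough cross-check of how such a problem behaves, the sketch below solves a three-reaction toy network with linprog from the MATLAB Optimization Toolbox. The toy matrix S, its bounds and the variable names are hypothetical illustrations, not the Table 3.12 network, and linprog is used here in place of simlp as an assumption about what is available in a current MATLAB installation.

```matlab
% Toy network (hypothetical, NOT the Table 3.12 matrix):
%   v1:   -> A   (uptake)
%   v2: A -> B
%   v3: B ->     (biomass drain)
S   = [ 1 -1  0;     % steady-state balance of metabolite A
        0  1 -1];    % steady-state balance of metabolite B
f   = [0 0 -1];      % minimise -v3, i.e. maximise the biomass flux v3
lb  = [0 0 0];       % all three reactions irreversible
ub  = [1 Inf Inf];   % uptake flux capped at 1
beq = zeros(size(S,1),1);

% Equality constraints S*v = 0 go in the Aeq/beq slots;
% there are no inequality constraints, so A and b are empty.
v = linprog(f, [], [], S, beq, lb, ub);
disp(v)              % the optimum is v = [1; 1; 1]
```

For the network of this thesis, the same call would presumably take data in place of S together with the f, lb and ub vectors of Appendix C, assuming that all 16 rows of the stoichiometric matrix are steady-state equality constraints.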