Metabolic Network - Department of Mathematics

Metabolic Network
Shiyi Wang
Supervised by Dr Jamie Wood
Dissertation submitted for the MSc in Mathematics with
Modern Applications, Department of Mathematics,
University of York, UK.
Submitted in 08/2011.
Abstract
Cellular metabolism is the set of essential physical and chemical processes that
maintain life. A metabolic network for a specific organism contains all the
metabolic reactions occurring within its living cells. With the rapid
development of genomics and several successful genome projects, biologists have
deciphered the genome sequences of many organisms, and the metabolic networks of
such organisms can be faithfully reconstructed from the available genome information.
The analysis of metabolic networks has therefore become essential for further studies.
Such analysis helps us to better understand the topology and biological
functions of different organisms, and hence enables us to use cellular metabolic processes
to support the development of fermentation technology, the pharmaceutical industry and
agriculture. Successful developments in these areas would not only benefit the economy, but
also allow us to understand or even influence biological evolution.
In this thesis, the goal is to study two methods of metabolic network flux
analysis.
First, the basic metabolic network framework is studied, including the basic properties
of metabolic networks.
The study then focuses on methods for analyzing the flux distribution with
mathematical models.
First, flux balance analysis is studied. Flux balance analysis is a commonly
used method for estimating the fluxes through a metabolic network. The
network is modeled as a linear programming problem with stoichiometric
constraints, which is then solved by the simplex algorithm.
Second, elementary flux mode analysis is studied. Elementary flux mode
analysis is a pathway analysis method in which each mode determines a
pathway in the metabolic network; it predicts the flux through each pathway
and how much each pathway contributes to the whole network. The algorithm for
computing the elementary flux mode matrix is introduced, as well as an improved method,
Shannon's maximum entropy principle for elementary mode analysis.
Finally, the two methods are applied to predict the flux distribution of the
tricarboxylic acid cycle, glyoxylate shunt and adjacent amino acid network with
glucose and acetate as uptakes.
Contents
Abstract
Contents
1. Introduction
1.1 Background of metabolic network research
1.2 Basic properties of metabolic network
1.3 Metabolic network analysis
2. Methods of Metabolic network flux analysis
2.1 Glossary
2.2 Flux Balance Analysis
2.21 Introduction
2.22 Flux Balance Analysis procedure
2.23 Limits of FBA method
2.24 Extension of the FBA method
2.3 Linear programming
2.31 Introduction of linear programming and simplex algorithm
2.32 Basic concept of linear programming
2.33 Simplex algorithm
2.34 Numerical examples
2.4 Elementary Flux Modes Analysis
2.41 Introduction
2.42 Mathematics behind EFMs
2.43 Calculate EFM matrix
2.44 Calculation of flux in elementary modes
2.45 Shannon's MEP for elementary mode analysis
2.46 Summary
3. Numerical results
3.1 Flux balance analysis
3.2 Elementary flux modes analysis with Shannon's MEP
3.3 Comparison of two methods
4. Conclusion
Reference
Appendix
1. Introduction
1.1 Background of metabolic network research
Over the past two decades, following the success of the Human Genome Project,
the focus of biological research has gradually shifted from the study of individual genes or
proteins inside a cell toward the study of whole genomes. With more and better
knowledge of genome sequences, the various "omics" fields, such as
genomics, transcriptomics (mRNA) and proteomics, have received more attention and support.
In these circumstances, the concept of the metabolome was proposed. [1]
Metabolomics is a branch of systems biology that is applied to identify and
quantify all metabolites in an organism sample under specified living conditions.
Strictly speaking, the metabolome refers to all metabolites in an organism or cell.
Metabolism is essential to the processes of life. It consists mainly of catabolism
and anabolism. Anabolism refers to the process by which organisms convert nutrients
absorbed from the external environment into their own components and store energy;
catabolism, on the other hand, refers to the process by which organisms break down their
own components, release energy and then excrete the end products of the decomposition. These
processes, with the necessary enzymes, produce all of the major constituents of the cell.
A metabolic network is an abstract representation of cell metabolism that maps all
biochemical reactions of a cell or organism into a network: each metabolite is a node
and the reactions are the links or pathways that connect the
nodes. This network reflects the interactions between all
compounds as well as the enzymes involved in the metabolic processes. Over the
past decade, whole genome sequences have been deciphered for hundreds of species, and
genome annotation has also improved significantly, which
enables us to reconstruct a faithful metabolic network for those species from the
available genome information. Metabolic network analysis is a successful way of
predicting the metabolic phenotype of an organism from its metabolic genotype under
particular conditions, which gives us a better understanding of cellular
metabolic processes and the evolution of life. Predicting the functions of a
metabolic network has therefore become one of the most important tasks nowadays. [2-3]
1.2 Basic properties of metabolic network
1. Small-world
In 1998, Watts and Strogatz described the small-world property of networks: a
network that has both a large clustering coefficient and a
small average distance is said to be a small-world network. [4] Wagner and
coworkers studied 287 reactions in E. coli and found that the average path
length of the network is 3.8. [5] Ma studied and analyzed the metabolic networks of
80 organisms and found that the overall average path length is 8.2. [6] These studies
show that metabolic networks have the small-world property. Overall, the short
path lengths mean that a local perturbation of metabolites can spread through the whole
network very quickly.
2. Scale-free
A scale-free network is a network whose node degrees follow a power-law
distribution. Salas and coworkers showed that metabolic networks are scale-free
by studying different metabolic networks of animals, plants and microbes.
[7].
3. Robustness and redundancy
All organisms are capable of dynamic self-equilibrium; many can grow like the wild
type when certain genes are knocked out. Robustness is usually defined as the
insensitivity of a system to parameter changes. In structural robustness analysis,
the question is whether a cell can tolerate the loss or deactivation of an enzyme.
Structural robustness and redundancy are usually considered together. Without redundancy, a
cell would lose its general function when a pathway is cut or an enzyme is deactivated,
because no alternative pathway is available. However, many cells contain
parallel, and hence redundant, pathways, which allow them to work normally when
such scenarios occur.
1.3 Metabolic network analysis
Over the past decade, with the growing interest in better understanding
biochemical networks, experimental techniques such as isotopic tracers have
improved significantly, especially through the application of nuclear magnetic resonance
technology to biological systems. However, these techniques are still either not powerful
enough to determine the whole network or too expensive to conduct, so alternative estimation
methods such as metabolic network flux analysis have been proposed. Metabolic
network analysis predicts the metabolic phenotype, which gives us a good
idea of what is happening in an organism and how the organism works under different
external environmental conditions.
Depending on the demands, there are several network analysis methods
that can help us to better understand the metabolic process, for
example statistical clustering analysis and stoichiometric network analysis.
Approach | Constraints incorporated | Computational demand | Flux distributions | Applications
Flux Balance Analysis | Yes, Yes, Yes, Yes, No | Low | Single | No, Yes, (Yes), No, No, Yes, (Yes)
Elementary Flux Modes | Yes, Yes, Yes, No, No | High | All | Yes, Yes, Yes, Yes, Yes, Yes, Yes
Metabolic Flux Analysis | Yes, (Yes), Yes, (Yes), No | Low | Single | No, No, (Yes), No, No, No, No
Minimization of Metabolic Adjustment | Yes, Yes, Yes, Yes, Yes | Medium | Single | No, Yes, (Yes), No, No, Yes, (Yes)
Extreme Pathways | Yes, Yes, Yes, No, No | High | All | (Yes), (Yes), (Yes), (Yes), (Yes), (Yes), (Yes)
Graph theory | Topology, Possible, No, No, No | Low | None | No, No, (Yes), (No), (Yes), (Yes), (Yes)
Conservation relations | Yes, No, No, No, No | Low | None | No, No, No, No, No, No, No
Null-space Analysis (via Kernel Matrix) | Yes, No, No, No, No | Low | None | No, No, No, (Yes), No, (Yes), No
[The "Constraints incorporated" group has five sub-columns and the "Applications" group has
seven; values are listed in source order. Sub-column labels legible in the source include
Stoichiometry, Thermodynamics, Quasi steady state, Optimality, Network functionality,
Robustness / Flexibility, Pathway lengths, Importance of reactions and Correlated reactions;
their order within each group is not recoverable.]
Table 1.31. Comparison of methods, adapted from "Network-based Metabolic Flux and
Structure Analysis" [8].
In this thesis, the goal is to analyze the flux distribution of the metabolic network.
The table above shows some of the available methods and their applications [8].
The purpose of studying the flux distribution of a metabolic network is to understand
how the fluxes flow through the network, which in turn enables us to regulate them.
This would allow us to achieve the following goals: [9]
1. Improvement of productivity, that is, maximizing the yield of the desired product.
2. Elimination or reduction of by-products. In many processes, by-products take up
available nutrients and make the product less pure, and in some cases even
poisonous, which is very important in pharmacology.
The desire to better understand these biological functions and to be able
to use them has attracted a lot of attention over the past decade, especially for
industrial and medical purposes.
Flux balance analysis and elementary mode analysis are two of the most commonly
used methods in metabolic network flux analysis; the choice between them depends on the
available resources and demands. Full details of these methods and their mathematical
concepts are given in the next section.
2. Methods of Metabolic network flux analysis
2.1 Glossary:
1. Flux: the reaction rate of a certain reaction when the metabolic network is in
quasi steady-state. [9].
2. Reaction rate: the amount of substrate consumed, and of product formed, by a
reaction per unit time.
3. Steady-state: the state in which the concentration of every metabolite does
not change. [9].
4. Stoichiometric matrix: the matrix containing the stoichiometric coefficients of
every metabolite in every reaction. [9].
For a given network, all the metabolites involved in every reaction
are put into the stoichiometric matrix, where the rows represent the
metabolites in the metabolic network and the columns correspond to the reactions.
Example (1): consider the simple network below. [49].
Figure 2.11. A simple metabolic network.
The stoichiometric matrix can be read from the network:
     R1  R2  R3  R4  R5  R6  R7  R8  R9
A     1  -1  -1   0   0   0   0   0   0
B     0   1   0  -1   0  -2   0   0   0
C     0   0   1   0   0   1  -1   0   0
D     0   0   0   2   0   0   1  -1   0
E     0   0   0  -1   1   0   0   0   0
F     0   0   0   0   0   1   0   0  -1
Table 1.1. The stoichiometric matrix of the simple metabolic network example.
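As an illustrative check, the matrix of Table 1.1 can be entered directly and used to test whether a flux vector is balanced. The sketch below is a minimal example, not part of the thesis: the entries are a reading of Table 1.1 (rows C and D taken as (0, 0, 1, 0, 0, 1, -1, 0, 0) and (0, 0, 0, 2, 0, 0, 1, -1, 0)), and the helper names `net_production` and `is_steady_state` as well as the flux vector `v` are hypothetical.

```python
# Stoichiometric matrix of the example network as read from Table 1.1:
# rows are metabolites A..F, columns are reactions R1..R9.
S = [
    [1, -1, -1,  0, 0,  0,  0,  0,  0],  # A
    [0,  1,  0, -1, 0, -2,  0,  0,  0],  # B
    [0,  0,  1,  0, 0,  1, -1,  0,  0],  # C
    [0,  0,  0,  2, 0,  0,  1, -1,  0],  # D
    [0,  0,  0, -1, 1,  0,  0,  0,  0],  # E
    [0,  0,  0,  0, 0,  1,  0,  0, -1],  # F
]


def net_production(S, v):
    """Return S v: the net production rate of each metabolite for fluxes v."""
    return [sum(s_ij * v_j for s_ij, v_j in zip(row, v)) for row in S]


def is_steady_state(S, v):
    """True when every metabolite is produced exactly as fast as it is consumed."""
    return all(rate == 0 for rate in net_production(S, v))


# Hypothetical flux vector (illustrative values, not from the thesis):
# one unit through R1, R2, R4 and R5, two units through R8.
v = [1, 1, 0, 1, 1, 0, 0, 2, 0]
print(net_production(S, v))   # [0, 0, 0, 0, 0, 0]
print(is_steady_state(S, v))  # True
```

Each entry of `net_production` is one row of S v, so a vector of zeros is exactly the steady-state condition of item 3.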
5. A mode is a flux vector $v \in \mathbb{R}^n$, with one component for each of the n
reactions, that keeps the system in steady state. That
is, a vector $v \ge 0$ such that $Sv = 0$, where S is the stoichiometric matrix. [28]
6. A mode $e \ne 0$ is an elementary mode if its support is minimal, that is, if
there is no other mode $w \ne 0$ such that $R(w) \subsetneq R(e)$, where $R(e)$ is the set of
reactions participating in mode e. [28]
7. The set of modes forms a cone, which is the intersection of the null space $Sv = 0$
with the non-negative orthant $v \ge 0$. [28]
8. Irreversible reaction: a reaction in which the rate of the forward reaction is
always so much higher than the rate of the reverse reaction that the latter is
negligible. [28]
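Definitions 5 and 6 can be illustrated on the simple network of Example (1). The sketch below is only an illustration under stated assumptions: the matrix is a reading of Table 1.1, the vectors `e1` and `e2` are hypothetical modes chosen by hand, and the helpers `is_mode` and `support` are names introduced here. It shows that the sum of two modes is again a mode, but that its support strictly contains the support of `e1`, so the sum cannot be elementary.

```python
# Stoichiometric matrix as read from Table 1.1 (rows A..F, columns R1..R9).
S = [
    [1, -1, -1,  0, 0,  0,  0,  0,  0],  # A
    [0,  1,  0, -1, 0, -2,  0,  0,  0],  # B
    [0,  0,  1,  0, 0,  1, -1,  0,  0],  # C
    [0,  0,  0,  2, 0,  0,  1, -1,  0],  # D
    [0,  0,  0, -1, 1,  0,  0,  0,  0],  # E
    [0,  0,  0,  0, 0,  1,  0,  0, -1],  # F
]


def is_mode(S, v):
    """A mode satisfies v >= 0 and S v = 0 (definition 5)."""
    balanced = all(sum(s * x for s, x in zip(row, v)) == 0 for row in S)
    return balanced and all(x >= 0 for x in v)


def support(v):
    """R(v): 1-based indices (as in R1..R9) of the reactions with nonzero flux."""
    return {j + 1 for j, x in enumerate(v) if x != 0}


# Two hypothetical modes, chosen by hand for illustration.
e1 = [1, 1, 0, 1, 1, 0, 0, 2, 0]     # uses R1, R2, R4, R5, R8
e2 = [1, 0, 1, 0, 0, 0, 1, 1, 0]     # uses R1, R3, R7, R8
w = [a + b for a, b in zip(e1, e2)]  # the sum of two modes

print(is_mode(S, e1), is_mode(S, e2), is_mode(S, w))  # True True True
print(sorted(support(w)))            # [1, 2, 3, 4, 5, 7, 8]
# support(e1) is a proper subset of support(w), so w is not elementary.
print(support(e1) < support(w))      # True
```

The comparison `support(e1) < support(w)` is Python's proper-subset test on sets, which matches the minimal-support condition of definition 6. It only rules `w` out; confirming that a mode is elementary would require checking all other modes.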
2.2 Flux Balance Analysis.
2.21 Introduction
Table 2.211. Development of flux balance analysis.
The table above is adapted from the paper by Kenneth and coworkers given in the
references [14]. More recently, Benyamin and coworkers published a paper
on metabolite dilution flux balance analysis, which results in improved
metabolic phenotype predictions. [15].
Flux balance analysis (FBA) has become one of the most popular methods in this
research field and has been widely applied in practical studies as interest in
biochemical networks has grown. FBA is a constraint-based mathematical method that
is suited to large-scale metabolic networks and predicts the internal fluxes of a
metabolic network. Biochemical knowledge of the network, such as metabolite
concentrations or the enzyme kinetics of the system, is not required, which makes the
method easy to implement. FBA estimates the intracellular fluxes in a metabolic
network by optimizing a specific objective function subject to
stoichiometric constraints, and thereby predicts the growth rate of an organism or the
production rate of a particular metabolite. [10]
The core of a constraint-based method is its constraints. There are
two main kinds of constraints, steady-state and physicochemical constraints [11], such as
stoichiometric constraints, thermodynamic restrictions and time to maximum rate
constraints. Stoichiometric constraints enforce the mass balance and the energy
balance; thermodynamic restrictions limit the direction of the reactions; the time to
maximum rate constraint determines the reactive capacity of a single enzyme. [12]
Flux balance analysis is one of the most widely used constraint-based methods. It uses
the stoichiometric constraints to impose the steady state on the metabolic network.
These constraints limit the solution space of all feasible fluxes; mathematically, they
define a flux cone. The solution space can be restricted further by
assigning minimum and maximum possible fluxes to any particular reaction (for the lower
bound, minus infinity is usually used for reversible reactions and zero for irreversible
reactions). Together, all the constraints define a convex solution space that
contains all the feasible solutions of the steady-state equation [10].
Every feasible point in the solution space of a metabolic network can be reached
by the system. For a given metabolic network, if the number of independent mass-balance
equations equals the number of unknown fluxes, the system is determined
and has a unique flux distribution. However, in a realistic metabolic
network the metabolic process can be very complicated: usually hundreds of
metabolites and thousands of reactions are involved in a cell or organism, and only
a few of the reaction rates can be determined by experiment, far fewer
than the total number of reactions in the metabolic
network. [13] Therefore, in order to estimate the metabolic flux distribution, linear
programming is applied to find an optimal solution for the desired objective
function.
The choice of the objective function plays a very important role in FBA.
Typical objective functions include biomass production (growth), maximal
ATP production, minimal nutrient uptake, minimal Manhattan or
Euclidean norm of the flux vector, and the production of various specific
products [14]. It has been shown that, with biomass production as the
objective function, FBA gives a very promising estimate of the internal flux distribution
compared with experimental results. [14]
2.22 Flux Balance Analysis procedure
1. Genome-scale metabolic reconstruction
In order to use FBA to analyze the fluxes of a metabolic network, the network
has to be defined first. FBA requires all the reactions and metabolites
included in the network; mathematical modeling is then used to predict the
pathway fluxes. Although in living organisms different environmental conditions
regulate each pathway differently, the FBA model always assumes that all
reactions operate at quasi steady state, in other words, that the concentration of every
metabolite stays constant.
The ideal starting point for the metabolic reconstruction is to identify all
the metabolic enzymes and metabolites, and then to record all the reactions catalyzed
by each of the enzymes. [10]. Nowadays much genomic network information can be
found in the Kyoto Encyclopedia of Genes and Genomes (KEGG) database, which is
developed by Kanehisa Laboratories at Kyoto University. This database provides
great support for the reconstruction of some well-studied biological networks. [16]
2. Mathematically represent metabolic reactions and constraints.
Once all the reactions are recorded, FBA converts them into a stoichiometric matrix
(S) of size m×n. For a network with m metabolites and n reactions, every row of this
matrix represents one unique metabolite and every column corresponds to one reaction. The
entries in each column are the stoichiometric coefficients of the metabolites
participating in that reaction. The coefficient is negative for every metabolite
consumed and positive for every metabolite produced, and it
is zero when the corresponding metabolite is not involved in the
reaction. A biomass reaction, which contains all the metabolites consumed during
biomass production, is added to the reaction list. The biomass reaction is based on
experimental measurements of biomass components. [10].
                        Reactions
Metabolites    1    2    …    n    Biomass
A
B                 Stoichiometric matrix S
…
Table 2.221. Stoichiometric matrix.
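The construction described in this step can be sketched in a few lines. The example below is a minimal illustration, assuming reactions are supplied as dictionaries of stoichiometric coefficients; the function name `build_stoichiometric_matrix` and the three-reaction network are hypothetical and are not taken from the thesis.

```python
def build_stoichiometric_matrix(reactions):
    """Build S with one row per metabolite and one column per reaction.

    `reactions` maps a reaction name to {metabolite: coefficient}, with
    negative coefficients for consumed and positive for produced metabolites.
    Metabolites that do not take part in a reaction get the coefficient 0.
    """
    metabolites = sorted({m for coeffs in reactions.values() for m in coeffs})
    names = list(reactions)
    S = [[reactions[r].get(m, 0) for r in names] for m in metabolites]
    return metabolites, names, S


# Hypothetical three-reaction network, for illustration only.
reactions = {
    "uptake":  {"A": 1},           # -> A
    "convert": {"A": -1, "B": 2},  # A -> 2 B
    "biomass": {"B": -1},          # B -> biomass (leaves the system)
}

metabolites, names, S = build_stoichiometric_matrix(reactions)
for m, row in zip(metabolites, S):
    print(m, row)
# A [1, -1, 0]
# B [0, 2, -1]
```

Keeping the reaction list as the source of truth and deriving S from it avoids transcription errors when reactions are added or removed during reconstruction.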
3. Mass balance defines a system of linear equations.
The goal of FBA is to estimate the fluxes of the metabolic network at
pseudo-steady state. At steady state, the flux vector v satisfies
$Sv = 0$, which defines a system of linear equations. [10]
\[
S \, v \;=\; S
\begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \\ v_{\text{biomass}} \end{pmatrix}
\;=\; 0
\]
Here S is the stoichiometric matrix (rows: metabolites A, B, …; columns: reactions
1, 2, …, n, Biomass) and v is the vector of fluxes.
Table 2.222. Stoichiometric constraints of the fluxes.
The linear equations above state that the network is at quasi steady state.
4. Define objective function
We have obtained a linear system for the network that includes all components. In
order to predict the fluxes, an objective function is used to define how much each
reaction contributes to the objective. The objective function can be defined as
\[
z = c_1 v_1 + c_2 v_2 + \dots = c^{T} v ,
\]
where c is the vector of weights that represents the
contribution of each reaction to the objective. When biomass production is
the objective, the maximum or minimum of biomass production is sought and
$v_{\text{biomass}}$ is the only term in the objective function, so the vector c has
1 at the position of the biomass reaction and zero for all other reactions. The
complete FBA model for the maximization of biomass production,
constrained by the mass balances, is then:
\[
\begin{aligned}
\max\ & Z = v_{\text{biomass}} \\
\text{such that}\ & \sum_{j=1}^{N} S_{ij} v_j = 0, && i = 1, \dots, M, \\
& v_j^{\min} \le v_j \le v_j^{\max}, && j = 1, \dots, N,
\end{aligned}
\]
where $v_{\text{biomass}}$ is the biomass flux, $S_{ij}$ is the stoichiometric coefficient of
metabolite i in reaction j, $v_j$ is the flux of reaction j, M is the number of
metabolites and N is the total number of reactions in the network. [10]
5. Calculate fluxes that maximize objective function.
The model defined above determines the optimal flux distribution
within the region of allowable fluxes. Since the objective function and the constraint
equations are linear, the optimization can be carried out by linear programming;
full details are discussed below.
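As a worked illustration, the model can be written out for the simple network of Example (1), with the mass balances read from Table 1.1. The objective (the flux $v_8$, which removes metabolite D) and the uptake bounds $v_1 \le 10$ and $v_5 \le 10$ are assumptions chosen here for illustration only; they do not come from the thesis.

```latex
\[
\begin{aligned}
\max\ & Z = v_8 \\
\text{such that}\ & v_1 - v_2 - v_3 = 0 && \text{(A)} \\
& v_2 - v_4 - 2 v_6 = 0 && \text{(B)} \\
& v_3 + v_6 - v_7 = 0 && \text{(C)} \\
& 2 v_4 + v_7 - v_8 = 0 && \text{(D)} \\
& v_5 - v_4 = 0 && \text{(E)} \\
& v_6 - v_9 = 0 && \text{(F)} \\
& v_j \ge 0 \ (j = 1, \dots, 9), \quad v_1 \le 10, \quad v_5 \le 10 .
\end{aligned}
\]
```

Eliminating the internal fluxes gives $v_8 = 2 v_4 + v_3 + v_6$ and $v_1 = v_3 + v_4 + 2 v_6 \le 10$, with $v_4 = v_5 \le 10$. Each unit of $v_4$ adds 2 to the objective for one unit of uptake $v_1$, which is better than $v_3$ (1 per unit) or $v_6$ (1 per two units), so under these assumed bounds the optimum is $v = (10, 10, 0, 10, 10, 0, 0, 20, 0)$ with $Z = 20$.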
2.23 Limits of FBA method.
The traditional FBA model optimizes a single objective over a feasible region. As stated
above, the quality of the results obtained by FBA depends entirely on the
constraints and the choice of the objective function. The method has been applied
widely in practice; however, there are several cases that traditional
FBA has difficulty with. [24]
1. As stated above, the FBA model works under the assumption that all reactions operate
at quasi steady state, and only stoichiometric constraints are used;
the regulation of the reactions and pathways is therefore neglected. However,
genes influence each other in very complex ways, and the outcome of FBA
may be inaccurate or misleading when this important fact is
neglected. [24]
2. Parallel metabolic routes cannot be resolved. When a reaction is catalyzed by
multiple enzymes, FBA can only estimate the total flux of the reaction rather
than the specific flux contributed by each enzyme. [24]
3. Reversible reactions: as in problem 2, only the net flux over both directions can
be estimated rather than the flux in each direction. [24]
2.24 Extension of the FBA method.
FBA predicts the behavior of a biochemical network by optimizing specified
objectives. In the past few years there has been increasing interest in developing the
FBA model further by incorporating more biological knowledge.
1. Combination with additional constraints.
Regulatory constraints were first imposed as Boolean logic operators by Covert
and coworkers [17]. Unlike physicochemical constraints, which hold regardless of
time, space and other fundamental restrictions, regulatory constraints depend on the
specific environment. Using the Boolean logic approach, the regulatory
constraints are evaluated from the initial condition of the cellular system,
and the traditional FBA model is then solved. The outcomes are used to
re-evaluate the regulatory constraints. This process can be repeated over the
allowable time [17].
In traditional FBA, thermodynamic constraints are only used to
define the reversibility of a given reaction. However, reversibility
depends on the intracellular conditions and may vary with changes in the
external environment. Incorporating reaction thermodynamics into the FBA model, by
using nonlinear constraints on the balance of chemical potential instead of
simple stoichiometric constraints, would therefore give traditional FBA a stronger
biological basis. For E. coli metabolism, Beard found that combining
energy balance analysis with FBA gave the same optimal growth rate, but
different flux distributions. The disadvantage of this method is the
complexity of the calculation. [18,19].
2. Variability of objective functions.
In most applications, the objective function used in the FBA model is biomass
production. However, this choice does not perform very well in predicting the
behavior of cells with gene knockouts. To better
understand the flux distribution of a mutant strain, the minimization of
metabolic adjustment (MOMA) was developed on the basis of FBA. MOMA
calculates the flux distribution of the wild type and uses quadratic
programming to find the minimal Euclidean distance between the mutant strain fluxes and
the wild-type fluxes. This method successfully simulated the viability of
E. coli and quantified the metabolic flux distribution after knockouts. It
has been shown to be effective by comparison with experimental results; one
example is increasing the production of lycopene in E. coli by choosing the
genes to be changed. [20]
Regulatory on/off minimization (ROOM) is another method for estimating the
flux distribution of a mutant strain; it minimizes the number of
significant flux changes. ROOM efficiently predicted the flux distribution of E.
coli after a particular gene knockout. [21]
Both approaches are motivated by the assumption that cells still
behave similarly to the wild type after a gene knockout [18].
3. Dynamic optimization
In the past few years, extensions of FBA that involve dynamics
have been studied. This approach was used in the analysis of diauxic growth of E.
coli on glucose, and the predictions qualitatively matched the experimental data.
This application showed that an instantaneous objective function gives a better
prediction than a terminal-type objective function for a large network. [18,23].
4. Predictive capability
OptKnock, a method based on a bi-level optimization problem, was used to predict
the fluxes after gene knockouts. This approach simultaneously optimizes two
objective functions, biomass growth and secretion of a desired metabolite. [22]
The method was successfully applied in experiments on lactic acid
production in E. coli [18].
2.3 Linear programming
2.31 Introduction of linear programming and simplex algorithm.
The French mathematician Joseph Fourier and the Belgian mathematician C. de la Vallée
Poussin first introduced the idea of linear programming, in 1832 and 1911 respectively.
In 1939 the Russian mathematician Leonid Kantorovich published a book on
solving economic problems with mathematical methods, in which the idea of linear
programming was used. The method was later widely applied by the military during
World War II. After the war, linear programming was made public and
has been widely used in economics and industrial management ever since.
In 1947, the American scientist George B. Dantzig published the simplex method.
Electronic computers came into use from 1945, which made it possible to
solve linear programming problems with very complicated constraints.
In 1979, the Russian mathematician Leonid Khachiyan developed the ellipsoid
algorithm for linear programming; this method first showed that linear
programming problems can be solved in polynomial time. However, in almost all
numerical tests on large-scale linear programming problems, the simplex algorithm
performed better than the ellipsoid algorithm.
In 1984, Narendra Karmarkar introduced the interior point method for solving
linear programming problems. Karmarkar and a couple of professors at the University
of California compiled an open computing program, which showed that for many linear
programming problems the interior point method is more efficient than the
simplex algorithm. Although the interior point method has contributed greatly to both
theory and practice, the simplex algorithm is still the most commonly used
method because it is widely known and simple.
Some other methods have also been introduced since the 1980s, such as genetic
algorithms, ant colony optimization (ACO) and the tabu search algorithm. [25]
2.32 Basic concept of linear programming
A linear programming model contains an objective function, decision variables and
constraints. Its basic form can be represented as follows [26]:
\[
\max\ z = c_1 x_1 + \dots + c_n x_n \tag{1.1}
\]
subject to
\[
\begin{aligned}
a_{11} x_1 + \dots + a_{1n} x_n \ &\{\le, =, \ge\}\ b_1, \\
a_{21} x_1 + \dots + a_{2n} x_n \ &\{\le, =, \ge\}\ b_2, \\
&\ \vdots \\
a_{m1} x_1 + \dots + a_{mn} x_n \ &\{\le, =, \ge\}\ b_m,
\end{aligned} \tag{1.2}
\]
and
\[
x_1, x_2, \dots, x_n \ge 0. \tag{1.3}
\]
In matrix form:
\[
\max\ z = c^{T} x, \tag{1.4}
\]
subject to the constraints
\[
A x \ \{\le, =, \ge\}\ b, \qquad x \ge 0, \tag{1.5}
\]
where
\[
A = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{m1} & \cdots & a_{mn} \end{pmatrix},
\]
$b = (b_1, b_2, \dots, b_m)^T$ is an m-component column vector,
$c = (c_1, c_2, \dots, c_n)^T$ is an n-component column vector (so that $c^T$ is a row
vector), and $x = (x_1, x_2, \dots, x_n)^T$ is an n-component column vector. [26]
A is called the constraint matrix, b the requirements vector, c the price vector and x
the decision variable vector. [26]
Let B be a nonsingular m×m sub-matrix of A, and let $x_B$ and $c_B$ collect the components
of x and c that correspond to the columns of B. With the remaining variables set to zero,
the partitioned matrix form is: [25]
\[
\max\ z = c_B^{T} x_B, \tag{1.7}
\]
\[
B x_B = b, \qquad x_B \ge 0. \tag{1.8}
\]
$x = (x_1, x_2, \dots, x_n)^T$ is a feasible solution if the conditions $Ax = b$,
$x \ge 0$ are satisfied, and the region $D = \{x \mid Ax = b,\ x \ge 0\}$ is called the feasible
region. If $x \in D$ and x cannot be represented as a convex combination of two other
points of D, then x is an extreme point of the feasible region D. [25]
In equation (1.8), if $|B| \ne 0$, then B is a basis matrix for the linear programming
problem and its columns $B = (P_1, P_2, \dots, P_m)$ are called basis vectors. [25]
The m variables associated with the m linearly independent column
vectors $P_j$ are called basic variables. [25]
If $x_B \ge 0$, then $x_B$ is called a basic feasible solution; if $x_B$ also maximizes (or,
for a minimization problem, minimizes) the objective function z, then $x_B$ is called an
optimal solution. [25]
If one or more of the basic variables takes the value zero, then the basic solution is
said to be degenerate. [25]
Figure 2.321 Linear programming plot.
2.33 Simplex algorithm
As defined above, if we have a basic feasible solution $x_B$ of the
standard linear programming problem, we have: [27]
$$\max\ z = c_B x_B, \qquad B x_B = b, \qquad x_B \ge 0.$$
Then each column of A can be expressed as a linear
combination of the columns of B: [27]
$$a_j = \beta_{1,j} b_1 + \cdots + \beta_{m,j} b_m = \sum_{i=1}^{m} b_i \beta_{i,j} = B \beta_j,$$
where the $\beta_j$ are known coefficients for the given basic feasible solution.
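A new basic feasible solution is obtained simply by changing one
column of B. The new quantities are denoted with a bar. Then $\bar{B}$
is formed from B by removing $b_r$ and replacing it by $a_k$ from A.
We have:
$$a_k = \beta_{1,k} b_1 + \cdots + \beta_{r,k} b_r + \cdots + \beta_{m,k} b_m.$$
Thus we have:
$$\sum_{i=1}^{m} x_{Bi} b_i = b, \tag{1}$$
$$\sum_{\substack{i=1 \\ i \ne r}}^{m} \left( x_{Bi} - \frac{\beta_{i,k}}{\beta_{r,k}} x_{Br} \right) b_i + \frac{x_{Br}}{\beta_{r,k}} a_k = b. \tag{2}$$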
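Such that:
$$\bar{x}_{Bi} = x_{Bi} - \frac{\beta_{i,k}}{\beta_{r,k}} x_{Br} \ge 0, \quad i = 1, \ldots, m,\ i \ne r, \tag{3}$$
and
$$\bar{x}_{Br} = \frac{x_{Br}}{\beta_{r,k}} \ge 0. \tag{4}$$
If $x_{Br} > 0$, then $\beta_{r,k} > 0$, and (3) requires, for $i = 1, \ldots, m$, $i \ne r$:
$$\text{either } \beta_{i,k} \le 0, \quad \text{or } \beta_{i,k} > 0 \text{ and } \frac{x_{Bi}}{\beta_{i,k}} \ge \frac{x_{Br}}{\beta_{r,k}}.$$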
Hence the new basic feasible solution can be chosen by:
$$\frac{x_{Br}}{\beta_{r,k}} = \min_i \left\{ \frac{x_{Bi}}{\beta_{i,k}} : \beta_{i,k} > 0 \right\}. \tag{5}$$
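Since we have a new basic feasible solution, it is possible to find $\bar{z}$:
$$\begin{aligned}
\bar{z} &= \sum_{i=1}^{m} \bar{c}_{Bi} \bar{x}_{Bi} \\
&= \sum_{i=1}^{m} c_{Bi} \left( x_{Bi} - \frac{\beta_{i,k}}{\beta_{r,k}} x_{Br} \right) + c_k \frac{x_{Br}}{\beta_{r,k}} \\
&= \sum_{i=1}^{m} c_{Bi} x_{Bi} - \frac{x_{Br}}{\beta_{r,k}} \left( \sum_{i=1}^{m} c_{Bi} \beta_{i,k} - c_k \right) \\
&= z - \frac{x_{Br}}{\beta_{r,k}} \left( z_k - c_k \right),
\end{aligned} \tag{6}$$
with $z_k = \sum_{i=1}^{m} c_{Bi} \beta_{i,k}$. So $\bar{z} > z$ if and only if:
$$z_k - c_k < 0 \quad \text{and} \quad \frac{x_{Br}}{\beta_{r,k}} > 0.$$
Finally, to calculate $\bar{\beta}_{i,j}$, we have: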
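$$a_j = \sum_{\substack{i=1 \\ i \ne r}}^{m} \beta_{i,j} b_i + \beta_{r,j} b_r$$
and
$$a_k = \sum_{\substack{i=1 \\ i \ne r}}^{m} \beta_{i,k} b_i + \beta_{r,k} b_r.$$
Eliminating $b_r$ gives
$$a_j = \sum_{\substack{i=1 \\ i \ne r}}^{m} \left( \beta_{i,j} - \frac{\beta_{i,k} \beta_{r,j}}{\beta_{r,k}} \right) b_i + \frac{\beta_{r,j}}{\beta_{r,k}} a_k,$$
and, written in terms of the new basis (in which $a_k$ has replaced $b_r$),
$$a_j = \sum_{\substack{i=1 \\ i \ne r}}^{m} \bar{\beta}_{i,j} b_i + \bar{\beta}_{r,j} a_k.$$
Thus:
$$\bar{\beta}_{i,j} = \beta_{i,j} - \frac{\beta_{i,k} \beta_{r,j}}{\beta_{r,k}},\ i \ne r, \qquad \bar{\beta}_{r,j} = \frac{\beta_{r,j}}{\beta_{r,k}}. \tag{7}$$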
Iterative steps:
1. Test for optimal solution. If $z_j - c_j \ge 0$ for all j, then the
solution is optimal.
2. If there exists at least one j for which $z_j - c_j < 0$, and $\beta_{i,j} > 0$
for at least one i for each of these j, then variable $x_k$ becomes
basic, where k is chosen by the rule:
$$z_k - c_k = \min_j \left\{ z_j - c_j : z_j - c_j < 0,\ \beta_{i,j} > 0 \text{ for some } i \in \{1, \ldots, m\} \right\}.$$
3. If (2) holds, the variable $x_{Br}$ becomes non-basic, where r is
chosen by equation (5).
4. Compute $\bar{x}_{Bi}$, $\bar{z}$, $\bar{\beta}_{i,j}$ and $\bar{z}_j - c_j$ for all i and j from equations
(3)-(7). Repeat steps (2)-(4) until (1) holds.
The above description of the method is quoted from 《An introduction to linear
programming》 by G.R. Walsh [27]; full details, including theorems, lemmas
and explanations, are given in the book [27]. A numerical example follows,
in which the coefficients are chosen to be simple in order to keep the calculation easy.
2.34 Numerical examples:
A linear programming problem solved by simplex tableaus.[27]
Suppose we want to
Maximize: Z = 2x1 + 3x2 ,
Subject to: x1 + 2x2 ≤ 4,
2x1 + x2 ≤ 7,
x1 , x2 ≥ 0.
Adding slack variables x3 , x4 , the constraint equations become:
x1 + 2x2 + x3 = 4,
2x1 + x2 + x4 = 7.
By setting x1 , x2 equal to zero, we get a feasible solution:
x1 = 0, x2 = 0, x3 = 4, x4 = 7.
The tableau has the general layout:

c'                                c1         c2         c3         c4
cB     basic variables
cB1    xB1                        β11        β12        β13        β14
cB2    xB2                        β21        β22        β23        β24
       z                          z1 − c1    z2 − c2    z3 − c3    z4 − c4

For this example the first tableau is:

c'                                             2     3     0     0
cB              basic variables                x1    x2    x3    x4
cB1 = c3 = 0    x3                 xB1 = 4     1     2     1     0
cB2 = c4 = 0    x4                 xB2 = 7     2     1     0     1
zj − cj                            0          -2    -3     0     0
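Equations used for computation:
$$\bar{x}_{Bi} = x_{Bi} - \frac{\beta_{ik}}{\beta_{rk}} x_{Br} \ge 0,\ i = 1, \ldots, m,\ i \ne r; \qquad \bar{x}_{Br} = \frac{x_{Br}}{\beta_{rk}} \ge 0. \tag{1.1}$$
$$\bar{\beta}_{ij} = \beta_{ij} - \frac{\beta_{ik} \beta_{rj}}{\beta_{rk}},\ i \ne r; \qquad \bar{\beta}_{rj} = \frac{\beta_{rj}}{\beta_{rk}}. \tag{1.2}$$
$$\bar{z}_j - c_j = (z_j - c_j) - \frac{\beta_{rj}}{\beta_{rk}} (z_k - c_k). \tag{1.3}$$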
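By the simplex rules, because
$$z_j - c_j < 0 \quad \text{and} \quad \beta_{ij} > 0$$
for some i for each of these j, the variable that becomes basic is chosen by the rule:
$$z_k - c_k = \min_j \left\{ z_j - c_j : z_j - c_j < 0,\ \beta_{ij} > 0 \text{ for some } i \in \{1, \ldots, m\} \right\}.$$
The original basic variable $x_{Br}$ becomes non-basic, where r is chosen by the rule:
$$\frac{x_{Br}}{\beta_{rk}} = \min_i \left\{ \frac{x_{Bi}}{\beta_{ik}} : \beta_{ik} > 0 \right\}.$$
So by these rules, $x_2$ becomes basic (k = 2, since $z_2 - c_2 = -3$ is the most negative entry), and
$$\min \left\{ \frac{x_{B1}}{\beta_{12}}, \frac{x_{B2}}{\beta_{22}} \right\} = \min \left\{ \frac{4}{2}, \frac{7}{1} \right\} = 2.$$
Hence $x_{B1} = x_3$ becomes non-basic. The element in the tableau at the intersection
of the $x_2$ column and the $x_3$ row, indicating the pair of variables to be exchanged, is
called the pivot; here the pivot is $\beta_{rk} = \beta_{12} = 2$, with r = 1, k = 2, and it is denoted by an asterisk.
Then compute the new values by equations (1.1), (1.2) and (1.3).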
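From $\bar{\beta}_{rj} = \beta_{rj} / \beta_{rk}$:
$$\bar{\beta}_{11} = \frac{1}{2}, \quad \bar{\beta}_{12} = 1, \quad \bar{\beta}_{13} = \frac{1}{2}, \quad \bar{\beta}_{14} = 0.$$
From $\bar{\beta}_{ij} = \beta_{ij} - \beta_{ik} \beta_{rj} / \beta_{rk}$, $i \ne r$:
$$\bar{\beta}_{21} = \beta_{21} - \frac{\beta_{22} \beta_{11}}{\beta_{12}} = 2 - \frac{1}{2} = \frac{3}{2}, \quad \bar{\beta}_{22} = 0, \quad \bar{\beta}_{23} = -\frac{1}{2}, \quad \bar{\beta}_{24} = 1.$$
From $\bar{x}_{Br} = x_{Br} / \beta_{rk} \ge 0$:
$$\bar{x}_{B1} = \frac{x_{B1}}{\beta_{12}} = \frac{4}{2} = 2.$$
From $\bar{x}_{Bi} = x_{Bi} - (\beta_{ik} / \beta_{rk}) x_{Br}$:
$$\bar{x}_{B2} = x_{B2} - \frac{\beta_{22}}{\beta_{12}} x_{B1} = 7 - 4 \cdot \frac{1}{2} = 5.$$
From $\bar{z}_j - c_j = (z_j - c_j) - (\beta_{rj} / \beta_{rk})(z_k - c_k)$:
$$\bar{z}_1 - c_1 = (z_1 - c_1) - \frac{\beta_{11}}{\beta_{12}} (z_2 - c_2) = -2 - \frac{1}{2} \cdot (-3) = -\frac{1}{2},$$
$$\bar{z}_2 - c_2 = 0, \quad \bar{z}_3 - c_3 = \frac{3}{2}, \quad \bar{z}_4 - c_4 = 0.$$
Then the new tableau becomes: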
c'                                             2      3      0      0
cB              basic variables                x1     x2     x3     x4
cB1 = c2 = 3    x2                 xB1 = 2     1/2    1      1/2    0
cB2 = c4 = 0    x4                 xB2 = 5     3/2    0     -1/2    1
zj − cj                            6          -1/2    0      3/2    0
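For the second tableau, $x_1$ becomes basic at the next iteration, since $z_1 - c_1 = -1/2$ is the only negative entry.
The variable to become non-basic is determined from:
$$\min \left\{ \frac{2}{1/2}, \frac{5}{3/2} \right\} = \min \{ 4, 10/3 \} = 10/3.$$
This shows that $x_{B2} = x_4$ becomes non-basic. Hence the pivot is $\beta_{rk} = \beta_{21} = 3/2$,
and the new tableau is constructed by the same steps as above.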
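From $\bar{\beta}_{rj} = \beta_{rj} / \beta_{rk}$:
$$\bar{\beta}_{21} = 1, \quad \bar{\beta}_{22} = 0, \quad \bar{\beta}_{23} = -\frac{1}{3}, \quad \bar{\beta}_{24} = \frac{2}{3}.$$
From $\bar{\beta}_{ij} = \beta_{ij} - \beta_{ik} \beta_{rj} / \beta_{rk}$, $i \ne r$:
$$\bar{\beta}_{11} = \beta_{11} - \frac{\beta_{11} \beta_{21}}{\beta_{21}} = 0, \quad \bar{\beta}_{12} = 1, \quad \bar{\beta}_{13} = \frac{2}{3}, \quad \bar{\beta}_{14} = -\frac{1}{3}.$$
From $\bar{x}_{Br} = x_{Br} / \beta_{rk} \ge 0$:
$$\bar{x}_{B2} = \frac{x_{B2}}{\beta_{21}} = \frac{5}{3/2} = \frac{10}{3}.$$
From $\bar{x}_{Bi} = x_{Bi} - (\beta_{ik} / \beta_{rk}) x_{Br}$:
$$\bar{x}_{B1} = x_{B1} - \frac{\beta_{11}}{\beta_{21}} x_{B2} = 2 - 5 \cdot \frac{1}{3} = \frac{1}{3}.$$
From $\bar{z}_j - c_j = (z_j - c_j) - (\beta_{rj} / \beta_{rk})(z_k - c_k)$:
$$\bar{z}_1 - c_1 = (z_1 - c_1) - \frac{\beta_{21}}{\beta_{21}} (z_1 - c_1) = 0,$$
$$\bar{z}_2 - c_2 = 0, \quad \bar{z}_3 - c_3 = \frac{4}{3}, \quad \bar{z}_4 - c_4 = \frac{1}{3}.$$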
Then the new tableau becomes:

c'                                               2      3      0      0
cB              basic variables                  x1     x2     x3     x4
cB1 = c2 = 3    x2                 xB1 = 1/3     0      1      2/3   -1/3
cB2 = c1 = 2    x1                 xB2 = 10/3    1      0     -1/3    2/3
zj − cj                            23/3          0      0      4/3    1/3
The above tableau is optimal, since $z_j - c_j \ge 0$ for all j. The optimal solution is:
$$x_1 = \frac{10}{3}, \quad x_2 = \frac{1}{3}, \quad z = \frac{23}{3}.$$
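The tableau iterations of this example can be checked mechanically. The following is a minimal sketch in Python (the thesis itself uses MATLAB), assuming a problem of the form max c·x subject to Ax ≤ b, x ≥ 0 with b ≥ 0, so that the slack variables give a starting basis. The function name `simplex_max` is hypothetical, and exact fractions are used so that the pivots can be compared with the hand calculation.

```python
from fractions import Fraction as F

def simplex_max(c, A, b):
    """Maximise c.x subject to A x <= b, x >= 0 (with b >= 0) by tableau pivots."""
    m, n = len(A), len(c)
    # Each row holds [coefficients | slack identity | right-hand side].
    rows = [[F(v) for v in A[i]] + [F(int(i == k)) for k in range(m)] + [F(b[i])]
            for i in range(m)]
    # Row of z_j - c_j; the last entry is the current objective value z.
    zrow = [-F(v) for v in c] + [F(0)] * (m + 1)
    basis = [n + i for i in range(m)]              # slack variables start basic
    while True:
        k = min(range(n + m), key=lambda j: zrow[j])   # most negative z_j - c_j
        if zrow[k] >= 0:
            break                                       # optimal tableau reached
        ratios = [(rows[i][-1] / rows[i][k], i) for i in range(m) if rows[i][k] > 0]
        if not ratios:
            raise ValueError("problem is unbounded")
        _, r = min(ratios)                              # ratio rule, as in equation (5)
        pivot = rows[r][k]
        rows[r] = [v / pivot for v in rows[r]]
        for i in range(m):
            if i != r:
                factor = rows[i][k]
                rows[i] = [a - factor * p for a, p in zip(rows[i], rows[r])]
        factor = zrow[k]
        zrow = [a - factor * p for a, p in zip(zrow, rows[r])]
        basis[r] = k
    x = [F(0)] * (n + m)
    for i, j in enumerate(basis):
        x[j] = rows[i][-1]
    return x[:n], zrow[-1]

# The example above: maximise 2*x1 + 3*x2 with x1 + 2*x2 <= 4 and 2*x1 + x2 <= 7.
x, z = simplex_max([2, 3], [[1, 2], [2, 1]], [4, 7])
print(x, z)
```

The sketch uses the same entering rule (most negative $z_j - c_j$) and leaving rule (minimum ratio) as the iterative steps of section 2.33; it does not guard against cycling in degenerate problems.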
Linear programming problems in real life are usually much more complicated
than the example above and very difficult to solve by hand, so computer software
such as MATLAB is used to solve them.
Next, solve a linear programming problem that maximizes vbiomass subject to the
constraints defined by example (1), supposing the uptake fluxes are evenly
distributed between A and E:
Maximize: F = vbiomass,
Subject to: v1 − v2 − v3 = 0,
v2 − v4 − 2v6 = 0,
v3 + v6 − v7 = 0,
2v4 + v7 − vbiomass = 0,
−v4 + v5 = 0,
v6 − v9 = 0,
0 ≤ vi ≤ ∞, i ≠ 1, 5,
v1 = v5 = 0.5.
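The MATLAB code (data is the 6×9 coefficient matrix of the equality constraints; it is written out here from the constraints listed above, with columns v1, …, v7, vbiomass, v9):
>> f=[0,0,0,0,0,0,0,-1,0];   % minimise -vbiomass, i.e. maximise vbiomass
>> data=[1,-1,-1,0,0,0,0,0,0;
         0,1,0,-1,0,-2,0,0,0;
         0,0,1,0,0,1,-1,0,0;
         0,0,0,2,0,0,1,-1,0;
         0,0,0,-1,1,0,0,0,0;
         0,0,0,0,0,1,0,0,-1];
>> b=[0;0;0;0;0;0];
>> ub=[0.5,Inf,Inf,Inf,0.5,Inf,Inf,Inf,Inf];
>> lb=[0,0,0,0,0,0,0,0,0];
>> simlp(f,data,b,lb,ub,[],6)   % 6: presumably the number of equality constraints
The solution is:
v1 = 0.5, v2 = 0.5, v3 = 0, v4 = 0.5, v5 = 0.5, v6 = 0, v7 = 0, vbiomass = 1, v9 = 0
F = vbiomass = 1.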
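As a sanity check on the reported solution, the short Python sketch below substitutes it back into the six steady-state constraints and the flux bounds. The coefficient matrix `S` is an assumption in the sense that it is transcribed here from the constraint equations above (columns v1, …, v7, vbiomass, v9); the check only confirms feasibility and the objective value, not optimality.

```python
# Equality constraints, one row per metabolite balance (A, B, C, D, E, F);
# columns are v1..v7, v_biomass, v9, transcribed from the constraints above.
S = [
    [1, -1, -1,  0, 0,  0,  0,  0,  0],
    [0,  1,  0, -1, 0, -2,  0,  0,  0],
    [0,  0,  1,  0, 0,  1, -1,  0,  0],
    [0,  0,  0,  2, 0,  0,  1, -1,  0],
    [0,  0,  0, -1, 1,  0,  0,  0,  0],
    [0,  0,  0,  0, 0,  1,  0,  0, -1],
]
inf = float("inf")
lower = [0] * 9
upper = [0.5, inf, inf, inf, 0.5, inf, inf, inf, inf]

# Solution reported in the text.
v = [0.5, 0.5, 0, 0.5, 0.5, 0, 0, 1, 0]

# Residual of each balance equation; all should vanish at steady state.
residuals = [sum(s * x for s, x in zip(row, v)) for row in S]
within_bounds = all(lo <= x <= hi for lo, x, hi in zip(lower, v, upper))
objective = v[7]          # F = v_biomass

print(residuals, within_bounds, objective)
```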
2.4 Elementary Flux Modes Analysis.
2.41 Introduction
Metabolic pathway analysis has been recognized as a central approach to discover
and analyze the structure of a metabolic network. [28] This approach identifies the
topology of cellular metabolism based on the stoichiometric structure and
thermodynamic constraints of reactions where kinetic parameters are not required for
the model. It has been successfully applied to various organisms to investigate
metabolic network structure, robustness, fragility, regulation, metabolic flux
distribution. The metabolic pathway analysis is developed based on the first principle
of mass conservation of internal metabolites within a system. [28]
Elementary flux mode analysis is a pathway analysis method for evaluating metabolism
based on the set of metabolic reactions of a given network. An elementary flux mode
is a minimal set of enzymes in a network that can operate at steady state, with all
irreversible reactions proceeding in the direction dictated by thermodynamics.
The flux distributions in a cell or organism can be written as non-negative linear
combinations of elementary modes. This approach identifies the pathways from an
educt to a product. However, in a large-scale network, calculations based on
elementary modes sometimes face a combinatorial explosion, so
splitting the whole network into several sub-networks is sometimes necessary [28].
The set of EFMs includes all pathways, each indicating a route from certain educts to
products. The number of pathways can indicate the sensitivity of the network,
and has hence been used to investigate network structure, robustness, fragility, regulation,
metabolic flux vectors and rational strain design. [29]
Klamt and coworkers calculated the elementary modes for an E. coli model with
FluxAnalyzer; this model contains 89 metabolites and 110 reactions, and 4.85 ×
10^13 elementary modes were obtained. [30]
Elementary flux mode analysis has become an important theoretical tool for
systems biology, biotechnology and metabolic engineering. Some major
applications of this method are listed below: [31]
1. Identification of pathways: the set of EFMs comprises all
possible pathways.
2. Network flexibility: the number of EFMs is a rough measure
of the network's flexibility to perform a certain function.
3. Identification of all pathways with optimal yield: consider the
linear optimization problem where all flux vectors with
optimal product yield are to be identified, i.e. where the moles
of product generated per mole of educt is maximal. Then
one or several of the EFMs reach this optimum, and any
optimal flux vector is a convex combination of these optimal
EFMs.
4. Redundancy: Wilhelm and coworkers developed a new
measure which studies the number of EFMs remaining after
knocking out some enzymes. They used it to analyze theoretically
and compare the metabolic networks, and hence the environmental
viability, of E. coli and human erythrocytes.
5. Importance of reactions: predicting the ability to grow; if a
reaction is involved in all growth-related EFMs, its deletion
will make those EFMs disappear.
6. Reaction correlations: EFMs can be used to analyze
structural couplings between reactions, which can give hints
about underlying regulatory circuits and hence identify enzyme or
reaction subsets.
7. Detection of thermodynamically infeasible cycles: EFMs
representing internal cycles are infeasible by the laws of
thermodynamics and thus reflect structural inconsistencies.
8. Pathway analysis can also be combined with regulatory rules
and stoichiometric constraints to study the metabolic network.
9. Minimal cut sets: EFMs can be used to calculate the minimal
cut sets, i.e. the minimal reaction subsets of a metabolic
network whose loss disables a certain function of the network.
With this property, many other applications become available, such as
phenotype prediction, analysis of structural flexibility,
metabolic network structure analysis and identification of drug
targets.
The above points are quoted from the paper 《Computation of elementary
modes: a unifying framework and the new binary approach》 by Julien
Gagneur and Steffen Klamt [31].
2.42 Mathematics behind EFMs
Suppose a given metabolic network consists of m metabolites and n
reactions. The stoichiometry matrix S is an m×n matrix. In such a network, each EFM is
defined by a vector e composed of n elements, each describing the net rate of the
corresponding reaction (i.e. the vector e is a flux distribution). Such a network can be
represented mathematically by the following equation: [31,32,33]
$$\mathbf{S}\mathbf{e} = 0,$$
where e must satisfy three conditions:
1. Pseudo steady-state, which restricts Se to equal 0. This
ensures that none of the metabolites is consumed or produced
in the overall stoichiometry. [31]
2. Feasibility: there are two sets of reactions, the
irreversible reactions and the reversible reactions, and
the irreversible reactions are thermodynamically feasible in
only one direction. Thus the rate $e_i \ge 0$ if reaction i is
irreversible. This demands that only thermodynamically
realizable fluxes are contained in e. [31]
3. Non-decomposability: let P(m) be the set of reactions that do
not occur in elementary mode m. Then for all other modes
n ≠ m it follows that P(n) is not a proper subset of P(m). [31]
Furthermore, if the above three conditions are satisfied, then all feasible steady-state
flux distributions v can be described by a non-negative combination of the EFMs.
Thus the whole model is:
$$P = \{ v \in \mathbb{R}^q : Nv = 0 \text{ and } v_i \ge 0,\ i \in \mathrm{Irrev} \}.$$
P is a set of vectors that obey a finite set of homogeneous linear
equations and inequalities and is, by definition, a convex polyhedral
cone. A vector e is an elementary mode if:
$$e \in P, \text{ i.e. } Se = 0 \text{ (quasi steady state) and } e_i \ge 0,\ i \in \mathrm{Irrev} \text{ (thermodynamic feasibility)},$$
$$\text{for all } e' \in P: R(e') \subseteq R(e) \Rightarrow e' = 0 \text{ or } e' = e \text{ or } e' = -e \text{ (elementarity)},$$
where R(e) denotes the set of reactions participating in e.
In other words, e is an EM if and only if it works at quasi steady
state, is thermodynamically feasible and there is no other non-null
flux vector (up to a scaling) that both satisfies these constraints
and involves a proper subset of its participating reactions. Note
that with this convention, reversible modes are here considered as
two vectors of opposite directions.
These descriptions are quoted from the paper 《Computation of elementary modes:
a unifying framework and the new binary approach》 by Julien Gagneur and
Steffen Klamt. [31]
2.43 Calculate EFM matrix.
Some basic properties of EFMs. [35]
Lemma 1. All vectors V fulfilling conditions 1 and 2 above either
represent elementary modes or are positive linear combinations of
vectors representing elementary modes,
$$V = \sum_{l} \eta_l m_l, \quad \eta_l > 0,$$
where η is the rank of the stoichiometry matrix, the sum runs over
at least two different indices l, and all $m_l$ have zero components
wherever V has zero components and include at least one
additional zero each,
$$S(V) \subset S(m_l).$$
All those $m_l$ that enter the above equation represent reversible
elementary modes if and only if V represents a reversible flux
mode. [35]
Lemma 2. For any pair of vectors, $V^*$ and $V^{**}$, with $V^*$
representing an elementary flux mode and $V^{**}$ representing a
flux mode and having zero components wherever $V^*$ has zero
components,
$$S(V^*) \subseteq S(V^{**}),$$
$V^{**}$ either represents the same elementary mode as $V^*$ or the
same elementary mode as $-V^*$, which implies $S(V^*) = S(V^{**})$. [35]
Lemma 3. For two reaction systems A and B differing only in that
some reactions are reversible in B while being irreversible in A, all
elementary modes of A are also elementary modes in B, which
may involve additional elementary modes. [35]
Lemma 4. Any vector representing an elementary mode involves at
least γ − 1 zero components, with γ = r − η denoting the dimension
of the null-space of N, where N denotes the stoichiometry matrix, η its rank,
and r the number of reactions. [35]
The above lemmas are quoted from the paper 《Reaction routes in biochemical reaction
systems: Algebraic properties, validated calculation procedure and example from
nucleotide metabolism》 by S. Schuster, C. Hilgetag, J.H. Woods and D.A. Fell
[35].
There are possibly hundreds of EFMs even for a small metabolic
network, so their computation is usually carried out with the help of computer
software, such as the program EMPATH (John Woods, Oxford), the C program
METATOOL (Pfeiffer et al., 1999) and the MAPLE program METAFLUX (Klaus Mauch,
Stuttgart). In this section, the mathematical concept behind the computation is described using
example (1). [34]
The algorithm basically seeks special solutions to a system of linear homogeneous
equations and inequalities.
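1. Start with the tableau:
$$T^{(0)} = \begin{pmatrix} N_{rev}^T & I & 0 \\ N_{irr}^T & 0 & I \end{pmatrix},$$
where N represents the stoichiometric matrix of the network, split into its reversible (rev) and irreversible (irr) reactions.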
For example (1), the stoichiometric matrix was already constructed, thus we have
the first tableau:
       A   B   C   D   E   F  R1  R2  R3  R4  R5  R6  R7  R8  R9
R1     1   0   0   0   0   0   1   0   0   0   0   0   0   0   0
R2    -1   1   0   0   0   0   0   1   0   0   0   0   0   0   0
R3    -1   0   1   0   0   0   0   0   1   0   0   0   0   0   0
R4     0  -1   0   2  -1   0   0   0   0   1   0   0   0   0   0
R5     0   0   0   0   1   0   0   0   0   0   1   0   0   0   0
R6     0  -2   1   0   0   1   0   0   0   0   0   1   0   0   0
R7     0   0  -1   1   0   0   0   0   0   0   0   0   1   0   0
R8     0   0   0  -1   0   0   0   0   0   0   0   0   0   1   0
R9     0   0   0   0   0  -1   0   0   0   0   0   0   0   0   1
Tableau T(0)
This example is relatively easy because there is no reversible reaction in the
network.
From the tableau $T^{(j)}$ ($j = 0, 1, 2, \ldots, n-1$), the elements of which are
denoted by $t^{(j)}_{i,h}$, calculate the next tableau $T^{(j+1)} = \left( N^{(j+1)T} \;\; M^{(j+1)} \right)$ (M denotes the
flux mode part) in the following way: [34]
1. For each row of $M^{(j)}$, determine the set $S(m_i^{(j)})$ recording the positions of the
zeroes.
2. First, each row of $T^{(j)}$ with a zero in the (j+1)th column of $N^{(j)T}$ is copied
into $T^{(j+1)}$. Then, new rows formed by allowed linear combinations of pairs
of rows of $T^{(j)}$ consecutively go into $T^{(j+1)}$ if they fulfill the conditions:
$$t^{(j)}_{i,j+1} \cdot t^{(j)}_{m,j+1} \ne 0,$$
(this condition constrains the reversible reactions to go in the right
direction), and
$$S(m_i^{(j)}) \cap S(m_m^{(j)}) \nsubseteq S(m_l^{(j+1)}).$$
Here $m_i^{(j)}$ represents the i-th row of the right part of $T^{(j)}$, and $S(m_i^{(j)})$ is the set
of positions of zeroes in that row, which indicates the enzymes that are not used in that set of
reactions.
The above description is quoted from the paper 《Description of the algorithm for
computing elementary flux modes》 by Stefan Schuster, Thomas Dandekar and
David Fell.
In T(0), the entries of the first column in rows 4 to 9 are 0, so these rows can be copied
into the next tableau without any combination. Combining the first row with the second and
the third row respectively makes the first-column entry equal to 0, and the resulting rows are put into the second
tableau. NB: if there are reversible reactions, the result of combining two reversible
reactions should be put into the reversible part of the next tableau, whereas the
combination of a reversible and an irreversible reaction should be put into the irreversible
part of the next tableau. [34]
By combining R1 with R2 and R3 respectively, we get tableau 1:
         A   B   C   D   E   F  R1  R2  R3  R4  R5  R6  R7  R8  R9
R1+R2    0   1   0   0   0   0   1   1   0   0   0   0   0   0   0
R1+R3    0   0   1   0   0   0   1   0   1   0   0   0   0   0   0
R4       0  -1   0   2  -1   0   0   0   0   1   0   0   0   0   0
R5       0   0   0   0   1   0   0   0   0   0   1   0   0   0   0
R6       0  -2   1   0   0   1   0   0   0   0   0   1   0   0   0
R7       0   0  -1   1   0   0   0   0   0   0   0   0   1   0   0
R8       0   0   0  -1   0   0   0   0   0   0   0   0   0   1   0
R9       0   0   0   0   0  -1   0   0   0   0   0   0   0   0   1
Tableau T(1)
Combine the first row with the third row and the fifth row respectively.
                    A   B   C   D   E   F  R1  R2  R3  R4  R5  R6  R7  R8  R9
R1+R2+R4            0   0   0   2  -1   0   1   1   0   1   0   0   0   0   0
R1+R3               0   0   1   0   0   0   1   0   1   0   0   0   0   0   0
R5                  0   0   0   0   1   0   0   0   0   0   1   0   0   0   0
2R1+2R2+R6          0   0   1   0   0   1   2   2   0   0   0   1   0   0   0
R7                  0   0  -1   1   0   0   0   0   0   0   0   0   1   0   0
R8                  0   0   0  -1   0   0   0   0   0   0   0   0   0   1   0
R9                  0   0   0   0   0  -1   0   0   0   0   0   0   0   0   1
Tableau T(2)
Applying the same algorithm, one column is zeroed at each step.
                    A   B   C   D   E   F  R1  R2  R3  R4  R5  R6  R7  R8  R9
R1+R2+R4            0   0   0   2  -1   0   1   1   0   1   0   0   0   0   0
R1+R3+R7            0   0   0   1   0   0   1   0   1   0   0   0   1   0   0
R5                  0   0   0   0   1   0   0   0   0   0   1   0   0   0   0
2R1+2R2+R6+R7       0   0   0   1   0   1   2   2   0   0   0   1   1   0   0
R8                  0   0   0  -1   0   0   0   0   0   0   0   0   0   1   0
R9                  0   0   0   0   0  -1   0   0   0   0   0   0   0   0   1
Tableau T(3)

                    A   B   C   D   E   F  R1  R2  R3  R4  R5  R6  R7  R8  R9
R1+R2+R4+2R8        0   0   0   0  -1   0   1   1   0   1   0   0   0   2   0
R1+R3+R7+R8         0   0   0   0   0   0   1   0   1   0   0   0   1   1   0
R5                  0   0   0   0   1   0   0   0   0   0   1   0   0   0   0
2R1+2R2+R6+R7+R8    0   0   0   0   0   1   2   2   0   0   0   1   1   1   0
R9                  0   0   0   0   0  -1   0   0   0   0   0   0   0   0   1
Tableau T(4)
                    A   B   C   D   E   F  R1  R2  R3  R4  R5  R6  R7  R8  R9
R1+R2+R4+2R8+R5     0   0   0   0   0   0   1   1   0   1   1   0   0   2   0
R1+R3+R7+R8         0   0   0   0   0   0   1   0   1   0   0   0   1   1   0
2R1+2R2+R6+R7+R8    0   0   0   0   0   1   2   2   0   0   0   1   1   1   0
R9                  0   0   0   0   0  -1   0   0   0   0   0   0   0   0   1
Tableau T(5)
Thus the final elementary modes are:
                       R1  R2  R3  R4  R5  R6  R7  R8  R9
R1+R2+R4+2R8+R5         1   1   0   1   1   0   0   2   0
R1+R3+R7+R8             1   0   1   0   0   0   1   1   0
2R1+2R2+R6+R7+R8+R9     2   2   0   0   0   1   1   1   1
Tableau T(6)
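The column-by-column elimination walked through in the tableaux above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the published algorithm in full: it assumes that every reaction is irreversible (as in example (1)), combines only rows with opposite signs in the current column, and applies the elementarity test as a comparison of supports. The names `NT` and `elementary_modes` are hypothetical.

```python
from functools import reduce
from math import gcd

# Transposed stoichiometric matrix of example (1): one row per reaction R1..R9,
# one column per internal metabolite A..F. All reactions are irreversible.
NT = [
    [ 1,  0,  0,  0,  0,  0],
    [-1,  1,  0,  0,  0,  0],
    [-1,  0,  1,  0,  0,  0],
    [ 0, -1,  0,  2, -1,  0],
    [ 0,  0,  0,  0,  1,  0],
    [ 0, -2,  1,  0,  0,  1],
    [ 0,  0, -1,  1,  0,  0],
    [ 0,  0,  0, -1,  0,  0],
    [ 0,  0,  0,  0,  0, -1],
]

def elementary_modes(nt):
    n_rxn, n_met = len(nt), len(nt[0])
    # Each tableau row is (metabolite part, mode part); start from (N^T | I).
    rows = [(list(nt[i]), [int(i == k) for k in range(n_rxn)]) for i in range(n_rxn)]
    for j in range(n_met):
        keep = [r for r in rows if r[0][j] == 0]      # copied unchanged
        pos = [r for r in rows if r[0][j] > 0]
        neg = [r for r in rows if r[0][j] < 0]
        for p in pos:
            for q in neg:
                a, b = -q[0][j], p[0][j]              # positive multipliers cancelling column j
                left = [a * x + b * y for x, y in zip(p[0], q[0])]
                right = [a * x + b * y for x, y in zip(p[1], q[1])]
                g = reduce(gcd, left + right)         # scale to smallest integers
                keep.append(([x // g for x in left], [x // g for x in right]))
        # Elementarity: drop a row whose set of used reactions strictly contains another's.
        supp = [frozenset(k for k, x in enumerate(r[1]) if x) for r in keep]
        rows = [r for r, s in zip(keep, supp) if not any(t < s for t in supp)]
    return [r[1] for r in rows]

modes = elementary_modes(NT)
for e in modes:
    print(e)
```

Each returned vector lists the multiplicities of R1, …, R9 in one mode, so the three rows correspond to R1+R2+R4+R5+2R8, R1+R3+R7+R8 and 2R1+2R2+R6+R7+R8+R9. Handling reversible reactions would need the extra bookkeeping described in the NB above.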
As stated above, there are a number of computer programs for evaluating elementary
flux modes of real-life networks. The "efmtool" package is used in this thesis; it is a Java-based
program and has been integrated into MATLAB. [50] Only two inputs are required:
the stoichiometric matrix and a vector that determines the reversibility of the reactions (1 for
reversible reactions and 0 for irreversible reactions); full details are given in
Appendix A. The above example was also run through the program, which gives the same
result.
2.44 Calculation of flux in elementary modes.
Now that we know how to evaluate the elementary flux mode matrix, the next step
is to estimate the flux distribution.
Let the vector w of length m denote the elementary mode coefficients, which determine
how much each elementary mode contributes to the whole network. At steady state,
the flux carried by any particular reaction is the sum of the fluxes of that reaction in each
elementary flux mode, multiplied by the corresponding coefficient. Denoting the fluxes by a row
vector v, the mathematical representation of these fluxes is: [36]
$$v_i = \sum_{j=1}^{m} E_{ij} w_j, \tag{1}$$
or in matrix form:
$$\mathbf{v} = \mathbf{E}\mathbf{w}. \tag{2}$$
Some knowledge of the flux vector v can be gained through experiments, or by
fixing certain reaction fluxes that we are interested in; these data can then be
used to estimate the elementary mode coefficients w.
Thus:
$$\mathbf{w} = f(\mathbf{E}, \mathbf{v}_0), \tag{3}$$
where $\mathbf{v}_0$ is a vector of observed or determined fluxes and f is some function that
evaluates w. These results can then be substituted back into equation (2), $\mathbf{v} = \mathbf{E}\mathbf{w}$, to
estimate the unobserved fluxes. [36]
Since E is in general not invertible, the problem is solved with the
pseudo-inverse, which is computed by singular value decomposition, a method discussed
later. Denoting the generalized inverse matrix by $\mathbf{E}^{\#}$, the above equation can be
rearranged as: [36]
$$\mathbf{w} = \mathbf{E}^{\#} \mathbf{v}^T. \tag{4}$$
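As stated above, only some of the fluxes can be observed, so divide the flux
vector v into two components:
$$\mathbf{v} = (\mathbf{v}_0, \mathbf{v}_x),$$
where $\mathbf{v}_0$ are the observed fluxes and $\mathbf{v}_x$ are the non-observed fluxes.
The same applies to E:
$$\mathbf{E} = \begin{pmatrix} \mathbf{E}_0 \\ \mathbf{E}_x \end{pmatrix}.$$
So from the observed part we can calculate w:
$$\mathbf{w} = \mathbf{E}_0^{\#} \mathbf{v}_0^T. \tag{5}$$
Hence the unobserved fluxes are estimated from:
$$(\mathbf{v}_0, \mathbf{v}_x)^T = \begin{pmatrix} \mathbf{E}_0 \\ \mathbf{E}_x \end{pmatrix} \mathbf{w}.$$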
The above equation has the following properties: [36]
1. In the case that the problem is an overdetermined system, w
represents the least-squares solution of the problem.
2. In the case that the system is exactly determined,
$E_0^{\#} = E_0^{-1}$ and this then simply represents the solution of a set of
linear simultaneous equations.
3. In the case that the system is underdetermined, the most likely
case in this context, equation (4) generates the minimum norm
solution, i.e., it minimizes $\sum_i w_i^2$.
The above properties are quoted from the paper 《A method for the determination
of flux in elementary modes, and its application to Lactobacillus rhamnosus》
by M.G. Poolman, K.V. Venakatesh, M.K. Pidcock and D.A. Fell [36].
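To make the underdetermined case (property 3) concrete, the sketch below estimates w from two observed fluxes using the three elementary modes of example (1). It rests on several assumptions that are not in the original: the observed values v(R1) = 1 and v(R5) = 1/2 are invented purely for illustration, and the pseudo-inverse is evaluated with the closed form $E_0^{\#} = E_0^T (E_0 E_0^T)^{-1}$, which holds only when $E_0$ has full row rank. The helper `solve` is a hypothetical name for a small exact Gauss-Jordan routine.

```python
from fractions import Fraction as F

def solve(A, b):
    """Solve the square system A y = b by Gauss-Jordan elimination in exact arithmetic."""
    n = len(A)
    M = [[F(x) for x in row] + [F(bi)] for row, bi in zip(A, b)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)   # find a usable pivot row
        M[c], M[p] = M[p], M[c]
        M[c] = [x / M[c][c] for x in M[c]]
        for r in range(n):
            if r != c:
                M[r] = [x - M[r][c] * y for x, y in zip(M[r], M[c])]
    return [row[-1] for row in M]

# Elementary mode matrix E of example (1): rows are reactions R1..R9,
# columns are the three elementary modes found above.
E = [[1, 1, 2], [1, 0, 2], [0, 1, 0], [1, 0, 0], [1, 0, 0],
     [0, 0, 1], [0, 1, 1], [2, 1, 1], [0, 0, 1]]

observed = [0, 4]                      # indices of R1 and R5
v0 = [F(1), F(1, 2)]                   # hypothetical observed fluxes (illustration only)
E0 = [E[i] for i in observed]

# Minimum-norm solution w = E0^T (E0 E0^T)^{-1} v0.
gram = [[sum(a * b for a, b in zip(r1, r2)) for r2 in E0] for r1 in E0]
y = solve(gram, v0)
w = [sum(E0[k][j] * y[k] for k in range(len(E0))) for j in range(3)]

# Estimated flux through every reaction, v = E w.
v = [sum(F(e) * wj for e, wj in zip(row, w)) for row in E]
print(w)
print(v)
```

The estimated vector reproduces the two observed fluxes exactly and fills in the remaining seven; among all w that do so, this one has the smallest $\sum_i w_i^2$.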
As mentioned above, the pseudo-inverse can be used in place of the inverse of the
elementary flux mode matrix. The basic mathematics behind the pseudo-inverse is singular
value decomposition (SVD).
Singular value decomposition is a way to factorize a matrix into a product of three
simpler matrices, and hence to study the underlying structure of the matrix. This method has
been used in many fields, such as solving homogeneous linear equations, latent
semantic analysis, clustering and feature extraction, etc. [37]
Suppose we have an m×n matrix S; then we have the following theorem. [37-39]
Let S be any real m×n matrix with rank r. Then we can write
$S = U \Sigma V^T$, where U and V are orthogonal and Σ is an m by n
diagonal matrix with r nonzero entries, given by the positive
square roots of the nonzero eigenvalues of $S^T S$ and $S S^T$, known as the singular
values of S.
The columns of U and V are called the left singular vectors and
right singular vectors of S respectively.
The matrix Σ takes the form:
$$\Sigma = \begin{pmatrix}
\delta_1 & 0 & 0 & \cdots & 0 & 0 \\
0 & \delta_2 & 0 & \cdots & 0 & 0 \\
0 & 0 & \delta_3 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & \delta_r & 0
\end{pmatrix},$$
and the δ are ordered as:
$$\delta_1 \ge \delta_2 \ge \delta_3 \ge \cdots \ge \delta_r > 0, \quad r \le \min(m, n).$$
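An example of SVD.
Suppose we have a 4×4 matrix S:
$$S = \begin{pmatrix} 2 & -1 & 0 & 0 \\ -1 & 2 & -1 & 0 \\ 0 & -1 & 2 & -1 \\ 0 & 0 & -1 & 2 \end{pmatrix}, \qquad
S S^T = \begin{pmatrix} 5 & -4 & 1 & 0 \\ -4 & 6 & -4 & 1 \\ 1 & -4 & 6 & -4 \\ 0 & 1 & -4 & 5 \end{pmatrix} = S^T S.$$
The eigenvalues and eigenvectors of $S^T S$ are:
$$\lambda_1 = 13.09, \quad \lambda_2 = 6.854, \quad \lambda_3 = 1.91, \quad \lambda_4 = 0.146,$$
$$e_1 = (0.372, -0.602, 0.602, -0.372)^T,$$
$$e_2 = (-0.602, 0.372, 0.372, -0.602)^T,$$
$$e_3 = (0.602, 0.372, -0.372, -0.602)^T,$$
$$e_4 = (0.372, 0.602, 0.602, 0.372)^T.$$
The singular values are the square roots of the eigenvalues and the eigenvectors form the columns of U. Thus
$$\Sigma = \begin{pmatrix} 3.618 & 0 & 0 & 0 \\ 0 & 2.618 & 0 & 0 \\ 0 & 0 & 1.382 & 0 \\ 0 & 0 & 0 & 0.382 \end{pmatrix}, \qquad
U = \begin{pmatrix} 0.372 & -0.602 & 0.602 & 0.372 \\ -0.602 & 0.372 & 0.372 & 0.602 \\ 0.602 & 0.372 & -0.372 & 0.602 \\ -0.372 & -0.602 & -0.602 & 0.372 \end{pmatrix},$$
and, because S is symmetric with positive eigenvalues, V = U.
Then
$$U \Sigma V^T = \begin{pmatrix} 0.372 & -0.602 & 0.602 & 0.372 \\ -0.602 & 0.372 & 0.372 & 0.602 \\ 0.602 & 0.372 & -0.372 & 0.602 \\ -0.372 & -0.602 & -0.602 & 0.372 \end{pmatrix}
\begin{pmatrix} 3.618 & 0 & 0 & 0 \\ 0 & 2.618 & 0 & 0 \\ 0 & 0 & 1.382 & 0 \\ 0 & 0 & 0 & 0.382 \end{pmatrix}
\begin{pmatrix} 0.372 & -0.602 & 0.602 & -0.372 \\ -0.602 & 0.372 & 0.372 & -0.602 \\ 0.602 & 0.372 & -0.372 & -0.602 \\ 0.372 & 0.602 & 0.602 & 0.372 \end{pmatrix}
= \begin{pmatrix} 2 & -1 & 0 & 0 \\ -1 & 2 & -1 & 0 \\ 0 & -1 & 2 & -1 \\ 0 & 0 & -1 & 2 \end{pmatrix} = S.$$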
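SVD for pseudo-inverse. [37-39]
The pseudo-inverse is a way to solve the following linear equation: [37-39]
$$\mathbf{S} \cdot \mathbf{a} = \mathbf{b}, \qquad \mathbf{a} \in \mathbb{R}^n;\ \mathbf{b} \in \mathbb{R}^m;\ \mathbf{S} \in \mathbb{R}^{m \times n}.$$
The general solution can be found from the equation:
$$\mathbf{a} = \mathbf{S}^{+} \mathbf{b}.$$
$S^{+}$ is the pseudo-inverse matrix of S; it is the unique
matrix with the following properties: [37-39]
$$S S^{+} S = S,$$
$$S^{+} S S^{+} = S^{+},$$
$$(S S^{+})^T = S S^{+},$$
$$(S^{+} S)^T = S^{+} S.$$
Sometimes the matrix S is not invertible, so SVD is used to find $\mathbf{S}^{+}$:
$$\mathbf{S}^{+} = \mathbf{V} \mathbf{\Sigma}^{+} \mathbf{U}^T,$$
where the matrix $\mathbf{\Sigma}^{+}$ is equal to:
$$\Sigma^{+} = \begin{pmatrix}
1/\delta_1 & 0 & 0 & \cdots & 0 & 0 \\
0 & 1/\delta_2 & 0 & \cdots & 0 & 0 \\
0 & 0 & 1/\delta_3 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & 1/\delta_r & 0
\end{pmatrix}.$$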
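So, using the same example as above (where S is invertible, so that $S^{+} = S^{-1}$), we find:
$$S^{-1} = \begin{pmatrix} 0.8 & 0.6 & 0.4 & 0.2 \\ 0.6 & 1.2 & 0.8 & 0.4 \\ 0.4 & 0.8 & 1.2 & 0.6 \\ 0.2 & 0.4 & 0.6 & 0.8 \end{pmatrix},$$
and
$$V \Sigma^{+} U^T = \begin{pmatrix} 0.372 & -0.602 & 0.602 & 0.372 \\ -0.602 & 0.372 & 0.372 & 0.602 \\ 0.602 & 0.372 & -0.372 & 0.602 \\ -0.372 & -0.602 & -0.602 & 0.372 \end{pmatrix}
\begin{pmatrix} 1/3.618 & 0 & 0 & 0 \\ 0 & 1/2.618 & 0 & 0 \\ 0 & 0 & 1/1.382 & 0 \\ 0 & 0 & 0 & 1/0.382 \end{pmatrix}
\begin{pmatrix} 0.372 & -0.602 & 0.602 & -0.372 \\ -0.602 & 0.372 & 0.372 & -0.602 \\ 0.602 & 0.372 & -0.372 & -0.602 \\ 0.372 & 0.602 & 0.602 & 0.372 \end{pmatrix}$$
$$= \begin{pmatrix} 0.8 & 0.6 & 0.4 & 0.2 \\ 0.6 & 1.2 & 0.8 & 0.4 \\ 0.4 & 0.8 & 1.2 & 0.6 \\ 0.2 & 0.4 & 0.6 & 0.8 \end{pmatrix} = S^{+} = S^{-1}.$$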
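The numbers in this example can be checked without any linear algebra library. The Python sketch below relies on a known closed form, which is an assumption brought in from outside the thesis: for the 4×4 tridiagonal matrix with 2 on the diagonal and −1 beside it, the singular values are $2 - 2\cos(k\pi/5)$ and the singular vectors have entries proportional to $\sin(jk\pi/5)$, k = 1, …, 4. It confirms that the quoted inverse is exact and that $V \Sigma^{+} U^T$ reproduces it.

```python
from fractions import Fraction as F
from math import cos, sin, pi, sqrt

S = [[2, -1, 0, 0], [-1, 2, -1, 0], [0, -1, 2, -1], [0, 0, -1, 2]]

# The inverse quoted in the text, written as exact fractions (0.8 = 4/5, ...).
S_inv = [[F(4, 5), F(3, 5), F(2, 5), F(1, 5)],
         [F(3, 5), F(6, 5), F(4, 5), F(2, 5)],
         [F(2, 5), F(4, 5), F(6, 5), F(3, 5)],
         [F(1, 5), F(2, 5), F(3, 5), F(4, 5)]]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

product = matmul(S, S_inv)
identity = [[F(int(i == j)) for j in range(4)] for i in range(4)]

# Closed-form singular values and unit singular vectors (assumed, see lead-in).
ks = [4, 3, 2, 1]                                    # ordered by decreasing sigma
sigmas = [2 - 2 * cos(k * pi / 5) for k in ks]
vectors = [[sqrt(2 / 5) * sin(j * k * pi / 5) for j in range(1, 5)] for k in ks]

# Pseudo-inverse from the decomposition: sum over k of u_k u_k^T / sigma_k.
pinv = [[sum(u[i] * u[j] / s for u, s in zip(vectors, sigmas)) for j in range(4)]
        for i in range(4)]
max_error = max(abs(pinv[i][j] - float(S_inv[i][j])) for i in range(4) for j in range(4))

print([round(s, 3) for s in sigmas])
print(max_error)
```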
The above method can be used to find $\mathbf{E}^{\#}$ for calculating the flux vector in elementary flux
mode analysis. Furthermore, singular value decomposition can be applied to
decompose stoichiometric matrices in order to evaluate the flux vector. [40]
2.45 Shannon’s MEP for elementary mode analysis
Apart from Poolman's Moore-Penrose generalized inverse method, there are
a few other methods that can determine the metabolic flux vector from elementary
flux modes. One example is the α-spectrum method [41]. This method determines which
elementary modes can constitute the metabolic flux vector of a physiological state, and
the available range of each elementary mode coefficient, by maximizing and minimizing
each coefficient in turn; an elementary mode coefficient is a weighting
factor that represents how much each elementary mode contributes to the whole
network. This method only gives ranges of flux vectors, so in order to obtain a final
solution, the maximum and minimum of each elementary mode coefficient are averaged to
get a statistical mean value using linear programming [42], constrained by
experimentally determined fluxes. Based on this method, Kurata et al. further
introduced a heuristic, non-mechanistic model to determine the range of flux vectors
of mutants by incorporating into the model the enzymatic activities of the mutant relative
to the wild type and its metabolic flux vectors. [41,43] The optimization method is
also close to the α-spectrum method: it is based on linear programming and constrained by some
experimentally determined exchange fluxes. Thus, for both methods, the more
experimentally determined fluxes or transcriptional regulatory constraints are available,
the better the result.
All the methods introduced above have been successfully applied to real-life
problems; for example, the α-spectrum method has been applied to investigate E. coli
central metabolism (Wiback et al. 2004) and human red blood cell metabolism
(Wiback et al. 2003), and Kurata's method has been applied to several Saccharomyces
species under various growth conditions. [41] However, they have one disadvantage in
common: the lack of a physical or biological background behind the methods.
Quanyu Zhao and Hiroyuki Kurata proposed a method which uses the maximum
entropy principle and nonlinear programming to optimize elementary mode
coefficients. The maximum entropy principle (MEP) is derived from Shannon's
information theory and is widely used in physics, chemistry, and bioinformatics for
gene expression and sequence analysis. [43]
Entropy is a core concept of thermodynamics and statistical physics; the law of
entropy, also known as the second law of thermodynamics, is one of the greatest results of
natural science in the 19th century. It was first proposed by Clausius in 1864. In 1948,
C.E. Shannon introduced the concept into information theory, where entropy is regarded as
a measure of the uncertainty of a random event, or of the amount of information.
The uncertainty of a random event can thus be described by its probability distribution
function. [44]
The maximum entropy principle was introduced by E.T. Jaynes. Its basic idea is that, when
only part of the information about a distribution is known, the distribution with the maximum
entropy among those satisfying the known part should be chosen. In other words, the probability
distribution that satisfies the known information is not unique; the one which gives the
maximum entropy is the best choice, and we should not make any assumptions about
the unknown part. Thus, under the constraints of the known information, the estimate of the
unknown information is the most indeterminate, or most random, estimate. Any other
choice would presume that other constraints or assumptions have been added; however, those
extra constraints and assumptions cannot be supported by the known information,
and hence would distort the results. [44]
Shannon's entropy describes the uncertainty of a random event. Let the probability
of a discrete random variable x taking the value $A_k$ be $P_k$, $k = 1, 2, \ldots, N$, with $P_k > 0$ and
$\sum_{k=1}^{N} P_k = 1$. Then the entropy is defined as:
$$H(x) = -\sum_{k=1}^{N} P_k \ln P_k. \tag{1}$$
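A small numerical illustration of equation (1), as a sketch in Python: the entropy is largest for the uniform distribution, where it equals ln N, and smaller for any more concentrated distribution. The function name `shannon_entropy` and the two example distributions are hypothetical.

```python
from math import log

def shannon_entropy(p):
    """H = -sum_k p_k ln p_k for a discrete distribution p (zero terms contribute nothing)."""
    if abs(sum(p) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return -sum(pk * log(pk) for pk in p if pk > 0)

uniform = shannon_entropy([0.25, 0.25, 0.25, 0.25])   # no outcome is preferred
skewed = shannon_entropy([0.7, 0.1, 0.1, 0.1])        # one outcome dominates

print(uniform, skewed)
```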
The flux distribution for elementary mode analysis is generally denoted as:
$$\mathbf{v} = \mathbf{P} \cdot \boldsymbol{\lambda}, \tag{2}$$
where P is the elementary mode matrix in which the rows represent the reactions
and the columns correspond to the elementary modes, λ is the elementary mode
coefficient vector and v is the flux vector. [42] The elementary mode matrix P
takes the form:
$$\mathbf{P} = \begin{pmatrix}
e_{1,1} & e_{1,2} & \cdots & e_{1,m} \\
e_{2,1} & e_{2,2} & \cdots & e_{2,m} \\
\vdots & \vdots & \ddots & \vdots \\
e_{n,1} & e_{n,2} & \cdots & e_{n,m}
\end{pmatrix}.$$
In Zhao and Kurata's Shannon's MEP method for elementary mode analysis, each
elementary mode is regarded as a random event. As stated above, the elementary modes
are the set of all possible pathways of a metabolic network, and each elementary mode,
excluding internal loops, must have an uptake reaction. Thus, from equation (2) above,
the flux of a substrate uptake reaction satisfies: [42]
$$\frac{\sum_i e_{\text{substrate uptake},i} \cdot \lambda_i}{v_{\text{substrate uptake}}} = 1,$$
where $e_{\text{substrate uptake},i}$ is the element for the substrate uptake reaction in the i-th
elementary mode in matrix P, and $v_{\text{substrate uptake}}$ is the flux of substrate uptake.
This follows directly from equation (2): the flux is equal to the dot product of the elementary
mode coefficient vector and the corresponding row of the elementary mode matrix.
Then, based on equation (1), the probability of the i-th elementary mode in
Shannon's entropy is given as follows: [42]
$$\rho_i = \frac{1}{v_{\text{substrate uptake}}} \, e_{\text{substrate uptake},i} \cdot \lambda_i. \tag{3}$$
Thus the algorithm of Zhao and Kurata's method is defined as follows:
$$\max\ -\sum_{i=1}^{n_e} \rho_i \ln \rho_i, \tag{4}$$
subject to
$$\mathbf{e}_d \cdot \boldsymbol{\lambda} = \mathbf{v}_d, \tag{5}$$
$$\sum_{i=1}^{n_e} \rho_i = 1, \tag{6}$$
$$\boldsymbol{\lambda} > 0, \tag{7}$$
where $n_e$ is the total number of elementary modes, $\mathbf{v}_d$ is the vector of
determined fluxes and $\mathbf{e}_d$ is the elementary mode sub-matrix that consists of the rows
corresponding to the determined fluxes. It is clear that the objective function here
(equation (4)) is nonlinear and constrained by experimentally determined fluxes. The
result is the set of elementary mode coefficients that maximizes the entropy while satisfying the
constraints; the flux distribution is then calculated by equation (2) [42].
This method also has a disadvantage, namely the computational complexity for a
moderate or large-scale metabolic model. To obtain reliable estimates for such models,
Zhao and Kurata proposed the MEP algorithm coupled with Lagrange multipliers. This
method readily optimizes hundreds of thousands of elementary mode coefficients in
large-scale networks under different types of environmental and genetic perturbations.
[45]
Lagrange multiplier
The Lagrange multiplier method finds the maxima or minima of a function
subject to constraints. It is named after Lagrange, who was one of the greatest
mathematicians of the 18th century. [46,47]
Suppose we want to find:
$$\max\ f(x_1, x_2, \ldots, x_n),$$
subject to:
$$\begin{aligned}
y_1(x_1, x_2, \ldots, x_n) &= c_1, \\
y_2(x_1, x_2, \ldots, x_n) &= c_2, \\
&\ \ \vdots \\
y_m(x_1, x_2, \ldots, x_n) &= c_m.
\end{aligned}$$
Lagrange introduced a new concept called the Lagrange multiplier (usually denoted as
λ), and defined the Lagrange function as:
$$F(x_1, x_2, \ldots, x_n) = f(x_1, \ldots, x_n) + \lambda_1 \left( y_1(x_1, \ldots, x_n) - c_1 \right) + \lambda_2 \left( y_2(x_1, \ldots, x_n) - c_2 \right) + \cdots + \lambda_m \left( y_m(x_1, \ldots, x_n) - c_m \right).$$
By the first-order conditions for a constrained optimum, optimality with respect to each $x_i$ implies
$\partial F / \partial x_i = 0$. [48]
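Thus we have:
$$\frac{\partial F}{\partial x_1} = 0 = \frac{\partial f}{\partial x_1} + \lambda_1 \frac{\partial y_1}{\partial x_1} + \lambda_2 \frac{\partial y_2}{\partial x_1} + \dots + \lambda_m \frac{\partial y_m}{\partial x_1}$$
$$\frac{\partial F}{\partial x_2} = 0 = \frac{\partial f}{\partial x_2} + \lambda_1 \frac{\partial y_1}{\partial x_2} + \lambda_2 \frac{\partial y_2}{\partial x_2} + \dots + \lambda_m \frac{\partial y_m}{\partial x_2}$$
$$\vdots$$
$$\frac{\partial F}{\partial x_n} = 0 = \frac{\partial f}{\partial x_n} + \lambda_1 \frac{\partial y_1}{\partial x_n} + \lambda_2 \frac{\partial y_2}{\partial x_n} + \dots + \lambda_m \frac{\partial y_m}{\partial x_n}$$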
Solving these n equations together with the m constraint equations gives a system of n+m equations in the n+m unknowns (x_1, x_2, …, x_n) and (λ_1, λ_2, …, λ_m), so the problem reduces to solving a determined system of equations [48].
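Numerical example:
Find the constrained optimum of the following objective function:
$$f(x, y) = 2x^2 + xy - y$$
Subject to the constraint:
$$x + 2y = 3$$
Solution:
The Lagrange function is:
$$F = 2x^2 + xy - y + \lambda(x + 2y - 3)$$
Setting the partial derivatives equal to zero:
$$\frac{\partial F}{\partial x} = 4x + y + \lambda = 0$$
$$\frac{\partial F}{\partial y} = x - 1 + 2\lambda = 0$$
$$\frac{\partial F}{\partial \lambda} = x + 2y - 3 = 0$$
Solving these equations:
$$x = -\tfrac{2}{3}, \qquad y = \tfrac{11}{6}, \qquad \lambda = \tfrac{5}{6}$$
Check this solution by substituting the result into the constraint equation:
$$x + 2y = -\tfrac{2}{3} + \tfrac{11}{3} = 3$$
The value of the objective function at this point is:
$$f(x, y) = 2x^2 + xy - y = \tfrac{8}{9} - \tfrac{11}{9} - \tfrac{11}{6} = -\tfrac{13}{6}$$
Note that on the constraint line x = 3 − 2y the objective reduces to 6y² − 22y + 18, which is convex in y. The stationary point found by the first-order conditions is therefore a constrained minimum, and f has no finite maximum on this line.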
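The three first-order conditions of this example are linear in (x, y, λ), so they can be solved exactly. The following sketch does this with exact rational arithmetic as a cross-check of the hand calculation; the helper `solve_linear_system` is written here for illustration and is not part of any library.

```python
from fractions import Fraction

def solve_linear_system(a, b):
    """Solve a * x = b exactly by Gauss-Jordan elimination on Fractions."""
    n = len(b)
    m = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        # Pick a row with a non-zero pivot and move it into place.
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [v / m[col][col] for v in m[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [row[n] for row in m]

# First-order conditions of F = 2x^2 + xy - y + lam * (x + 2y - 3):
#   dF/dx   = 4x + y + lam  = 0
#   dF/dy   = x + 2*lam - 1 = 0
#   dF/dlam = x + 2y - 3    = 0
a = [[4, 1, 1],
     [1, 0, 2],
     [1, 2, 0]]
b = [0, 1, 3]
x, y, lam = solve_linear_system(a, b)
f_value = 2 * x**2 + x * y - y
```

Running it gives x = −2/3, y = 11/6, λ = 5/6 and f = −13/6.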
Lagrange multipliers coupled MEP algorithm.
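As stated above, the goal is to find [42,45]:
$$\max\; -\sum_{i=1}^{n_e} \rho_i \ln \rho_i, \tag{1}$$
$$\text{subject to}\quad \mathbf{e_d}\cdot\boldsymbol{\lambda} = \mathbf{v_d}, \tag{2}$$
$$\sum_{i=1}^{n_e} \rho_i = 1, \tag{3}$$
where
$$\rho_i = \frac{1}{v_{\text{substrate uptake}}}\; e_{\text{substrate uptake},\,i}\;\lambda_i. \tag{4}$$
Rearranging equation (4):
$$\lambda_i = \frac{v_{\text{substrate uptake}}}{e_{\text{substrate uptake},\,i}}\;\rho_i. \tag{5}$$
Substituting this into the first constraint, equation (2), gives for the j-th determined flux [45]:
$$v_j = \sum_{i=1}^{n_e} e_{j,i}\,\frac{v_{\text{substrate uptake}}}{e_{\text{substrate uptake},\,i}}\;\rho_i, \qquad j = 1, 2, \dots, n_d, \tag{6}$$
where v_j is the j-th entry of v_d, e_{j,i} is the entry of e_d in row j and column i, n_e is the number of elementary modes and n_d is the number of determined fluxes [45].
For easier notation, let:
$$X_{j,i} =
\begin{cases}
e_{j,i}\,\dfrac{v_{\text{substrate uptake}}}{e_{\text{substrate uptake},\,i}} & \text{if } e_{\text{substrate uptake},\,i} \neq 0,\\[2ex]
0 & \text{if } e_{\text{substrate uptake},\,i} = 0.
\end{cases} \tag{7}$$
The constraint equation becomes [45]:
$$\sum_{i=1}^{n_e} X_{j,i}\,\rho_i = v_j, \qquad j = 1, 2, \dots, n_d. \tag{8}$$
The Lagrange function with Lagrange multipliers φ_0 and φ_j can then be defined as:
$$F(\rho_i, \varphi_j) = -\sum_{i=1}^{n_e} \rho_i \ln \rho_i \;-\; \varphi_0\left(\sum_{i=1}^{n_e} \rho_i - 1\right) \;-\; \sum_{j=1}^{n_d} \varphi_j \left(\sum_{i=1}^{n_e} X_{j,i}\,\rho_i - v_j\right). \tag{9}$$
Setting the partial derivatives equal to zero:
$$\frac{\partial F}{\partial \rho_i} = 0 = -1 - \ln \rho_i - \varphi_0 - \sum_{j=1}^{n_d} \varphi_j X_{j,i}, \qquad i = 1, 2, \dots, n_e.$$
Let
$$\varphi_0 = \ln(Z) - 1. \tag{10}$$
We have:
$$\ln \rho_i + \ln Z = -\sum_{j=1}^{n_d} \varphi_j X_{j,i}$$
$$\ln(\rho_i Z) = -\sum_{j=1}^{n_d} \varphi_j X_{j,i}$$
$$\rho_i = \frac{\exp\left(-\sum_{j=1}^{n_d} \varphi_j X_{j,i}\right)}{Z}. \tag{11}$$
By the second constraint, equation (3):
$$\sum_{i=1}^{n_e} \rho_i = 1.$$
Then:
$$\sum_{i=1}^{n_e} \exp\left(-\sum_{j=1}^{n_d} \varphi_j X_{j,i}\right) = Z$$
and
$$\rho_i = \frac{\exp\left(-\sum_{j=1}^{n_d} \varphi_j X_{j,i}\right)}{\sum_{k=1}^{n_e} \exp\left(-\sum_{j=1}^{n_d} \varphi_j X_{j,k}\right)}. \tag{12}$$
Thus, by the rearranged constraint, equation (8) [45]:
$$\frac{\sum_{i=1}^{n_e} X_{r,i} \exp\left(-\sum_{j=1}^{n_d} \varphi_j X_{j,i}\right)}{\sum_{i=1}^{n_e} \exp\left(-\sum_{j=1}^{n_d} \varphi_j X_{j,i}\right)} - v_r = 0, \qquad r = 1, 2, \dots, n_d. \tag{13}$$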
The method described above follows the paper "Use of maximum entropy principle with Lagrange multipliers extends the feasibility of elementary mode analysis" by Quanyu Zhao and Hiroyuki Kurata [45].
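To make equations (12) and (13) concrete, the sketch below treats the simplest case of a single determined flux, where only one multiplier φ has to be found. It is a minimal illustration under stated assumptions: the data are invented, the function names are hypothetical, and bisection is used only because the residual of equation (13) is monotone in φ for one constraint. It is not the implementation of [45], and the general case with several determined fluxes needs a multidimensional root finder.

```python
import math

def probabilities(x_row, phi):
    """Equation (12) for one constraint: rho_i proportional to exp(-phi * X_i)."""
    weights = [math.exp(-phi * x) for x in x_row]
    z = sum(weights)  # normalisation constant Z
    return [w / z for w in weights]

def residual(x_row, phi, v):
    """Left-hand side of equation (13) for a single determined flux."""
    rho = probabilities(x_row, phi)
    return sum(x * p for x, p in zip(x_row, rho)) - v

def solve_phi(x_row, v, lo=-50.0, hi=50.0, tol=1e-12):
    """Bisection; the residual decreases monotonically as phi increases."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if residual(x_row, mid, v) > 0:
            lo = mid  # weighted mean still too large, so phi must grow
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

x_row = [0.0, 1.0, 2.0]  # hypothetical X values for three elementary modes
v = 0.6                  # hypothetical measured flux, must lie between 0 and 2
phi = solve_phi(x_row, v)
rho = probabilities(x_row, phi)
```

When the measured flux equals the unweighted mean of the X values (here v = 1), the multiplier is zero and all modes receive equal probability, which is the unconstrained maximum-entropy solution.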
A worked example:
Example (1) is used to illustrate the above method. The given network is a typical metabolic network that has been used in many papers as an example for different methods. The network is defined as follows [49]:
Figure 2.451. Simple metabolic network.
The stoichiometric matrix was read from the network in section 2.1:
      v1  v2  v3  v4  v5  v6  v7  v8  v9
A      1  -1  -1   0   0   0   0   0   0
B      0   1   0  -1   0  -2   0   0   0
C      0   0   1   0   0   1  -1   0   0
D      0   0   0   2   0   0   1  -1   0
E      0   0   0  -1   1   0   0   0   0
F      0   0   0   0   0   1   0   0  -1
Table 2.451. Stoichiometric matrix of the simple metabolic network.
The elementary modes of this network were calculated above:
      E1  E2  E3
v1     2   1   1
v2     2   0   1
v3     0   1   0
v4     0   0   1
v5     0   0   1
v6     1   0   0
v7     1   1   0
v8     1   1   2
v9     1   0   0
Table 2.451. Elementary flux modes matrix of the simple metabolic network.
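Every elementary mode must satisfy the steady-state condition, so the product of the stoichiometric matrix and the elementary mode matrix has to vanish. The sketch below checks this for the example network. The two matrices are transcribed from the tables above, and the row and column assignment is my reading of the extracted layout, so it should be treated as an assumption; the helper `matmul` is written here for illustration.

```python
# Stoichiometric matrix N (rows A..F, columns v1..v9) of the example network.
N = [
    [1, -1, -1,  0, 0,  0,  0,  0,  0],  # A
    [0,  1,  0, -1, 0, -2,  0,  0,  0],  # B
    [0,  0,  1,  0, 0,  1, -1,  0,  0],  # C
    [0,  0,  0,  2, 0,  0,  1, -1,  0],  # D
    [0,  0,  0, -1, 1,  0,  0,  0,  0],  # E
    [0,  0,  0,  0, 0,  1,  0,  0, -1],  # F
]

# Elementary mode matrix (rows v1..v9, columns E1, E2, E3).
E = [
    [2, 1, 1], [2, 0, 1], [0, 1, 0], [0, 0, 1], [0, 0, 1],
    [1, 0, 0], [1, 1, 0], [1, 1, 2], [1, 0, 0],
]

def matmul(a, b):
    """Plain-Python matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

product = matmul(N, E)  # 6 x 3 matrix, one column per elementary mode
is_steady_state = all(value == 0 for row in product for value in row)
```

Because all nine reactions are irreversible, each mode must also have non-negative entries, which holds for the matrix above.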
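Thus, taking v1 and v5 as the determined fluxes, we have:
$$\begin{pmatrix} v_1 \\ v_5 \end{pmatrix} = \begin{pmatrix} 2 & 1 & 1 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} \lambda_1 \\ \lambda_2 \\ \lambda_3 \end{pmatrix}$$
$$v_1 = 2\lambda_1 + \lambda_2 + \lambda_3$$
$$v_5 = \lambda_3$$
$$v_{u.p} = v_1 + v_5 = 2\lambda_1 + \lambda_2 + 2\lambda_3 \tag{$\ast$}$$
By equation (5) above, with the total uptake row e_{u.p} = (2, 1, 2):
$$\lambda_1 = \frac{v_{u.p}}{e_{u.p,1}}\,\rho_1 = \frac{v_{u.p}}{2}\,\rho_1$$
$$\lambda_2 = \frac{v_{u.p}}{e_{u.p,2}}\,\rho_2 = \frac{v_{u.p}}{1}\,\rho_2$$
$$\lambda_3 = \frac{v_{u.p}}{2}\,\rho_3 = v_5$$
Thus the new constraint is:
$$v_{u.p}\,\rho_1 + v_{u.p}\,\rho_2 + v_5 = v_1$$
The Lagrange function is:
$$F(\rho_i, \varphi_j) = -\sum_{i=1}^{n_e} \rho_i \ln \rho_i - \varphi_1(\rho_1 + \rho_2 + \rho_3 - 1) - \varphi_2(v_{u.p}\,\rho_1 + v_{u.p}\,\rho_2 + v_5 - v_1)$$
$$\frac{\partial F}{\partial \rho_1} = -1 - \ln \rho_1 - \varphi_1 - v_{u.p}\,\varphi_2 = 0$$
$$\frac{\partial F}{\partial \rho_2} = -1 - \ln \rho_2 - \varphi_1 - v_{u.p}\,\varphi_2 = 0$$
$$\frac{\partial F}{\partial \rho_3} = -1 - \ln \rho_3 - \varphi_1 = 0$$
Thus:
$$\rho_1 = \rho_2 = \exp(-1 - \varphi_1 - v_{u.p}\,\varphi_2)$$
$$\rho_3 = \exp(-1 - \varphi_1) = \frac{2 v_5}{v_{u.p}}$$
Substituting into the new constraint:
$$2\,v_{u.p} \exp(-1 - \varphi_1 - v_{u.p}\,\varphi_2) = v_1 - v_5$$
$$\rho_1 = \rho_2 = \exp(-1 - \varphi_1 - v_{u.p}\,\varphi_2) = \frac{v_1 - v_5}{2 v_{u.p}}$$
Checking the results:
$$\rho_1 + \rho_2 + \rho_3 = \frac{v_1 - v_5}{2 v_{u.p}} + \frac{v_1 - v_5}{2 v_{u.p}} + \frac{2 v_5}{v_{u.p}} = \frac{2 v_1 - 2 v_5 + 4 v_5}{2 v_{u.p}} = \frac{2 (v_1 + v_5)}{2 v_{u.p}} = 1$$
$$v_{u.p}\,\rho_1 + v_{u.p}\,\rho_2 + v_5 = \frac{v_1 - v_5}{2} + \frac{v_1 - v_5}{2} + v_5 = v_1$$
The elementary mode coefficients are therefore:
$$\lambda_1 = \frac{v_{u.p}}{2} \cdot \frac{v_1 - v_5}{2 v_{u.p}} = \frac{v_1 - v_5}{4}$$
$$\lambda_2 = \frac{v_{u.p}}{1} \cdot \frac{v_1 - v_5}{2 v_{u.p}} = \frac{v_1 - v_5}{2}$$
$$\lambda_3 = \frac{v_{u.p}}{2}\,\rho_3 = v_5$$
These results for λ also satisfy the constraint e_d·λ = v_d.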
In the original method there is only one uptake reaction, so a single row of the elementary mode matrix defines the uptake. Here, however, two reactions bring flux into the network, so an extra assumption is made, namely that the star equation (∗) holds: the two uptake reactions v1 and v5 are independent, so their sum gives the total uptake flux, and the sum of the corresponding rows of the elementary mode matrix gives the total uptake row.
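The closed-form result of the worked example can be checked numerically. The sketch below evaluates the probabilities and the elementary mode coefficients for one pair of measured fluxes and recomputes v1 and v5 from them. The measurement values and the function name `mep_example` are assumptions chosen for illustration only.

```python
from fractions import Fraction

def mep_example(v1, v5):
    """Closed-form MEP solution of the worked example for measured v1 and v5."""
    v_up = v1 + v5                      # total uptake, the star equation
    rho = [(v1 - v5) / (2 * v_up),      # rho_1
           (v1 - v5) / (2 * v_up),      # rho_2
           2 * v5 / v_up]               # rho_3
    e_up = [2, 1, 2]                    # sum of the rows of v1 and v5
    lam = [v_up / e * p for e, p in zip(e_up, rho)]  # lambda_i = v_up / e_i * rho_i
    return rho, lam

# Hypothetical measurements (the formulas require v1 > v5 > 0).
v1, v5 = Fraction(3), Fraction(1)
rho, lam = mep_example(v1, v5)

# Rows of the elementary mode matrix that belong to v1 and v5.
e_d = [[2, 1, 1], [0, 0, 1]]
v_d = [sum(e * l for e, l in zip(row, lam)) for row in e_d]
```

For v1 = 3 and v5 = 1 this gives ρ = (1/4, 1/4, 1/2) and λ = (1/2, 1, 1), and the recomputed fluxes equal the measured ones.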
2.46 Summary
In this section, two of the most commonly used methods for metabolic network analysis have been introduced. The computation behind them can be very complicated for a large-scale network and is usually carried out by software, such as the COBRA toolbox in MATLAB for FBA, and METATOOL (written in C) and FluxAnalyzer (based on MATLAB) for elementary flux modes.
The methods of metabolic network analysis have made great contributions to biological research, and every method has its advantages and disadvantages. To obtain the desired information, it is important to choose the model that best fits the given observations and the specific demands. A summary of the two methods introduced above is given in the following figure [32].
Figure 2.452. Stoichiometric modeling: flux and pathway analysis.
Figure 2.452 is taken from the paper "Design and analysis of amino acid supplementation in hepatocyte culture using IN VITRO experiment and mathematical modeling" by Hong Yang [32].
3. Numerical results
In this section, the two methods studied above are applied to analyze the network formed by the tricarboxylic acid (TCA) cycle, the glyoxylate shunt and adjacent amino acid reactions [28]. The network defined below is a simplified network that nevertheless incorporates the key central metabolic elements of the citric acid cycle. The tricarboxylic acid cycle, also known as the citric acid cycle, is at the core of metabolism in living cells, especially in aerobic organisms. The TCA cycle appears in the metabolic pathways that break down major food groups such as carbohydrates, lipids and proteins into carbon dioxide and water to generate usable energy. The glyoxylate shunt, also known as the glyoxylate cycle, is an alternative route in the network. This shunt ensures that the system remains functional when only simple carbon compounds are available as a carbon source. It is a two-step bypass of the TCA cycle catalysed by two key enzymes (isocitrate lyase, Icl, and malate synthase, Mas).
This section analyzes how the network behaves with two different uptake sources and how it responds when the proportion of the two sources changes.
Figure 3.1 The tricarboxylic acid cycle, glyoxylate shunt and adjacent reactions.
Abbreviations of metabolites and enzymes are given in the Appendix B.
In the above network, glucose enters at PG and acetate enters at AcCoA. Flux balance analysis and MEP combined with elementary modes analysis are applied to find:
1. How the biomass varies as the uptake flux shifts from PG to AcCoA.
2. The activation of the glyoxylate shunt.
To identify whether the glyoxylate shunt is active or not, there are two important pathways:
1. Bypass 1 of glyoxylate shunt:
acetyl-CoA (AcCoA) + oxaloacetate (OAA) → citrate (Cit) → isocitrate (IsoCit) → glyoxylate (Gly) + acetyl-CoA (AcCoA) → malate (Mal) → oxaloacetate (OAA).
2. Bypass 2 of glyoxylate shunt:
acetyl-CoA (AcCoA) + oxaloacetate (OAA) → citrate (Cit) → isocitrate (IsoCit) → succinate (Succ) → fumarate (Fum) → malate (Mal) → oxaloacetate (OAA).
The two key reactions of the glyoxylate shunt are: isocitrate (IsoCit) → glyoxylate (Gly) + succinate (Succ) and acetyl-CoA (AcCoA) + glyoxylate (Gly) → malate (Mal).
3.1 Flux balance analysis
1. Reconstruct the network.
No.  Enzymes      Reactions
1    Eno          PG↔PEP
2    Ppc          PEP→OAA
3    Pyk          PEP→Pyr
4    Pck          OAA→PEP
5    Pps          Pyr→PEP
6    AceEF        Pyr→AcCoA
7    GltA         OAA+AcCoA→Cit
8    Mdh          OAA↔Mal
9    Acn          Cit↔IsoCit
10   Icl          IsoCit→Gly+Succ
11   Mas          AcCoA+Gly→Mal
12   Fum          Mal↔Fum
13   Sdh          Succ↔Fum
14   SucCD        SucCoA↔Succ
15   SucAB        OG→SucCoA
16   Icd          IsoCit→OG
17   AspC         OAA+Glu↔OG+Asp
18   AspA         Asp→Fum
19   Gdh          OG↔Glu
20   IlvE/AvtA    OG+Ala↔Pyr+Glu
21   Eno          *→PG
22   EX_AcCoA     *→AcCoA
23   AspCon       Asp→AspCon
24   AlaCon       Ala→AlaCon
25   GluCon       Glu→GluCon
26   SucCoACon    SucCoA→SucCoACon
27   Biomass      Asp+Glu+Ala+SucCoA→
Table 3.11 Reactions read from Figure 3.1.
Reactions 21 and 22 represent the two possible uptake reactions.
The original network contains only 25 reactions [28]. In this thesis, however, the goal is to analyze how the biomass and the activation of the glyoxylate shunt vary as the uptake flux shifts from glucose to acetate, so two extra reactions are added to the network: the biomass reaction and the uptake reaction of AcCoA.
2. Mathematically represent metabolic reactions and constraints.
Table 3.12 Stoichiometric matrix constructed from Table 3.11.
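The step from a reaction list to a stoichiometric matrix can be sketched as follows. This is only an illustration under stated assumptions: it uses a handful of reactions copied from Table 3.11, it adopts the convention that products count +1 and substrates −1 in the direction in which the reaction is written, and the function name `build_stoichiometric_matrix` is hypothetical. The matrix actually used in the thesis may orient some reversible reactions differently.

```python
def build_stoichiometric_matrix(reactions, metabolites):
    """Rows are metabolites, columns are reactions; products +1, substrates -1."""
    s = [[0] * len(reactions) for _ in metabolites]
    row = {name: i for i, name in enumerate(metabolites)}
    for j, reaction in enumerate(reactions):
        arrow = "↔" if "↔" in reaction else "→"
        left, right = reaction.split(arrow)
        for side, sign in ((left, -1), (right, +1)):
            for name in side.split("+"):
                name = name.strip()
                if name in row:  # '*' and external species are skipped
                    s[row[name]][j] += sign
    return s

# A few reactions copied from Table 3.11 (reaction numbers in the comments).
reactions = [
    "PG↔PEP",         # 1
    "PEP→OAA",        # 2
    "PEP→Pyr",        # 3
    "OAA+AcCoA→Cit",  # 7
    "*→PG",           # 21
    "*→AcCoA",        # 22
]
metabolites = ["PG", "PEP", "OAA", "Pyr", "AcCoA", "Cit"]

S = build_stoichiometric_matrix(reactions, metabolites)
reversible = ["↔" in r for r in reactions]  # reversibility flag per reaction
```

Each row of `S` is the mass balance of one metabolite, so the steady-state condition used in the next step is that `S` times the flux vector equals zero.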
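3. Define the constraints and objective function
$$\max F = v_{\text{biomass}}$$
such that
$$\sum_{j=1}^{27} S_{ij}\,v_j = 0, \qquad i \in M,$$
$$v_j^{\min} \le v_j \le v_j^{\max}, \qquad j \in N,$$
$$v_j^{\min} =
\begin{cases}
0 & \text{for irreversible reactions,}\\
-\infty & \text{for reversible reactions,}
\end{cases}$$
$$v_j^{\max} =
\begin{cases}
v_{\text{uptake}} & \text{for uptake reactions,}\\
\infty & \text{otherwise.}
\end{cases}$$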
The fluxes are normalized, which means that if all the uptake flux comes from reaction 21, then v_uptake,21 = 1. Because we are interested in how the biomass varies as the uptake flux shifts from glucose to acetate, the final step is to find:
$$v_{\text{biomass}} = x\,\mathbf{PG} + (1 - x)\,\mathbf{AcCoA}, \qquad x = 1, 0.99, 0.98, \dots, 0.01, 0. \tag{1}$$
Since the fluxes are normalized, the above calculation is run once for each of the 101 values of x, and each time the uptake flux shifts by 0.01 from PG to AcCoA.
The result is calculated with the Optimization Toolbox in MATLAB.
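The bookkeeping of this sweep can be sketched as follows: for each value of x, the flux bounds of the linear program are rebuilt with an uptake of x on reaction 21 and 1 − x on reaction 22. The reversibility flags are copied from Appendix D and the bound pattern follows Appendix C; the function name `flux_bounds` is hypothetical, and the linear program itself is not solved here (the thesis does that in MATLAB).

```python
# Reversibility flags for reactions 1..27 (1 = reversible), as in Appendix D.
REVERSIBLE = [1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1,
              0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0]
INF = float("inf")

def flux_bounds(x, glucose_index=20, acetate_index=21):
    """Bounds when a fraction x of the uptake enters via reaction 21 (PG)
    and 1 - x via reaction 22 (AcCoA). Indices are 0-based."""
    lower = [-INF if rev else 0.0 for rev in REVERSIBLE]
    upper = [INF] * len(REVERSIBLE)
    upper[glucose_index] = x
    upper[acetate_index] = 1.0 - x
    return lower, upper

# x = 1, 0.99, ..., 0.01, 0: one linear program per value, 101 in total.
x_values = [round(1 - k / 100, 2) for k in range(101)]
all_bounds = [flux_bounds(x) for x in x_values]
```

For x = 1 the bounds reduce to the `lb` and `ub` vectors listed in Appendix C.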
Results:
x =        1    0.8    0.6    0.4    0.2      0
R1      1.00   0.80   0.60   0.40   0.20   0.00
R2      0.60   0.60   0.42   0.24   0.07   0.00
R3      0.40   0.20   0.18   0.16   0.13   0.11
R4      0.00   0.00   0.00   0.00   0.00   0.11
R5      0.00   0.00   0.00   0.00   0.00   0.00
R6      0.20   0.00   0.00   0.00   0.00   0.00
R7      0.20   0.20   0.29   0.38   0.47   0.56
R8      0.20   0.20  -0.04  -0.29  -0.53  -0.78
R9      0.20   0.20   0.29   0.38   0.47   0.56
R10     0.00   0.00   0.11   0.22   0.33   0.44
R11     0.00   0.00   0.11   0.22   0.33   0.44
R12     0.20   0.20   0.07  -0.07  -0.20  -0.33
R13     0.20   0.20   0.07  -0.07  -0.20  -0.33
R14     0.20   0.20   0.18   0.16   0.13   0.11
R15     0.00   0.00   0.00   0.00   0.00   0.00
R16     0.20   0.20   0.18   0.16   0.13   0.11
R17     0.20   0.20   0.18   0.16   0.13   0.11
R18     0.00   0.00   0.00   0.00   0.00   0.00
R19     0.60   0.60   0.53   0.47   0.40   0.33
R20    -0.20  -0.20  -0.18  -0.16  -0.13  -0.11
R21     1.00   0.80   0.60   0.40   0.20   0.00
R22     0.00   0.20   0.40   0.60   0.80   1.00
R23     0.00   0.00   0.00   0.00   0.00   0.00
R24     0.00   0.00   0.00   0.00   0.00   0.00
R25     0.00   0.00   0.00   0.00   0.00   0.00
R26     0.00   0.00   0.00   0.00   0.00   0.00
R27     0.20   0.20   0.18   0.16   0.13   0.11
Table 3.13 Some of the results from MATLAB.
The MATLAB code for FBA is given in Appendix C. The full flux table is given in the supplementary file.
Plotting equation (1) shows that the biomass generally decreases as the uptake flux shifts from glucose to acetate. When glucose is the only uptake, the flux of the end products amounts to 20% of the total uptake flux. Once the acetate uptake exceeds 20% of the total uptake flux, the biomass starts to decrease smoothly. Finally, when only acetate flows into the network, the biomass reaches its lowest point, which is only approximately half of the amount obtained when glucose is the only uptake.
[Figure: plot titled "Biomass = xPG + (1-x)AcCoA"; biomass (vertical axis) against x (horizontal axis, from 1 to 0); series: FBA.]
Figure 3.11. Biomass plot as uptake fluxes shift from PG to AcCoA by FBA.
TCA cycle:
[Figure: flux (vertical axis) against x (horizontal axis, from 1 to 0); series: OAA→PEP, PEP→OAA, PEP→Pyr, Pyr→AcCoA.]
Figure 3.12. Fluxes of the reactions that enter the TCA cycle and produce AcCoA from glucose.
Figure 3.12 shows how the fluxes from glucose flow into the TCA cycle and produce AcCoA. When all the uptake flux is glucose, 60% of the flux from glucose flows to the TCA cycle and the rest flows to the reactions that produce AcCoA. The flux of the pathway that produces AcCoA decreases as the glucose uptake decreases. When the glucose uptake drops below 80% of the total uptake flux, no more AcCoA is produced by this pathway, and the flux into the TCA cycle starts to drop; it vanishes as soon as the glucose uptake drops below 12%.
Glyoxylate shunt:
Figure 3.13. Two key reactions for the activation of the glyoxylate shunt.
Figure 3.13 shows how the glyoxylate shunt is activated in this network for the given uptake flux patterns. In contrast to the TCA cycle, the glyoxylate shunt is inactive when there is not enough acetate flowing into AcCoA. Once the glucose uptake drops below 80%, in other words once the flux entering the network through AcCoA exceeds 20% of the total uptake flux, the glyoxylate shunt becomes active.
Summary.
The above figures show that, according to flux balance analysis, the system gives maximum biomass when at least 80% of the uptake flux is glucose. Once the percentage drops below this point, the biomass starts to decrease, while the glyoxylate shunt becomes active. The analysis also suggests another turning point, where the uptake flux is distributed as 12% glucose and 88% acetate. At this point no more flux flows into the TCA cycle from PG, as shown in Figure 3.12, so the entire system relies on the glyoxylate shunt; for this distribution, however, the biomass is only approximately half of the amount obtained when the flux is distributed as 80% glucose and 20% acetate.
3.2 Elementary flux modes analysis with Shannon's MEP.
1. Determine the elementary flux modes.
Using the "efmtool" package [50] with the stoichiometric matrix defined above, 16 elementary flux modes were found for the original network and 42 for the adjusted network, in which two extra reactions were added. The code and the elementary flux modes matrix are given in Appendix D.
          E1  E2  E3  E4  E5  E6  E7  E8  E9 E10 E11 E12 E13 E14 E15 E16
R1         0   2   3   1   3   2   3   1   1   2   1   2   1   0   0   1
R2         0   0   0   1   1   1   0   1   0   1   1   0   0   0   1   0
R3         1   2   3   0   2   1   3   0   2   1   0   2   1   0   0   1
R4         0   0   0   0   0   0   0   0   1   0   0   0   0   0   1   0
R5         1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
R6         0   2   3   0   2   1   3   0   2   1   0   2   0   0   0   1
R7         0   1   2   0   1   1   2   0   1   1   0   1   0   0   0   1
R8         0  -2  -2   1   0   0  -2   0  -2   0   0  -1   0  -1   0  -1
R9         0   1   2   0   1   1   2   0   1   1   0   1   0   0   0   1
R10        0   1   1   0   1   0   1   0   1   0   0   1   0   0   0   0
R11        0   1   1   0   1   0   1   0   1   0   0   1   0   0   0   0
R12        0  -1  -1   1   1   0  -1   0  -1   0   0   0   0  -1   0  -1
R13        0   1   1  -1  -1   0   1   0   1   0  -1   0   0   0   0   1
R14        0   0   0  -1  -2   0   0   0   0   0  -1  -1   0   0   0   1
R15        0   0   1   0   0   1   0   0   0   0   0   0   0   0   0   1
R16        0   0   1   0   0   1   1   0   0   1   0   0   0   0   0   1
R17        0   1   0   0   0   0   0   1   0   0   1   0   0   1   0   0
R18        0   0   0   0   0   0   0   0   0   0   1   0   0   1   0   0
R19        0   1   0   0   0   0   1   1   0   1   1   0   1   1   0   0
R20        0   0   0   0   0   0   0   0   0   0   0   0  -1   0   0   0
R21        0   2   3   1   3   2   3   1   1   2   1   2   1   0   0   1
R22        0   1   0   0   0   0   0   1   0   0   0   0   0   0   0   0
R23        0   0   0   0   0   0   0   0   0   0   0   0   1   0   0   0
R24        0   0   0   0   0   0   1   0   0   1   0   0   0   0   0   0
R25        0   0   1   1   2   1   0   0   0   0   1   1   0   0   0   0
No. [28]   2   7  16  10  11  13  14   3   6  12   8   9   4   5   1  15
Table 3.21. 16 elementary flux modes for original network.
Table 3.21 shows the 16 elementary flux modes for the original network; the last line gives the corresponding mode numbers in the paper [28]. The matrix of 42 elementary flux modes is used in this section.
Quanyu Zhao and Hiroyuki Kurata coded the algorithm of elementary flux modes analysis with Shannon's MEP method, as well as the method combined with Lagrange multipliers; the packages were given in the supplementary files of their original papers [42, 45, 51, 52]. This package is adapted in this thesis. The given metabolic network is relatively small, containing only 42 elementary modes, so the simple method is used. In Quanyu Zhao and Hiroyuki Kurata's papers [42, 45], the metabolic networks analyzed contain only one uptake reaction, and the same holds for the code. To perform the analysis on the TCA cycle and glyoxylate shunt network, which contains two uptake reactions, the code was adjusted. The new code is given in Appendix E.
Results:
[Figure: plot titled "Biomass"; flux (vertical axis) against x (horizontal axis, from 1 to 0).]
Figure 3.21. The Biomass plot as uptake fluxes shift from PG to AcCoA by MEP.
Figure 3.21 shows how the biomass changes as the uptake flux shifts from glucose to acetate. When glucose is the only uptake, the flux of the end products equals 18% of the total uptake flux. Once the acetate uptake exceeds 36% of the total uptake flux, the biomass starts to decrease smoothly. Finally, when only acetate flows into the network, the biomass reaches its lowest point, which is only approximately half of the amount obtained when glucose is the only uptake.
TCA cycle:
Figure 3.22. Fluxes of the reversible reactions that enter the TCA cycle and produce AcCoA from glucose by MEP.
Figure 3.22 shows how the fluxes from glucose flow through the TCA cycle and produce AcCoA. When all the uptake flux is glucose, 47.8% of the flux from glucose flows to the TCA cycle and 56.9% flows to the reactions that produce AcCoA. The flux of the pathway that produces AcCoA decreases as the glucose uptake decreases. When the glucose uptake drops below 64% of the total uptake flux, no more AcCoA is produced by this pathway, and the flux into the TCA cycle starts to drop, nearly reaching zero when all the uptake flux comes from AcCoA. The vertical line marks the turning point where the uptake flux is distributed as 64% glucose and 36% acetate.
Glyoxylate shunt:
[Figure: two panels, "AcCoA+Gly→Mal" and "IsoCit→Gly+Succ"; flux (vertical axis) against x (horizontal axis, from 1 to 0.01).]
Figure 3.23. Two key reactions for the activation of the glyoxylate shunt by MEP.
Figure 3.23 shows how the glyoxylate shunt is activated in this network for the given uptake flux patterns. The MEP method shows that the glyoxylate shunt is active even when 100% of the uptake flux comes from PG, although only a very small flux, approximately 9% of the total flux, appears in these reactions. Once the glucose uptake drops below 64%, in other words once the flux entering the network through AcCoA exceeds 36% of the total uptake flux, the glyoxylate shunt becomes more active.
Summary.
The above figures show that, according to the MEP method, the system gives maximum biomass when at least 64% of the uptake flux is glucose; once the percentage drops below this point, the biomass starts to decrease. The glyoxylate shunt is active however the uptake flux is distributed. Moreover, there is always flux from PEP into oxaloacetate (OAA), which indicates that the TCA cycle is active. So the TCA cycle and the glyoxylate shunt always work together.
3.3 Comparison of two methods.
Both methods identify a turning point: for FBA the ratio of glucose to acetate is 8:2, and for MEP the ratio is 6.4:3.6. Once the acetate uptake exceeds this point, the fluxes estimated by the two methods show a similar trend for most reactions. All comparison figures are given in Appendix F. The comparisons of the biomass estimation and of the activation of the glyoxylate shunt are given below.
[Figure: plot titled "FBA vs MEP"; biomass (vertical axis) against x (horizontal axis, from 1 to 0); series: FBA, MEP.]
Figure 3.31. The predicted biomass results of FBA and MEP.
Figure 3.31 shows the biomass predicted by FBA and MEP. Although the two methods suggest different maximum and minimum values of the biomass and different turning points, both predict a decreasing trend once the ratio of the two uptake fluxes reaches a certain point. The relative decrease is also very similar: the ratio of minimum to maximum biomass is 0.1111/0.2 = 0.5555 for FBA and 0.1/0.1818 = 0.5501 for MEP.
[Figure: two panels, "AcCoA+Gly→Mal" and "IsoCit→Gly+Succ"; flux (vertical axis) against x (horizontal axis, from 1 to 0); series: FBA, MEP.]
Figure 3.32. The fluxes of the two key reactions for the activation of the glyoxylate shunt predicted by the two methods.
The major difference between FBA and MEP concerns the glyoxylate shunt. Both methods suggest that increasing acetate uptake promotes the activation of the glyoxylate shunt, but FBA predicts no glyoxylate shunt activation when there is not enough acetate flowing into AcCoA, whereas MEP suggests that the glyoxylate shunt is always active and becomes more active as more acetate flows into the network. Moreover, FBA suggests that once the glucose uptake drops below 12% of the total uptake, no flux flows into the TCA cycle from phosphoenolpyruvate (PEP): as Figure 3.12 shows, the flux of the reaction PEP→OAA is zero after that point, so the system relies entirely on the glyoxylate shunt. The MEP method, on the other hand, suggests that the TCA cycle and the glyoxylate shunt are both active at all times.
[Figure: plot titled "IsoCit→OG"; flux (vertical axis) against x (horizontal axis, from 1 to 0); series: MEP, FBA.]
Figure 3.33. The flux of IsoCit→OG estimated by FBA and MEP.
[Figure: plot titled "Pyr→PEP"; flux (vertical axis) against x (horizontal axis, from 1 to 0); series: MEP, FBA.]
Figure 3.34. The flux of Pyr→PEP estimated by FBA and MEP.
Figure 3.33 shows another difference in the trends of the fluxes estimated by FBA and MEP: where FBA predicts a decrease, MEP suggests a slight increase. Figure 3.34 shows that MEP predicts a small flux through the reaction Pyr→PEP where FBA estimates none; this small flux ensures that OAA can always be generated from PEP, even when no flux flows from PG to PEP.
It is usually believed that the reaction IsoCit→OG is responsible for the switch between the TCA cycle and the glyoxylate shunt [28]. The synthesis of glutamate (Glu) requires both 2-oxoglutarate (OG) and oxaloacetate (OAA) (either from PEP or regenerated via glyoxylate), so it is sensible to assume that IsoCit→OG is always active, or even increases, to ensure that flux exists in the Pyr→PEP reaction as the flux from PEP to OAA decreases with the decreasing glucose uptake.
4. Conclusion
In this thesis, flux balance analysis and elementary flux modes analysis with Shannon's MEP were studied and applied to the analysis of a real metabolic network. Both methods predict a similar trend in the flux distribution, but give two very different results on the activation of the glyoxylate shunt.
Flux balance analysis is based on linear programming with stoichiometric constraints. It is easy to conduct, since complex biochemical and physical information such as kinetic parameters is not required, and it gives a reasonable estimate of the flux distribution; hence FBA is one of the most widely used methods in metabolic network flux analysis. However, the simple objective function and constraints assume that all metabolites are processed at their maximum rate, which may not always be true. For further studies, a major goal is therefore to construct the model with an objective function and constraints that are more closely related to the biochemical background.
Elementary flux modes analysis performs better in pathway and network structure analysis. When combined with Shannon's MEP, it uses experimental data as constraints together with an objective function that is more strongly related to the biochemistry, and it revealed some interesting behaviors of the metabolic network. However, this method has so far only been shown to work well with one uptake reaction. In this thesis the two uptake reactions were simply added together to form the total uptake reaction, which is too naive. This may be sensible when the two uptakes are totally independent, as in the example given in section 2.45. In the TCA cycle and glyoxylate shunt network, however, it is possible that PG and AcCoA are regenerated from one another, which could affect the estimation. A more mathematically or biochemically grounded procedure for obtaining the total uptake reaction from several sub-uptake reactions could therefore help to improve this method further.
Reference
[1] Fiehn O, Metabolomics—the link between genotypes and phenotypes. Plant
Molecular Biology. 2002.
[2] Ma H, Zeng A P: Reconstruction of metabolic networks from genome data and analysis of their global structure for various organisms. Bioinformatics 2003.
[3] Patil KR, Nielsen J: Uncovering transcriptional regulation of metabolism by using
metabolic network topology. Proc Natl Acad Sci 2005.
[4] Watts DJ, Strogatz SH: Collective dynamics of 'small-world' networks. Nature 1998.
[5] Wagner A, Fell D A. The small world inside large metabolic networks.
Proceedings Biological Sciences 2001.
[6] Ma H W and Zeng A P, Reconstruction of metabolic networks from genome data
and analysis of their global structure for various organisms, Bioinformatics, 2003.
[7] Salas A, Yao Y G, Macaulay V, et al. A critical reassessment of the role of
mitochondria in tumorigenesis. PloS Med 2005.
[8] Jiang D. Network-based Metabolic Flux and Structure Analysis. 2006.
[9] Peter van Nes. Metabolic network modeling 2006.
[10] Jeffrey D Orth, Ines Thiele & Bernhard Palsson. What is flux balance analysis.
Primer Computational Biology.
[11] Qian H, Beard D A. Thermodynamics of stoichiometric biochemical networks in
living systems far from equilibrium. Biophysical Chemistry, 2005.
[12] Roded Sharan. Analysis of Biological Networks: Constraint-Based Modeling of
Metabolic Networks. 2006
[13] German J B, Hammock B D, Watkins S M. Metabolomics: building on a century of biochemistry to guide human health. Metabolomics, 2005.
[14] Kenneth J Kauffman, Purusharth Prakash and Jeremy S Edwards. Advances in flux balance analysis. 2003.
[15] Tomer Benyamini, Ori Folger, Eytan Ruppin and Tomer Shlomi. Flux balance
analysis accounting for metabolite dilution.
[16] Kanehisa, M., Goto, S., Furumichi, M., Tanabe, M., and Hirakawa, M.; KEGG
for representation and analysis of molecular networks involving diseases and drugs.
Nucleic Acids Res. 38, D355-D360 (2010).
[17] Covert M W, Schilling C H, Palsson B. Regulation of gene expression in flux
balance models of metabolism. Journal of Theoretical Biology, 2001.
[18] Lee J M, Gianchandani E P, Papin J A. Flux balance analysis in the era of
metabolomics. Bioinformatics, 2006.
[19] Beard D A, Liang S D, Qian H. Energy balance for analysis of complex
metabolic networks. Biophysical Journal, 2002.
[20] Segre D, Vitkup D, Church G M. Analysis of optimality in natural and perturbed metabolic networks. Proc Natl Acad Sci USA, 2002.
[21] Shlomi T, Berkman O, Ruppin E. Regulatory on/off minimization of metabolic flux changes after genetic perturbations. Proc Natl Acad Sci USA, 2005.
[22] Burgard A P, Pharkya P, Maranas C D. Optknock: a bilevel programming
framework for identifying gene knockout strategies for microbial strain optimization.
Biotechnology and Bioengineering, 2003.
[23] R. Mahadevan, J.S. Edwards, F.J. Doyle III. Dynamic flux balance analysis for
metabolic modeling.
[24] Ralf Steuer and Bjorn H. Junker. Computational Models of Metabolism: Stability
and Regulation in Metabolic Networks.
[25] S.H Zhou. Algorithm Research and Improve in Optimize design.
[26] F.X Wang. Linear programming. 2002.
[27] G.R. Walsh. An introduction to linear programming.
[28] Stefan Schuster, Thomas Dandekar and David A. Fell. Detection of elementary flux modes in biochemical networks: a promising tool for pathway analysis and metabolic engineering.
[29] Cong T. Trinh, Aaron Wlaschin, Friedrich Srienc. Elementary Mode Analysis: A Useful Metabolic Pathway Analysis Tool for Characterizing Cellular Metabolism.
[30] Klamt S, Stelling J. Combinatorial complexity of pathway analysis in metabolic
networks. Molecular Biology Reports, 2002.
[31] Julien Gagneur, Steffen Klamt. Computation of elementary modes: a unifying
framework and the new binary approach.
[32] Hong Yang, Design and analysis of amino acid supplementation in hepatocyte
culture using IN VITRO experiment and mathematical modeling.
[33] Steffen Klamt, Jorg Stelling. Two approaches for metabolic pathway analysis.
[34] Stefan Schuster, Thomas Dandekar, David Fell. Description of the algorithm for
computing elementary flux modes.
[35] S. Schuster, C. Hilgetag, J.H. Woods, D.A. Fell. Reaction routes in biochemical reaction systems: Algebraic properties, validated calculation procedure and example from nucleotide metabolism. Mathematical Biology, 2002.
[36] M.G. Poolman, K.V. Venkatesh, M.K. Pidcock, D.A. Fell. A method for the determination of flux in elementary modes, and its application to Lactobacillus rhamnosus.
[37] Neil Muller, Lourenco Magaia, B.M. Herbst. Singular Value Decomposition,
Eigenfaces, and 3D reconstructions.
[38] David Clark. A note on the pseudo-inverse.
[39] Zhihua Zhang. Lecture Notes 4: Examples of the SVD.
[40] Iman Famili, Bernhard O. Palsson. MBW: Singular value decomposition of
stoichiometric matrices.
[41] Cong T. Trinh Aaron Wlaschin Friedrich Srienc. Elementary mode analysis: a
useful metabolic pathway analysis tool for characterizing cellular metabolism.
[42] Quanyu Zhao, Hiroyuki Kurata. Maximum entropy decomposition of flux
distribution at steady state to elementary modes.
[43] Hiroyuki Kurata, Quanyu Zhao, Ryuichi Okuda, Kazuyuki Shimizu. Integration
of enzyme activities into metabolic flux distributions by elementary mode analysis.
[44] Bo Liu. Entropy and laser.
[45] Quanyu Zhao, Hiroyuki Kurata. Use of maximum entropy principle with Lagrange multipliers extends the feasibility of elementary mode analysis.
[46] Dan Klein. Lagrange Multipliers without Permanent Scarring.
[47] Com S 477/577 Lagrange Multipliers 2008, http://www.cs.iastate.edu/~cs577/handouts/lagrange-multiplier.pdf
[48] Xiaojun Chen. First order conditions for nonsmooth discretized constrained
optimal control problems.
[49] http://bio.freelogy.org/wiki/User:JeremyZucker
[50] “efmtool” package, Computational systems biology,
http://www.csb.ethz.ch/tools/efmtool
[51] Quanyu Zhao, Hiroyuki Kurata. “mep1” package,
http://www.cadlive.jp/MetabolicEngineering/MEP/ManualMEP.htm
[52] Quanyu Zhao, Hiroyuki Kurata. Use of maximum entropy principle extends the
feasibility of elementary mode analysis.
http://www.cadlive.jp/MetabolicEngineering/MEPLM/InstructionMEPLM.htm
Appendix
Appendix A. Calculating the elementary flux modes matrix for the example metabolic network with the "efmtool" package.
The "efmtool" package is an EFM matrix calculation toolbox for MATLAB, which can be downloaded from http://www.csb.ethz.ch/tools/efmtool.
First, download and install the package, then add the installation folder to the MATLAB path.
Second, enter the stoichiometric matrix into an Excel or text file.
Then import the data into MATLAB.
Finally, run the following code:
>> stru.stoich=[data]; % read in the stoichiometric matrix
>> % define the reversibility of each reaction: 1 for reversible, 0 for irreversible
>> stru.reversibilities=[0,0,0,0,0,0,0,0,0];
>> mnet=CalculateFluxModes(stru)
The outcome:
2 1 1
2 1 0
0 0 1
0 1 0
0 1 0
1 0 0
1 0 1
1 2 1
1 0 0
This agrees with the hand-calculated result, up to the ordering of the modes.
Appendix B. Abbreviations of metabolites and enzymes for TCA cycle, glyoxylate
shunt and adjacent reactions network.
Abbreviations of metabolites:
AcCoA -- acetyl-CoA;
Ala -- alanine;
Asp -- aspartate;
Cit -- citrate;
Fum -- fumarate;
Glu -- glutamate;
Gly -- glyoxylate;
IsoCit -- isocitrate;
Mal -- malate;
OAA -- oxaloacetate;
OG -- 2-oxoglutarate;
PEP -- phosphoenolpyruvate;
PG -- 2-phosphoglycerate;
Pyr -- pyruvate;
Succ -- succinate;
SucCoA -- succinyl-CoA.
Abbreviations of enzymes:
AceEF -- pyruvate dehydrogenase;
Acn -- aconitase;
AspA -- aspartase;
AspC -- aspartate aminotransferase;
Eno -- enolase;
Fum -- fumarase;
Gdh -- glutamate dehydrogenase;
GltA -- citrate synthase;
Icd -- isocitrate dehydrogenase (in E. coli with cofactors NADP/NADPH);
Icl -- isocitrate lyase;
IlvE/AvtA -- branched-chain amino acid aminotransferase/valine-pyruvate aminotransferase;
Mas -- malate synthase;
Mdh -- malate dehydrogenase;
Pck -- PEP carboxykinase (in E. coli with cofactors ADP/ATP);
Ppc -- PEP carboxylase;
Pps -- PEP synthetase;
Pyk -- pyruvate kinase;
Sdh -- succinate dehydrogenase;
SucAB -- 2-oxoglutarate dehydrogenase;
SucCD -- succinyl-CoA synthetase (in E. coli with cofactors ADP/ATP).
AlaCon, AspCon, GluCon and SucCoACon denote the consumption of alanine, aspartate,
glutamate and succinyl-CoA, respectively.
Appendix C. MATLAB code for FBA to analyze the TCA cycle and adjacent
reactions network.
First import the stoichiometric matrix shown in Table 3.12 into MATLAB.
Then run the following code:
>> f=[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,-1];
>> b=zeros(16,1);
>> lb=[-Inf,0,0,0,0,0,0,-Inf,-Inf,0,0,-Inf,-Inf,-Inf,0,0,-Inf,0,-Inf,-Inf,0,0,0,0,0,0,0];
>> ub=[Inf,Inf,Inf,Inf,Inf,Inf,Inf,Inf,Inf,Inf,Inf,Inf,Inf,Inf,Inf,Inf,Inf,Inf,Inf,Inf,1,0,Inf,Inf,Inf,Inf,Inf];
>> simlp(f,data,b,lb,ub,[],16)
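Read as an optimisation problem, the call above appears to encode the following linear program. This reading assumes that `data` is the 16-by-27 stoichiometric matrix S of Table 3.12 and that the final argument 16 marks all 16 rows as equality constraints; the latter is an assumption about the simlp argument list, not something taken from its documentation.

```latex
\begin{aligned}
\min_{v \in \mathbb{R}^{27}} \quad & f^{\mathsf{T}} v = -v_{27} \\
\text{subject to} \quad & S v = 0, \\
& lb_j \le v_j \le ub_j, \qquad j = 1, \dots, 27 .
\end{aligned}
```

Minimising -v_27 is equivalent to maximising the flux through reaction 27. Reactions with lb_j = -Inf are the reversible ones, while the bounds on reactions 21 and 22 restrict v_21 to [0, 1] and force v_22 = 0. With the Optimization Toolbox, the same problem could presumably be posed as linprog(f,[],[],data,b,lb,ub).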
Appendix D. Code for calculating the elementary flux modes matrix for the TCA
cycle and glyoxylate shunt network, and the resulting elementary flux modes matrix.
First, import the stoichiometric matrix shown in Table 3.12 into MATLAB.
Then run the following code:
>> stru.stoich=[data]; % read in the stoichiometric matrix
>> % define the reversibility of each reaction: 1 for reversible, 0 for irreversible
>> stru.reversibilities=[1,0,0,0,0,0,0,1,1,0,0,1,1,1,0,0,1,0,1,1,0,0,0,0,0,0,0];
>> mnet=CalculateFluxModes(stru)
The elementary flux modes matrix:
Appendix E. “mep1” package code for elementary flux modes analysis with
Shannon's MEP method.
This program was originally written by Quanyu Zhao and Hiroyuki Kurata and
is given in the supplementary file of their original paper.
First, download the package from the website:
http://www.cadlive.jp/MetabolicEngineering/MEP/ManualMEP.htm
Install the package and add its folder to the MATLAB path.
Then modify the code as follows so that it can handle two uptake reactions:
function example
clear all
global uptakeflux uptake
disp('Please input the name of data file for the optimization')
disp('Make sure the data format is right')
alldata=input('The file name is: ','s')
datas=textread(alldata);
nr=datas(1,1);% the number of reactions
nd=datas(1,2);% the number of determined fluxes for optimization
uptakeflux=datas(1,3);% the flux for uptake reaction
%input the matrix A
for i=2:1:(nd+1)
emmatrix(i-1,:)=datas(i,:);
end
[a1,a2]=size(emmatrix);
%input the matrix b
for i=1:1:nd
fluxv(1,i)=datas(nd+2,i);
end
%input the vector for uptake reaction in elementary mode matrix
uptake1=datas(nd+3,:);
uptake2=datas(nd+4,:);
uptake=uptake1+uptake2;
%input elementary mode matrix
for i=(nd+4):1:(nd+3+nr)
emall(i-nd-3,:)=datas(i,:);
end
x0=uptakeflux/a2*ones(a2,1);
lb=1e-9*ones(a2,1);
hb=100*ones(1,a2);
options=optimset('LargeScale','off');
[x,fval]=fmincon(@func,x0,[],[],emmatrix,fluxv,lb,hb,[],options)
fluxdistribution=emall*x;
disp('Flux distribution is: ')
disp(fluxdistribution)
save('outputEMC.dat','x','-ascii');
save('outputflux.dat','fluxdistribution','-ascii');
disp('Optimization is successful. You can close it now.')
pause
function f=func(y)
global uptakeflux uptake
f=0;
[u1,u2]=size(uptake);
for i=1:u2
if uptake(1,i)~=0.0
f=f-uptake(1,i)*y(i)/uptakeflux*log(uptake(1,i)/uptakeflux*y(i));
end
disp(f)
end
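The objective function func accumulates the Shannon entropy H(y) = -sum_i p_i ln(p_i), with p_i = uptake(1,i)*y(i)/uptakeflux, skipping modes whose uptake coefficient is zero. The following sketch is a vectorised restatement of that loop, evaluated on hypothetical numbers; the values of uptake, uptakeflux and y below are made up for illustration and are not taken from the dissertation's data.

```matlab
% Vectorised restatement of the entropy sum in func (illustrative sketch only).
uptake     = [1 1 0 2];               % hypothetical: mode 3 does not use the uptake reaction
uptakeflux = 1;                       % hypothetical total uptake flux
y          = [0.25; 0.25; 0.7; 0.25]; % hypothetical elementary mode coefficients

used = (uptake ~= 0);                 % modes that take up substrate

u  = uptake(used);  u  = u(:);        % uptake coefficients of those modes
yy = y(used);       yy = yy(:);       % matching mode coefficients

p = u .* yy / uptakeflux;             % share of the uptake flux carried by each mode
H = -sum(p .* log(p));                % same quantity that func accumulates in f

fprintf('sum(p) = %g, H = %.4f\n', sum(p), H);
```

For these numbers the shares are 0.25, 0.25 and 0.5, so H = 1.5 ln 2, approximately 1.0397. Note that fmincon minimises its objective, so whether func should return H or -H depends on whether the entropy is to be minimised or maximised; the sign convention in the listing is worth checking against the original package.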
To run the program, an input text file is needed that contains all the information
about the metabolic network, such as the number of reactions, the number of
determined reactions and the elementary flux modes matrix. Full details of how to
construct the input file are given in the supplementary file, which can be found on
the above website.
Then type “mep1” in the MATLAB command window and enter the text file name
to run the program.
Appendix F. Comparison table of FBA results and MEP results.
[Figures: FBA vs MEP. Each panel plots the flux (vertical axis) against x
(horizontal axis, from 0 to 1), with one curve for the FBA result and one for the
MEP result. Panels: Biomass flux; PEP→Pyr; PEP→OAA; Pyr→AcCoA; OAA+AcCoA→Cit;
Cit↔IsoCit; OAA↔Mal; Pyr→PEP; OAA→PEP; AcCoA+Gly→Mal; Succ↔Fum; Mal↔Fum;
OG→SucCoA; SucCoA↔Succ; IsoCit→Gly+Succ; OG↔Glu; IsoCit→OG; OAA+Glu↔OG+Asp;
OG+Ala↔Pyr+Glu.]