Model Fitting Tools for BioSPICE and the Systems Biology Workbench

advertisement
Model Fitting Tools for BioSPICE and the Systems
Biology Workbench
Vijay S. Chickarmane1∗, Cameron Wellock1 ,Herbert M. Sauro1,2
1
Keck Graduate Institute, 535 Watson Drive, Claremont, CA 91711, USA
2
Control and Dynamical Systems 107-81,
California Institute of Technology, CA 91125, USA
May 19, 2004
∗
Address for Correspondence: Vijay S Chickarmane, Keck Graduate Institute, 535 Watson Drive, Claremont, CA 91711, USA, Phone: (909) 607 0101 Fax: (909) 607 8086 e-mail: vchickar@kgi.edu
1
Abstract
As a contribution to the BioSPICE project, we have developed an optimization module, for the task of fitting kinetic rate constants to time series concentration data. The
algorithms use, both local searches, such as the Levenberg-Marquardt, and simplex, as well
as global search methods such as simulated annealing and real coded genetic algorithms.
We provide a description of the modules in terms of the algorithms used, and the software components. We describe a few test cases which serve the purpose of demonstrating
the software. We comment on determination of confidence limits, and issues related to
observability of the fitted parameters.
Keywords:
Optimization, Systems Biology, Software, Model Fitting
2
1
Introduction
The development of computational models of biochemical networks is becoming an important part of
modern research in both the commercial and academic settings. The task of building a model is at
minimum a two stage process. First the network of reactions and interactions must be known, such
information can be gleaned from traditional sources or by more recent genomic approaches (Papin et
al 2003; Covert et al 2001). Secondly once the network is described, rates laws and associated rate
constants must be assigned. This process is arguably the most difficult, since not only must a decision
be made on the type of rate law for each reaction, but values for all the rate constants must also be
determined. The later problem is the subject of this paper.
The traditional approach to parameter determination is to fit a model to a set of time-series data. Such
data will describe the time course of concentrations of molecular species in the network (Voit 2000).
However such data is also inherently noisy, usually as a result of instrumentation or experimental error.
The problem of estimating parameter values is not easy and depends on availability of suitable data, the
level of noise in the data and of course the quality of the underling network model. Indeed, parameter
estimation may also be used as a means to compare different network models to determine which is
most likely.
Several existing software applications already exist that can be used to fit time-series concentration data
to network models (Kuzmic 1991, Mendes 1993, Goryanin et. al. 1999, Yoshimura, 2003). However,
they are either closed source, commercially related or web based. We wanted to create an open source
freely available toolkit for parameter estimation. In addition we wanted the software to fit into the
extensible computational platforms, BioSPICE and SBW (Sauro et. al. 2003). Both projects aim to
provide a computational platform for the simulation/and analysis of biochemical systems.
Our aim was to permit users to choose between different optimization methods, namely, local searches,
which can be used when the user has a good idea about the initial conditions, and global searches, when
no such initial guess can be made. In addition, we wanted built-in support for estimating parameter
uncertainty and observability, a feature often missing form optimizations packages. The paper is arranged
as follows: In Section 2, we discuss the basic methodology of the optimization techniques and the
individual algorithms used. Although these methods will be familiar to a lot of readers, we hope to
make this discussion self-contained. Parameter estimation is discussed in Section 3, where our methods
use both simulated data as well as real data. We also describe Monte Carlo simulations for the purposes
of determining confidence intervals and parameter observability. In Section 4 we discuss the software
components, which were used in building the optimization toolkit, and provide some screen shots of the
user interface and implementation details. We summarize our conclusions in Section 5.
2
Optimization Techniques
Assuming we are presented with noisy time-series data on the concentration of molecular species in a
network, our goal is to fit the data to a given model such that the fit with the data and best prediction
of the model is minimized. Hence we have to find the set of parameters, so as to increase the likelihood
of observing the given data. If we assume that the residual, i.e. the difference between the model
and the data, is Gaussian distributed, then this naturally leads to a maximum likelihood analysis of the
parameter estimation problem, which implies that, the likelihood is maximized if the sums of squares of
the error between the model and the data is minimized (Press et al 1992; see for example Ewens and
3
Grant 2001). The sums of squares is represented by1 ,
(θ) =
N M
Wk (Si k − Si k (θ))2 ,
(1)
i=1 k=1
where Si , Si (θ), θi , i = 1 : p, are the measured sampled concentration data, the concentration as
predicted by the model, and the parameter set, respectively, and where the sum is over the number
of samples i, and the number of species Sk . The above equation also contains a weighing factor W ,
which is generally chosen to normalise the above sum, i.e to make the contributions from the various
2
species be in the same range. In our case we have chosen, W = ( S 1 k ) . The model for the network
max
is in general described by a set of differential equations,
dS
= f (S, θ),
dt
(2)
where fi are generally nonlinear. The optimization task involves making an initial guess for θ, integrating
the above equations to obtain the solution to Si for all the species, obtain (Equ. 1), finding a rule
by which the next best guess can be obtained, so that the sum is minimized, and finally generate a
better estimate to θi (Mendes and Kell 1998). The rules are the optimization methods we will discuss
below. This process of changing the parameter, simulation of the model, and iterating these steps, is
computationally very expensive, however given the computational resources that are available nowadays,
this is not a major obstacle. Bio-chemical networks can vary in size, with hundreds of parameters, and
hence efficient methods must be developed for parameter estimation. We now describe briefly each
method that we have implemented in software. The methods fall into local and global search categories.
There are also deterministic searches, but here we will consider only stochastic searches (Moles et al
2003). It has also been argued that the simplex algorithm is semi-global, but here we categorise it as
local.
2.1
2.1.1
Local Searches
Levenberg-Marquardt(LM)
If we imagine the error surface () as a function of the parameters, then the LM method is a weighted
mixture between two types of searches (Marquardt 1963; Press et al 1992). The first is a gradient search,
and the other is the approximation of the error surface as a quadratic function of the parameters. In
gradient descent, the approach is to rapidly descend down the error surface in a direction opposite to
the local gradient, i.e. the direction of maximum change. The step size is set to be constant. For the
(k + 1) iteration, θi is changed according to,
θi k+1 = θi k − µd,
(3)
∂
where the gradient d = ∂θ
, and µ is the step size. If we approximate the error surface as a quadratic
function of the parameters, we can Taylor expand about θ0 ,
= 0 + (θ − θ 0 )d + (θ − θ 0 )H(θ − θ 0 )T ,
(4)
2
where the Hessian, Hij = 12 ∂θ∂ i θj , describes the curvature of the surface. The minima of the surface is
now found by differentiating the above expression with respect to θ, and setting the result to zero. We
therefore obtain the parameter value, in a single step, as,
θi k+1 = θi k − H−1 d.
1
(5)
Henceforth bold font lower case, e.g x represents a column vector, and bold font upper case, eg. X
represents a matrix
4
The basic approach for the LM method is to merge gradient descent with the quadratic approximation,
such that when the error surface is very steep, the gradient descent is chosen, and when the surface can
indeed be approximated by a quadratic, the above more exact method (i.e Eqn 5), can be used, see Fig
1.
Far from the Minimum
use Gradient Search
Global Minimum
Close to the Minimum use
Quadratic Approximation.
Figure 1: The figure shows how the Levenberg-Marquardt method blends gradient descent
with quadratic approximation.
The combined method can be described by the following equation (6),
θi k+1 = θi k − (H + diag(H)/µ)−1 d.
(6)
The main point to notice is that even if the algorithm is using gradient descent, the step size is no
longer not constant, but depends on the curvature of the error surface, and hence, it can take longer
steps in regions where the gradient is less, which is exactly what we would like.
The algorithm starts with using gradient descent, and if the error can be reduced, which means that
its successful, it decreases the step size µ, hence the quadratic approximation takes over. This process
is continued until the change in the reduces to a very small number. The LM method has been very
successful, but requires a good starting condition, or else it will converge to the closest local minimum.
k
i (θ)
The Hessian is approximated (Stortelder 1996) in terms of the sensitivities( ∂S∂θ
),
H=
M
k=1
Wk
∂Si k (θ) ∂Si k (θ)
,
∂θ
∂θ
(7)
where it is assumed that the higher order derivative terms cancel out, because the residual is assumed
to be random. Hence, the computation of the Hessian does not involve second derivatives, the first
derivative terms can be obtained by finite differences. The calculation of the Hessian is often used for
computing the confidence intervals.
2.1.2
Simplex
In all of the methods that we will subsequently discuss, we only need to compute the objective function,
and not their derivatives, unlike in the LM case. The simplex method, as described by Nelder-Mead,
5
(Nelder and Mead 1965) is a robust search method, in which the objective function, in our case , is
computed at several test points, and the test point with the highest value for , is replaced by another
point which has a lower value for . The replacement of the worst point involves some rules, which
we discuss below. In a parameter space of N dimensions, an N+1 dimensional geometrical object is
created, called as the simplex, with its vertices, initialized to some starting values. The N+1 vertices,
of the simplex are the points at which the objective function is evaluated. The simplex then evolves by
the following steps:
• The simplex reflects the worst point through the opposite face, to a new point.
• If the reflection above results in a better point, i.e. lower error, it is further stretched in that
direction (expansion).
• A contraction of the worst point towards the opposite face of the simplex.
• A contraction along all the faces towards the best point.
Reflection
Expansion
Contraction
Figure 2: A cartoon showing how a 2-D simplex changes its shape according to the rules
discussed, in section 2.1.2, and makes its way across the fitness landscape.
By successively evolving according to the above steps, the simplex slowly makes its way along the error
surface, see Fig 2. The shape of the simplex adapts to the landscape, by stretching and contracting.
In our experience, the simplex method is very successful, unless the initial starting point is a very poor
guess. The above steps can be made to converge, either when the simplex size converges to a very
small region, or when there is no significant improvement in the error from one iteration to the next.
2.2
2.2.1
Global Searches
Simulated Annealing
The simulated annealing method, derives its name from thermal physics, where the minimization of
, is equivalent to the way a system (such as a metal) reaches its lower state, as it is slowly cooled
(Kirkpatrick et al 1983). At a given temperature, the atoms of the metal collide with each other, thereby
keeping track of the total energy. As the temperature is slowly reduced, the atoms begin to form a
crystalline structure and eventually reach the minimum energy state. The important point is that the
metal has to be cooled slowly, or else, there would be pockets, where the metal is in a higher energy
state, than its neighboring regions. For optimization problems, the algorithm works in the following
way: Given an initial state i, which in our case would be a set of parameters, the system could jump to
another state j, with the Boltzmann probability,
exp
(i − j )
,
T
6
(8)
where T is the temperature. Hence the energy surface is likened to the thermal energy of the system
and there is therefore a probability for the system to jump to a higher energy state at a finite value
of T. If j < i , then this is accepted immediately, since in any case this corresponds to the system
finding its ground state. However more interesting is the first case, where states with higher energies
can be accessed, which implies that it is possible for the system to jump out from a local minima, and
eventually find its global minima, see Fig 3.
System can Jump to Higher
Energy States and Escape
Local Minima
Figure 3: A cartoon showing how the optimization procedure that uses simulated annealing
can jump out of a local minima by accepting steps of higher energy (higher value of )
At a given temperature, the system must be given enough time to sample all the configurations which are
accessible using the probability distribution given above. There is no simple way to design a temperature
scheduling (temperature as a function of time/iterations), several methods exist and the one that works
best depends on the problem at hand. We follow the algorithm as described in (Press et al 1992), in
which the authors consider an adaptation of the simplex method. To each vertex is added a positive,
logarithmically distributed random variable, mimicking a thermal energy kick.
T = − T log(z),
(9)
where the random number z = [0, 1] is uniformally distributed, and T , is the temperature. The vertex
with the highest value of the objective function, is then selected, and replaced. Once the selection is
made, we then subtract a similarly generated random thermal energy from the objective value of the
selected vertex. Thereafter, using the same rules as before (section 2.1.2) we evolve the simplex, by
a series of reflections, expansions and contractions. The addition and subsequent subtraction of the
thermal energy makes it possible for the system to accept larger objective values, and actually will
move uphill. However, it also allows the system to escape local minima. In the limit that T → 0,
the algorithm described above becomes the ordinary simplex algorithm described earlier. Hence in the
simulated annealing-simplex method, the simplex is made to walk on a temperature dependent surface,
which tends to smooth out the corrugations of the real energy surface below, at high temperature,
therefore allowing it to walk around freely (Torres et al 1997). At a given temperature, the simplex
algorithm is run for several iteration steps. As the temperature is reduced, the simplex begins to find
valleys, into which it rapidly climbs down. For the temperature scheduling, we monitored,
highest − lowest ,
7
(10)
and reduced T → 0.95 ∗ T , whenever this quantity increased by greater than a certain fraction from the
previous step. The greater the above quantity, the more probable that one of the vertices is poised on
the lower part of a valley, and hence by reducing the temperature, the simplex is allowed to explore that
part of the fitness landscape. At the same time, there is still the possibility that the simplex will jump
out of the local minima to continue its search elsewhere. In general the algorithm takes a long time to
find the minimum, with the maximum number of simulations, as compared to the other methods, since
at each temperature, there are several iterations for the simplex, and several temperature iterations.
The overall temperature is reduced at a very slow rate, T → 0.995 ∗ T , just in case the initial simplex
choice was inferior, and the simplex is unable to find any minima.
2.2.2
Genetic Algorithms
Genetic Algorithms are very popular since they have been very successful at optimization problems. GA’s
are special cases of evolutionary algorithms. They are motivated by real biological processes such as
selection, cross over and mutation. The Schema theorem of Holland (Goldberg 1989), addresses these
intuitive notions, and proves that these operations serve to increase the fitness of a population. In our
case we will consider real coded GA’s, where the ”gene” is a string of kinetic parameters, which are all
real and nonnegative. We start with a random population of individuals, and follow the steps as detailed
below for each generation. We monitor , which decreases as a function of the generation number.
General considerations show that the function of crossover, whereby two parent genes, exchange, genetic
material is crucial to preserve the parents best qualities up into successive generations. They also serve
to spread good mutations. This is all due to the fact that only the fittest members are crossed over, and
hence their progeny survive into the future generations. For each generation, we repeat the following
N
2 times,
• Selection: Randomly choose z members from the population, and then rank them according to
their fitness. This step is called as tournament selection, since z members are made to play a
tournament and the winners will be decided based on fitness. The best member always gets
selected, this is called as Elitism, and for the remaining members we use roulette selection. This
involves selection based on the fitness, i.e, the fittest individual is more likely to be selected.
Increasing z has the effect of increasing competition among the best individuals but also rules out
the possibility of a weaker individual getting into the population, and perhaps having some genes
which might be useful at a later stage of evolution. Hence z is an important parameter, and for
our examples, we found an optimal value to be 4. At the end of the selection step we have 2
parents.
• Crossover: The selected parents are crossed over (Herrara et al 1998), using an arithmetic mean
defined in the following way: Assuming we represent the parents as, θ1 = (θ1 1 , θ1 2 θ1 3 , ...) and
θ2 = (θ2 1 , θ2 2 θ2 3 , ...), the cross over generates two children β1 , β2 ,
β1 i = λi θ1 i + (1 − λi )θ2 i ,
(11)
β2 = λi θ2 + (1 − λi )θ1 ,
i
i
i
where λ is a uniform random number in −0.5, 1.5. Normally arithmetic cross-over maintains the
convexity property, but the rule defined above, allows a larger region of parameter space to be
explored, since the new points could lie outside the line joining the parents.
• Selection within each family: The best among the two parents and two children, always make it
to the next generation, however among the remaining 3, we apply the roulette selection.
8
• Mutation: having obtained two survivors from the previous step, for each survivor, one parameter
θi is randomly selected and changed according to,
θi = z θmax i ,
(12)
where, z = random [0, 1], is uniformly distributes, and θi max is the maximum possible value of
the i component of the parameter set.
Genetic Algorithm
Tournament
Selection
Cross
Over
Mutation
Select
Family
G2
G1
Hybrid Genetic Algorithm
Tournament
Selection
Cross
Over
Mutation
Select
Family
Local
Search
G2
G1
G3
Figure 4: Schematic figures to show how the GA and GA-simplex are implemented in terms
of the basic operations such as selection, crossover and mutation. G1:Initial Population,
G2:Final Population.
The above described steps are displayed as a schematic in Fig 4. The operations of cross-over and
mutation occur with certain probabilities, which are adjustable parameters. However the mutation
rate is generally a small number < 0.05. Mutations allow the system to explore new regions, whereas
crossovers spread these mutations over the population, thereby transferring information about interesting
regions in the fitness landscape. Hence if the mutation rate is very high, large regions will be explored,
but the members may not survive up into the next generation, since the search is much too exploratory,
and not enough information about the landscape has been exploited by the cross-overs. The fitness of
the best member in each generation is monitored, and if it turns out that there is little improvement in
the fitness, the computation is stopped, and the resulting optimized parameters are examined.
2.2.3
Global+Local search
Combining a local search within a global search algorithm, is a very attractive possibility (Yoshimura
et al 2003). The global search provides the initial seed point, which a local search would use to make
a further optimization. We have implemented the following algorithm. For each generation, we take
9
Optimizer
GA
SAsimplex
GAsimplex
Iterations Simulation time Simulations
500
2 min
4980
3.94
56
8 min
24959
0.152
29
4 min
11432
0.104
Table 1: Performance comparison for different global optimizers.
the best two members and use them as initial conditions, for a simplex search. The resulting fitter
members are replaced back into the population. Hence in every generation, two local searches are
performed. The second diagram in Fig 4, shows these steps. We will now discuss a typical test case
comparing the different optimizers. The model we will consider is a simple oscillator, which arises from
positive feedback (Heinrich 77). The model was simulated and random noise was added to the time
series to produce noisy data. Four parameters were fitted to the data for the same initial guesses for
the parameters, for each of the optimizers. We then compared the time taken to reach a good fit, the
number of simulations, and the value of , for the three global optimizers (the local optimizers, i.e.
the Levenberg-Marquardt and simplex, were unable to find a fit with the same initial conditions). The
data and the fitted curve (bold lines) for the best set of fitted parameters obtained, by using the hybrid
optimizer, are displayed in Fig 5.
Time series for data/simulation
4.5
4
Concentration
3.5
3
2.5
2
1.5
1
0.5
0
0
0.5
1
1.5
2
Time s
Figure 5: A plot comparing the simulation (bold lines), with the data (thin lines), for the
concentration time series, for two metabolites for the model as described in section 2.2.3.
One can see that the GA has the fewest simulations, whereas the simulated annealing takes a lot
longer. The hybrid search led to the best optimization, with a modest time, and although the number
of simulations are quite large, they are still smaller than the simulated annealing algorithm.
10
3
Parameter Estimation
A single run of a given optimization method, will estimate a set of parameters, and we may be satisfied
with the fit. However, it is very useful to know, by how much the parameter values vary, for similarly
produced data. Confidence limits describe variability of the value of the parameter, i.e the likelihood
for a parameter value to be found within a given confidence limit, for example 95% of the time, the
parameter can be found between ±δθ95% around the value θ. Consider a function y = f (x, θ), assume
that data has been provided to fit this function, and we wish to study the propagation of an error in the
parameter value θ, then a simple analysis shows (Bevington 1969), the variance in the output y is,
σ2y = σ2θ (
∂f 2
) ,
∂θ
(13)
where ∂f
∂θ , is the sensitivity. If we assume that the errors in the measurement are statistically independent
and Gaussian distributed, and equal variance (after scaling by the weights), then the least square is
identical to maximum likelihood estimation. We Taylor expand , in terms of the parameters to second
order in θ, and obtain,
(14)
E(δθδθT ) = σ 2 (2H)−1 ,
where H is the Hessian, and σ 2 = N −m
is the unbiased variance in the measurements, N − m, are
the number of independent dimensions, where N, m are the number of samples, and the number of
fitted parameters, respectively. It is known that σ2 is χ2 distributed with N − m degrees of freedom
(Bevington 1969). For a confidence level of 95%, it can be shown that the quoted limits θ0 ± δθ, would
be,
((2H)−1 )ii δθi = ±1.96
,
(15)
N −m
where the diagonal element of the covariance matrix (which is twice the inverse of the Hessian), give
the spread in the parameter values2 . However, very often, the confidence limits as computed above will
not always be accurate. For very nonlinear models, the approximation in Eqn 14 will not hold true,
and higher order terms in the Taylor series will be required. The above discussion is for the case when
the noise is Gaussian, which is, in most cases a reasonable assumption, but for non-Gaussian noise, we
will have to resort to other means to compute the confidence limits. Monte Carlo simulations are a
natural way to generate these confidence limits (Press et al 1992; Rawlings and Eckherdt 2002). Though
computationally expensive, they have an easy interpretation, and with current computational resources
available, this is not a major issue. Proceeding as follows; we select an initial parameter guess, and fit
the data to a model, eventually producing a new set of parameters, for which the fit seems to be good.
We now wish to study the distribution in the parameter space, had we fitted similar data, starting from
the same initial conditions. If we had knowledge about the probability distribution of residuals, i.e the
difference between the model and the data, we could create additional data sets by adding the noise.
We then rerun the optimization algorithm with the same initial guess for all these additional similar
data sets. Thus we can obtain a distribution of parameter values. We first discuss an example with
simulated data.
We consider a linear chain of irreversible unimolecular reactions, with mass-action kinetics,
A → B → C → D → E → F.
2
(16)
For the case where the measurement noise is Gaussian, and the model is linear, it can be shown that
the minimum variance in the parameter is equal to the square root of the inverse of the Fisher Information
matrix (which is the Hessian). The greater the information, the smaller is the uncertainty in the parameter,
implying tighter confidence bounds (Spall 2003)
11
Parameter Fits
12
Dotted Lines − DATA
Solid Lines − Simulation
Metabolite Concentrations
10
8
6
4
2
0
0
0.5
1
1.5
2
Time s
Figure 6: Plot of the simulated noisy time series concentration data, and the fitted curves
for a linear sequence of of reactions governed by irreversible mass-action kinetics.
J0 k
2.16 ± 0.05
J1 k
J2 k
J3 k
J4 k
1.99 ± 0.042 1.99 ± 0.083 1.96 ± 0.183 1.94 ± 0.199
Table 2: The table shows 95% confidence limits for the estimated parameters based on the
equation (15).
The noisy concentration data for the six metabolites, which is displayed in Fig 6, was simulated for all
the kinetic rate coefficients set to 2, and the initial concentration of the first substrate set to 10, and
all the others to zero.
The data in Fig. 6 has 100 points. The noise was assumed to be exponentially distributed and was
added to the simulated curves, and presented to the optimizer as the data for fitting the model. We
used the simplex (the Levenberg-Marquardt gives similar results) to fit the parameters to the data. The
parameters were the five rate constants, which were initialized to 0.1. The fit is shown in the same
figure (6). We then ran a Monte Carlo simulation, by generating the data (exponentially distributed
noise was added to the initial best fit), and for each such data set, the parameters were optimized to fit
the data. Confidence limits were obtained using Eqn 15, which gave for the parameters, the following
values,
In Fig 7 we show cluster plots for the distribution in the various parameters.
The confidence limits, can also be evaluated by choosing limits around the mean values of the parameters,
and making sure that 95%, of the points fall between them. The confidence limits set this way agree
12
2.1
2.05
2
J2_k
J1_k
2.05
1.95
1.9
2.1
2
1.95
2.15
J0_k
2.2
1.9
2.1
2.25
2.1
2.05
2.05
2
J3_k
J4_k
2.1
1.95
1.9
1.85
1.9
2.15
J0_k
2.2
2.25
2
1.95
1.9
1.95
2
2.05
1.85
1.9
2.1
J2_k
1.95
2
2.05
2.1
J1_k
Figure 7: Cluster plots for the distribution of fitted parameters for the Monte carlo simulation, for various parameter combinations.
13
ks
250.9225 ± 0.69
kr
kp
0.199 ± 0.266 35.04 ± 109.9
ki
kde
0.0469 ± 7.46 0.2292 ± 3.76
Table 3: The table shows 95% confidence limits for the estimated parameters using the
Hessian computed at the end of the optimization.
well with those computed using Eqn. 15. In addition the Hessian was found to be well behaved and no
significant correlation was found between the parameters (Fig. 7).
In the second example we fit rate constants to data obtained from a model for irreversible inhibition of
HIV proteinase (Kuzmic 1996, DynaFit 1991). The data that we analysed was obtained from (DynaFit
1991), comprised of two different time courses at different inhibitor concentrations. The parameters to
be optimized were five rate constants, and we used the initial guess values for the parameters, and the
initial substrate concentrations as described in (Kuzmic 1996)3 . The model is described by the following
equations,
M +M
E+S
SE
E+P
E+I
EI
k
a
kd
kon
ks
k
→r
kon
kp
kon
ki
kde
→
E
(17)
E
E
EP
EI
EJ
In Fig 8, we display the two data sets along with the fits.
We ran several several combinations of the optimizers, the simulated annealing and GA-simplex took
the maximum time, but of the two, the latter gave better results. We also considered the option of
running the GA, followed by running the simplex, and this too was very successful. As can be seen in
Fig 8, the fits seem to be quite satisfactory.
Using Eqn. 15 to compute the parameter confidence limits, we get, The confidence limits for the second
to the fifth variables are substantially greater than the mean values, which means that we cannot trust
these numbers (Muller et al 2002). We performed a Monte Carlo simulation by generating 500 data
sets, and obtaining statistics for the parameter values. Each data set uses the simulated time series
obtained after the first fitting procedure, and to this is added the residual noise. Since the nature of
the noise is not known(there could be several kinds of measurement errors, for which it may be difficult
to model for a probability distribution), we use the bootstrap technique (Efron and Tibshirini 1986).
The basic idea of the bootstrap is to create from the only available realization of the noise time series,
several other residual series. This is achieved by sampling with uniform probability, from the residual
3
In the example described above we chose the 4th and 5th data sets (Kuzmic 1996, curves D, E of Fig 1,
pg 264) ), and used initial values of kon = 100, kd = 0.0001, ka = 0.1, I = 0.004, E = 0.004, and S = 27.
In Kuzmic 1996, some of the species levels are also optimized, but here we are more interested in fitting the
parameters
14
Estimate for two time courses
0.2
Data
Simulation
0.18
Concentration of P
0.16
0.14
0.12
0.1
0.08
0.06
0.04
0.02
0
0
500
1000
1500
2000
2500
3000
3500
4000
Time s
Figure 8: The plot showing the two time series concentration data along with the fitted
curves.
ks
±0.31
kr
kp
±0.006 ±0.37
ki
±0.01
kde
±0.11
Table 4: The table shows 95% confidence limits for the estimated parameters, using MonteCarlo simulation.
noise, all N components, allowing for replacement. One allows for replacement, since it is assumed
that each component of the residual noise series is an independent and identically distributed random
number. In Fig 9 we display the cluster plots generated from 500 data sets for some combinations of
the parameters.
From the cluster plots, we can establish simple 95% confidence limits as shown in Table 3.
These limits differ significantly from the limits computed using the Hessian (see Table 2). Although
computationally intensive, the Monte Carlo is necessary to check on the real confidence interval estimation. As indicated by the arrows, in Fig 9, there is significant correlation between these parameters.
This implies that certain combinations of parameters could change, having no effect on the concentration time series. This is also known as the observability problem. For example, consider a simple
Michaelis-Menton set of reactions,
S+E ES → P + E,
(18)
where the reversible step between the substrate and complex are k1 , k−1 , and the forward reaction
rate between complex and product is k2 . In terms of these basic mass action reactions we derive the
Michaelis-Menton rate, by assuming that after a very brief transient time the complex forms and after
that it remains constant. Under these assumptions, the rate between S and P is,
Vmax S
,
Km + S
(19)
1
. Notice that if k2 is kept
where Vmax = k2 e0 , e0 being the enzyme concentration, and Km = k−1k+k
2
constant but k1 , k−1 are changed such that Km remains the same, then the net rate does not change.
15
0.8
0.6
0.6
0.4
0.4
0.2
0.2
0
0.18
0.2
kr
0.22
0
0
0.24
37
kde
kde
0.8
0.05
ki 0.1
0.15
0.2
0.12
0.1
36
ki
kp
0.08
0.06
35
0.04
34
249
250
ks 251
0.02
0.18
252
0.2
kr
0.22
0.24
Figure 9: Cluster plots for the parameter distributions showing significant correlations for
some combinations.
This is obvious, since we are only changing the final amount of complex generated, not the product.
Hence presented with time series data from such a simple model, we will notice a correlation between
the spread in the k1 , k−1 values. This is generally true for larger models, but it may not be possible
to find simple combinations of these parameters which are truly independent. The important point is
that the cluster plots show observability of the parameters. It is then a simple step to quantify this, by
making the observation that the 2-D cluster plots are sections of the m × m, (where m is the number of
parameters) probability distribution of the fitted parameters. The eigenvectors of the Hessian (inverse
of the covariance matrix), corresponding to the lowest eigenvalues, are directions along which, if the
parameters change, then no significant change in the sums of squares, results. The eigenvalues of the
Hessian are, 665, 0.51, 0.0076, 10−5 , 10−5 . It is interesting to study the eigenvectors corresponding
to the lowest eigenvalues, they are,
⎡
⎢
⎢
⎢
0 =⎢
⎢
⎣
⎤
−0.0019 0.0777
−0.0024 −0.0028 ⎥
⎥
⎥
0.9999
0.0043 ⎥
⎥
−0.0102 0.9006 ⎦
0.0118
0.4277
(20)
Notice that the first eigenvector implies that there is freedom to change kp , and for the second eigenvector, the combination of ki , kde , which is seen as correlated in the second subplot in Fig 9. Calculating
the Hessian is useful since its eigenstructure can often be used to study degeneracy in the model.
Software Implementation
The core optimization methods described in the previous section were implemented using Matlab. In
order to make the routines more accessible and flexible we also provided a software link to the Systems
Biology Workbench (Sauro et al, 2003) or SBW and hence also to BioSPICE.
16
The SBW is a software infrastructure that permits applications written in different languages to communicate with each other. SBW uses fast binary transfer of data and remote procedure calls via sockets. A
number of language bindings exist, including C/C++, Java, Delphi, Python, Perl and Matlab. The Matlab bindings in particular permit other software applications to access Matlab code. Matlab functions
are compiled into software libraries (DLL or so files) using the Matlab C compiler. These libraries are
then used by a SBW Matlab chaperon application which makes and manages the connection between
the Matlab functions and SBW. The advantage of this approach is that a client need not have Matlab
installed in order for this to work, they merely need to install the SBWMatlab chaperon (available at
www.sys-bio.org).
There are a growing number of other modules which can connect to SBW. Of particular interest here
are model editing tools such as JDesigner (Sauro et. al 2003) and simulation engines such as Jarnac
(Sauro, 2000). In addition to these, we have also developed two additional modules, an optimization GUI
controller and a data manager. All five modules form a complete optimization toolkit. Note that both
the model entry and simulation engine are plugable modules and can be replaced with other equivalent
modules. The purpose of SBW is to encourage the development of a range of reusable modules which
can be plugged together to form more complex analysis suites.
The logical and physical arrangement of the toolkit is illustrated in Fig 10, and Fig 11, below.
Model Editor
SBW
Optimization
Methods
Optimization
Controller
Data Manager
Simulator
Figure 10: Physical Relationship of Modules to SBW
Model Editor
Simulator
Optimization
Controller
Experimental
Data
Optimization
Methods
Data Manager
Figure 11: Logical Layout of Modules
17
Figure 12: Front screen shot of the optimizer controller showing the options to choose
parameter initial guesses, the optimizer, configuration for the optimizers (i.e setting up the
optimization parameters) and the the graphical display of the sum of squares.
The central focus of the toolkit for the user is the optimization controller shown in Fig 12.
The function of the optimization controller is to manage the model and experimental data and to initiate
the optimization process.
Each optimizer is coded in a separate Matlab library, of which five have been developed as described
in the previous section. One feature of SBW is that it permits similar modules to be categorized into
function groups. This permits another module to load all modules of the same category regardless of
how the modules are implemented. The optimizer modules represent such a grouping, each optimization
module has the same interface but implements a different optimization method. However, this does not
address the problem that each optimizer has a different set of options with which to tailer a particular
method. As a result the optimizer modules can, when called with a particular option, return to the caller
an XML string which specifies what kind of optimizer is implemented and what options are available for
the user to adjust. This information is used by the optimizer controller to build at runtime GUI form
from which the user can adjust any optimization options prior to calling the optimizer itself. Details of
the format of the XML string and interface are detailed in the documentation which accompanies the
optimization software.
The operation of the optimizer involves three stages, loading the model from the current model editor,
loading the experimental data which will be used to fit the model and finally selecting the optimizer
method and initializing the parameter values. The user may also decided to fit only a subset of parameters. Once these tasks have be set, the optimization is started. A graph panel on the controller is
updated in realtime to indicate the progress of the optimization. Finally, when an optimum has been
located, the user may plot the fitted model against the experimental data to compare the fit, see Fig 13
The standard error for each parameter as defined by the Hessian is also returned as well as the number of
iterations carried out and the number of simulations that were necessary to compute the final parameter
set. If the model is highly nonlinear the user may also choose to carry out a Monte-Carlo simulation
and optimization to gain better estimates for uncertainty in the estimated parameters and to determine
the observability of the parameters (see Fig. 9)
18
Figure 13: Background shot of the optimizer controller showing the experimental data loader
and graphical interface, where both the experimental and fitted plots can be displayed.
Documentation and a tutorial on how to use the optimization kit are provided in the download at
www.sys-bio.org.
4
Summary
We have developed an optimization module for the BioSPICE project, which fits kinetic rate constants
to a model for which time series concentration data is available. The module uses several local and
global search algorithms such as Leveberg-Marquardt, Simplex, Simulated annealing and real coded
Genetic algorithms, as well as Hybrid searches, Genetic algorithm- simplex. We discussed the individual
algorithms and the software components from which the module was constructed. We commented on the
issue of confidence limits for the fitted parameters, and the use of Monte Carlo simulations to obtain
them using the bootstrap method. Our experience with the various methods used for optimization,
suggest that the simplex is very robust and fast, and should be the first choice for fitting models.
However, should it fail, then we recommend the Hybrid, GA-simplex method.
For the future, we plan to implement other algorithms such as evolutionary algorithms, since some recent
work has shown that these are very successful for finding the optimum, for large models. We also plan
to generalize our optimization objectives, so that arbitrary objective functions can be specified such as
flux maximization through pathways, and stability of the network.
5
Acknowledgements
Support for VSC, CW and HMS was received via a grant awarded from the DARPA/IPTO BioCOMP
program, contract number MIPR #03-M296-01. HMS received additional support from the Keck
Graduate Institute. We wish to acknowledge the BioSPICE team at SRI and Berkeley for their invaluable
19
assistance in enabling BioSPICE/SBW integration.
References
Bevington. P. R, 1969. Data Reduction and Error Analysis for the Physical Sciences, McGraw-Hill.
Covert M. W. et al, 2001. Metabolic modeling in microbial strains in silico, TRENDS in Biochemical
Sciences, 26:179-186.
Dynafit 1991. www.biokin.com/dynafit/index.html
Efron. B, Tibshirini. R, 1986. Bootstrap methods for Standard error, Confidence Intervals, and other
Measures of Statistical Accuracy. Statistical Science, 1:54-77.
Goryanin I., Hodgman T. C, Selkov E. Mathematical simulation and analysis of cellular metabolism and
regulation. Bioinformatics. 1999 Sep;15(9):749-58.
Ewens W. J, Grant G. R, 2001. Statistical Methods in Bioinformatics: An Introduction, SpringerVerlag, New York Inc
Goldberg. D. E, 1989. Genetic Algorithms in Search, Optimization, and machine Learning. AddisonWesley.
Herrara. F, Lozano. M, Verdegay. J.C 1998. Tackling Real-Coded GA’s:Operator and Tools for
behavioural Analysis. Artificial Intelligence Review, 12:265-319.
Kirkpatrick. S, Gelatt. C. D, Vecchi. M. P, 1983. Optimization by Simulated Annealing, Science,
220:671-680.
Kuzmic. P. 1996, Program Dynafit for the analysis of enzyme kinetic data: Application to HIV protinase.
Analytical Biochemistry, 237:260-273.
Marquardt D. W, 1963. An algorithm for least-squares estimation of nonlinear parameters, Journal of
the society for industrial and applied mathematics, 11:431-441.
Mendes P. 1993. Gepasi:A software package for modelling the dynamics, steady states and control of
biochemical and other systems. comput. appl. Biosci. 9:563-571.
Mendes. P, Kell. D, 1998. Nonlinear optimization of biochemical pathways: Applications to metabolic
engineering and parameter estimation. Bioinformatics, 14:869-883.
Moles. C. G, Mendes. P, Banga. J R, 2003. Parameter Estimation in Global pathways: A Comparison
of Global Optimization Methods. Genome Research, 13:2467-2474.
Muller. T. G, Noykova. N, Gyllenberg. M, Timmer. J, 2002. Parameter Identification in Dynamical
Models of Anaerobic waste water treatment. Mathematical Biosciences, [177, 178]:147-160.
Nelder. J. A, Mead. R, 1965. A Simplex Method for Function Maximization. Comput. J, 7:308-313.
Papain. J. A et al, 2003. Metabolic pathways in the post-genome era, TRENDS in Biochemical
Sciences, 28:250-258.
Press. W, Teukolsky. S, Vetterling. W. T, Flannery. B. P, 1992. Numerical Recipes in C, Cambridge
University press
Sauro, H. M. (2000) JARNAC: a system for interactive metabolic analysis. In Animating the Cellular
Map (Hofmeyr, J.-H. S., Rohwer, J. M. & Snoep, J. L., eds), pp. 221228. Stellenbosch University
Press, Stellenbosch.
Sauro H. M 2003. Next Generation Simulation Tools: The Systems Biology Workbench and BioSPICE
20
Integration, OMICS 7:355:371.
Stortelder. W.J.H, 1996. Parameter Estimation in Chemical Engineeering: A Case Study for Resin
Production, Technical report, CWI, Amsterdam, Netherlands. NM-R9610; ISSN 0619-0388:1-16.
Spall. J. C, 2003. Monte-carlo based Computation of the Fisher-Information matrix in Nonstandard
Settings, Proc. Amer. Control. Conf. Denver, Colorado, June4-6, 3797-3802.
Rawlings. J. B, Ekerdt. J. S,2002. Chemical Reactor Analysis and Design Fundamentals, Nob-Hill
publishing, Madison, Wisconsin.
Torres. F. M, Agichtein. E, Grinberg. L, Yu. G, Topper. R. Q, 1997. A note on the application of the
Boltzmann Simplex-Simulated Annealing algorithm to Global Optimization of argon and water clusters.
J. Mol. Struc, 419:85-95.
Voit E. B. 2000. Computational Analysis of Biochemical Systems, Cambridge University Press.
Yoshimura. J, Shimonobou. T, Sekiguchi. T, Okamoto. M. 2003. Development of the parameterfitting Module for web-based biochemical reactor simulator Best-Kit. Chem-Bio Informatics Journal,
3:114-129.
21
Far from the Minimum
use Gradient Search
Global Minimum
Close to the Minimum use
Quadratic Approximation.
22
Reflection
Expansion
23
Contraction
System can Jump to Higher
Energy States and Escape
Local Minima
24
Genetic Algorithm
Tournament
Selection
Cross
Over
Mutation
Select
Family
G2
G1
Hybrid Genetic Algorithm
Tournament
Selection
Cross
Over
Mutation
Select
Family
Local
Search
G2
G1
25
G3
Time series for data/simulation
4.5
4
Concentration
3.5
3
2.5
2
1.5
1
0.5
0
0
0.5
1
Time s
26
1.5
2
Parameter Fits
12
Dotted Lines − DATA
Solid Lines − Simulation
Metabolite Concentrations
10
8
6
4
2
0
0
0.5
1
Time s
27
1.5
2
2.1
2.05
2
J2_k
J1_k
2.05
1.95
1.9
2.1
2
1.95
2.15
J0_k
2.2
1.9
2.1
2.25
2.1
2.05
2.05
2
J3_k
J4_k
2.1
1.95
1.9
1.85
1.9
2.15
J0_k
2.2
2.25
2
1.95
1.9
1.95
2
2.05
2.1
J2_k
1.85
1.9
1.95
2
J1_k
28
2.05
2.1
Estimate for two time courses
0.2
Data
Simulation
0.18
Concentration of P
0.16
0.14
0.12
0.1
0.08
0.06
0.04
0.02
0
0
500
1000
1500
2000
Time s
29
2500
3000
3500
4000
0.8
0.6
0.6
0.4
0.4
0.2
0.2
0
0.18
0.2
kr
0.22
0.24
37
0
0
kde
kde
0.8
0.05
ki 0.1
0.15
0.2
0.12
0.1
36
ki
k
p
0.08
0.06
35
0.04
34
249
250
ks 251
252
30
0.02
0.18
0.2
kr
0.22
0.24
Model Editor
SBW
Optimization
Methods
Optimization
Controller
Data Manager
Simulator
31
Model Editor
Simulator
Optimization
Controller
Experimental
Data
Optimization
Methods
Data Manager
32
33
34
Download