Model Fitting Tools for BioSPICE and the Systems Biology Workbench

Vijay S. Chickarmane¹∗, Cameron Wellock¹, Herbert M. Sauro¹,²

¹ Keck Graduate Institute, 535 Watson Drive, Claremont, CA 91711, USA
² Control and Dynamical Systems 107-81, California Institute of Technology, CA 91125, USA

May 19, 2004

∗ Address for Correspondence: Vijay S Chickarmane, Keck Graduate Institute, 535 Watson Drive, Claremont, CA 91711, USA, Phone: (909) 607 0101, Fax: (909) 607 8086, e-mail: vchickar@kgi.edu

Abstract

As a contribution to the BioSPICE project, we have developed an optimization module for fitting kinetic rate constants to time series concentration data. The algorithms use both local searches, such as Levenberg-Marquardt and simplex, and global search methods, such as simulated annealing and real coded genetic algorithms. We describe the modules in terms of the algorithms used and the software components, and present a few test cases that demonstrate the software. We comment on the determination of confidence limits and on issues related to the observability of the fitted parameters.

Keywords: Optimization, Systems Biology, Software, Model Fitting

1 Introduction

The development of computational models of biochemical networks is becoming an important part of modern research in both commercial and academic settings. Building a model is at minimum a two stage process. First, the network of reactions and interactions must be known; such information can be gleaned from traditional sources or by more recent genomic approaches (Papin et al 2003; Covert et al 2001). Second, once the network is described, rate laws and associated rate constants must be assigned. This stage is arguably the most difficult, since not only must a decision be made on the type of rate law for each reaction, but values for all the rate constants must also be determined. The latter problem is the subject of this paper.
The traditional approach to parameter determination is to fit a model to a set of time-series data that describe the time course of the concentrations of molecular species in the network (Voit 2000). Such data are inherently noisy, usually as a result of instrumentation or experimental error. Estimating parameter values is not easy and depends on the availability of suitable data, the level of noise in the data and, of course, the quality of the underlying network model. Indeed, parameter estimation may also be used as a means to compare different network models to determine which is most likely.

Several software applications already exist that can be used to fit network models to time-series concentration data (Kuzmic 1991, Mendes 1993, Goryanin et al. 1999, Yoshimura, 2003). However, they are either closed source, commercially related or web based. We wanted to create an open source, freely available toolkit for parameter estimation. In addition we wanted the software to fit into the extensible computational platforms BioSPICE and SBW (Sauro et al. 2003). Both projects aim to provide a computational platform for the simulation and analysis of biochemical systems. Our aim was to permit users to choose between different optimization methods, namely local searches, which can be used when the user has a good initial guess for the parameters, and global searches, for when no such initial guess can be made. In addition, we wanted built-in support for estimating parameter uncertainty and observability, a feature often missing from optimization packages.

The paper is arranged as follows: In Section 2, we discuss the basic methodology of the optimization techniques and the individual algorithms used. Although these methods will be familiar to many readers, we hope to make this discussion self-contained. Parameter estimation is discussed in Section 3, where we apply our methods to both simulated and real data.
We also describe Monte Carlo simulations for determining confidence intervals and parameter observability. In Section 4 we discuss the software components that were used in building the optimization toolkit, and provide some screen shots of the user interface and implementation details. We summarize our conclusions in Section 5.

2 Optimization Techniques

Assuming we are presented with noisy time-series data on the concentration of molecular species in a network, our goal is to fit a given model to the data such that the discrepancy between the data and the model prediction is minimized. Hence we have to find the set of parameters that maximizes the likelihood of observing the given data. If we assume that the residual, i.e. the difference between the model and the data, is Gaussian distributed, then this naturally leads to a maximum likelihood analysis of the parameter estimation problem, which implies that the likelihood is maximized if the sum of squares of the error between the model and the data is minimized (Press et al 1992; see for example Ewens and Grant 2001). The sum of squares is represented by¹

\[ \epsilon(\theta) = \sum_{i=1}^{N} \sum_{k=1}^{M} W_k \left( S_i^k - S_i^k(\theta) \right)^2, \tag{1} \]

where $S_i^k$, $S_i^k(\theta)$ and $\theta_i$, $i = 1 : p$, are the measured sampled concentration data, the concentration as predicted by the model, and the parameter set, respectively, and where the sum is over the $N$ samples $i$ and the $M$ species $k$. The above equation also contains a weighting factor $W_k$, which is generally chosen to normalise the sum, i.e. to make the contributions from the various species lie in the same range. In our case we have chosen $W_k = (1/S^k_{\max})^2$. The model for the network is in general described by a set of differential equations,

\[ \frac{d\mathbf{S}}{dt} = \mathbf{f}(\mathbf{S}, \theta), \tag{2} \]

where the $f_i$ are generally nonlinear. The optimization task involves making an initial guess for $\theta$, integrating the above equations to obtain the solution $S_i$ for all the species, evaluating $\epsilon$ (Eq. 1), finding a rule by which the next best guess can be obtained so that the sum is reduced, and finally generating a better estimate of $\theta_i$ (Mendes and Kell 1998). The rules are the optimization methods we discuss below. This process of changing the parameters, simulating the model, and iterating these steps is computationally very expensive; however, given the computational resources that are available nowadays, this is not a major obstacle. Biochemical networks can vary in size, with hundreds of parameters, and hence efficient methods must be developed for parameter estimation. We now briefly describe each method that we have implemented in software. The methods fall into local and global search categories. Global searches can also be deterministic, but here we consider only stochastic global searches (Moles et al 2003). It has also been argued that the simplex algorithm is semi-global, but here we categorise it as local.

2.1 Local Searches

2.1.1 Levenberg-Marquardt (LM)

If we imagine the error surface $\epsilon$ as a function of the parameters, then the LM method is a weighted mixture of two types of searches (Marquardt 1963; Press et al 1992). The first is a gradient search, and the other approximates the error surface as a quadratic function of the parameters. In gradient descent, the approach is to descend the error surface rapidly in the direction opposite to the local gradient, i.e. the direction of maximum change. The step size is set to be constant. For the $(k+1)$th iteration, $\theta$ is changed according to

\[ \theta^{k+1} = \theta^{k} - \mu \mathbf{d}, \tag{3} \]

where the gradient $\mathbf{d} = \partial\epsilon/\partial\theta$, and $\mu$ is the step size. If we approximate the error surface as a quadratic function of the parameters, we can Taylor expand about $\theta_0$,

\[ \epsilon = \epsilon_0 + (\theta - \theta_0)\,\mathbf{d} + (\theta - \theta_0)\,\mathbf{H}\,(\theta - \theta_0)^T, \tag{4} \]

where the Hessian, $H_{ij} = \frac{1}{2}\,\partial^2\epsilon/\partial\theta_i\,\partial\theta_j$, describes the curvature of the surface. The minimum of the surface is now found by differentiating the above expression with respect to $\theta$ and setting the result to zero.
We therefore obtain the parameter value, in a single step, as

\[ \theta^{k+1} = \theta^{k} - \mathbf{H}^{-1}\mathbf{d}. \tag{5} \]

Footnote 1: Henceforth a bold lower case symbol, e.g. x, represents a column vector, and a bold upper case symbol, e.g. X, represents a matrix.

The basic approach of the LM method is to merge gradient descent with the quadratic approximation, such that when the error surface is very steep gradient descent is chosen, and when the surface can indeed be approximated by a quadratic the more exact method above (i.e. Eqn 5) is used, see Fig 1.

Figure 1: The figure shows how the Levenberg-Marquardt method blends gradient descent with quadratic approximation. (Labels: far from the minimum use gradient search; close to the minimum use quadratic approximation.)

The combined method can be described by the following equation,

\[ \theta^{k+1} = \theta^{k} - \left(\mathbf{H} + \mathrm{diag}(\mathbf{H})/\mu\right)^{-1}\mathbf{d}. \tag{6} \]

The main point to notice is that even when the algorithm is using gradient descent, the step size is no longer constant but depends on the curvature of the error surface; hence it can take longer steps in regions where the gradient is small, which is exactly what we would like. The algorithm starts by using gradient descent, and if a step reduces the error, i.e. it is successful, the damping term $\mathrm{diag}(\mathbf{H})/\mu$ is reduced (that is, $\mu$ is increased), so that the quadratic approximation takes over. This process is continued until the change in $\epsilon$ falls below a small tolerance. The LM method has been very successful, but requires a good starting point, or else it will converge to the closest local minimum.

The Hessian is approximated (Stortelder 1996) in terms of the sensitivities $\partial S_i^k(\theta)/\partial\theta$,

\[ \mathbf{H} = \sum_{i=1}^{N} \sum_{k=1}^{M} W_k \, \frac{\partial S_i^k(\theta)}{\partial\theta} \left( \frac{\partial S_i^k(\theta)}{\partial\theta} \right)^T, \tag{7} \]

where it is assumed that the higher order derivative terms cancel out, because the residual is assumed to be random. Hence the computation of the Hessian does not involve second derivatives, and the first derivative terms can be obtained by finite differences.
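As an illustration of the damped update of Eq. 6, the following Python sketch fits a hypothetical two-parameter model, a exp(-k t), which is not one of the models considered in this paper. The sensitivities are obtained by forward finite differences and the Hessian by the approximation of Eq. 7 with unit weights. Two conventions are assumptions of the sketch and not a description of the implemented module: the damping factor `lam` plays the role of 1/µ, and the right-hand side of the linear system is written as the sum of residual times sensitivity (minus one half of the gradient of the sum of squares), following the normalisation used by Press et al (1992).

```python
import math

def model(t, theta):
    """Hypothetical test model (an assumption of this sketch): a * exp(-k * t)."""
    a, k = theta
    return a * math.exp(-k * t)

def residuals(theta, ts, ys):
    """Data minus model prediction at every sample time."""
    return [y - model(t, theta) for t, y in zip(ts, ys)]

def sum_sq(theta, ts, ys):
    """Sum of squares of Eq. 1 for one species with unit weight."""
    return sum(r * r for r in residuals(theta, ts, ys))

def sensitivities(theta, ts, h=1e-6):
    """Forward finite-difference dS/dtheta_j, one row per sample time."""
    rows = []
    for t in ts:
        base = model(t, theta)
        row = []
        for j in range(len(theta)):
            bumped = list(theta)
            bumped[j] += h
            row.append((model(t, bumped) - base) / h)
        rows.append(row)
    return rows

def solve2(A, b):
    """Solve a 2x2 linear system by Cramer's rule."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(b[0] * A[1][1] - A[0][1] * b[1]) / det,
            (A[0][0] * b[1] - A[1][0] * b[0]) / det]

def levenberg_marquardt(theta, ts, ys, lam=1e-2, max_iter=200, tol=1e-12):
    theta = list(theta)
    err = sum_sq(theta, ts, ys)
    for _ in range(max_iter):
        J = sensitivities(theta, ts)
        r = residuals(theta, ts, ys)
        # Hessian approximated from first derivatives only (Eq. 7, W = 1).
        H = [[sum(row[a] * row[b] for row in J) for b in range(2)]
             for a in range(2)]
        # Right-hand side: sum of residual times sensitivity.
        g = [sum(row[a] * ri for row, ri in zip(J, r)) for a in range(2)]
        # Damped system: H + lam * diag(H), with lam standing in for 1/mu.
        A = [[H[a][b] + (lam * H[a][a] if a == b else 0.0) for b in range(2)]
             for a in range(2)]
        step = solve2(A, g)
        trial = [theta[0] + step[0], theta[1] + step[1]]
        trial_err = sum_sq(trial, ts, ys)
        if trial_err < err:
            # Successful step: accept it and reduce the damping.
            converged = (err - trial_err) < tol
            theta, err = trial, trial_err
            lam /= 10.0
            if converged:
                break
        else:
            # Unsuccessful step: increase the damping (shorter, steeper steps).
            lam *= 10.0
    return theta, err
```

For noise-free data generated with a = 3 and k = 1.5, a call such as `levenberg_marquardt([1.0, 0.5], ts, ys)` should recover the generating values; with a poor starting point the same routine may instead settle in a nearby local minimum, as noted above.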
The Hessian is also often used for computing the confidence intervals.

2.1.2 Simplex

In all of the methods that we discuss subsequently, we only need to compute the objective function and not its derivatives, unlike in the LM case. The simplex method of Nelder and Mead (Nelder and Mead 1965) is a robust search method in which the objective function, in our case $\epsilon$, is computed at several test points, and the test point with the highest value of $\epsilon$ is replaced by another point which has a lower value of $\epsilon$. The replacement of the worst point follows some rules, which we discuss below. In a parameter space of N dimensions, a geometrical object with N+1 vertices, called the simplex, is created, with its vertices initialized to some starting values. The N+1 vertices of the simplex are the points at which the objective function is evaluated. The simplex then evolves by the following steps:

• Reflection: the worst point is reflected through the opposite face to a new point.
• Expansion: if the reflection results in a better point, i.e. lower error, the simplex is stretched further in that direction.
• Contraction: the worst point is contracted towards the opposite face of the simplex.
• Multiple contraction: all the vertices are contracted towards the best point.

Figure 2: A cartoon showing how a 2-D simplex changes its shape according to the rules discussed in section 2.1.2 (reflection, expansion, contraction), and makes its way across the fitness landscape.

By successively evolving according to the above steps, the simplex slowly makes its way along the error surface, see Fig 2. The shape of the simplex adapts to the landscape by stretching and contracting. In our experience, the simplex method is very successful unless the starting point is a very poor guess. The iteration is terminated either when the simplex has shrunk to a very small region, or when there is no significant improvement in the error from one iteration to the next.
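The four rules above can be sketched in a few lines of Python. This is a simplified variant written for illustration only: it uses the conventional coefficients (reflection 1, expansion 2, contraction and shrink 0.5), a single inside contraction, and a stopping rule based on the spread of objective values across the vertices. The function name `nelder_mead` and its arguments are hypothetical and do not correspond to the interface of the implemented module.

```python
def nelder_mead(f, start, step=0.5, max_iter=500, tol=1e-10):
    n = len(start)
    # Build N+1 vertices: the start point plus one offset along each axis.
    simplex = [list(start)]
    for i in range(n):
        vertex = list(start)
        vertex[i] += step
        simplex.append(vertex)
    values = [f(v) for v in simplex]

    for _ in range(max_iter):
        # Order the vertices from best (lowest error) to worst.
        order = sorted(range(n + 1), key=lambda i: values[i])
        simplex = [simplex[i] for i in order]
        values = [values[i] for i in order]
        if values[-1] - values[0] < tol:
            break

        # Centroid of the face opposite the worst vertex.
        centroid = [sum(v[j] for v in simplex[:-1]) / n for j in range(n)]
        worst = simplex[-1]

        def along(t):
            """Point on the line through the worst vertex and the centroid."""
            return [c + t * (c - w) for c, w in zip(centroid, worst)]

        reflected = along(1.0)
        f_r = f(reflected)
        if f_r < values[0]:
            # Reflection gave a new best point: try stretching further.
            expanded = along(2.0)
            f_e = f(expanded)
            if f_e < f_r:
                simplex[-1], values[-1] = expanded, f_e
            else:
                simplex[-1], values[-1] = reflected, f_r
        elif f_r < values[-2]:
            # Reflection is better than the second-worst vertex: accept it.
            simplex[-1], values[-1] = reflected, f_r
        else:
            # Contract the worst point towards the opposite face.
            contracted = along(-0.5)
            f_c = f(contracted)
            if f_c < values[-1]:
                simplex[-1], values[-1] = contracted, f_c
            else:
                # Contract every vertex towards the best point.
                best = simplex[0]
                simplex = [best] + [
                    [b + 0.5 * (x - b) for b, x in zip(best, v)]
                    for v in simplex[1:]
                ]
                values = [values[0]] + [f(v) for v in simplex[1:]]

    best_index = min(range(n + 1), key=lambda i: values[i])
    return simplex[best_index], values[best_index]
```

In a fitting context `f` would be the sum of squares of Eq. 1, so each call to `f` costs one simulation of the model.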
2.2 Global Searches

2.2.1 Simulated Annealing

The simulated annealing method derives its name from thermal physics, where the minimization of $\epsilon$ is equivalent to the way a system (such as a metal) reaches its lowest energy state as it is slowly cooled (Kirkpatrick et al 1983). At a given temperature, the atoms of the metal collide with each other, thereby keeping track of the total energy. As the temperature is slowly reduced, the atoms begin to form a crystalline structure and eventually reach the minimum energy state. The important point is that the metal has to be cooled slowly, or else there would be pockets where the metal is in a higher energy state than its neighboring regions. For optimization problems, the algorithm works in the following way: given an initial state $i$, which in our case would be a set of parameters, the system can jump to another state $j$ with the Boltzmann probability

\[ \exp\left( \frac{\epsilon_i - \epsilon_j}{T} \right), \tag{8} \]

where $T$ is the temperature. Hence the error surface is likened to the energy of the system, and there is therefore a probability for the system to jump to a higher energy state at a finite value of $T$. If $\epsilon_j < \epsilon_i$, the move is accepted immediately, since this corresponds to the system moving towards its ground state. More interesting is the other case, $\epsilon_j > \epsilon_i$, where states with higher energies can be accessed, which implies that it is possible for the system to jump out of a local minimum and eventually find its global minimum, see Fig 3.

Figure 3: A cartoon showing how the optimization procedure that uses simulated annealing can jump out of a local minimum by accepting steps of higher energy (higher value of $\epsilon$).

At a given temperature, the system must be given enough time to sample all the configurations which are accessible using the probability distribution given above.
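The acceptance rule built on Eq. 8 can be written as a short Python sketch. The function names and the use of Python's `random` module are illustrative assumptions; the sketch shows the generic rule only, not the simplex-based variant described next.

```python
import math
import random

def accept_probability(e_current, e_new, temperature):
    """Probability of moving from a state with error e_current to e_new."""
    if e_new <= e_current:
        # Downhill moves are always accepted.
        return 1.0
    # Uphill moves are accepted with the Boltzmann probability of Eq. 8.
    return math.exp((e_current - e_new) / temperature)

def accept(e_current, e_new, temperature, rng=random):
    """Draw a uniform random number and decide whether to take the move."""
    return rng.random() < accept_probability(e_current, e_new, temperature)
```

At high temperature almost every uphill move is accepted, whereas as the temperature approaches zero the rule reduces to a pure descent.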
There is no simple way to design a temperature schedule (temperature as a function of time or iterations); several methods exist, and the one that works best depends on the problem at hand. We follow the algorithm described in (Press et al 1992), in which the authors consider an adaptation of the simplex method. To the objective value at each vertex is added a positive, logarithmically distributed random variable, mimicking a thermal energy kick,

\[ \epsilon_T = -T \log(z), \tag{9} \]

where the random number $z \in [0, 1]$ is uniformly distributed, and $T$ is the temperature. The vertex with the highest value of the objective function is then selected and replaced. Once the selection is made, we subtract a similarly generated random thermal energy from the objective value of the selected vertex. Thereafter, using the same rules as before (section 2.1.2), we evolve the simplex by a series of reflections, expansions and contractions. The addition and subsequent subtraction of the thermal energy makes it possible for the system to accept larger objective values, i.e. to move uphill on occasion, and this is what allows the system to escape local minima. In the limit $T \to 0$, the algorithm described above becomes the ordinary simplex algorithm described earlier. Hence in the simulated annealing-simplex method, the simplex is made to walk on a temperature dependent surface which, at high temperature, tends to smooth out the corrugations of the real energy surface below, therefore allowing it to walk around freely (Torres et al 1997). At a given temperature, the simplex algorithm is run for several iteration steps. As the temperature is reduced, the simplex begins to find valleys, into which it rapidly descends. For the temperature schedule, we monitored

\[ \epsilon_{highest} - \epsilon_{lowest}, \tag{10} \]

and reduced $T \to 0.95\,T$ whenever this quantity increased by more than a certain fraction from the previous step.
The greater the above quantity, the more probable it is that one of the vertices is poised on the lower part of a valley, and hence by reducing the temperature, the simplex is allowed to explore that part of the fitness landscape. At the same time, there is still the possibility that the simplex will jump out of the local minimum to continue its search elsewhere. In general the algorithm takes a long time to find the minimum and requires the largest number of simulations of all the methods, since at each temperature there are several iterations of the simplex, and there are several temperature iterations. The overall temperature is also reduced at a very slow rate, $T \to 0.995\,T$, in case the initial simplex choice was poor and the simplex is unable to find any minimum.

2.2.2 Genetic Algorithms

Genetic algorithms (GAs) are very popular since they have been very successful at optimization problems. GAs are special cases of evolutionary algorithms. They are motivated by real biological processes such as selection, crossover and mutation. The Schema theorem of Holland (Goldberg 1989) addresses these intuitive notions, and shows that these operations serve to increase the fitness of a population. In our case we consider real coded GAs, where the "gene" is a string of kinetic parameters, which are all real and nonnegative. We start with a random population of individuals, and follow the steps detailed below for each generation. We monitor $\epsilon$, which decreases as a function of the generation number. General considerations show that crossover, whereby two parent genes exchange genetic material, is crucial for preserving the parents' best qualities in successive generations. Crossover also serves to spread good mutations. This is because only the fittest members are crossed over, and hence their progeny survive into future generations.
For each generation, we repeat the following N/2 times (N being the population size),

• Selection: Randomly choose z members from the population, and rank them according to their fitness. This step is called tournament selection, since z members are made to play a tournament and the winners are decided based on fitness. The best member always gets selected (this is called elitism), and for the remaining members we use roulette selection, i.e. selection with a probability based on fitness, so that the fittest individual is more likely to be selected. Increasing z has the effect of increasing competition among the best individuals, but it also rules out the possibility of a weaker individual getting into the population and perhaps carrying some genes which might be useful at a later stage of evolution. Hence z is an important parameter, and for our examples we found an optimal value to be 4. At the end of the selection step we have 2 parents.

• Crossover: The selected parents are crossed over (Herrera et al 1998) using an arithmetic mean defined in the following way. Representing the parents as $\theta_1 = (\theta_1^1, \theta_1^2, \theta_1^3, \ldots)$ and $\theta_2 = (\theta_2^1, \theta_2^2, \theta_2^3, \ldots)$, the crossover generates two children $\beta_1$, $\beta_2$,

\[ \begin{aligned} \beta_1^i &= \lambda_i \theta_1^i + (1 - \lambda_i)\,\theta_2^i, \\ \beta_2^i &= \lambda_i \theta_2^i + (1 - \lambda_i)\,\theta_1^i, \end{aligned} \tag{11} \]

where $\lambda_i$ is a uniform random number in $[-0.5, 1.5]$. Normally arithmetic crossover maintains the convexity property, but the rule defined above allows a larger region of parameter space to be explored, since the new points can lie outside the line segment joining the parents.

• Selection within each family: The best among the two parents and two children always makes it to the next generation; among the remaining 3, we apply roulette selection.
• Mutation: Having obtained two survivors from the previous step, for each survivor one parameter $\theta_i$ is randomly selected and changed according to

\[ \theta_i = z\,\theta_i^{\max}, \tag{12} \]

where $z$ is a uniformly distributed random number in $[0, 1]$, and $\theta_i^{\max}$ is the maximum possible value of the $i$th component of the parameter set.

Figure 4: Schematic figures to show how the GA and GA-simplex are implemented in terms of the basic operations such as selection, crossover and mutation. G1: Initial Population, G2: Final Population. (Panels: Genetic Algorithm; Hybrid Genetic Algorithm, which adds a Local Search step.)

The steps described above are displayed as a schematic in Fig 4. The operations of crossover and mutation occur with certain probabilities, which are adjustable parameters. However, the mutation rate is generally a small number, < 0.05. Mutations allow the system to explore new regions, whereas crossovers spread these mutations over the population, thereby transferring information about interesting regions in the fitness landscape. Hence if the mutation rate is very high, large regions will be explored, but the members may not survive into the next generation, since the search is much too exploratory and not enough information about the landscape has been exploited by the crossovers. The fitness of the best member in each generation is monitored, and if there is little improvement in the fitness, the computation is stopped and the resulting optimized parameters are examined.

2.2.3 Global+Local search

Combining a local search with a global search algorithm is a very attractive possibility (Yoshimura et al 2003). The global search provides the initial seed point, which a local search then uses to make a further optimization. We have implemented the following algorithm.
For each generation, we take the best two members and use them as initial conditions for a simplex search. The resulting fitter members are placed back into the population. Hence in every generation, two local searches are performed. The second diagram in Fig 4 shows these steps.

We will now discuss a typical test case comparing the different optimizers. The model we consider is a simple oscillator, which arises from positive feedback (Heinrich 77). The model was simulated and random noise was added to the time series to produce noisy data. Four parameters were fitted to the data, starting from the same initial guesses for each of the optimizers. We then compared the time taken to reach a good fit, the number of simulations, and the value of $\epsilon$ for the three global optimizers (the local optimizers, i.e. the Levenberg-Marquardt and simplex, were unable to find a fit with the same initial guesses). The data and the fitted curve (bold lines) for the best set of fitted parameters, obtained using the hybrid optimizer, are displayed in Fig 5.

Optimizer    Iterations   Simulation time   Simulations   ε
GA           500          2 min             4980          3.94
SAsimplex    56           8 min             24959         0.152
GAsimplex    29           4 min             11432         0.104

Table 1: Performance comparison for different global optimizers.

Figure 5: A plot comparing the simulation (bold lines) with the data (thin lines) for the concentration time series of two metabolites, for the model described in section 2.2.3. (Axes: concentration versus time in s.)

One can see that the GA has the fewest simulations, whereas the simulated annealing takes a lot longer. The hybrid search led to the best optimization in a modest time, and although its number of simulations is quite large, it is still smaller than that of the simulated annealing algorithm.

3 Parameter Estimation

A single run of a given optimization method will estimate a set of parameters, and we may be satisfied with the fit.
However, it is very useful to know by how much the parameter values vary for similarly produced data. Confidence limits describe the variability of the value of a parameter, i.e. the likelihood of the parameter value being found within a given interval; for example, 95% of the time the parameter is found within $\pm\delta\theta_{95\%}$ of the value $\theta$. Consider a function $y = f(x, \theta)$; assume that data has been provided to fit this function, and that we wish to study the propagation of an error in the parameter value $\theta$. A simple analysis then shows (Bevington 1969) that the variance in the output $y$ is

\[ \sigma_y^2 = \sigma_\theta^2 \left( \frac{\partial f}{\partial\theta} \right)^2, \tag{13} \]

where $\partial f/\partial\theta$ is the sensitivity. If we assume that the errors in the measurement are statistically independent, Gaussian distributed and of equal variance (after scaling by the weights), then least squares is identical to maximum likelihood estimation. We Taylor expand $\epsilon$ in terms of the parameters to second order in $\theta$, and obtain

\[ E(\delta\theta\,\delta\theta^T) = \sigma^2 (2\mathbf{H})^{-1}, \tag{14} \]

where $\mathbf{H}$ is the Hessian, $\sigma^2 = \epsilon/(N - m)$ is the unbiased variance in the measurements, and $N - m$ is the number of independent dimensions, $N$ and $m$ being the number of samples and the number of fitted parameters, respectively. It is known that $\sigma^2$ is $\chi^2$ distributed with $N - m$ degrees of freedom (Bevington 1969). For a confidence level of 95%, it can be shown that the quoted limits $\theta_0 \pm \delta\theta$ would be

\[ \delta\theta_i = \pm 1.96 \sqrt{ \frac{\epsilon \, \left( (2\mathbf{H})^{-1} \right)_{ii}}{N - m} }, \tag{15} \]

where the diagonal elements of the covariance matrix of Eq. 14 give the spread in the parameter values². However, the confidence limits computed above will often not be accurate. For very nonlinear models, the approximation in Eqn 14 will not hold true, and higher order terms in the Taylor series will be required.
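Eq. 15 is straightforward to evaluate once the diagonal of $(2\mathbf{H})^{-1}$ is available. The following Python sketch shows the arithmetic; the function name and argument names are hypothetical, and the inversion of the Hessian is assumed to have been carried out elsewhere.

```python
import math

def confidence_half_widths(eps, inv_2h_diag, n_samples, n_params, z=1.96):
    """Half-widths of the confidence intervals of Eq. 15.

    eps          sum of squares at the optimum
    inv_2h_diag  diagonal elements of (2H)^-1, one per fitted parameter
    n_samples    number of samples N
    n_params     number of fitted parameters m
    z            1.96 corresponds to the 95% level for Gaussian errors
    """
    # Unbiased estimate of the measurement variance.
    sigma2 = eps / (n_samples - n_params)
    return [z * math.sqrt(sigma2 * c) for c in inv_2h_diag]
```

For example, with a sum of squares of 2.0, N = 102 and m = 2, the variance estimate is 0.02, so a diagonal element of 0.5 gives a half-width of 1.96 × 0.1 = 0.196.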
The analysis above assumes that the noise is Gaussian, which is in most cases a reasonable assumption; for non-Gaussian noise, we have to resort to other means to compute the confidence limits. Monte Carlo simulations are a natural way to generate these confidence limits (Press et al 1992; Rawlings and Ekerdt 2002). Though computationally expensive, they have an easy interpretation, and with current computational resources this is not a major issue. We proceed as follows: we select an initial parameter guess and fit the model to the data, producing a new set of parameters for which the fit seems to be good. We now wish to study the distribution in parameter space that would result had we fitted similar data, starting from the same initial guess. If we had knowledge about the probability distribution of the residuals, i.e. the difference between the model and the data, we could create additional data sets by adding noise drawn from that distribution. We then rerun the optimization algorithm with the same initial guess for all these additional data sets. Thus we can obtain a distribution of parameter values.

We first discuss an example with simulated data. We consider a linear chain of irreversible unimolecular reactions with mass-action kinetics,

\[ A \to B \to C \to D \to E \to F. \tag{16} \]

Footnote 2: For the case where the measurement noise is Gaussian and the model is linear, it can be shown that the minimum variance of a parameter estimate is given by the inverse of the Fisher Information matrix (which is the Hessian). The greater the information, the smaller the uncertainty in the parameter, implying tighter confidence bounds (Spall 2003).

Figure 6: Plot of the simulated noisy time series concentration data (dotted lines) and the fitted curves (solid lines) for a linear sequence of reactions governed by irreversible mass-action kinetics. (Axes: metabolite concentrations versus time in s.)
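The noise-free time courses of the chain in Eq. 16 can be generated with a few lines of Python. The sketch below uses a fixed-step fourth-order Runge-Kutta integrator, which is an assumption made for illustration and not necessarily the integrator used by the simulation engine; the function names are likewise hypothetical.

```python
import math

def chain_rhs(conc, k):
    """dS/dt for a linear chain with irreversible mass-action steps.

    Step j converts species j into species j+1 at rate k[j] * conc[j].
    """
    flux = [k[j] * conc[j] for j in range(len(k))]
    rhs = [0.0] * len(conc)
    for j, v in enumerate(flux):
        rhs[j] -= v
        rhs[j + 1] += v
    return rhs

def rk4_step(conc, k, dt):
    """Advance the concentrations by one Runge-Kutta step of size dt."""
    def shifted(base, slope, h):
        return [b + h * s for b, s in zip(base, slope)]

    s1 = chain_rhs(conc, k)
    s2 = chain_rhs(shifted(conc, s1, dt / 2.0), k)
    s3 = chain_rhs(shifted(conc, s2, dt / 2.0), k)
    s4 = chain_rhs(shifted(conc, s3, dt), k)
    return [c + dt / 6.0 * (a + 2.0 * b + 2.0 * d + e)
            for c, a, b, d, e in zip(conc, s1, s2, s3, s4)]

def simulate(k, conc0, t_end, n_steps):
    """Return the list of concentration vectors at n_steps + 1 time points."""
    dt = t_end / n_steps
    trajectory = [list(conc0)]
    for _ in range(n_steps):
        trajectory.append(rk4_step(trajectory[-1], k, dt))
    return trajectory
```

With all five rate constants set to 2 and only the first species present initially at a concentration of 10, the first species decays as 10 exp(-2t) and the total concentration is conserved, which provides a simple check on the integration.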
The noisy concentration data for the six metabolites, displayed in Fig 6, were simulated with all the kinetic rate coefficients set to 2, the initial concentration of the first substrate set to 10, and all the others set to zero. The data in Fig 6 have 100 points. The noise was assumed to be exponentially distributed and was added to the simulated curves, which were then presented to the optimizer as the data for fitting the model. We used the simplex (the Levenberg-Marquardt gives similar results) to fit the parameters to the data. The parameters were the five rate constants, which were initialized to 0.1. The fit is shown in the same figure (Fig 6). We then ran a Monte Carlo simulation by generating data sets (exponentially distributed noise was added to the initial best fit), and for each such data set the parameters were optimized to fit the data. Confidence limits were obtained using Eqn 15, which gave the values in Table 2 for the parameters.

Parameter   Estimate ± 95% limit
J0_k        2.16 ± 0.05
J1_k        1.99 ± 0.042
J2_k        1.99 ± 0.083
J3_k        1.96 ± 0.183
J4_k        1.94 ± 0.199

Table 2: The table shows 95% confidence limits for the estimated parameters based on equation (15).

In Fig 7 we show cluster plots for the distribution in the various parameters.

Figure 7: Cluster plots for the distribution of fitted parameters for the Monte Carlo simulation, for various parameter combinations.

The confidence limits can also be evaluated by choosing limits around the mean values of the parameters such that 95% of the points fall between them. The confidence limits set this way agree well with those computed using Eqn 15. In addition the Hessian was found to be well behaved and no significant correlation was found between the parameters (Fig 7).

In the second example we fit rate constants to data obtained from a model for irreversible inhibition of HIV proteinase (Kuzmic 1996, DynaFit 1991). The data that we analysed were obtained from (DynaFit 1991) and comprise two time courses at different inhibitor concentrations. The parameters to be optimized were five rate constants, and we used the initial guess values for the parameters and the initial substrate concentrations as described in (Kuzmic 1996)³. The model is described by the following reactions,

\[ \begin{aligned} M + M &\underset{k_d}{\overset{k_a}{\rightleftharpoons}} E \\ E + S &\underset{k_s}{\overset{k_{on}}{\rightleftharpoons}} ES \\ ES &\overset{k_r}{\longrightarrow} E + P \\ E + P &\underset{k_p}{\overset{k_{on}}{\rightleftharpoons}} EP \\ E + I &\underset{k_i}{\overset{k_{on}}{\rightleftharpoons}} EI \\ EI &\overset{k_{de}}{\longrightarrow} EJ \end{aligned} \tag{17} \]

In Fig 8 we display the two data sets along with the fits.

Figure 8: The plot showing the two time series concentration data along with the fitted curves. (Axes: concentration of P versus time in s.)

We ran several combinations of the optimizers; the simulated annealing and GA-simplex took the longest time, but of the two, the latter gave better results. We also considered running the GA followed by the simplex, and this too was very successful. As can be seen in Fig 8, the fits seem quite satisfactory. Using Eqn 15 to compute the parameter confidence limits, we get the values in Table 3.

Parameter   Estimate ± 95% limit
ks          250.9225 ± 0.69
kr          0.199 ± 0.266
kp          35.04 ± 109.9
ki          0.0469 ± 7.46
kde         0.2292 ± 3.76

Table 3: The table shows 95% confidence limits for the estimated parameters using the Hessian computed at the end of the optimization.

The confidence limits for the second to the fifth parameters are substantially greater than the mean values, which means that we cannot trust these numbers (Muller et al 2002). We performed a Monte Carlo simulation by generating 500 data sets and obtaining statistics for the parameter values. Each data set uses the simulated time series obtained after the first fitting procedure, to which residual noise is added. Since the nature of the noise is not known (there could be several kinds of measurement errors, for which it may be difficult to model a probability distribution), we use the bootstrap technique (Efron and Tibshirani 1986). The basic idea of the bootstrap is to create, from the only available realization of the noise time series, several other residual series. This is achieved by sampling with uniform probability from the residual noise, all N components, allowing for replacement. Replacement is allowed since it is assumed that each component of the residual noise series is an independent and identically distributed random number.

Footnote 3: In the example described above we chose the 4th and 5th data sets (Kuzmic 1996, curves D, E of Fig 1, pg 264), and used initial values of kon = 100, kd = 0.0001, ka = 0.1, I = 0.004, E = 0.004, and S = 27. In Kuzmic 1996, some of the species levels are also optimized, but here we are more interested in fitting the parameters.

Parameter   95% limit
ks          ±0.31
kr          ±0.006
kp          ±0.37
ki          ±0.01
kde         ±0.11

Table 4: The table shows 95% confidence limits for the estimated parameters, using Monte Carlo simulation.

In Fig 9 we display the cluster plots generated from 500 data sets for some combinations of the parameters. From the cluster plots, we can establish simple 95% confidence limits as shown in Table 4. These limits differ significantly from the limits computed using the Hessian (see Table 3). Although computationally intensive, the Monte Carlo simulation is necessary as a check on the confidence interval estimates. As indicated by the arrows in Fig 9, there is significant correlation between these parameters. This implies that certain combinations of parameters could change without having any effect on the concentration time series. This is also known as the observability problem.
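The bootstrap resampling of the residuals described above can be sketched in Python as follows. The function names are hypothetical, and the percentile rule used in `percentile_limits` to turn a set of fitted values into a 95% interval is an assumption of the sketch; refitting the model to each generated data set is left to whichever optimizer is in use.

```python
import random

def bootstrap_datasets(fitted, residuals, n_sets, rng):
    """Create synthetic data sets from one fitted curve and its residuals.

    Each synthetic set is the fitted time series plus residuals drawn
    uniformly, with replacement, from the single observed residual series.
    """
    n = len(residuals)
    datasets = []
    for _ in range(n_sets):
        resampled = [residuals[rng.randrange(n)] for _ in range(n)]
        datasets.append([f + r for f, r in zip(fitted, resampled)])
    return datasets

def percentile_limits(values, level=0.95):
    """Interval containing roughly the central `level` fraction of values."""
    ordered = sorted(values)
    n = len(ordered)
    # Number of points discarded from each tail.
    k = int(n * (1.0 - level) / 2.0)
    return ordered[k], ordered[n - 1 - k]
```

Each synthetic data set would be passed to the optimizer with the same initial guess, and `percentile_limits` applied to the resulting values of each parameter in turn.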
As an example of the observability problem, consider a simple Michaelis-Menten set of reactions,

$$
S + E \;\underset{k_{-1}}{\overset{k_1}{\rightleftharpoons}}\; ES \;\overset{k_2}{\longrightarrow}\; P + E, \tag{18}
$$

where k1 and k−1 are the rate constants of the reversible step between the substrate and the complex, and k2 is the forward rate constant between complex and product. In terms of these basic mass action reactions we derive the Michaelis-Menten rate by assuming that the complex forms after a very brief transient and thereafter remains constant. Under these assumptions, the rate between S and P is

$$
\frac{V_{max}\, S}{K_m + S}, \tag{19}
$$

where Vmax = k2 e0, e0 being the enzyme concentration, and Km = (k−1 + k2)/k1. Notice that if k2 is kept constant but k1 and k−1 are changed such that Km remains the same, then the net rate does not change. This follows because the rate depends on k1 and k−1 only through Km. Hence, presented with time series data from such a simple model, we will notice a correlation between the spread in the k1 and k−1 values. This is generally true for larger models, but it may not be possible to find simple combinations of these parameters which are truly independent.

Figure 9: Cluster plots for the parameter distributions showing significant correlations for some combinations. (Panels: kde against kr, kde against ki, kp against ks, ki against kr.)

The important point is that the cluster plots show observability of the parameters. It is then a simple step to quantify this, by observing that the 2-D cluster plots are sections of the m × m (where m is the number of parameters) probability distribution of the fitted parameters. The eigenvectors of the Hessian (the inverse of the covariance matrix) corresponding to the lowest eigenvalues are directions along which the parameters can change without any significant change in the sum of squares. The eigenvalues of the Hessian are 665, 0.51, 0.0076, 10^−5 and 10^−5.
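The link between the Michaelis-Menten example and the Hessian eigenstructure can be checked numerically. The following is a minimal sketch under stated assumptions, not the toolkit's implementation: it uses the rate law of Eqn. 19 directly rather than simulated time courses, approximates the Hessian by J^T J (J being the finite-difference sensitivity of the rate to k1, k−1 and k2), and the function names and parameter values are hypothetical.

```python
import numpy as np


def mm_rate(S, k1, k_minus1, k2, e0=1.0):
    """Michaelis-Menten rate Vmax*S/(Km+S) written in terms of k1, k-1, k2."""
    Km = (k_minus1 + k2) / k1
    return k2 * e0 * S / (Km + S)


def gauss_newton_hessian(S, params, rel_step=1e-6):
    """Approximate the sum-of-squares Hessian by J^T J (central differences)."""
    params = np.asarray(params, dtype=float)
    J = np.empty((S.size, params.size))
    for j in range(params.size):
        h = rel_step * params[j]
        up, down = params.copy(), params.copy()
        up[j] += h
        down[j] -= h
        J[:, j] = (mm_rate(S, *up) - mm_rate(S, *down)) / (2.0 * h)
    return J.T @ J


def weakest_direction(S, params):
    """Smallest eigenvalue (relative to the largest) and its eigenvector."""
    # eigh returns eigenvalues in ascending order for a symmetric matrix.
    eigvals, eigvecs = np.linalg.eigh(gauss_newton_hessian(S, params))
    return eigvals[0] / eigvals[-1], eigvecs[:, 0]


if __name__ == "__main__":
    S = np.linspace(0.1, 10.0, 50)        # assumed substrate levels
    ratio, direction = weakest_direction(S, [2.0, 1.0, 3.0])  # k1, k-1, k2
    print("relative smallest eigenvalue:", ratio)
    print("corresponding eigenvector:", direction)
```

For the assumed values k1 = 2, k−1 = 1, k2 = 3 (so Km = 2), the eigenvector of the near-zero eigenvalue is proportional to (1, 2, 0): k1 and k−1 can change together at fixed Km and k2 without altering the rate, which is the unobservable combination discussed above.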
It is interesting to study the eigenvectors corresponding to the two lowest eigenvalues. Written as columns, they are

$$
\begin{bmatrix}
-0.0019 & 0.0777 \\
-0.0024 & -0.0028 \\
0.9999 & 0.0043 \\
-0.0102 & 0.9006 \\
0.0118 & 0.4277
\end{bmatrix}
\tag{20}
$$

Notice that the first eigenvector implies that there is freedom to change kp, and the second that there is freedom to change the combination of ki and kde, which is seen to be correlated in the second subplot in Fig 9. Calculating the Hessian is useful since its eigenstructure can often be used to study degeneracy in the model.

Software Implementation

The core optimization methods described in the previous section were implemented using Matlab. In order to make the routines more accessible and flexible we also provided a software link to the Systems Biology Workbench (SBW) (Sauro et al. 2003) and hence also to BioSPICE.

SBW is a software infrastructure that permits applications written in different languages to communicate with each other. SBW uses fast binary transfer of data and remote procedure calls via sockets. A number of language bindings exist, including C/C++, Java, Delphi, Python, Perl and Matlab. The Matlab bindings in particular permit other software applications to access Matlab code. Matlab functions are compiled into software libraries (DLL or .so files) using the Matlab C compiler. These libraries are then used by an SBW Matlab chaperon application which makes and manages the connection between the Matlab functions and SBW. The advantage of this approach is that a client need not have Matlab installed in order for this to work; they merely need to install the SBW Matlab chaperon (available at www.sys-bio.org).

There are a growing number of other modules which can connect to SBW. Of particular interest here are model editing tools such as JDesigner (Sauro et al. 2003) and simulation engines such as Jarnac (Sauro 2000). In addition to these, we have developed two further modules, an optimization GUI controller and a data manager.
All five modules form a complete optimization toolkit. Note that both the model editor and the simulation engine are pluggable modules and can be replaced with other equivalent modules. The purpose of SBW is to encourage the development of a range of reusable modules which can be plugged together to form more complex analysis suites. The physical and logical arrangements of the toolkit are illustrated in Fig 10 and Fig 11, respectively.

Figure 10: Physical Relationship of Modules to SBW. (Diagram labels: Model Editor, SBW, Optimization Methods, Optimization Controller, Data Manager, Simulator.)

Figure 11: Logical Layout of Modules. (Diagram labels: Model Editor, Simulator, Optimization Controller, Experimental Data, Optimization Methods, Data Manager.)

Figure 12: Front screen shot of the optimizer controller showing the options to choose parameter initial guesses, the optimizer, configuration for the optimizers (i.e. setting up the optimization parameters) and the graphical display of the sum of squares.

The central focus of the toolkit for the user is the optimization controller shown in Fig 12. The function of the optimization controller is to manage the model and experimental data and to initiate the optimization process. Each optimizer is coded in a separate Matlab library, of which five have been developed as described in the previous section. One feature of SBW is that it permits similar modules to be categorized into function groups. This permits another module to load all modules of the same category regardless of how the modules are implemented. The optimizer modules represent such a grouping: each optimization module has the same interface but implements a different optimization method. However, this does not address the problem that each optimizer has a different set of options with which to tailor a particular method.
As a result the optimizer modules can, when called with a particular option, return to the caller an XML string which specifies what kind of optimizer is implemented and what options are available for the user to adjust. This information is used by the optimizer controller to build a GUI form at runtime, from which the user can adjust any optimization options prior to calling the optimizer itself. The format of the XML string and the interface are described in the documentation which accompanies the optimization software.

The operation of the optimizer involves three stages: loading the model from the current model editor, loading the experimental data which will be used to fit the model, and finally selecting the optimizer method and initializing the parameter values. The user may also decide to fit only a subset of parameters. Once these tasks have been completed, the optimization is started. A graph panel on the controller is updated in real time to indicate the progress of the optimization. Finally, when an optimum has been located, the user may plot the fitted model against the experimental data to compare the fit (see Fig 13). The standard error for each parameter as defined by the Hessian is also returned, as well as the number of iterations carried out and the number of simulations that were necessary to compute the final parameter set. If the model is highly nonlinear the user may also choose to carry out a Monte Carlo simulation and optimization to gain better estimates for the uncertainty in the estimated parameters and to determine the observability of the parameters (see Fig. 9).

Figure 13: Background shot of the optimizer controller showing the experimental data loader and graphical interface, where both the experimental and fitted plots can be displayed.

Documentation and a tutorial on how to use the optimization kit are provided in the download at www.sys-bio.org.
4 Summary

We have developed an optimization module for the BioSPICE project, which fits kinetic rate constants to a model for which time series concentration data is available. The module uses several local and global search algorithms, namely Levenberg-Marquardt, simplex, simulated annealing and real coded genetic algorithms, as well as a hybrid genetic algorithm-simplex search. We discussed the individual algorithms and the software components from which the module was constructed. We commented on the issue of confidence limits for the fitted parameters, and the use of Monte Carlo simulations to obtain them using the bootstrap method. Our experience with the various optimization methods suggests that the simplex is very robust and fast, and should be the first choice for fitting models. However, should it fail, we recommend the hybrid GA-simplex method. In future, we plan to implement other algorithms such as evolutionary algorithms, since some recent work has shown that these are very successful at finding the optimum for large models. We also plan to generalize our optimization objectives, so that arbitrary objective functions can be specified, such as flux maximization through pathways and stability of the network.

5 Acknowledgements

Support for VSC, CW and HMS was received via a grant awarded from the DARPA/IPTO BioCOMP program, contract number MIPR #03-M296-01. HMS received additional support from the Keck Graduate Institute. We wish to acknowledge the BioSPICE team at SRI and Berkeley for their invaluable assistance in enabling BioSPICE/SBW integration.

References

Bevington, P. R., 1969. Data Reduction and Error Analysis for the Physical Sciences. McGraw-Hill.
Covert, M. W. et al., 2001. Metabolic modeling of microbial strains in silico. Trends in Biochemical Sciences, 26:179-186.
DynaFit 1991. www.biokin.com/dynafit/index.html
Efron, B., Tibshirani, R., 1986.
Bootstrap methods for standard errors, confidence intervals, and other measures of statistical accuracy. Statistical Science, 1:54-77.
Goryanin, I., Hodgman, T. C., Selkov, E., 1999. Mathematical simulation and analysis of cellular metabolism and regulation. Bioinformatics, 15:749-758.
Ewens, W. J., Grant, G. R., 2001. Statistical Methods in Bioinformatics: An Introduction. Springer-Verlag, New York.
Goldberg, D. E., 1989. Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-Wesley.
Herrera, F., Lozano, M., Verdegay, J. L., 1998. Tackling real-coded genetic algorithms: operators and tools for behavioural analysis. Artificial Intelligence Review, 12:265-319.
Kirkpatrick, S., Gelatt, C. D., Vecchi, M. P., 1983. Optimization by simulated annealing. Science, 220:671-680.
Kuzmic, P., 1996. Program DynaFit for the analysis of enzyme kinetic data: application to HIV proteinase. Analytical Biochemistry, 237:260-273.
Marquardt, D. W., 1963. An algorithm for least-squares estimation of nonlinear parameters. Journal of the Society for Industrial and Applied Mathematics, 11:431-441.
Mendes, P., 1993. Gepasi: a software package for modelling the dynamics, steady states and control of biochemical and other systems. Comput. Appl. Biosci., 9:563-571.
Mendes, P., Kell, D., 1998. Nonlinear optimization of biochemical pathways: applications to metabolic engineering and parameter estimation. Bioinformatics, 14:869-883.
Moles, C. G., Mendes, P., Banga, J. R., 2003. Parameter estimation in biochemical pathways: a comparison of global optimization methods. Genome Research, 13:2467-2474.
Muller, T. G., Noykova, N., Gyllenberg, M., Timmer, J., 2002. Parameter identification in dynamical models of anaerobic waste water treatment. Mathematical Biosciences, 177-178:147-160.
Nelder, J. A., Mead, R., 1965. A simplex method for function minimization. Comput. J., 7:308-313.
Papin, J. A. et al., 2003. Metabolic pathways in the post-genome era. Trends in Biochemical Sciences, 28:250-258.
Press, W., Teukolsky, S., Vetterling, W. T., Flannery, B. P., 1992. Numerical Recipes in C. Cambridge University Press.
Sauro, H. M., 2000. JARNAC: a system for interactive metabolic analysis. In Animating the Cellular Map (Hofmeyr, J.-H. S., Rohwer, J. M., Snoep, J. L., eds), pp. 221-228. Stellenbosch University Press, Stellenbosch.
Sauro, H. M. et al., 2003. Next generation simulation tools: the Systems Biology Workbench and BioSPICE integration. OMICS, 7:355-371.
Stortelder, W. J. H., 1996. Parameter estimation in chemical engineering: a case study for resin production. Technical report, CWI, Amsterdam, Netherlands. NM-R9610; ISSN 0619-0388:1-16.
Spall, J. C., 2003. Monte-Carlo based computation of the Fisher information matrix in nonstandard settings. Proc. Amer. Control Conf., Denver, Colorado, June 4-6, 3797-3802.
Rawlings, J. B., Ekerdt, J. S., 2002. Chemical Reactor Analysis and Design Fundamentals. Nob Hill Publishing, Madison, Wisconsin.
Torres, F. M., Agichtein, E., Grinberg, L., Yu, G., Topper, R. Q., 1997. A note on the application of the Boltzmann simplex-simulated annealing algorithm to global optimization of argon and water clusters. J. Mol. Struc., 419:85-95.
Voit, E. O., 2000. Computational Analysis of Biochemical Systems. Cambridge University Press.
Yoshimura, J., Shimonobou, T., Sekiguchi, T., Okamoto, M., 2003. Development of the parameter-fitting module for web-based biochemical reactor simulator Best-Kit. Chem-Bio Informatics Journal, 3:114-129.

[Figure pages appended to the manuscript; only labels are recoverable. Schematics: "Far from the minimum use gradient search", "Global minimum", "Close to the minimum use quadratic approximation"; "Reflection", "Expansion", "Contraction"; "System can jump to higher energy states and escape local minima"; "Genetic Algorithm" and "Hybrid Genetic Algorithm" (tournament selection, cross over, mutation, select family, local search). Plots: "Time series for data/simulation" (Concentration against Time s); "Parameter Fits" (Metabolite Concentrations against Time s; dotted lines data, solid lines simulation); cluster plots of J0_k to J4_k. The remaining pages repeat Figures 8 to 11.]