Systems & Control Letters 58 (2009) 529–539
journal homepage: www.elsevier.com/locate/sysconle

Identification and validation of quasispecies models for biological systems

Paola Falugi a, Laura Giarré b,∗
a DEEE, Imperial College, London, UK
b DIAS, Università di Palermo, Palermo, Italy

Article info
Article history: Received 27 May 2008; received in revised form 23 February 2009; accepted 10 March 2009; available online 8 April 2009.
Keywords: Systems biology; Identification; Validation; Set membership

Abstract
An identification procedure for biological systems cast as quasispecies models is proposed. Their identification is a challenging problem because of the bilinear dependence on the parameters and their physical constraints. The proposed solution is within the framework of set membership identification. We determine an estimate of the model parameters together with their interval of variability (Uncertainty Intervals), taking into account all the physical constraints. Invalidation/validation is performed on the basis of the predictive capability of the estimated models. The effectiveness of the proposed procedure has been illustrated by means of simulation experiments.
© 2009 Elsevier B.V. All rights reserved.

1. Introduction

The complexity of biology calls for quantitative tools in order to support and validate biologists' intuition and more traditional qualitative descriptions. It has been shown that the evolution of molecules can be described in terms of chemical reaction networks. A chemical reaction network is a collection of chemical species together with a list of reactions. Such a process can be approximated by a continuous-time deterministic model in the form of an ordinary differential equation. Under the assumption of so-called mass-action kinetics, these reactions naturally lead to quadratic differential equations.
∗ Corresponding author. E-mail addresses: paola.falugi@gmail.com (P. Falugi), giarre@ieee.org, giarre@unipa.it (L. Giarré).
0167-6911/$ – see front matter. doi:10.1016/j.sysconle.2009.03.005

In this paper we consider the well-known quasispecies models of evolutionary dynamics [1–4], which have been used in different contexts, such as population genetics [3,5] (see also [6], where the quasispecies models have been used to predict the existence of genetic chromosomal instability in relation to cancer) and autocatalytic reaction networks [1,7]. These models, which arise in the biological sciences, are, from a mathematical point of view, nonlinear positive systems of differential equations subject to a global conservation constraint. Positive nonlinear systems satisfy some properties and constraints: if the initial conditions are positive, the state evolution takes place in the positive orthant. From a practical point of view, the importance of positive systems lies in the fact that many real systems are intrinsically described by means of positive quantities; for instance, most biological systems can be modeled in this way.

We propose a method for the identification of the fitness and the replication probability parameters of a genetic sequence, subject to a set of stringent constraints that give them physical meaning and guarantee positiveness. The aim is to estimate such models from collected data, taking into account structural and experimental identifiability. Moreover, the principal goal is to evaluate the quality of the estimated quasispecies models in the framework of set membership identification [8,9], by taking into account physical constraints on the variables, expressed in terms of model parameter relations. We recall that any model needs to be validated, i.e. examined to see whether or not it is good enough for its intended purpose. The data can sometimes invalidate the model.
In our context this corresponds to checking whether a feasible set is empty. When this set is empty, some of the a priori information is wrong, or some of the assumptions on the system are invalidated by the data. Only if the data do not invalidate the set is it possible to evaluate the quality of the identified model. The quality depends on the purpose of the model and can be measured according to it. The behavior of the system and that of the identified model need to be compared. The comparison is based on some appropriate output features. Any mismatch between system and model output should be analyzed, keeping in mind that a model is an approximation of the system [10]. Differently from validation in the control area, in systems biology validation consists in confirming that all the physical mechanisms involved have been characterized. In this paper we present only in silico experimental tests. We must point out that in many works the quasispecies models have been used starting from in vitro data, and we recall hereafter some of them [11–14]. The importance of the quasispecies models in connection with real experiments can be related, for example, to the characterization of HIV dynamics. It is then important to perform a parameter estimation, which is otherwise hard for actual biological systems, in order to develop models for prediction purposes.

The paper is structured as follows. In Section 2, we explain in detail the considered biological nonlinear systems modeled as quasispecies, recast in a concise way such that the properties and the constraints can be easily handled. In Section 3, we introduce and give a solution to the identification and validation problems. The procedure to estimate the model parameters and their uncertainty intervals is determined, together with the inversion procedure to get the physical parameters. The model quality evaluation and the validation/invalidation procedure are then presented.
In Section 4 two simulation examples are described. Finally, in Section 5, some concluding remarks are outlined. In the Appendix the algorithms solving the problems are summarized.

2. Quasispecies models

The Quasispecies Equation (see [15]) describes the dynamics of the concentrations $x_i$ of self-replicative components, with $i \in \{1, 2, \dots, n\}$:

$$\dot{x}_i = \sum_{j=1}^{n} x_j f_j q_{ij} - \Big(\sum_{j} f_j x_j\Big) x_i \tag{1}$$

where $n \in \mathbb{N}$ denotes the number of species. The model parameters are $f_i$, the replication rate of the $i$th component, also called the fitness, and $q_{ij}$, the probability that replication of component $j$ gives component $i$ as offspring. Since the quantities $x_i$ are relative concentrations, the second term in Eq. (1) has been introduced to ensure the total normalization of the concentration, $\sum_{i=1}^{n} x_i = 1$. Notice that the model exhibits a bilinear dependence on the parameters $f_i$ and $q_{ij}$. In the literature the quasispecies model has been used extensively to describe the evolution of certain self-replicating entities in different contexts:

(i) Population genetics [3,5], where $x_i$ denotes the relative frequencies of alleles at the time of mating.
(ii) Autocatalytic reaction networks [1,7], where $x_i$ are the concentrations of molecules, RNA or DNA, which are capable of self-replication.
(iii) Population language learning, see [16,17]. Here $x_i$ is the relative abundance of individuals who use a specific grammar.

Using vector notation, the quasispecies model can be recast in the following form

$$\dot{x} = \big(Q\,\mathrm{diag}(f) - (f^T x) I\big)x \triangleq h(x) \tag{2}$$

where the state variable is $x = [x_1 \dots x_n]^T$, the fitness vector is $f = [f_1\ f_2 \dots f_n]^T$ and the mutation matrix is $Q = [q_{ij}]$ (see [18]). Notice that for perfect copying accuracy $Q$ equals the identity matrix. Mutations give rise to the off-diagonal elements of $Q$.

2.1. Model parameterization

Due to the physical meaning of the biological system under consideration, the model parameters satisfy the following conditions:

A.1 $q_{ij} \in [0, 1]$
A.2 $f_i > 0$ for any $i$
A.3 $\sum_{i=1}^{n} q_{ij} = 1$ for any $j$. (3)

Notice that the trajectories generated by Eq. (2), under assumptions A.1, A.2, A.3, need to enjoy the following properties, which are consistent with physical intuition:

P.1 Whenever $\sum_i x_i(0) = 1$, then $\sum_i x_i(t) = 1$, $\forall t > 0$.
P.2 The system is positive: starting from any initial condition $x(0) \ge 0$, the corresponding solution satisfies $x(t) \ge 0$ $\forall t > 0$.

Properties P.1, P.2 mean that the unitary simplex $\Sigma = \{x : \mathbf{1}^T x = 1,\ x_i \ge 0\}$, where $\mathbf{1}^T$ denotes a row vector of ones, is invariant. The following Lemma establishes a useful connection between the assumptions on the model parameters and the properties satisfied by the trajectories.

Lemma 1. Under the assumptions A.1, A.2 the unitary simplex $\Sigma$ is positively invariant if and only if $\mathbf{1}^T Q = \mathbf{1}^T$.

Proof. Let us assume that $\Sigma$ is positively invariant. The property $\mathbf{1}^T x(t) = 1$ $\forall t \ge 0$ implies $\mathbf{1}^T \dot{x} = 0$ $\forall t \ge 0$. Hence, by (2) and exploiting $\mathbf{1}^T x(t) = 1$, we obtain:

$$\sum_{i=1}^{n} \dot{x}_i = \sum_{j=1}^{n} f_j x_j \sum_{i=1}^{n} q_{ij} - \sum_{j=1}^{n} f_j x_j \sum_{i=1}^{n} x_i = \sum_{j=1}^{n} f_j x_j \Big(\sum_{i=1}^{n} q_{ij} - 1\Big) = 0, \quad \forall t \ge 0,\ \forall x \in \Sigma. \tag{4}$$

Under assumptions A.1 and A.2, recalling in particular that $f_j x_j > 0$ for some $x \in \Sigma$, the equality in (4) implies $\mathbf{1}^T Q = \mathbf{1}^T$ since $x \in \Sigma$ is arbitrary. Conversely, $\mathbf{1}^T Q = \mathbf{1}^T$ implies $\mathbf{1}^T \dot{x} = 0$, $\forall x \in \Sigma$. Under assumptions A.1, A.2, notice that from Eq. (2) the property P.2 holds. Then, since $\sum_{i=1}^{n} \dot{x}_i = 0$ and $x(t) \ge 0$ for all $t > 0$, whenever $x(0) \in \Sigma$, $x(t) \in \Sigma$ for all $t > 0$.

3. Identification and validation

Despite the fact that identification methodologies are well established in many application fields, their use in the parameter estimation of evolutionary systems is quite rare. Most of the literature in this area deals with the modeling of the systems, without a rigorous validation and/or data-based parameter estimation.
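As a numerical illustration of model (2) and of the invariance stated in Lemma 1 (Section 2), the following minimal Python sketch integrates the quasispecies equation with a forward-Euler scheme and checks that the state stays on the simplex. The three-species values of `f`, `Q` and `x0`, the step `dt`, and the helper names are assumptions made only for this illustration; they are not taken from the paper's experiments.

```python
import numpy as np

def quasispecies_rhs(x, Q, f):
    """Right-hand side h(x) = (Q diag(f) - (f^T x) I) x of model (2)."""
    return Q @ (f * x) - (f @ x) * x

def simulate(x0, Q, f, dt=1e-3, steps=5000):
    """Forward-Euler integration of model (2); returns the whole trajectory."""
    traj = np.empty((steps + 1, x0.size))
    traj[0] = x0
    for k in range(steps):
        traj[k + 1] = traj[k] + dt * quasispecies_rhs(traj[k], Q, f)
    return traj

# Hypothetical 3-species example (illustrative values only).
f = np.array([1.0, 0.8, 0.5])        # fitness values, A.2: f_i > 0
Q = np.array([[0.90, 0.05, 0.10],    # entries in [0, 1] (A.1),
              [0.05, 0.90, 0.10],    # each column sums to one (A.3)
              [0.05, 0.05, 0.80]])
x0 = np.array([0.2, 0.3, 0.5])       # initial condition on the simplex

traj = simulate(x0, Q, f)
sum_dev = np.max(np.abs(traj.sum(axis=1) - 1.0))  # deviation from P.1
min_state = traj.min()                            # P.2 requires this to be >= 0
print(f"max |sum(x) - 1| = {sum_dev:.2e}, min state = {min_state:.3f}")
```

With a column-stochastic `Q` the sum of the states is preserved up to rounding, as Lemma 1 predicts; replacing `Q` with a matrix whose columns do not sum to one makes `sum_dev` grow along the trajectory.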
In some applications, the solution is based on statistical approaches like the Maximum Likelihood Principle (see [19] for a recent survey), or on parameter estimation based on time series (e.g. [20]). On the other hand, if we consider the literature on the identification of positive systems, there are some results for linear systems. In this case the positive systems are compartmental. Some results are based on statistical approaches [21], with solutions based on the Maximum Likelihood Principle; other results are based on the interval literature [22]. For nonlinear positive systems (that are not compartmental), not much can be found in the identification literature. Almost all these contributions assume a statistical description of the noise and are mainly devoted to point estimation, while little attention has been devoted to the computation of confidence regions for the parameter estimates, although they are important for the assessment of the model quality. Conversely, the assumption of Unknown But Bounded (U.B.B.) noise (see [8] for an extensive survey) naturally raises the issue of computing the Feasible Parameter Set (FPS), which provides the uncertainty regions for the parameter estimates. Notice that the exact computation of the FPS for nonlinear systems is in general a difficult task.

Model validation is a classical problem in control theory and identification. In particular, [23] has shown that, for some experimental data, it is not possible to confirm whether the model is really valid; however, one can conclude whether the model is not contradicted by the given data. Model (in)validation tests are usually based on the difference between the simulated and measured output and on some statistics about these differences. Many statistical [24] and deterministic methods [23] have been studied for invalidating models; see [25] for nonlinear systems.
Recently, some validation methodologies have been applied with success to biological systems [26], where the predictive capability of the model has been used to validate the model. Along the lines of [9,27], we first validate the a priori assumptions on the system, and then, if the actual data explain the system, we check the quality of the identified model. This quality evaluation is performed on the basis of the prediction error. Roughly speaking, letting $S$ be the generic system to be identified, we define $K$ as the set of all the feasible systems subject to the constraints

$$K = \{S : \text{P.1, P.2, A.1 and A.2 are satisfied}\}. \tag{5}$$

This set of constraints represents the a priori information on the system. Assuming as a priori information on the error term that the bounded error noise $e$ affecting the system belongs to a given set, $e \in B$, it is possible to define the Feasible System Set as

$$FSS = \{S \in K : Y = F(S) + e \text{ with } e \in B\}, \tag{6}$$

where $F(\cdot)$ is some map which captures the dependence of the measured output $Y$ on the system $S$. The a priori information is considered validated if the set $FSS$ is nonempty. Note that the fact that the a priori information is consistent with the present data does not exclude that it may be invalidated by future data. If the $FSS$ is empty, the prior assumptions on system and noise are invalidated by the data and need to be changed. If the $FSS$ is not empty, an infinite number of models belonging to the $FSS$ can be estimated. Among them, a model quality evaluation based on the predictive capability of the model needs to be performed.

3.1. A priori information on the noise

The model in (2) is formed by $n = 2^{\nu}$ continuous-time differential equations. Assuming that we discretize it with a standard first-order Euler approximation with sampling time $T$, the discrete-time model can be described by

$$\begin{aligned} x_i(t+1) &= g_i(x(t)) + v_i(t) \\ y_i(t) &= x_i(t) + d_i(t) \end{aligned} \qquad \text{for } i = 1, \dots, n. \tag{7}$$

In this approximation we consider that the system can be affected by various sources of noise: $d(t)$, the measurement noise, and $v(t)$, the discretization and model error. Moreover, we can recast Eq. (7) as:

$$y_i(t+1) = g_i(y(t)) + e_i(t) \tag{8}$$

where the overall error term affecting the system is

$$e_i(t) = v_i(t) + d_i(t+1) + \tilde{d}_i(t) \tag{9}$$

and the term $\tilde{d}_i(t) = g_i(y(t)) - g_i(y(t) - d(t))$ is related to the rate of variation of the nonlinear function. It must be noted that in any identification method no finite bound on the inference error can be guaranteed, unless some assumptions are made on the function $g$ and the noise $e$. The typical approach in the literature is to assume a given functional form for $g$ (linear, bilinear, etc.) and statistical models on the noise sequence. Here, the function $g$ is known to be polynomial, so weaker assumptions are made on its rate of variation.

Prior assumptions on $g_i$, $i = 1, \dots, n$:

$$|g_i(w(t)) - g_i(w(t-1))| \le \bar{g}_i\, |w(t) - w(t-1)|; \tag{10}$$

where $\bar{g}_i$ is the Lipschitz constant. Notice that, since each $g_i$ is a smooth function of the state ($g_i \in C^{\infty}$), the estimation of $\bar{g}_i$ is a well-posed problem. As far as the discretization error is concerned, it is possible to estimate an upper bound depending on the sampling time $T$ and the Lipschitz constant $\bar{h}_i$ of the function $h_i(x)$ of the continuous-time model (2). Indeed, denoting by $e^d$ the absolute value of the discretization error, the following relations hold

$$\begin{aligned} e^d_i(t) &= |x_i(t+T) - (x_i(t) + T h_i(x(t)))| = \left| \int_0^{T} \big(h_i(x(t+s)) - h_i(x(t))\big)\, ds \right| \\ &\le \int_0^{T} \bar{h}_i\, |x(t+s) - x(t)|_{\infty}\, ds \le \bar{h}_i T. \end{aligned} \tag{11}$$

The last inequality is met since each state variable is constrained in the range $[0, 1]$. Whenever a bound $h_{\max}$ such that $\max_i |h_i(x(s))| \le h_{\max}$ is known, the following estimate can be obtained

$$\int_0^{T} \bar{h}_i\, |x(t+s) - x(t)|_{\infty}\, ds \le \bar{h}_i \int_0^{T}\!\! \int_0^{s} \max_i |h_i(x(t+\tau))|\, d\tau\, ds \le \bar{h}_i h_{\max} \int_0^{T} s\, ds = \bar{h}_i h_{\max} \frac{T^2}{2}. \tag{12}$$

Notice that the estimated bounds of the discretization error converge to zero as $T$ goes to zero. Here, the noise sequence $e$ is only supposed to be bounded. In particular, a priori information on the approximation, discretization, measurement and model errors needs to be assumed. Considering an upper bound on $v$ that also takes into account the error due to the discretization, i.e. the bounds in (12), $|v|_{\infty} \le \epsilon_v$, and an upper bound on $d$, $|d|_{\infty} \le \epsilon_d$, we get $|\tilde{d}| \le \bar{g}\, \epsilon_d$, and the following hold:

E.1 The error term is U.B.B.:

$$|e|_{\infty} = \max_t \max_i |e_i(t)| \le (\bar{g} + 1)\epsilon_d + \epsilon_v \le \epsilon \tag{13}$$

where $\bar{g} = \max_i \bar{g}_i$.

Moreover, in order to guarantee the invariance of the simplex $\Sigma$, the error term is such that:

E.2
$$\sum_i e_i = 0. \tag{14}$$

E.3
$$0 \le g_i(y(t)) + e_i(t) \le 1. \tag{15}$$

Then the overall a priori information on the noise is:

$$e \in B = \{e : \text{E.1, E.2 and E.3 are satisfied}\}. \tag{16}$$

We note, once again, that the only a priori information that is needed is the knowledge of the upper bound in (13) and the verification that the noise belongs to the set $B$ in (16).

3.2. Model overparameterization

Notice that the model (2) is overparameterized, as shown in the following Lemma. Namely, there exist different sets of parameter values giving rise to identical trajectories. In particular, $f$ is defined up to a simultaneous translation of its entries.

Lemma 2. Let $\tilde{Q}$ and $\tilde{f}$ satisfy

$$\tilde{f} = f + \lambda \mathbf{1} \tag{17}$$
$$\tilde{f}_i > 0 \quad \forall i \tag{18}$$
$$\tilde{Q} = [Q\,\mathrm{diag}(f) + \lambda I][\mathrm{diag}(f) + \lambda I]^{-1} \tag{19}$$

for some real number $\lambda$, and let $\tilde{x}(t)$ denote a solution of Eq. (2) where $f$ and $Q$ are replaced by $\tilde{f}$ and $\tilde{Q}$. Then, provided that $\tilde{x}(0) = x(0)$, the two systems admit the same solution, viz.:

$$\tilde{x}(t) = x(t) \quad \forall t \ge 0. \tag{20}$$

Proof. First, notice that by assumption (18), $[\mathrm{diag}(f) + \lambda I]$ is indeed an invertible matrix. In order to prove the identity in (20) it is enough to show that $\dot{\tilde{x}}(t) = \dot{x}(t)$ $\forall t \ge 0$. In particular, $\forall z \in \Sigma$, using relations (17)–(19) we obtain

$$\begin{aligned} \big(\tilde{Q}\,\mathrm{diag}(\tilde{f}) - (\tilde{f}^T z) I\big) z &= \big(Q\,\mathrm{diag}(f) + \lambda I - ((f^T + \lambda \mathbf{1}^T) z) I\big) z \\ &= \big(Q\,\mathrm{diag}(f) + \lambda I - (f^T z) I - \lambda I\big) z \\ &= \big(Q\,\mathrm{diag}(f) - (f^T z) I\big) z \end{aligned}$$

which proves the claim.

Remark 1. It is essential to stress that, assuming that $\tilde{f}_i > 0$ and $f_i > 0$ for any $i$, Eqs. (17) and (19) establish an equivalence relation in the set of models $M$ describing the quasispecies dynamics. The equivalence relation between $\tilde{m} \in M$ and $m \in M$ is denoted as $\tilde{m} \sim m$. Therefore, the following properties are verified:

- Reflexive: $m \sim m$ for all $m$ in $M$. (This is trivially verified by substituting $\lambda = 0$.)
- Symmetric: $m \sim \tilde{m}$ implies $\tilde{m} \sim m$ for all $m, \tilde{m}$ in $M$, where $m \sim \tilde{m}$ means

$$\tilde{f} = f + \lambda_1 \mathbf{1}, \qquad \tilde{Q} = [Q\,\mathrm{diag}(f) + \lambda_1 I][\mathrm{diag}(f) + \lambda_1 I]^{-1} \tag{21}$$

and implies

$$f = \tilde{f} + \lambda_2 \mathbf{1}, \qquad Q = [\tilde{Q}\,\mathrm{diag}(\tilde{f}) + \lambda_2 I][\mathrm{diag}(\tilde{f}) + \lambda_2 I]^{-1} \tag{22}$$

selecting $\lambda_2 = -\lambda_1$.
- Transitive: $m \sim \tilde{m}$ and $\tilde{m} \sim \hat{m}$ imply $m \sim \hat{m}$ for all $m, \tilde{m}, \hat{m}$ in $M$. In detail,

$$\begin{aligned} \tilde{f} &= f + \lambda_1 \mathbf{1}, & \tilde{Q} &= [Q\,\mathrm{diag}(f) + \lambda_1 I][\mathrm{diag}(f) + \lambda_1 I]^{-1} \\ \hat{f} &= \tilde{f} + \lambda_2 \mathbf{1}, & \hat{Q} &= [\tilde{Q}\,\mathrm{diag}(\tilde{f}) + \lambda_2 I][\mathrm{diag}(\tilde{f}) + \lambda_2 I]^{-1} \end{aligned} \tag{23}$$

imply

$$\hat{f} = f + \lambda_3 \mathbf{1}, \qquad \hat{Q} = [Q\,\mathrm{diag}(f) + \lambda_3 I][\mathrm{diag}(f) + \lambda_3 I]^{-1} \tag{24}$$

selecting $\lambda_3 = \lambda_1 + \lambda_2$.

Then for each $m \in M$ we may define the corresponding equivalence class as the subset

$$[m] = \{\tilde{m} \in M : \tilde{m} \sim m\}. \tag{25}$$

Since two equivalence classes are either equal or disjoint, the collection of equivalence classes forms a partition of $M$. A set of class representatives is a subset of $M$ which contains exactly one element from each equivalence class. In Section 3.5 we investigate how this overparameterization affects the structural identifiability.

3.3. Regressor form

The previously analyzed overparameterization reveals that structural identifiability is guaranteed only if a parameter value is a priori assigned. In the literature, to the best of our knowledge, such a value is immediately set arbitrarily. In the present paper the choice of the parameter to be fixed and of its possible values is driven by two important considerations. First of all, the other estimated parameters need to have a physical meaning, i.e. they have to satisfy constraints A.1 and A.2. Besides, it will be shown that a suitable choice of the parameter allows an efficient solution of the considered identification problem. In order to identify the parameters $q_{ij}$ and $f_i$, we must first recast the model in a regressor form. The model in (8) becomes

$$y_i(t+1) = \sum_{\substack{j=1 \\ j \ne i}}^{n} \alpha_j^i\, y_j(t) + \sum_{k=1}^{n} \beta_k^i\, y_k(t)\, y_i(t) + e_i(t) \tag{26}$$

where

$$\begin{aligned} \alpha_i^i &= 1 + T q_{ii} f_i \\ \alpha_j^i &= T q_{ij} f_j \\ \beta_k^i &= -T f_k + \alpha_i^i. \end{aligned} \tag{27}$$

Finally, we stress that in Eq. (26) we have already exploited the condition P.1. Indeed, if P.1 is satisfied ($\sum_j y_j = 1$), we have that

$$y_j(t) = y_j(t) \sum_{k=1}^{n} y_k(t),$$

which gives rise to the elimination of the term $j = i$ in the first sum of Eq. (26). The system can be recast in a regressor-like form as follows. Let us consider a single measurement:

$$y(t+1) = \Phi(t)\Theta + e(t) \tag{28}$$
$$y_i(t+1) = \phi_i(t)\theta^i + e_i(t) \quad \text{for } i = 1, \dots, n \tag{29}$$

where $y(t) = [y_1(t)\ y_2(t) \dots y_n(t)]^T$, $e(t) = [e_1(t)\ e_2(t) \dots e_n(t)]^T$,

$$\Phi(t) = \begin{bmatrix} \phi_1(t) & 0 & \dots & 0 \\ 0 & \phi_2(t) & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & \phi_n(t) \end{bmatrix}, \tag{30}$$

$$\Theta = [(\theta^1)^T\ (\theta^2)^T \dots (\theta^n)^T]^T. \tag{31}$$

Each $\theta^i$ is composed of the $2n - 1$ parameters defined in (27) that have to be determined:

$$\theta^i = [\alpha_1^i, \dots, \alpha_{i-1}^i, \alpha_{i+1}^i, \dots, \alpha_n^i, \beta_1^i \dots \beta_n^i]^T \tag{32}$$

and each row component $\phi_i(t)$ of the regressor is a vector defined at time $t$ as

$$\phi_i^T = \begin{bmatrix} y^{(i)} \\ y_i\, y \end{bmatrix} \tag{33}$$

where $y^{(i)} = [y_1 \dots y_{i-1}\ y_{i+1} \dots y_n]^T$ and $y_i\, y = [y_1 y_i \dots y_{i-1} y_i\ \ y_i^2\ \ y_{i+1} y_i \dots y_n y_i]^T$.

3.4. Experimental setup and constraints

In order to guarantee experimental identifiability, the system is initialized from $p$ different randomly generated initial conditions. We randomly pick different values of the state belonging to the simplex $\Sigma$.
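The regressor form (26)–(33), the overparameterization of Lemma 2 and the random initialization on the simplex can be checked numerically with the minimal Python sketch below. The function names, the three-species values of `f` and `Q`, the sampling time `T`, the shift `lam`, and the use of a Dirichlet draw to sample the simplex are all assumptions introduced for this illustration; they are not part of the original procedure.

```python
import numpy as np

def theta_from_physical(f, Q, T):
    """Map (f, Q) to the list [theta^1, ..., theta^n] through (27) and (32)."""
    n = f.size
    thetas = []
    for i in range(n):
        alpha_ii = 1.0 + T * Q[i, i] * f[i]
        alpha = [T * Q[i, j] * f[j] for j in range(n) if j != i]  # alpha_j^i, j != i
        beta = [-T * f[k] + alpha_ii for k in range(n)]           # beta_k^i
        thetas.append(np.array(alpha + beta))
    return thetas

def regressor_row(y, i):
    """Row phi_i(t) of (33): y without its i-th entry, followed by y_i * y."""
    return np.concatenate([np.delete(y, i), y[i] * y])

def euler_step(y, f, Q, T):
    """Noise-free Euler step of model (2): g(y) = y + T h(y)."""
    return y + T * (Q @ (f * y) - (f @ y) * y)

# Hypothetical 3-species example (illustrative values only).
n, T = 3, 0.1
f = np.array([1.0, 0.8, 0.5])
Q = np.array([[0.90, 0.05, 0.10],
              [0.05, 0.90, 0.10],
              [0.05, 0.05, 0.80]])

rng = np.random.default_rng(0)
y0 = rng.dirichlet(np.ones(n))        # random initial condition on the simplex

# Regressor form (29) reproduces the Euler step when sum(y) = 1.
thetas = theta_from_physical(f, Q, T)
y1 = euler_step(y0, f, Q, T)
y1_reg = np.array([regressor_row(y0, i) @ thetas[i] for i in range(n)])
max_mismatch = np.max(np.abs(y1 - y1_reg))

# Lemma 2: the shifted pair (17), (19) gives the same regressor parameters.
lam = 0.3
f_s = f + lam
Q_s = (Q @ np.diag(f) + lam * np.eye(n)) @ np.linalg.inv(np.diag(f) + lam * np.eye(n))
thetas_s = theta_from_physical(f_s, Q_s, T)
max_theta_gap = max(np.max(np.abs(a - b)) for a, b in zip(thetas, thetas_s))
```

Both `max_mismatch` and `max_theta_gap` are zero up to rounding, which illustrates why the data alone cannot distinguish members of the same equivalence class and why one parameter has to be fixed a priori.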
This corresponds to performing a physical experiment by starting from many different concentrations: clearly, the possibility of selecting more than one initial condition may depend on the specific domain of application. For each initial condition, the identification experiment is composed of $nN$ measurements, and the overall data is a vector of $pnN$ elements. For any initial condition $j$ we build:

$$\begin{aligned} Y^j &= [y(t+1)^T, y(t+2)^T, \dots, y(t+N)^T]^T \\ \Psi^j &= [\Phi(t)^T, \Phi(t+1)^T, \dots, \Phi(t+N-1)^T]^T \\ E^j &= [e(t)^T, e(t+1)^T, \dots, e(t+N-1)^T]^T \end{aligned} \tag{34}$$

and, stacking the vectors as $Y = [(Y^1)^T \dots (Y^p)^T]^T$, $\Psi = [(\Psi^1)^T \dots (\Psi^p)^T]^T$ and $E = [(E^1)^T \dots (E^p)^T]^T$, we obtain

$$Y = \Psi\Theta + E. \tag{35}$$

3.5. Structural identifiability

In order to guarantee the structural identifiability, we notice that at each time $t$ the original parameters to be identified ($q_{ij}$ and $f_i$) are $n^2 + n$. Recalling that the $f_i$ parameters are subject to the constraint (17) and that the $q_{ij}$ parameters are subject to the $n$ constraints corresponding to Eq. (3), the effective number of free parameters to be determined is $n^2 + n - 1 - n = n^2 - 1$. In the $\Theta$-space, we need to estimate $n(2n-1)$ parameters subject to $n$ constraints obtained from the relation (3) and $(n-1)(n-1)$ constraints from (27). Then, the free parameters to be estimated are $n(2n-1) - n - (n-1)(n-1) = n^2 - 1$. The problem is well posed. The new constraints in the $\Theta$-space are easily determined, and their expressions are reported hereafter in terms of the components of $\Theta$. The $n$ constraints (3) are equivalent to the $n$ equality constraints

$$\sum_{\substack{k=1 \\ k \ne j}}^{n} \alpha_j^k + \beta_j^j = 1 \quad \text{for } j = 1, \dots, n. \tag{36}$$

Notice that from (27) we have that the parameters in the extended state satisfy the following $(n-1)(n-1)$ equality constraints:

$$\beta_j^1 - \beta_1^1 - \beta_j^k + \beta_1^k = 0 \quad \text{for } j = 2, \dots, n,\ \text{for } k = 2, \dots, n. \tag{37}$$

Remark 2.
Notice that constraints (37) imply the following conditions

$$\beta_k^i - \beta_\ell^i - \beta_k^\ell + \beta_\ell^\ell = 0 \quad \forall i, \ell, k = 1, \dots, n. \tag{38}$$

Moreover, we recall that the model must satisfy the further inequality constraints given in A.1 and A.2. In the $\Theta$-parameter space, correspondingly, we can impose that:

$$\alpha_j^i \ge 0, \quad i \ne j, \quad \text{for } j = 1, \dots, n,\ \text{for } i = 1, \dots, n. \tag{39}$$

Under assumptions (16), (37), (36) and (39), it is possible to define also an extended Feasible Parameter Set (FPS) as:

$$\Omega = \{\Theta \in M : |Y_i - \Psi_i \Theta| \le \epsilon,\ i = 1, \dots, Npn\} \tag{40}$$

where $Y_i$ and $\Psi_i$ indicate the $i$th rows of $Y$ and $\Psi$ respectively, and the set of constraints on the parameters is

$$M = \{\Theta : \text{(37), (36) and (39) are satisfied}\}. \tag{41}$$

3.6. Identification problems

The following identification problems have been considered and solved.

Single Model Problem: Least Squares Conditional Central Estimate.

$$\Theta^* = \arg\min_{\Theta \in M} \|Y - \Psi\Theta\|_{\infty} \tag{42}$$

and $\epsilon^*$ denotes the associated minimum value of the objective function in (42). The computational burden of this problem amounts to solving one constrained Linear Programming (LP) problem.

Model Set Problem: Uncertainty Intervals (UI). Specifically, we are interested in the computation of the intervals which contain all feasible values of the physical parameters. Due to the lack of structural identifiability, such intervals critically depend upon the choice of the class representative. Concerning this, in the spirit of a validation approach, one could, at least in principle, take unions of the UI over all possible choices of the class representative. This approach, however, besides being computationally intractable, yields results which often are not significant: one always encounters infinite intervals for the $f_i$ (indeed, their values are equivalent up to arbitrary simultaneous identical translations) and $[0, 1]$ for the $q_{ij}$. Hence, in order to introduce a meaningful definition of uncertainty interval, we must first specify the policy of selection of the class representative. In particular, by this we denote any manifold:

$$\{M_c \subset M : \mathrm{card}\{M_c \cap [m]\} = 1 \text{ for all } m \in M\} \tag{43}$$

where $\mathrm{card}\{\cdot\}$ denotes the cardinality of a set. For each possible equivalence class defined in (25) we select a specific element as the class representative by choosing $M_c$ according to (43). Since the selection of the class representative is arbitrary, due to the overparameterization of the model, the estimated parameters lose physical relevance. A class of models independent of the parameterization can be achieved only if additional a priori information is available. In any case the procedure is still useful, since it can be exploited to verify whether the collected data cannot be explained by a quasispecies model. Having this in mind, the uncertainty intervals can be defined by the lower and upper bounds $\underline{f}_i$, $\overline{f}_i$, $\underline{q}_{ij}$ and $\overline{q}_{ij}$, solutions of the following mathematical programming problems.

For $i = 1, \dots, n$:

$$\begin{aligned} \underline{f}_i = \min f_i \quad &\text{and} \quad \overline{f}_i = \max f_i \\ \text{subject to}\quad &\Theta(f, Q) \in \Omega \\ &(f, Q) \in M_c \\ &0 \le q_{\ell k} \le 1, \quad \ell, k = 1, \dots, n \\ &f_k > 0, \quad k = 1, \dots, n. \end{aligned} \tag{44}$$

For $i = 1, \dots, n$, $j = 1, \dots, n$:

$$\begin{aligned} \underline{q}_{ij} = \min q_{ij} \quad &\text{and} \quad \overline{q}_{ij} = \max q_{ij} \\ \text{subject to}\quad &\Theta(f, Q) \in \Omega \\ &(f, Q) \in M_c \\ &0 \le q_{\ell k} \le 1, \quad \ell, k = 1, \dots, n \\ &f_k > 0, \quad k = 1, \dots, n \end{aligned} \tag{45}$$

where $\Theta(f, Q) : \mathbb{R}^n \times \mathbb{R}^{n \times n} \to \mathbb{R}^{n(2n-1)}$ is a function defined by composition of Eqs. (31), (32) and (27), which maps the physical parameters to the regressor entries. Notice that, due to the equivalence relation illustrated in Lemma 2, if there exist $f$, $Q$ solving the considered optimization problems, this solution is not unique. Indeed, since $\Theta(f + \lambda\mathbf{1}, [Q\,\mathrm{diag}(f) + \lambda I][\mathrm{diag}(f) + \lambda I]^{-1}) = \Theta(f, Q)$, the shifted parameters are also solutions for all $\lambda \in \mathbb{R}$ that preserve properties A.1 and A.2. Then, in order to have well-posed problems, we choose a class representative by fixing the value of one parameter.

The optimization problems (44) and (45) are subject to nonlinear constraints and can be solved by time-consuming branch and bound procedures. In this paper we show how the Least Squares Conditional Central Estimate in the original parameter space can be easily computed by means of an iterative procedure and how the exact computation of the UI for a selected class representative can be carried out by solving some suitable constrained LP programs.

3.7. Inversion algorithm for the Single Model Problem

Assume that the Single Model Problem has been solved. Given the estimated $\Theta^* \in M$, we should invert relations (27) in order to find a solution in the original space of parameters. Due to the equivalence relation, the inversion is not uniquely defined. The next Proposition gives conditions and a procedure for the determination of the unique inversion solution of the class representative in the case of quasispecies models.

Proposition 1. Let the vector $\Theta$ defined according to Eqs. (31)–(32) belong to $M$ as defined in (41). Then, there exist $q_{ij} \in \mathbb{R}$ and $f_j \in \mathbb{R}$ for $1 \le i, j \le n$ satisfying (27) such that conditions A.1, A.2 and (3) are met. A unique inverse exists and can be explicitly computed once a suitable value for any of the parameters $f_j$ is fixed.

Proof. First of all, we will show that by exploiting Eq. (27) it is possible to determine the order of, and the relative distance between, the parameters $f_i$, for $i = 1, \dots, n$. Then, we will find conditions on $f_i$ in order to guarantee the fulfillment of A.1 and A.2. Finally, we prove that it is possible to obtain an admissible set of parameters by inverting (27). The distance between the parameters is given by

$$\Delta_{kj} \triangleq f_k - f_j = \frac{\beta_j^i - \beta_k^i}{T}, \quad \forall k, j \tag{46}$$

which is the same for any possible $i$ by virtue of (37). It is straightforward to establish the order relations between the parameters $f_i$ by checking the sign of the $\Delta_{kj}$.
Thus, thanks to relation (17), we can ensure strict positivity of all the $f_i$ by assigning a sufficiently large positive value to any of the $f_j$. Exploiting conditions (39), since $f_i > 0$ and by virtue of (27), the conditions $q_{ij} \ge 0$ are satisfied $\forall i, j$ with $i \ne j$. Moreover, the conditions in (36) are equivalent to $\sum_{i=1}^{n} q_{ij} = 1$, and $q_{ij} \ge 0$ for $i \ne j$ implies $q_{jj} \le 1$ for $j = 1, \dots, n$. In addition, letting, $\forall j$,

$$f_j \ge \max\left\{ \frac{\alpha_j^1}{T}, \dots, \frac{\alpha_j^{j-1}}{T}, \frac{\alpha_j^{j+1}}{T}, \dots, \frac{\alpha_j^n}{T}, \frac{1 - \beta_j^j}{T} \right\}$$

it is possible to guarantee the conditions $q_{ij} \le 1$ and $q_{jj} > 0$. Hence, choosing $f_j$ sufficiently large (as shown hereafter) ensures A.1 and A.2. We notice that Eqs. (27) are invertible because they are bilinear and in triangular form for an assigned value of the parameter $f_j$. In order to get $q_{ij} \in [0, 1]$ with the inversion procedure, we denote by $h$ the position of the smallest $f_j$ for $j = 1, \dots, n$. Then, any $f$ such that

$$f_h \ge \hat{\gamma} = \min \gamma \tag{47}$$

such that

$$\gamma + \Delta_{kh} \ge \max\left\{ \frac{\alpha_k^1}{T}, \dots, \frac{\alpha_k^{k-1}}{T}, \frac{\alpha_k^{k+1}}{T}, \dots, \frac{\alpha_k^n}{T}, \frac{1 - \beta_k^k}{T} \right\} \quad \text{for } k = 1, \dots, n$$

yields an admissible set of parameters.

Remark 3. The proof of the existence of the unique solution is a constructive one. The procedure is implicit and suggests the identification algorithm that is explicitly shown in the Appendix.

3.8. UI computation for the Model Set Problem solution

The Model Set Problem cannot be solved directly. First of all, it is fundamental to pick $M_c$ compatibly with the structure of the equivalence classes. Then, at least in principle, it would be possible to solve the optimization problems (44)–(45). Due to the nonlinearity of the constraints, however, the problems in (44)–(45) may potentially exhibit local optima. In this subsection we determine some conditions which must be satisfied by each parameter $f_h$ in order to have a class representative with physical meaning. We show that, once the class representative has been assigned by selecting $M_c$, it is possible to recast the problems (44)–(45) in terms of LP problems. How to compute the optima is shown in the following procedure. The corresponding algorithm is summarized in the Appendix. Intuitively, following the reasoning carried out in the previous subsection, it is clear that a sufficiently large value of the parameter $f_h$ gives rise to suitable UI. As a first step in the characterization of such admissible $f_h$, we compute an estimate of the UI in the extended parameter space by solving the following constrained LP problems, for $i = 1, \dots, n$ and $j = 1, \dots, 2n-1$:

$$\underline{\theta}_j^i = \min_{\Theta \in \Omega,\ \Theta \in M} \theta_j^i, \qquad \overline{\theta}_j^i = \max_{\Theta \in \Omega,\ \Theta \in M} \theta_j^i. \tag{48}$$

The computational burden of this problem amounts to solving $2(2n-1)n$ LP problems.

Lemma 3. Let us assume that the FPS (40) is nonempty. Then there exist values $\hat{f}_j > 0$ for all $j = 1, \dots, n$ such that, provided $f_j > \hat{f}_j$, the optimization problems (44) and (45) admit solutions.

Proof. First of all, we will show that, exploiting Eqs. (27) and the solutions of the problem (48), it is possible to determine some conservative information on the order of the lower bounds of the parameters $f_i$, for $i = 1, \dots, n$, and on the lower and upper bounds of their distance. Then, we will give conditions on $f_i$ in order to guarantee the fulfillment of A.1 and A.2. The distance between the parameters is determined by Eq. (46). Let us define $[\underline{\alpha}_1^i \dots \underline{\alpha}_{i-1}^i\ \underline{\alpha}_{i+1}^i \dots \underline{\alpha}_n^i, \underline{\beta}_1^i \dots \underline{\beta}_n^i] \triangleq [\underline{\theta}_1^i, \dots, \underline{\theta}_{2n-1}^i]$ and $[\overline{\alpha}_1^i \dots \overline{\alpha}_{i-1}^i\ \overline{\alpha}_{i+1}^i \dots \overline{\alpha}_n^i, \overline{\beta}_1^i \dots \overline{\beta}_n^i] \triangleq [\overline{\theta}_1^i, \dots, \overline{\theta}_{2n-1}^i]$. Exploiting interval analysis, it is possible to compute lower and upper bounds of $\Delta_{kj}$ for all $k, j$ with the following procedure

$$\underline{\Delta}_{kj}^i \triangleq \frac{\underline{\beta}_j^i - \overline{\beta}_k^i}{T} \le \Delta_{kj} \le \overline{\Delta}_{kj}^i \triangleq \frac{\overline{\beta}_j^i - \underline{\beta}_k^i}{T} \quad \forall i. \tag{49}$$

Then, exploiting the information obtained for each $i$, we obtain the desired estimate

$$\underline{\Delta}_{kj} = \max_{i \in \{1, \dots, n\}} \underline{\Delta}_{kj}^i, \qquad \overline{\Delta}_{kj} = \min_{i \in \{1, \dots, n\}} \overline{\Delta}_{kj}^i. \tag{50}$$

Finally, thanks to the relation (17), we can ensure strict positivity of all the $f_i$ by suitably choosing the value of the selected parameter $f_h$. Notice that the physical parameters satisfy the following inequalities

$$\begin{aligned} \underline{\alpha}_j^i \le T q_{ij} f_j \le \overline{\alpha}_j^i \quad &\forall i, j \text{ and } i \ne j \\ \underline{\beta}_j^j - 1 \le T f_j (q_{jj} - 1) \le \overline{\beta}_j^j - 1 \quad &\forall j. \end{aligned} \tag{51}$$

Exploiting conditions (39), since $f_i > 0$ and by virtue of (27), the conditions $\underline{q}_{ij} \ge 0$ are satisfied $\forall i, j$ with $i \ne j$. Moreover, the conditions in (36) are equivalent to $\sum_{i=1}^{n} q_{ij} = 1$, and $\underline{q}_{ij} \ge 0$ for $i \ne j$ implies $\overline{q}_{jj} \le 1$ for $j = 1, \dots, n$. In addition, letting, $\forall j$,

$$f_j \ge \max\left\{ \frac{\overline{\alpha}_j^1}{T}, \dots, \frac{\overline{\alpha}_j^{j-1}}{T}, \frac{\overline{\alpha}_j^{j+1}}{T}, \dots, \frac{\overline{\alpha}_j^n}{T}, \frac{1 - \underline{\beta}_j^j}{T} \right\} \tag{52}$$

it is possible to guarantee the conditions $\overline{q}_{ij} \le 1$ and $\underline{q}_{jj} > 0$. Hence, choosing $f_j$ sufficiently large (as shown hereafter) ensures A.1 and A.2. In order to get the UI of any $q_{ij}$ in the range $[0, 1]$, we denote by $h$ the position of the $f_j$ with the smallest lower bound for $j = 1, \dots, n$. Then, any $f$ such that

$$f_h \ge \hat{\gamma} = \min \gamma \tag{53}$$

subject to

$$\gamma + \underline{\Delta}_{kh} \ge \max\left\{ \frac{\overline{\alpha}_k^1}{T}, \dots, \frac{\overline{\alpha}_k^{k-1}}{T}, \frac{\overline{\alpha}_k^{k+1}}{T}, \dots, \frac{\overline{\alpha}_k^n}{T}, \frac{1 - \underline{\beta}_k^k}{T} \right\} \quad \text{for } k = 1, \dots, n$$

yields admissible UI for the parameters in the original space. Hence, this corresponds to the choice of $M_c = \{(f, Q) \in M : f_h = \hat{f}_h\}$ for some $\hat{f}_h > \hat{\gamma}$.

Remark 4. Notice that the proof of the lemma provides a constructive procedure for the evaluation of the $\hat{f}_j$ for all $j = 1, \dots, n$.

Once a suitable value for the parameter $f_h$ has been determined by Lemma 3, it is possible to proceed with the computation of the UI for $q_{ij}$ and $f_j$. Hereafter we show that it is possible to compute these UI exactly, by solving some optimization problems in the extended parameter space for suitable cost functions. Given a suitable value of the parameter $f_h$, solve the following optimization problems.

For $i = 1, \dots, n$ and $i \ne h$:

$$\underline{\Delta f}_i = \min_{\Theta \in \Omega} c_{f_i}(\Theta), \qquad \overline{\Delta f}_i = \max_{\Theta \in \Omega} c_{f_i}(\Theta) \tag{54}$$

where $c_{f_i}(\Theta) \triangleq \dfrac{\theta_{n-1+h}^i - \theta_{n-1+i}^i}{T}$. Then compute:

$$\underline{f}_i = \underline{\Delta f}_i + f_h, \qquad \overline{f}_i = \overline{\Delta f}_i + f_h. \tag{55}$$

For $i = 1, \dots, n$, for $j = 1, \dots, n$: if $i < j$ solve

$$\underline{q}_{ij} = \min_{\Theta \in \Omega} \gamma_{ij}(\Theta), \qquad \overline{q}_{ij} = \max_{\Theta \in \Omega} \gamma_{ij}(\Theta) \tag{56}$$

where $\gamma_{ij}(\Theta) \triangleq \dfrac{\theta_{j-1}^i}{\theta_{n-1+h}^j - \theta_{n-1+j}^j + T f_h}$.

If $i = j$ solve:

$$\underline{q}_{ii} = \min_{\Theta \in \Omega} \pi_{ii}(\Theta), \qquad \overline{q}_{ii} = \max_{\Theta \in \Omega} \pi_{ii}(\Theta) \tag{57}$$

where $\pi_{ii}(\Theta) \triangleq \dfrac{\theta_{n-1+k}^i + \theta_{n-1+h}^k - \theta_{n-1+k}^k + T f_h - 1}{\theta_{n-1+h}^i - \theta_{n-1+i}^i + T f_h}$.

If $i > j$ solve:

$$\underline{q}_{ij} = \min_{\Theta \in \Omega} \delta_{ij}(\Theta), \qquad \overline{q}_{ij} = \max_{\Theta \in \Omega} \delta_{ij}(\Theta) \tag{58}$$

where $\delta_{ij}(\Theta) \triangleq \dfrac{\theta_j^i}{\theta_{n-1+h}^j - \theta_{n-1+j}^j + T f_h}$.

These cost functions follow from the inversion of (27). Indeed, by (46),

$$f_i - f_h = \frac{\beta_h^i - \beta_i^i}{T} = \frac{\theta_{n-1+h}^i - \theta_{n-1+i}^i}{T} \tag{59}$$

which takes the same value when the superscript $i$ is replaced by any possible $\ell$, by virtue of (38). For $i \ne j$

$$q_{ij} = \frac{\alpha_j^i}{T f_j} = \frac{\alpha_j^i}{T f_j - T f_h + T f_h} = \frac{\alpha_j^i}{\beta_h^j - \beta_j^j + T f_h} = \begin{cases} \dfrac{\theta_{j-1}^i}{\theta_{n-1+h}^j - \theta_{n-1+j}^j + T f_h} & j > i \\[2ex] \dfrac{\theta_j^i}{\theta_{n-1+h}^j - \theta_{n-1+j}^j + T f_h} & j < i \end{cases} \tag{60}$$

and

$$\begin{aligned} q_{ii} &= \frac{\beta_k^i + T f_k - 1}{T f_i} = \frac{\beta_k^i + T(f_k - f_h) + T f_h - 1}{T(f_i - f_h) + T f_h} = \frac{\beta_k^i + \beta_h^k - \beta_k^k + T f_h - 1}{\beta_h^i - \beta_i^i + T f_h} \\ &= \frac{\theta_{n-1+k}^i + \theta_{n-1+h}^k - \theta_{n-1+k}^k + T f_h - 1}{\theta_{n-1+h}^i - \theta_{n-1+i}^i + T f_h} \quad \forall k = 1, \dots, n. \end{aligned} \tag{61}$$

Notice that $q_{ii}$ admits $n$ different expressions for different values of $k \in \{1, \dots, n\}$, but they are all equivalent. Indeed, for any $k$ and $\ell$

$$\begin{aligned} q_{ii} &= \frac{\beta_k^i + T f_k - 1}{T f_i} = \frac{\beta_k^i + \beta_\ell^i - \beta_\ell^i + T f_\ell - T f_\ell + T f_k - 1}{T f_i} \\ &= \frac{\beta_k^i + \beta_\ell^i - \beta_\ell^i + \beta_\ell^\ell - \beta_k^\ell + T f_\ell - 1}{T f_i} = \frac{\beta_\ell^i + T f_\ell - 1}{T f_i} \end{aligned} \tag{62}$$

since $\beta_k^i - \beta_\ell^i + \beta_\ell^\ell - \beta_k^\ell = 0$, as from condition (38). Then, in order to compute bounds for $q_{ii}$, it is sufficient to solve two optimization problems for some $k \in \{1, \dots, n\}$. Hence, for an assigned value of $f_h$, which in practice selects a class representative, due to the injectivity and surjectivity of the map $\Theta(f, Q)$ restricted to $M$, the solution of the optimization problems (54)–(58) supplies the UI for $q_{ij}$ and $f_j$. Moreover, it is well known that optimization problems with rational objective functions, whose numerator and denominator are affine functions and whose denominator is always positive, subject to linear constraints, can be recast as LP problems [28]. Then, since the FPS (40) is nonempty, the true values of the UI of $q_{ij}$ and $f_j$, $\forall i, j = 1, \dots, n$, for the selected class representative are computed, and this requires the solution of $2(n^2 + n - 1)$ LP problems.

Remark 5. It is interesting to discuss our approach from a geometrical point of view. The map $\Theta(f, Q)$, defined by composition of Eqs.
(31), (32) and (27) is neither injective nor onto, however its image indeed coincides with the set M , so that, restricting our attention to such set we are able to recover surjectivity. Injectivity, as . customary, is achieved by considering the map Θ̂ ([m]) = Θ (f , Q ) which maps M \ ∼ to the set M ; where m, as usual, denotes the couple (f , Q ) while . Proof. First of all we show that when a suitable value for the parameter fh has been assigned the proposed procedure computes the desired UI. Exploiting (27), we get the following expressions for the physical parameters βh` − βi` Tfj M / ∼= {[m] : m ∈ M }. Theorem 1. Let us assume that the FPS (40) is nonempty, then the optimization problems (54)–(58) supply the true values of the UI for the selected class representative qij and fj ∀i, j = 1, . . . , n as defined in (44)–(45). The computational burden of this problem amounts to solving 2(n2 + n − 1) LP problems. fi − fh = αji = (57) Θ ∈Ω q = min δij (Θ ) αji For an exhaustive discussion about the numerical complexity of the optimization problems (56), (57), and (58) see [29]. If i = j solve for some k ∈ {1, . . . , n} q = min πii (Θ ) qij = θni −1+h − θni −1+i T (59) (63) Notice that, the above definition is well posed since indeed Θ (f , Q ) = Θ (f˜ , Q̃ ) whenever (f , Q ) ∼ (f˜ , Q̃ ). Since the structure of the quotient space M / ∼ is not easy to work with, uncertainty intervals are described by, a priori, determining the selection policy Mc . This means that we fix the kind of representative for each equivalence class, making sure that this is done without loss of generality, namely each equivalence class is present in the selected family of representatives thanks to the selection of exactly one of its members. In order remove the need for arbitrarily fixing Mc , one typically needs extra a priori information. Indeed, the lack of structural identifiability of the problem prevents a nontrivial confinement of the UI only on the basis of measures of 536 P. 
$y$. Nevertheless, physical parameters are known to satisfy other empirical relations, which are plausible for direct inference of certain parameters. For instance, if the $q_{ij}$ fulfill Eq. (65), a very strong constraint is imposed on the plausible values of the $q_{ij}$, which one may combine with those arising from our procedure. More simply, if a priori lower and upper bounds for $f_h$ are granted, then the union of the intervals computed by our procedure can be employed to derive bounds on all other parameters. We stress that carrying out our procedure by computing the UI over a parameterized family of selection manifolds $M_c$ which spans the whole space $M$ allows a rather systematic analysis of the set FPS.

3.9. Model quality evaluation and invalidation

As stated in the introduction, any model needs to be validated, i.e. examined to establish whether or not it is good enough for its intended purpose. We recall that the data can always invalidate the system, and in our context this corresponds to a feasible set that is empty. When this set is empty, some of the a priori information is wrong, or some of the assumptions on the system are invalidated by the data. If the data do not invalidate the set, then it is possible to evaluate the quality of the identified model. The quality depends on the purpose of the model and can be measured according to it. In particular, the behavior of the system and that of the identified model need to be compared. The comparison is based on some appropriate output features. Any mismatch between system and model output should be analyzed, bearing in mind that any model is an approximation of the system. Some of the identification literature is devoted, for example, to the validation problem in the context of modeling for control. There, the purpose is to get a low-order model that is good enough and for which the design of a controller is viable. In systems biology, instead, the validation problem has different purposes. The main interest consists in confirming that all the physical mechanisms involved have been characterized. In the case of the Single Model Problem, if the FPS is nonempty, a model quality evaluation is based on the predictive ability of the model. In the case of the Model Set Problem, the use of the UI to carry out a validation procedure based on some physical mechanism having biological meaning is considered and solved.

3.9.1. Single Model Problem

Let $\hat y(t+1) = \hat g(y(t))$ be the predicted output obtained for the Single Model Problem, once the Central Estimate $\hat g$ has been determined using a certain set of data. The model is validated in a different experimental setup (noise and initial conditions), by comparing the measured output $y(t+1)$ with the resulting predicted evolution $\hat y(t+1) = \hat g(y(t))$:

\[ \tilde e(t) = \hat y(t+1) - y(t+1). \tag{64} \]

Model (in)validation tests are usually based on the difference between the simulated and measured output and on some statistics about these differences. Here, the model is invalidated if the $\ell_\infty$ norm $\|\tilde e(t)\|_\infty$ of the error is greater than $\epsilon$. Notice that the solution of the Single Model Problem provides, together with the nominal model $\hat g$ (obtained from the nominal parameter values $\Theta^*$), the minimum admissible error $\epsilon^*$ explaining the collected data, computed by solving the optimization problem (42). If $\epsilon^* > \epsilon$, the FPS is empty and the a priori information on the system may be invalidated. If instead the FPS is not empty, the quality of the estimated model can be evaluated on the basis of its predictive capability.

3.9.2. Model Set Problem

Whenever the a priori information has not been invalidated and some knowledge of $f_h$ is available, it is possible to evaluate the UI. This new information can be exploited to characterize other physical mechanisms in the system under analysis. First of all, given the a priori information on some $f_h = \tilde f_h$, the overall a priori information is invalidated if $\tilde f_h < \hat f_h$. Finally, assuming that $\tilde f_h > \hat f_h$, it is possible to compute the UI. The interest might be in determining if the measured data have been generated by a genotype distribution in which only point mutations occur. This situation is described by the fulfillment of the following explicit expression for $Q$ in terms of the copying fidelity, as given by [18]:

\[ q_{ij} = q^{\nu} \left( \frac{1-q}{q} \right)^{h_{ij}} \tag{65} \]

where $h_{ij}$ is the Hamming distance between genomes $j$ and $i$, $\nu$ is the genome length and $q$ is the copying accuracy. The Hamming distance $h_{ij}$ is defined as the number of positions where genomes $j$ and $i$ differ. Then, once the uncertainty intervals for the $q_{ij}$ are obtained, we may exploit all the available information to determine if some admissible value of $q$ exists. In particular, expression (65) as a function of $q$ has a bell-shaped profile in the interval $[0, 1]$, with a unique maximum point at $q^\star = (\nu - h_{ij})/\nu$, where it takes the value $q^\star_{ij} = (\nu - h_{ij})^{(\nu - h_{ij})} h_{ij}^{h_{ij}} / \nu^{\nu}$. According to the values of $\underline{q}_{ij}$ and $\overline{q}_{ij}$, several cases are possible:

(1) $\underline{q}_{ij} > q^\star_{ij}$: the interval of admissible $q$ is empty;
(2) $\overline{q}_{ij} > q^\star_{ij} > \underline{q}_{ij}$: an interval of admissible $q$ exists;
(3) $q^\star_{ij} > \overline{q}_{ij} > \underline{q}_{ij}$: two disjoint intervals of admissible $q$ exist.

For each $i, j$ in $\{1, 2, \dots, n\}$ we compute the above intervals (whose computation can be accomplished efficiently by bisection or gradient descent, once case (1), (2) or (3) has been decided), and take their intersections (possibly empty). The admissible values for $q$ are important for the interpretation of mutation events [1].
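The three cases above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`q_profile`, `bisect_root`, `admissible_q`, `intersect`) are hypothetical, and the code assumes $0 < h_{ij} < \nu$, so that (65) is genuinely bell-shaped, and plain bisection on each monotone branch.

```python
def q_profile(q, nu, h):
    """Expression (65): q**nu * ((1 - q) / q)**h, written as q**(nu - h) * (1 - q)**h."""
    return q ** (nu - h) * (1.0 - q) ** h


def bisect_root(func, lo, hi, increasing, tol=1e-12):
    """Root of a monotone function on [lo, hi]; `increasing` gives its direction."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if (func(mid) < 0.0) == increasing:
            lo = mid  # the root lies to the right of mid
        else:
            hi = mid  # the root lies to the left of mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


def admissible_q(q_low, q_up, nu, h):
    """Intervals of q in [0, 1] with q_low <= q_profile(q) <= q_up (assumes 0 < h < nu)."""
    q_star = (nu - h) / nu               # maximum point of (65)
    peak = q_profile(q_star, nu, h)      # maximum value of (65)
    if q_low > peak:                     # case (1): no admissible q
        return []

    def rise(level):                     # crossing on the increasing branch [0, q_star]
        return bisect_root(lambda q: q_profile(q, nu, h) - level, 0.0, q_star, True)

    def fall(level):                     # crossing on the decreasing branch [q_star, 1]
        return bisect_root(lambda q: q_profile(q, nu, h) - level, q_star, 1.0, False)

    if q_up >= peak:                     # case (2): one interval around q_star
        return [(rise(q_low), fall(q_low))]
    # case (3): two disjoint intervals, one on each branch
    return [(rise(q_low), rise(q_up)), (fall(q_up), fall(q_low))]


def intersect(a, b):
    """Pairwise intersection of two lists of closed intervals."""
    out = []
    for l1, u1 in a:
        for l2, u2 in b:
            lo, hi = max(l1, l2), min(u1, u2)
            if lo <= hi:
                out.append((lo, hi))
    return out
```

Calling `admissible_q` for every pair $(i, j)$ and folding the results with `intersect` gives the (possibly empty) set of copying accuracies compatible with all uncertainty intervals.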
If the a priori information on $f_h$ is not precise, for instance if $f_h$ is only known to belong to some interval, we may sweep over the corresponding parameterized family of selection manifolds $M_c$ and take the unions of the computed intervals.

4. Examples

The effectiveness of the proposed procedure, obtained by solving the Single Model Problem and the Model Set Problem, is now illustrated on two examples. In the first one we take a system whose behavior cannot be reproduced by the considered class of models. In particular, the true system is a Replicator–Mutator (viz. its fitness is state-dependent), but we try to identify it as a quasispecies model. In this case we solve the Single Model Problem and we investigate the effects of the model error. In the second one we show the use of the quasispecies model to identify the kinetic parameters of a chemical reaction network. The optimal $Q^*$ and $f^*$ have been determined, as well as their uncertainty intervals. For the sake of clarity, we report hereafter only the results corresponding to the simplest genetic sequences, obtained for $\nu = 2$ in Eq. (65). This gives rise to $n = 4$ state variables, although more complex genetic sequences have also been tested. The considered models have been discretized by Euler's technique with a sampling time $T = 0.01$ s.

Example 1. In order to evaluate the effectiveness of the proposed procedure, experimental data sets were generated by means of a Replicator–Mutator model $\dot x = \left(Q(\mathrm{diag}(w) + \mathrm{diag}(x)\Gamma) - (w^T x) I - x^T \Gamma^T x\, I\right) x$, while their consistency with a quasispecies model is checked. The parameters of the simulated Replicator–Mutator model [5] are the following:

\[ Q = \begin{bmatrix} 0.16 & 0.24 & 0.24 & 0.36 \\ 0.24 & 0.16 & 0.36 & 0.24 \\ 0.24 & 0.36 & 0.16 & 0.24 \\ 0.36 & 0.24 & 0.24 & 0.16 \end{bmatrix}, \qquad w = \begin{bmatrix} 9.70 \\ 17.85 \\ 15.25 \\ 9.15 \end{bmatrix} \tag{66} \]

\[ \Gamma = \begin{bmatrix} -9.15 & 13.80 & 2.30 & 16.80 \\ 13.20 & 17.00 & 17.55 & -15.65 \\ 0.40 & 10.75 & 17.05 & -2.80 \\ 8.75 & -4.00 & 2.80 & 14.55 \end{bmatrix}. \tag{67} \]

The considered model, apparently, does not present a different qualitative behavior with respect to the quasispecies one. Moreover, bounded additive noise was introduced in the process, with $\ell_\infty$ norm less than or equal to 0.005. This bound is also considered as available a priori information on $e(t)$, i.e. $\epsilon = 0.005$. In the identification experiment, carried out with $N = 200$ measurements, the system has been initialized with $p = 2$ different randomly generated initial conditions $x(0)$. The overall data are $Npn = 1600$ and a quasispecies model is tuned to fit them. Solving the Single Model Problem, a nominal model $\Theta^*$ is determined, corresponding to $\epsilon^* \approx 0.005$. In order to possibly invalidate the a priori assumptions and the obtained model, we applied the inversion procedure to the identified $\Theta^*$. In particular, according to relation (47) we obtained $f_4 \ge 8.9079$. Since the overparameterization allows us to arbitrarily select any value of $f_4$ satisfying the previous inequality, we set $f_4 = 13.9079$ and computed $Q^*$ and $f^*$ accordingly. The obtained model parameters for the Conditional Central Estimate are:

\[ Q^* = \begin{bmatrix} 0.3468 & 0.1986 & 0.2115 & 0.3086 \\ 0.1748 & 0.2581 & 0.3038 & 0.1916 \\ 0.2080 & 0.3173 & 0.2751 & 0.1402 \\ 0.2704 & 0.2260 & 0.2096 & 0.3595 \end{bmatrix} \tag{68} \]

\[ f^* = \begin{bmatrix} 17.1738 \\ 31.2643 \\ 29.6277 \\ 13.9079 \end{bmatrix}. \tag{69} \]

Finally, we evaluated the system dynamics from a different set of known initial conditions. The state evolution obtained by simulating the identified model with new initial conditions and no noise, plotted against the noisy Replicator–Mutator solution initialized from the same initial state, is reported in Fig. 1, while Fig. 2 shows the evolution of the one-step-ahead prediction error $\tilde e(t)$.

Fig. 1. Example 1: Validation – evolution comparison.

Notice that, during the transient, the error is one order of magnitude larger than $\epsilon = 0.005$.
This value is not acceptable considering the a priori assumption on $\epsilon$ and the constraints $x_i \ge 0$ and $\sum_{i=1}^{n} x_i = 1$, which confine the admissible state variables to the range $[0, 1]$. It is then possible to conclude that the quasispecies model is not suitable to thoroughly describe the behavior of the system under consideration. Moreover, Fig. 2 also shows that the bound $\epsilon$ is not violated whenever the trajectory evolves in a small neighborhood of the steady state. Indeed, while fewer parameters are still enough to adjust the equilibrium position to the desired location, the transient behavior, though qualitatively similar, appears to be different at a closer quantitative examination. Our simulative investigations have provided evidence that validation generally requires suitable sets of data generated from appropriately selected initial conditions. This means that data obtained at steady state are not sufficiently informative to check if important physical phenomena are missed. In this case, the same set of steady-state data would be well described both by a quasispecies model and by a Replicator–Mutator one. Indeed, the transient response is very important in the identification procedure, and it is essential to validate the model by selecting different initial conditions located far away from the final steady state and well spread in the domain where the system is defined.

Fig. 2. Example 1: Validation – error evolution.

Example 2. It has been shown that the evolution of molecules based on replication and mutation and exposed to selection at a constant population size can be described in terms of chemical reaction kinetics (see [1] and the references therein). The following parallel chemical reactions

\[ A + C_i \xrightarrow{\ q_{ji} f_i\ } C_j + C_i \tag{70} \]

form a network which considers the formation of every RNA genotype as a mutant of any other genotype. The material $A$ required by RNA synthesis is continuously provided.
The quantities of interest are the relative concentrations $x_k = [C_k] / \sum_{j=1}^{n} [C_j]$ for $k = 1, \dots, n$. The reaction network (70) is described by a quasispecies model. The experiments have been carried out with $p = 9$ different initial conditions $x(0)$, and an $\ell_\infty$ noise bound with $\epsilon = 0.01$ has been considered. The overall measured data are $Npn = 5436$. The identification procedure ended with a valid data set for the UI and the Conditional Central Estimate. In particular, the solution of the optimization problem (53) provides $f_4 > 8.102$. Since the overparameterization allows us to arbitrarily select any value of $f_4$ satisfying the previous inequality, we set $f_4 = 9.16$. Then the UI (71), (72) and the Conditional Central Estimate (73) in the original space of parameters are obtained:

\[ Q \in \begin{bmatrix} [0.098, 0.166] & [0.237, 0.251] & [0.236, 0.248] & [0.354, 0.382] \\ [0.231, 0.260] & [0.131, 0.164] & [0.356, 0.376] & [0.236, 0.255] \\ [0.235, 0.257] & [0.357, 0.373] & [0.129, 0.166] & [0.232, 0.255] \\ [0.352, 0.391] & [0.236, 0.248] & [0.238, 0.249] & [0.116, 0.161] \end{bmatrix} \tag{71} \]

\[ F \in \begin{bmatrix} [8.893, 9.948] \\ [17.228, 18.187] \\ [14.381, 15.472] \\ [9.160, 9.160] \end{bmatrix} \tag{72} \]

\[ Q^* = \begin{bmatrix} 0.1609 & 0.2399 & 0.2398 & 0.3596 \\ 0.2398 & 0.1605 & 0.3598 & 0.2397 \\ 0.2398 & 0.3598 & 0.1606 & 0.2397 \\ 0.3596 & 0.2399 & 0.2398 & 0.1609 \end{bmatrix}, \qquad f^* = \begin{bmatrix} 9.7100 \\ 17.8600 \\ 15.2600 \\ 9.1600 \end{bmatrix}. \tag{73} \]

The model has been validated on a different setup (different initial conditions and noise) and the resulting dynamics are reported in Fig. 3. In Fig. 4 the difference between the measurements and the estimated evolution is reported. In this case, the error has a uniform behavior along the considered time interval. Then, looking at the obtained prediction error and the UI, the identified quasispecies model is not invalidated.

Fig. 3. Example 2: Validation – evolution comparison.

Fig. 4. Example 2: Validation – error evolution.

Now, we consider that some additional a priori information is available. In particular, we consider the fulfillment of expression (65) for $Q$ and $f_4 \in [9, 10]$. Then, we proceeded to compute the admissible values of $q$ for $f_4$ in the range $[9, 10]$. The resulting intervals are plotted, as a function of $f_4$, in Fig. 5, showing clearly that we can a posteriori refine our estimate of $f_4$ to the interval $9.10 \le f_4 \le 9.65$ (the corresponding intervals for $q$ being empty outside this range), while the unknown parameter $q$ must then belong to $0.390 \le q \le 0.409$.

Fig. 5. Upper (dashed) and lower (solid) bounds for $q$ as a function of $f_4$.

5. Concluding remarks

A new methodology for estimating and validating/invalidating a single quasispecies model, as well as its uncertainty set, starting from data has been presented. The feasible parameter set has been used to validate the a priori information, and then a model quality evaluation based on the predictive ability of the model is performed. The conditions and the procedure for the determination of the unique inversion solution of the class representative for quasispecies models have been determined. Moreover, thanks to the obtained inversion procedure, it is possible to get Uncertainty Intervals also on the original physical parameters, and to set up a procedure to validate the biological system, in order to understand whether the model captures the biological behavior. Since the proposed method is based on the solution of Linear Programming (LP) problems, it is applicable also to high-dimensional systems. The presented techniques provide a good "in silico" methodology that can be applied to real experiments, for example to identify the kinetic parameters of a chemical reaction network.

Acknowledgement

Second author was partially supported by MIUR-PRIN "Robust Techniques for Uncertainty Systems Control".
Appendix

Given the estimated $\Theta^* \in M$, solution of the Single Model Problem (42), Proposition 1 suggests the following algorithm for the inversion of relations (27).

Identification Algorithm: Inversion Single Model Problem
(i) Determine the position $h$ of the smallest $f_j$ for $j = 1, \dots, n$ by checking the sign of the $\Delta_{kj}$ computed according to definition (46);
(ii) Compute the lower bound $\hat\gamma$ for $f_h$ by solving the optimization problems (47);
(iii) Fix a value $f_h > \hat\gamma$;
(iv) Compute $f_k = f_h + \dfrac{\beta^i_h - \beta^i_k}{T}$ for $k = 1, \dots, n$ with $k \ne h$, for some $i$;
(v) Compute $q_{ii} = \dfrac{\beta^i_i + T f_i - 1}{T f_i}$ for $i = 1, \dots, n$;
(vi) Compute $q_{ij} = \dfrac{\alpha^i_j}{T f_j}$ for $i, j = 1, \dots, n$ with $i \ne j$.

Whenever the FPS is not empty, Lemma 3 and Theorem 1 suggest the following algorithm for the solution of the Model Set Problem.

Identification Algorithm: Model Set Problem
(i) Determine the position $h$ of the $f_j$, $j = 1, \dots, n$, with the smallest lower bound by checking the sign of the $\Delta_{kj}$ computed according to definition (50);
(ii) Compute the lower bound $\hat\gamma$ for $f_h$ by solving the optimization problem (53);
(iii) Fix a value $f_h > \hat\gamma$;
(iv) Solve the optimization problems (54)–(58).

Indeed, when the FPS is not empty, the conditions in Proposition 1 for determining the position $h$ of the smallest $f_j$, $j = 1, \dots, n$, together with the relative lower bound, are tightened by the ones given in Lemma 3. Then, in this case, the overall identification algorithm is the following one.

Complete Identification Algorithm:
(i) Determine the position $h$ of the $f_j$, $j = 1, \dots, n$, with the smallest lower bound by checking the sign of the $\Delta_{kj}$ computed according to definition (50);
(ii) Compute the lower bound $\hat\gamma$ for $f_h$ by solving the optimization problem (53);
(iii) Fix a value $f_h > \hat\gamma$;
(iv) Compute $f_k = f_h + \dfrac{\beta^i_h - \beta^i_k}{T}$ for $k = 1, \dots, n$ with $k \ne h$, for some $i$;
(v) Compute $q_{ii} = \dfrac{\beta^i_i + T f_i - 1}{T f_i}$ for $i = 1, \dots, n$;
(vi) Compute $q_{ij} = \dfrac{\alpha^i_j}{T f_j}$ for $i, j = 1, \dots, n$ with $i \ne j$;
(vii) Solve the optimization problems (54)–(58).

References

[1] M. Stadler, P. Stadler, Molecular replicator dynamics, Advances in Complex Systems 6 (2003) 47–77.
[2] M. Eigen, J. McCaskill, P. Schuster, The molecular quasispecies, Advances in Chemical Physics 75 (1989) 149–263.
[3] K.P. Hadeler, Stable polymorphisms in a selection model with mutation, SIAM J. Appl. Math. 41 (1981) 1–7.
[4] M. Eigen, P. Schuster, The Hypercycle. A Principle of Natural Self-organisation, Springer-Verlag, 1979.
[5] J. Hofbauer, K. Sigmund, Evolutionary Games and Replicator Dynamics, Cambridge University Press, Cambridge, 1998.
[6] Y. Brumer, F. Michor, E. Shakhnovich, Genetic instability and quasispecies model, Journal of Theoretical Biology 241 (2) (2006) 216–222.
[7] P. Stadler, P. Schuster, Mutation in autocatalytic reaction networks: an analysis based on perturbation theory, Journal of Mathematical Biology 30 (1992) 597–632.
[8] M. Milanese, A. Vicino, Optimal estimation theory for dynamic systems with set membership uncertainty: An overview, Automatica 27 (1991) 997–1009.
[9] L. Giarré, B. Kacewicz, M. Milanese, Model quality evaluation in set membership identification, Automatica 33 (6) (1997) 1133–1139.
[10] H. El-Samad, S. Prajna, A. Papachristodoulou, M. Khammash, J.C. Doyle, Model validation and robust stability analysis of the bacterial heat shock response using SOSTOOLS, in: Proc. of the 42nd IEEE Conference on Decision and Control, 2003, pp. 3766–3771.
[11] C. Briones, E. Domingo, C. Molina-París, Memory in retroviral quasispecies: Experimental evidence and theoretical model for human immunodeficiency virus, Journal of Molecular Biology 331 (2003) 213–229.
[12] L. Sguanci, M. Bagnoli, P. Lió, Modeling HIV quasispecies evolutionary dynamics, BMC Evolutionary Biology 7.
[13] J. Salmeron, P. Munoz De Rueda, A. Ruiz-Extremera, J. Casado, C. Huertas, M. Bernal, L. Rodriguez, A. Palacios, Quasispecies as predictive response factors for antiviral treatment in patients with chronic hepatitis C, Digestive Diseases and Sciences 51 (2006) 960–967.
[14] P. Kosalaraksaa, M. Kavlick, V. Maroun, R. Le, H. Mitsuya, Comparative fitness of multi-dideoxynucleoside-resistant human immunodeficiency virus type 1 (HIV-1) in an in vitro competitive HIV-1 replication assay, Journal of Virology 73 (1999) 5356–5363.
[15] K. Page, M. Nowak, Unifying evolutionary dynamics, Journal of Theoretical Biology 219 (2002) 93–98.
[16] M.A. Nowak, From quasispecies to universal grammar, Zeitschrift für Physikalische Chemie 216 (2002) 5–20.
[17] N.L. Komarova, Replicator–mutator equation, universality property and population dynamics of learning, Journal of Theoretical Biology 230 (2) (2004) 227–239.
[18] M. Nilsson, Mathematical models of molecular evolution, Ph.D. Thesis, Göteborg University, 2000.
[19] J. Bielawski, Z. Yang, Maximum likelihood methods for detecting adaptive evolution after gene duplication, Journal of Structural and Functional Genomics 3 (1–4) (2003) 201–212.
[20] S. Bonhoeffer, A.D. Barbour, R.J.D. Boer, Procedures for reliable estimation of viral fitness from time-series data, Proceedings of the Royal Society B: Biological Sciences 269 (1503) (2002) 1887–1893.
[21] L. Benvenuti, A.D. Santis, A. Farina, On model consistency in compartmental systems identification, Automatica 38 (11) (2002) 1969–1976.
[22] M. Kieffer, E. Walter, Guaranteed nonlinear state estimator for cooperative systems, Numerical Algorithms 37 (1–4) (2004) 187–198.
[23] K. Poolla, P. Khargonekar, A. Tikku, J. Krause, K. Nagpal, A time-domain approach to model validation, IEEE Transactions on Automatic Control 39 (1994) 951–959.
[24] L. Ljung, L. Guo, The role of model validation for assessing the size of the unmodeled dynamics, IEEE Transactions on Automatic Control 42 (1997) 1230–1239.
[25] S. Prajna, Barrier certificates for nonlinear model validation, in: Proc. of the IEEE Conference on Decision and Control, 2003.
[26] C. Coffey, P. Hebert, H. Krumholz, T. Morgan, S. Williams, J. Moore, Model validation procedures in human studies of genetic interactions, Nutrition 20 (1).
[27] G. Belforte, B. Bona, M. Milanese, Advanced modeling and identification techniques for metabolic processes, CRC Critical Reviews in Biomedical Engineering 10 (1) (1984) 275–316.
[28] F. Hillier, G. Lieberman, Introduction to Operations Research, McGraw-Hill, 2002.
[29] N. Megiddo, Combinatorial optimization with rational objective functions, Mathematics of Operations Research 4 (4) (1979) 414–424.
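
As a complement to the inversion steps (iv)–(vi) of the Appendix, the following Python sketch recovers $(f, Q)$ from the extended parameters once $f_h$ has been fixed, using relations (59)–(61). It is only an illustration under stated assumptions: the function names `forward_map` and `invert` are hypothetical, indices are zero-based, and the forward map $\alpha^i_j = T q_{ij} f_j$, $\beta^i_k = 1 + T f_i q_{ii} - T f_k$ is reconstructed here from (51) and (61) rather than quoted from relations (27).

```python
def forward_map(f, Q, T):
    """Extended parameters from (f, Q); a reconstruction based on (51) and (61).

    alpha[i][j] = T * q_ij * f_j for i != j (the diagonal is an unused placeholder),
    beta[i][k]  = 1 + T * f_i * q_ii - T * f_k.
    """
    n = len(f)
    alpha = [[T * Q[i][j] * f[j] if i != j else 0.0 for j in range(n)] for i in range(n)]
    beta = [[1.0 + T * f[i] * Q[i][i] - T * f[k] for k in range(n)] for i in range(n)]
    return alpha, beta


def invert(alpha, beta, T, h, f_h):
    """Recover the class representative (f, Q) selected by the value f_h at index h."""
    n = len(beta)
    # Step (iv), relation (59): any row i gives the same differences, row 0 is used here.
    f = [f_h + (beta[0][h] - beta[0][k]) / T for k in range(n)]
    Q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                # Step (v), relation (61) with k = i.
                Q[i][i] = (beta[i][i] + T * f[i] - 1.0) / (T * f[i])
            else:
                # Step (vi), relation (60).
                Q[i][j] = alpha[i][j] / (T * f[j])
    return f, Q
```

Applying `forward_map` to the values of $Q$ and $w$ in (66) with $T = 0.01$ and then `invert` with the true $f_4$ returns the original parameters. Choosing a different $f_h$ yields another pair $(f, Q)$ that maps to the same extended parameters, which illustrates the overparameterization discussed in Remark 5.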