“The Systems Biology Modelling Cycle (supported by BioPreDyn)” EMBL-EBI (Cambridge, UK), 12-15 May 2014 Optimal Design of Dynamic Experiments Julio R. Banga IIM-CSIC, Vigo, Spain julio@iim.csic.es Optimal Experimental Design (OED) Introduction OED: Why, what and how? Model building cycle OED to improve model calibration Formulation and examples OED software and references Modelling – considerations Models starts with questions (purpose) and level of detail We need a priori data & knowledge to build a 1st model We then plan and perform new experiments, obtain new data and refine the model We repeat until stopping criterion satisfied Modelling – considerations We plan and perform new experiments, obtain new data and refine the model But, how do we plan these experiments? Optimal experimental design (OED) Model-based OED OED – Why? We want to build a model We want to use the model for specific purposes (“models starts with questions”) OED allow us to plan experiments that will produce data with rich information content OED - What ? We need to take into account: (i) the purpose of the model, (ii) the experimental degrees of freedom and constraints, (iii) the objective of the OED: Parameter estimation Model discrimination Model reduction … OED – simple example We want to build a 3D model of an object from 2D pictures OED – simple example We want to build a 3D model of an object from 2D pictures OED – simple example We want to build a 3D model of an object from 2D pictures (i) purpose of the model: a rough 3D representation of the object, (ii) experimental constraints: we can only take 2D snapshots (iii) degrees of freedom: pictures from any angle “Experiment” with minimum number of pictures? OED – simple example OED – basic considerations There is a minimum amount of information in the data needed to build a model This depends on: how detailed we want the model to be how complex the original object (system) is safe assumptions we can make (e.g. symmetry in 3D object -> less pictures needed) Static model Dynamic model OED for dynamic models We need time-series data with enough information to build a model Data from different contexts, with enough time resolution Reverse engineering (dynamic) Mario Dynamic process models Main characteristics: Non-linear, dynamic models (i.e. batch or semi-batch processes) Nonlinear constraints (safety and/or quality demands) Distributed systems (T, c, etc.) Coupled transport phenomena Thus, mathematical models consist of sets of ODEs, DAEs, PDAEs, or even IPDAEs, with possible logic conditions (transitions, i.e. hybrid systems) PDAEs models are usually transformed into DAEs (I.e. discretization methods, like FEM, NMOL, etc.) Model building Experiment Data Solver Fitted Model Model Model building Optimal Experimental Design Identifiability Analysis Parameter Estimation Experiment Data Solver Identifiability Analysis Model Fitted Model Model building cycle Prior information OED Parameter estimation Model selection and discrimination New experiments New data Experimental design Experimental degrees of freedom and constraints Initial conditions Dynamic stimuli: type and number of perturbations Measurements What? When? (sampling times, experiment duration) How many replicates? How many experiments? Etc. Examples Bacterial growth in batch culture 3-step pathway Oregonator Example: Bacterial growth in batch culture Concentration of microorganisms Concentration of growth limiting substrate Example: Bacterial growth in batch culture Maximum growth rate Decay rate coefficient Concentration of microorganisms Michaelis-Menten constant Yield coefficient Concentration of growth limiting substrate Example: Bacterial growth in batch culture Experimental design: Initial conditions? What to measure? (concentration of microorganisms and substrate?) When to measure? (sampling times, experiment duration) How many experiments? How many replicates? Etc. Example: Bacterial growth in batch culture Case A: 1 experiment 11 equidistant sampling times Duration: 10 hours Measurements of S and B 30 14 obsS obsB 25 12 20 10 15 8 10 6 5 4 2 0 0 5 Time 10 0 5 Time 10 Example: Bacterial growth in batch culture Parameter estimation using GO method (eSS) 30 obsB fit cb data3 25 data cs 20 15 10 5 0 0 1 2 3 4 5 Time 6 7 8 9 10 Example: Bacterial growth in batch culture Case B: 1 experiment 11 equidistant sampling times Duration: 10 hours Measurements of S only 30 obsS 25 20 15 10 5 0 1 2 3 4 5 Time 6 7 8 9 10 Example: Bacterial growth in batch culture But bad predictions for bacteria… Good fit for substrate! 30 cb obsS 25 25 20 20 15 15 Predicted 10 10 5 0 1 2 3 4 5 Time 6 7 8 9 10 Real 5 0 1 2 3 4 5 Time 6 7 8 9 10 Example: Bacterial growth in batch culture Measuring both B and S… 0.09 0.09 9 0.08 0.08 8 0.07 0.07 7 0.06 ks 10 kd kd 0.1 0.06 6 0.05 0.05 5 0.04 0.04 4 0.03 0.03 3 4 5 6 ks 7 8 9 10 3 0.2 0.3 0.4 0.9 0.9 0.8 0.8 0.7 0.7 0.6 0.7 0.8 0.2 0.5 0.4 0.4 0.3 0.3 0.08 0.1 0.3 0.4 0.5 mumax 0.6 0.7 0.8 yield vsmumax 1 0.8 0.6 0.5 0.06 kd 0.6 yield 1 yield 1 0.04 0.5 mumax yield vsks yield vskd yield ks vsmumax kd vsmumax kd vsks 0.1 0.6 0.4 3 4 5 6 ks 7 8 9 10 0.2 0.4 0.6 mumax 0.8 Example: Bacterial growth in batch culture Measuring only S… 10 0.08 0.08 8 0.06 0.04 ks 0.1 kd kd ks vsmumax kd vsmumax kd vsks 0.1 0.06 4 0.04 3 4 5 6 ks 7 8 9 10 6 0.2 0.4 0.6 0.2 0.8 0.4 yield vsks yield vskd 0.6 0.8 mumax mumax yield vsmumax 0.9 0.9 0.8 0.8 0.8 0.7 0.7 0.7 0.6 yield 0.9 yield yield 1 0.6 0.6 0.5 0.5 0.5 0.4 0.4 0.4 0.3 0.3 0.3 0.04 0.06 kd 0.08 0.1 3 4 5 6 ks 7 8 9 10 0.2 0.3 0.4 0.5 mumax 0.6 0.7 0.8 Example: Bacterial growth in batch culture So, for this case of 1 experiment, we should measure both B and S But confidence intervals are rather large… mmax : 4.0940e-001 Ks Kd Y : 6.6525e+000 : 3.9513e-002 : 4.8276e-001 9.4155e-002 3.3475e+000 7.0150e-002 1.6667e-001 (23%); (50%); (177%); (34%); What happens if we consider a second experiment? (same experiment, but with different initial condition for S) Example: Bacterial growth in batch culture 1st experiment 2nd experiment 8 30 obsB 15 obsS obsB obsS 14 7 25 12 6 20 10 8 15 6 10 4 10 5 4 5 3 5 2 2 0 2 4 6 8 10 0 Time 2 4 6 8 10 0 2 Ks Kd Y : 5.3551e+000 : 4.1657e-002 : 4.8529e-001 6 Time Time mmax : 3.9542e-001 4 3.4730e-002 9.1440e-001 2.5753e-002 6.1227e-002 (9%) (17%) (62%) (13%); Great improvement with a second experiment ! BUT, can we do even better? 8 10 0 0 2 4 6 Time 8 10 Example: Bacterial growth in batch culture 1st experiment 2nd experiment 8 30 obsB 15 obsS obsB obsS 14 7 25 12 6 20 10 8 15 6 10 4 10 5 4 5 3 5 2 2 0 2 4 6 8 10 0 Time 2 4 6 8 10 0 2 Ks Kd Y : 5.3551e+000 : 4.1657e-002 : 4.8529e-001 6 Time Time mmax : 3.9542e-001 4 3.4730e-002 9.1440e-001 2.5753e-002 6.1227e-002 (9%) (17%) (62%) (13%); Great improvement with a second experiment ! BUT, can we do even better? OPTIMAL EXPERIMENTAL DESIGN 8 10 0 0 2 4 6 Time 8 10 Example: simple biochemical pathway C.G. Moles, P. Mendes y J.R. Banga, 2003. Parameter estimation in biochemical pathways: a comparison of global optimization methods. Genome Research., 13:2467-2474. Kinetics described by set of 8 ODEs with 36 parameters Example: simple biochemical pathway Parameter estimation: 36 parameters measurements: concentrations of 8 species 16 experiments (different values of S y P) Example: simple biochemical pathway Experiments (S, P values) Initial conditions for all the experiments 21 measurements per experiment , tf = 120 s Example: simple biochemical pathway Multi-start local methods fail… Multi-start SQP Example: simple biochemical pathway Parameter estimation: again (some) global methods can fail too… Example: simple biochemical pathway 0.5 2.5 0.4 2 Concentración M2 Concentración E1 Parameter estimation: best fit looks pretty good but… 0.3 0.2 0.1 0 0 20 40 60 tiempo 80 100 120 1.5 1 0.5 0 0 20 40 60 tiempo 80 100 120 Example: simple biochemical pathway Identifiability problems… Contours [p1, p6] Contours [p1, p4] Practical identifiability problems are often due to data with poor information content Need: more informative experiments (data sets) Solution: optimal design of (dynamic) experiments What about identifiability? I.e. can the parameters be estimated in a unique way? Identifiability: A. Global a priori (theoretical, structural) B. Local a priori (local) C. Local a posteriori (practical) (A) is hard to evaluate for realistic nonlinear models (B) and (C) can be estimated via the FIM and other indexes... (C) takes into account noise etc. Parametric sensitivities xi Sij p j x x t ,p0 ,p p0 Fisher information matrix (FIM) N FIM SIT ti W 1ti SI ti i 1 Checking identifiability and other indexes… Compute sensitivities (direct decoupled method) : x Sij i p j x x t ,p0 ,p p0 Build FIM , covariance and correlation matrices N FIM i 1 SIT ti W ti SI ti 1 C FIM 1 Rij Cij C ii C jj Analyse possible correlations among parameters , i j ; Rij 1, i j Sensitivities w.r.t. p1 and p6 are highly correlated (i.e. The system exhibits rather similar responses to changes in p1 and p6 for the given experimental design) p1 & p6 p1 & p4 Checking identifiability and other indexes… Compute sensitivities (direct decoupled method) : xi Sij p j x x t ,p0 ,p p0 Build FIM , covariance and correlation matrices N FIM SIT ti W 1ti SI ti i 1 C FIM 1 Rij Cij C ii C jj , i j ; Rij 1, i j Analyse possible correlations among parameters Compute confidence intervals Check FIM-based criterions (practical identifiability) Singular FIM: unidentifiable parameters, non-informative experiments Large condition number of FIM means lower practical identifiability Rank parameters v Finally, use OED to improve experimental design (New) Experiments Optimal experimental design Model calibration Model structure & parameters Practical Identifiability analysis Ranking of parameters Parametric sensitivities Balsa-Canto, E., Alonso, A. A., & Banga, J. R. (2010). An iterative identification procedure for dynamic modeling of biochemical networks. BMC Systems Biology 4:11 Optimal (dynamic) experimental design Design the most informative experiments, facilitating parameter estimation and improving identifiability How? Define information criterion Optimize it modulating experimental conditions “If you want to truly understand something, try to change it.” Kurt Lewin, circa 1951 Optimal (dynamic) experimental design Computational approaches that are applicable to support the optimal design of experiments in terms of •how to manipulate the degrees of freedom (controls) of experiments, •what variables to measure, •why to measure them, •when to take measurements. Experiment Data Optimal (dynamic) experimental design Information content measured with the FIM N FIM SIT ti W 1ti SI ti i 1 We will use scalar functions of the FIM (“alphabetical” criteria) Find experiments which maximize information content Some FIM-based Criterions... • D-criterion (determinant of F), which measures the global accuracy of the estimated parameters max J F • E-criterion (smallest eigenvalue of F), which measures largest error max J min FIM • Modified E-criterion (condition number of F), which measures the parameter decorrelation min J F max (F) / min (F) • A-criterion (trace of inverse of F), which measures the arithmetic mean of estimation error J trace(F 1 ) A criterion = mintrace FIM 1 D criterion = maxdetFIM E criterion = maxminFIM FIM Modified-E criterion = min maxFIM min E criterion = maxminFIM E-optimality: max the min eigenvalue of FIM (minimizes the largest error) Modified-E criterion max FIM min = minFIM Maximize decorrelation between parameters (make contours as circular as possible) OED as a dynamic optimization problem Calculate the dynamic scheme of measurements so as to generate the maximum amount and quality of information for model calibration purposes. Which type of dynamic stimuli? When to measure? (Optimal sampling times) OED as a dynamic optimization problem Calculate time-varying control profiles (u(t)), sampling times, experiment duration and initial conditions (v) to optimize a performance index (scalar measure of the FIM): System dynamics (ODEs, PDEs): Experimental constraints: Back to example: Bacterial growth in batch culture Experimental design: Initial conditions? What to measure? (concentration of microorganisms and substrate?) When to measure? (sampling times, experiment duration) How many experiments? How many replicates? Etc. Example: Bacterial growth in batch culture 1st experiment 2nd experiment 8 30 obsB 15 obsS obsB obsS 14 7 25 12 6 20 10 8 15 6 10 4 10 5 4 5 3 5 2 2 0 2 4 6 8 10 0 Time 2 4 6 8 10 0 2 Ks Kd Y : 5.3551e+000 : 4.1657e-002 : 4.8529e-001 6 Time Time mmax : 3.9542e-001 4 3.4730e-002 9.1440e-001 2.5753e-002 6.1227e-002 (9%) (17%) (62%) (13%); Great improvement with a second experiment ! BUT, can we do even better? OPTIMAL EXPERIMENTAL DESIGN 8 10 0 0 2 4 6 Time 8 10 Example: Bacterial growth in batch culture Let us design the second experiment in an optimal way: Criteria: E-optimality (minimize the largest error) Degress of freedom we can ‘manipulate’ in the second experiment: •Initial concentrations of S and B •Duration of experiment Example: Bacterial growth in batch culture OED of second experiment 1st experiment 2nd experiment after OED 30 obsB obsS 5 obsB 2.4 14 4.5 25 4 2.2 12 3.5 20 10 obsS 2 3 8 15 1.8 6 10 1.6 2.5 2 1.5 4 1.4 5 2 1 0.5 1.2 0 2 4 6 8 10 0 Time 2 4 6 8 10 0 5 Time mmax : 3.9950e-001 Ks Kd Y : 4.9530e+000 : 5.0859e-002 : 5.0544e-001 10 Time 1.7133e-002 2.9647e-001 2.9936e-003 1.4074e-002 (4.3%) (6%) (6%) (2.8%); free initial conditions cb0:[1,5] cs: [5 40], free experiment duration [6,15] h, ns=11 15 0 5 10 Time 15 Example: Bacterial growth in batch culture Two arbitrary experiments mmax : 3.9542e-001 Ks Kd Y : 5.3551e+000 : 4.1657e-002 : 4.8529e-001 3.4730e-002 9.1440e-001 2.5753e-002 6.1227e-002 (9%) (17%) (62%) (13%); After OED of second experiment mmax : 3.9950e-001 Ks Kd Y : 4.9530e+000 : 5.0859e-002 : 5.0544e-001 1.7133e-002 2.9647e-001 2.9936e-003 1.4074e-002 (4.3%) (6%) (6%) (2.8%); free initial conditions cb0:[1,5] cs: [5 40], free experiment duration [6,15] h, ns=11 Example: Bacterial growth in batch culture Correlation matrix after OED of second experiment Crammer Rao based correlation matrix for global unknowns Crammer Rao based correlation matrix for global unknowns 1 1 0.8 yield 0.8 yield 0.6 0.6 0.4 0.4 kd kd 0.2 0.2 0 0 -0.2 ks -0.2 ks -0.4 -0.4 -0.6 -0.6 mumax mumax -0.8 -0.8 mumax ks kd yield -1 mumax free initial conditions cb0:[1,5] cs: [5 40], free experiment duration [6,15] h, ns=11 ks kd yield -1 Example: Bacterial growth in batch culture After OED of second experiment… ks kd vs kd 2.5 2 2 2 1.5 1.5 1.5 1 1 yield 2.5 kd 2.5 0.5 0 0 0 -0.5 -0.5 -0.5 -1 -1 -0.5 0 0.5 1 mumax 1.5 2 2.5 3 -1 -1 -0.5 0 0.5 1 ks 1.5 2 2.5 3 -1 -1 1.04 1.04 1.02 1.02 1 1 0.98 0.98 0.96 0.96 0.94 0.94 1.05 0 0.5 1 kd 1.5 2 2.5 3 > Kd and Y are still highly correlated but the size of the confidence ellipse is much smaller vs kd kd vs yield 1.05 yield 1.06 kd 1.06 1 mumax -0.5 ks mumax vs kd 0.95 > Correlation between parameters has substantially improved (in general) 1 0.5 0.5 vs yield 3 3 kd kd mumax vs kd 3 1 0.95 0.95 1 ks 1.05 free initial conditions cb0:[1,5] cs: [5 40], free experiment duration [6,15] h, ns=11 0.95 1 kd 1.05 Example: Bacterial growth in batch culture mmax : 3.9950e-001 Ks Kd Y : 4.9530e+000 : 5.0859e-002 : 5.0544e-001 1.7133e-002 2.9647e-001 2.9936e-003 1.4074e-002 Monte-Carlo based confidence interval Monte-Carlo based confidence interval 60 50 5% 50 Robust confidence intervals are similar to those obtained by the FIM (4.3%) (6%) (6%) (2.8%); 6% 40 40 30 30 20 20 10 10 0 0.37 0.38 0.39 0.4 mumax 0.41 0.42 0.43 0 0.048 0.049 0.05 0.051 kd 0.052 0.053 0.054 Monte-Carlo based confidence interval Monte-Carlo based confidence interval 50 60 7% 50 3% 40 40 30 30 20 20 10 10 0 4.4 4.6 4.8 5 ks 5.2 5.4 0 0.48 0.49 0.5 0.51 yield 0.52 0.53 0.54 Example: OED for the simple biochemical pathway Moles, C. G., Pedro Mendes and Julio R. Banga (2003) Parameter estimation in biochemical pathways: a comparison of global optimization methods. Genome Research 13(11):2467-2474 Original experimental design: 16 experiments (different S and P values) Result: large E-criterion and modified E-criterion Some FIM-based criterions for the original design... Very large modified E-criterion indicates large correlation among (some) parameters, making the identification of the system hard Can we improve this by an alternative (optimal) design of experiments? Improve experimental design by solving OED Find the values of S and P for a set of Nexp experiments which e.g. maximize E-criterion s.t. constraints (dynamics plus bounds) Improved experimental design: 16 experiments with optimal S and P E-criterion improved (> one order of magnitude) Other criteria also improved Improved design by solving OED problem: Improved design by solving OED problem: Original Design (E-crit= 60, logD=161) Improved Design (E-crit= 320, logD=181) Example: OED for Oregonator reaction The Oregonator is the simplest realistic model of the chemical dynamics of the oscillatory Belousov-Zhabotinsky (BZ) reaction (Zhabotinsky, 1991; Gray and Scott, 1991; Epstein and Pojman, 1998) Oregonator reaction: highly nonlinear, oscillatory kinetics Oregonator reaction: identifiability problems Villaverde, A., J. Ross, F. Morán, E. Balsa-Canto, J.R. Banga (2011) Use of a Generalized Fisher Equation for Global Optimization in Chemical Kinetics.Journal of Physical Chemistry A115(30):8426-8436. Oregonator reaction: OED E-optimality criterion Example: OED for Oregonator reaction E-optimality criterion: improved 3 orders of magnitude Villaverde, A., J. Ross, F. Morán, E. Balsa-Canto, J.R. Banga (2011) Use of a Generalized Fisher Equation for Global Optimization in Chemical Kinetics.Journal of Physical Chemistry A115(30):8426-8436. Example: OED for Oregonator reaction OED conclusions Currently, most experiments are designed based on intuition of experimentalists and modellers Model-based OED can be used to: Improve model calibration Discriminate between rival models OED is a systematic and optimal approach OED can take into account practical limitations and constraints by incorporating them into the formulation Take-home messages Main tips for dynamic model building Check identifiability Use proper optimization methods for parameter estimation Use optimal experimental design Main tips for dynamic model building “All models are wrong, but some are useful” --- Statistician George E. P. Box Main tips for dynamic model building “All models are wrong, but some are useful” --- Statistician George E. P. Box The practical question is: How wrong do they have to be to not be useful? Software for dynamic model building and OED http://www.iim.csic.es/~amigo/ A few selected references… Ashyraliyev M, Fomekong-Nanfack Y, Kaandorp JA & Blom JG (2009a). Systems biology: parameter estimation for biochemical models. FEBS J 276: 886–902. Balsa-Canto, E. and Julio R. Banga (2011) AMIGO, a toolbox for Advanced Model Identification in systems biology using Global Optimization. Bioinformatics 27(16):2311-2313. Balsa-Canto, E., Alonso, A. A., & Banga, J. R. (2010). An iterative identification procedure for dynamic modeling of biochemical networks. BMC Systems Biology 4:11. Bandara, S., Schlöder, J. P., Eils, R., Bock, H. G., & Meyer, T. (2009). Optimal experimental design for parameter estimation of a cell signaling model. PLoS computational biology, 5(11), e1000558. Banga, J.R. and E. Balsa-Canto (2008) Parameter estimation and optimal experimental design. Essays in Biochemistry 45:195–210. Balsa-Canto, E., A.A. Alonso and J.R. Banga (2008) Computational Procedures for Optimal Experimental Design in Biological Systems. IET Systems Biology 2(4):163-172. Chen BH, Asprey SP (2003) On the Design of Optimally Informative Dynamic Experiments for Model Discrimination in Multiresponse Nonlinear Situations. Ind Eng Chem Res 2003, 42:1379-1390. Jaqaman K., Danuser G. Linking data to models: data regression. Nat. Rev. Mol. Cell Bio.7:813-819. Kremling A, Saez-Rodriguez J: Systems Biology - An engineering perspective. J Biotechnol 2007, 129:329-351 Mélykúti, B., E. August, A. Papachristodoulou and H. El-Samad (2010) Discriminating between rival biochemical network models: three approaches to optimal experiment design. BMC Systems Biology 4:38. van Riel N (2006) Dynamic modelling and analysis of biochemical networks: Mechanism-based models and model-based experiments. Brief Bioinform 7(4):364-374. Villaverde, A.F. and J.R. Banga (2014) Reverse engineering and identification in systems biology: strategies, perspectives and challenges. J. Royal Soc. Interface 11(91):20130505 (review papers in yellow) http://www.iim.csic.es/~gingproc/software.html Thank you! More info? julio@iim.csic.es