Optimal design of dynamic experiments

advertisement
“The Systems Biology Modelling Cycle (supported by BioPreDyn)”
EMBL-EBI (Cambridge, UK), 12-15 May 2014
Optimal Design of Dynamic
Experiments
Julio R. Banga
IIM-CSIC, Vigo, Spain
julio@iim.csic.es
Optimal Experimental Design (OED)




Introduction
OED: Why, what and how?
Model building cycle
OED to improve model calibration


Formulation and examples
OED software and references
Modelling – considerations




Models starts with questions (purpose)
and level of detail
We need a priori data & knowledge to
build a 1st model
We then plan and perform new
experiments, obtain new data and
refine the model
We repeat until stopping criterion satisfied
Modelling – considerations


We plan and perform new experiments,
obtain new data and refine the model
But, how do we plan these experiments?

Optimal experimental design (OED)

Model-based OED
OED – Why?



We want to build a model
We want to use the model for specific
purposes (“models starts with questions”)
OED allow us to plan experiments that
will produce data with rich information
content
OED - What ?

We need to take into account:



(i) the purpose of the model,
(ii) the experimental degrees of freedom and
constraints,
(iii) the objective of the OED:
Parameter estimation
 Model discrimination
 Model reduction
 …

OED – simple example

We want to build a 3D model of an object
from 2D pictures
OED – simple example

We want to build a 3D model of an object
from 2D pictures
OED – simple example

We want to build a 3D model of an object from
2D pictures




(i) purpose of the model: a rough 3D representation
of the object,
(ii) experimental constraints: we can only take 2D
snapshots
(iii) degrees of freedom: pictures from any angle
“Experiment” with minimum number of pictures?
OED – simple example
OED – basic considerations


There is a minimum amount of
information in the data needed to build
a model
This depends on:



how detailed we want the model to be
how complex the original object (system) is
safe assumptions we can make (e.g. symmetry in
3D object -> less pictures needed)
Static model
Dynamic model
OED for dynamic models


We need time-series data with enough
information to build a model
Data from different contexts, with enough
time resolution
Reverse engineering (dynamic) Mario
Dynamic process models
Main characteristics:

Non-linear, dynamic models (i.e. batch or semi-batch processes)

Nonlinear constraints (safety and/or quality demands)

Distributed systems (T, c, etc.)

Coupled transport phenomena
Thus, mathematical models consist of sets of ODEs, DAEs, PDAEs,
or even IPDAEs, with possible logic conditions (transitions, i.e.
hybrid systems)
 PDAEs models are usually transformed into DAEs (I.e. discretization
methods, like FEM, NMOL, etc.)
Model building
Experiment
Data
Solver
Fitted Model
Model
Model building
Optimal Experimental Design
Identifiability Analysis
Parameter
Estimation
Experiment
Data
Solver
Identifiability Analysis
Model
Fitted Model
Model building cycle
Prior
information
OED
Parameter
estimation
Model
selection and
discrimination
New
experiments
New data
Experimental design
 Experimental degrees of freedom and constraints
Initial conditions
Dynamic stimuli: type and number of perturbations
 Measurements
What?
When? (sampling times, experiment duration)
How many replicates?
 How many experiments?
Etc.
Examples

Bacterial growth in batch culture

3-step pathway

Oregonator
Example: Bacterial growth in batch culture
Concentration of
microorganisms
Concentration of
growth limiting
substrate
Example: Bacterial growth in batch culture
Maximum growth
rate
Decay rate
coefficient
Concentration of
microorganisms
Michaelis-Menten
constant
Yield
coefficient
Concentration of
growth limiting
substrate
Example: Bacterial growth in batch culture
Experimental design:
Initial conditions?
What to measure? (concentration of microorganisms and substrate?)
When to measure? (sampling times, experiment duration)
How many experiments?
How many replicates?
Etc.
Example: Bacterial growth in batch culture
Case A:
1 experiment
11 equidistant sampling times
Duration: 10 hours
Measurements of S and B
30
14
obsS
obsB
25
12
20
10
15
8
10
6
5
4
2
0
0
5
Time
10
0
5
Time
10
Example: Bacterial growth in batch culture
Parameter estimation using GO method (eSS)
30
obsB
fit cb
data3
25
data cs
20
15
10
5
0
0
1
2
3
4
5
Time
6
7
8
9
10
Example: Bacterial growth in batch culture
Case B:
1 experiment
11 equidistant sampling times
Duration: 10 hours
Measurements of S only
30
obsS
25
20
15
10
5
0
1
2
3
4
5
Time
6
7
8
9
10
Example: Bacterial growth in batch culture
But bad predictions for bacteria…
Good fit for substrate!
30
cb
obsS
25
25
20
20
15
15
Predicted
10
10
5
0
1
2
3
4
5
Time
6
7
8
9
10
Real
5
0
1
2
3
4
5
Time
6
7
8
9
10
Example: Bacterial growth in batch culture
Measuring both B and S…
0.09
0.09
9
0.08
0.08
8
0.07
0.07
7
0.06
ks
10
kd
kd
0.1
0.06
6
0.05
0.05
5
0.04
0.04
4
0.03
0.03
3
4
5
6
ks
7
8
9
10
3
0.2
0.3
0.4
0.9
0.9
0.8
0.8
0.7
0.7
0.6
0.7
0.8
0.2
0.5
0.4
0.4
0.3
0.3
0.08
0.1
0.3
0.4
0.5
mumax
0.6
0.7
0.8
yield vsmumax
1
0.8
0.6
0.5
0.06
kd
0.6
yield
1
yield
1
0.04
0.5
mumax
yield vsks
yield vskd
yield
ks vsmumax
kd vsmumax
kd vsks
0.1
0.6
0.4
3
4
5
6
ks
7
8
9
10
0.2
0.4
0.6
mumax
0.8
Example: Bacterial growth in batch culture
Measuring only S…
10
0.08
0.08
8
0.06
0.04
ks
0.1
kd
kd
ks vsmumax
kd vsmumax
kd vsks
0.1
0.06
4
0.04
3
4
5
6
ks
7
8
9
10
6
0.2
0.4
0.6
0.2
0.8
0.4
yield vsks
yield vskd
0.6
0.8
mumax
mumax
yield vsmumax
0.9
0.9
0.8
0.8
0.8
0.7
0.7
0.7
0.6
yield
0.9
yield
yield
1
0.6
0.6
0.5
0.5
0.5
0.4
0.4
0.4
0.3
0.3
0.3
0.04
0.06
kd
0.08
0.1
3
4
5
6
ks
7
8
9
10
0.2
0.3
0.4
0.5
mumax
0.6
0.7
0.8
Example: Bacterial growth in batch culture
So, for this case of 1 experiment, we should measure both B and S
But confidence intervals are rather large…
mmax : 4.0940e-001
Ks
Kd
Y
: 6.6525e+000
: 3.9513e-002
: 4.8276e-001
9.4155e-002
3.3475e+000
7.0150e-002
1.6667e-001
(23%);
(50%);
(177%);
(34%);
What happens if we consider a second experiment?
(same experiment, but with different initial condition for S)
Example: Bacterial growth in batch culture
1st experiment
2nd experiment
8
30
obsB
15
obsS
obsB
obsS
14
7
25
12
6
20
10
8
15
6
10
4
10
5
4
5
3
5
2
2
0
2
4
6
8
10
0
Time
2
4
6
8
10
0
2
Ks
Kd
Y
: 5.3551e+000
: 4.1657e-002
: 4.8529e-001
6
Time
Time
mmax : 3.9542e-001
4
3.4730e-002
9.1440e-001
2.5753e-002
6.1227e-002
(9%)
(17%)
(62%)
(13%);
Great improvement with a second experiment !
BUT, can we do even better?
8
10
0
0
2
4
6
Time
8
10
Example: Bacterial growth in batch culture
1st experiment
2nd experiment
8
30
obsB
15
obsS
obsB
obsS
14
7
25
12
6
20
10
8
15
6
10
4
10
5
4
5
3
5
2
2
0
2
4
6
8
10
0
Time
2
4
6
8
10
0
2
Ks
Kd
Y
: 5.3551e+000
: 4.1657e-002
: 4.8529e-001
6
Time
Time
mmax : 3.9542e-001
4
3.4730e-002
9.1440e-001
2.5753e-002
6.1227e-002
(9%)
(17%)
(62%)
(13%);
Great improvement with a second experiment !
BUT, can we do even better?
OPTIMAL EXPERIMENTAL DESIGN
8
10
0
0
2
4
6
Time
8
10
Example: simple biochemical pathway
C.G. Moles, P. Mendes y J.R. Banga, 2003. Parameter estimation in biochemical pathways: a comparison of global optimization methods.
Genome Research., 13:2467-2474.
Kinetics described by set of 8 ODEs with 36 parameters
Example: simple biochemical pathway
Parameter estimation:
36 parameters
measurements: concentrations of 8 species
16 experiments (different values of S y P)
Example: simple biochemical pathway
Experiments (S, P values)
Initial conditions
for all the experiments
21 measurements per experiment , tf = 120 s
Example: simple biochemical pathway
Multi-start local methods fail…
Multi-start SQP
Example: simple biochemical pathway
Parameter estimation: again (some) global methods can fail too…
Example: simple biochemical pathway
0.5
2.5
0.4
2
Concentración M2
Concentración E1
Parameter estimation: best fit looks pretty good but…
0.3
0.2
0.1
0
0
20
40
60
tiempo
80
100
120
1.5
1
0.5
0
0
20
40
60
tiempo
80
100
120
Example: simple biochemical pathway
Identifiability problems…
Contours [p1, p6]
Contours [p1, p4]
Practical identifiability problems are often due to
data with poor information content
Need: more informative experiments (data sets)
Solution: optimal design of (dynamic) experiments
What about identifiability?
I.e. can the parameters be estimated in a unique way?
Identifiability:
A. Global a priori (theoretical, structural)
B. Local a priori (local)
C. Local a posteriori (practical)
(A) is hard to evaluate for realistic nonlinear models
(B) and (C) can be estimated via the FIM and other indexes...
(C) takes into account noise etc.
Parametric sensitivities
 xi
Sij  
 p
 j



 x  x t ,p0 ,p  p0
Fisher information matrix (FIM)
N
FIM   SIT ti  W 1ti  SI ti 
i 1
Checking identifiability and other indexes…
Compute sensitivities (direct decoupled method) :
 x
Sij   i
 p
 j



 x  x t ,p0 ,p  p0
Build FIM , covariance and correlation matrices
N
FIM  
i 1
SIT
ti  W ti  SI ti 
1
C  FIM
1
Rij 
Cij
C ii C jj
Analyse possible correlations among parameters
, i  j ; Rij  1, i  j
Sensitivities w.r.t. p1 and p6 are highly correlated
(i.e. The system exhibits rather similar responses to
changes in p1 and p6 for the given experimental design)
p1 & p6
p1 & p4
Checking identifiability and other indexes…
Compute sensitivities (direct decoupled method) :
 xi
Sij  
 p
 j



 x  x t ,p0 ,p  p0
Build FIM , covariance and correlation matrices
N
FIM   SIT ti  W 1ti  SI ti 
i 1
C  FIM
1
Rij 
Cij
C ii C jj
, i  j ; Rij  1, i  j
Analyse possible correlations among parameters
Compute confidence intervals
Check FIM-based criterions (practical identifiability)

Singular FIM: unidentifiable parameters, non-informative experiments

Large condition number of FIM means lower practical identifiability
Rank parameters
v Finally, use OED to improve experimental design
(New)
Experiments
Optimal
experimental
design
Model
calibration
Model structure
& parameters
Practical
Identifiability analysis
Ranking of
parameters
Parametric
sensitivities
Balsa-Canto, E., Alonso, A. A., & Banga, J. R. (2010). An iterative identification procedure for dynamic modeling of biochemical
networks. BMC Systems Biology 4:11
Optimal (dynamic) experimental design
Design the most informative experiments, facilitating parameter
estimation and improving identifiability
How?
Define information criterion
Optimize it modulating experimental conditions
“If you want to truly understand something, try to change it.”
Kurt Lewin, circa 1951
Optimal (dynamic) experimental design
Computational approaches that are applicable to support the optimal design of
experiments in terms of
•how to manipulate the degrees of freedom (controls) of experiments,
•what variables to measure,
•why to measure them,
•when to take measurements.
Experiment
Data
Optimal (dynamic) experimental design
 Information content measured with the FIM
N
FIM   SIT ti  W 1ti  SI ti 
i 1
 We will use scalar functions of the FIM (“alphabetical” criteria)
 Find experiments which maximize information content
Some FIM-based Criterions...
• D-criterion (determinant of F), which measures the global accuracy of the estimated parameters
max J  F
• E-criterion (smallest eigenvalue of F), which measures largest error
max J  min FIM 
• Modified E-criterion (condition number of F), which measures the parameter decorrelation
min J  F  max (F) / min (F)
• A-criterion (trace of inverse of F), which measures the arithmetic mean of estimation error
J  trace(F 1 )
A criterion = mintrace FIM 1
D criterion = maxdetFIM 
E criterion = maxminFIM
FIM 
Modified-E criterion = min maxFIM


min

E criterion = maxminFIM
E-optimality: max the min eigenvalue of FIM
(minimizes the largest error)
Modified-E criterion
 max FIM 
min
=  minFIM  
Maximize decorrelation between parameters
(make contours as circular as possible)
OED as a dynamic optimization problem
 Calculate the dynamic scheme of measurements so as to
generate the maximum amount and quality of information
for model calibration purposes.
Which type of dynamic stimuli?
When to measure? (Optimal sampling times)
OED as a dynamic optimization problem
Calculate time-varying control profiles (u(t)), sampling
times, experiment duration and initial conditions (v) to
optimize a performance index (scalar measure of the FIM):
System dynamics (ODEs, PDEs):
Experimental constraints:
Back to example: Bacterial growth in batch culture
Experimental design:
Initial conditions?
What to measure? (concentration of microorganisms and substrate?)
When to measure? (sampling times, experiment duration)
How many experiments?
How many replicates?
Etc.
Example: Bacterial growth in batch culture
1st experiment
2nd experiment
8
30
obsB
15
obsS
obsB
obsS
14
7
25
12
6
20
10
8
15
6
10
4
10
5
4
5
3
5
2
2
0
2
4
6
8
10
0
Time
2
4
6
8
10
0
2
Ks
Kd
Y
: 5.3551e+000
: 4.1657e-002
: 4.8529e-001
6
Time
Time
mmax : 3.9542e-001
4
3.4730e-002
9.1440e-001
2.5753e-002
6.1227e-002
(9%)
(17%)
(62%)
(13%);
Great improvement with a second experiment !
BUT, can we do even better?
OPTIMAL EXPERIMENTAL DESIGN
8
10
0
0
2
4
6
Time
8
10
Example: Bacterial growth in batch culture
Let us design the second experiment in an optimal way:
Criteria: E-optimality (minimize the largest error)
Degress of freedom we can ‘manipulate’ in the second
experiment:
•Initial concentrations of S and B
•Duration of experiment
Example: Bacterial growth in batch culture
OED of second experiment
1st experiment
2nd experiment after OED
30
obsB
obsS
5
obsB
2.4
14
4.5
25
4
2.2
12
3.5
20
10
obsS
2
3
8
15
1.8
6
10
1.6
2.5
2
1.5
4
1.4
5
2
1
0.5
1.2
0
2
4
6
8
10
0
Time
2
4
6
8
10
0
5
Time
mmax : 3.9950e-001
Ks
Kd
Y
: 4.9530e+000
: 5.0859e-002
: 5.0544e-001
10
Time
1.7133e-002
2.9647e-001
2.9936e-003
1.4074e-002
(4.3%)
(6%)
(6%)
(2.8%);
free initial conditions cb0:[1,5] cs: [5 40], free experiment duration [6,15] h, ns=11
15
0
5
10
Time
15
Example: Bacterial growth in batch culture
Two arbitrary experiments
mmax : 3.9542e-001
Ks
Kd
Y
: 5.3551e+000
: 4.1657e-002
: 4.8529e-001
3.4730e-002
9.1440e-001
2.5753e-002
6.1227e-002
(9%)
(17%)
(62%)
(13%);
After OED of second experiment
mmax : 3.9950e-001
Ks
Kd
Y
: 4.9530e+000
: 5.0859e-002
: 5.0544e-001
1.7133e-002
2.9647e-001
2.9936e-003
1.4074e-002
(4.3%)
(6%)
(6%)
(2.8%);
free initial conditions cb0:[1,5] cs: [5 40], free experiment duration [6,15] h, ns=11
Example: Bacterial growth in batch culture
Correlation matrix after OED of second experiment
Crammer Rao based correlation matrix for global unknowns
Crammer Rao based correlation matrix for global unknowns
1
1
0.8
yield
0.8
yield
0.6
0.6
0.4
0.4
kd
kd
0.2
0.2
0
0
-0.2
ks
-0.2
ks
-0.4
-0.4
-0.6
-0.6
mumax
mumax
-0.8
-0.8
mumax
ks
kd
yield
-1
mumax
free initial conditions cb0:[1,5] cs: [5 40], free experiment duration [6,15] h, ns=11
ks
kd
yield
-1
Example: Bacterial growth in batch culture
After OED of second experiment…
ks
kd
vs kd
2.5
2
2
2
1.5
1.5
1.5
1
1
yield
2.5
kd
2.5
0.5
0
0
0
-0.5
-0.5
-0.5
-1
-1
-0.5
0
0.5
1
mumax
1.5
2
2.5
3
-1
-1
-0.5
0
0.5
1
ks
1.5
2
2.5
3
-1
-1
1.04
1.04
1.02
1.02
1
1
0.98
0.98
0.96
0.96
0.94
0.94
1.05
0
0.5
1
kd
1.5
2
2.5
3
> Kd and Y are still highly
correlated but the size of the
confidence ellipse is much
smaller
vs kd
kd
vs yield
1.05
yield
1.06
kd
1.06
1
mumax
-0.5
ks
mumax vs kd
0.95
> Correlation between
parameters has substantially
improved (in general)
1
0.5
0.5
vs yield
3
3
kd
kd
mumax vs kd
3
1
0.95
0.95
1
ks
1.05
free initial conditions cb0:[1,5] cs: [5 40], free experiment duration [6,15] h, ns=11
0.95
1
kd
1.05
Example: Bacterial growth in batch culture
mmax : 3.9950e-001
Ks
Kd
Y
: 4.9530e+000
: 5.0859e-002
: 5.0544e-001
1.7133e-002
2.9647e-001
2.9936e-003
1.4074e-002
Monte-Carlo based confidence interval
Monte-Carlo based confidence interval
60
50
5%
50
Robust
confidence
intervals are
similar to those
obtained by the
FIM
(4.3%)
(6%)
(6%)
(2.8%);
6%
40
40
30
30
20
20
10
10
0
0.37
0.38
0.39
0.4
mumax
0.41
0.42
0.43
0
0.048
0.049
0.05
0.051
kd
0.052
0.053
0.054
Monte-Carlo based confidence interval
Monte-Carlo based confidence interval
50
60
7%
50
3%
40
40
30
30
20
20
10
10
0
4.4
4.6
4.8
5
ks
5.2
5.4
0
0.48
0.49
0.5
0.51
yield
0.52
0.53
0.54
Example: OED for the simple biochemical pathway
Moles, C. G., Pedro Mendes and Julio R. Banga (2003) Parameter estimation in biochemical pathways: a comparison of global optimization
methods. Genome Research 13(11):2467-2474
Original experimental design:
 16 experiments (different S and P values)
 Result: large E-criterion and modified E-criterion
Some FIM-based criterions for the original design...
Very large modified E-criterion indicates large correlation
among (some) parameters, making the identification of the
system hard
Can we improve this by an alternative (optimal) design of
experiments?
Improve experimental design by solving OED
Find the values of S and P for a set of Nexp experiments which
e.g. maximize E-criterion s.t. constraints (dynamics plus
bounds)
Improved experimental design:
16 experiments with optimal S and P
E-criterion improved (> one order of magnitude)
Other criteria also improved
Improved design by solving OED problem:
Improved design by solving OED problem:
Original Design (E-crit= 60, logD=161)
Improved Design (E-crit= 320, logD=181)
Example: OED for Oregonator reaction
The Oregonator is the simplest realistic model of the chemical dynamics of the
oscillatory Belousov-Zhabotinsky (BZ) reaction
(Zhabotinsky, 1991; Gray and Scott, 1991; Epstein and Pojman, 1998)
Oregonator reaction: highly nonlinear, oscillatory kinetics
Oregonator reaction: identifiability problems
Villaverde, A., J. Ross, F. Morán, E. Balsa-Canto, J.R. Banga (2011) Use of a Generalized Fisher Equation for Global Optimization in Chemical
Kinetics.Journal of Physical Chemistry A115(30):8426-8436.
Oregonator reaction: OED
E-optimality criterion
Example: OED for Oregonator reaction
E-optimality criterion: improved 3 orders of magnitude
Villaverde, A., J. Ross, F. Morán, E. Balsa-Canto, J.R. Banga (2011) Use of a Generalized Fisher Equation for Global Optimization in Chemical
Kinetics.Journal of Physical Chemistry A115(30):8426-8436.
Example: OED for Oregonator reaction
OED conclusions

Currently, most experiments are designed based on intuition
of experimentalists and modellers

Model-based OED can be used to:


Improve model calibration
Discriminate between rival models

OED is a systematic and optimal approach

OED can take into account practical limitations and
constraints by incorporating them into the formulation
Take-home messages
Main tips for dynamic model building
Check identifiability
Use proper optimization methods for parameter estimation
Use optimal experimental design
Main tips for dynamic model building
“All models are wrong, but
some are useful”
--- Statistician George E. P. Box
Main tips for dynamic model building
“All models are wrong, but
some are useful”
--- Statistician George E. P. Box
The practical question is:
How wrong do they have to be
to not be useful?
Software for dynamic model
building and OED
http://www.iim.csic.es/~amigo/
A few selected references…













Ashyraliyev M, Fomekong-Nanfack Y, Kaandorp JA & Blom JG (2009a). Systems biology: parameter
estimation for biochemical models. FEBS J 276: 886–902.
Balsa-Canto, E. and Julio R. Banga (2011) AMIGO, a toolbox for Advanced Model Identification in
systems biology using Global Optimization. Bioinformatics 27(16):2311-2313.
Balsa-Canto, E., Alonso, A. A., & Banga, J. R. (2010). An iterative identification procedure for
dynamic modeling of biochemical networks. BMC Systems Biology 4:11.
Bandara, S., Schlöder, J. P., Eils, R., Bock, H. G., & Meyer, T. (2009). Optimal experimental design for
parameter estimation of a cell signaling model. PLoS computational biology, 5(11), e1000558.
Banga, J.R. and E. Balsa-Canto (2008) Parameter estimation and optimal experimental design.
Essays in Biochemistry 45:195–210.
Balsa-Canto, E., A.A. Alonso and J.R. Banga (2008) Computational Procedures for Optimal
Experimental Design in Biological Systems. IET Systems Biology 2(4):163-172.
Chen BH, Asprey SP (2003) On the Design of Optimally Informative Dynamic Experiments for Model
Discrimination in Multiresponse Nonlinear Situations. Ind Eng Chem Res 2003, 42:1379-1390.
Jaqaman K., Danuser G. Linking data to models: data regression. Nat. Rev. Mol. Cell Bio.7:813-819.
Kremling A, Saez-Rodriguez J: Systems Biology - An engineering perspective. J Biotechnol 2007,
129:329-351
Mélykúti, B., E. August, A. Papachristodoulou and H. El-Samad (2010) Discriminating between rival
biochemical network models: three approaches to optimal experiment design. BMC Systems Biology
4:38.
van Riel N (2006) Dynamic modelling and analysis of biochemical networks: Mechanism-based
models and model-based experiments. Brief Bioinform 7(4):364-374.
Villaverde, A.F. and J.R. Banga (2014) Reverse engineering and identification in systems biology:
strategies, perspectives and challenges. J. Royal Soc. Interface 11(91):20130505
(review papers in yellow)
http://www.iim.csic.es/~gingproc/software.html
Thank you!
More info?
julio@iim.csic.es
Download