Extensions to Nonlinear Models

The theory we’ve covered to this point is specifically developed for models that are linear
in the parameters of interest. This is certainly the case for which the theory is most complete
and elegant. However, some elements of it can be used in developing designs for nonlinear
models as well.
General Case
Using as much of our previously defined notation as possible, we continue to denote the
response variable by y, the vector of r predictors by u, and the design space of permitted
values of u as U. In our “general case,” we specify the distribution of the response,
y ∼ f (y; u, θ),
where θ is the set of k unknown parameters of interest, and the form of the distribution f is
known. (f will also be used to denote the probability density function of y in these notes.)
(Note that we’ve not introduced x yet, but something analogous to it will emerge.)
We continue to regard a discrete experimental design as a set of N values of the vector
of predictors, U = {u1 , u2 , ..., uN }, and now use the N -vector y to represent the responses
associated with the conditions specified in the design. Our previous definition of M depends
specifically on model linearity, and must be replaced in this context. We rely now on likelihood theory (assuming that standard regularity conditions are satisfied), and say that θ̂ is
the maximum likelihood estimate of θ. An asymptotically valid expression for the variance
matrix of θ̂ is the inverse of the Fisher information matrix:
I(θ) = E[−∂²/(∂θ ∂θ′) log L(θ; y)],
where the likelihood function L(θ; y) ≡ f (y; U, θ), the joint pdf of all y’s.
Although an even more “general” version of “the general case” could be developed, here
we will focus on experiments in which the N responses are statistically independent. In this
case, L(θ; y) = ∏_{i=1}^N f(yi; ui, θ), and log L(θ; y) = ∑_{i=1}^N log f(yi; ui, θ), so

I(θ) = ∑_{i=1}^N −E[∂²/(∂θ ∂θ′) log f(yi; ui, θ)]
Hence the information matrix is the negative “expected curvature” of the log-likelihood. In
the spirit of previous notation, we can define M(U, θ) = (1/N) I(θ) for a discrete experimental design U.
The serious difficulty here is that M is generally a function of θ as well as the design.
But if we (unrealistically) claim to know θ, the optimality theory we’ve developed for linear
models can be applied. Let µ be a probability measure (or design measure) over U, and let
the set of such measures be H(U). Then, define the moment matrix for this design measure
as
M(µ, θ) = Eµ[−E_{y|u,θ} ∂²/(∂θ ∂θ′) log f(y; u, θ)].
For fixed θ, let Mθ be the set of moment matrices corresponding to H(U); that is, M ∈ Mθ if and only if M = M(µ, θ) for some µ ∈ H(U).
(A distinction between this set-up and what we used with linear models is that η was defined
over X ; here we don’t have an analogue of “x” – at least, not yet.)
With this, we define the φθ -optimal design measure to be
argmax_{µ∈H(U)} φ(M(µ, θ))
and designate any such measure by µ∗ . Given φ, we define the Frechet derivative as before:
Fφ(M1, M2) = lim_{ε→0+} (1/ε) [φ{(1 − ε)M1 + εM2} − φ{M1}]

where M1 = M(µ1, θ) and M2 = M(µ2, θ), and µ1, µ2 ∈ H(U). With this, following the
arguments for linear models:
Theorem 3.6A (6.1.1 in Silvey) For fixed θ, and φ concave on Mθ,

µ0 is φθ-optimal   iff   Fφ[M(µ0, θ), M(µ, θ)] ≤ 0 for all µ ∈ H(U)
Let J(u, θ) = M(µ, θ) for the design measure µ that assigns probability 1 to the single vector
u. Then:
Theorem 3.7A (6.1.2 in Silvey) For fixed θ, and φ concave on Mθ and differentiable at M(µ0, θ),

µ0 is φθ-optimal   iff   Fφ[M(µ0, θ), J(u, θ)] ≤ 0 for all u ∈ U
So, in summary, if we say we know θ, the fundamental arguments of equivalence theory we developed for linear models hold for general nonlinear models as well, with the accommodation that x and X, as we used them before, do not have exact counterparts in this case.
An Often-Used Trick
For most distributions/models, I(θ), and therefore also M(µ, θ), can be written as a function of first (rather than second) derivatives. This begins by noting that ∫ f(y; u, θ) dy = 1. (We drop the arguments of f for a bit, with the understanding that it depends on both u and θ.) Because this quantity is a constant, ∂/∂θ ∫ f dy = 0. If the order of integration and differentiation can be exchanged, this gives us

∫ (∂/∂θ f) dy = 0
The next step in the argument is to multiply and divide the integrand by f , and write an
equivalent quantity:
∫ ((1/f)(∂/∂θ f)) f dy = ∫ (∂/∂θ log f) f dy = E(∂/∂θ log f) = 0
(Note that this is beginning to look a bit like I(θ), but only involves first derivatives.)
The same strategy is used a second time; because the above quantity is a constant, differentiation with respect to θ yields a k × k matrix of zeros, and if the order of integration and differentiation can be exchanged:

∫ ∂/∂θ′ [((1/f)(∂/∂θ f)) f] dy = 0
Expanding the integrand with the product rule and re-expressing,

∫ ((1/f)(∂/∂θ f))((1/f)(∂/∂θ′ f)) f dy + ∫ (∂²/(∂θ ∂θ′) log f) f dy = 0,

or

∫ (∂/∂θ log f)(∂/∂θ′ log f) f dy + ∫ (∂²/(∂θ ∂θ′) log f) f dy = 0
But note now that the second term is the negative of the information matrix, and so rearrangement gives:
−E[∂²/(∂θ ∂θ′) log f] = E[(∂/∂θ log f)(∂/∂θ log f)′]
for the information associated with a single observation at the selected u, or for a design U
of N points,
I(θ) = ∑_{i=1}^N E[(∂/∂θ log f(yi; ui, θ))(∂/∂θ log f(yi; ui, θ))′]

and based on this, M(U, θ) = (1/N) I(θ) as before.
In summary, the expectations of the squares and products of first derivatives of the log-likelihood can be used in place of the negatives of the expectations of their corresponding
second derivatives, if the required conditions hold. Note that this does not quite give us a
form equivalent to xx′, since the expectation with respect to y applies to the product of
the two terms, rather than to each of them individually. We will see shortly that for one
common class of problems, a further simplification can be made that leads to a separable
product, and an analogue of x.
A Simple Example using an Exponential Growth Model
For a single controlled variable u ∈ U = (0, ∞), suppose that the response variable y has
a Bernoulli distribution:
y = 0 with probability e^{−θu}
y = 1 with probability 1 − e^{−θu}
for some unknown θ > 0, and that the responses taken on different experimental trials
are independent. Applications in which this kind of set-up might arise include reliability
experiments in which u represents “time on test”, θ is a rate of physical degradation under a
constant stress, and y is the result of a pass-fail performance test of a unit at time u, where
the probability of failure (y = 1) increases with time. For a single fixed u
L = (e^{−θu})^{1−y} (1 − e^{−θu})^y

log L = (1 − y)(−θu) + y log(1 − e^{−θu})

∂/∂θ log L = −(1 − y)u + y u e^{−θu}/(1 − e^{−θu})

Because y(1 − y) = 0, the cross-product term is not present in the square of the above derivative, and since y = y²,

(∂/∂θ log L)² = (1 − y)u² + y [u e^{−θu}/(1 − e^{−θu})]²
Taking the expectation requires only substitution of probabilities for y and 1 − y, and so for
any single value of u,
I(θ) = (1/θ²) (θu)² [e^{−θu}/(1 − e^{−θu})].

Regardless of the value of θ, the information is maximized when v = θu maximizes v² [e^{−v}/(1 − e^{−v})], i.e., at the solution to ∂/∂v { v² [e^{−v}/(1 − e^{−v})] } = 0, which is approximately v = 1.6. Since the information matrix is a scalar in this case, essentially all reasonable variance-based optimality criteria boil down to the same thing, i.e. φ = I, and since this function is maximized for only one value of u (as a multiple of θ), any optimal design will include only replicates of this value of u.
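As a quick numerical check of the v ≈ 1.6 claim, here is a minimal sketch (the search range and grid resolution are arbitrary choices):

```python
import numpy as np

def scaled_info(v):
    # per-observation information, up to the factor 1/theta^2, written in terms of v = theta*u
    return v**2 * np.exp(-v) / (1.0 - np.exp(-v))

v = np.linspace(0.01, 10.0, 100001)      # avoid v = 0, where the ratio is 0/0
v_star = v[np.argmax(scaled_info(v))]
print(round(v_star, 3))                  # approximately 1.59, so the best single design point is u near 1.6/theta
```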
Here is some intuition for why this form of solution makes sense. Suppose a much smaller value of u is used; then the responses will each, with very large probability, be 0. This would be solid information that e^{−θu} is large, or that θu is small, but since u is small, this leaves a great deal of uncertainty about θ. Likewise, if a much larger value of u is used, then the responses will each, with very large probability, be 1, and there will be little uncertainty that e^{−θu} is small or that θu is large, but this leaves the possibility that θ could be very, very large. Hence, only values of u that lead to an “intermediate” binomial probability can (statistically) eliminate the possibility of both large and small values of θ. For u = 1.6/θ, this probability of y = 1 is 1 − e^{−1.6} ≈ 0.80. A Bernoulli probability of 0.20 ≈ 1 − e^{−0.22} would be “equivalent” in at least one sense, because the uncertainty associated with the probability of failure would be the same. Why would u = 0.22/θ be a less desirable design point?
Additive Error Models and Exponential Families
Now consider a more restrictive, but still widely-applicable, setting in which the random
component of each observation is additive:
U = {u1 , u2 , ..., uN }
yi = m(ui, θ) + εi = mi + εi

Here we will say that εi has density pi(·) with mean zero (which does not limit generality due to the form of the model), with independent responses over the N experimental trials. We subscript p with i to indicate that this distribution may depend on the value of u. Because of the model form, the distribution of yi is f(yi) = pi(yi − mi), i.e. a simple shift transformation. Since f(yi) = pi(εi), log f(yi) = log pi(εi), so
∂/∂θ log f(yi) = (1/f(yi)) × ∂/∂θ f(yi) = (1/pi(εi)) × p′i(εi) [−∂/∂θ mi]

But in this expression, only the scalar p′i(εi)/pi(εi) is a function of the random variable, and so

E[(∂/∂θ log f(yi))(∂/∂θ log f(yi))′] = (∫ (p′i(ε)²/pi(ε)) dε) (∂/∂θ mi)(∂/∂θ mi)′
If pi is a distribution in the exponential family, the first factor is 1/σi², and so the information matrix for an N-point design is

I(θ) = ∑_{i=1}^N σi⁻² (∂/∂θ mi)(∂/∂θ mi)′

This form has the structure of M = ∑ w x x′ from the linear case, but still requires that we know the parameters to fully define xi = σi⁻¹ (∂/∂θ mi), so that

M(U, θ) = (1/N) ∑_{i=1}^N xi xi′
If the σi ’s are all equal, xi can be simplified to just the vector of derivatives for purposes of
experimental design.
Example: Michaelis-Menten Model
The Michaelis-Menten model is a standard nonlinear regression form used in enzyme
kinetics and other applications. The response y is the rate of a reaction, the single controlled
variable u is the concentration of the substrate under study, u ∈ U = (0, u∗ ), where u∗ is
the operational upper bound on concentration level, and the parameters of interest are θ1 ,
the maximum possible (asymptotic in u) reaction rate (in units of y), and θ2 , the value of u
at which the reaction rate is half of θ1 ; both of θ1 and θ2 must be positive. With additive
random error (say, for measurement effects), the form of the model is
y = θ1 u/(θ2 + u) + ε

If the distribution of ε is from the exponential family and the variance is constant across experimental runs, we can define:
x = ∂/∂θ m = ( u/(θ2 + u) , −θ1 u/(θ2 + u)² )′ = (x1, x2)′
The induced design space is a parametric curve in two-dimensional space (x1 and x2 )
specified by values of a single variable (u) (again, remembering that θ1 and θ2 are, for our purposes, fixed). Plots below show an example of the Michaelis-Menten function, and the
induced design space, for θ1 = θ2 = 1, and u∗ = 5:
[Figure: left panel, the M-M function (E(y) versus u) for θ = (1, 1); right panel, the induced design space (x2 versus x1) for θ = (1, 1).]
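To make the induced design space concrete, here is a minimal sketch (assuming θ1 = θ2 = 1 and u∗ = 5, as in the plots) that evaluates x1(u) and x2(u) on a grid of u values; for this parameter value the induced curve is exactly the parabola x2 = −x1(1 − x1):

```python
import numpy as np

theta1, theta2, u_star = 1.0, 1.0, 5.0

u = np.linspace(0.0, u_star, 501)
# derivatives of m(u, theta) = theta1*u/(theta2 + u) with respect to theta1 and theta2:
x1 = u / (theta2 + u)                      # dm/dtheta1
x2 = -theta1 * u / (theta2 + u)**2         # dm/dtheta2

print(np.max(np.abs(x2 + x1 * (1.0 - x1))))   # essentially 0: the curve is x2 = -x1*(1 - x1)
print(x1[-1], x2[-1])                          # point induced by u = u_star (about 0.83, -0.14)
print(np.min(x2))                              # about -0.25, attained near u = theta2
```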
What design points would lead to a “diagonally dominant” information matrix (generally good)?

• u near u∗ → x1 as near as possible to 1 → large x1² → large I1,1
• u near θ2 → x2 near −(1/4)(θ1/θ2), its minimum → large x2² → large I2,2
• u near 0 or u∗ → x1 or x2 small → small x1 x2 → small I1,2
This exercise helps establish some intuition for what to expect, but doesn’t lead to a complete answer since, for example, x1 and x2 both small isn’t good. In fact, as u∗ → ∞ (and, to a good approximation, whenever u∗ is large relative to θ2), the D-optimal design is:

µ∗ : u = θ2 with probability 1/2, u = u∗ with probability 1/2, any other u with probability 0
You could, for example, find this design numerically with a point-exchange algorithm (Frechet
derivatives, et cetera) with candidate points on the parabola. We will do this with a more
interesting model shortly.
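As a quick numerical illustration (and of why the statement above is a limiting one), here is a minimal sketch, assuming θ1 = θ2 = 1 and u∗ = 5, that searches over all two-point, equal-weight designs on a grid of candidate u values for the one maximizing |M|:

```python
import numpy as np

theta1, theta2, u_star = 1.0, 1.0, 5.0

u = np.linspace(0.005, u_star, 1000)           # candidate design points in U = (0, u*]
x1 = u / (theta2 + u)                          # dm/dtheta1
x2 = -theta1 * u / (theta2 + u)**2             # dm/dtheta2

# For a two-point design with equal weights at u = a and u = b,
# |M| is proportional to (x1(a)*x2(b) - x2(a)*x1(b))^2; search all grid pairs.
cross = np.abs(np.outer(x1, x2) - np.outer(x2, x1))
i, j = np.unravel_index(np.argmax(cross), cross.shape)
print(u[i], u[j])    # roughly 0.71 and 5.0 for this u*
```

For this moderate u∗ the lower support point comes out somewhat below θ2 and moves toward θ2 as u∗ is increased (try u∗ = 50 or 100), which is the sense in which the two-point design stated above is the limiting answer.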
Efficiency
The optimal design for the Michaelis-Menten model is a function of θ2, but not a function of
θ1 . But, the performance of the design is a function of both parameters. To see this, consider
the design measure:
µ : u = u1 with probability 1/2, u = ∞ with probability 1/2, any other u with probability 0
which is D-optimal if u∗ = ∞ and u1 is set to the value of θ2. For this design measure,

|I(θ)| = (1/4) θ1² u1² / (θ2 + u1)⁴,

and for the optimal u1 = θ2, this is (1/64) (θ1/θ2)². But what if you are incorrect about your assumed value of θ2?
Suppose you assume that θ1 = θ2 = 1, u∗ = ∞, and you place equal weight on (the
optimal) u1 = 1 and u2 = ∞. As a function of the actual values of the parameters, |I(θ)| =
(1/4) θ1² / (θ2 + 1)⁴.
The figure displays the values of this criterion function over (θ1, θ2) ∈ [0.5, 1.5]².

[Figure: contour plot of |I| as a function of (θ1, θ2) over [0.5, 1.5]².]
Note that the criterion function of information is greatest at the lower right corner (big θ1, small
θ2 ), even though this isn’t the set of parameter values for which the design is optimal.
However, that shouldn’t really be surprising. The logic is that if θ1 = θ2 = 1, then the
design we’ve picked is the best (with respect to our criterion). But this doesn’t say that the
criterion value for this design might not be even greater at different parameter values (even
though at different parameter values, there will be another design that is even better than
the one we’ve selected). For this reason, it is sometimes preferred to consider an adjusted
measure of optimality, efficiency, which compares, as a function of “true” parameter values,
the design you’ve chosen relative to the best one you could have chosen for those parameters.
For the Michaelis-Menten model, the D-optimal design measure places half its weight on each
of u = θ2 and u = u∗ (which we’re saying is ∞ here to simplify things). That design has a
criterion value which we’ll call |I(θ)|best = (1/64) θ1²/θ2². We can then define a measure of efficiency for the design we’ve selected (based on an assumption that θ2 = 1) as

eff(θ1, θ2) = |I(θ)| / |I(θ)|best = 16 θ2² / (θ2 + 1)⁴
This function is plotted below for θ2 ∈ [0.5, 1.5]:
[Figure: eff as a function of θ2, plotted for θ2 ∈ [0.5, 1.5].]
This shows that even though information (as measured by our criterion) associated with our
design is greater for θ2 = .5 than for θ2 = 1, efficiency, which evaluates a design relative to
the best that could have been used, is less than 1 at θ2 = .5. Unfortunately, there is usually
no unique way to do this; efficiency can, for example, also be defined as a power of this ratio, or as the difference |I(θ)| − |I(θ)|best.
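A minimal sketch of these efficiency calculations (the particular parameter values evaluated are arbitrary choices):

```python
def crit(theta1, theta2):
    # |I| for the fixed design (equal weight on u = 1 and u = "infinity"), at the true theta
    return 0.25 * theta1**2 / (theta2 + 1.0)**4

def crit_best(theta1, theta2):
    # |I| for the locally D-optimal design (u = theta2 and u = "infinity") at the same theta
    return theta1**2 / (64.0 * theta2**2)

def eff(theta1, theta2):
    return crit(theta1, theta2) / crit_best(theta1, theta2)   # = 16*theta2^2/(theta2+1)^4

print(eff(1.0, 1.0))   # 1.0: the chosen design is optimal if the guess theta2 = 1 is right
print(eff(1.0, 0.5))   # about 0.79
print(eff(1.0, 1.5))   # about 0.92
print(crit(1.5, 0.5) > crit(1.0, 1.0))   # True: raw |I| can be larger at other theta values
```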
Example: Speed of R → P1 + P2
Here is a somewhat more extensive nonlinear regression problem based on another function from chemical kinetics. The model form is:
y = θ1 θ3 u1 / (1 + θ1 u1 + θ2 u2) + ε

with errors that are assumed to have a distribution in the exponential family with constant
variance. Suppose the design region for the two controlled variables is U = [0, 3]², and the guessed/assumed value of θ is (2.9, 12.2, 0.69)′. A Wynn algorithm, essentially like the one described in the notes for algorithms for linear models, was written to generate a D-optimal design using a grid of u with spacing of 0.03 in each direction (for a total of 101² = 10,201 grid points),
with corresponding points in (the 3-dimensional) X computed by evaluating the derivatives
of the model form with respect to each of the 3 model parameters at the assumed value of θ.
The design was initialized with a randomly chosen 10-point design, and the algorithm ran
for 261 iterations before stopping, based on a stopping rule of δ = 0.1. The graphs show all
points included in the initial design and added through the iterations, and an image plot of
accumulated probability mass at each point:
[Figure: left panel, “W-algorithm, all points” (u2 versus u1, jittered); right panel, “W-algorithm, mass” (image plot of accumulated probability mass over U).]
The three locations at which most of the mass accumulates are (u1 , u2 ) = (0.27, 0.00) with
mass 0.3180, (3.00,0.78) and (3.00,0.81) with mass 0.3065, and (3.00,0.00) with mass 0.3218.
(The two neighboring grid points with substantial mass likely indicate a single mass point
with u2 between 0.78 and 0.81.) It seems likely that for practical purposes, a design that
places equal numbers of replicates at each of the three indicated points would be near optimal
– if the assumed value of θ is correct.
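The implementation used for the results above is not reproduced in these notes; the following is a minimal sketch of the same kind of Wynn (vertex-direction) iteration for this model under the guessed θ. The starting measure, step-size sequence, iteration cap, and tolerance are arbitrary choices, so the output should only roughly reproduce the mass points reported above:

```python
import numpy as np

theta = np.array([2.9, 12.2, 0.69])      # guessed parameter values from the text

def x_vec(u1, u2, th):
    # derivatives of m(u, theta) = th1*th3*u1/(1 + th1*u1 + th2*u2) wrt (th1, th2, th3)
    th1, th2, th3 = th
    D = 1.0 + th1 * u1 + th2 * u2
    return np.array([th3 * u1 * (1.0 + th2 * u2) / D**2,
                     -th1 * th3 * u1 * u2 / D**2,
                     th1 * u1 / D])

g = np.linspace(0.0, 3.0, 101)                              # 0.03 spacing on each axis
pts = np.array([(a, b) for a in g for b in g])              # candidate grid on U = [0, 3]^2
X = np.array([x_vec(a, b, theta) for a, b in pts])

w = np.full(len(pts), 1.0 / len(pts))                       # start from the uniform measure
for n in range(3000):
    M = (X * w[:, None]).T @ X                              # moment matrix of current measure
    d = np.einsum('ij,jk,ik->i', X, np.linalg.inv(M), X)    # x' M^{-1} x at every candidate
    j = int(np.argmax(d))
    if d[j] < 3.0 + 0.05:                                   # equivalence-theorem check (k = 3)
        break
    alpha = 1.0 / (n + 2)                                   # Wynn-type step size
    w = (1.0 - alpha) * w
    w[j] += alpha

for j in np.argsort(w)[::-1][:5]:                           # points carrying the most mass
    print(pts[j], round(w[j], 3))
```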
Nonlinear Functions of Linear Model Parameters
One application of the ideas discussed here is to the design of experiments for linear models where the quantities of interest are nonlinear functions of the linear model parameters.
A simple example is that of a quadratic regression,
y = θ0 + θ1 x + θ2 x² + ε

where interest centers on (1) the location of the peak or dip, (2) the expected response value
at the peak/dip, and (3) the second derivative at the peak/dip (or, given the form of the
model, the second derivative anywhere). Simple calculus leads to:
1. g1 = −θ1 /(2θ2 )
2. g2 = θ0 − θ1²/(4θ2)
3. g3 = 2θ2
as the three quantities of interest.
More generally, a linear model y = x′θ + ε is appropriate for modeling, but interest rests not in the elements of θ as such, but in k∗ functions of them. Let g be a vector of these functions of θ, and define a k-by-k∗ matrix G to have (i, j) element ∂gj/∂θi. For standard and well-behaved likelihood asymptotics,

Var(ĝ) = G′ Var(θ̂) G ∝ G′ M(µ, θ)⁻¹ G

If G is square and of full rank,

Var(ĝ)⁻¹ ∝ G⁻¹ M(µ, θ) G′⁻¹ = ∫ [G⁻¹x][x′G′⁻¹] µ(x) dx

If some or all of the functions g are nonlinear in θ, then G depends on these parameter values. We can indicate this by writing Gθ, and for design purposes say:

I(θ) = ∫ [Gθ⁻¹x][Gθ⁻¹x]′ µ(x) dx

So, call Gθ⁻¹x the augmented design space vector “x” instead; this fits the nonlinear setup we’ve been discussing.
For k∗ = k, Gθ of full rank, and the collection of functions g invertible, an alternative
way to see this is that the model could be re-written as a nonlinear model in the quantities of
interest. For example, in the quadratic regression problem mentioned above, inverting gives:
1. θ0 = g2 + g1² g3 / 2
2. θ1 = −g1 g3
3. θ2 = g3 /2
Substituting these expressions in the linear model (for θ) yields a nonlinear model (for g),
and the approach we’ve discussed above will lead to the same result. Complications arise,
as they do with Ds -optimality, when k ∗ < k, since an optimal design may not require that
all elements of θ be individually estimable.
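As a concrete illustration for the quadratic example (a minimal sketch; the particular θ values and the grid of x values are arbitrary choices), Gθ can be assembled from the partial derivatives of (g1, g2, g3) with respect to (θ0, θ1, θ2), and the induced vectors Gθ⁻¹x computed from the usual quadratic-regression design vectors x = (1, x, x²)′:

```python
import numpy as np

# assumed parameter values for a locally optimal design (illustrative choice)
t0, t1, t2 = 1.0, 1.0, -0.5

# G has (i, j) element dg_j/dtheta_i, with g1, g2, g3 as defined above
# (using g2 = theta0 - theta1^2/(4*theta2), the response value at the peak/dip)
G = np.array([
    [0.0,               1.0,                 0.0],   # derivatives wrt theta0
    [-1.0 / (2 * t2),  -t1 / (2 * t2),       0.0],   # derivatives wrt theta1
    [t1 / (2 * t2**2),  t1**2 / (4 * t2**2), 2.0],   # derivatives wrt theta2
])
G_inv = np.linalg.inv(G)

# induced design vectors G^{-1} x over a few x values in, say, [-1, 1]
for x in np.linspace(-1.0, 1.0, 5):
    x_vec = np.array([1.0, x, x**2])      # usual quadratic-regression design vector
    print(x, G_inv @ x_vec)               # the "x" of the equivalent nonlinear model
```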
Various “Robust” Approaches
This section is really a “placeholder” for at least one body of work that is called “robust”
nonlinear design. In short, the story we’ve told is predicated on knowing, or being willing
to assume, a value for the parameter vector of interest, θ. This allows the development of
theory that parallels that for linear models, but obviously does not (as presented so far)
lead to designs for practical situations. One attempt to overcome this practical problem is
to broaden the process we’ve talked about, generally by minimax arguments, to a broader
collection of possible θ vectors. Suppose that, rather than a single value, we are willing to
assume that the parameter vector of interest lies in some specified region, θ ∈ Θ. Then it
can make sense to construct designs as solutions to:
• argmax_µ min_{θ∈Θ} φ(µ, θ), or
• argmax_µ ∫_{θ∈Θ} φ(µ, θ) w(θ) dθ, for some suitable weight function w
Alternative approaches can be considered by substituting a measure of efficiency for optimality in either of these two. We shall not explore these possibilities further here, but do note
that despite the computational intensity that generally accompanies minimax problems, a
number of researchers have spent considerable efforts in directions such as these. (A related
approach we will consider in a bit more detail in the next unit is that of “Bayesian design”.)
Another Approach: Sequential Estimation/Design
Another approach to making optimal design theory more applicable to practical situations
is to adopt a sequential approach in which data, as it is gathered, is used to improve the
assumed (or “guessed”) value of θ. A general outline of such an approach might be written
as follows:
1. Begin with a best guess, θ g .
2. Construct an optimal design as if θ = θ g .
3. Execute the experiment, collect the data, and compute θ̂.
4. Redefine θ g ← θ̂, and return to step 2.
This algorithm is not complete. For example, in step 2, does “construct an optimal design”
mean a design that is optimal in its own right, or optimal under the constraint that it includes the experimental runs from previous stages? Similarly in step 3, does “compute θ̂” mean an estimate based only on the most recent data, or does the estimation also include data from previous stages? Pros and cons of various approaches are related to those we
briefly discussed in our treatment of group screening designs.
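A minimal sketch of one way to fill in those choices (cumulative estimation, with each new point chosen as if the current estimate were the truth); run_experiment, fit_mle, and next_point are hypothetical stand-ins for the model-specific pieces:

```python
def sequential_design(theta_guess, initial_design, n_additional,
                      run_experiment, fit_mle, next_point):
    """Skeleton of one version of the sequential loop outlined above.

    run_experiment(u)        -> observed y at design point u            (hypothetical)
    fit_mle(U, y)            -> ML estimate of theta from all data      (hypothetical)
    next_point(theta_hat, U) -> next design point, optimal given the
                                runs already in U, as if theta = theta_hat (hypothetical)
    """
    U = list(initial_design)
    y = [run_experiment(u) for u in U]     # run the first-stage design
    theta_hat = theta_guess
    for _ in range(n_additional):
        theta_hat = fit_mle(U, y)          # steps 3-4: cumulative estimation
        u_new = next_point(theta_hat, U)   # step 2: augment the existing design
        U.append(u_new)
        y.append(run_experiment(u_new))    # execute the new trial
    return U, y, theta_hat
```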
Where sequential experimentation is feasible, this is often a very effective and practical approach to experimental design. A general difficulty with the sequential approach is that there is, in many cases, no guarantee that Var(θ̂)⁻¹ ≈ I(θ) as derived under the standard (and relatively simple) formulation of I from the likelihood function for independent
responses. Instead, sequential experiments are explicitly constructed so that u2 depends on
y1 , u3 depends on y2 and perhaps y1 , et cetera. The statistical nature of this dependence is
a function of f and details of the sequential rules. Where asymptotic arguments are developed, they are often based on showing that, with probability approaching 1, design points
“pile up” on the (correct) locally optimal design, and that the sequence of θ̂i’s converges to θ
“quickly” in some sense, relative to the number of experimental trials that can be attempted.
Example Continued: Speed of R → P1 + P2
Fedorov (1972) presents a demonstration of sequential experimentation using the model
of the previous example:
11
E(y) = θ1 θ3 u1 /(1 + θ1 u1 + θ2 u2 )
His demonstration is apparently based on real (rather than simulated) data, with experiments
on catalytically dehydrated n-hexyl alcohol performed at 555◦ F, for which the two products
of reaction are olefin and water. As in the previous example, the design region for this
experiment is U = [0, 3]². For this setting, the “known” value of the parameter vector θ is (2.9, 12.2, 0.69)′, the set of values we used as “guesses” in the previous example. However,
for this demonstration, this information was not used. Instead, a standard 2² factorial
experiment in all combinations of values 1 and 2 for u1 and u2 was used as a first-stage
design. Based on the data collected for these 4 experimental trials, an estimate of the
parameter vector was computed:
θ̂4 = (10.39, 48.83, 0.74)′
(Note that these estimates are not especially close to the true parameter values, but also
that they are based on only 4 data points.) Using these estimates as if they were the
actual parameter values, Fedorov then determined the optimal 5th design point that should
be added using the Frechet derivative associated with D-optimality; the rather extreme
parameter estimates led to selection of the next point, (u1, u2) = (0.1, 0.0). A new experimental
run was conducted at this point, a new estimate of the parameter vector was computed
based on the accumulated 5-run data set, and a 6th design point was selected. This process
continued through a total of 13 experimental trials, with results tabulated below:
trial   u1    u2    y       θ̂1      θ̂2      θ̂3
1       1.0   1.0   0.126
2       2.0   1.0   0.129
3       1.0   2.0   0.076
4       2.0   2.0   0.126   10.39   48.83   0.74
5       0.1   0.0   0.186    3.11   15.19   0.79
6       3.0   0.0   0.606    3.96   15.32   0.66
7       0.2   0.0   0.268    3.61   14.00   0.66
8       3.0   0.0   0.614    3.56   13.96   0.67
9       0.3   0.0   0.318    3.32   13.04   0.67
10      3.0   0.8   0.298    3.33   13.48   0.67
11      3.0   0.0   0.509    3.74   13.71   0.63
12      0.2   0.0   0.247    3.58   13.15   0.63
13      3.0   0.8   0.319    3.57   12.77   0.63
Note that the estimate based on all 13 data values is considerably closer to the “known” vector value than θ̂4. Fedorov does not discuss a formal stopping rule that led to the experiment ending after 13 trials, but does note that “dispersion of the parameter estimates became insignificant and ... close to [those of] the theoretical value(s).” Note also that of the 9 sequential runs made (excluding the initial 4), 4 were close to (0.2, 0.0), 3 were at (3.0, 0.0), and 2 were at (3.0, 0.8) – very close to the 3 apparent mass points based on our example that assumed the “true” parameter values.
References
Fedorov, V.V. (1972). Theory of Optimal Experiments, Academic Press, London. Originally published in Russian (1969) as Teoriya Optimal’nogo Eksperimenta.