Optimal Design Theory for Linear Models, Part I

The notation used here approximately (but not exactly) follows that of Silvey (1980).
Less theoretical (than Silvey’s) introductions to optimal design are included in many other
textbooks including Myers, Montgomery, and Anderson-Cook (2009) and Morris (2011).
Introduction
Our presentation of orthogonal arrays was based loosely on an argument that the orthogonal property made them "best" in a relevant statistical sense. In this section, we'll address the idea of "best" more carefully, and consider more direct and less restrictive ways of relating definitions of "best" to specific designs. In short, we usually do this by defining optimality criteria, functions of designs that represent their quality from a statistical perspective. We then define and solve (or try to solve) an optimization problem which consists of finding the (or a) design that leads to the largest (or smallest) value of the optimality criterion, and call such a design "optimal" (with respect to that criterion).
Criterion functions (or often just criteria for short) are formulated in different ways,
reflecting the specific goals of an experiment, and designs that are optimal with respect
to one criterion are not necessarily optimal with respect to others. General theory has been developed that allows optimal designs to be derived analytically in some cases. Where
this is impossible or impractical, numerical methods are employed to solve the optimization
problem. (However, many of the algorithms used in numerical optimization of designs are
rooted in, or at least related to, the analytical theory.)
Some Notation
Before undertaking a general discussion of theory, we shall define some notation that will, hopefully, be broad enough to carry us through any linear-model setting.
Data for any specific experimental trial or run consists of a set of r controllable predictors,
denoted by the r-vector u, and a scalar-valued response y. We usually think of these variables
as being stated in “natural units”, e.g. the elements of u may be in units of pounds or dollars,
and have not generally been scaled to something more numerically convenient. Parametric
inference requires that we specify the form of a model relating y to u, and for linear models
we will write:
y = Σ_{i=1}^k θi fi(u) + ε,   or   y = Σ_{i=1}^k θi xi + ε

where:
• θi, i = 1, 2, ..., k, are unknown quantities
• fi, i = 1, 2, ..., k, are known functions
• ε is a random variable with mean 0 and variance σ²
The values that can be entertained for the vector u constitute the design space (U), the physically meaningful domain; each u ∈ U. The induced design space is the set of corresponding
values that can be entertained for x = (x1 , x2 , ..., xk ); each x ∈ X . Another way to say this
is
X = f (U)
where f is the vector-valued function (f1, f2, ..., fk)′.
An experiment consists of N experimental trials, with an experimental design denoted
as
U = {u1 , u2 , ..., uN }.
For present purposes, we shall assume that the random variable ε is i.i.d. across these runs.
Each vector of predictors defines a corresponding vector from the derived design space,
xi = f (ui ), i = 1, 2, ..., N . This, in turn, leads to what we shall call the design information
matrix or design moment matrix:
M(U) = (1/N) Σ_{i=1}^N xi xi′
Apart from the factor of N −1 , this is the familiar “X0 X matrix” in notation that is often used
(for example, in STAT 512), and can be regarded as a “fundamental element” in describing
the quality of statistical inference that can be drawn from data collected with design U . For
example, under the stated assumptions, Var(θ̂|U) = (σ²/N) M(U)⁻¹, if the inverse exists. This,
and other common variance formulae, suggest the intuition that good designs are associated
with “large” M matrices, and by extension, that optimal designs should be associated with
the “largest” M matrices.
One concept that has been used for sorting out which design moment matrices are largest
is the Loewner partial ordering for symmetric matrices, which defines M1 = M(U1 ) as being
weakly “greater than” M2 = M(U2 ) iff M1 − M2 is positive semi-definite, suggesting that
U1 is a superior design to U2 . The hope, then, would be that for a collection of competing
designs, the design moment matrix for one (or a few) would dominate those of the other
designs in this sense. When this occurs, it is difficult to think of a statistical reason to dispute
the indicated optimal design. Unfortunately, this is not the case in most practical situations.
For example, consider the following two designs in N = 4 runs, with U = [−1, +1]2 for
first-order regression (i.e. an intercept and two slopes):
[Figure: dot plots of the two four-run designs U1 and U2 in [−1, +1]²; U2 includes a replicated point.]
Although U1 would be considered superior to U2 for nearly all statistical purposes (except
for the benefit of pure replication in the second design), the eigenvalues of M1 − M2 are, in
this case, 0.7071, 0, and -0.7071, and so neither design is preferred to the other in terms of
the Loewner partial ordering. In fact, the sense of “ordering” implied here is so strong that
it is of very little practical use, since far too many pairs of designs cannot be ordered in most
problems of real interest.
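This failure of the Loewner ordering is easy to check numerically. The sketch below uses Python/NumPy; since the original point plots are not reproduced here, the specific designs are assumptions consistent with the text (U1 the 2² factorial, U2 a four-run design with one replicated corner), chosen so that the stated eigenvalues result.

```python
import numpy as np

def moment_matrix(points):
    """M(U) = (1/N) sum_i x_i x_i' for first-order regression, x = (1, u1, u2)'."""
    U = np.asarray(points, dtype=float)
    X = np.column_stack([np.ones(len(U)), U])
    return X.T @ X / len(U)

# U1: the 2^2 factorial (all four corners of [-1, +1]^2).
U1 = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
# U2: an assumed reconstruction of the second design, with the corner
# (1, -1) replicated; treat these points as illustrative only.
U2 = [(1, 1), (-1, -1), (1, -1), (1, -1)]

eigs = np.linalg.eigvalsh(moment_matrix(U1) - moment_matrix(U2))
print(np.round(eigs, 4))  # mixed signs: no Loewner ordering either way
```

Because M1 − M2 has both a positive and a negative eigenvalue, neither M1 − M2 nor M2 − M1 is positive semi-definite, as claimed above.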
A more practical approach to optimal design is to base the ranking of alternative designs on a scalar-valued criterion function, φ(M), so that the problem of optimal design construction becomes one of function optimization. In the following sections, we briefly describe some of the criteria more commonly used in practice. Most of the effort in optimal design construction has as its goal finding the optimal design or designs for a given problem. However, we note at this point that in many cases, the criterion can be interpreted as being "monotonically related to the quality of the design", so that two designs can be ordered in their desirability based on the criterion function, even if neither is optimal. This also leads to various definitions of "efficiency" of a design, based on the difference or ratio of the criterion value of that design and the criterion value of an optimal design of the same size (N).
D-optimality
We turn now to examining specific optimality criteria that are in common use. Probably
the most-often used (and certainly the one for which the theory described later is best
developed) is D-optimality. This criterion is appropriate for experiments for which the overall
goal is satisfied by estimating θ well, and can be developed by noting that the classical
confidence region for this parameter vector can be written as
CR(θ) = {θ : (θ − θ̂)′M(U)(θ − θ̂) ≤ c1}
The boundary that defines this region is an ellipsoid in k-dimensional space, centered on the
vector θ̂, with volume c2 |M(U )|−1/2 . Hence, for any values of σ 2 and θ̂, the volume of CR
is minimized by a design that minimizes:

φ(M(U)) = |M(U)⁻¹|   or   φ(M(U)) = −log|M(U)|

or, equivalently, maximizes:

φ(M(U)) = log|M(U)|   or   φ(M(U)) = |M(U)|
and designs that are optimal with respect to this criterion are called D-optimal (for “determinant”).
This is a reasonable choice for a criterion for designing an experiment “to estimate θ
well,” but it is not fool-proof. For example, suppose that for a particular problem, the
design that minimizes the volume of CR(θ) actually produces confidence ellipsoids that are
extremely long in one direction, but very short in all directions orthogonal to this. (This
would correspond to a matrix M(U ) with all but one eigenvalue very small, but one very
large.) While the volume of the confidence region might be small, the length of any or all
confidence intervals on individual elements of θ̂ could be very large! This remark shouldn't be interpreted as a weakness of D-optimality specifically; similar arguments can be constructed
for the “fallibility” of all common criteria. It is, instead, the inevitable consequence of relying
on a single value (the criterion) to represent the value of an experimental design (which is
hardly a scalar-valued structure). Reduction to a scalar-valued objective function makes
optimization possible, but at a price. A reasonable practical approach to dealing with this
is to construct a design using an appropriate criterion function, and check the quality of the
design with respect to other related criteria to be sure that it is at least “good” with respect
to them all.
Example: Simple Linear Regression
For sample size N = 3, set U = [−1, +1]. For simple linear regression, without scaling the independent variable, the usual parameterization is x = (1, u)′, so X = {1} × [−1, +1]. In
this case,
M(U) = (1/3) [[3, Σui], [Σui, Σui²]],   |M(U)| = (1/9)(3 Σui² − (Σui)²)
That is, |M(U )| is proportional to the sample variance of ui ’s, therefore, any U that does
not contain u = −1 and u = +1 can’t be optimal because “spreading it out” to the boarders
would increase that variance. So, let u1 = +1, u2 = −1, |M(U )| = 19 (3(2 + u23 ) − u23 ) =
1
(6
9
+ 2u23 ). This quantity is maximized at u3 = +1 or -1, so:
U = (+1, −1, −1)′, with X having rows (1, +1), (1, −1), (1, −1),

and

U = (+1, −1, +1)′, with X having rows (1, +1), (1, −1), (1, +1),
are D-optimal for SLR with this U and N = 3. (It should be obvious that the argument
would take the same form for any interval on the real line.)
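As a quick numerical check (a Python/NumPy sketch, not part of the formal argument), a brute-force search over three-point designs on a grid spanning U = [−1, +1] recovers a corner design with |M| = 8/9:

```python
import itertools

import numpy as np

def det_M(design):
    """|M(U)| for SLR, where M = (1/N) X'X and each row of X is (1, u)."""
    X = np.column_stack([np.ones(len(design)), design])
    return np.linalg.det(X.T @ X / len(design))

# Enumerate all three-point designs on a grid spanning U = [-1, +1].
grid = np.linspace(-1.0, 1.0, 21)
best = max(itertools.combinations_with_replacement(grid, 3), key=det_M)
print(tuple(best), det_M(best))  # -> (-1.0, -1.0, 1.0) with |M| = 8/9
```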
Invariance of D-optimality
D-optimality has a linear invariance property not shared by some of the other popular
optimality criteria. In the standard definition of the problem, we begin with u and transform
to x = f (u), leading to what we’ll temporarily call Mx (U ). Suppose, now, that we further
linearly transform x as z = Tx, where T is a square matrix of full rank. In the new
parameterization, the design information matrix is
Mz = (1/N) Σ_{i=1}^N zi zi′ = T [(1/N) Σ_{i=1}^N xi xi′] T′ = T Mx T′.
But it follows from this that |Mz | = |T|2 |Mx | (since the determinant of a product of square
matrices is the product of determinants). So any design that maximizes |Mx | also maximizes
|Mz | and vice versa, that is, D-optimality is invariant under nonsingular linear transformation of x. (In fact, beyond the question of which design is optimal, it is clear that the ranking
of designs of the same size (N ) by this criterion is unchanged by linear transformation.)
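This identity is easy to verify numerically. In the sketch below, T is an arbitrary randomly generated full-rank matrix (full rank with probability one), applied to the moment matrix of one of the D-optimal SLR designs:

```python
import numpy as np

# Model matrix for one of the D-optimal SLR designs, rows x' = (1, u).
Xx = np.array([[1.0, 1.0], [1.0, -1.0], [1.0, 1.0]])
Mx = Xx.T @ Xx / len(Xx)

rng = np.random.default_rng(0)
T = rng.normal(size=(2, 2))   # an arbitrary full-rank reparameterization
Mz = T @ Mx @ T.T             # moment matrix after z = T x

# |Mz| = |T|^2 |Mx|, so the D-ranking of designs is unchanged by T.
assert np.isclose(np.linalg.det(Mz),
                  np.linalg.det(T) ** 2 * np.linalg.det(Mx))
```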
Example: SLR, continued
Recall that the parameterization from the SLR example is x0 = (1, u). Now consider the
effect of a linear transformation defined by

T = [[+1, +1], [+1, −1]],   z = Tx = (1 + u, 1 − u)′

That is, the new parameterization relates the dependent and predictor variables as y = θ1(1 + u) + θ2(1 − u) + ε, and Z is an upper-left to lower-right diagonal in [0, 2]². Transforming
the model matrix for one of the optimal designs in x to the new parameterization, we have
Xx = [[1, +1], [1, −1], [1, +1]]   →   Xz = [[2, 0], [0, 2], [2, 0]].
Note that, as in the x-parameterization, the optimal design also leads to placement of two
design points at one end of the augmented design space and one point at the other end in
the z-parameterization.
DA and Ds
Here is one generalization of D-optimality in two forms, one of which is a special case of the other. Suppose we want to estimate A′θ well, where A is k × s of full rank s, with s ≤ k. Note that when s = k, this is essentially a full-rank transformation as discussed above, so that a DA-optimal design in this case is the same as the D-optimal design; interesting cases correspond to s < k. By reasoning analogous to that given above, designs that minimize the volume of a CR for A′θ are those that minimize
φ(M(U)) = |A′M(U)⁻¹A|
A special case (which is the most common form you'll see in the literature) is where A′ = (I|0), so that the focus is on a subset of s of the k parameters: Ds-optimality (for "subset"). (I've simplified the form of A here by writing it as if the first s parameters are
the ones of interest.) Suppressing the argument U , let
M = [[M11, M12], [M21, M22]],   M⁻¹ = [[M¹¹, M¹²], [M²¹, M²²]]
where the matrix partitioning is as suggested by A′. The central matrix of the quadratic form defining a confidence region for the first s model parameters, acknowledging the remaining k − s as nuisance parameters, is M11 − M12 M22⁻¹ M21; hence the criterion for Ds-optimality can be written as:

φ(M(U)) = |M11 − M12 M22⁻¹ M21|, to be maximized, or

φ(M(U)) = |M¹¹| = |M11 − M12 M22⁻¹ M21|⁻¹, to be minimized
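The Schur-complement form of the criterion is straightforward to compute. The sketch below (a Python/NumPy illustration, with a hypothetical quadratic-regression design chosen here for concreteness) assumes the nuisance block M22 is nonsingular:

```python
import numpy as np

def ds_criterion(M, s):
    """|M11 - M12 M22^{-1} M21| for the first s parameters; assumes the
    nuisance block M22 is nonsingular."""
    M11, M12 = M[:s, :s], M[:s, s:]
    M21, M22 = M[s:, :s], M[s:, s:]
    return np.linalg.det(M11 - M12 @ np.linalg.solve(M22, M21))

# Illustration: quadratic regression x = (1, u, u^2)' on the design
# U = {-1, 0, +1}, with (theta_0, theta_1) as the parameters of interest.
U = np.array([-1.0, 0.0, 1.0])
X = np.column_stack([np.ones(len(U)), U, U**2])
M = X.T @ X / len(U)
print(ds_criterion(M, 2))  # 2/9, to be maximized over designs
```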
While the notions of DA and Ds seem at first glance to be fairly natural and straightforward extensions to D-optimality, it should be noted that they can lead to substantially
more difficult design construction problems (both analytically and computationally). This
is because A0 θ may be estimable for designs for which M is not of full rank, and in fact,
such designs may actually be optimal. For example, consider multiple linear regression in
two predictors,
U = [−1, +1]², y = θ0 + θ1u1 + θ2u2 + ε, and let the subset of parameters of interest be (θ0, θ1)′. It is not difficult to show that when
N is even,
U = {N/2 points at (+1, 0); N/2 points at (−1, 0)}

is an optimal design for estimating the subset, but

M = [[1, 0, 0], [0, 1, 0], [0, 0, 0]]
is singular. As a result, neither M⁻¹ nor M22⁻¹ exists! We defer until later the accommodations that are required to deal with this issue.
One additional general point related to Ds -optimality should be made. You will recall
that in STAT 512, much was made about the view that experiments should be comparative.
Rationale for this is based in the argument that, because tight experimental control (exercised
to reduce variability) can result in unique common conditions for all runs in an experiment,
comparisons between data in different experiments should not be expected to reflect only
effects associated with experimental treatments. A direct consequence of this line of thinking
is to treat the model intercept as a nuisance parameter. This point of view is nearly universal
in what might be called the “traditional” treatment of statistical experimental design as
developed by R.A. Fisher and those who followed him. In contrast to this, in much of the
literature on optimal experimental design, less (and sometimes no) attention has been given
to such issues as unit randomization and the overall effect of experimental control. Most
of the design criteria we will discuss, at least in their basic forms, treat the entire data
model, including the intercept, as “meaningful” from an experimental point of view. Hence
D-optimality is based on a summary measure of precision of all parameters in the expectation
portion of the linear model, including the intercept, while Ds -optimality offers one way to
re-cast optimal design ideas in a context that is closer to the “traditional” perspective.
G-optimality
While D-optimality is motivated by the need to estimate model parameters well, G-optimality (for "global") is motivated by direct estimation of the expected response. The
goal here is to estimate E(y(u)) well throughout U. For a given design, and at any one
location u, the variance of that estimate is
Var(ŷ(u)) = (σ²/N) x′M(U)⁻¹x
where x = f (u). For any σ 2 , the G-optimal design minimizes the largest such variance (over
points in the induced design space):
φ(M(U)) = max_{x∈X} x′[M(U)]⁻¹x
or equivalently, maximizes:
φ(M(U)) = −max_{x∈X} x′[M(U)]⁻¹x
As defined, the optimization problem required by the G-criterion is more difficult than
that for the D-criterion (and most others). This is because evaluation of the criterion for a
given design requires a complete optimization over all x ∈ X. Direct construction of a G-optimal design is then an optimization (over x ∈ X) within an optimization (over U ∈ U^N),
leading to analytical and computational difficulties.
Note that x′[M(U)]⁻¹x = trace{[M(U)]⁻¹xx′}, so the criterion above is also −max_{x∈X} trace{[M(U)]⁻¹xx′}. If we had decided to look for the design that minimizes the average, instead of the maximum, variance of estimated expected response, then an appropriate criterion might be

φ(M(U)) = ∫_{x∈X} trace{[M(U)]⁻¹xx′} ω(x) dx = trace{[M(U)]⁻¹ ∫_{x∈X} xx′ ω(x) dx}

The last integral is a region moment matrix, which does not depend on the design. Finding an optimal design here would not require the "inner" optimization (over x). This is a special case of what is called "linear optimality":

φ(M(U)) = trace[M(U)⁻¹C] = trace[CM(U)⁻¹], for a fixed k × k matrix C
Despite construction advantages, this criterion may be less popular than G-optimality
because it requires a weight function ω to define the average. (This can be ignored, but
that amounts to tacitly assigning uniform weight across X , and note that except in simple
cases, this is not equivalent to uniform weight across U.) The advantage of G-optimality is
that the mini-max approach focuses entirely on the worst case (point of greatest estimation
variance), and so does not require specification of the relative importance of one region of
X relative to another. However, the “average” version of this criterion is sometimes used,
especially in response surface analysis applications; this is developed further in the discussion
of “Q-optimality” in the section below on average performance of ŷ.
SLR Example, continued
We continue with the example of simple linear regression from the notes on D-optimality.
For any design U = {u1, u2, ..., uN}, it is straightforward to show that the variance of ŷ(u) is minimized when u is the average value of the controlled variable over the design runs, and that this variance is a quadratic function of u, so that the maximum variance of ŷ occurs at u = −1 or +1, the most extreme values of U. We determined that for N = 3, the D-optimal designs are U1 = {−1, −1, +1} and U2 = {−1, +1, +1}. For these designs, Var(ŷ) = (1/2)σ² and σ² at the two end-points, so that max_u Var(ŷ(u)) = σ². Consider now an alternative design, U3 = {−1, 0, +1}; for this plan, Var(ŷ(−1)) = Var(ŷ(+1)) = (5/6)σ². Therefore, the
D-optimal plan cannot also be G-optimal. A remaining question: Is U3 a G-optimal design
for this problem?
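The variances quoted above can be verified directly (a Python/NumPy sketch, with σ² set to 1):

```python
import numpy as np

def pred_var(design, u, sigma2=1.0):
    """Var(yhat(u)) = (sigma^2/N) x' M(U)^{-1} x for SLR, x = (1, u)'."""
    X = np.column_stack([np.ones(len(design)), design])
    M = X.T @ X / len(design)
    x = np.array([1.0, u])
    return sigma2 / len(design) * x @ np.linalg.solve(M, x)

U1 = [-1.0, -1.0, 1.0]   # a D-optimal design
U3 = [-1.0, 0.0, 1.0]
print(pred_var(U1, -1.0), pred_var(U1, 1.0))  # 0.5 and 1.0 (units of sigma^2)
print(pred_var(U3, -1.0), pred_var(U3, 1.0))  # 5/6 at both end-points
```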
Other Criteria
Suppose we want to minimize the average estimation variance of several linear combinations of the θ's, A′θ. Motivation is similar to that for DA, but focuses on the average variance (ignoring correlations) rather than the volume of the CR. This leads to a criterion function for what is called A-optimality (for "average"):
φ(M(U)) = trace[A′M(U)⁻¹A]

which is minimized by the optimal design. Note that this can also be written as

φ(M(U)) = trace[M(U)⁻¹AA′]

and so is also a special case of "linear optimality" mentioned above. When A′ contains only one row, this is sometimes called c-optimality:

φ(M(U)) = c′M(U)⁻¹c = trace[M(U)⁻¹cc′]
which is analogous to Ds where the subset contains only one parameter.
With both A and c, complications can again arise with designs that should be called
“optimal”, but have singular M. Related to c-optimality, but without this problem, is
E-optimality (for "eigenvalue"), which calls for minimizing

φ(M(U)) = max_{c′c=1} c′M(U)⁻¹c = ev_max(M(U)⁻¹)

or maximizing

φ(M(U)) = ev_min(M(U)).
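For a concrete illustration (a sketch, not from the original notes), the A-, c-, and E-criteria can be evaluated for the 2² factorial under first-order regression, for which M = I exactly, so the values are easy to check by hand:

```python
import numpy as np

# 2^2 factorial under first-order regression: M = I_3 exactly.
X = np.array([[1, -1, -1], [1, -1, 1], [1, 1, -1], [1, 1, 1]], dtype=float)
M = X.T @ X / len(X)
Minv = np.linalg.inv(M)

a_crit = np.trace(Minv)               # A-criterion with A = I (minimize)
c = np.array([0.0, 1.0, 0.0])         # interest in the first slope only
c_crit = c @ Minv @ c                 # c-criterion (minimize)
e_crit = np.linalg.eigvalsh(M).min()  # E-criterion (maximize)
print(a_crit, c_crit, e_crit)         # 3.0 1.0 1.0
```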
Criteria Involving the Average (over u) Performance of ŷ(u)
The notes to this point largely follow the development of Silvey’s book. This section is
closer to the development of Myers et al. (I’ve tried to make the notation consistent, but this
was a last-minute addition, so let me know if you find something that is totally inconsistent.)
Variance optimality
Recall that G-optimality uses a criterion based on the precision of mean response estimation:

φ(M(U)) = max_{x∈X} Var[ŷ(x)] = (σ²/N) max_{u∈U} x′M⁻¹x
As briefly noted above, we could also define the criterion by averaging over the design region
rather than focusing on the point of maximum variance, i.e.
φ(M(U)) = (σ²/N) ∫_{u∈U} x′M⁻¹x du

This could also be defined with integration over x ∈ X; here we integrate with respect to u, with the understanding that x is defined as a function of u. Myers et al. call this Q-optimality, and it is also sometimes called IV- or I-optimality, for "Integrated Variance". As noted above, general "averaging" with an integral can be defined with a weight function, but here we'll compute "averages" as uniformly weighted integrals in u ∈ U, with the understanding that this can be made more general when that helps. Define the volume of U to be Ω = ∫_{u∈U} 1 du. Then a Q-optimality criterion function can be developed as:
ave_{u∈U} Var[ŷ(u)] = (1/Ω) ∫_{u∈U} σ² x′(X′X)⁻¹x du
= (σ²/(NΩ)) ∫_{u∈U} x′M(U)⁻¹x du
= (σ²/(NΩ)) ∫_{u∈U} trace[M(U)⁻¹xx′] du
= (σ²/N) trace[M(U)⁻¹ · (1/Ω) ∫_{u∈U} xx′ du]
= (σ²/N) trace[M(U)⁻¹µ]
Here, µ is a region moment matrix, and is analogous to what M would be in a limit as the
experimental design increases in size in such a way that it “uniformly fills” U. Consistent
with criterion functions presented above, we can omit constant factors of σ 2 and N , and
define
φ(M(U)) = trace[M(U)⁻¹µ]
which is a specific form of linear optimality discussed above.
Example: First-order regression model in U = [−1, +1]^r
µ = (1/Ω) ∫_{u1∈[−1,+1]} ··· ∫_{ur∈[−1,+1]} xx′ du1 ··· dur,   where x = (1, u1, u2, ..., ur)′, i.e.

       [ 1     u1     u2     ...   ur   ]
       [ u1    u1²    u1u2   ...   u1ur ]
xx′ =  [ ...   ...    ...    ...   ...  ]
       [ ur    uru1   uru2   ...   ur²  ]
Carrying out the integration on each scalar quantity:
• 1 integrates to 2r , so Ω = 2r
• Odd powers each integrate to zero
• u2i integrates to 2r ( 13 )
so µ = diag(1, 1/3, 1/3, ..., 1/3). Q-optimality leads to minimization of Var(θ̂1) + (1/3) Σ_{i=2}^{r+1} Var(θ̂i).
The most commonly used form of A-optimality employs A = I, leading to a criterion that is minimization of the average of all coefficient estimate variances; in contrast, Q-optimality resembles this but places more weight on the intercept. A little reflection shows that any 2-level orthogonal fractional factorial of resolution at least 3 is Q-optimal for first-order models on U = [−1, +1]^r.
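As a numerical check of the criterion value for r = 2 (a sketch using the 2² full factorial, for which M = I):

```python
from itertools import product

import numpy as np

r = 2
# Region moment matrix for x = (1, u1, ..., ur)' on U = [-1, +1]^r.
mu = np.diag([1.0] + [1.0 / 3.0] * r)

# 2^r full factorial: orthogonal with resolution >= 3, so M = I.
design = np.array(list(product([-1.0, 1.0], repeat=r)))
X = np.column_stack([np.ones(len(design)), design])
M = X.T @ X / len(design)

q_crit = np.trace(np.linalg.inv(M) @ mu)
print(q_crit)  # 1 + r/3: weight 1 on the intercept, 1/3 on each slope
```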
Example: Second-order regression model in U = [−1, +1]²

µ = (1/Ω) ∫_{u1∈[−1,+1]} ∫_{u2∈[−1,+1]} xx′ du1 du2,   where x = (1, u1, u2, u1², u2², u1u2)′

and xx′ is the symmetric matrix

[ 1      u1      u2      u1²      u2²      u1u2   ]
[        u1²     u1u2    u1³      u1u2²    u1²u2  ]
[                u2²     u1²u2    u2³      u1u2²  ]
[                        u1⁴      u1²u2²   u1³u2  ]
[                                 u2⁴      u1u2³  ]
[ (sym)                                    u1²u2² ]

Here,
• Ω = 4
• ui² integrates to 2²(1/3)
• u1²u2² integrates to 2²(1/9)
• ui⁴ integrates to 2²(1/5)
• all terms with any odd power integrate to zero
Bias optimality
Q-optimality is based on average variance, but if the fitted model is of incorrect form, bias is also an issue. For example, for simple linear regression, r = 1, with U = [−1, +1], U = {−1, +1} is a Q-optimal design (along with any other reasonable form of optimality you might develop based only on variance). The alternative design, U = {−3/4, +3/4}, is certainly not Q-optimal, but may have less expected squared error in the estimates of E(y(u)) at most values of u ∈ U if the actual model is quadratic, e.g.
[Figure: sketch over u ∈ [−1, +1] (axis marked at −1, 0, +1) illustrating a straight-line fit when the true mean is quadratic.]
To make this argument more formal, suppose E(y(u)) = x1′θ1 + x2′θ2, but we fit a model of form ŷ(u) = x1′θ̂1. The squared error of ŷ at any u is:

err(u)² = (x1′θ1 + x2′θ2 − x1′θ̂1)².

The expectation of this squared error comprises squared-bias and variance terms:

E[err(u)²] = [E err(u)]² + Var[err(u)].
In turn, this expression can be integrated with respect to u to get a measure of integrated
mean (or expected) squared error (IMSE), and the integration can be applied to the two
components individually. Extend the definition of design and region moment matrices introduced earlier as:
M11(U) = (1/N) Σ_{i=1}^N x1,i x1,i′        M12(U) = (1/N) Σ_{i=1}^N x1,i x2,i′

µ11 = (1/Ω) ∫_{u∈U} x1x1′ du               µ12 = (1/Ω) ∫_{u∈U} x1x2′ du
Then we can write the integrated variance and squared-bias components as:
• V = ∫_{u∈U} Var[err(u)] du = (σ²/N) trace[M11(U)⁻¹µ11] (as with Q-optimality)

• B = ∫_{u∈U} [E err(u)]² du = θ2′(µ22 − M21M11⁻¹µ12 − µ21M11⁻¹M12 + M21M11⁻¹µ11M11⁻¹M12)θ2
Hence IMSE-optimality would suggest minimizing a criterion function comprising the sum of these two terms, IMSE = B + V. This has been tried in some contexts, but it is generally difficult in most practical cases because we don't know the relative size of θ2 and σ². For contexts in which variance is expected to dominate error (i.e. θ2 is expected to be small relative to σ²), ignoring B leads to Q-optimality. Now consider the opposite, where B is expected to dominate so that V might be ignored. The expression above for squared bias can be reduced by adding and subtracting µ21µ11⁻¹µ12 in the central matrix, leading to a decomposition of B into two pieces, B = B1 + B2, with:
• B1 = θ2′(µ22 − µ21µ11⁻¹µ12)θ2

• B2 = θ2′((M21M11⁻¹ − µ21µ11⁻¹)µ11(M11⁻¹M12 − µ11⁻¹µ12))θ2
Note that B1 is not a function of the design; it is determined only by the model form, U and
the unknown value of θ 2 , so we can ignore it for purposes of comparing designs.
This would suggest using B2 as a design criterion, but this is still impractical since θ 2 is
unknown. What can be done, at least in some instances, is to design the experiment so that
B2 = 0 for any value of θ 2 . From the structure of B2 it is immediate that:
• a necessary and sufficient condition for B2 = 0 is: M11⁻¹M12 = µ11⁻¹µ12

• a sufficient condition for B2 = 0 is: M11 = µ11 and M12 = µ12
Note that M11⁻¹M12 is what we called the "alias matrix" in STAT 512 and STAT 513. The
sufficient condition is noted since it is sometimes easier to deal with in practice. Notice that
this is a fundamentally different approach to design than those above that are associated
with criterion functions. In the former cases (as we’ve noted), φ is a measure of “goodness”
that can be used to rank designs even when they are not optimal. Because this “minimum
bias” argument does not anticipate or “average over” the value of θ 2 , it does not result in
a function φ that is directly associated with a statistical performance measure, but specifies
“all or nothing” conditions that, when met, minimize integrated squared bias of expected
response regardless of this value.
Example: First-order regression model in U = [−1, +1]^r
Suppose a first-order regression model is fitted in U = [−1, +1]^r, but in truth, data are being generated by a second-order polynomial. Then:
• µ11 = diag(1, 1/3, 1/3, ..., 1/3)

• µ12 = (1/Ω) ∫_{u∈U} (1, u1, ..., ur)′ (u1², u2², ..., u1u2, ...) du, whose first row contains 1/3 in each column corresponding to a squared term and 0 in each column corresponding to a cross-product term, and whose remaining rows are all zero.
The pattern of nonzeros in these matrices matches what we would have in M11 and M12 for a regular fractional factorial of resolution at least 3, but if the fraction were scaled so that all the values of ui in the design were ±1, these nonzeros would each be 1. A little thought suggests that the sufficient condition for minimum integrated bias can be achieved by rescaling such a design so that each ui is ±1/√3. This is what intuition (and the figure a few pages earlier) suggests; minimizing integrated bias generally requires a design that is "shrunken" away from the borders of U, relative to a design that is Q-optimal (or most other versions of optimality based only on variance considerations).
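The sufficient condition can be verified numerically for the rescaled 2² factorial with r = 2 (a sketch; the ordering of the second-order terms as (u1², u2², u1u2) is a choice made here for illustration):

```python
from itertools import product

import numpy as np

r = 2
c = 1.0 / np.sqrt(3.0)  # rescale each u_i of the 2^2 factorial to +/- 1/sqrt(3)
design = c * np.array(list(product([-1.0, 1.0], repeat=r)))

# First-order part x1 = (1, u1, u2)'; second-order part x2 = (u1^2, u2^2, u1 u2)'.
X1 = np.column_stack([np.ones(len(design)), design])
X2 = np.column_stack([design[:, 0] ** 2, design[:, 1] ** 2,
                      design[:, 0] * design[:, 1]])
M11 = X1.T @ X1 / len(design)
M12 = X1.T @ X2 / len(design)

mu11 = np.diag([1.0, 1.0 / 3.0, 1.0 / 3.0])
mu12 = np.array([[1 / 3, 1 / 3, 0.0],   # intercept row: squared terms, cross term
                 [0.0, 0.0, 0.0],       # u1 row
                 [0.0, 0.0, 0.0]])      # u2 row

# Sufficient condition for zero integrated bias: M11 = mu11 and M12 = mu12.
assert np.allclose(M11, mu11) and np.allclose(M12, mu12)
```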
References
Myers, R.H., D.C. Montgomery, and C.M. Anderson-Cook (2009). Response Surface Methodology: Process and Product Optimization Using Designed Experiments, 3rd ed., Wiley,
New York.
Morris, M.D. (2011). Design of Experiments: An Introduction Based on Linear Models,
Chapman and Hall, Boca Raton, FL.
Silvey, S.D. (1980). Optimal Design: An Introduction to the Theory for Parameter Estimation, Chapman and Hall, London.