Techniques for the construction of robust regression designs

The Canadian Journal of Statistics
Vol. 41, No. 4, 2013, Pages 679–695
La revue canadienne de statistique

Maryam DAEMI* and Douglas P. WIENS
Department of Mathematical and Statistical Sciences, University of Alberta, Edmonton, Alberta, Canada T6G 2G1

* Author to whom correspondence may be addressed. E-mail: daemi@ualberta.ca
Key words and phrases: Alphabetic optimality; approximate quadratic regression; approximate straight line regression; bias; invariance; minimax.
MSC 2010: Primary 62K05, 62G35; secondary 62J05.
Abstract: The authors review and extend the literature on robust regression designs. Even for straight line regression, there are cases in which the optimally robust designs—in a minimax mean squared error sense, with
the maximum evaluated as the “true” model varies over a neighbourhood of that fitted by the experimenter—
have not yet been constructed. They fill this gap in the literature, and in so doing introduce a method of
construction that is conceptually and mathematically simpler than the sole competing method. The technique
used injects additional insight into the structure of the solutions. In cases in which the optimality criteria employed result in designs that are not invariant under changes in the design space, their methods also allow for an investigation of the resulting changes in the designs. The Canadian Journal of Statistics 41: 679–695; 2013
© 2013 Statistical Society of Canada
1. INTRODUCTION AND SUMMARY
Consider an experiment designed to investigate the relationship between the amount (x) of a
certain chemical compound used as the input in an industrial process, and the output (Y ) of
the process. The relationship is thought to be linear in x; if this supposition is exactly true then
the design that minimizes the variances of the regression parameter estimates, or the maximum
variance of the predictions, places half of the observations at each endpoint of the interval of input
values of interest. It has however long been known (Box & Draper, 1959; Huber, 1975) that even
seemingly small deviations from linearity can introduce biases so large as to destroy the optimality
of this classical design, if mean squared error, rather than variance, becomes the measure of design
quality.
In Section 2 of this article we review existing results from the literature on robustness of
design, in the context of the most common formulation of “approximate linear regression.” In this
formulation one allows the “true” regression response to vary over a neighbourhood of that which
is fitted by the experimenter. The optimally robust design is then one that minimizes the maximum
of some scalar-valued function of the mean squared error (mse) matrix of the estimates. Common
functions are the trace, determinant or maximum eigenvalue of the mse matrix of the parameter
estimates, and the integrated mse of the predictions—these correspond respectively to the A-, D-,
E-, and I-optimality criteria in classical design theory, where the same functions are applied to the
covariance matrix. In each of these cases one typically finds that the optimally robust design must
minimize the maximum of several functions of the mse matrix. A simple and sometimes effective
method to proceed is then to find the design minimizing a chosen one of these functions, and to
verify that the value of this function, evaluated at the putatively optimal design, is indeed the maximum. We call this a “pure” strategy. When it fails—as it does in a great many common instances,
including robust A- and E-optimality—the literature to this point is almost silent. The sole exception is Shi, Ye, & Zhou (2003), who note that the maximum of several differentiable loss functions
is itself not necessarily differentiable, and go on to apply methods of nonsmooth optimization to
obtain a description of the minimizing designs, with an application to quadratic regression.
It is our aim here to introduce a technique that is conceptually and mathematically simpler
than that of Shi, Ye, & Zhou (2003), and to illustrate its application in the contexts of robust
A- and E-optimality for straight line regression, and robust I-optimality for quadratic regression.
The method is described in Section 3, with examples and computational aspects in Section 4.
The technique used injects additional insight into the structure of the solutions. In addition, we
discuss the changes in the A- and E-optimal designs as the design space changes—even under the
classical criteria these designs are not invariant under transformations of the design space.
Regression responses which are approximately linear, but with a more involved structure than is used in our examples, may, at the cost of greater but similar computational complexity, be handled in the same manner as illustrated here.
2. ROBUST REGRESSION DESIGNS
Suppose that the experimenter intends to make n observations on a random response variable Y, at several values of a p-vector f(x) of regressors. Each element of f(x) is a function of q functionally independent variables x = (x₁, ..., x_q)′, with x to be chosen from a design space X. The fitted response is f′(x)θ. If however this response is recognized as possibly only approximate:

E[Y(x)] ≈ f′(x)θ,

for a parameter θ whose interpretation is now in doubt, then one might define this target parameter by

θ = arg min_η ∫_X (E[Y(x)] − f′(x)η)² dx,   (1)

and then define

ψ(x) = E[Y(x)] − f′(x)θ.   (2)

This results in the class of responses

E[Y(x)] = f′(x)θ + ψ(x),

with—by virtue of (1)—ψ satisfying the orthogonality requirement

∫_X f(x)ψ(x) dx = 0.   (3)
Under the very mild assumption that the matrix

A = ∫_X f(x)f′(x) dx

be invertible, the parameter defined by (2) and (3) is unique.
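(To see why (1) entails (3): the quadratic objective in (1) is minimized where its gradient in η vanishes, giving the normal equations Aθ = ∫_X f(x)E[Y(x)] dx; thus ∫_X f(x)ψ(x) dx = ∫_X f(x)E[Y(x)] dx − Aθ = 0, which is (3), and the invertibility of A makes the solution unique.)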
We identify a design, denoted ξ, with its design measure—a probability measure ξ(dx) on X. If n_i of the n observations are to be made at x_i we also write ξ_i = ξ(x_i) = n_i/n. Define

M_ξ = ∫_X f(x)f′(x) ξ(dx),   b_{ψ,ξ} = ∫_X f(x)ψ(x) ξ(dx),
and assume that M_ξ is invertible. The covariance matrix of the least squares estimator (lse) θ̂, assuming homoscedastic errors with variance σ_ε², is (σ_ε²/n)M_ξ⁻¹, and the bias is E[θ̂ − θ] = M_ξ⁻¹ b_{ψ,ξ}; together these yield the mean squared error matrix

mse[θ̂] = (σ_ε²/n) M_ξ⁻¹ + M_ξ⁻¹ b_{ψ,ξ} b′_{ψ,ξ} M_ξ⁻¹

of the parameter estimates, whence the mse of the fitted values Ŷ(x) = f′(x)θ̂ is

mse[Ŷ(x)] = (σ_ε²/n) f′(x) M_ξ⁻¹ f(x) + (f′(x) M_ξ⁻¹ b_{ψ,ξ})².
Loss functions that are commonly employed, and that correspond to the classical alphabetic optimality criteria, are the trace, determinant, and maximum characteristic root of mse[θ̂], and the integrated mse of the predictions:

L_A(ξ|ψ) = tr mse[θ̂] = (σ_ε²/n) tr M_ξ⁻¹ + b′_{ψ,ξ} M_ξ⁻² b_{ψ,ξ},   (4a)

L_D(ξ|ψ) = (det mse[θ̂])^{1/p} = (σ_ε²/n) [(1 + (n/σ_ε²) b′_{ψ,ξ} M_ξ⁻¹ b_{ψ,ξ}) / det M_ξ]^{1/p},   (4b)

L_E(ξ|ψ) = ch_max mse[θ̂] = sup_{‖c‖=1} mse[c′θ̂],   (4c)

L_I(ξ|ψ) = ∫_X mse[Ŷ(x)] dx = (σ_ε²/n) tr A M_ξ⁻¹ + b′_{ψ,ξ} M_ξ⁻¹ A M_ξ⁻¹ b_{ψ,ξ} + ∫_X ψ²(x) dx.   (4d)
The dependence on ψ is eliminated by adopting a minimax approach, according to which one maximizes (4) over a neighbourhood Ψ of the assumed response. This neighbourhood is constrained by (3) and by a bound

∫_X ψ²(x) dx ≤ τ²/n   (5)

for a given constant τ. That the bound in (5) be O(n⁻¹) is required for a sensible asymptotic treatment based on the mse—it forces the bias of the estimates to decrease at the same rate as their standard error. Of course if n is fixed it can be absorbed into τ.
Although other neighbourhood structures are possible—see for instance Marcus & Sacks
(1976), Pesotchinsky (1982), and Li & Notz (1982)—that outlined above, a simplified version of
which was introduced by Huber (1975) in the context of approximate straight line regression and
which was then extended by Wiens (1992), has become the most common. Huber (1975) also
discusses and compares various approaches to model robustness in design.
A virtue of the neighbourhood given by (3) and (5) is its breadth, allowing for the modelling
of very general types of alternative responses. This breadth however leads to the complicating
factor that the maximum loss, over Ψ, of any implementable, hence discrete, design is necessarily
infinite; this leads to the derivation of absolutely continuous design measures and their subsequent approximation—see the discussion in Wiens (1992), where the comment is made that “Our
attitude is that an approximation to a design which is robust against more realistic alternatives
is preferable to an exact solution in a neighbourhood which is unrealistically sparse.”
If X is an interval, as will be assumed in our examples, then the implementation placing one observation at each of the quantiles

x_i = ξ⁻¹((i − 1/2)/n),   i = 1, ..., n,   (6)

converges weakly to ξ and is the n-point design closest to ξ in Kolmogorov distance (Fang & Wang, 1994). Of course one might elect to replicate the design at a smaller number of quantiles, for instance to test for heteroscedasticity. Daemi (2012) discusses other possibilities.
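To make (6) concrete, here is a minimal numerical sketch of the implementation step—ours, not the authors' code; the function name and the illustrative density are assumptions made for the example. It tabulates the distribution function of ξ on a grid and inverts it at the points (i − 1/2)/n.

import numpy as np

def implement_design(m, n, lo=-0.5, hi=0.5, grid=10_001):
    # n-point implementation (6): x_i = xi^{-1}((i - 1/2)/n), where xi is the
    # design measure with (possibly unnormalized) density m on [lo, hi].
    x = np.linspace(lo, hi, grid)
    dens = np.maximum(m(x), 0.0)
    cdf = np.concatenate(([0.0], np.cumsum((dens[1:] + dens[:-1]) / 2 * np.diff(x))))
    cdf /= cdf[-1]                                  # normalize: xi(X) = 1
    probs = (np.arange(1, n + 1) - 0.5) / n
    return np.interp(probs, cdf, x)                 # inverse CDF at the quantiles

# Example: a symmetric density pushing mass toward the endpoints.
print(np.round(implement_design(lambda x: 1 + 6 * x**2, n=10), 3))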
To describe the maximized loss functions max_{ψ∈Ψ} L(ξ|ψ), let m(x) be the density of ξ and define matrices

H_ξ = M_ξ A⁻¹ M_ξ,   K_ξ = ∫_X f(x)f′(x) m²(x) dx,

and

G_ξ = K_ξ − H_ξ = ∫_X [(m(x)I_p − M_ξA⁻¹)f(x)][(m(x)I_p − M_ξA⁻¹)f(x)]′ dx.
Wiens (1992), with further details in Heo, Schmuland, & Wiens (2001), shows that the maximum losses are attained in a certain "least favourable" class {ψ_β : ‖β‖ = 1} in which b_{ψ,ξ} = b_{ψ_β,ξ} = (τ/√n) G_ξ^{1/2} β. The maximization of the losses in (4) now requires only that they be evaluated at b_{ψ_β,ξ}, and that the resulting quadratic forms in β then be maximized over the unit sphere. Define

ν = τ²/(σ_ε² + τ²) ∈ [0, 1],

representing the relative importance, to the experimenter, of errors due to bias rather than to variance. Then the maximum values of L_A, L_D, L_E, and L_I, respectively, are (σ_ε² + τ²)/n times
l_A(ξ) = (1 − ν) tr M_ξ⁻¹ + ν ch_max M_ξ⁻¹ G_ξ M_ξ⁻¹,   (7a)

l_D(ξ) = (1 − ν) [(1 + (ν/(1 − ν)) ch_max M_ξ⁻¹ G_ξ) / det M_ξ]^{1/p},   (7b)

l_E(ξ) = ch_max [(1 − ν) M_ξ⁻¹ + ν M_ξ⁻¹ G_ξ M_ξ⁻¹],   (7c)

l_I(ξ) = (1 − ν) tr A M_ξ⁻¹ + ν ch_max K_ξ H_ξ⁻¹.   (7d)
These maxima are presented in complete generality—they hold for any independent variables
x and any vector f (x) of regressors. In contrast the continuation of the minimax problem—
minimizing l(ξ)—is highly dependent on the form of the model being fitted. Except in some
simple cases it leads to substantial numerical work.
In the following examples we use the definitions

μ_j = ∫_X x^j m(x) dx,   κ_j = ∫_X x^j m²(x) dx.
Example 1. We illustrate some of the issues in the case of straight line regression (f(x) = (1, x)′) over an interval X = [−1/2, 1/2], under the greatly simplifying but realistic requirement that the design be symmetric. Then A, M_ξ, K_ξ, and H_ξ are diagonal matrices, with diagonals (1, 1/12), (1, μ₂), (κ₀, κ₂), and (1, 12μ₂²), respectively. We find that

l_A(ξ) = (1 − ν)(1 + 1/μ₂) + ν max{κ₀ − 1, κ₂/μ₂² − 12},   (8a)

l_D(ξ) = (1 − ν) [(1 + (ν/(1 − ν)) max{κ₀ − 1, κ₂/μ₂ − 12μ₂}) / μ₂]^{1/2},   (8b)

l_E(ξ) = max{(1 − ν) + ν(κ₀ − 1), (1 − ν)/μ₂ + ν(κ₂/μ₂² − 12)},   (8c)

l_I(ξ) = (1 − ν)(1 + 1/(12μ₂)) + ν max{κ₀, κ₂/(12μ₂²)}.   (8d)
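As a check on these formulas, a short numerical sketch (ours; it assumes 0 < ν < 1, and trapezoidal quadrature stands in for the Simpson's rule used in Section 4) that evaluates μ₂, κ₀, κ₂ for a given symmetric density and returns the four maximal losses:

import numpy as np

def integ(y, x):
    # trapezoidal rule
    return float(np.sum((y[1:] + y[:-1]) / 2 * np.diff(x)))

def max_losses(m, nu, grid=10_001):
    # Evaluate (8a)-(8d) for a nonnegative function m on [-1/2, 1/2];
    # m is renormalized here so that it is a proper density.
    x = np.linspace(-0.5, 0.5, grid)
    w = np.maximum(m(x), 0.0)
    w = w / integ(w, x)
    mu2 = integ(x**2 * w, x)
    k0, k2 = integ(w**2, x), integ(x**2 * w**2, x)
    e1, e2 = k0 - 1.0, k2 / mu2**2 - 12.0      # the two candidate maxima
    lA = (1 - nu) * (1 + 1 / mu2) + nu * max(e1, e2)
    lD = (1 - nu) * np.sqrt((1 + nu / (1 - nu) * max(e1, k2 / mu2 - 12 * mu2)) / mu2)
    lE = max((1 - nu) + nu * e1, (1 - nu) / mu2 + nu * e2)
    lI = (1 - nu) * (1 + 1 / (12 * mu2)) + nu * max(k0, k2 / (12 * mu2**2))
    return lA, lD, lE, lI

# Uniform design: e1 = e2 = 0, so each loss reduces to its variance term.
print(max_losses(lambda x: np.ones_like(x), nu=0.5))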
Note that in each of these cases the maximized loss is of the form

l(ξ) = max{l₁(ξ), l₂(ξ)}.   (9)
The “pure” strategy referred to in Section 1 would have the designer choose one of l1 (ξ), l2 (ξ)—for
definiteness, suppose he chooses l1 (ξ)—and find a minimizing design ξ1 . If it can then be verified
that

l₁(ξ₁) ≥ l₂(ξ₁),   (10)
then the minimax design has been found: for any other design ξ we have
l(ξ) = max{l1 (ξ), l2 (ξ)} ≥ l1 (ξ) ≥ l1 (ξ1 ) = max{l1 (ξ1 ), l2 (ξ1 )} = l(ξ1 ).
If (10) fails, then the designer instead finds ξ2 minimizing l2 (ξ) and hopes to verify that
l₂(ξ₂) ≥ l₁(ξ₂).   (11)
For lI , Huber (1975) carried out this process and verified that ξ1 is the minimax design. This was
extended to lD by Wiens (1992), who also found that for lA and lE the strategy succeeds only
for ν ≤ νA = 0.692 and ν ≤ νE = 0.997, respectively. For these values of ν the designs ξ2 are
minimax; for larger values both inequalities (10) and (11) fail.
Those A- and E-minimax designs that have been obtained from the pure strategy will be
illustrated in Section 4, where they will be exhibited as special instances of the more general
strategy to be introduced in the next section. In the examples of Section 4 we also present the
minimax designs for the remaining values of ν and discuss the designs for more general design
spaces X = [−T, T ].
Example 2. For quadratic regression (f(x) = (1, x, x²)′) over X = [−1/2, 1/2], again assuming symmetry, the pure strategy fails drastically. For instance in the case of robust I-optimality, we have

A = ( 1  0  1/12 ; 0  1/12  0 ; 1/12  0  1/80 ),   M_ξ = ( 1  0  μ₂ ; 0  μ₂  0 ; μ₂  0  μ₄ ),   K_ξ = ( κ₀  0  κ₂ ; 0  κ₂  0 ; κ₂  0  κ₄ )

(rows separated by semicolons), whence

tr A M_ξ⁻¹ = 1 + 1/(12μ₂) + (240μ₂² − 40μ₂ + 3)/(240(μ₄ − μ₂²)) =: 1 + θ₀(ξ).
We calculate that

K_ξ H_ξ⁻¹ = ( θ₁κ₀ + θ₂κ₂   0   θ₂κ₀ + θ₃κ₂ ; 0   θ₄κ₂   0 ; θ₁κ₂ + θ₂κ₄   0   θ₂κ₂ + θ₃κ₄ ),

with θ_j = θ_j(ξ), j = 1, 2, 3, 4, given by

θ₁ = (240μ₄² − 40μ₂μ₄ + 3μ₂²)/(240(μ₄ − μ₂²)²),   θ₂ = (20μ₂² + 20μ₄ − 240μ₂μ₄ − 3μ₂)/(240(μ₄ − μ₂²)²),

θ₃ = (240μ₂² − 40μ₂ + 3)/(240(μ₄ − μ₂²)²),   θ₄ = 1/(12μ₂²).

The characteristic roots of K_ξ H_ξ⁻¹ are ρ₁(ξ) = θ₄κ₂ and the two characteristic roots of

( θ₁κ₀ + θ₂κ₂   θ₂κ₀ + θ₃κ₂ ; θ₁κ₂ + θ₂κ₄   θ₂κ₂ + θ₃κ₄ ).
Of these two roots, one is uniformly larger than the other, and is

ρ₂(ξ) = θ₂κ₂ + (θ₁κ₀ + θ₃κ₄)/2 + {[(θ₁κ₀ − θ₃κ₄)/2]² + (θ₂κ₀ + θ₃κ₂)(θ₁κ₂ + θ₂κ₄)}^{1/2}.
(The third root has a minus sign in front of the radical.) Thus the loss is again of the form (9),
with
l1 (ξ) = (1 − ν)(1 + θ0 (ξ)) + νρ1 (ξ),
l2 (ξ) = (1 − ν)(1 + θ0 (ξ)) + νρ2 (ξ).
Further investigation reveals that the pure strategy fails for all ν ∈ (0, 1). Some details are in
Heo (1998). We return to this example in Section 3.3.
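A numerical cross-check of ρ₁ and ρ₂ that bypasses the θ-algebra entirely (a sketch of ours, not the authors' code): form A, M_ξ, K_ξ and H_ξ = M_ξ A⁻¹ M_ξ by quadrature, and read off the eigenvalues of K_ξ H_ξ⁻¹ directly.

import numpy as np

def eigs_KHinv(m, grid=10_001):
    # Quadratic regression, f(x) = (1, x, x^2)', on [-1/2, 1/2].
    x = np.linspace(-0.5, 0.5, grid)
    w = np.maximum(m(x), 0.0)
    F = np.stack([np.ones_like(x), x, x**2])           # 3 x grid
    def integ(y):                                      # trapezoidal rule
        return np.sum((y[..., 1:] + y[..., :-1]) / 2 * np.diff(x), axis=-1)
    w = w / integ(w)                                   # normalize the density
    outer = F[:, None, :] * F[None, :, :]              # f(x) f(x)' at each x
    A, M, K = integ(outer), integ(outer * w), integ(outer * w**2)
    H = M @ np.linalg.solve(A, M)                      # H = M A^{-1} M
    return np.linalg.eigvals(K @ np.linalg.inv(H))

# Uniform density: M = K = H = A, so all three eigenvalues equal 1.
print(np.sort(eigs_KHinv(lambda x: np.ones_like(x)).real))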
3. A GENERAL CONSTRUCTION STRATEGY
In Examples 1 and 2 of Section 2, in the cases in which minimizing designs have been obtained,
they were found by introducing Lagrange multipliers to handle the various constraints, and then
using variational methods to solve the ensuing unconstrained problems. We will continue this
approach, but apply it in the context of the following theorem, whose proof is in the Appendix.
Theorem 1. Given loss functions {l_j(ξ)}_{j=1}^J depending on designs ξ, a minimax design ξ*, minimizing the loss

l(ξ) = max_{1≤j≤J} {l_j(ξ)},

can be obtained as follows. Partition the class Ξ of designs as Ξ = ∪_{k=1}^J Ξ_k, where

Ξ_k = {ξ : l_k(ξ) = max_{1≤j≤J} l_j(ξ)}.

For each j ∈ {1, ..., J} define ξ_j to be the minimizer of l_j(ξ) in Ξ_j. Then ξ* is the design ξ_{j*} for which j* = arg min_{1≤j≤J} {l_j(ξ_j)}.
While it would be possible to write down general expressions for the form of ξ∗ resulting from
Theorem 1, it is simpler to take advantage of the structure of particular cases.
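In computational terms, and over a parametric family of designs, Theorem 1 amounts to J constrained minimizations followed by a comparison. A sketch of this strategy—our own generic helper, with the design family and the losses l_j(·) supplied by the user as functions of the design parameters:

import numpy as np
from scipy.optimize import minimize

def theorem1_minimax(loss_fns, p0):
    # For each j, minimize l_j over the region where it is the largest loss
    # (constraints l_j - l_k >= 0 for all k), then keep the best candidate.
    best = None
    for j, lj in enumerate(loss_fns):
        cons = [{"type": "ineq", "fun": lambda p, k=k, lj=lj: lj(p) - loss_fns[k](p)}
                for k in range(len(loss_fns)) if k != j]
        res = minimize(lj, p0, method="SLSQP", constraints=cons)
        if res.success and (best is None or res.fun < best[0]):
            best = (res.fun, res.x, j)
    return best  # (minimax loss, design parameters, active branch j*)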
3.1. Straight Line Regression; A-optimality
As in Example 1 of Section 2, we have

l₁(ξ) = (1 − ν)(1 + 1/μ₂) + ν(κ₀ − 1),   (12a)

l₂(ξ) = (1 − ν)(1 + 1/μ₂) + ν(κ₂/μ₂² − 12).   (12b)
The eigenvalues of M_ξ⁻¹ G_ξ M_ξ⁻¹ appearing in (7a), (8a), and (12) are

κ₀ − 1 = ∫_{−1/2}^{1/2} (m(x) − 1)² dx,   (13a)

κ₂/μ₂² − 12 = ∫_{−1/2}^{1/2} x² (m(x)/μ₂ − 12)² dx.   (13b)
It is simplest to first optimize subject to μ₂ being fixed. Thus we first find ξ₁, with density

m₁ = arg min ∫_{−1/2}^{1/2} (m(x) − 1)² dx,

determined subject to

∫_{−1/2}^{1/2} m(x) dx = 1,   ∫_{−1/2}^{1/2} x² m(x) dx = μ₂,

∫_{−1/2}^{1/2} [(m(x) − 1)² − x²(m(x)/μ₂ − 12)²] dx − δ² = 0,

where δ is a slack variable. By the theory of Lagrange multipliers (Pierre, 1986, chapter 2 for instance), we may instead consider the unconstrained problem of minimizing

ℓ(m; λ₁, λ₂, λ₃) = ∫_{−1/2}^{1/2} [(m(x) − 1)² − 2λ₁ m(x) − 2λ₂ x² m(x) + λ₃ {(m(x) − 1)² − x²(m(x)/μ₂ − 12)²}] dx,

with the multipliers determined by the side conditions (and (μ₂, δ) chosen to minimize the resulting loss).
It is sufficient to minimize the integrand of ℓ pointwise over m(x) ≥ 0 (the requirement of symmetry turns out to be satisfied unconditionally); this yields the minimizing density

m₁(x) = ((a₁x² + b₁)/(c₁x² + d₁))⁺   (14)

(the positive part), where

a₁ = λ₂ − 12λ₃/μ₂,   b₁ = 1 + λ₁ + λ₃,   c₁ = −λ₃/μ₂²,   d₁ = 1 + λ₃.
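(To spell out the pointwise minimization: setting the derivative of the integrand with respect to m(x) to zero gives, at each x,

(1 + λ₃)(m(x) − 1) − λ₁ − λ₂x² − (λ₃x²/μ₂)(m(x)/μ₂ − 12) = 0;

solving for m(x), and truncating at zero to respect m(x) ≥ 0, yields exactly (14) with the constants displayed above.)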
The significance of the current approach is that we now know that m₁(x) is of the form (14). Note that this density is overparameterized—if a₁ ≠ 0 we can divide all constants by it, thus obtaining the density

m(x; b, c, d) = ((x² + b)/(cx² + d))⁺;   (15)

if a₁ = 0 then (14) is a limiting version of (15). The densities minimizing l₁(ξ) and l₂(ξ) unconditionally are of the form (15), with c = 0 and d = 0, respectively.
A parallel development, minimizing l2 (ξ), shows that the minimizing density m2 (x) is also of
the form (15). Thus it is sufficient to restrict attention to the class of designs ξ with densities
of this form.
For the numerical work, it is now more convenient to proceed directly, and employ a numerical minimizer to find constants b*, c*, d* for which the design ξ*, with density m(x; b*, c*, d*), minimizes l(ξ) in Ξ. The computational details, and the end results of this process, are discussed in Section 4.
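A minimal sketch of this direct minimization for the A-criterion—our own reimplementation, not the authors' MATLAB code; it assumes a positive denominator in (15), and handles the integrate-to-one constraint by renormalizing inside the objective, as described in Section 4:

import numpy as np
from scipy.optimize import minimize

x = np.linspace(-0.5, 0.5, 10_001)

def integ(y):
    # trapezoidal rule on the fixed grid
    return float(np.sum((y[1:] + y[:-1]) / 2 * np.diff(x)))

def lossA(params, nu):
    # max{l1, l2} of (12) over the family (15), m = ((x^2 + b)/(c x^2 + d))^+.
    b, c, d = params
    den = c * x**2 + d
    if np.any(den <= 0):
        return np.inf                      # restrict to positive denominators
    m = np.maximum((x**2 + b) / den, 0.0)
    Z = integ(m)
    if not np.isfinite(Z) or Z <= 0:
        return np.inf
    m = m / Z                              # renormalize: a proper density
    mu2, k0, k2 = integ(x**2 * m), integ(m**2), integ(x**2 * m**2)
    base = (1 - nu) * (1 + 1 / mu2)
    return max(base + nu * (k0 - 1), base + nu * (k2 / mu2**2 - 12))

# Warm start near the classical design (mass at +-1/2; see Section 4), then refine.
res = minimize(lossA, x0=[-0.24, 1.0, 1e-3], args=(0.5,), method="Nelder-Mead")
print(res.x, res.fun)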
3.2. Straight Line Regression; E-optimality
The development for E-optimality is completely analogous to that for A-optimality, the only difference being that the losses (12) are to be replaced by

l₁(ξ) = 1 − ν + ν(κ₀ − 1),   (16a)

l₂(ξ) = (1 − ν)/μ₂ + ν(κ₂/μ₂² − 12).   (16b)
A consequence is that in the range ν ∈ [0, ν_A], for which the pure strategy succeeds and yields an A-optimal design ξ* minimizing l₂(ξ) and satisfying l₂(ξ*) > l₁(ξ*), this design is also E-optimal. To see this denote the losses (12) by l_{1,A}, l_{2,A} and the losses (16) by l_{1,E}, l_{2,E}. Since l_{2,E}(ξ) = l_{2,A}(ξ) − (1 − ν), ξ* also minimizes l_{2,E}(ξ). Since l_{2,A}(ξ*) > l_{1,A}(ξ*) we have that

κ₂(ξ*)/μ₂²(ξ*) − 12 > κ₀(ξ*) − 1.   (17)

But for any design on X = [−1/2, 1/2] we have μ₂(ξ) ≤ 1/4 < 1, so that

(1 − ν)/μ₂(ξ) > 1 − ν.   (18)

Combining (17) and (18) we have that l_{2,E}(ξ*) > l_{1,E}(ξ*), and so the pure strategy also yields the E-optimality of ξ*.
3.3. Quadratic Regression; I-optimality
Recall Example 2 of Section 2. We carry out the minimization of l1 (ξ) subject to l1 (ξ) ≥ l2 (ξ),
and of l2 (ξ) subject to l2 (ξ) ≥ l1 (ξ), initially for fixed values of μ2 (ξ) and μ4 (ξ). This fixes all θj ,
j = 0, . . . , 4, and so converts the inequalities between l1 and l2 into inequalities between ρ1 and
ρ2 . Thus for the first of these minimizations we solve
m₁ = arg min ∫_{−1/2}^{1/2} x² m²(x) dx,   (19a)

subject to

∫_{−1/2}^{1/2} m(x) dx = 1,   (19b)

∫_{−1/2}^{1/2} x² m(x) dx = μ₂,   (19c)

∫_{−1/2}^{1/2} x⁴ m(x) dx = μ₄,   (19d)

ρ₁(ξ) − ρ₂(ξ) − δ² = 0   (19e)

for a slack variable δ.
For any design ξ with density m, define

ξ_(t) = (1 − t)ξ₁ + tξ,   0 ≤ t ≤ 1,

with density m_(t)(x) = (1 − t)m₁(x) + t m(x). To solve (19) it is sufficient to find ξ₁ for which

ℓ(t; λ₁, λ₂, λ₃, λ₄) = ∫_{−1/2}^{1/2} [x² m_(t)²(x) − 2λ₁ m_(t)(x) − 2λ₂ x² m_(t)(x) − 2λ₃ x⁴ m_(t)(x)] dx − λ₄ {ρ₁(ξ_(t)) − ρ₂(ξ_(t))}

is minimized at t = 0 for each absolutely continuous ξ, and satisfies the side conditions. The first order condition is

0 ≤ ℓ′(0; λ₁, λ₂, λ₃, λ₄) = 2 ∫_{−1/2}^{1/2} [{d₁ + f₁x² + e₁x⁴} m₁(x) − {a₁ + b₁x² + c₁x⁴}] (m(x) − m₁(x)) dx
for all m(·), where in terms of

K₁ = (θ₁κ₀ − θ₃κ₄)/2,   K₂ = θ₂κ₀ + θ₃κ₂,   K₃ = θ₁κ₂ + θ₂κ₄,   K = K₁² + K₂K₃,

all evaluated at ξ₁, we have defined a₁ = λ₁, b₁ = λ₂, c₁ = λ₃ and

d₁ = λ₄ (θ₁/2)(1 + K₁/√K),

f₁ = 1 − λ₄ {θ₄ − θ₂ − (K₂θ₁ + K₃θ₂)/(2√K)},

e₁ = λ₄ ((K₃ − K₁)θ₃ + K₂θ₂)/(2√K).
It follows that m₁(x) is necessarily of the form

m₁(x) = ((a₁ + b₁x² + c₁x⁴)/(d₁ + f₁x² + e₁x⁴))⁺.
A similar calculation shows that m₂(x) is also of this form. Arguing as in Section 3.1 we consider densities of the form

m(x; a, b, c, d, e) = ((a + bx² + cx⁴)/(d + x² + ex⁴))⁺,   (20)

and will, in Section 4, find constants a*, b*, c*, d*, e* for which the design ξ*, with density m(x; a*, b*, c*, d*, e*), minimizes the loss (9) in the class of designs with densities of the form (20). That the density should be of this form was also noted by Shi, Ye, & Zhou (2003) using
methods of nonsmooth analysis, and in particular the Lagrange multiplier rule for nonsmooth optimization (Clarke, 1983).

Figure 1: A-minimax designs ξ* for approximate straight line regression. (a) Eigenvalues (λ₁(ξ*), λ₂(ξ*)) versus ν; (b) loss l(ξ*) versus ν; (c)–(f) minimax densities m*(x) and 10-point implementations for ν = 0.25, 0.5, 0.75, 0.998. [Figure not reproduced.]
4. EXAMPLES, IMPLEMENTATIONS, DISCUSSION
The constants determining the minimax designs of Sections 3.1–3.3 have been obtained by employing an unconstrained nonlinear optimizer in MATLAB. The code is available from the authors.
There is of course a constraint involved—the density must integrate to 1—which is handled as
follows. Within each loop, whenever the constants a, b, . . . in (15) or (20) are returned we integrate the resulting function, and then divide all coefficients in its numerator by the value of this
integral, so that a proper density is then passed on to the next iteration. All integrations were
carried out by Simpson’s rule on a 10,000-point grid. We have generally found Simpson’s rule
to be as accurate as, and much quicker than, more sophisticated routines—at least when applied
to quite smooth and cheaply evaluated functions such as are encountered here. See Daemi (2012)
for more details of the computing algorithm.
Example 1 (continued). To obtain the minimax designs under the A-criterion we carried out
the process described in Section 3.1, obtaining results as illustrated in Figure 1. Plot (a) of this
figure shows the two eigenvalues (λ1 (ξ∗ ), λ2 (ξ∗ ))—these were defined for general ξ in (13a)
and (13b), respectively—as ν varies. As could perhaps have been anticipated, they are equal for
ν > ν_A = 0.692. The loss l(ξ*) is shown in plot (b). At ν = 0 this is that of the classically A-optimal design, with symmetric point masses at x = ±1/2; at ν = 1 it is determined entirely by the bias of the continuous uniform measure, which vanishes by virtue of (3). In plots (c)–(f) the minimax densities m*(x) = m(x; b*, c*, d*) are illustrated for selected values of ν, along with 10-point implementations as at (6).

Figure 2: E-minimax designs ξ* for approximate straight line regression. (a) Losses (l₁(ξ*), l₂(ξ*)) versus ν; (b) loss l(ξ*) versus ν; (c)–(f) minimax densities m*(x) and 10-point implementations for ν = 0.25, 0.5, 0.75, 0.998. [Figure not reproduced.]

See Figure 2 for analogous plots of the minimax designs under the E-criterion, in which the losses (l₁(ξ*), l₂(ξ*)) are equal for ν > ν_E = 0.997. We note that here and in the other optimizations carried out for this article the surface over which one seeks a minimum is quite flat—see
Figure 3 for an illustration—and so it is important to choose starting values carefully and to
then increment ν slowly.

Figure 3: Loss of the E-minimax design ξ* in a neighbourhood of two of (b*, c*, d*), with the third held fixed; ν = 0.5. The panels are centred at b* = −0.030099, c* = 0.42644, and d* ≈ 8.12 × 10⁻¹⁴. [Figure not reproduced.]

The choice of starting values can be facilitated by a knowledge of the limiting behaviour of the solutions. As ν → 0 the solution tends to the classically optimal design minimizing variance alone; in the limit as ν → 1 the optimal density is uniform. Thus, in Example 1 we started near ν = 0 with b = −0.24, d = 0, and c the normalizing constant, so that m(x) = (1 − 0.24/x²)⁺/c, approximating point masses at ±0.5.
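Read computationally, this is a continuation (homotopy) scheme; a brief sketch reusing lossA and minimize from the Section 3.1 sketch (again our own code, with an assumed step size of 0.01):

params = [-0.24, 1.0, 1e-3]                     # near the classical design
path = []
for nu in np.arange(0.01, 1.0, 0.01):           # increment nu slowly
    params = minimize(lossA, params, args=(nu,), method="Nelder-Mead").x
    path.append((nu, lossA(params, nu)))        # warm-start the next step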
Remarks: Both in classical design theory concentrating solely on the variances of the estimates
or predictions, and in our extension of this theory to consideration of bias, the designs constructed
according to the D- and I-criteria are invariant under transformation of the design space, whereas
those under the A- and E-criteria are not. While we have presented our results here for the space
X = [−1/2, 1/2] only, the methods extend, with minor modifications (resulting entirely from the
change in the elements of A), to more general symmetric intervals. Here we discuss the necessary
changes, and the resulting qualitative behaviour of the designs. For the space X = [−T, T], (8a) and (8c) become

l_A(ξ) = (1 − ν)(1 + 1/μ₂) + ν max{κ₀ − 1/(2T), κ₂/μ₂² − 3/(2T³)},

l_E(ξ) = max{(1 − ν) + ν(κ₀ − 1/(2T)), (1 − ν)/μ₂ + ν(κ₂/μ₂² − 3/(2T³))},
respectively, and (13a) and (13b) become

κ₀ − 1/(2T) = ∫_{−T}^{T} (m(x) − 1/(2T))² dx,

κ₂/μ₂² − 3/(2T³) = ∫_{−T}^{T} x² (m(x)/μ₂ − 3/(2T³))² dx.
For the robust A-optimal designs, as T decreases the interval on which λ2 (ξ∗ ) > λ1 (ξ∗ )
expands to encompass all ν ∈ [0, 1]. As T increases that interval shrinks, and eventually
λ1 (ξ∗ ) > λ2 (ξ∗ ) for all ν. In each such case the pure strategy would succeed; we compute that
the corresponding ranges are T < TL ≈ 0.002 and T > TU ≈ 1.52. The robust E-optimal designs
behave in the same way, with TL ≈ 0.460 and TU ≈ 1.74—note however that the inequality (18)
and its consequences need no longer hold if T > 1.
See Table 1, where we have detailed some 10-point implementations. Those for T = 0.5 are as shown in Figures 1 and 2, plots (c) and (d). For ν near 1 all of the designs are essentially uniform—as is the E-optimal design shown in Table 1 for ν = 0.5 and T = 5. From these figures it appears that the relationship between the designs and the scaling of the design space is rather subtle, and that an experimenter who implemented the designs by merely applying the same scaling to the design points as to the design space could be seriously misled.

Table 1: Ten-point implementations of robust A- and E-optimal designs for selected values of ν and T. Each design consists of the five symmetric pairs listed in its column.

                          ν = 0.25                        ν = 0.5
             T = 0.05    T = 0.5    T = 5     T = 0.05    T = 0.5    T = 5
A-criterion  ±0.0467     ±0.483     ±4.79     ±0.0461     ±0.476     ±4.70
             ±0.0401     ±0.448     ±4.30     ±0.0383     ±0.426     ±4.01
             ±0.0333     ±0.410     ±3.71     ±0.0304     ±0.373     ±3.18
             ±0.0262     ±0.367     ±2.89     ±0.0223     ±0.316     ±2.13
             ±0.0180     ±0.313     ±1.47     ±0.0135     ±0.246     ±0.772
E-criterion  ±0.0467     ±0.483     ±4.5      ±0.0461     ±0.476     ±4.5
             ±0.0401     ±0.448     ±3.5      ±0.0383     ±0.426     ±3.5
             ±0.0333     ±0.410     ±2.5      ±0.0304     ±0.373     ±2.5
             ±0.0262     ±0.367     ±1.5      ±0.0223     ±0.316     ±1.5
             ±0.0180     ±0.313     ±0.5      ±0.0135     ±0.246     ±0.5
Example 2 (continued). In Figure 4, we present representative results for the robust I-minimax designs for approximate quadratic regression. Recall that for this case the pure strategy fails for all ν ∈ (0, 1); that this should be so is now clear from plot (a), which reveals that the two eigenvalues (ρ₁(ξ*), ρ₂(ξ*)) are in fact equal. The loss l(ξ*) in plot (b) varies from that of the classically I-optimal design when ν = 0 to that of the uniform design when ν = 1; in the latter case A = M_ξ* = H_ξ* = K_ξ*, so that both A M_ξ*⁻¹ and K_ξ* H_ξ*⁻¹ in (7d) are 3 × 3 identity matrices and l(ξ*) = 1.

Figure 4: I-minimax designs ξ* for approximate quadratic regression. (a) Eigenvalues (ρ₁(ξ*), ρ₂(ξ*)) versus ν; (b) loss l(ξ*) versus ν; (c)–(f) minimax densities m*(x) and 10-point implementations for ν = 0.01, 0.09, 0.5, 0.91. [Figure not reproduced.]
In Table 2, we give the values of the constants a, b, c, d, e defining the densities m∗ (x), for the
four values of ν in plots (c), (d), (e), and (f) of Figure 4. These values of ν were also used by Shi, Ye,
& Zhou (2003), and our plots correspond to their Figure 2, plots (d), (c), (b), and (a), respectively.
They used the parameterization m(x) = {(a + bx² + cx⁴)/(1 + dx² + ex⁴)}⁺; we found (20) to be more stable numerically. We give their constants, translated to the parameterization (20), as well as our own in Table 2. It is presumably a result of the extreme flatness of the surface being searched that the densities are so similar—and result in almost the same values of the loss—despite the quite startling differences in the values of the constants.
Table 2: Constants defining the robust I-minimax densities (20) for selected values of ν. Corresponding values of Shi, Ye, & Zhou (2003) in parentheses.

ν     a                b                    c                  d               e                l(ξ*)
0.01  0.056 (0.721)    −72.120 (−180.946)   341.400 (836.120)  0.004 (0.124)   0.209 (3.447)    2.222 (2.225)
0.09  0.046 (0.278)    −7.545 (−12.627)     51.000 (75.438)    0.009 (0.118)   0.601 (0.176)    2.294 (2.306)
0.5   −0.054 (−0.177)  0.012 (0.022)        8.792 (9.621)      0.007 (0.015)   0.503 (0.582)    1.880 (1.881)
0.91  0.074 (5.879)    0.872 (6.005)        112.500 (5.971)    0.082 (−6.472)  9.115 (−5.187)   1.174 (1.176)
0.99  0.191 (28.302)   0.962 (5.784)        44.590 (28.383)    0.194 (28.662)  43.380 (31.072)  1.020 (1.020)
The implementations shown in plots (c)–(f) illustrate an enduring theme in robustness of design: a rough guide in constructing such a design is to take the point masses of the classically optimal design—which result in many replicates at only a small number of design points—and spread these replicates out into clusters of distinct points in approximately the same locations as the classically optimal design points.
APPENDIX

Proof of Theorem 1. Let ξ* = ξ_{j*} satisfy the conditions of the theorem and let ξ be any other design. Suppose that ξ ∈ Ξ_j. Then by the definition of Ξ_j, followed by the definition of ξ_j as the minimizer of l_j in this class, we have

max_{1≤k≤J} {l_k(ξ)} = l_j(ξ) ≥ l_j(ξ_j).   (A.1)

This continues as

l_j(ξ_j) ≥ min_{1≤k≤J} {l_k(ξ_k)} = l_{j*}(ξ_{j*}),   (A.2)

by the definition of j*, and then, since ξ_{j*} ∈ Ξ_{j*}, as

l_{j*}(ξ_{j*}) = max_{1≤k≤J} {l_k(ξ_{j*})}.   (A.3)

Linking (A.1)–(A.3) completes the proof that max_{1≤k≤J} {l_k(ξ)} ≥ max_{1≤k≤J} {l_k(ξ_{j*})}, that is, that l(ξ) ≥ l(ξ*) for any ξ ∈ Ξ. ∎
ACKNOWLEDGEMENTS
This work has been supported by the Natural Sciences and Engineering Research Council of Canada. The
presentation has benefited greatly from the incisive comments of two anonymous referees.
BIBLIOGRAPHY
Box, G. E. P. & Draper, N. R. (1959). A basis for the selection of a response surface design. Journal of the
American Statistical Association, 54, 622–654.
Clarke, F. H. (1983). Optimization and Nonsmooth Analysis, Wiley, New York.
Daemi, M. (2012). Minimax design for approximate straight line regression. M.Sc. thesis, University of
Alberta, Department of Mathematical and Statistical Sciences.
Fang, K. T. & Wang, Y. (1994). Number-Theoretic Methods in Statistics, Chapman and Hall, London.
Heo, G. (1998). Optimal designs for approximately polynomial regression models. Ph.D. thesis, University
of Alberta, Department of Mathematical and Statistical Sciences.
Heo, G., Schmuland, B., & Wiens, D. P. (2001). Restricted minimax robust designs for misspecified regression
models. The Canadian Journal of Statistics, 29, 117–128.
Huber, P. J. (1975). Robustness and designs. In A Survey of Statistical Design and Linear Models, Srivastava, J. N., editor. North Holland, Amsterdam, pp. 287–303.
Li, K. C. & Notz, W. (1982). Robust designs for nearly linear regression. Journal of Statistical Planning and
Inference, 6, 135–151.
Marcus, M. B. & Sacks, J. (1976). Robust designs for regression problems. In Statistical Theory and Related Topics II, Gupta, S. S. & Moore, D. S., editors. Academic Press, New York, pp. 245–268.
Pesotchinsky, L. (1982). Optimal robust designs: Linear regression in R^k. The Annals of Statistics, 10, 511–525.
Pierre, D. A. (1986). Optimization Theory with Applications, Dover, New York.
Shi, P., Ye, J. & Zhou, J. (2003). Minimax robust designs for misspecified regression models. The Canadian
Journal of Statistics, 31, 397–414.
Wiens, D. P. (1992). Minimax designs for approximately linear regression. Journal of Statistical Planning
and Inference, 31, 353–371.
Received 24 November 2012
Accepted 31 May 2013