Optimal Design Theory for Linear Models, Part II

Design as a Probability Measure
As discussed above, the construction of optimal designs is generally viewed as a scalar-valued
optimization problem to be solved either analytically or numerically. But finding
a design U that maximizes or minimizes φ(M(U)) can be very difficult, especially when
N × r (the usual number of arguments over which the optimization must be performed)
is large. This is made even more difficult in some cases by the inherently discrete nature of
the optimization (i.e. where replacing one design point with another is not a “continuous”
change in the design), restricting the applicability of many standard analytical optimization
techniques.
In response to this, continuous design theory has been developed as an alternative framework for optimal design study and construction. Formally, this is done as follows:
• Require that X be a compact subset of Rk .
• Define H to be the class of probability distributions on Borel sets of X .
• Regard any η ∈ H as a design measure.
This changes the problem from one of finding an optimal set of N discrete points in U (and
therefore also in X ) to one of finding a “weight function” that integrates to one over X . As
a result, we now generalize our definition of the design moment matrix to be:

M(η) = Eη(xx′)

Note that this is actually consistent with our previous formulation, where each discrete value
of x in the design was given an equal probability, or “weight”, of 1/N.
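
As a small sanity check, the sketch below (assuming Python with NumPy; the 4-point design is a made-up illustration, not one from these notes) verifies that the usual (1/N)X′X equals the expectation of xx′ under the measure putting mass 1/N on each design point.

    import numpy as np

    # hypothetical 4-point SLR design, rows x' = (1, u)
    X = np.array([[1.0, -1.0],
                  [1.0, -0.5],
                  [1.0,  0.5],
                  [1.0,  1.0]])
    N = X.shape[0]

    M_exact = X.T @ X / N                                    # the usual (1/N) X'X
    M_measure = sum((1.0 / N) * np.outer(x, x) for x in X)   # E_eta(xx') with weights 1/N
    print(np.allclose(M_exact, M_measure))                   # True
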
The point of this generalization is, essentially, to “remove N from the problem”. As
a result, the optimal “designs” found often cannot really be implemented in practice, but
must be “rounded” to “exact” designs that weight each of a discrete set of points with
probabilities that are multiples of 1/N. For this reason, the generalization is sometimes called
approximate design theory. Despite this disadvantage, we will see that it allows us to bring
some powerful mathematical results to bear on the design problem that greatly improve our
ability to construct optimal or near-optimal designs.
A practical constraint that goes with adopting the design measure approach is the requirement that for any design criterion φ considered,
φ(c1M) = c2φ(M) for every psd M and every c1 > 0, where the positive constant c2 may depend on c1 but not on M.
This requirement essentially says that for any collection of design measures, the ranking
implied by φ would be the same as that for the same set of design measures redefined
so that they integrate to a constant other than one. Fortunately, all of the optimality
criterion functions discussed above (as well as most others that make any sense) satisfy this
requirement.
It is also useful at this point to simplify things by agreeing to define criterion functions φ
so that “large is good”. Hence for a given problem, we seek the design measure or measures
η∗ that maximize φ(M(η)) over H. The above implies that if:
• η∗ assigns positive probability to only a finite set of points, and
• the probability assigned to the ith of these points can be written as ri /N for finite
integers ri and N
then the “exact” N -point design with ri replications at the corresponding unique point must
be φ-optimal among all N -point “exact” designs. If η∗ does not have this character, then
“rounding” to a near-optimal “exact” design – which may also be optimal among exact
designs – is necessary.
Example: SLR again
Let H be the class of all probability distributions over U = [−1, +1]. Then for any design
measure η ∈ H,

M(η) = [ 1        Eη(u)  ]
       [ Eη(u)    Eη(u²) ]

|M(η)| = Eη(u²) − Eη(u)² = Varη(u)
Eη(u²) can be no greater than 1, and Eη(u)² can be no smaller than 0, so

η∗ :  u = +1  with probability 1/2
      u = −1  with probability 1/2
      all other u with probability 0

is the D-optimal design measure. Hence for any even N

U :  N/2 u's = +1
     N/2 u's = −1
is D-optimal among all discrete (exact) designs. For odd N , we need to round η∗ ; we might
guess that
U :  (N + 1)/2 u's = +1
     (N − 1)/2 u's = −1
is “D-good”, but this result doesn’t prove it. (Actually, we did prove this the hard way for
N = 3 earlier.)
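
A quick numerical illustration of this result (a sketch assuming Python with NumPy; the candidate values of a are arbitrary): for symmetric two-point measures with mass 1/2 at u = ±a, |M(η)| = Varη(u) = a², which is largest at a = 1 on U = [−1, +1].

    import numpy as np

    def M_two_point(a):
        # moment matrix for mass 1/2 at u = +a and 1/2 at u = -a, with x' = (1, u)
        xs = np.array([[1.0, a], [1.0, -a]])
        return 0.5 * np.outer(xs[0], xs[0]) + 0.5 * np.outer(xs[1], xs[1])

    for a in [0.25, 0.5, 0.75, 1.0]:
        print(a, np.linalg.det(M_two_point(a)))   # determinants a^2; maximized at a = 1
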
Some Assurances against what should be your Worst Fears
Question: Might it be possible that for some problem, the only optimal design measure η∗
is, say, the uniform distribution over all of a continuous space X , so that no exact design in
finite N approximates it well?
Answer: Let M be the class of all design moment matrices generated by H over X :
M(η) ∈ M   iff   η ∈ H.
M is a convex set, i.e.
• If M1 ∈ M and M2 ∈ M and 0 < t < 1
• then tM1 + (1 − t)M2 ∈ M
This can be proved immediately by simple construction; the “mixture M” is the moment matrix of
the design measure that is the corresponding weighted average of the measures η1 and η2 that
generate M1 and M2. This is what you need
to invoke a form of Caratheodory’s Theorem:
Any M ∈ M can be expressed as:

M = Σ_{i=1}^I λᵢ xᵢxᵢ′

for xᵢ ∈ X , λᵢ > 0, i = 1, 2, 3, ..., I, and Σ_{i=1}^I λᵢ = 1 (i.e. a distribution with I
points of support), and such an arrangement exists for which:
• I ≤ k(k + 1)/2 + 1 if M is in the interior of M
• I ≤ k(k + 1)/2 if M is on the boundary of M
So, while there may be design measures that cannot be well-approximated by exact designs,
for any value of φ (including the maximum one), there will always be at least one optimal
design measure on no more than the indicated finite number of support points.
Example: SLR again
Let η1 assign uniform probability across U = [−1, +1], or across X here since x′ = (1, u).
For this design measure

M(η1) = ∫_X xx′ η(x) dx = [ ∫_{−1}^{+1} (1/2) du      ∫_{−1}^{+1} (1/2) u du  ]   =   [ 1   0   ]
                          [ ∫_{−1}^{+1} (1/2) u du    ∫_{−1}^{+1} (1/2) u² du ]       [ 0   1/3 ]

Now also define

η2 :  u = +√(1/3)  with probability 1/2
      u = −√(1/3)  with probability 1/2
      all other u with probability 0
It is not difficult to show that M(η2) = M(η1). Caratheodory's result guarantees an equivalent
design measure exists on no more than 4 points; η2 meets the requirement with only two
support points.
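
The claim M(η2) = M(η1) is easy to confirm numerically (a sketch assuming Python with NumPy):

    import numpy as np

    a = np.sqrt(1.0 / 3.0)                                # support points of eta_2
    xs = np.array([[1.0, a], [1.0, -a]])
    M_eta2 = 0.5 * np.outer(xs[0], xs[0]) + 0.5 * np.outer(xs[1], xs[1])

    M_eta1 = np.array([[1.0, 0.0], [0.0, 1.0 / 3.0]])     # from the integrals above
    print(np.allclose(M_eta1, M_eta2))                    # True
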
Preparing for some Theory
A basic requirement that will be needed in order to develop a more powerful mathematical
theory of optimal design is that φ be a concave function of M, that is, for any M1 and
M2 ∈ M, and for any 0 < t < 1, we require that:
φ(tM1 + (1 − t)M2 ) ≥ tφ(M1 ) + (1 − t)φ(M2 )
Note that the “mixed” M on the left side is guaranteed to be an element of M by our
requirement that M be a convex set. The concavity requirement says that the mixture of
two designs is at least as good as the same mixture of φ values applied to the two designs
individually. Again, this is true of most commonly used optimality criteria. The most
important exception is that |M| is not concave in all cases, and we repair this by changing
the D-optimality criterion function to log|M|, which is concave.
Now, define for any φ and design moment matrices M1 and M2 from M, a quantity we
will call a Frechet difference:

Dε(M1, M2) = (1/ε)[φ{(1 − ε)M1 + εM2} − φ{M1}],    ε ∈ (0, 1]

In the following, we shall use the fact that requiring φ to be concave implies that Dε(M1, M2)
is a non-increasing function of ε ∈ (0, 1]:
[Figure: φ plotted along the segment from M1 toward M2; the chord from φ(M1) to φ((1 − ε)M1 + εM2) has slope Dε(M1, M2).]
This can be proven as follows. For 0 < δ ≤ ε ≤ 1, write δ = γε with γ ∈ (0, 1]. Then:

φ((1 − ε)M1 + εM2) ≥ (1 − ε)φ(M1) + εφ(M2)                                  (concave φ)

(1/ε)[φ((1 − ε)M1 + εM2) − φ(M1)] ≥ φ(M2) − φ(M1)                           (algebra)

Dε(M1, M2) ≥ D1(M1, M2)                                                     (defn of Dε)

Dε(M1, M2) = (1/ε) D1(M1, (1 − ε)M1 + εM2)                                  (defn of Dε)

Dγ(M1, (1 − ε)M1 + εM2) ≥ D1(M1, (1 − ε)M1 + εM2),   γ ∈ (0, 1]             (same argument)

(1/ε) Dγ(M1, (1 − ε)M1 + εM2) ≥ Dε(M1, M2)                                  (combining)

(1/δ)[φ((1 − δ)M1 + δM2) − φ(M1)] ≥ (1/ε)[φ((1 − ε)M1 + εM2) − φ(M1)]       (algebra, using δ = γε)

Dδ(M1, M2) ≥ Dε(M1, M2)
Introduction of the “Frechet difference” is really an intermediate step to introducing the
Frechet derivative of φ at M1 , in the direction of M2 . (It should not be surprising that
our desire to optimize a function would involve a derivative of some kind, especially after
we’ve gone to the trouble of generalizing the design problem to a continuum setting.) This
is one of a collection of so-called “directional derivatives”, and is the one most commonly
used in continuous design optimality arguments. (There is also a simpler Gateaux derivative
discussed by Silvey in Chapter 3; some results depending on F-derivatives can be made using
intermediate results based on G-derivatives.)
Def’n: The Frechet derivative of φ at M1, in the direction of M2, is

Fφ(M1, M2) = lim_{ε→0+} Dε(M1, M2).
Frechet Derivative: D-optimality
φ(M) = log|M|.

Fφ(M1, M2) = lim_{ε→0+} (1/ε)[φ{(1 − ε)M1 + εM2} − φ{M1}]

Writing [−] for the bracketed difference:

[−] = log|(1 − ε)M1 + εM2| − log|M1|
    = log|(1 − ε)M1 + εM2| + log|M1⁻¹|
    = log(|(1 − ε)M1 + εM2| |M1⁻¹|)
    = log|((1 − ε)M1 + εM2)M1⁻¹|        (product of determinants is determinant of product)
    = log|(1 − ε)I + εM2M1⁻¹|

Now think about the structure of this matrix. The ith diagonal element is 1 − ε + εdᵢ, where
dᵢ is the ith diagonal element of M2M1⁻¹. The off-diagonal elements are all of order ε.
So the O(1) term in the determinant is 1, and the O(ε) terms are −ε + εdᵢ, i = 1, ..., k. Therefore,

[−] = log(1 − kε + ε trace[M2M1⁻¹] + O(ε²))

Fφ(M1, M2) = lim_{ε→0+} (1/ε)[−] = −k + trace[M2M1⁻¹] = trace[M2M1⁻¹] − k
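
The limiting argument above can be spot-checked numerically (a sketch assuming Python with NumPy; the two positive definite matrices are arbitrary hypothetical choices): the difference quotient Dε(M1, M2) for φ = log|M| approaches trace[M2M1⁻¹] − k as ε → 0+.

    import numpy as np

    M1 = np.array([[2.0, 0.3], [0.3, 1.0]])      # arbitrary positive definite matrices
    M2 = np.array([[1.0, -0.2], [-0.2, 3.0]])
    k = M1.shape[0]

    def D_eps(eps):
        mix = (1 - eps) * M1 + eps * M2
        return (np.log(np.linalg.det(mix)) - np.log(np.linalg.det(M1))) / eps

    analytic = np.trace(M2 @ np.linalg.inv(M1)) - k
    for eps in [1e-1, 1e-2, 1e-3, 1e-4]:
        print(eps, D_eps(eps), analytic)          # quotient converges to the trace formula
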
Frechet Derivative: A-optimality
φ(M) = −trace[AM⁻¹]

Fφ(M1, M2) = lim_{ε→0+} (1/ε)[φ{(1 − ε)M1 + εM2} − φ{M1}]
           = lim_{ε→0+} (1/ε){−trace[A((1 − ε)M1 + εM2)⁻¹] + trace[AM1⁻¹]}
           = trace( A lim_{ε→0+} (1/ε)[M1⁻¹ − ((1 − ε)M1 + εM2)⁻¹] )

Write the last inverse as M1⁻¹ + R, and solve for R:

((1 − ε)M1 + εM2)(M1⁻¹ + R) = I
→ R = ε ((1 − ε)M1 + εM2)⁻¹ (I − M2M1⁻¹)

Fφ(M1, M2) = trace( A lim_{ε→0+} (1/ε){M1⁻¹ − M1⁻¹ − ε ((1 − ε)M1 + εM2)⁻¹ (I − M2M1⁻¹)} )
           = trace( A lim_{ε→0+} [−((1 − ε)M1 + εM2)⁻¹ (I − M2M1⁻¹)] )
           = −trace[AM1⁻¹] + trace[AM1⁻¹M2M1⁻¹]
           = trace[AM1⁻¹M2M1⁻¹] + φ(M1)
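
The same kind of spot-check works here (again a sketch assuming NumPy, with arbitrary hypothetical A, M1 and M2): the difference quotient for φ(M) = −trace[AM⁻¹] approaches trace[AM1⁻¹M2M1⁻¹] + φ(M1).

    import numpy as np

    A  = np.diag([1.0, 2.0])                      # arbitrary psd weight matrix
    M1 = np.array([[2.0, 0.3], [0.3, 1.0]])
    M2 = np.array([[1.0, -0.2], [-0.2, 3.0]])

    phi = lambda M: -np.trace(A @ np.linalg.inv(M))
    M1_inv = np.linalg.inv(M1)
    analytic = np.trace(A @ M1_inv @ M2 @ M1_inv) + phi(M1)

    for eps in [1e-1, 1e-2, 1e-3, 1e-4]:
        quotient = (phi((1 - eps) * M1 + eps * M2) - phi(M1)) / eps
        print(eps, quotient, analytic)            # quotient converges to the analytic value
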
Frechet Derivative: Exercise
Suppose we decide to make M⁻¹ as “small as possible” by minimizing the sum of squares
of its elements. What is Fφ for this criterion? (There is no obvious statistical meaning, but
it isn't entirely silly, and makes an interesting exercise. Hint: φ(M) = −trace(M⁻¹M⁻¹).
Review the form of the argument for A-optimality.)
Some Theory
The following results are numbered as in (and closely follow) Silvey’s presentation.
Theorem 3.6: For φ concave on M, η∗ is φ-optimal iff:
Fφ (M(η∗ ), M(η)) ≤ 0 for all η ∈ H
Proof of “if”:
• Fφ(M(η∗), M(η)) ≤ 0 (for all η ∈ H)
• (1/ε0)[φ{(1 − ε0)M(η∗) + ε0M(η)} − φ{M(η∗)}] ≤ 0 for any ε0 ∈ (0, 1]   (Dε is nonincreasing)
• first term ≥ (1 − ε0)φ{M(η∗)} + ε0φ{M(η)}   (concave function)
• (1/ε0)[ε0φ{M(η)} − ε0φ{M(η∗)}] ≤ 0   (substitution)
• η∗ is φ-optimal
Proof of “only if”:
• η∗ is φ-optimal
• φ{M((1 − ε)η∗ + εη)} ≤ φ{M(η∗)}   (any fixed ε)
• φ{(1 − ε)M(η∗) + εM(η)} ≤ φ{M(η∗)}   (same thing)
• φ{(1 − ε)M(η∗) + εM(η)} − φ{M(η∗)} ≤ 0
• divide by ε and take the limit: Fφ(M(η∗), M(η)) ≤ 0
Really, this says “the obvious”; you are on top of the hill if and only if every direction takes
you “down”. Note that this result is not an especially powerful tool in direct application,
because proving that η∗ is optimal requires that you compare it to every η ∈ H. We’ll have
a more powerful result in the next theorem, but first:
Lemma: Let η1, ..., ηs be any set of design measures from H, and λᵢ > 0, Σᵢ λᵢ = 1, i = 1, ..., s.
Let φ be finite at M(η). If φ is differentiable at M(η) then:

Fφ(M(η), Σ_{i=1}^s λᵢ M(ηᵢ)) = Σ_{i=1}^s λᵢ Fφ(M(η), M(ηᵢ))

i.e. the derivative in the direction of a weighted average is the weighted average of the
derivatives in each direction.
Theorem 3.7: For φ concave on M and differentiable at η∗ , η∗ is φ-optimal iff:
Fφ(M(η∗), xx′) ≤ 0 for all x ∈ X
Proof of “if”:
• Fφ(M(η∗), xx′) ≤ 0
• Any M(η) = Σ λᵢ xᵢxᵢ′ (and the sum has a finite number of terms, Caratheodory)
• Fφ(M(η∗), M(η)) = Σ λᵢ Fφ(M(η∗), xᵢxᵢ′)   (Differentiability Lemma)
• ≤ 0, so η∗ is φ-optimal   (Theorem 3.6)
Proof of “only if”:
• by Thm 3.6, since every x is a one-point design
Direct application of Theorem 3.7 requires comparison of η∗ to every x ∈ X , rather than to
every η ∈ H (as in Theorem 3.6).
Example: QLR
Consider quadratic linear regression in one predictor variable and set U = [−1, +1], so that:

X = {x′ = (1, u, u²) : −1 ≤ u ≤ +1}

We would like to show that a particular design is D-optimal, and we use the log form of the
criterion to satisfy the concavity requirement:

φ(M) = log|M|.
Theorem 3.7 does not tell us how to construct an optimal design (although we will later
see that it can be used to motivate construction algorithms). So at this point, we have to
“guess” what an optimal design might be, and use the theorem to (hopefully) prove that we
are correct. Since we know that any discrete design will need at least 3 distinct treatments
in order to support estimation of the model, try:
η0 :  u = +1  with probability 1/3
      u = −1  with probability 1/3
      u = 0   with probability 1/3
For this design,

M(η0) = (1/3) [ 3  0  2 ]
              [ 0  2  0 ]
              [ 2  0  2 ]
For our chosen criterion function, we’ve shown that the Frechet derivative at any design, in
the direction of any one-point design, is:
Fφ(M, xx′) = trace(xx′M⁻¹) − k = x′M⁻¹x − k

So Theorem 3.7 shows that our design is D-optimal if x′M(η0)⁻¹x ≤ 3 for all x ∈ X . Doing
the algebra shows that for η0 , the quadratic form is (3/4)[4 − 6u²(1 − u²)], which can be no more
than 3. Hence η0 is a D-optimal (continuous) design, and a discrete D-optimal design for
any N that is a multiple of 3 can be formed by putting 1/3 of the points at each of u = −1,
0, and +1.
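
The bound x′M(η0)⁻¹x ≤ 3 can also be confirmed on a fine grid of u values (a sketch assuming Python with NumPy):

    import numpy as np

    M0 = np.array([[3.0, 0.0, 2.0],
                   [0.0, 2.0, 0.0],
                   [2.0, 0.0, 2.0]]) / 3.0
    M0_inv = np.linalg.inv(M0)

    u = np.linspace(-1.0, 1.0, 2001)
    X = np.column_stack([np.ones_like(u), u, u**2])      # rows x' = (1, u, u^2)
    q = np.einsum('ij,jk,ik->i', X, M0_inv, X)           # x' M0^{-1} x at each u

    print(q.max())         # = 3 (up to rounding), attained at u = -1, 0, +1
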
There is a hint of something worth noticing here! Theorem 3.7 and the example make
a connection between the D-optimality criterion and the maximum value of its Frechet
derivative “in the direction of” any one-point design:

log|M|   and   max_{x∈X} x′M⁻¹x − k

Theorem 3.7 says that the first is maximized (the design is D-optimal) iff the second is no
more than zero. But note that the Frechet derivative for D-optimality is equivalent to the
criterion function for G-optimality. Hence this theorem suggests a link between the two
criteria, although it doesn't quite say that the two criteria are equivalent (because it doesn't
say you have to minimize the G-criterion ... just that it must be no more than zero.) However,
the next result does complete this link.
Theorem 3.9: For φ differentiable on M⁺ (the subset of M where φ(M) > −∞), if an
optimal design exists, then η∗ is φ-optimal iff:

max_{x∈X} Fφ(M(η∗), xx′) = min_{η∈H} max_{x∈X} Fφ(M(η), xx′) = 0
(Note that the “= 0” part of this is implied by Silvey’s proof, but is not included in the
theorem statement; he adds this as part of a following corollary.)
Proof:
• For any η, we've shown Fφ(M(η), Σ λᵢ M(ηᵢ)) = Σ λᵢ Fφ(M(η), M(ηᵢ))
• So Fφ(M(η), Eλ(xx′)) = Eλ Fφ(M(η), xx′)
• If λ = η, the first is zero, so the second is also zero
• So max_{x∈X} Fφ(M(η), xx′) ≥ 0 for any η, including the one that minimizes max_{x∈X} Fφ(M(η), xx′)
• But by Thm 3.7, η is φ-optimal iff max_{x∈X} Fφ(M(η), xx′) ≤ 0
• So min_{η∈H} max_{x∈X} Fφ(M(η), xx′) = 0, where the minimizer is η∗.
Stated this way, Silvey’s Theorem 3.9 is “part a” of Whittle’s (1973) version of the
General Equivalence Theorem:
For φ concave and differentiable on M+ , if a φ-optimal design exists, the following statements
are equivalent:
1. η∗ is φ-optimal
2. η∗ minimizes max_{x∈X} Fφ(M(η), xx′)
3. max_{x∈X} Fφ(M(η∗), xx′) = 0
Kiefer and Wolfowitz (1960) gave the first version of an Equivalence Theorem, specifically for
D-optimality (where the Frechet derivative is, apart from “−k”, the quantity to be minimized
for G-optimality) which says that the following statements are equivalent:
1. η∗ is D-optimal
2. η∗ minimizes max_{x∈X} x′M(η)⁻¹x, i.e. is G-optimal
3. max_{x∈X} x′M(η∗)⁻¹x = k
An important reminder: (General) Equivalence Theory holds only for continuous design
measures, so for example, discrete D- and G-optimal designs are not always the same.
Corollary: If η∗ is optimal and φ is differentiable at M(η∗ ), then:
• max_{x∈X} Fφ(M(η∗), xx′) = 0 (bottom line of Thm 3.9)
• Eη∗ Fφ(M(η∗), xx′) = 0 (by letting the λ's be the weights associated with η∗ in the
second line of the proof)
Where η∗ is discrete (and it always can be thanks to Caratheodory’s Theorem), this says
that the largest F-derivative in the direction of any x is zero, and the average F-derivative
in the direction of the x’s included in the design is zero, which implies that the F-derivative
in the direction of each point in the design is zero, that is:
Fφ(M(η∗), xᵢxᵢ′) = 0,   i = 1, 2, 3, ..., I
The last point can be used as a check on a “candidate” optimal design, but isn’t sufficient to
prove optimality if it holds. That is, the F-derivative in another direction might be positive
even if those in the direction of the design support points are zero.
Example: MLR
Here’s an example of how optimal designs can be constructed or verified based on the theory presented to this point. Consider the 3-predictor first-order multiple regression problem
with:
E(y) = θ0 + θ1u1 + θ2u2 + θ3u3,    U = [−1, +1]³,    X = {1} × [−1, +1]³
and think about what form an exact D-optimal design for the whole parameter vector might
take. To use the theory we’ve discussed, start with a general probability measure η instead;
what do we know?
• At η∗ , max_{x∈X} x′M⁻¹x = 4 (from equiv. theory)
Without much thought, we know that M = I achieves this; so, for what η’s does this happen?
• M = Σ_{i=1}^I λᵢ xᵢxᵢ′ = I   →   Σ_{i=1}^I λᵢ u1ᵢ² = 1, etc. (diagonal elements)
So a D-optimal design can be constructed with mass only on the 8 corner points; what
8-point distributions do this? Order the 8 corner points as:
       (1, u1, u2, u3)
1       +   +   +   +
2       +   +   +   −
3       +   +   −   +
4       +   +   −   −
5       +   −   +   +
6       +   −   +   −
7       +   −   −   +
8       +   −   −   −
In coded form, the moment matrix M = Σᵢ λᵢ xᵢxᵢ′ is:

M = [ 1            1234−5678    1256−3478    1357−2468 ]
    [ 1234−5678    1            1278−3456    1368−2457 ]
    [ 1256−3478    1278−3456    1            1458−2367 ]
    [ 1357−2468    1368−2457    1458−2367    1         ]
where, e.g., the (1,2) element means λ1 + λ2 + λ3 + λ4 − λ5 − λ6 − λ7 − λ8 . If we require
that M be an identity matrix, including the “diagonal” implication that the sum of the λ’s
must be 1, this yields a system of 7 equations in 8 unknowns:

[ + + + + + + + + ]         [ 1 ]
[ + + + + − − − − ]         [ 0 ]
[ + + − − + + − − ]         [ 0 ]
[ + − + − + − + − ]   λ  =  [ 0 ]
[ + + − − − − + + ]         [ 0 ]
[ + − + − − + − + ]         [ 0 ]
[ + − − + + − − + ]         [ 0 ]
Solutions to this system are of the form:

λ1 = λ4 = λ6 = λ7 ,     λ2 = λ3 = λ5 = λ8 ,     Σ_{i=1}^8 λᵢ = 1
So, design measures that put equal weight on the points of I = +ABC, and equal weight
on the points of I = −ABC are D-optimal. When N is any multiple of 4, an exact design
can be constructed that reflects this measure, and so it is D-optimal among all exact designs.
(And for large N , there are lots of choices, e.g. 3 points at each of I = +ABC and 5 points
at each of I = −ABC when N = 32, et cetera.) Note that in this argument, we've not
explicitly included the requirement that each λi be non-negative ... that’s a “side condition”
that can’t be expressed as a linear equation.
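
The claimed solution pattern is easy to verify (a sketch assuming Python with NumPy; the weights a and b are arbitrary apart from 4a + 4b = 1): with the corner points in the order listed above, any λ with λ1 = λ4 = λ6 = λ7 and λ2 = λ3 = λ5 = λ8 gives M = I.

    import numpy as np

    # corners in the table's order: u1 = + for points 1-4, u2 = + for 1,2,5,6, u3 = + for 1,3,5,7
    corners = np.array([[ 1,  1,  1], [ 1,  1, -1], [ 1, -1,  1], [ 1, -1, -1],
                        [-1,  1,  1], [-1,  1, -1], [-1, -1,  1], [-1, -1, -1]], dtype=float)
    X = np.column_stack([np.ones(8), corners])           # rows x' = (1, u1, u2, u3)

    a, b = 0.10, 0.15                                    # any nonnegative a, b with 4a + 4b = 1
    lam = np.array([a, b, b, a, b, a, a, b])             # lambda_1 = lambda_4 = lambda_6 = lambda_7 = a

    M = sum(l * np.outer(x, x) for l, x in zip(lam, X))
    print(np.allclose(M, np.eye(4)))                     # True
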
The Complication of Singular M
Theorem 3.7 is quite powerful, but depends on φ being differentiable at M(η∗ ). In this
context, differentiability implies that:
Fφ(M(η∗), M(η)) = ∫_X Fφ(M(η∗), xx′) η(x) dx
This is true for D- and A- and many other criteria that require nonsingular M, but consider
what happens when we try to do what seems reasonable in a subset-estimation example.
Consider the model y = θ0 + θ1u1 + θ2u2 + ε, with U = the quadrilateral with corners
(−1, 0), (+1, 0), (−2, +1), (+2, +1):
[Figure: the trapezoidal design region U in the (u1, u2) plane, with vertices (−1, 0), (+1, 0), (−2, +1), (+2, +1).]
Suppose we want to estimate (θ0, θ1) well; θ2 is a nuisance parameter. A partitioning of the
design moment matrix that reflects this is:

M = [ M11    m21 ]
    [ m21′   m22 ]

The inverse of the upper left 2 × 2 corner of M⁻¹ is

M11 − (1/m22) m21m21′

and we could use φ(M) = log|M11 − (1/m22) m21m21′|. But this requires m22 ≠ 0, which isn't
really necessary to estimate (θ0, θ1). To see this, let

η1 :  u = (+1, 0)  with probability 1/2
      u = (−1, 0)  with probability 1/2

M(η1) = [ 1  0  0 ]
        [ 0  1  0 ]
        [ 0  0  0 ]
Since u2 = 0 in this design, (θ0 , θ1 ) can be estimated as if this were a simple linear regression
problem, and in fact, this design is D-optimal for the one-predictor model if you restrict U
to u1 in [−1, +1].
But we don’t know that η1 is optimal, so instead, we can define our optimality criterion
to be:
φ(M) = log|M11 − (1/m22) m21m21′|    (“form 1”)  if M is of full rank
     = log|M11|                       (“form 2”)  if M11 is of full rank and m22 = 0
(Note here that φ(M(η1 )) = 0; this is needed in comparisons to come next.)
Now define an alternative design measure:

η2 :  u = (+1, 0)  with probability 1/4
      u = (−1, 0)  with probability 1/4
      u = (+2, 1)  with probability 1/4
      u = (−2, 1)  with probability 1/4

M(η2) = [ 1    0    1/2 ]
        [ 0    5/2  0   ]
        [ 1/2  0    1/2 ]
Intuition is hard to use here; this design is on the corners of U, so the entire parameter vector
is estimable. θ̂2 is not orthogonal to θ̂0 , and other things being equal, this should mean that
we won’t be able to estimate (θ0 , θ1 ) as well as with η1 . But by using all of U, this design
“spreads out” the values of u1 more which, other things being equal, should mean that we
can estimate (θ0 , θ1 ) better.
We are interested in looking at the Frechet derivative of φ at η1 in the direction of η2 ,
and in the direction of each of its points of support. First, note that
(1 − ε)M(η1) + εM(η2) = [ 1     0          ε/2 ]
                        [ 0     1 + 3ε/2   0   ]
                        [ ε/2   0          ε/2 ]
The optimality criterion (form 1, since this matrix is nonsingular for non-zero ε) is

φ(above) = log|M11 − (1/m22) m21m21′|   with here   M11 = [ 1  0 ; 0  1 + 3ε/2 ],   m21 = (ε/2, 0)′,   m22 = ε/2
         = log[(1 − ε/2)(1 + 3ε/2)] = log(1 + ε − (3/4)ε²)

From this, the Frechet derivative is gotten by dividing by ε and taking the limit (remembering
that φ(M(η1)) = 0, and after applying l'Hospital's rule):

Fφ(M(η1), M(η2)) = lim_{ε→0+} (1/ε) log(1 + ε − (3/4)ε²) = 1
But now look at Fφ(M(η1), ·) in the direction of each of the points of support of η2 individually:

• (1 − ε)M(η1) + εM(+1, 0) = [ 1  ε  0 ]
                             [ ε  1  0 ]
                             [ 0  0  0 ]

  – φ(above) = log|M11| = log(1 − ε²)   (form 2)
  – Fφ(M(η1), M(+1, 0)) = lim_{ε→0+} (1/ε) log(1 − ε²) = 0
• Similarly, Fφ (M(η1 ), M(−1, 0)) = 0
• (1 − ε)M(η1) + εM(+2, 1) = [ 1    2ε       ε  ]
                             [ 2ε   1 + 3ε   2ε ]
                             [ ε    2ε       ε  ]

  – φ(above) = log|M11 − (1/m22) m21m21′|   with here   M11 = [ 1  2ε ; 2ε  1 + 3ε ],   m21 = (ε, 2ε)′,   m22 = ε   (form 1)
             = log[(1 − ε)(1 − ε)]
  – Fφ(M(η1), M(+2, 1)) = lim_{ε→0+} (1/ε) log[(1 − ε)(1 − ε)] = −2

• Similarly, Fφ(M(η1), M(−2, 1)) = −2
So, Fφ at M(η1) in the direction of each support point of η2 is non-positive, while Fφ(M(η1), M(η2))
is positive, so Fφ(M1, Σ λᵢxᵢxᵢ′) ≠ Σ λᵢ Fφ(M1, xᵢxᵢ′). Theorem 3.7 cannot be used at η1, but
Thm 3.6 can since it does not depend on differentiability.
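
The limits above (1, 0, 0, −2, −2) can be reproduced numerically with the two-form criterion (a sketch assuming Python with NumPy; the small ε and the tolerance used to detect m22 = 0 are arbitrary choices):

    import numpy as np

    def phi(M):
        # subset criterion: log|M11 - m21 m21'/m22| ("form 1") if m22 > 0, else log|M11| ("form 2")
        M11, m21, m22 = M[:2, :2], M[:2, 2], M[2, 2]
        if m22 > 1e-12:
            M11 = M11 - np.outer(m21, m21) / m22
        return np.log(np.linalg.det(M11))

    def moment(points, weights):
        # moment matrix of a discrete measure on (u1, u2) points, with x' = (1, u1, u2)
        return sum(w * np.outer(np.r_[1.0, p], np.r_[1.0, p]) for w, p in zip(weights, points))

    M1 = moment([(1, 0), (-1, 0)], [0.5, 0.5])                        # M(eta_1)
    M2 = moment([(1, 0), (-1, 0), (2, 1), (-2, 1)], [0.25] * 4)       # M(eta_2)

    eps = 1e-5
    directions = {'eta_2': M2, '(+1,0)': moment([(1, 0)], [1.0]),
                  '(-1,0)': moment([(-1, 0)], [1.0]), '(+2,1)': moment([(2, 1)], [1.0]),
                  '(-2,1)': moment([(-2, 1)], [1.0])}
    for name, M in directions.items():
        print(name, (phi((1 - eps) * M1 + eps * M) - phi(M1)) / eps)  # ~1, ~0, ~0, ~-2, ~-2
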
What more can be said about this problem? It turns out that design measures that put
all mass on u2 = 0 can’t be optimal; if they were, our η1 would be the best (because it is the
optimal s.l.r. design allowed on this segment of the line), but it isn’t because, for example,
φ(M(η2)) = log(5/4). Further, design measures that put all mass on any other single value of u2
can’t be optimal, because they confound θ0 and θ2 , so at least two values of u2 are needed.
A good guess might be:
ηg :  u = (+1 + d, d)   with probability π1
      u = (−1 − d, d)   with probability π1
      u = (+2, 1)       with probability π2
      u = (−2, 1)       with probability π2

(with 2π1 + 2π2 = 1 and 0 ≤ d ≤ 1, so that the first two points lie on the slanted edges of U).
One could numerically or analytically optimize φ over π1 and d. Since φ will be differentiable
at M(ηg), one could then ask whether Fφ(M(ηg), xx′) ≤ 0 for all x.
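
A crude numerical search over the guessed family is straightforward (a sketch assuming Python with NumPy; the grid resolutions and the m22 = 0 tolerance are arbitrary, and phi/moment repeat the helpers from the previous sketch):

    import numpy as np

    def phi(M):
        # subset criterion, as in the previous sketch
        M11, m21, m22 = M[:2, :2], M[:2, 2], M[2, 2]
        if m22 > 1e-12:
            M11 = M11 - np.outer(m21, m21) / m22
        return np.log(np.linalg.det(M11))

    def moment(points, weights):
        return sum(w * np.outer(np.r_[1.0, p], np.r_[1.0, p]) for w, p in zip(weights, points))

    best = None
    for d in np.linspace(0.0, 1.0, 101):
        for p1 in np.linspace(0.0, 0.5, 101):
            pts = [(1 + d, d), (-1 - d, d), (2, 1), (-2, 1)]
            wts = [p1, p1, 0.5 - p1, 0.5 - p1]
            with np.errstate(divide='ignore', invalid='ignore'):
                val = phi(moment(pts, wts))
            if np.isfinite(val) and (best is None or val > best[0]):
                best = (val, d, p1)

    print(best)    # best criterion value and the (d, pi_1) attaining it on this grid
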
References
Silvey, S.D. (1980). Optimal Design: An Introduction to the Theory for Parameter Estimation, Chapman and Hall, London.