Stated Choice Experiments

Most of these notes closely follow material in the textbook of Street and Burgess (2007).
That book goes well beyond what is discussed here; this material should be considered only
a very brief introduction to the topic.
Introduction and Definitions
In a stated choice experiment, individuals (who are the experimental units) are given
a set of options, and are asked to identify the one element of the set they prefer. Street
and Burgess note that applications can be found in settings such as government services
(e.g. where the visitors to national parks are asked about various elements of the provided
services) and business (in the form of market research).
More specifically, a stated choice experiment is defined by a set of choice sets. Each
choice set contains two or more options/alternatives. These choice sets are presented to
respondents/subjects, each of whom is asked to identify the option they most prefer from
each set. A very simple example might involve a survey of a group of employees who are
asked how they prefer commuting to work; the single choice set in this case might be { drive,
catch a bus, walk, cycle, other }. Where the rules of response require that exactly one choice
be reported by each respondent for each choice set, this is sometimes called a forced choice
experiment. For obvious reasons, it is helpful to have options such as “other” and “none of
the above” with forced choice experiments.
Much of what will be discussed here focuses on the setting in which each option in the
choice set can be described by specifying the level for each of a number of attributes. This
is exactly parallel to the idea of defining treatments in factorial experiments. An example
might involve choices among health insurance plans, which are described by the maximum
out-of-pocket expense a patient might be charged, whether the coverage extends to facilities
outside of the patient’s local care network, whether specified pre-existing conditions can be
covered, et cetera.
A number of other variants and specific issues have been addressed in the literature on
stated choice experiments, including:
• binary choice, in which options are not comparative, but responses are only “yes” or
“no” for sets that each contain a single option
• implications for including (or not including) a “none” option in each set
• issues that arise when some combinations of attribute levels are unrealistic or impossible, or when it is known ahead of time that certain combinations of attribute levels
would always be preferred (e.g. a universally perceived best product for the lowest
price)
We will not discuss these here, but focus on the basic problem and the construction and
comparison of experimental designs, which in this case amount to selection of choice sets
from among the collection of options to be studied.
The Bradley-Terry Model
The Bradley-Terry model was developed for paired comparison experiments; these correspond to stated choice experiments in which each choice set contains two elements. Suppose for the moment that we denote the entire collection of choices to be compared as
{T_1, T_2, ..., T_t}. The model specifies that when T_i is compared to T_j, T_i will be preferred with
probability

Pr(T_i is preferred to T_j) = π_i / (π_i + π_j),   i ≠ j; i, j = 1, 2, 3, ..., t

for model parameters π_i ≥ 0, i = 1, 2, 3, ..., t, written collectively as the t-vector π. Note that there
is an overparameterization here; multiplying all π_i's by the same positive number results in
an equivalent model. This ambiguity is eliminated by adding a constraint such as ∏_i π_i = 1.
It should be noted that the Bradley-Terry model is not “nonparametric”, but imposes
a specific structure on the preference probabilities. This is made obvious by observing
that there are t − 1 “free” parameters in the model after imposing the constraint, but
t
2
probabilities being modeled. Because the amount of information in categorical data is
much more limited than that associated with real-valued data, relatively “heavy-handed”
parametric modeling may be necessary to achieve useful results. The constraints implied by
the model form are generally reasonable in this context; they imply that options that are
“relatively good” (π > c for some constant c, say) have probability greater than one-half of
being preferred to options that are “relatively bad” (π < c). A little algebra reveals that the
structure of the model prevents “circular” patterns of the form

π_1/(π_1 + π_2) > 1/2,   π_2/(π_2 + π_3) > 1/2,   and   π_3/(π_3 + π_1) > 1/2,

since the first two inequalities would imply π_1 > π_2 > π_3, while the third would require π_3 > π_1.
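As a small numerical illustration, this structure is easy to check directly; the following Python sketch uses hypothetical π values (not from the notes):

```python
# Sketch: Bradley-Terry preference probabilities for t = 3 options, with
# hypothetical parameter values (pi_1, pi_2, pi_3) = (2, 1, 0.5).
pi = {1: 2.0, 2: 1.0, 3: 0.5}

def pref(i, j):
    """Pr(T_i is preferred to T_j) under the Bradley-Terry model."""
    return pi[i] / (pi[i] + pi[j])

# T1 beats T2 and T2 beats T3, each with probability above one-half ...
assert pref(1, 2) > 0.5 and pref(2, 3) > 0.5
# ... so T3 cannot also beat T1: the "circular" pattern is impossible.
assert pref(3, 1) < 0.5
```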
Suppose that each subject is shown the same choice sets (pairs of choices to be compared),
and that subjects are not asked to repeat any comparison (no replicate comparisons within
subject). Let

n_ij = 1 if the pair (T_i, T_j) is included as a choice set
     = 0 if the pair (T_i, T_j) is not included as a choice set
Suppose that the choice sets are presented to s subjects, and for subject α the experimental
results are represented as:

w_ijα = 1 when T_i is preferred to T_j
      = 0 when T_j is preferred to T_i

With the convention that w_ijα = 0 if n_ij = 0, we can write the probability function for w_ijα
as

f_ijα(w_ijα, π) = π_i^{w_ijα} π_j^{w_jiα} / (π_i + π_j)^{n_ij}
If we let w_ij = Σ_{α=1}^s w_ijα, and w_i = Σ_{j=1}^t w_ij, and if we assume independence among all
choices, then the likelihood function for the entire experiment may be written as

L(π) = π_1^{w_1} π_2^{w_2} ... π_t^{w_t} / ∏_{i<j} (π_i + π_j)^{s n_ij}
The independence assumption, like the parametric model form, can probably be questioned for its realism in some cases. For example, suppose that 2/3 of all subjects prefer T_1
to any other choice, and 1/3 prefer T_2 to any other choice. Before parameter normalization,
this could be expressed by saying π_1 = 2 and π_2 = 1. But in this case, when any subject
prefers T_1 to T_2, that subject will also prefer T_1 to (for example) T_3, and so the assumption
of independence does not hold. Again, this should probably be viewed as a pragmatic assumption motivated by the need to extract a meaningful inference from data that contain
limited information, but it also should be a warning that some effort should be made to
verify that the assumptions are at least reasonable in any given situation. Possible model
extensions to partially address this (that I’ve not seen, but may well have been explored)
might involve relatively small random effects associated with each subject and treatment,
to allow for more complex subject-conditional probabilities, and so perhaps more realistic
independence assumptions.
Maximum likelihood estimates under the independent-comparisons assumption can be
found by differentiating log(L) with respect to each πi , equating to zero, and iteratively
solving the t resulting equations. Convergence is assured under either of two conditions:
• All nij = 1.
• In every possible partition of the objects into two non-empty subsets, some object in
the first set is preferred at least once to some object in the second.
The second condition does not require the first, but is not entirely design-dependent.
That is, there is always the possibility that a given choice is never selected even if it is
directly compared to all other choices. What can be done at the design stage to ensure
that the second condition can be met is to require that the design be connected. This means
that for every two options Ti and Tj , it is possible to construct a list of objects:
Ti , Ti1 , Ti2 , ..., Tj
such that each consecutive pair is included as a choice set. Under this constraint, if each
option is chosen at least once (over the entire experiment), then the second condition is
satisfied.
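The iterative solution can be sketched with the classical fixed-point update for the Bradley-Terry likelihood equations; the win counts below are hypothetical illustrative data, not from the notes:

```python
# A sketch (not from the notes) of the iterative ML fit via the classical
# fixed-point update pi_i <- w_i / sum_{j != i} s * n_ij / (pi_i + pi_j).
# W[i][j] is the (hypothetical) number of the s subjects preferring T_i to
# T_j, and every pair is compared (all n_ij = 1), so condition 1 holds.
t, s = 3, 10
W = [[0, 7, 9],
     [3, 0, 6],
     [1, 4, 0]]
w = [sum(row) for row in W]            # total "wins" w_i for each option

pi = [1.0] * t
for _ in range(1000):
    pi = [w[i] / sum(s / (pi[i] + pi[j]) for j in range(t) if j != i)
          for i in range(t)]
    scale = sum(pi) / t                # fix the arbitrary scale each pass
    pi = [p / scale for p in pi]

# At the MLE, expected wins match observed wins (the likelihood equations):
expected = [sum(s * pi[i] / (pi[i] + pi[j]) for j in range(t) if j != i)
            for i in range(t)]
assert all(abs(e - o) < 1e-6 for e, o in zip(expected, w))
assert pi[0] > pi[1] > pi[2]           # T1 most preferred in this data
```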
The per-choice contribution from subject α to the (i, j) element of the Fisher information matrix
(i.e. the actual information matrix associated with subject α's data, divided by N) is:

I(π)_{i,j} = Σ_{a<b} λ_ab E[ (∂ log f_abα(w_abα, π)/∂π_i) (∂ log f_abα(w_abα, π)/∂π_j) ]

where λ_ab = n_ab/N and N = Σ_{i<j} n_ij. For i ≠ j, we find that

∂ log f_ijα(w_ijα, π)/∂π_i = w_ijα/π_i − 1/(π_i + π_j).

w_ijα is a Bernoulli random variable that takes the value of 1 with probability π_i/(π_i + π_j).
Assuming independence for the collection of comparisons, this leads to

I(π)_{i,j} = −λ_ij / (π_i + π_j)².

Similar considerations lead to an expression for the diagonal elements,

I(π)_{i,i} = Σ_{a≠i} λ_ia π_a / ((π_a + π_i)² π_i).
Assuming that the probabilities are the same for each subject, this matrix is also the actual
overall information matrix for the experiment, divided by s × N . (This is in the spirit of
our previous definitions of M, which were normalized by the number of experimental trials
in the design.)
Where choices are re-expressed in a factorial-like attribute parameterization, it will be
more convenient to work with the logs of the Bradley-Terry π's. Define γ_i = log π_i, and
the corresponding t-element vector as γ. Changing variables is fairly straightforward, given
∂γ_i/∂π_i = 1/π_i. It follows that the information matrix for γ has elements

I(γ)_{i,i} = π_i Σ_{a≠i} λ_ia π_a / (π_i + π_a)²,    I(γ)_{i,j} = −λ_ij π_i π_j / (π_i + π_j)²,   i ≠ j
Following Street and Burgess (who apparently follow others), we also denote I(γ) by Λ(π).
Example:
1.) Suppose t = 4, and all 6 possible option pairs are shown to each of 5 subjects. From
the expressions above:

Λ(π) = (1/6) ×
[ π_1 Σ_{j≠1} π_j/(π_1+π_j)²    −π_1π_2/(π_1+π_2)²            −π_1π_3/(π_1+π_3)²            −π_1π_4/(π_1+π_4)²           ]
[ −π_1π_2/(π_1+π_2)²            π_2 Σ_{j≠2} π_j/(π_2+π_j)²    −π_2π_3/(π_2+π_3)²            −π_2π_4/(π_2+π_4)²           ]
[ −π_1π_3/(π_1+π_3)²            −π_2π_3/(π_2+π_3)²            π_3 Σ_{j≠3} π_j/(π_3+π_j)²    −π_3π_4/(π_3+π_4)²           ]
[ −π_1π_4/(π_1+π_4)²            −π_2π_4/(π_2+π_4)²            −π_3π_4/(π_3+π_4)²            π_4 Σ_{j≠4} π_j/(π_4+π_j)²   ]
2.) Suppose t = 4, and that only 3 choice sets – (T_1, T_2), (T_1, T_3), and
(T_1, T_4) – are shown to each of the 5 subjects. Then:

Λ(π) = (1/3) ×
[ π_1 Σ_{j≠1} π_j/(π_1+π_j)²    −π_1π_2/(π_1+π_2)²    −π_1π_3/(π_1+π_3)²    −π_1π_4/(π_1+π_4)²   ]
[ −π_1π_2/(π_1+π_2)²            π_1π_2/(π_1+π_2)²     0                     0                    ]
[ −π_1π_3/(π_1+π_3)²            0                     π_1π_3/(π_1+π_3)²     0                    ]
[ −π_1π_4/(π_1+π_4)²            0                     0                     π_1π_4/(π_1+π_4)²    ]
Here we run into the difficulty we have for experimental design with any nonlinear model
– the information matrix that is relevant for judging the value of an experimental design is
dependent on the model parameters that would be the focus of the experiment – in this case,
the vector π or equivalently γ. If some prior information is known about these parameters,
a single guessed value might be used, or an approach based on a region (“robust design”) or
prior distribution (“Bayes”) might be taken. In this case, a common choice is to design as
if all choices are to be equally preferred, that is π_i = 1 for all i = 1, 2, 3, ..., t. For this set of
parameter values, the information matrices for these two examples become:





Λ(1) = (1/24) [  3 −1 −1 −1 ]          Λ(1) = (1/12) [  3 −1 −1 −1 ]
              [ −1  3 −1 −1 ]    and                 [ −1  1  0  0 ]
              [ −1 −1  3 −1 ]                        [ −1  0  1  0 ]
              [ −1 −1 −1  3 ]                        [ −1  0  0  1 ]

respectively.
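These two matrices are easy to verify numerically from the expressions for I(γ); a small Python sketch, using exact arithmetic:

```python
# Sketch: verify the two Lambda(1) matrices by accumulating, for each choice
# pair (i, j), the quantity lambda_ij * pi_i*pi_j/(pi_i + pi_j)^2 = (1/N)(1/4)
# into the information matrix (exact arithmetic via Fraction).
from fractions import Fraction

def Lam(pairs, t=4):
    N = len(pairs)
    L = [[Fraction(0)] * t for _ in range(t)]
    q0 = Fraction(1, 4 * N)              # lambda_ij / 4, all pi_i = 1
    for i, j in pairs:
        L[i][j] -= q0; L[j][i] -= q0
        L[i][i] += q0; L[j][j] += q0
    return L

all_pairs = [(i, j) for i in range(4) for j in range(i + 1, 4)]   # 6 pairs
star = [(0, 1), (0, 2), (0, 3)]                                   # pairs with T1

L6, L3 = Lam(all_pairs), Lam(star)
assert L6[0][0] == Fraction(3, 24) and L6[0][1] == Fraction(-1, 24)
assert L3[0][0] == Fraction(3, 12) and L3[1][1] == Fraction(1, 12)
assert L3[1][2] == 0
```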
A Representation Based on Attributes
Consider re-expression of the choices available as a specified level for each of k attributes.
Suppose the qth attribute has l_q levels (denoted 0, 1, 2, ..., l_q − 1 as with standard notation for
symbols in orthogonal arrays) so that the number of choices is t = ∏_{q=1}^k l_q. A particular choice
can be fully specified by a list of the corresponding attribute levels. For notational specificity,
now assume that the treatments are ordered lexicographically by this list representation;
that is, T_1 corresponds to (0, 0, ..., 0), T_2 to (0, 0, ..., 1), ..., T_{l_k} to (0, 0, ..., l_k − 1), T_{l_k + 1} to
(0, 0, ..., 1, 0), ..., and finally T_t to (l_1 − 1, l_2 − 1, ..., l_k − 1).
The aim now will be to express each γi with a linear model in terms of meaningful main
effects, two-factor interactions, et cetera defined from the attributes. Specifically, we will
work with a partitioned model. Toward this end, construct a t − 1 by t matrix B for which
each row is a contrast of unit length, and every two rows are orthogonal. The rows define
factorial contrasts in the elements of γ. We partition B into 3 sets of rows (or sub-matrices
by rows):


B = [ B_1 ]
    [ B_2 ]
    [ B_3 ]
The first submatrix B1 contains the set of contrasts in which we have interest, the second
submatrix B2 contains contrasts for which corresponding effects should be included in the
model, but which will be regarded as nuisance parameters, and the third submatrix B3
contains contrasts for which we are willing to assume that the corresponding effects are
actually zero. The implied linear model for γ is then:

γ = B′β = B_1′ β_1 + B_2′ β_2 + B_3′ β_3,   with β_3 = 0
For example, suppose t = 8, k = 3, and each attribute has 2 levels. Suppose our interest
centers on estimating the 3 main effect contrasts, we suspect that the two-factor interactions
may be present but have no real interest in them, and that we are willing to assume that
the three-factor interaction is absent. Using the common ± full-rank parameterization, the
B matrix, partitioned as described above, would be:
B = (1/(2√2)) ×
[ −1 −1 −1 −1  1  1  1  1 ]
[ −1 −1  1  1 −1 −1  1  1 ]
[ −1  1 −1  1 −1  1 −1  1 ]
[  1  1 −1 −1 −1 −1  1  1 ]
[  1 −1  1 −1 −1  1 −1  1 ]
[  1 −1 −1  1  1 −1 −1  1 ]
[ −1  1  1 −1  1 −1 −1  1 ]

with the first three rows forming B_1 (main effects), the next three B_2 (two-factor interactions), and the last row B_3 (the three-factor interaction).
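As a check on this construction, the contrast matrix can be built from products of the ±1 main-effect columns; a short Python sketch (the row ordering is an assumption matching the B_1, B_2, B_3 partition described above):

```python
# Sketch: build the 7 x 8 contrast matrix B for t = 8 (three 2-level
# attributes) and check that its rows are orthonormal contrasts.
import numpy as np
from itertools import product

cols = list(product([-1, 1], repeat=3))            # treatments, lexicographic
words = [(1, 0, 0), (0, 1, 0), (0, 0, 1),          # B1: main effects
         (1, 1, 0), (1, 0, 1), (0, 1, 1),          # B2: two-factor interactions
         (1, 1, 1)]                                # B3: three-factor interaction
rows = [[np.prod([c[q] for q in range(3) if w[q]]) for c in cols] for w in words]

B = np.array(rows, dtype=float) / (2 * np.sqrt(2)) # rows scaled to unit length
assert np.allclose(B @ B.T, np.eye(7))             # every two rows orthogonal
assert np.allclose(B.sum(axis=1), 0)               # every row is a contrast
```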
Denote by B_12 the submatrix containing the rows of B_1 and B_2. Then the information
matrix for β_12 = (β_1′, β_2′)′ is

I(β_12) = B_12 Λ(π) B_12′ = C = [ C_11   C_12 ]
                                [ C_12′  C_22 ]

The information matrix for β_1, given that β_2 must also be included in the model, is

I(β_1|β_2) = C_11 − C_12 C_22⁻¹ C_12′
Example, Continued:
Continuing with the earlier t = 4 example, suppose the choices are actually defined by
two 2-level attributes/factors, and that we want to make inferences about the two main effects:

B_1 = (1/2) [ −1 −1  1  1 ]
            [ −1  1 −1  1 ]

and that while not of direct interest, we are not willing to ignore the possibility of a two-factor
interaction:

B_2 = (1/2) ( 1 −1 −1 1 )
For the two designs specified earlier (all 6 pairs, and only pairs including T_1), the matrix C
is

(1/(24×4)) [ 16  0  0 ]  = (1/6) [ 1 0 0 ]          (1/(12×4)) [  8  4 −4 ]  = (1/12) [  2  1 −1 ]
           [  0 16  0 ]          [ 0 1 0 ]    and              [  4  8 −4 ]           [  1  2 −1 ]
           [  0  0 16 ]          [ 0 0 1 ]                     [ −4 −4  8 ]           [ −1 −1  2 ]

respectively, and the corresponding matrices I(β_1|β_2) are

(1/6) [ 1 0 ]          (1/24) [ 3 1 ]
      [ 0 1 ]    and          [ 1 3 ]

respectively.
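This worked example can be verified numerically; a Python sketch of the C = B_12 Λ(1) B_12′ computation and the Schur complement for the second (pairs-with-T_1) design:

```python
# Sketch verifying the worked example: C = B12 Lambda(1) B12' and the
# conditional information I(beta1|beta2) = C11 - C12 C22^{-1} C12'.
import numpy as np

B12 = 0.5 * np.array([[-1, -1,  1, 1],    # main effect of attribute 1
                      [-1,  1, -1, 1],    # main effect of attribute 2
                      [ 1, -1, -1, 1]])   # two-factor interaction

# Lambda(1) for the design using only the pairs (T1,T2), (T1,T3), (T1,T4):
L = (1 / 12) * np.array([[ 3, -1, -1, -1],
                         [-1,  1,  0,  0],
                         [-1,  0,  1,  0],
                         [-1,  0,  0,  1]])

C = B12 @ L @ B12.T
assert np.allclose(C, (1 / 12) * np.array([[ 2,  1, -1],
                                           [ 1,  2, -1],
                                           [-1, -1,  2]]))

C11, C12, C22 = C[:2, :2], C[:2, 2:], C[2:, 2:]
I_cond = C11 - C12 @ np.linalg.inv(C22) @ C12.T
assert np.allclose(I_cond, (1 / 24) * np.array([[3, 1], [1, 3]]))
```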
Structural Properties
Street and Burgess cite Huber and Zwerina (1996) for a set of 4 design properties they
describe as “structural”. These clearly have implications for the statistical properties of
inferences that can be drawn, but are also appealing in their own right:
1. Level Balance: All the levels of each attribute should occur with equal
frequency over all options in the choice sets.
2. Orthogonality: The joint occurrence of any two levels of different attributes
should appear in options with frequencies equal to the product of their
marginal frequencies.
3. Minimal Overlap: The frequency with which an attribute level repeats itself
in each choice set should be as small as possible. If the number of items in
each choice set is fewer than the number of levels for an attribute, then no
attribute level should be repeated.
4. Utility Balance: Options within a choice set should be equally attractive to
subjects.
Level balance and orthogonality, in particular, are properties of some of the best designs for
linear models; their connection to statistical optimality may not be as firm here, but they are
certainly reasonable design properties in any case. Minimal overlap might be interpreted
as a sort of “level balance” within each choice set. Utility balance is a bit different from
the others in that it is actually a property of the anticipated responses; it can be partial
justification for a “design assumption” that the π's are of equal value.
Optimal Designs for Paired Comparisons and Binary Attributes
Chapter 4 of Street and Burgess discusses the form of D- and A-optimal stated choice
designs, for situations in which the choice sets are of size 2, all attributes have 2 levels, and
the design values of all πi ’s are equal. We only briefly outline their results here; the proofs
of these results involve lengthy (although not especially difficult) algebra.
Their arguments are restricted to a specific class of designs in which the set
of possible choice pairs is constrained so that each pair of treatment combinations in which there
are v attributes with different levels appears equally often. For example, for k = 2 attributes,
the treatment pairs (00,11) and (01,10) each differ in the levels of v = 2 attributes, and so
should either both appear, or neither should appear. Similarly the treatment pairs (00,01),
(00,10), (01,11), and (10,11) each differ in the levels of v = 1 attribute and should either
all appear, or none should appear. This means that for this (very small) example, the only
three acceptable designs consist of the first set of pairs, the second set of pairs, or both sets.
Within this restriction, the properties of designs can be expressed in terms of:

i_v = 1 if all choice pairs differing in v attributes are included
    = 0 otherwise

These quantities are then scaled by the number of comparisons made, a_v = i_v/N.
Limiting attention to this class of designs constitutes a rather major restriction on the
experimental design process. However, all designs in this class have the appealing feature
that the information matrix for factorial effects, C, is diagonal no matter what effects are
classified as being of interest (contrasts in B_1), nuisance (B_2), or assumed absent (B_3).
Result 1 (Lemma 4.1.4, Theorem 4.1.1, and Theorem 4.1.2) The information matrix for
paired comparison designs for estimating main effects, assuming all higher order effects are
absent, is
C = [ (1/2) Σ_{v=1}^k a_v (k−1 choose v−1) ] I_k

The D-optimal paired comparison designs for estimating main effects, assuming all higher
order effects are absent, consist of the foldover pairs only; that is, all k attributes appear at
different levels in the two options in each choice set, i.e. a_k = 1/2^{k−1} and all other
a_v = 0. This is also the subset of designs that are A-optimal in this context.
Example The following presents most of the contents of Table 4.5 in Street and Burgess,
showing the determinant calculation for the 7 competing designs for k = 3 binary attributes
(t = 8 options):

a_1     a_2     a_3     N     12a_1 + 12a_2 + 4a_3                |C|
1/12    0       0       12    12(1/12) + 0 + 0 = 1                7.234 × 10⁻⁵
0       1/12    0       12    0 + 12(1/12) + 0 = 1                5.787 × 10⁻⁴
0       0       1/4     4     0 + 0 + 4(1/4) = 1                  1.953 × 10⁻³
0       1/16    1/16    16    0 + 12(1/16) + 4(1/16) = 1          8.240 × 10⁻⁴
1/16    0       1/16    16    12(1/16) + 0 + 4(1/16) = 1          2.441 × 10⁻⁴
1/24    1/24    0       24    12(1/24) + 12(1/24) + 0 = 1         2.441 × 10⁻⁴
1/28    1/28    1/28    28    12(1/28) + 12(1/28) + 4(1/28) = 1   3.644 × 10⁻⁴
The first 3 designs described in this table are comprised of the following choice sets:
• (000, 001), (000, 010), (000, 100), (001, 011), (001, 101), (010, 011), (010, 110), (100, 101), (100, 110), (011, 111), (101, 111), (110, 111)
• (000, 011), (000, 101), (000, 110), (001, 010), (001, 100), (001, 111), (010, 100), (010, 111), (100, 111), (011, 101), (011, 110), (101, 110)
• (000, 111), (001, 110), (010, 101), (100, 011)
The last 4 designs are comprised of combinations of these sets.
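The |C| values in the table can also be reproduced by direct construction, without the algebraic formula; a Python sketch assuming all π_i = 1:

```python
# Sketch: recompute |C| for two of the Table 4.5 designs by building
# Lambda(1) from the choice pairs and projecting onto the three
# main-effect contrasts.
import itertools
import numpy as np

treats = list(itertools.product([0, 1], repeat=3))   # t = 8, lexicographic

def C_main(pairs):
    """Information matrix for the 3 main-effect contrasts, all pi_i = 1."""
    N = len(pairs)
    L = np.zeros((8, 8))
    for i, j in pairs:
        L[i, j] -= 1.0; L[j, i] -= 1.0
        L[i, i] += 1.0; L[j, j] += 1.0
    L /= 4.0 * N                                     # lambda_ij * 1/4
    B1 = np.array([[2 * tr[q] - 1 for tr in treats] for q in range(3)]) / np.sqrt(8)
    return B1 @ L @ B1.T

def pairs_diff(v):
    """All treatment pairs whose levels differ in exactly v attributes."""
    return [(i, j) for i in range(8) for j in range(i + 1, 8)
            if sum(a != b for a, b in zip(treats[i], treats[j])) == v]

det1 = np.linalg.det(C_main(pairs_diff(1)))          # first design, N = 12
det3 = np.linalg.det(C_main(pairs_diff(3)))          # foldover design, N = 4
assert abs(det1 - 7.234e-5) < 1e-7                   # matches the table
assert abs(det3 - 1.953e-3) < 1e-6
```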
Linear Models Intuition from STAT 512
For many nonlinear design problems, it can be difficult to see any parallel at all to arguments for linear models. However, there are elements in this case that may make the
comparison exercise useful. Consider a 2-level factorial problem in which a first-order linear
model (intercept and main effects) is to be used, and the experiment must be performed
using blocks of size 2, with block effects that are to be considered fixed. The fixed block
assumption means (from STAT 512) that only the difference between the two responses
from each block is informative; the individual values and their average are aliased with the
unknown block effect and so cannot be directly compared to their counterparts from other
blocks. This is not an exact analogue to the stated choice problem we are considering, where
there is no interval-valued scale of measurement on which “the difference between the two
responses from each block” can be recorded. However, the two problems are similar in that
the useful information is confined to a comparison of two matched treatments.
A full factorial design for factors A through F , for which experimental runs are “paired”
with their fold-over counterparts is specified by the generating relation
I = ±AB = ±BC = ... = ±EF
along with all generalized interactions. Note that for k factors, this generating relation
contains k − 1 “independent” words, and so does result in blocks of size 2k−(k−1) = 2. Note
also that all words of even length not listed are included as generalized interactions, e.g.
AC = AB × BC, ACDF = AB × BC × DE × EF, et cetera. Hence the factorial effects
confounded with the 2^{k−1} block differences are the 2^{k−1} − 1 factorial effects of even order.
But now recall that in a regular blocked factorial experiment, all pairs of factorial effects
are either orthogonal or confounded. Hence all effects of odd order are estimable (unbiased
by any other effects). As a result, a first order model could be estimated with full efficiency,
i.e. each coefficient independently of the others, and each with variance σ²/2^k, and so this
design is clearly optimal for this linear models setting. In fact, we know that if we are
willing to assume higher-order odd effects are zero, smaller designs of block-size 2 could be
constructed that are optimal for this problem. (For example, begin with a regular fraction of
resolution III, “double” it by adding all foldover runs, creating a resolution IV design with a
generating relation of only even-length words, and use fold-over pairs as blocks.) Street and
Burgess continue their discussion of optimal designs for the stated choice problem by using
fractional factorial plans in this way.
Result 2 (Lemma 4.1.6, Theorem 4.1.3, and Theorem 4.1.4) The information matrix for
paired comparison designs for estimating main effects and two-factor interactions, assuming
all higher order effects are absent, is

C = [ [(1/2) Σ_{v=1}^k a_v (k−1 choose v−1)] I_k                    0                                            ]
    [ 0                                      [Σ_{v=1}^k a_v (k−2 choose v−1)] I_{k(k−1)/2}  ]
The D-optimal paired comparison designs for estimating main effects and two-factor interactions, assuming all higher order effects are absent, are given by

a_v = [ 2^{k−1} (k choose (k+1)/2) ]⁻¹,     v = (k + 1)/2, if k is odd
a_v = [ 2^{k−1} (k+1 choose k/2) ]⁻¹,       v = k/2, k/2 + 1, if k is even

and all other a_v = 0. This is also the subset of designs that are A-optimal in this context.
Example The following presents most of the contents of Table 4.6 in Street and Burgess,
showing the determinant calculation for the 7 competing designs for k = 3 binary attributes
(t = 8 options):

a_1     a_2     a_3     N     |C|
1/12    0       0       12    (1/24)³ (1/12)³ = 4.1862 × 10⁻⁸
0       1/12    0       12    (1/12)³ (1/12)³ = 3.3490 × 10⁻⁷
0       0       1/4     4     (1/8)³ (0)³ = 0
0       1/16    1/16    16    (3/32)³ (1/16)³ = 2.0117 × 10⁻⁷
1/16    0       1/16    16    (1/16)³ (1/16)³ = 5.9605 × 10⁻⁸
1/24    1/24    0       24    (1/16)³ (1/12)³ = 1.4129 × 10⁻⁷
1/28    1/28    1/28    28    (1/14)³ (1/14)³ = 1.3281 × 10⁻⁷
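The |C| values in this table follow directly from the block-diagonal form in Result 2; a short Python sketch of that arithmetic:

```python
# Sketch: recompute the |C| column of Table 4.6 from the block-diagonal
# information matrix of Result 2 (k = 3 binary attributes, pi_i = 1).
from math import comb

k = 3
designs = {  # (i1, i2, i3) -> N = 12*i1 + 12*i2 + 4*i3
    (1, 0, 0): 12, (0, 1, 0): 12, (0, 0, 1): 4, (0, 1, 1): 16,
    (1, 0, 1): 16, (1, 1, 0): 24, (1, 1, 1): 28}

def detC(iv, N):
    a = [i / N for i in iv]
    main = 0.5 * sum(a[v - 1] * comb(k - 1, v - 1) for v in range(1, k + 1))
    twofi = sum(a[v - 1] * comb(k - 2, v - 1) for v in range(1, k + 1))
    return main ** k * twofi ** (k * (k - 1) // 2)

vals = {iv: detC(iv, N) for iv, N in designs.items()}
assert abs(vals[(0, 1, 0)] - 3.3490e-7) < 1e-10    # the D-optimal design
assert vals[(0, 0, 1)] == 0.0                      # foldovers alone: singular
assert max(vals, key=vals.get) == (0, 1, 0)        # a_2 = 1/12, as in Result 2
```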
The Multinomial Logit Model
The multinomial logit model (MNL) appears to be a direct generalization of the Bradley-Terry model, extended for the case where choice sets have more than two elements. Again,
suppose that we denote the entire collection of choices to be compared as {T_1, T_2, ..., T_t}. The
model specifies that when T_i is compared to T_{j1}, T_{j2}, ..., T_i will be preferred with probability

Pr(T_i is preferred to T_{j1}, T_{j2}, ...) = π_i / (π_i + Σ_l π_{jl}),   i ≠ j_l; i, j_l = 1, 2, 3, ..., t
for model parameters π_i ≥ 0, i = 1, 2, 3, ..., t, written collectively as the t-vector π. Suppose
that each subject is shown the same N choice sets, each comprised of m options, of which
n_{i1,i2,...,im} compare the specific options T_{i1}, T_{i2}, ..., T_{im}, where

n_{i1,i2,...,im} = 1 if the set (T_{i1}, T_{i2}, ..., T_{im}) is included as a choice set
               = 0 otherwise

so that N = Σ_{i1<i2<...<im} n_{i1,i2,...,im}. Extending the notation from the paired-comparisons
case, for any specified subject, let

w_{i1,i2,...,im} = 1 when T_{i1} is preferred to T_{i2}, ..., T_{im}
               = 0 otherwise

with the convention that w_{i1,i2,...,im} = 0 if n_{i1,i2,...,im} = 0.
Arguments paralleling those for the basic Bradley-Terry model lead to the per-comparison
information matrix for γ, Λ. Define λ_{i1,i2,...,im} = n_{i1,i2,...,im}/N. Then:

Λ_{i1,i1} = π_{i1} Σ_{i2<i3<...<im} λ_{i1,i2,...,im} (Σ_{j=2}^m π_{ij}) / (Σ_{j=1}^m π_{ij})²,

Λ_{i1,i2} = −π_{i1} π_{i2} Σ_{i3<i4<...<im} λ_{i1,i2,...,im} / (Σ_{j=1}^m π_{ij})²

Under the assumption that π_1 = π_2 = ... = π_t = 1, this reduces to:

Λ_{i1,i1} = ((m−1)/m²) Σ_{i2<i3<...<im} λ_{i1,i2,...,im},    Λ_{i1,i2} = −(1/m²) Σ_{i3<i4<...<im} λ_{i1,i2,...,im}
Optimal Designs for Larger Choice Sets and Binary Attributes
Chapter 5 of Street and Burgess extends some of the arguments made in Chapter 4 to
situations for which each choice set has m (not necessarily 2) elements, continuing to focus
on 2-level attributes. One simplification in the results for 2-element choice sets was the important
role of v, the number of attributes for which the levels differ in the two choices offered.
Optimal design characterization is similar here, but made more complicated by the fact that
there are (m choose 2) pairs of choices within each choice set. For this reason, for any potential choice
set, define a difference vector

v = (d_1, d_2, ..., d_{m(m−1)/2})
to be the collection of numbers of attributes with different levels for each pair of elements
in the choice set. (Hence, for example, each di must be between 1 and k.) The order of
elements isn’t important to the arguments made, so the convention used is that d1 ≤ d2 ≤
... ≤ dm(m−1)/2 . The results of this chapter are restricted to the class of designs for which, if
any choice set with a given difference vector is included, all choice sets with that difference
vector are also included. For example, if m = 3 and k = 3, all possible choice sets have one
of the difference vectors v1 = (1, 1, 2), v2 = (1, 2, 3), v3 = (2, 2, 2). There are 24, 24, and
8 possible choice sets associated with each of these difference vectors (respectively). Hence
the only designs considered for this problem contain 8, 24, 32, 48, or 56 choice sets.
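The counts 24, 24, and 8 (and the fact that no other difference vectors occur) can be confirmed by brute-force enumeration; a Python sketch:

```python
# Sketch: enumerate all size-3 choice sets from the t = 8 treatments and
# tally them by (sorted) difference vector.
import itertools
from collections import Counter

treats = list(itertools.product([0, 1], repeat=3))

def diff_vector(choice_set):
    return tuple(sorted(sum(a != b for a, b in zip(x, y))
                        for x, y in itertools.combinations(choice_set, 2)))

counts = Counter(diff_vector(cs) for cs in itertools.combinations(treats, 3))
assert counts[(1, 1, 2)] == 24 and counts[(1, 2, 3)] == 24 and counts[(2, 2, 2)] == 8
assert len(counts) == 3        # no other difference vectors occur
```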
Result 3 (Theorem 5.1.1) The D-optimal design for testing main effects only, when all other
effects are assumed to be zero, is given by any choice sets in which, for each difference vector v present,

Σ_{i=1}^{m(m−1)/2} d_i = (m² − 1)k/4,   m odd
                       = m² k/4,        m even
For these D-optimal designs and for main-effects contrasts,

|C| = ( (m² − 1)/(m² 2^k) )^k,   m odd
    = ( 1/2^k )^k,               m even
Example Continuing the example from the beginning of this section, consider m = 3 and k =
3. The three possible difference vectors are v1 = (1, 1, 2), v2 = (1, 2, 3), and v3 = (2, 2, 2). m
is odd, so the theorem requires that each v have sum (m2 − 1)k/4 = 6, so D-optimal designs
can be constructed from choice sets associated with difference vectors v2 and v3 . Within the
class of designs being considered, this identifies three designs – the 24 choice sets associated
with v2 , the 8 choice sets associated with v3 , or the 32 choice sets associated with either of
them.
Thinking of This Another Way
Return to the case of m = 2, choice sets with only two elements, and recall that in this
case,

Λ(π)_{i,i} = π_i Σ_{a≠i} λ_{i,a} π_a / (π_i + π_a)²,    Λ(π)_{i,j} = −λ_{i,j} π_i π_j / (π_i + π_j)²,   i ≠ j
Note that this information matrix can also be written in a form we've used earlier. Define
u_ij to be a t-element vector for which all elements are 0 except the ith and jth, and let these
be:

{u_ij}_i = −√(π_i π_j) / (π_i + π_j),    {u_ij}_j = +√(π_i π_j) / (π_i + π_j)

Where the π's are assumed for design purposes to be of equal value, the two non-zero
elements of u_ij are ±1/2. Then Λ(π) can be written as:

Λ(π) = (1/N) Σ_{(i,j)} u_ij u_ij′
where the sum is taken over the N treatment pairs (i, j) that are compared. This form makes
clear a fact that you may have noticed in earlier examples; for any design (selection of choice
sets), Λ has row- and column-sums of zero, and hence is singular. This reflects the fact that
the π’s can be arbitrarily scaled by any multiplier, so the γ’s can be shifted by any additive
constant. That is, there is no information in the data regarding the absolute location of the
γ’s on the number line, and so linear combinations of them that are not contrasts are not
estimable.
However, if interest centers on factorial contrasts in the elements of γ, this singularity is
not an issue. Consider, for example, t = 8 and k = 3 attributes of 2 levels each. Without
regard to the factorial, we might define an experimental design region U containing (8 choose 2) = 28
vectors of length 8:

u_{1,2} = (1/2) (−1, +1, 0, 0, 0, 0, 0, 0)′
u_{1,3} = (1/2) (−1, 0, +1, 0, 0, 0, 0, 0)′
u_{1,4} = (1/2) (−1, 0, 0, +1, 0, 0, 0, 0)′
...
u_{7,8} = (1/2) (0, 0, 0, 0, 0, 0, −1, +1)′
If our interest centers on main effect contrasts, this suggests that an augmented design region
X containing 28 vectors of length 3 can be generated as

x_{i,j} = [ + + + + − − − − ]
          [ + + − − + + − − ] u_{i,j},    i < j; i, j = 1, ..., 8
          [ + − + − + − + − ]
These vectors x_ij are, explicitly:

comparison   x′               comparison   x′
(1, 2)       (0, 0, −1)       (3, 5)       (−1, 1, 0)
(1, 3)       (0, −1, 0)       (3, 6)       (−1, 1, −1)
(1, 4)       (0, −1, −1)      (3, 7)       (−1, 0, 0)
(1, 5)       (−1, 0, 0)       (3, 8)       (−1, 0, −1)
(1, 6)       (−1, 0, −1)      (4, 5)       (−1, 1, 1)
(1, 7)       (−1, −1, 0)      (4, 6)       (−1, 1, 0)
(1, 8)       (−1, −1, −1)     (4, 7)       (−1, 0, 1)
(2, 3)       (0, −1, 1)       (4, 8)       (−1, 0, 0)
(2, 4)       (0, −1, 0)       (5, 6)       (0, 0, −1)
(2, 5)       (−1, 0, 1)       (5, 7)       (0, −1, 0)
(2, 6)       (−1, 0, 0)       (5, 8)       (0, −1, −1)
(2, 7)       (−1, −1, 1)      (6, 7)       (0, −1, 1)
(2, 8)       (−1, −1, 0)      (6, 8)       (0, −1, 0)
(3, 4)       (0, 0, −1)       (7, 8)       (0, 0, −1)
The per-observation information matrix for the main-effect contrasts can now be written as

C = (1/N) Σ_{(i,j)} x_ij x_ij′

where the sum is over the size-two treatment pairings in the included choice sets. Note that
the vectors x for choice comparisons (1, 8), (2, 7), (3, 6), and (4, 5) each contain only non-zero
elements, and that the design including just these 4 choice sets has information matrix

(1/4) (x_{1,8} x_{1,8}′ + x_{2,7} x_{2,7}′ + x_{3,6} x_{3,6}′ + x_{4,5} x_{4,5}′) = I_3
These 4 pairs of treatments are, in factorial notation, the foldover pairs identified as the
D-optimal design by Street and Burgess. However, their development is entirely algebraic
and requires constraining the class of designs under consideration. Representing the problem
this way would allow use of the optimal design tools we've discussed previously, including
continuous design theory, Fréchet derivatives, and related construction algorithms. For problems in which the a priori π values are not all equal, algebraic arguments are made more
complicated, but approaches based on the above representation can be easily adapted by
simply changing the non-zero values of u_ij.
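The foldover computation above is easy to reproduce; a Python sketch constructing the x vectors from the main-effect matrix and checking that the four foldover pairs give I_3:

```python
# Sketch: rebuild the 28 x-vectors from the main-effect matrix and verify
# that the four foldover pairs alone give information matrix I_3.
import numpy as np

M = np.array([[+1, +1, +1, +1, -1, -1, -1, -1],
              [+1, +1, -1, -1, +1, +1, -1, -1],
              [+1, -1, +1, -1, +1, -1, +1, -1]], dtype=float)

def x(i, j):
    """x_{i,j} = M u_{i,j}, where u has -1/2 at position i and +1/2 at j."""
    u = np.zeros(8)
    u[i - 1], u[j - 1] = -0.5, 0.5
    return M @ u

assert np.array_equal(x(1, 2), [0.0, 0.0, -1.0])      # matches the table
foldovers = [(1, 8), (2, 7), (3, 6), (4, 5)]
C = sum(np.outer(x(i, j), x(i, j)) for i, j in foldovers) / len(foldovers)
assert np.allclose(C, np.eye(3))
```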
Now consider the same problem, but now change specifications to required that the
second order model be used. This does not change the form of Λ(π), because this matrix
does not depend on the form of the factorial model. (Hence uij are as before.) However, the
matrix B1 must be altered to reflect this change, and the elements of X are now defined as


x_{i,j} = [ + + + + − − − − ]
          [ + + − − + + − − ]
          [ + − + − + − + − ]
          [ + + − − − − + + ] u_{i,j},    i < j; i, j = 1, ..., 8
          [ + − + − − + − + ]
          [ + − − + + − − + ]
Of the 28 x-vectors generated, 16 contain three 0 elements and three ±1 elements; the
remaining 12 contain two 0 elements and four ±1 elements. These 12 vectors, and the treatment
differences they represent, are:

comparison     x′
(000), (011)   (0, −1, −1, −1, −1, 0)
(000), (101)   (−1, 0, −1, −1, 0, −1)
(000), (110)   (−1, −1, 0, 0, −1, −1)
(001), (010)   (0, −1, 1, −1, 1, 0)
(001), (100)   (−1, 0, 1, −1, 0, 1)
(001), (111)   (−1, −1, 0, 0, 1, 1)
(010), (100)   (−1, 1, 0, 0, −1, 1)
(010), (111)   (−1, 0, −1, 1, 0, 1)
(011), (101)   (−1, 1, 0, 0, 1, −1)
(011), (110)   (−1, 0, 1, 1, 0, −1)
(100), (111)   (0, −1, −1, 1, 1, 0)
(101), (110)   (0, −1, 1, 1, −1, 0)
The treatment pairs here are those that differ in the levels of v = 2 attributes. We can
observe directly that this is the optimal design, because:
• These x vectors have the largest number of non-zero elements, leading to the largest
diagonal elements of Σ x x′, and
• If this set of x vectors is used as the rows in an X-matrix, every pair of columns in
that matrix is orthogonal, i.e. the off-diagonal elements of Σ x x′ are all zero.
The result (2) of Street and Burgess indicates that for k = 3, all choice pairs should differ
by v = (k + 1)/2 = 2, so their optimal design coincides with this one.
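A Python sketch confirming these two observations for the 12 pairs with v = 2 (the matrix M6 below stacks the main-effect and two-factor-interaction rows; the sign convention is an assumption, and does not affect what is being checked):

```python
# Sketch: for the second-order model, build the x-vectors for the 12 pairs
# differing in v = 2 attributes and verify that sum of x x' is diagonal.
import itertools
import numpy as np

treats = list(itertools.product([0, 1], repeat=3))            # lexicographic
signs = np.array([[1 - 2 * tr[q] for tr in treats] for q in range(3)], dtype=float)
M6 = np.vstack([signs, signs[0] * signs[1], signs[0] * signs[2], signs[1] * signs[2]])

pairs = [(i, j) for i in range(8) for j in range(i + 1, 8)
         if sum(a != b for a, b in zip(treats[i], treats[j])) == 2]
assert len(pairs) == 12

X = np.array([0.5 * (M6[:, j] - M6[:, i]) for i, j in pairs])
assert all(np.count_nonzero(row) == 4 for row in X)           # four +/-1 entries each
C = X.T @ X / len(pairs)
assert np.allclose(C, np.diag(np.diag(C)))                    # off-diagonals zero
```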
Unfortunately, generalization in this way is not so straightforward for choice sets of size
m > 2. Recall that in this case,

Λ(π)_{i1,i1} = ((m−1)/m²) Σ_{i2<i3<...<im} λ_{i1,i2,...,im},    Λ(π)_{i1,i2} = −(1/m²) Σ_{i3<i4<...<im} λ_{i1,i2,...,im}
Again, the information matrix for γ can be written in linear form, but not with a single
“u-vector” representing each choice set. For choice sets of size m, m − 1 vectors are required
to express the information associated with one choice set. For example, for m = 3, the
general form would be

Λ(π) = (1/N) Σ_{(i,j,k)} ( u¹_ijk u¹_ijk′ + u²_ijk u²_ijk′ )
for a design comprised of choice sets (i, j, k). Still, a linear transformation to the vectors
we’ve called x is possible; it would be interesting to think about how general design theory
might be used to construct optimal designs under fewer constraints than those used by Street
and Burgess.
References
Street, D. J. and L. Burgess (2007) The Construction of Optimal Stated Choice Experiments:
Theory and Methods, John Wiley & Sons, New York.