R. L. Anderson Institute of Statistics Mimeograph Series No. 310

DESIGNS FOR ESTIMATING VARIANCE COMPONENTS 1/

R. L. Anderson
Institute of Statistics
North Carolina State College
Mimeograph Series No. 310
November, 1961

1. Introduction

The science of statistics is essentially the application of the mathematical theory of probability to the study of variation in experimental and operational data. In a paper prepared for the 1958 meeting of the International Statistical Institute in Brussels, I discussed the "Uses of Variance Component Analysis in the Interpretation of Biological Experiments." Some of this discussion is repeated here.
The identification of the sources of variation and development of methods of estimating the separate variances and testing hypotheses concerning them is one of the statistician's important tasks. In most experiments, some variation is imposed by the experimenter, in the form of different treatments, varieties, or practices. Often inferences are desired concerning only the particular treatments used in the experiment; it has been the custom to label these as fixed sources of variability. In other cases, it is assumed that the variability among the sources used in the experiment is representative of the variability in a much larger population; these are usually labeled as random sources of variability. Admittedly the particular representatives used in the experiment sometimes are not selected in an accepted random manner; however, the experimenter can often justify the assumption that order of selection is of little importance. Most experiments have a combination of fixed and random sources of variability: the mixed category. Often an experimenter will want to examine a given variable from both the random and fixed points of view.
1/ Presented at the Seventh Conference on the Design of Experiments, October 18-20, 1961, U. S. Army Signal Research and Development Laboratory, Fort Monmouth, New Jersey.
In this paper I will discuss only situations in which all sources of variation are essentially random. Some examples are presented on the next page. There are two experimental and operational procedures to be considered: the nested or hierarchical type and the classification type. To illustrate the basic differences between these two types, suppose we have a large number (b) of batches of some material. These are to be processed by one of a large number of more or less identical machines. Suppose material from the first batch is subdivided into $m_1$ sub-batches and each sub-batch processed by a different machine; the second batch is processed by $m_2$ machines, all different from the first $m_1$; and so forth for all batches. Hence there are $m = \sum_i m_i$ machines used for processing. The processed material from each machine is tested, assayed or inspected by a corps of technicians, a different corps for each machine. The material is subdivided into samples, each sample handled by one technician. Let us assume there are $t_{ij}$ technicians for the jth machine processing the ith batch, so that the total number of technicians is $t = \sum_{i,j} t_{ij}$. In the following example of two batches, we have assumed that each $m_i = 2$ and each $t_{ij} = 2$.
Now suppose that there are only, say, 4 processing machines and, say, 2 technicians at a given plant, but that these machines and technicians are "representative" of large populations of machines and technicians. Let us further assume that every batch is subdivided into 4 sub-batches, one for each machine. Then the product from each machine is divided into 2 parts, one part for each technician. This operation for two batches is also exemplified on the following page.

In general, the first operation will be a combination of the two types, since subsequent batches will be allocated to machines 1 and 2, 3 and 4, etc.
EXAMPLES OF RANDOM OPERATIONS

[Diagram: Nested or Hierarchical-Type Operation. Batches 1 and 2; different machines within each batch; different technicians within each machine.]

[Diagram: Classification-Type Operation. Batch 1 and Batch 2, each processed on Mach 1, Mach 2, Mach 3 and Mach 4; each product measured by Tech 1 and Tech 2.]

[Diagram: Streptomycin Experiment. Initial Incubation -> Primary Inoculation -> Secondary Inoculation -> Fermentation -> Assay.]

[Diagram: Rubber Sampling. Producer's Estate -> Day at Estate -> Bale on given day -> Sheet of Rubber from Bale -> Sample from Sheet.]

A better example of the strictly nested-type operation might be one I used for the International Statistical Institute paper: A pilot study was considered to assess the various sources of variability in the production and assay for streptomycin before conducting an experiment on the efficacy of various molds. There were five stages in this process: an initial incubation in a test tube; a primary inoculation period in a petri dish; a secondary inoculation period in another petri dish; a fermentation period in a bath; and the final assay of the amount of streptomycin produced. Newton, et al. (1951), in a study of the variability of rubber, considered the following sources of variation: producer's estates, days at a given estate, bales on a given day, sheets from a given bale, and samples from each sheet. Technicians then took measurements on the samples. In this paper I shall discuss only purely nested and classification-type operations but the same methods are applicable to the combination.
Returning to our batch problem, the final measurement made by the technician is designated as the process yield, Y. We are here interested in the variability of the Y's. Such variation is a result of variable batches, variable batch sampling and subsequent machine processing, and variable product sampling and measurement. The variation in machine processing results from machine-to-machine differences and failure of a given machine to perform in the same manner on different batches, and similarly for the variation in the measuring process. The failure of a given machine to perform similarly on different batches is called a machine-batch interaction, and similarly for the technician-batch and technician-machine interactions. We are assuming that only two-factor interactions need be considered; however, one might also be interested in a technician-machine-batch interaction. Interactions can be detected only with the classification-type operation.
It is assumed that the sources of variability act independently and additively; hence, one can write the mathematical model for the final yield as follows:

    $Y_j = \mu + \sum_{i=1}^{v} e_{ij}$,    (1)

where there are v sources of variability, the $\{e_{ij}\}$ assumed independently distributed random variables with zero means and constant variances, $\sigma_i^2$; hence, $\mu$ is the product average. The variance of the final product is $\sigma^2 = \sum_{i=1}^{v} \sigma_i^2$. It is assumed that the experimenter or manufacturer is interested in reducing the magnitude of $\sigma^2$. In order to do this, a preliminary experiment is to be set up to estimate the individual variances, $\sigma_i^2$. The estimates of the $\sigma_i^2$ will be designated as $\hat\sigma_i^2$. In studying the adequacy of these estimates, it will be assumed that the random variables (the e's) are normally distributed. Hence the observed yields are assumed normally distributed with the same mean $\mu$ and variance $\sigma^2 = \sum \sigma_i^2$. However, if there is more than one source of variation, the Y's (in general) will be correlated.
Consider the nested process with 2 batches, 2 machines per batch and 2 technicians per machine. One might represent the measurements as follows:

    [display not legible in the source]    (2)

The usual short cut notation is to renumber the technicians within each machine, the machines within each batch, etc.; i.e.,

    $Y_{111} = \mu + b_1 + m_{11} + t_{111}$,   $Y_{112} = \mu + b_1 + m_{11} + t_{112}$,
    $Y_{121} = \mu + b_1 + m_{12} + t_{121}$,   ...,   $Y_{222} = \mu + b_2 + m_{22} + t_{222}$.    (3)

The latter notation is much simpler for complicated designs and will be used here. In any case, it is easy to see that the covariance between $Y_{111}$ and $Y_{112}$ is $\sigma_b^2 + \sigma_m^2$; between $Y_{111}$ and $Y_{121}$ is $\sigma_b^2$; and between $Y_{111}$ and $Y_{222}$ is 0. Hence it is possible to write the likelihood of the sample in the familiar multivariate normal form, with a constant mean ($\mu$) and variance ($\sigma^2 = \sigma_b^2 + \sigma_m^2 + \sigma_t^2$) but with correlations of either $(\sigma_b^2 + \sigma_m^2)/\sigma^2$, $\sigma_b^2/\sigma^2$ or 0. The method of estimation would be the familiar maximum likelihood (ML).
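The covariance pattern just described can be written out mechanically. The following is a minimal sketch, not part of the original paper: the function names and the numerical variance components (4, 2 and 1) are illustrative assumptions, chosen only to make the three kinds of covariance visible.

```python
def nested_covariance(obs1, obs2, var_b, var_m, var_t):
    """Covariance of two yields in the purely nested model.

    Each observation is labelled (batch, machine within batch,
    technician within machine), as in the short cut notation Y_ijk.
    """
    (b1, m1, t1), (b2, m2, t2) = obs1, obs2
    cov = 0.0
    if b1 == b2:
        cov += var_b              # shared batch effect
        if m1 == m2:
            cov += var_m          # shared machine effect
            if t1 == t2:
                cov += var_t      # same observation: full variance
    return cov


def nested_covariance_matrix(labels, var_b, var_m, var_t):
    """Full variance-covariance matrix for a list of (i, j, k) labels."""
    return [[nested_covariance(u, v, var_b, var_m, var_t) for v in labels]
            for u in labels]


# 2 batches x 2 machines per batch x 2 technicians per machine.
# The variance components below are hypothetical values.
labels = [(i, j, k) for i in (1, 2) for j in (1, 2) for k in (1, 2)]
V = nested_covariance_matrix(labels, var_b=4.0, var_m=2.0, var_t=1.0)
```

With these assumed values, `V[0][1]` is the covariance of $Y_{111}$ and $Y_{112}$, `V[0][2]` that of $Y_{111}$ and $Y_{121}$, and `V[0][7]` that of $Y_{111}$ and $Y_{222}$.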
The model for the classification-type process might be

    $Y_{ijk} = \mu + b_i + m_j + (bm)_{ij} + t_k + (bt)_{ik} + (mt)_{jk} + e_{ijk}$,    (4)

where e may be a three-factor interaction effect or technique error or a combination of the two. In this case we have variance components $\sigma_b^2$, $\sigma_m^2$, $\sigma_{bm}^2$, $\sigma_t^2$, $\sigma_{bt}^2$, $\sigma_{mt}^2$, $\sigma_e^2$. Again if the components are assumed to be normally distributed, one can set up the normal multivariate likelihood function, which will be considerably more complicated than the nested one.
The usual estimating procedure for variance components is based on the familiar, I hope, Analysis of Variance table. For balanced designs, the analysis of variance (ANOV) and maximum likelihood (ML) estimators are essentially the same; I will not take the time to discuss this difference here. For non-balanced designs the ML equations are complicated functions of the estimators; iterative methods are usually needed to solve them. There is no unique ANOV estimating procedure for non-balanced designs. The usual procedure is based on a pooling procedure first presented by Ganguli (1941) and summarized in Anderson and Bancroft (1952). This procedure produces unbiased estimators, but of uncertain efficiency. The total sum of squares can be subdivided into orthogonal sources but there remains the problem of how to best combine these sources. One probably would prefer unbiased estimators which are linear combinations of these mean squares in the ANOV table. Hence one solution to the estimation problem would be to select that unbiased linear estimator which has the smallest variance. In all discussion which follows, a linear estimator of a variance component will refer to a linear function of the mean squares; obviously this is a quadratic function of the observations. Much theoretical work has been done on the construction of best quadratic estimators for balanced designs, but the results have been rather sterile for non-balanced designs.
2. Some Simple Examples

To illustrate the complexity of the problem, let us consider a two-stage nested process with only 8 final measurements, coming from a = 2, 3, ..., or 7 classes, with the model

    $Y_{ij} = \mu + a_i + b_{ij}$,    $i = 1, 2, \ldots, a$.    (5)

The object of the investigation is to obtain a minimum variance unbiased linear estimator of $\sigma_a^2$. One possible design has a = 4 classes and 2 samples per class. The analysis of variance would be as follows ($\rho = \sigma_a^2/\sigma_b^2$): 1/

Table 1. Analysis of Variance for 4 Classes (N = 8)

Source of         Degrees of      Mean           Expected Value of
Variation (SV)    Freedom (DF)    Square (MS)    Mean Square (EMS)
Classes (A)       3               MSA            $\sigma_b^2 + 2\sigma_a^2 = \sigma_b^2(1 + 2\rho)$
Samples (B)       4               MSB            $\sigma_b^2$

The ANOV estimates of $\sigma_a^2$ and $\sigma_b^2$ are

    $\hat\sigma_a^2 = (MSA - MSB)/2$;   $\hat\sigma_b^2 = MSB$.    (6)

1/ It is assumed that the reader is familiar with ANOV procedures. If not, reference is made to Anderson and Bancroft (1952, Chapter 22).
Under the normality assumption, MSA and MSB are distributed as multiples of independent $\chi^2$-variates. In fact $MSA = \chi_1^2(\sigma_b^2 + 2\sigma_a^2)/3$ and $MSB = \chi_2^2\sigma_b^2/4$, where $\chi_1^2$ and $\chi_2^2$ are independent $\chi^2$-variates with 3 and 4 respective degrees of freedom. Hence

    $Var(\hat\sigma_a^2) = \frac{1}{4}\left[\frac{2(\sigma_b^2 + 2\sigma_a^2)^2}{3} + \frac{2\sigma_b^4}{4}\right] = 2\sigma_b^4\left[7/48 + \rho/3 + \rho^2/3\right]$.    (7)

In comparing designs and estimators, we will consider

    $V_a = Var(\hat\sigma_a^2)/(2\sigma_b^4) = 7/48 + \rho/3 + \rho^2/3$.    (8)

    $Var(\hat\sigma_b^2) = 2\sigma_b^4/4$;   $V_b = 1/4$.    (9)
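The step from the chi-square representation to equations (7) to (9) uses only the fact that an independent mean square with DF degrees of freedom has variance 2(EMS)^2/DF. As a hedged illustration (the function names are ours, not the paper's), the arithmetic for the 4-class design can be checked as follows:

```python
def var_linear_combination(coefs, ems, dfs):
    """Variance of sum(c_h * MS_h) for independent mean squares.

    Uses Var(MS_h) = 2 * EMS_h**2 / DF_h, which holds when each MS_h is a
    multiple of a chi-square variate divided by its degrees of freedom.
    """
    return sum(c * c * 2.0 * e * e / d for c, e, d in zip(coefs, ems, dfs))


def v_a_four_classes(rho, sigma_b2=1.0):
    """V_a = Var(sigma_a^2 hat) / (2 sigma_b^4) for a = 4 classes, 2 samples each."""
    ems = [sigma_b2 * (1.0 + 2.0 * rho), sigma_b2]   # EMS for Classes, Samples
    variance = var_linear_combination([0.5, -0.5], ems, [3, 4])
    return variance / (2.0 * sigma_b2 ** 2)


def v_b_four_classes(sigma_b2=1.0):
    """V_b = Var(MSB) / (2 sigma_b^4) with 4 degrees of freedom."""
    return var_linear_combination([1.0], [sigma_b2], [4]) / (2.0 * sigma_b2 ** 2)
```

For example, `v_a_four_classes(0.5)` agrees with 7/48 + 0.5/3 + 0.25/3, and the result does not depend on the value assumed for `sigma_b2`.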
Another plan would have a = 2 classes and 4 samples per class, with the results given in Table 2 and equations (10). A comparison of these two designs shows that the one with 2 classes will be superior (in estimating $\sigma_a^2$) only when $\rho$ is quite small. But we could consider 3, 5, 6 or 7 classes. Unfortunately all of these will involve non-balanced designs. As an example, consider the design with 7 classes, one sample per class for 6 classes and 2 samples from the seventh class; the ANOV is in Table 3. $A_1$ represents the variation among the 6 classes with one sample per class. $A_2$ represents the comparison of the sum of the first 6 observations and 3 times the sum of the two observations from the seventh class. In estimating $\sigma_a^2$, one must decide how to weight $MSA_1$ and $MSA_2$. If these are weighted directly as their DF, we have the results in equations (11).

Using estimators of this type, the values of $V_a$ and $V_b$ have been computed when a = 2, 3, ..., or 7 classes are used, for $\rho$ = .05, .1, .2, .5, 1, 2, 5 and 10 (Table 4). As expected, one requires more classes to estimate $\sigma_a^2$ efficiently as $\rho$ increases; at the same time, the variance of $\hat\sigma_b^2$ increases. Hence for large $\rho$, a compromise is necessary if one wishes good estimates of both $\sigma_a^2$ and $\sigma_b^2$. This will be discussed in more detail later.

Table 2. Analysis of Variance for 2 Classes

SV    DF    MS     EMS
A     1     MSA    $\sigma_b^2 + 4\sigma_a^2 = \sigma_b^2(1 + 4\rho)$
B     6     MSB    $\sigma_b^2$

    $\hat\sigma_a^2 = (MSA - MSB)/4$;   $\hat\sigma_b^2 = MSB$;
    $V_a = 7/96 + \rho/2 + \rho^2$;   $V_b = 1/6$.    (10)
**************************

Table 3. Analysis of Variance for 7 Classes

SV     DF    MS        EMS
A_1    5     MSA_1     $\sigma_b^2 + \sigma_a^2 = \sigma_b^2(1 + \rho)$
A_2    1     MSA_2     $\sigma_b^2 + 1.75\sigma_a^2 = \sigma_b^2(1 + 1.75\rho)$
B      1     MSB       $\sigma_b^2$

    $\hat\sigma_a^2 = (5\,MSA_1 + MSA_2 - 6\,MSB)/6.75$;   $\hat\sigma_b^2 = MSB$;
    $V_a = .922 + .2963\rho + .17695\rho^2$;   $V_b = 1/(8-a) = 1$.    (11)

**************************

Table 4. Values of $V_a$ and $V_b$ for Various Designs, N = 8 a/

No. Classes                        V_a for given ρ
    (a)       .05      .10      .2       .5       1.0      2.0      5.0      10.0      V_b
     2        .100*    .133*    .213     .573     1.57     5.07     27.6     105       .167
     3        .121     .145     .198*    .419     .993     2.90     14.8     54.9      .200
     4        .163     .182     .226     .396*    .812*    2.15     10.1     36.8      .250
     5        .255     .274     .314     .467     .829     1.96     8.60     30.5      .333
     6        .430     .447     .485     .623     .941     1.90*    7.43     25.4      .500
     7        .937     .953     .988     1.11     1.40     2.22     6.83*    21.6*     1.000

a/ $Var(\hat\sigma_a^2) = 2V_a\sigma_b^4$; $Var(\hat\sigma_b^2) = 2V_b\sigma_b^4$; $\rho = \sigma_a^2/\sigma_b^2$. Starred values (underlined in the original) are minima for a given $\rho$.
3. Selection of Best "Linear" Estimators

As stated earlier, for non-balanced designs the choice of the best, or even the best linear, unbiased estimator is not obvious. We might use an iterative weighted least squares procedure, using the mean squares as the dependent variable and the variance components as regression coefficients. Since the mean squares are multiples of $\chi^2$-variates, they have unequal variances, both due to EMS and DF; i.e.,

    $Var(MS) = 2(EMS)^2/DF$.

For the above 7-class problem the least squares equations are given in equations (12), with limiting values for $\rho = 0$ and $\rho \to \infty$ in equations (13) and (15). In the latter two cases, explicit estimators of $\sigma_a^2$ and $\sigma_b^2$ are available. These and their variances are presented in equations (14) and (16). For the general case (12), a suggested procedure is to use a preliminary estimate of $\rho$, such as $\hat\sigma_a^2/\hat\sigma_b^2$ from the pooled ANOV results; then re-solve for $\hat\sigma_a^2$ and $\hat\sigma_b^2$ using the least squares equations; continue the iteration until two successive sets of estimates agree. Unfortunately, it is almost impossible to obtain the variance of the final estimate of $\sigma_a^2$; hence, a comparison of designs becomes very difficult. Very little research has been pursued along these lines; the ideas are merely tossed out here for consideration.

A comparison of the limiting estimators with those of Table 4 is presented in Table 5 for $\rho$ = .05, .5, 1 and 10. The first two estimators are about the same for all $\rho$ but the third one is good only for very small $\rho$. It appears that one does not lose much information by use of MSB to estimate $\sigma_b^2$ and the pooled MSA to estimate $\sigma_a^2$, except possibly when $\rho$ is very small. In subsequent discussions, it will be assumed that the estimators will be of the type used in Section 2.
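Least Squares Equations for the 7-Class Problem

Write $A_1 = MSA_1$, $A_2 = MSA_2$, $B = MSB$, $w_1 = 5/(1+\rho)^2$ and $w_2 = 1/(1+1.75\rho)^2$.

    $(w_1 + w_2 + 1)\hat\sigma_b^2 + (w_1 + 1.75w_2)\hat\sigma_a^2 = w_1A_1 + w_2A_2 + B$
    $(w_1 + 1.75w_2)\hat\sigma_b^2 + (w_1 + 1.75^2w_2)\hat\sigma_a^2 = w_1A_1 + 1.75w_2A_2$    (12)

$\rho = 0$:

    $7\hat\sigma_b^2 + 6.75\hat\sigma_a^2 = 5A_1 + A_2 + B$
    $6.75\hat\sigma_b^2 + 8.0625\hat\sigma_a^2 = 5A_1 + 1.75A_2$    (13)

    $\hat\sigma_{a,0}^2 = (1.25A_1 + 5.5A_2 - 6.75B)/10.875$;   $\hat\sigma_{b,0}^2 = (6.5625A_1 - 3.75A_2 + 8.0625B)/10.875$;
    $V_{a,0} = .6437 + .9005\rho + .7860\rho^2$    (14)

$\rho = \infty$:

    $\hat\sigma_{b,\infty}^2 = B$;   $6\hat\sigma_a^2 + (5 + 1/1.75)\hat\sigma_b^2 = 5A_1 + A_2/1.75$    (15)

    $\hat\sigma_{a,\infty}^2 = (35A_1 + 4A_2 - 39B)/42$;
    $V_{a,\infty} = 1.0102 + .3095\rho + .1667\rho^2$;   $V_{b,\infty} = 1$    (16)

**************************

Table 5. Comparison of Variances of Estimators (N = 8)

                        ρ
                 .05     .5      1       10
V_a              .94     1.11    1.40    21.6
V_{a,∞}          1.03    1.21    1.49    20.8
V_{a,0}          .69     1.29    2.33    88.2
V_b              1.00    1.00    1.00    1.00
V_{b,0}          .77     .80     .87     1.13
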
These methods could be extended to any design with any number of variance components, provided independent mean squares can be computed. In general let $M_h$ represent the h-th mean square, with $n_h$ D.F. and $E(M_h) = \sum_{i=1}^{v} k_{hi}\sigma_i^2$, where there are v variance components (some of the $k_{hi}$ will be zero; none will be negative). The i-th normal equation will be

    $\sum_{j=1}^{v}\left[\sum_h w_h k_{hi} k_{hj}\right]\hat\sigma_j^2 = \sum_h w_h k_{hi} M_h$,    (17)

where $w_h = n_h/(E M_h)^2$. One can simplify the weights ($w_h$) by using ratios of variance components, as above.
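The iterative procedure suggested in this section can be sketched for the 7-class design of Table 3. This is an illustration only, under stated assumptions: the function names are ours, the weights are the $w_h$ of equation (17) with the common factor $\sigma_b^4$ dropped, a negative estimate of $\sigma_a^2$ is truncated to zero when updating $\rho$ (a choice not discussed in the paper), and no claim is made that the iteration always converges.

```python
# Coefficients (k_b, k_a) of (sigma_b^2, sigma_a^2) in the expected mean
# squares of Table 3, and the degrees of freedom of A1, A2 and B.
K = [(1.0, 1.0), (1.0, 1.75), (1.0, 0.0)]
DF = [5, 1, 1]


def wls_step(mean_squares, rho):
    """Solve the two normal equations for a given working value of rho.

    mean_squares = (MSA1, MSA2, MSB). Returns (sigma_b^2 hat, sigma_a^2 hat).
    """
    weights = [df / (kb + ka * rho) ** 2 for (kb, ka), df in zip(K, DF)]
    s_bb = sum(w * kb * kb for w, (kb, ka) in zip(weights, K))
    s_ab = sum(w * kb * ka for w, (kb, ka) in zip(weights, K))
    s_aa = sum(w * ka * ka for w, (kb, ka) in zip(weights, K))
    t_b = sum(w * kb * m for w, (kb, ka), m in zip(weights, K, mean_squares))
    t_a = sum(w * ka * m for w, (kb, ka), m in zip(weights, K, mean_squares))
    det = s_bb * s_aa - s_ab ** 2
    sigma_b2 = (s_aa * t_b - s_ab * t_a) / det
    sigma_a2 = (s_bb * t_a - s_ab * t_b) / det
    return sigma_b2, sigma_a2


def iterate(mean_squares, rho=0.0, tol=1e-10, max_iter=500):
    """Repeat wls_step, updating rho, until successive values of rho agree."""
    for _ in range(max_iter):
        sigma_b2, sigma_a2 = wls_step(mean_squares, rho)
        new_rho = max(sigma_a2, 0.0) / sigma_b2 if sigma_b2 > 0 else 0.0
        converged = abs(new_rho - rho) < tol
        rho = new_rho
        if converged:
            break
    return sigma_b2, sigma_a2, rho
```

With `rho = 0` the step reduces to the two equations with coefficients 7, 6.75 and 8.0625 that arise when all weights equal the degrees of freedom, and for very large `rho` the estimate of $\sigma_a^2$ approaches $(35A_1 + 4A_2 - 39B)/42$.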
4. Optimal Designs for a Two-Stage Nested Process

P. P. Crump (1954) considered these two problems for the two-stage nested process:

(1) Given N, the total sample size, find the best design for use in estimating $\mu$, $\sigma_b^2$, $\sigma_a^2$ or $\rho = \sigma_a^2/\sigma_b^2$, using the pooled ANOV considered in Section 2.

(2) For certain selected designs, compare some alternative estimators of $\mu$ and $\sigma_a^2$.

Only (1) will be discussed at this time.

The best design for estimating $\mu$ is a = N classes with 1 observation per class and for $\sigma_b^2$ is a = 1 class of N observations; the results for $\sigma_a^2$ and $\rho$ are intermediate. Given a classes, Crump proved that for both $\sigma_a^2$ and $\rho$, the optimum allocation of N was to have r classes with p+1 observations per class and a - r classes with p observations per class, where N = ap + r, r (an integer) < a. However, the optimal a would be different for estimating $\sigma_a^2$ and $\rho$. It could not be proven rigorously but it was conjectured (and shown to be correct for all computed examples) that the following procedure should be followed to determine the optimal a:
(1) If a did not have to be an integer, it has been shown by several authors that the optimal a (the one giving an unbiased minimum variance estimator based on the pooled ANOV) is

    i) for $\sigma_a^2$:   $a = a_1 = \dfrac{N(N\rho + 2)}{N(\rho + 1) + 1}$;
    ii) for $\rho$:   $a = a_2 = 1 + \dfrac{(N-5)(N\rho + 1)}{2N\rho + N - 3}$.    (18)

(2) Since $a_1$ and $a_2$ will not, in general, be integers, select the integers just larger and just smaller than $a_1$ (or $a_2$); compute the exact variance for each; then whichever variance is smaller, compute the variance for the next adjacent integral value of a; continue this process until a smallest variance is obtained. In all examples computed, this was found for one of the two original integers. These calculations usually can be omitted since the variance profile is quite flat near the optimum; hence, I suggest using for a the integer closest to $a_1$ (or $a_2$).

(3) The true variances of $\hat\sigma_a^2$ and $\hat\rho$ for a given a (N = ap + r) are:

    $Var(\hat\sigma_a^2) = 2\sigma_b^4(A\rho^2 + B\rho + C)$;   $Var(\hat\rho) = 2(A'\rho^2 + B'\rho + C')$,    (19)

where

    $A = \dfrac{N^2S_2 - 2NS_3 + S_2^2}{(N^2 - S_2)^2}$;   $B = \dfrac{2N}{N^2 - S_2}$;   $C = \dfrac{N^2(N-1)(a-1)}{(N-a)(N^2 - S_2)^2}$;

    $A' = \dfrac{(N-a-2)A + 1}{N-a-4}$;   $B' = \dfrac{(N-3)B}{N-a-4}$;   $C' = \dfrac{(N-a)(N-3)C}{(N-1)(N-a-4)}$;

    $S_2 = Np + r(p+1)$;   $S_3 = Np^2 + r(p+1)(2p+1)$.

Crump presents values of A, B, C, A', B', and C' for selected values of a for N = 10, 20, 30, and 100.
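As a check on how the coefficients in (19) are used, the following sketch evaluates them for a given N and a, taking the allocation with r classes of p+1 observations and a - r classes of p observations. The function names are illustrative assumptions; the formulas are those of (19) as read here.

```python
def crump_coefficients(N, a):
    """A, B, C of equation (19) for N observations in a classes (1 < a < N)."""
    if not 1 < a < N:
        raise ValueError("need 1 < a < N")
    p, r = divmod(N, a)                       # N = a*p + r, 0 <= r < a
    S2 = N * p + r * (p + 1)                  # sum of squared class sizes
    S3 = N * p * p + r * (p + 1) * (2 * p + 1)  # sum of cubed class sizes
    D = N * N - S2
    A = (N * N * S2 - 2 * N * S3 + S2 * S2) / D ** 2
    B = 2 * N / D
    C = N * N * (N - 1) * (a - 1) / ((N - a) * D ** 2)
    return A, B, C


def v_sigma_a(N, a, rho):
    """Var(sigma_a^2 hat) / (2 sigma_b^4)."""
    A, B, C = crump_coefficients(N, a)
    return A * rho ** 2 + B * rho + C


def v_rho(N, a, rho):
    """Var(rho hat) / 2; requires N - a - 4 > 0."""
    d = N - a - 4
    if d <= 0:
        raise ValueError("need N - a - 4 > 0")
    A, B, C = crump_coefficients(N, a)
    A1 = ((N - a - 2) * A + 1) / d
    B1 = (N - 3) * B / d
    C1 = (N - a) * (N - 3) * C / ((N - 1) * d)
    return A1 * rho ** 2 + B1 * rho + C1
```

For N = 8 and a = 7 the three coefficients returned by `crump_coefficients` agree with the coefficients of $V_a$ given for the 7-class design in Section 2.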
Crump compared the minimum variances for $\hat\sigma_a^2$ and $\hat\rho$ following the above procedure with the theoretical minimum disregarding the fact that a and the number of observations per class must be integers. Some of these efficiency ratios of variances ($E_0$) are given in Table 6 for $\hat\sigma_a^2$.

Table 6. Efficiency Ratios $E_0$ for $\hat\sigma_a^2$ a/

                     ρ
  N        0        .50      1.00     2.00
  10       .988     .981     .980     .940
  20       .997     .987     .995     .943
  100      1.000    .998     1.000    .947

a/ $E_0$ = (min. hypothetical variance)/(min. realizable variance).

These results indicate that the procedure outlined above must be quite good.

Recognizing that the allocation which produces a good estimator of one parameter may be poor for estimating some other parameter, Crump computed the efficiency ratio ($E_f$) for all four parameters, for various allocation plans. These are presented in Table 7. This table presents (for each parameter) the ratio of the variance for the optimum allocation to estimate that parameter to the variance for the given allocation. For example with N = 10 and $\rho$ = 1, the best allocation plans for $\hat\mu$, $\hat\sigma_b^2$, $\hat\sigma_a^2$ and $\hat\rho$ are, respectively, a = 10, 1, 5 and 3. The minimum respective variances are $\sigma_b^2/5$; $2\sigma_b^4/9$; $2\sigma_b^4(.6125)$; 2(1.98775). The respective variances for a = 4 are $\sigma_b^2(.36)$; $2\sigma_b^4/6$; $2\sigma_b^4(.6950)$; 2(2.32285). The ratios of the minimum variances to those for a = 4 give the respective efficiency ratios, $E_f$: 5/9 = .56; 2/3 = .67; .6125/.6950 = .88; 1.98775/2.32285 = .86.
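The first step of the procedure in this section (non-integer optima, then the two neighbouring integers) can be sketched for the N = 10, $\rho$ = 1 example. The function names are assumptions of this sketch, and the expressions for $a_1$ and $a_2$ are those of equations (18) as read here; the exact variances of (19) would still be needed to choose between the two candidate integers.

```python
import math


def continuous_optimum_classes(N, rho):
    """Non-integer optimal number of classes: a1 for sigma_a^2, a2 for rho."""
    a1 = N * (N * rho + 2) / (N * (rho + 1) + 1)
    a2 = 1 + (N - 5) * (N * rho + 1) / (2 * N * rho + N - 3)
    return a1, a2


def integer_candidates(x):
    """The integers just smaller and just larger than x."""
    return math.floor(x), math.ceil(x)


a1, a2 = continuous_optimum_classes(10, 1.0)
candidates_sigma_a = integer_candidates(a1)   # to be compared by exact variance
candidates_rho = integer_candidates(a2)
```

Here `candidates_sigma_a` contains a = 5 and `candidates_rho` contains a = 3, the best allocation plans quoted in the example for $\hat\sigma_a^2$ and $\hat\rho$.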
Table 7. Efficiency Ratios ($E_f$) for $\hat\mu$, $\hat\sigma_b^2$, $\hat\sigma_a^2$, $\hat\rho$ a/

  N      ρ      a     E_f(μ̂)   E_f(σ̂_b²)   E_f(σ̂_a²)   E_f(ρ̂)
  10     0.5    2     .43      .89         .61         .88
                3     .56      .78         .89         1.00
                4     .65      .67         .99         .78
                5     .75      .56         1.00        .43
         1.0    2     .33      .89         .42         .79
                3     .45      .78         .71         1.00
                4     .56      .67         .88         .86
                5     .67      .56         1.00        .50
  20     0.5    5     .50      .79         .90         1.00
                7     .61      .68         1.00        .95
                8     .65      .63         .99         .87
                10    .75      .53         .96         .69
         1.0    5     .40      .79         .70         .96
                7     .51      .68         .87         1.00
                8     .56      .63         .92         .96
                10    .67      .53         1.00        .82
                11    .70      .47         .98         .69
         2.0    5     .33      .79         .54         .90
                7     .44      .68         .72         1.00
                10    .60      .53         .95         .89
                11    .63      .47         .97         .82
                14    .71      .32         1.00        .36
  100    0.5    20    .43      .81         .88         .97
                25    .50      .76         .96         1.00
                33    .60      .68         1.00        .96
                35    .61      .66         .99         .94
                50    .75      .51         .91         .74
         1.0    25    .40      .76         .77         .96
                33    .49      .68         .89         1.00
                40    .56      .61         .94         .97
                50    .67      .51         1.00        .88
                60    .71      .40         .95         .72
         2.0    25    .33      .76         .61         .90
                38    .47      .63         .81         1.00
                50    .60      .51         .97         .97
                66    .69      .34         1.00        .72
                75    .75      .25         .98         .54

[A further block of entries, apparently for N = 10, ρ = 2.0, is not legible in the source.]

a/ $E_f$ = (Variance for optimum allocation at optimum a)/(Variance for optimum allocation at given a).

An examination of Table 7 indicates that large gains can be achieved in the efficiency of estimates of $\sigma_a^2$ and $\rho$ by suitable choice of the sampling plan, for fixed total number of observations, N. For N = 20, $\rho$ = 1.0, one should use 7, 8, 9 or 10 classes depending on the relative importance of the two parameters, assuming $\mu$ and $\sigma_b^2$ were of secondary importance. Similarly if $\rho$ = 2.0, one should use between 7 and 14 classes. And if $\rho$ = 0.5, use 5, 6, or 7 classes. It is important to note that if one thought that 0.5 < $\rho$ < 2.0, he might logically choose 7, 8 or 9 classes without departing too far from the optimum. For N = 100 and $\rho$ somewhere between 0.5 and 2.0, he could choose around 40 classes without much risk of departing far from the optimum.

One can use the results in Table 7 to study the effects of using the incorrect $\rho$ in making his allocations. Some of these comparisons plus others regarding the estimation of $\sigma_a^2$ have been made in Table 8. In general it appears that if the true $\rho$ is 1 or less, one will not lose more than 10% in efficiency even if he uses $2\rho$ or $\rho/2$. The loss is considerably less when $\rho$ is as large as 2.

Table 8. Effect of Incorrect Estimate of ρ on the Efficiency of Estimates of $\hat\sigma_a^2$

  N      True ρ    Optimal No.     Estimated ρ    No. Classes    Efficiency
                   Classes (a)                    Used (a)
  20     0.5       7               0.22           5              .903
                                   0.31           6              .964
                                   0.68           9              .985
                                   0.85           10             .960
                                   1.06           11             .897
         1.0       10              0.41           7              .869
                                   0.53           8              .918
                                   1.32           12             .957
                                   1.66           13             .921
                                   2.12           14             .869
         2.0       14              0.41           7              .725
                                   0.68           9              .868
                                   0.85           10             .947
                                   1.06           11             .969
                                   3.70           16             .951
  100    0.5       33              0.14           14             .739
                                   0.26           22             .911
                                   0.59           38             .986
                                   0.97           50             .912
                                   1.46           60             .775
         1.0       50              0.31           25             .772
                                   0.47           33             .893
                                   0.64           40             .944
                                   1.46           60             .946
                                   1.90           66             .894
         2.0       66              0.64           40             .832
                                   0.97           50             .969
                                   2.95           75             .976
                                   3.94           80             .935

5. Optimal Designs for a Two-Way Classification Model

5.1 Introduction. D. W. Gaylor (1960) considered methods of sampling, under certain restrictions, to minimize the variance of certain estimators of variance components for a two-way crossed classification

    $y_{ijk} = \mu + r_i + c_j + (rc)_{ij} + s_{ijk}$,    $i = 1, 2, \ldots, r \le N$;   $j = 1, 2, \ldots, c \le N$,    (20)

where the effects are assumed normally and independently distributed with zero means and respective variances $\sigma_r^2$, $\sigma_c^2$, $\sigma_{rc}^2$ and $\sigma_s^2$, and $n_{ij}$ observations are obtained in the (ij) cell. It is assumed that a total of $N = \sum_i\sum_j n_{ij}$ samples are allocated to the rc cells. The problem is to make the allocation so that the variances of the estimates of the variance components will be as small as possible.
This again poses the problem of deciding which parameters are most
important.
Gaylor studied optimal allocations for individual components and
for certain combinations.
Consider any quadratic estimator

    $Q_i = y' M_i y$,    (21)

where y is the (N x 1) vector of observations and $M_i$ an N x N matrix of coefficients. In vector form, we can write

    $y\,(N \times 1) = \mu\,(N \times 1) + e\,(N \times 1)$,    (22)

where e is the error or random effects vector. The variance-covariance matrix of the e's is

    $V = E(e\,e')$.    (23)

The diagonal elements of V will be

    $\sigma_r^2 + \sigma_c^2 + \sigma_{rc}^2 + \sigma_s^2$.    (24)

The off-diagonal elements will be

    $\sigma_r^2$,  $\sigma_c^2$,  $\sigma_r^2 + \sigma_c^2 + \sigma_{rc}^2$  or  0,    (25)

depending on whether the two corresponding observations are in the same row but different columns, the same column but different rows, the same row and column or in different rows and columns. It is easy to show that, for unbiased estimators,

    $E(Q_i) = \mathrm{tr}(VM_i) = \sigma_i^2$;   $Var(Q_i) = 2\,\mathrm{tr}[(VM_i)^2]$,    (26)

where $\sigma_i^2$ is the variance component to be estimated and "tr" refers to the trace of the matrix. The lower bound to the variance is

    [expression not legible in the source]    (27)

For the following cases, this lower bound can be realized; for others, it usually is not possible.
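The structure of V in (24) and (25) and the trace formulas in (26) lend themselves to a direct numerical check. The sketch below is an illustration under stated assumptions: the helper names are ours, the variance components are hypothetical, and the matrix M is the one giving the usual sum of squared deviations about the mean divided by N - 1.

```python
def build_v(cells, var_r, var_c, var_rc, var_s):
    """Variance-covariance matrix V; cells[i] = (row, column) of observation i."""
    n = len(cells)
    V = [[0.0] * n for _ in range(n)]
    for i, (ri, ci) in enumerate(cells):
        for j, (rj, cj) in enumerate(cells):
            if i == j:
                V[i][j] = var_r + var_c + var_rc + var_s
            else:
                if ri == rj:
                    V[i][j] += var_r          # same row
                if ci == cj:
                    V[i][j] += var_c          # same column
                if ri == rj and ci == cj:
                    V[i][j] += var_rc         # same cell
    return V


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def trace(A):
    return sum(A[i][i] for i in range(len(A)))


def moments_of_quadratic(V, M):
    """E(Q) = tr(VM) and Var(Q) = 2 tr[(VM)^2] for Q = y'My, as in (26)."""
    VM = matmul(V, M)
    return trace(VM), 2.0 * trace(matmul(VM, VM))


def centering_estimator(n):
    """M such that y'My = sum (y - ybar)^2 / (n - 1)."""
    return [[((1.0 if i == j else 0.0) - 1.0 / n) / (n - 1) for j in range(n)]
            for i in range(n)]


# Four observations, each from a different row and column (hypothetical values).
cells = [(0, 0), (1, 1), (2, 2), (3, 3)]
V = build_v(cells, var_r=1.0, var_c=2.0, var_rc=0.5, var_s=0.5)
mean_q, var_q = moments_of_quadratic(V, centering_estimator(len(cells)))
```

In this design V is diagonal, so `mean_q` equals the total variance 4 and `var_q` equals 2(4^2)/(N - 1) with N = 4.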
    * each observation from a different row and column

The estimator in each case is $\sum (y - \bar y)^2/(N-1)$. The last design is also the optimal design for estimating $\mu$. The variance of $\bar y$ is then

    $(\sigma_s^2 + \sigma_{rc}^2 + \sigma_r^2 + \sigma_c^2)/N$.    (29)

One special case is of importance. If $\sigma_{rc}^2 = 0$, the optimal design for estimating $\sigma_r^2$ or $\rho_r = \sigma_r^2/\sigma_s^2$ consists of one column; in this case the results by Crump apply. Similarly, if only $\sigma_c^2$ or $\rho_c$ is of interest, use one row and apply Crump's results.
5.2 Estimation of $\sigma_r^2$ or $\rho_r$ when $\sigma_{rc}^2 > 0$. It was not possible to develop a completely general class of designs to estimate $\sigma_r^2$ or $\rho_r = \sigma_r^2/\sigma^2$, where $\sigma^2 = \sigma_s^2 + \sigma_{rc}^2$, which could be proven optimal for all situations. It was first shown that if the design were limited to a class in which each $n_{ij}$ = 0 or n (an integer) observations per cell, n should equal 1. Hence each cell should either be empty or contain one observation. The sums of squares used for estimating $\sigma_r^2$ and $\rho_r$ were those given in an ANOV table based on the method of fitting constants (Table 9). The estimators of $\sigma_r^2$ and $\rho_r$ and the variances of these estimators are given in equations (30) and (31). These results are based on the fact that (r-1)R* can be subdivided into r-1 independent quantities.
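Table 9. Analysis of Variance for 2-Way Classification Data

SV                                            DF           MS     EMS
Columns                                       c-1          C      $\sigma_s^2 + \sigma_{rc}^2 + c_1\sigma_r^2 + r_0\sigma_c^2$
Rows (adjusted for cols.)                     r-1          R*     $\sigma_s^2 + \sigma_{rc}^2 + c_0\sigma_r^2$
Interaction (adjusted for rows and cols.)     N-r-c+1      I*     $\sigma_s^2 + \sigma_{rc}^2$
Total                                         N-1

    $\hat\sigma_r^2 = (R^* - I^*)/c_0$;    (30a)

    $\hat\rho_r = \left[\dfrac{N-r-c-1}{N-r-c+1}\,F - 1\right]\Big/ c_0$,    (30b)

where $c_0 = (N-c)/(r-1)$ and $F = R^*/I^*$.

    $Var(\hat\sigma_r^2) = \dfrac{2\sigma^4}{c_0^2}\left[\dfrac{(r-1)(1 + c_0\rho_r)^2 + \rho_r^2\sum(d_i - \bar d)^2}{(r-1)^2} + \dfrac{1}{m}\right]$;    (31a)

    $Var(\hat\rho_r) = \dfrac{2\left[(r-1)(1 + c_0\rho_r)^2(m + r - 3) + (m-2)\rho_r^2\sum(d_i - \bar d)^2\right]}{c_0^2(r-1)^2(m-4)}$,    (31b)

where m = N-r-c+1 (> 4 for $\hat\rho_r$),

    $(r-1)R^* = \sigma^2\sum_{i=1}^{r-1}(1 + d_i\rho_r)\chi_i^2$,   and   $\sigma^2 = \sigma_s^2 + \sigma_{rc}^2$.

**************************

For the balanced design with r rows and c columns (N = rc), these reduce to

    $Var(\hat\sigma_r^2) = \dfrac{2\sigma^4}{r-1}\left[\rho_r^2 + \dfrac{2\rho_r}{c} + \dfrac{1}{c(c-1)}\right]$;    (32a)

    $Var(\hat\rho_r) = \dfrac{2(1 + c\rho_r)^2(rc - c - 2)}{c^2(r-1)(rc - r - c - 3)}$.    (32b)
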
All statements refer to the above estimators. Admittedly other estimators might be more efficient for a given design.

If for a given N and r, N is an integral multiple of r, N = rc, then the best design with $n_{ij}$ = 0 or n to estimate $\sigma_r^2$ or $\rho_r$ consists of r rows and c columns with one observation per cell. Graybill and Wortham (1956) show that the resultant $\hat\sigma_r^2$ is a minimum variance unbiased quadratic estimator. In this case in Table 9, $c_1 = 0$, $c_0 = c$ and $r_0 = r$, and $d_i = \bar d = c$. The variances of the estimators are given in equations (32).

In general N/r will not be an integer. Suppose N = r(k-1) + s, 0 < s < r. For a given r a design of the type $n_{ij}$ = 0 or 1 which minimizes $Var(\hat\sigma_r^2)$ based on the above ANOV table consists of r rows by k-1 columns (using only N-s observations) or r rows by k-1 columns plus one column with s of the r rows. The variances are given in equations (33). This leads to the interesting result that a balanced design may lead to a smaller variance than an unbalanced design with more observations; this situation arises when $\rho_r$ is large and indicates that presumably the estimators used are not the best for the unbalanced situation.
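The comparison between the smaller balanced design and the unbalanced design using all N observations can be illustrated numerically. The sketch below is hedged: the function names are ours, the two expressions are the forms of $Var(\hat\sigma_r^2)/(2\sigma^4)$ in (33a) as read here, and N = 30, r = 7 is just one convenient case (k - 1 = 4 full columns, s = 2).

```python
def v_sigma_r_balanced(r, c, rho):
    """Var(sigma_r^2 hat)/(2 sigma^4): r rows by c columns, one observation per cell."""
    return (rho ** 2 + 2 * rho / c + 1 / (c * (c - 1))) / (r - 1)


def v_sigma_r_unbalanced(N, r, rho):
    """Var(sigma_r^2 hat)/(2 sigma^4) using all N = r(k-1) + s observations, 0 < s < r."""
    full_cols, s = divmod(N, r)
    if s == 0:
        raise ValueError("N is a multiple of r; use the balanced formula")
    k = full_cols + 1
    return (rho ** 2 / (r - 1)
            + rho ** 2 * (s - 1) * (r - s) / ((N - k) ** 2 * (r - 1))
            + 2 * rho / (N - k)
            + (r - 1) / ((N - k) * (N - k - r + 1)))


N, r = 30, 7
small_rho = (v_sigma_r_balanced(r, 4, 1.0), v_sigma_r_unbalanced(N, r, 1.0))
large_rho = (v_sigma_r_balanced(r, 4, 16.0), v_sigma_r_unbalanced(N, r, 16.0))
```

Under these assumptions the unbalanced design has the smaller variance at $\rho_r$ = 1, while at $\rho_r$ = 16 the balanced design with only 28 observations has the smaller variance.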
In order to find the optimal value of r, one should minimize (33a) or (33b). This is quite complicated because k and s are functions of r. Because of the near balance of the design, a good approximation to (33) is available by replacing each $d_i$ in (31) by $\bar d = c_0 = (N-c)/(r-1)$ and c by $c_0 + 1/2$ (since $c_0 \le c < c_0 + 1$). These approximate variances are given in equations (34). Setting the derivative of (34a) with respect to $c_0$ equal to 0, we have
If N = r(k-1) + s, 0 < s < r:

    $\dfrac{Var(\hat\sigma_r^2)}{2\sigma^4} = \left[\rho_r^2 + \dfrac{2\rho_r}{k-1} + \dfrac{1}{(k-1)(k-2)}\right]\Big/(r-1)$,   using k-1 cols.;

    $\dfrac{Var(\hat\sigma_r^2)}{2\sigma^4} = \dfrac{\rho_r^2}{r-1} + \dfrac{\rho_r^2(s-1)(r-s)}{(N-k)^2(r-1)} + \dfrac{2\rho_r}{N-k} + \dfrac{r-1}{(N-k)(N-k-r+1)}$,   using k cols.    (33a)

    $\dfrac{Var(\hat\rho_r)}{2} = \dfrac{[1 + (k-1)\rho_r]^2(rk - r - k - 1)}{(k-1)^2(r-1)(rk - 2r - k - 2)}$,   using k-1 cols.;

    $\dfrac{Var(\hat\rho_r)}{2} = \dfrac{(N-r-k-1)(s-1)(r-s)\rho_r^2 + [(r-1) + (N-k)\rho_r]^2(N-k-2)}{(N-k)^2(r-1)(N-r-k-3)}$,   using k cols.    (33b)

Using all N observations:

    [the approximate variances (34a), (34b) and the resulting expressions (35), (36a) for $\tilde c_0$ are not legible in the source]

    for $\hat\rho_r$:   $\tilde c_0 \to 2 + 1/\rho_r$ for large N;    (36b)

    $\tilde r = (N - 1/2)/\tilde c_0$.    (37)

Again one can consider integers above and below the appropriate $\tilde r$ as the optimal r, testing by insertion in the true variance formula (33). As in Crump's case, the variance profile is so flat that the integer closest to $\tilde r$ will probably be sufficient. Once r is determined, the number of columns will be found from the fact that N = kr or N = (k-1)r + s, where s < r. In the latter case, the experimenter presumably should check whether to use a balanced design with (k-1)r observations or an unbalanced one with s rows in the k-th column; since it requires a very large $\rho_r$ to prefer the smaller balanced design, the unbalanced design usually will be better. In the latter case, the experimenter might wish to consider the use of enough more observations to have a (k x r) balanced design; matters of this kind need further investigation, such as maximizing the information per observation rather than the total information.

Gaylor also investigated the loss of information due to an incorrect estimate of $\rho_r$, as given in Table 10. These results are of the same order of magnitude as found by Crump. As before, losses due to use of the incorrect $\rho$ are less serious for large $\rho$. Gaylor checked some of these results against those found by use of exact variances; the agreement on the efficiency factor was very good; e.g., $E_{f2}$ = .907 instead of .909 for N = 30, $\rho_r$ = 1.0, $\rho_r'$ = 2.0. It should be noted that Gaylor considered ratios of 4:1 for $\hat\rho_r$ and only 2:1 for $\hat\sigma_r^2$; hence, some of these efficiencies are lower than for $\hat\sigma_r^2$.

Table 11 presents the efficiency of various designs used to obtain information on $\sigma_r^2$ and $\rho_r$ for N = 30. This table shows that extreme departures from the optimal design result in a considerable loss of efficiency; however, moderate departures usually result in small losses. Where $\rho_r$ is small, a balanced design in the neighborhood of the optimal design has high efficiency for estimating both parameters. However, even for $\rho_r$ = 1, the efficiency of each estimator is less than 90% when the optimal design for the other is used. On the other hand, one is able to find an intermediate design which is quite good for estimating both parameters when $\rho_r$ is no larger than 2.

Table 10. Approximate Relative Efficiencies, $E_f$, of the Restricted Optimal Designs for $\hat\sigma_r^2$ and $\hat\rho_r$ Based on Incorrect Values of $\rho_r$.

Results for $\hat\sigma_r^2$
Design Based on True p~...
~
Design Based on Incorrect
-
~
N
PI'
30
025
1",0
4,,0
1[,90
1,,24
.so
.25
1,,0
4 QO
4,,67
1,,97
1,,25
100
Co
4~04
;:::::.,.-
E
. f1
P~-2
i
c02
Ef2
(,928
.920
..943
,,50
2 00
8.00
2$70
1\)47
2.00
6.01
2.70
1047
11905
0909
.939
.125
.50
2.00
7.82
2.90
1.,49
.907
..912
.942
.50
2,,00
8,,00
2.90
le49
P:-l
.125
f
c01
0
Results for $\hat\rho_r$
Ef
True PI'
P~ Used
N = 30
N = 100
0.25
1,,00
0.50
2 00
2,,00
8 0 00
1 0 00
0.25
2.00
0.5.0
80 00
2,,00
0849
,,847
.900
.829
.823
.891
.896
.976
..983
......
0
-
P~
.~90"
..970
.993
-
1~12
1~12
.900
,,909
.936
• -.2,
Table 11. Efficiency of Some Designs for Estimating $\hat\sigma_r^2$ and $\hat\rho_r$, $N = 30$ a/
i
:r
2
.3
5
6
7
7*
8
9
10
15
.50
e
1.0
y
5
6
7
7i l8
9
10
11
15
6
7*
9
10
11
14
15
16
19
-1, c
10
6
5
5
4*
4
4
8
0
0
0
0
2
0*
6
3
3
2
0
0
6
5
5
4*
0
0
2
Ef(~)
• •
.42~
.687
.948
.999
.984
.941*
1.000
.959
.965
.732
---
4
6
2
.3
0
8
0
.833
.882
•857i l.938
.954
1.000
.986
.933
5
0
---
4
3
0*
.3
4*
0*
.3
.3
0
8
4
3
2
2
2
.3
2.
0
14
11
.877
,899
.957
1.000
.984
.907
Ef(~)
.446
.739
.975
1.000
.941
.889*
.943
.871
.862
.557
.889
.964
.968
.922*
1.000
.967
.986
.927
.732
2.0
6
7*
9
10
11
12
15
18
19
20
24
4.0
.852
•862oll.956
1.000
.968
.833
.844
a/ $r$ = No. rows; $c$ = No. columns. If $s = 0$, $N = rc$; if $s > 0$, $N = r(c-1) + s$.
* designs have only 28 observations.
r
Pr
6
7*
9
10
11
12
15
20
23
24
25
28
ir
and p , N
r
,
e
4*
= 30 a/
s
E.r<a;)
0*
-.....
-0
....
.822
~
~
,740
2
2.
2.
6
0
12
11
10
.974
.•997
1.000
.998
3
:3
2
2
5
4*
4
.3
3
3
2
2
2
2
2
2
8
6
10
11
12
13
14
15
16
20
0*
..--
---
.837
.943
.994
1.000
.993
.658
5
= 0,
3
0
8
3
6
4
2.
0
3
2
2
2
14
10
N = rc; if S >0, N
.735
.788*
.927
1.000
1.000
.994
.991
.527
.586
2
--..--
--
--
3
0
8
6
0
10
7
6
s
.3
.
~631
0
c
.3
:Are*
.938
1.000
.987
.969
.931
.902
Ef ('P'r) only fat' Pr
r
Ef'~)
---
-..-_-
= 8 and 16
Pr
=8
.974
.981
.982
.977
.96.3
1.000
.945
.634
= r(o-l)
pr
= 16
.955
.965
.970
.969
.960
1.000
.948
.646
+ s;
\
The two sets of restricted optimal designs begin to diverge considerably as $\rho_r$ increases beyond 2. One notes that the limiting value of $c$ is $1 + 1/\rho_r$ for $\hat\sigma_r^2$ and $2 + 1/\rho_r$ for $\hat\rho_r$. For $\rho_r = 1$, the optimal designs have 2 and 3 columns, respectively. However, for $\rho_r > 1$, the optimal design for $\hat\sigma_r^2$ is non-balanced with 2 columns, whereas that for $\hat\rho_r$ approaches a balanced design with 2 columns.
The particular case of $c = 2$ has been investigated more thoroughly for $\hat\sigma_r^2$. Consider a two-column restricted optimal design with $N - r'$ rows in one column and $r'$ of these same rows occupied in the second column; i.e., $r' = N/2$ or $2 \le r' = s < N/2$. From (33a),

$$\frac{\mathrm{Var}(\hat\sigma_r^2)}{2\sigma^4} = \frac{1}{N-2}\left[\frac{N + 2r' - 4}{N-2}\,\rho_r^2 + 2\rho_r + \frac{N - r' - 1}{r' - 1}\right]. \qquad (38)$$

If $r' = N/2$, (38) simplifies to $(\rho_r^2 + \rho_r + 0.5)/(r' - 1)$. The value of $r'$ which minimizes (38) is

$$r' = \begin{cases} 2, & \rho_r \ge (N-2)/\sqrt{2}, \\ 1 + (N-2)/(\rho_r\sqrt{2}), & \sqrt{2} \le \rho_r \le (N-2)/\sqrt{2}, \\ N/2, & \rho_r \le \sqrt{2}. \end{cases} \qquad (39)$$

Hence when $\rho_r > \sqrt{2}$, a two-column design with one column shorter than the other is better than two columns of equal length.
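The minimization behind (39) can be checked numerically. The sketch below is only illustrative: it assumes that (38) reads $\mathrm{Var}(\hat\sigma_r^2)/(2\sigma^4) = [\rho_r^2 (N + 2r' - 4)/(N-2) + 2\rho_r + (N - r' - 1)/(r' - 1)]/(N-2)$, a reading of the garbled original rather than a confirmed transcription, and the function names are hypothetical.

```python
import math

def var_two_column(r_prime, rho, n):
    """Var(sigma_r^2 estimate) / (2 sigma^4) for N - r' rows in column 1 and
    r' of them repeated in column 2 (assumed reading of equation (38))."""
    return (rho**2 * (n + 2 * r_prime - 4) / (n - 2)
            + 2 * rho
            + (n - r_prime - 1) / (r_prime - 1)) / (n - 2)

def r_prime_continuous(rho, n):
    """Continuous minimizer of the expression above, following (39)."""
    root2 = math.sqrt(2)
    if rho <= root2:
        return n / 2                      # balanced two-column design
    return max(2.0, 1 + (n - 2) / (rho * root2))

def r_prime_by_search(rho, n):
    """Integer r' in 2..N/2 with the smallest variance (brute-force check)."""
    return min(range(2, n // 2 + 1), key=lambda rp: var_two_column(rp, rho, n))

if __name__ == "__main__":
    for rho in (1.0, 2.0, 4.0):
        print(rho, r_prime_continuous(rho, 30), r_prime_by_search(rho, 30))
```

For $N = 30$ and $\rho_r = 2$ both routes give $r' = 11$, i.e. 19 rows in the long column and 11 in the short one.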
These results suggest the following procedure for finding a nearly optimal design for $\hat\sigma_r^2$:
(i) If $\rho_r \ge \sqrt{2}$, use one column with $r = N - r'$ rows and a second column with $r'$ of these rows, where $r'$ is the integer ($\ge 2$) which is nearest $1 + (N-2)/(\rho_r\sqrt{2})$.
(ii) When $\rho_r < \sqrt{2}$, use a balanced design with $c$ as the integer above or below

$$c_0 = \frac{\rho_r(N - 1/2) + N + 1/2}{\rho_r(N - 1/2) + 2},$$

which minimizes $V(c)$, equation (40), where $r$ was set equal to $N/c$. Presumably little information is lost by simply using as $c$ the integer closer to $c_0$.
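The two-step procedure can be sketched in code. This is a minimal illustration, assuming $c_0 = [\rho_r(N - 1/2) + N + 1/2]/[\rho_r(N - 1/2) + 2]$; the function names and the returned dictionary layout are hypothetical, and the floor of 2 columns in step (ii) is an added assumption.

```python
import math

def c0_sigma(rho, n):
    """Continuous column count c0 from step (ii)."""
    return (rho * (n - 0.5) + n + 0.5) / (rho * (n - 0.5) + 2)

def design_for_sigma_r(rho, n):
    """Describe the suggested design for estimating sigma_r^2."""
    root2 = math.sqrt(2)
    if rho >= root2:
        # Step (i): two columns, the second holding only r' of the rows.
        r_prime = max(2, round(1 + (n - 2) / (rho * root2)))
        return {"type": "two-column", "rows": n - r_prime, "r_prime": r_prime}
    # Step (ii): balanced design; report the integers on either side of c0.
    c0 = c0_sigma(rho, n)
    lower = max(2, math.floor(c0))   # assumption: at least 2 columns are needed
    upper = max(2, math.ceil(c0))
    return {"type": "balanced", "c0": c0, "candidates": (lower, upper)}

if __name__ == "__main__":
    print(design_for_sigma_r(0.25, 30))
    print(design_for_sigma_r(2.0, 30))
```

With this reading, $c_0 \approx 4.04$ for $N = 30$, $\rho_r = .25$ and $c_0 \approx 1.97$ for $N = 100$, $\rho_r = 1$, which appear to agree with values listed in Table 10.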
Another approach to (ii) above is to determine intervals of $\rho_r$ for which $c = 3$ is best, $c = 4$ is best, and so on. This is accomplished by determining $\rho_r = \rho(c)$ in the equation

$$V(c) = V(c-1). \qquad (41)$$

When $\rho_r < \rho(c)$, $c$ is preferred to $c - 1$ columns. Solving (41) for $N = 30$ gives the first three separation points as approximately $\rho(3) = .62$, $\rho(4) = .33$ and $\rho(5) = .22$. Hence

c = 2 for .62 < $\rho_r$ < $\sqrt{2}$;
c = 3 for .33 < $\rho_r$ < .62;
c = 4 for .22 < $\rho_r$ < .33.

Note that for $\rho_r = .25$ in Table 11, $c = 4$ is slightly preferred to $c = 5$.
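The separation points can be reproduced numerically. Equation (40) is not legible in this copy, so the sketch below assumes the usual balanced-design result $V(c) = \mathrm{Var}(\hat\sigma_r^2)/(2\sigma^4) = [(1 + c\rho_r)^2 + 1/(c-1)]/[c(N-c)]$ with $r = N/c$. That form is an assumption, and the function names are hypothetical.

```python
def v_balanced(c, rho, n):
    """Assumed form of V(c) for a balanced design with c columns, r = N/c rows."""
    return ((1 + c * rho) ** 2 + 1 / (c - 1)) / (c * (n - c))

def separation_point(c, n, lo=1e-6, hi=10.0, tol=1e-10):
    """rho at which V(c) = V(c-1), found by bisection (requires c >= 3)."""
    def diff(rho):
        return v_balanced(c, rho, n) - v_balanced(c - 1, rho, n)
    # For the cases used here diff is negative at small rho (extra columns
    # help) and positive at large rho, so the sign change brackets the root.
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if diff(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

if __name__ == "__main__":
    for c in (3, 4, 5):
        print(c, round(separation_point(c, 30), 3))
```

Under this assumption the computed points for $N = 30$ fall at about .63, .33 and .22, in line with the intervals quoted above.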
Table 11 indicates that for $\hat\rho_r$ there is a considerable range of $\rho_r$ for which the best allocation plan will have 3 columns and $N/3$ rows. If $N$ is not divisible by 3, some gain may be obtained by using an extra row with one column not filled. It appears that the best plan would be to either increase $N$ or reduce $N$ by enough to have it divisible by 3. As $\rho_r$ increases, only 2 columns should be used with $N/2$ rows. Again add or subtract one to make $N$ even.
The following general procedure is suggested for $\hat\rho_r$:
(i) Use a balanced design with $c$ as the integer above or below (36b),

$$c_0 = (2\rho_r N + N - \rho_r + 1/2)/(\rho_r N + \rho_r/2 + 2),$$

which minimizes (32b):

$$V(\hat\rho_r) = (1 + c\rho_r)^2 (N - c - 2)/[(N - c)(Nc - N - c^2 - 3c)]. \qquad (43)$$

Choose $r$ so that $rc = N$.
(ii) The same procedure of finding separation points on $\rho_r$ could be used on (43) as was done above. The equation comparable to (41) is considerably more complicated; hence, all I am reporting here is the asymptotic result (for large $N$). In this case the solution is

$$\rho_2(c) = \frac{1 + \sqrt{(c-1)(c-2)}}{c^2 - 3c + 1}. \qquad (44)$$

Some of the separation points are:

$\rho_2(3) = 1 + \sqrt{2} = 2.4$;
$\rho_2(4) = (1 + \sqrt{6})/5 = .7$;
$\rho_2(5) = (1 + 2\sqrt{3})/11 = .4$;
$\rho_2(6) = (1 + 2\sqrt{5})/19 = .28$.

Hence

c = 4 for .4 < $\rho_r$ < .7;
c = 5 for .28 < $\rho_r$ < .4.

Note that in Table 11, $c = 5$ is slightly preferred to $c = 6$ for $\rho_r = .25$. The above separation points would be altered slightly if the adjustments of $O(1/N)$ were made.
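The asymptotic separation points are easy to tabulate. The sketch below assumes (44) reads $\rho_2(c) = [1 + \sqrt{(c-1)(c-2)}]/(c^2 - 3c + 1)$ and that, for large $N$, (43) is proportional to $(1 + c\rho_r)^2/(c-1)$. Both are readings of the text rather than confirmed formulas, and the function names are hypothetical.

```python
import math

def rho2(c):
    """Asymptotic separation point between c - 1 and c columns (assumed (44))."""
    return (1 + math.sqrt((c - 1) * (c - 2))) / (c * c - 3 * c + 1)

def asymptotic_best_c(rho, c_max=20):
    """Column count minimizing the large-N variance factor (1 + c*rho)^2/(c - 1)."""
    return min(range(2, c_max + 1), key=lambda c: (1 + c * rho) ** 2 / (c - 1))

if __name__ == "__main__":
    for c in (3, 4, 5, 6):
        print(c, round(rho2(c), 3))
    print(asymptotic_best_c(0.5), asymptotic_best_c(0.3))
```

The direct minimization gives $c = 4$ at $\rho_r = .5$ and $c = 5$ at $\rho_r = .3$, consistent with the intervals above.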
5.3 Estimation of $\sigma_c^2$ or $\rho_c = \sigma_c^2/\sigma^2$.

Use the results in Section 5.2 by reversing the roles of $r$ and $c$. In Table 9, one computes mean squares, $R$ and $C^*$, for Rows and Columns (adjusted for rows). The latter can be computed quickly by use of the identity:

$$(r-1)R^* + (c-1)C = (r-1)R + (c-1)C^*. \qquad (45)$$
5.4 Simultaneous Estimation of $\sigma_r^2$ and $\sigma_c^2$.

Gaylor obtained only some tentative results for this case but they are indicative of the problem. Since efficient estimation of $\sigma_r^2$ requires many rows and efficient estimation of $\sigma_c^2$ requires many columns, simultaneous estimation requires many rows and columns. Gaylor considered two special designs, called the L-design and the Balanced Disjoint Rectangles (BDR)-design, and compared them for the case $\sigma_r^2 = \sigma_c^2$.
The L-design is as follows:

[Figure 1. The L-Design. Axis labels: Rows ($r_1$, $r_2$) and Columns ($c_1$, $c_2$); annotations: "$r_3$ rows in 1 col." and "$c_3$ cols. in 1 row".]

The total rows and columns used are $r = r_1 + r_2$ and $c = c_1 + c_2$. The $N_1 = rc_2 - r_3$ observations contain one column with $r_3$ ($0 \le r_3 < r_1$) empty cells, and the $N_2 = cr_2 - c_3$ observations contain one row with $c_3$ ($0 \le c_3 < c_1$) empty cells. The $N_1$ observations are of the form of a restricted optimal design for $\hat\sigma_r^2$ and the $N_2$ observations for $\hat\sigma_c^2$. In general $c_2$ and $r_2$ will be small.
The BDR design consists of $g$ distinct rectangles, each consisting of $r$ rows by $c$ columns, with one observation per cell. The rectangles are distinct in that each rectangle samples a different set of $r$ rows and $c$ columns. The ANOV for this design is given in Table 12.

If the component of variance for the rectangle effects is zero, $\sigma_g^2 = 0$, then a unique solution for the estimators of the components of variance is not obtained by equating the mean squares to their expected values. Then, there are five mean squares (equations) with which to estimate four components of variance. Since the best procedure in this case is unknown, the case will not be considered here, but this certainly warrants future investigation. If $G$ is neglected, the estimators and their variances are quite simple, as given in equations (46).
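The BDR variance can be evaluated with a short function. The sketch below is illustrative only: it assumes the estimator $\hat\sigma_r^2 = (R - I)/c$, with $R$ and $I$ treated as independent chi-square mean squares on $g(r-1)$ and $g(r-1)(c-1)$ degrees of freedom and $N = grc$, which gives $N\,\mathrm{Var}(\hat\sigma_r^2)/(2\sigma^4) = \frac{r}{c(r-1)}[(1 + c\rho)^2 + 1/(c-1)]$ with $\sigma^2 = \sigma_s^2 + \sigma_{rc}^2$ and $\rho = \sigma_r^2/\sigma^2$. The function names are hypothetical.

```python
def bdr_nvar(r, c, rho):
    """N * Var(sigma_r^2 estimate) / (2 sigma^4) for a BDR design with
    r x c rectangles (assumed form, see the lead-in)."""
    return r / (c * (r - 1)) * ((1 + c * rho) ** 2 + 1 / (c - 1))

def best_square_bdr(rho, max_side=12):
    """Side k of the square rectangle (r = c = k) minimizing bdr_nvar."""
    return min(range(2, max_side + 1), key=lambda k: bdr_nvar(k, k, rho))

if __name__ == "__main__":
    for k in (2, 3, 4, 6):
        print(k, round(bdr_nvar(k, k, 0.5), 2))
    print(best_square_bdr(0.5), best_square_bdr(3.0))
```

For $\rho = .5$ this yields 5.00, 3.38, 3.11 and 3.24 for $r = c = 2, 3, 4, 6$, which appear to match BDR entries in Table 13. Under the same assumption, $r = c = 2$ becomes the best square rectangle once $\rho$ exceeds about 2.87.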
The two designs have been compared for the special case $\sigma_r^2 = \sigma_c^2$. Some specific examples are given in Table 13, where $\sigma^2 = \sigma_s^2 + \sigma_{rc}^2$. The total number of observations, $N$, was allowed to vary slightly in order to obtain balance if desired. Thus comparisons were made on the basis of $N\,\mathrm{Var}(\hat\sigma_r^2) = N\,\mathrm{Var}(\hat\sigma_c^2)$, which is the reciprocal of the amount of information per observation. The estimator for $\sigma_r^2$ for the BDR-design is given in (46); that for the L-design was given earlier. The variances in the last column of Table 13, $NV'$, are based on the estimator, $\hat\sigma_r'^2$, in which the last $c_1$ columns in Figure 1 for the L-design are ignored. The estimator, $\hat\sigma_r^2$, could be improved slightly by pooling the error sum of squares from both legs of the L-design.

Several observations can be made from Table 13. When $\rho$ is small, BDR-designs are better than the L-designs. When $\rho$ is large, the L-design is better. For those designs where $\rho^2 \sum (d_i - \bar d)^2$ is large, see (31a), the estimator, $\hat\sigma_r'^2$,
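Table 12. ANOV for a Balanced Disjoint Rectangles Design

SV            DF                 MS   EMS
Rectangles    g - 1              G    $\sigma_s^2 + \sigma_{rc}^2 + c\sigma_r^2 + r\sigma_c^2 + rc\sigma_g^2$
Rows          g(r - 1)           R    $\sigma_s^2 + \sigma_{rc}^2 + c\sigma_r^2$
Columns       g(c - 1)           C    $\sigma_s^2 + \sigma_{rc}^2 + r\sigma_c^2$
Interaction   g(r - 1)(c - 1)    I    $\sigma_s^2 + \sigma_{rc}^2$

$$\hat\sigma_r^2 = (R - I)/c; \qquad \hat\sigma_c^2 = (C - I)/r;$$

$$\mathrm{Var}(\hat\sigma_r^2) = \frac{2r}{N(r-1)c}\left[(\sigma_s^2 + \sigma_{rc}^2 + c\sigma_r^2)^2 + \frac{(\sigma_s^2 + \sigma_{rc}^2)^2}{c-1}\right];$$

$$\mathrm{Var}(\hat\sigma_c^2) = \frac{2c}{N(c-1)r}\left[(\sigma_s^2 + \sigma_{rc}^2 + r\sigma_c^2)^2 + \frac{(\sigma_s^2 + \sigma_{rc}^2)^2}{r-1}\right]. \qquad (46)$$

Table 13. Comparison of BDR and L Designs with $\mathrm{Var}(\hat\sigma_r^2) = \mathrm{Var}(\hat\sigma_c^2)$, Where $\sigma^2 = \sigma_s^2 + \sigma_{rc}^2$
0
1
1
1
2
2
2
4
4
g
.5
.5
.5
4
2
1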
r
.
r=o
3
4
6
c
r
L-Design
WI!!
r::e
r 2D C/\
10
7
6
2
3
6
0
0
0
36
33
36
40 24
3..29
36
.3.38
.3.11
3 24
3~24
5'000
4,,12
3,,24
10",00
10
7
6
2.
3
6
0
0
0
36
33
36
10.89
9046
9.84
10.00
10.08
9.84
26 .. 0
24,,8
27 ..1
14
10
7
2
2
3
8
0
0
36
36
33
26.8
34.8
3106
25 III 6
26.0
30.2
82.0
84.8
16
10
N
36
.32
0
9
4
2:
36
.3
36
8 0 25
2
4
32
8,,44
~
4
2
2
.3
4
36
32
9
2;
36
3
36
4
•
'
i
BDR-Design
p
]
+
)2 ]
src
r--1
(0'2
+ rcl)2+
2
36
~.
2:
2
r 3-c!_ N
12
0
NV
72
36
36 12,
NV"
7g e 8
82.0
= i 0 -.
which discards the last $c_1$ columns is often better than the estimator, $\hat\sigma_r^2$, which uses all of the observations. This is a result found previously where it was shown desirable to keep the numbers of observations per row nearly equal. Also, as was shown in Section 5.2, the L-design is desirable which has less than two complete columns in the $N_1$ observations and less than two complete rows in the $N_2$ observations when $\rho > \sqrt{2}$.
For an L-design with $r_2 = c_2 = 2$, $r = c$, and $r_3 = c_3 = 0$,

$$\mathrm{Var}(\hat\sigma_r^2) = \mathrm{Var}(\hat\sigma_c^2) = 2\sigma^4(2 + 4\rho_r + 4\rho_r^2)/N, \qquad (47)$$

where $\rho_r = \sigma_r^2/\sigma^2$. For a BDR-design with $r = c = 2$ and $g = N/4$, from (46), $\mathrm{Var}(\hat\sigma_r^2)$ is the same as (47). This result is borne out in Table 13. No restriction that $\hat\sigma_r^2 = \hat\sigma_c^2$ has been imposed here.
When $\rho \ge 2.87$, the best BDR-design has $r = c = 2$ and $g = N/4$. Since, when $\rho > \sqrt{2}$, the L-design with $r_2 = c_2 = 2$ can be improved upon by letting $r_3 = c_3 > 0$, it follows that, when $\rho \ge 2.87$, $\mathrm{Var}(\hat\sigma_r^2)$ for the L-design $\le \mathrm{Var}(\hat\sigma_r^2)$ for a BDR-design. Similarly, $\mathrm{Var}(\hat\sigma_r^2)$ could be reduced still further by using the observations in the last $c_1$ columns to estimate $\sigma^2$ and pooling this with the estimate of $\sigma^2$ from the first $c_2$ columns.

Also, when $r = c$ is small for a BDR-design, the number of rectangles, $g$, becomes large. If $\sigma_g^2 = 0$, considerable information may be lost here by not making use of the mean squares among rectangles.

A few tentative observations can be made. In general, as $\rho$ increases, the L-design is favored over disjoint rectangles if $\sigma_g^2 > 0$. Also, the analysis which ignores the observations in the other leg of the L-design in order to achieve balance sometimes reduces the variance.
6. Additional Comments and Suggestions for Future Research

6.1 Extensions of Nested Sampling to More than Two Stages.

One important research problem is to extend Crump's results to more than two stages of sampling. In Section 1, two five-stage problems were mentioned. In the Anderson-Bancroft book and in my International Statistical Institute paper, I proposed a so-called staggered design, which consists of a number of balanced designs but with incomplete sampling at some stages.

For three stages, two different groups (I and II) would be used with $a_1$ first-stage samples for I and $a_2$ for II, as follows:

I ($4a_1$ observations); II ($2a_2$ observations)

[Diagram of the sampling stages A, B and C for groups I and II.]
ANALYSIS OF VARIANCE

SV       DF         MS       EMS
Groups   1          MSG      $\sigma_c^2 + k_b\sigma_b^2 + k_a\sigma_a^2$
A_1      a_1 - 1    MSA_1    $\sigma_c^2 + 2\sigma_b^2 + 4\sigma_a^2$
A_2      a_2 - 1    MSA_2    $\sigma_c^2 + \sigma_b^2 + 2\sigma_a^2$
B_1      a_1        MSB_1    $\sigma_c^2 + 2\sigma_b^2$
B_2      a_2        MSB_2    $\sigma_c^2 + \sigma_b^2$
C        2a_1       MSC      $\sigma_c^2$
One might consider using an additional third plan with $a_3$ first-stage samples, only one second-stage sample per first-stage and one third-stage sample per second-stage, giving $a_3$ observations.

R. R. Prairie is studying another plan, which would attempt to equalize the total number of observations at each stage. In this case there would be $a$ first-stage samples, with each followed by the following second- and third-stages:
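[Diagram of the second- and third-stage sampling (stages A, B and C) within each first-stage sample.]

ANALYSIS OF VARIANCE

SV   DF       MS     EMS
A    a - 1    MSA    $\sigma_c^2 + \tfrac{5}{3}\sigma_b^2 + 3\sigma_a^2$
B    a        MSB    $\sigma_c^2 + \tfrac{4}{3}\sigma_b^2$
C    a        MSC    $\sigma_c^2$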
The variance of $\hat\sigma_a^2$ for this design is complicated by the fact that MSA and MSB are positively correlated. This is a feature of non-balanced designs which we were not anticipating. It indicates a fact which many of us often forget; namely, that the usually accepted concepts regarding the analysis of variance are based on a regression and not a variance components model. Naturally if one uses the ML approach, he does not start with such preconceived ideas. Also one never has any trouble when the design is balanced. Note that my staggered design is balanced within groups; the major difficulty there is how to use the between-groups mean square.
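The expected-mean-square coefficients of such nested plans can be checked with exact fractions. The sketch below uses the standard expectations for an unbalanced two-fold nested classification. The design layout passed to it (each first-stage sample having two second-stage samples, with 2 and 1 third-stage observations) is an assumption inferred from the degrees of freedom quoted for Prairie's plan, and the function name is hypothetical.

```python
from fractions import Fraction

def nested_ems_coefficients(design):
    """design: one list per first-stage sample, giving the number of
    third-stage observations in each of its second-stage samples.
    Returns (coef of sigma_b^2 in E[MSA], coef of sigma_a^2 in E[MSA],
             coef of sigma_b^2 in E[MSB])."""
    a = len(design)
    n_total = sum(sum(first) for first in design)
    sum_ni_sq = sum(sum(first) ** 2 for first in design)
    sum_nij_sq = sum(n * n for first in design for n in first)
    # Sum over first-stage samples of (sum_j n_ij^2) / n_i.
    within = sum(Fraction(sum(n * n for n in first), sum(first)) for first in design)
    df_a = a - 1
    df_b = sum(len(first) - 1 for first in design)
    k_a = Fraction(n_total * n_total - sum_ni_sq, n_total * df_a)
    k_b_in_a = (within - Fraction(sum_nij_sq, n_total)) / df_a
    k_b = (n_total - within) / df_b
    return k_b_in_a, k_a, k_b

if __name__ == "__main__":
    print(nested_ems_coefficients([[2, 1]] * 4))   # assumed layout of Prairie's plan
    print(nested_ems_coefficients([[2, 2]] * 3))   # group I of the staggered design
```

Under these assumptions the first call gives coefficients 5/3, 3 and 4/3, and the second gives 2, 4 and 2, the values for group I of the staggered design.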
6.2 Other Suggestions

1) (a) Develop numerical methods, suitable for high-speed computers, to obtain, under normal theory, the ML estimates of variance components from unbalanced data.
(b) Study the small sample properties of the ML estimators by empirical means, and, where possible, by analytic means.
(c) It might be possible to find approximate formulas to estimate the biases and variances of these ML estimates. A comparison of asymptotic variances and empirically determined small sample variances would be useful.
(d) Compare the small sample properties of the ML estimates with those of more convenient estimates, e.g., ANOV or iterated least squares.
2) Study the effect of non-normality on estimating procedures.
3) Introduce unequal cost factors into the problem.
4) Develop suitable criteria for use in simultaneous estimation of several parameters.
5) Develop a sequential estimation procedure in which the first stage (or stages) would be used to estimate $\rho$ (or various $\rho$'s) needed to design the remaining stages.
REFERENCES

Anderson, R. L. 1960. Uses of variance component analysis in the interpretation of biological experiments. Bulletin Int. Statist. Instit. 37:3:71-90.
Anderson, R. L. and T. A. Bancroft. 1952. Statistical Theory in Research. McGraw-Hill Book Company, Inc., New York.
Crump, P. P. 1954. Optimal designs to estimate the parameters of a variance component model. Unpublished Ph.D. Thesis, North Carolina State College, Raleigh, N.C.
Ganguli, M. 1941. A note on nested sampling. Sankhya 5:449-452.
Gaylor, D. W. 1960. The construction and evaluation of some designs for the estimation of parameters in random models. Unpublished Ph.D. Thesis, North Carolina State College, Raleigh. Institute of Statistics Mimeo Series No. 256.
Graybill, F. A. and A. W. Wortham. 1956. A note on uniformly best unbiased estimators for variance components. J. Amer. Stat. Assoc. 51:266-268.
Newton, R. G., M. W. Philpott, H. Fairfield-Smith and W. G. Wren. 1951. Variability of Malayan rubber. Industrial and Engineering Chemistry 43:329-334.