THE EXPECTATIONS OF MEAN SQUARES
by
R. E. Comstock
Institute of Statistics
Mimeograph Series No. 76
For Limited Distribution
Chapter VI
THE EXPECTATIONS OF MEAN SQUARES
The Expectation of a Variable
If individuals are drawn randomly from a population, their average value in
terms of any specified measurement will be equal in the long run to the mean for
the measurement in the population. We say that the value to be expected on the
average is that of the population mean. In fact, in statistics the expectation of
a variable quantity is defined as the mean for such quantities in the population
to which the particular variate belongs. For example, let X_1, X_2, ...., X_i, ....
symbolize the values of the individuals in any univariate population. Then the
X's constitute a population of quantities of which the expectation of any one
chosen at random is μ_x, where μ_x is the population mean. This is stated
symbolically as follows:

    E(X_i) = μ_x                                                    (1)

where X_i can be any of the X's depending on the value given i, and E(X_i) is read
"the expectation of X_i".
As a second example, recall that the population variance is defined as

    σ²_x = Σ_i (X_i - μ_x)² / N

where σ²_x symbolizes the population variance,
    X_i symbolizes the value of any individual quantity in the population,
    N is the number of individuals in the population, and
    μ_x as before is the population mean.

Thus the variance, σ²_x, is defined as the mean of all values, i.e. the population
mean, of (X_i - μ_x)². In accord with the definition of expectation we see that

    E(X_i - μ_x)² = σ²_x                                            (2a)

or if we wish to represent the deviation of X_i from its population mean by a
single symbol, say x_i, we can write

    x_i = X_i - μ_x
    E(x_i²) = σ²_x                                                  (2b)
As a final example recall that the population covariance of two variables, say
X and Y, is defined as

    σ_xy = Σ_i (X_i - μ_x)(Y_i - μ_y) / N

where σ_xy is the covariance and other symbols have meanings in conformity with
those listed above when considering the variance of X. We see that the covariance,
σ_xy, is defined as the population mean of (X_i - μ_x)(Y_i - μ_y) and therefore
that

    E(X_i - μ_x)(Y_i - μ_y) = σ_xy                                  (3a)

Again if we set x_i = X_i - μ_x and y_i = Y_i - μ_y we can write

    E(x_i y_i) = σ_xy                                               (3b)
Interest in expectations centers around the fact that by setting observed
quantities equal to their expectations we find a basis for unbiased estimation of
parameters involved in the expectation. For example, it can be shown that

    E(X_i - X̄)² = [(n-1)/n] σ²_x

where X̄ is the mean of a sample of X's, and
    n is the number of individuals in the sample.

It follows that

    E[ Σ_{i=1}^n (X_i - X̄)² ] = n [(n-1)/n] σ²_x = (n-1) σ²_x

or

    E[s²] = E[ Σ_{i=1}^n (X_i - X̄)² / (n-1) ] = σ²_x

From this we see that the sample variance obtained by dividing the sum of squares
by degrees of freedom has σ² as its expectation, i.e. that it provides an unbiased
estimate of σ².
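This fact is easy to confirm numerically. The following sketch (assuming NumPy
is available; all numbers are illustrative) draws many samples of size n and
compares the divide-by-(n-1) and divide-by-n estimators against the true variance.

    import numpy as np

    rng = np.random.default_rng(0)
    mu, sigma2, n, reps = 10.0, 4.0, 5, 200_000

    # Draw many samples of size n; compute each sample's sum of squares.
    x = rng.normal(mu, np.sqrt(sigma2), size=(reps, n))
    ss = ((x - x.mean(axis=1, keepdims=True)) ** 2).sum(axis=1)

    print(ss.mean() / (n - 1))  # close to sigma2: unbiased
    print(ss.mean() / n)        # close to sigma2*(n-1)/n: biased downward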
Expectation of a Constant

This is specifically mentioned for completeness. Since a constant, by definition,
is a quantity that always has the same value, the expectation of a constant could
hardly be anything but that particular value. For example, a population mean is a
constant and its expectation is the mean itself. Symbolically, if c is any
constant,

    E(c) = c                                                        (4)
Expectation of the Product of a Constant and a Variable

Consider the product

    Y = c X

where X is a variable and c is a constant. We know that the population mean of Y
is c μ_x and, therefore,

    E(Y) = E(c X) = c μ_x = c E(X)                                  (5)

In general, the expectation of such a product is the product of the constant and
the expectation of the variable.
The Expectation of a Linear Function

Consider the linear function

    F = a + b + c X_1 + X_2

in which a, b, and c are constants and X_1 and X_2 are variable quantities drawn
randomly (but not necessarily independently) from two populations (one, a
population of quantities symbolized as X_1, the other a population of quantities
symbolized as X_2). Two points are worth special attention.
(1) The specific manner in which F is defined may have the result that values
    of X_1 and X_2 contributing to different values of the quantity, F, are
    correlated or on the other hand are independent, i.e. uncorrelated. For
    example, suppose F is designed to reflect in some special way the height
    of married couples. Then any single value of F would involve the height
    of the husband (X_1) and that of his wife (X_2). If the couples are chosen
    randomly, both X_1 and X_2 are random values from their respective
    populations, but are not necessarily independent in magnitude from one
    couple to another. In fact, evidence indicates that there is a degree of
    correlation in stature of man and wife.

    On the other hand, suppose F were defined as the height of plants, X_1 as
    the effect of genotype, and X_2 as the effect of environment on height;
    and it were known that in the population of plants involved genotypes were
    distributed randomly with respect to environment. The magnitudes of X_1
    and X_2 would vary independently from plant to plant and, therefore, from
    one value of F to another.
(2) The different variables may actually belong to the same population though
    it may be useful to think of them as coming from different ones. For
    example, in the function given above X_1 and X_2 could be a pair of values
    drawn randomly from the same population, X_1 being the first and X_2 the
    second drawn of any pair. In this case X_1 and X_2 would vary
    independently, i.e. be uncorrelated.
Corresponding to every possible pair of values of X_1 and X_2 there is obviously
a value of F. These values comprise a population of F's. We know that the mean
value of F in that population is

    μ_F = a + b + c μ_1 + μ_2

where μ_1 and μ_2 are the population means for X_1 and X_2, respectively. Hence

    E(F) = μ_F = a + b + c μ_1 + μ_2

where μ_F is the population mean of F. This serves to demonstrate the general
fact that the expectation of a variable quantity that is a linear function of
other variables is the same linear function of the expectations of those
variables. By this rule

    E(F) = E(a) + E(b) + E(c X_1) + E(X_2)

and since

    E(a) = a
    E(b) = b
    E(c X_1) = c μ_1
    E(X_2) = μ_2

we have by substitution

    E(F) = a + b + c μ_1 + μ_2

as given above.
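The rule holds whether or not X_1 and X_2 are correlated, and a simulation makes
that easy to see. The following sketch (NumPy assumed; the constants, means, and
covariance are illustrative, not taken from the text) draws correlated pairs and
compares the empirical mean of F with the linear function of the expectations.

    import numpy as np

    rng = np.random.default_rng(1)
    a, b, c = 2.0, 3.0, 0.5
    mu1, mu2 = 170.0, 160.0

    # Correlated pairs (X1, X2), as in the married-couple example.
    cov = [[25.0, 10.0], [10.0, 25.0]]
    x1, x2 = rng.multivariate_normal([mu1, mu2], cov, size=500_000).T
    f = a + b + c * x1 + x2

    print(f.mean())               # empirical E(F)
    print(a + b + c * mu1 + mu2)  # a + b + c*mu_1 + mu_2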
Expectations of Mean Squares

Any mean square can be written as a linear function in which the variable
quantities are the squares of variables, products of a variable with a constant,
or products of variables. Hence, the expectations can always be written in terms
of what is presented above. This fact will be clarified by examples.
Example 1

Consider the case represented by the analysis of variance for comparing
groups of equal size. The form of the analysis is as follows:

    Variance Source     d.f.        m.s.
    Groups              m-1         M_1
    Within groups       m(n-1)      M_2
    Total               mn-1

where m is the number of groups and n is the number of individuals within groups.
The model on which the analysis is based can be stated symbolically as follows:

    Y_ij = μ + g_i + e_ij

where μ is the population mean taken over all groups,
    g_i is the effect of the i-th group (the amount by which the population mean
        for the i-th group deviates from μ), and
    e_ij is a random effect contributing to the value of Y for the j-th individual
        in the i-th group (the amount by which the individual deviates from the
        mean for its group).
One of two assumptions is usually made concerning the groups: (a) that they
are random members of a population of groups, or (b) that the ones on which data
are taken are of special interest in themselves rather than as a sample from a
population. In case (a) the assumption is frequently stated by saying that g_i is
considered a random variable, in contrast to case (b) where it is alternatively
said that the g_i are considered constant or fixed.
g assumed to be a random variable

We will consider first the case where g_i is considered a random variable. Let

    G_i be the sum of Y's for the n individuals of the i-th group, and
    T be the sum of Y's for all nm individuals on which data were collected.

Then the mean square for groups is

    M_1 = [1/(m-1)] [ (1/n)(G_1² + G_2² + .... + G_m²) - T²/nm ]
This may be considered the product of a constant and a variable where 1/(m-1) is
the constant and the quantity in brackets is the variable. Hence, its expectation
may be written

    E(M_1) = [1/(m-1)] E[ (1/n)(G_1² + G_2² + .... + G_m²) - T²/nm ]

Note that (1/n)(G_1² + G_2² + .... + G_m²) is what we commonly call the
"uncorrected sum of squares", that T²/nm is what we call the "correction factor",
and that the whole quantity in brackets is the "corrected sum of squares".
By the rule that the expectation of a linear function is the same function of
expectations of the variables in the function, we can write

    E(M_1) = [1/(m-1)] [ (1/n)(E G_1² + E G_2² + .... + E G_m²) - (1/nm) E T² ]   (6)
Now the separate expectations in the expression can be considered one by one.
Consider E G_i². In terms of our model,

    E G_i² = E[ Y_i1 + Y_i2 + .... + Y_in ]²
           = E[ nμ + ng_i + (e_i1 + e_i2 + .... + e_in) ]²

Squaring and taking expectations term by term this can be written

    E G_i² = E n²μ² + E n²g_i² + E (e_i1 + e_i2 + .... + e_in)²
             + E 2n²μg_i + E 2nμ(e_i1 + e_i2 + .... + e_in)
             + E 2ng_i(e_i1 + e_i2 + .... + e_in)                   (7)
Before going further note that both the g's and e's are defined as deviations from
a mean and hence, that the population mean of both the g's and e's is zero. Thus,
E(g_i) = 0, E(e_ij) = 0, E(g_i²) = σ²_g and E(e_ij²) = σ²_e, where σ²_g is the
population variance of g's and σ²_e is the population variance of e's. It is
common to assume that all e's are members of the same population and, therefore,
that σ²_e is homogeneous over all groups. This assumption will be made for the
purpose of our example but it should be understood that special cases may arise
where the variance of e varies from group to group. It should also be noted that
all g's and e's are assumed to be random members of their populations. The
significance of this is that in the population (the population that would be
generated by repeating the experiment in identical fashion an infinity of times)
the correlation between (1) any two g's, (2) any two e's, or (3) any g and any e
would be zero. If the correlation is zero, so also is the covariance, and this
means that the expectations of all products of two g's, two e's, or a g and an e
are all equal to zero. Symbolically this is stated as follows:
    E(g_i g_i')    = 0      (i ≠ i')
    E(e_ij e_i'j') = 0      (i ≠ i' if j = j'; j ≠ j' if i = i')
    E(g_i e_i'j)   = 0      (either when i = i' or when i ≠ i')
Now let us consider the several terms of E G_i² one by one.

(a) E n²μ² = n²μ²  (since n²μ² is a constant)

(b) E n²g_i² = n² E g_i² = n²σ²_g  (since n² is a constant and E g_i² = σ²_g)

(c) E (e_i1 + e_i2 + .... + e_in)² = nσ²_e  (since the expectation of each of
    the e²'s, n of them, is σ²_e and that of each product term is zero)

(d) E 2n²μg_i = 2n²μ E g_i  (since 2n²μ is a constant)
             = zero  (since E g_i = 0)

(e) E 2nμ(e_i1 + e_i2 + .... + e_in) = 2nμ E(e_i1 + e_i2 + .... + e_in)
    (since 2nμ is a constant)
             = zero  (since the expectation of all e's is zero and, therefore,
               that of the sum of any set of e's is also zero)

(f) E 2ng_i(e_i1 + e_i2 + .... + e_in) = 2n E g_i(e_i1 + e_i2 + .... + e_in)
    (since 2n is a constant)
             = zero  (since the expectation of the product of any g and e is
               zero)
Substituting in (7) in terms of (a) to (f) we find that,

    E G_i² = n²μ² + n²σ²_g + nσ²_e                                  (8)

Now note that nothing in (8) is specific for the particular group in question (i
does not appear as a subscript in the right hand member). The significance is
that the expectation of G² is the same for all groups, that

    E G_1² = E G_2² = .... = E G_m²

In order to evaluate E(M_1) it remains only to obtain E T².

    E T² = E (G_1 + G_2 + .... + G_m)²

Substituting for the G's we obtain

    E T² = E [ nmμ + n(g_1 + g_2 + .... + g_m)
               + e_11 + e_12 + .... + e_1n + .... + e_m1 + e_m2 + .... + e_mn ]²   (9a)
Squaring, taking expectations term by term, and moving constants to the left of
the sign for expectation (proper because the expectation of the product of a
constant and a variable is equal to the product of the constant and the
expectation of the variable) we get

    E T² = n²m²μ² + n² E g_1² + n² E g_2² + .... + n² E g_m²
           + E e_11² + E e_12² + .... + E e_mn²
           + product terms of the types 2n²mμ E g_1, 2n² E g_1 g_2,
             2n E g_1 e_11, or 2 E e_11 e_12                        (9b)
Consider the various terms of this expression:

(g) n² E g_1² = n² E g_2² = .... = n² E g_m² = n²σ²_g  (since E g_i² = σ²_g)

(h) E e_11² = E e_12² = .... = E e_mn² = σ²_e  (since E e_ij² = σ²_e)

(i) all product terms are of types shown to have zero expectation in the
    process of developing E G_i².
Substituting in (9b) in terms of (g) to (i) we obtain

    E T² = n²m²μ² + n²mσ²_g + nmσ²_e                                (10)

Finally substituting in (6) in terms of (8) and (10) we find

    E(M_1) = [1/(m-1)] [ (m/n)(n²μ² + n²σ²_g + nσ²_e)
                         - (1/nm)(n²m²μ² + n²mσ²_g + nmσ²_e) ]

           = μ² [ (mn - mn)/(m-1) ] + σ²_g [ (mn - n)/(m-1) ]
             + σ²_e [ (m-1)/(m-1) ]

           = nσ²_g + σ²_e                                           (11)
The within group mean square may be computed as follows:

    M_2 = [1/(m(n-1))] Σ_{i=1}^m ( Y_i1² + Y_i2² + .... + Y_in² - G_i²/n )

Remembering (a) that the expectation of the product of a constant and a variable
is the product of the constant and the expectation of the variable and (b) that
the expectation of a variable that is a linear function of variables is the same
function of the expectations of these latter variables, we see that

    E(M_2) = [1/(m(n-1))] Σ_{i=1}^m [ E Y_i1² + E Y_i2² + .... + E Y_in²
                                      - (1/n) E G_i² ]              (12)
Consider the expectation of Y_ij². In terms of our model,

    E Y_ij² = E(μ + g_i + e_ij)²

Expanding and taking expectations of individual terms separately we obtain,

    E Y_ij² = E μ² + E g_i² + E e_ij² + E 2μg_i + E 2μe_ij + E 2g_i e_ij   (13)

Taking the terms of this expression separately,

(j) E μ² = μ²  (because μ² is a constant),

(k) E g_i² = σ²_g  (by definition when the g's are assumed random),

(l) E e_ij² = σ²_e  (by definition),

(m) E 2μg_i = 2μ E g_i = zero  (since 2μ is a constant and E g_i = 0),

(n) E 2μe_ij = 2μ E e_ij = zero  (since 2μ is a constant and E e_ij = 0),

(o) E 2g_i e_ij = 2 E g_i e_ij = zero  (since 2 is a constant and
    E g_i e_ij = 0).

Substituting in (13) in terms of (j) to (o) we obtain,

    E Y_ij² = μ² + σ²_g + σ²_e                                      (14)
We have already shown (8) that the expectation of G_i² is,

    E G_i² = n²μ² + n²σ²_g + nσ²_e                                  (8)

Note that both (14) and (8) are the same for all Y's and G's, respectively (all
terms in right hand members are constants). Recognizing this and substituting in
(12) in terms of (8) and (14) we obtain,

    E(M_2) = [1/(m(n-1))] [ mn(μ² + σ²_g + σ²_e) - (m/n)(n²μ² + n²σ²_g + nσ²_e) ]

           = μ² [ m(n-n)/(m(n-1)) ] + σ²_g [ m(n-n)/(m(n-1)) ]
             + σ²_e [ m(n-1)/(m(n-1)) ]

           = σ²_e                                                   (15)
Using (11) and (15) the analysis of variance can now be presented giving the
expectations of the mean squares.

    Variance Source     d.f.        Expectation of m.s.
    Groups              m-1         σ²_e + nσ²_g
    Within groups       m(n-1)      σ²_e
    Total               mn-1
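These two expectations are easy to confirm by Monte Carlo. The sketch below
(NumPy assumed; m, n, and the variances are arbitrary illustration values)
simulates the model Y_ij = μ + g_i + e_ij many times, computes M_1 exactly as
defined above, and compares its average with σ²_e + nσ²_g.

    import numpy as np

    rng = np.random.default_rng(2)
    m, n = 8, 5                  # groups, individuals per group
    sg2, se2 = 3.0, 2.0          # sigma^2_g and sigma^2_e
    reps = 20_000

    m1 = np.empty(reps)
    for k in range(reps):
        g = rng.normal(0.0, np.sqrt(sg2), size=(m, 1))   # random group effects
        y = 10.0 + g + rng.normal(0.0, np.sqrt(se2), size=(m, n))
        G = y.sum(axis=1)                                # group totals
        T = y.sum()
        m1[k] = ((G ** 2).sum() / n - T ** 2 / (n * m)) / (m - 1)

    print(m1.mean())       # close to se2 + n*sg2
    print(se2 + n * sg2)   # expectation from (11)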
g's assumed to be constants

Differences occasioned by assuming the g's constant rather than random are
listed below.
    g's random              g's constant
    E g_i = 0               E g_i = g_i
    E g_i² = σ²_g           E g_i² = g_i²
    E cg_i = 0              E cg_i = cg_i

where c is any constant.
Other expectations involved in (7), (9b), and (13) are not affected. With the
above differences in mind we see that in this case (7) does not reduce to (8)
but to

    E G_i² = n²μ² + n²g_i² + 2n²μg_i + nσ²_e                        (16)
In like manner (9b) reduces to

    E T² = n²m²μ² + nmσ²_e                                          (17)
rather than to (10). The reason why no terms involving g's or the squares or
products of g's occur in (17) is clarified by reference to (9a). Note that the
g's enter (9a) in a term that is the sum of the g's for the m groups. In the case
where the g's are assumed constant, μ is taken as the population mean for the m
groups in question. Then, since the g's are defined as deviations from this mean,
their sum must be zero. Hence, the term n(g_1 + g_2 + .... + g_m) disappears from
(9a) and correspondingly terms involving g's disappear from (9b). Finally (13)
reduces to

    E Y_ij² = μ² + g_i² + 2μg_i + σ²_e                              (18)

rather than to (14).
Substituting in (6) in terms of (16) and (17) rather than in terms of (8) and
(10) we obtain,

    E(M_1) = [1/(m-1)] [ (1/n)(mn²μ² + n² Σ_{i=1}^m g_i² + 2n²μ Σ_{i=1}^m g_i
                               + nmσ²_e) - (1/nm)(n²m²μ² + nmσ²_e) ]

Keeping in mind that Σ_{i=1}^m g_i = 0 as pointed out above, this reduces to

    E(M_1) = μ² [ (mn - mn)/(m-1) ] + [n/(m-1)] Σ_{i=1}^m g_i²
             + σ²_e [ (m-1)/(m-1) ]

           = [n/(m-1)] Σ_{i=1}^m g_i² + σ²_e                        (19)
Substituting in (12) in terms of (16) and (18) rather than in terms of (8)
and (14) we obtain,

    E(M_2) = [1/(m(n-1))] [ (mnμ² + n Σ_{i=1}^m g_i² + 2nμ Σ_{i=1}^m g_i + mnσ²_e)
                            - (1/n)(mn²μ² + n² Σ_{i=1}^m g_i²
                                    + 2n²μ Σ_{i=1}^m g_i + mnσ²_e) ]

           = μ² [ m(n-n)/(m(n-1)) ] + σ²_e [ m(n-1)/(m(n-1)) ]

           = σ²_e                                                   (20)
We have again used the fact that Σ_{i=1}^m g_i = 0. The analysis of variance with
expectations is now as follows:

    Variance Source     d.f.        Expectation of m.s.
    Groups              m-1         σ²_e + [n/(m-1)] Σ_{i=1}^m g_i²
    Within groups       m(n-1)      σ²_e
    Total               mn-1
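The fixed-effects expectation can be checked the same way as the random-effects
one. In the sketch below (NumPy assumed; the g_i are arbitrary fixed values
chosen to sum to zero) the group effects are held constant across repetitions,
and the average of M_1 approaches σ²_e + [n/(m-1)] Σ g_i² rather than σ²_e + nσ²_g.

    import numpy as np

    rng = np.random.default_rng(3)
    m, n, se2, reps = 4, 6, 2.0, 20_000
    g = np.array([-1.5, -0.5, 0.5, 1.5]).reshape(m, 1)  # fixed, sum to zero

    m1 = np.empty(reps)
    for k in range(reps):
        y = 10.0 + g + rng.normal(0.0, np.sqrt(se2), size=(m, n))
        G = y.sum(axis=1)
        T = y.sum()
        m1[k] = ((G ** 2).sum() / n - T ** 2 / (n * m)) / (m - 1)

    print(m1.mean())                           # close to the value below
    print(se2 + n / (m - 1) * (g ** 2).sum())  # expectation from (19)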
Example 2

As a variation of example 1 consider the analysis of variance for comparison
of groups of unequal size. Let n_1, n_2, ...., n_m symbolize the number per group
in groups 1, 2, ...., m, respectively. The form of the analysis is as follows:

    Variance Source     d.f.                 m.s.
    Groups              m-1                  M_1
    Within groups       Σ_{i=1}^m (n_i - 1)  M_2
    Total               N-1

where N is the total number of individuals in all groups.
Except for variation in group size the model will be the same as in example 1.
We will consider only the case where g is considered a random variable. The mean
square for groups is

    M_1 = [1/(m-1)] [ G_1²/n_1 + G_2²/n_2 + .... + G_m²/n_m - T²/N ]   (21)
Referring to (7) and (8) it is clear that

    E G_i² = n_i²μ² + n_i²σ²_g + n_i σ²_e

and hence that

    E(G_i²/n_i) = n_i μ² + n_i σ²_g + σ²_e                          (22)
T is now equal to

    T = Nμ + n_1 g_1 + n_2 g_2 + .... + n_m g_m
        + e_11 + e_12 + .... + e_1n_1 + e_21 + e_22 + .... + e_2n_2
        + .... + e_m1 + e_m2 + .... + e_mn_m

Squaring and taking expectations but omitting terms with expectation zero we
obtain,

    E T² = N²μ² + E n_1²g_1² + E n_2²g_2² + .... + E n_m²g_m²
           + E e_11² + E e_12² + .... + E e_1n_1² + ....
           + E e_m1² + E e_m2² + .... + E e_mn_m²

Evaluating the separate terms, this becomes

    E T² = N²μ² + n_1²σ²_g + n_2²σ²_g + .... + n_m²σ²_g + Nσ²_e

and, hence,

    E(T²/N) = Nμ² + (σ²_g/N) Σ_{i=1}^m n_i² + σ²_e                  (23)

We have Nσ²_e because there are a total of N terms of the type E e_11² that are
equal to σ²_e.
Writing E(M_1) in terms of (21), (22), and (23) we get

    E(M_1) = [1/(m-1)] [ μ² Σ_{i=1}^m n_i + σ²_g Σ_{i=1}^m n_i + mσ²_e
                         - Nμ² - (σ²_g/N) Σ_{i=1}^m n_i² - σ²_e ]

Noting that Σ_{i=1}^m n_i = N, this reduces to

    E(M_1) = σ²_e + σ²_g [ (N - Σ_{i=1}^m n_i²/N) / (m-1) ]         (24)

The coefficient of σ²_g in (24) is of the same form as given by Snedecor
(p. 234, 1948).
The within group mean square is computed as

    M_2 = [1/(N-m)] [ Y_11² + Y_12² + .... + Y_1n_1² + ....
                      + Y_m1² + Y_m2² + .... + Y_mn_m²
                      - G_1²/n_1 - .... - G_m²/n_m ]

Taking the expectation term by term we have

    E(M_2) = [1/(N-m)] [ E Y_11² + E Y_12² + .... + E Y_1n_1² + ....
                         + E Y_m1² + .... + E Y_mn_m²
                         - E G_1²/n_1 - .... - E G_m²/n_m ]         (25)
The expectation of the square of any single Y is in no way affected by the number
of individuals observed in each group. Therefore, it is given by (14).
Substituting in (25) in terms of (14) and (22) we obtain,

    E(M_2) = [1/(N-m)] [ Nμ² + Nσ²_g + Nσ²_e - μ² Σ_{i=1}^m n_i
                         - σ²_g Σ_{i=1}^m n_i - mσ²_e ]

Remembering that μ and σ²_g are constants and that Σ_{i=1}^m n_i = N, this
reduces to

    E(M_2) = σ²_e                                                   (26)
Referring to (24) and (26) the analysis of variance with mean square expectations
can now be written as follows:

    Variance Source     d.f.        Expectation of m.s.
    Groups              m-1         σ²_e + n_0 σ²_g
    Within groups       N-m         σ²_e
    Total               N-1

where

    n_0 = [1/(m-1)] [ N - Σ_{i=1}^m n_i²/N ]
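The coefficient n_0 is obtained by mere counting, so it is conveniently computed
from the group sizes directly. A small helper (plain Python; the group sizes are
hypothetical) shows that with equal sizes n_0 reduces to the common group size n:

    def n0(sizes):
        N, m = sum(sizes), len(sizes)
        return (N - sum(ni ** 2 for ni in sizes) / N) / (m - 1)

    print(n0([4, 4, 4]))   # 4.0: equal sizes give n_0 = n
    print(n0([2, 5, 9]))   # 4.5625: somewhat below the mean group size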
General Procedures

Before turning to other examples it will be useful to summarize the general
procedures demonstrated in the foregoing examples. Steps in the procedure are
listed below.

1.  Specification of the model. This includes a symbolic statement of the
    composition of the individual values that make up the data, assumptions
    as to whether the various effects are fixed or random, and assumptions
    concerning whether separate effects vary independently.

2.  The composition of each mean square is written out in terms of the model
    and the steps followed in computing the mean square.

3.  The expectation of the mean square is developed term by term.

Rules employed in step 3 may be summarized as follows.

1.  The expectation of a constant is the constant itself.

2.  The expectation of a variable is the population mean of the variable.

3.  The expectation of the square of a variable that has population mean zero
    is the population variance of the variable.

4.  The expectation of the product of a constant and a variable is the product
    of the constant and the expectation of the variable.

5.  The expectation of the product of two variables that have population mean
    zero is the population covariance of the variables.

6.  The population covariance of any two variable effects is zero whenever the
    particular two effects contributing to any one measurement in the data may
    be assumed to be drawn randomly from their respective populations.
Two points merit special attention.

1.  It is desirable to write the model in terms of a general mean so that all
    effects will have zero as their population mean. This allows taking
    advantage of 3 and 5 above.

2.  If 6 above is kept in mind a great deal of labor can be saved, in writing
    out the composition of mean squares in expanded form, by omitting product
    terms that have expectation zero. For example, with this in mind (7) might
    have been written

        E G_i² = E n²μ² + E n²g_i² + E e_i1² + E e_i2² + .... + E e_in²

    for the case where g_i was considered a random variable.
In the case of more complicated analyses than those considered in the foregoing
examples, expressions for the composition of the various mean squares may be very
long. Rather than follow the procedure outlined above in just the form
demonstrated by examples 1 and 2, it is more convenient in these cases to
recognize that every mean square can be computed as a linear function of one or
more "uncorrected" sums of squares and what is commonly called the correction
factor. Thus the expectation of a mean square can be obtained by combining the
expectations of uncorrected sums of squares and the correction factor in the same
way that the sums of squares and correction factor were combined to obtain the
mean square. The procedure is to find the expectations of the uncorrected sums
of squares that must be computed in the analysis and of the correction factor and
then combine these appropriately to obtain the expectations of the mean squares.
Example 3

Consider the analysis of data obtained from comparison of n genetic strains of
a particular annual crop in a randomized block design at each of s locations in
each of t years. Assume r replications in each location each year and that
different land, or at least a new randomization, is used in successive years at
each location. The form of the variance analysis is as follows:

    Variance Source                 d.f.
    Locations                       s-1
    Years                           t-1
    L x Y                           (s-1)(t-1)
    Reps in years and locations     st(r-1)
    Strains                         n-1
    L x Strains                     (s-1)(n-1)
    Y x Strains                     (t-1)(n-1)
    L x Y x Strains                 (s-1)(t-1)(n-1)
    Strains x reps in L and Y       st(r-1)(n-1)
    Total                           rstn-1
The model employed will be as follows:

    Y_ijkl = μ + g_i + a_j + b_k + (ab)_jk + (ga)_ij + (gb)_ik + (gab)_ijk
             + c_jkl + (gc)_ijkl

where
    μ is the population mean,
    g_i is the effect of the i-th strain,
    a_j is the effect of the j-th location,
    b_k is the effect of the k-th year,
    (ab)_jk is an effect arising from first order interaction between environment
        conditions of the j-th location and k-th year,
    (ga)_ij is an effect arising from first order interaction of the i-th strain
        with the j-th location,
    (gb)_ik is an effect arising from first order interaction of the i-th strain
        with the k-th year,
    (gab)_ijk is an effect arising from second order interaction of the i-th
        strain with the j-th location and k-th year,
    c_jkl is the effect of the l-th block at the j-th location in the k-th year
        as a deviation from the mean for that location and year, and
    (gc)_ijkl is the effect of the plot to which the i-th strain is assigned in
        the l-th block in the j-th location and k-th year (strictly speaking it
        also contains a plot-strain interaction effect and the error of
        measurement, but only in special cases would it be important to indicate
        this sub-division in the model).
All effects will be considered random variables with mean zero. This would be
appropriate if the objective of the work was to compare the strains for use in
locations and years of which those involved in the experiment were a random
sample, and if the strains represented a random sample from a population from
which other strains might have been taken for comparison. It will also be assumed
that all effects vary randomly with respect to each other so that all covariances
among pairs of effects are zero. This is an appropriate assumption in
consideration of the way work like this is usually conducted. Finally, it will
be assumed that

    E(ga)_ij²   is constant over all values of i and j
    E(gb)_ik²   is constant over all values of i and k
    E(ab)_jk²   is constant over all values of j and k
    E(gab)_ijk² is constant over all values of i, j, and k
    E c_jkl²    is constant over all values of j, k, and l
    E(gc)_ijkl² is constant over all values of i, j, k, and l.

The sense of this is that all individual effects within any one of the six kinds
belong to a common population and have the variance of that population as the
expectation of their squares. This is an assumption very commonly made in
connection with analyses of the type in question, though it may not always be
justified.
The letter T with appropriate subscripts is used to symbolize different sums
of the Y's. For example,

    T    = grand total
    T_i  = sum for the i-th strain (over all locations, years, and blocks)
    T_j  = sum for the j-th location (over all strains, years, and blocks)
    T_ij = sum for the i-th strain at the j-th location (over all years and
           blocks)

etc.
Carried to its ultimate this means T_ijkl, but Y_ijkl will be used instead of
T_ijkl. The uncorrected sums of squares will be symbolized by S with appropriate
subscripts. For example,

    S    = T²/nrst = the correction factor,
    S_i  = Σ_{i=1}^n T_i²/rst = uncorrected sum of squares for strains,
    S_ij = Σ_{i=1}^n Σ_{j=1}^s T_ij²/rt = uncorrected sum of squares for
           strain-location totals,

etc.
The process of obtaining the expectations of the mean squares can be amply
illustrated by considering only one mean square, say M_2, the mean square for
L x Strains. It is computed as follows:

    M_2 = [1/((n-1)(s-1))] [ S_ij - S_i - S_j + S ]
        = [1/((n-1)(s-1))] [ (S_ij - S) - (S_i - S) - (S_j - S) ]

Consequently

    E(M_2) = [1/((n-1)(s-1))] [ E S_ij - E S_i - E S_j + E S ]      (27)

The S's involved have the following composition:

    S_ij = (1/rt) Σ_i Σ_j T_ij²
    S_i  = (1/rst) Σ_i T_i²
    S_j  = (1/nrt) Σ_j T_j²
    S    = T²/nrst
It follows that their expectations are,

    E S_ij = (1/rt) Σ_i Σ_j E T_ij²
    E S_i  = (1/rst) Σ_i E T_i²
    E S_j  = (1/nrt) Σ_j E T_j²                                     (28)
    E S    = (1/nrst) E T²

As the basis for obtaining the expectations of the T²'s we must know the
composition of the T's:

    T_ij = Σ_k Σ_l Y_ijkl
    T_i  = Σ_j Σ_k Σ_l Y_ijkl
    T_j  = Σ_i Σ_k Σ_l Y_ijkl
    T    = Σ_i Σ_j Σ_k Σ_l Y_ijkl
Expanding these sums in terms of the model for the analysis we have the following:

    T_ij = rtμ + rtg_i + rta_j + r Σ_k b_k + r Σ_k (ab)_jk + rt(ga)_ij
           + r Σ_k (gb)_ik + r Σ_k (gab)_ijk + Σ_k Σ_l c_jkl + Σ_k Σ_l (gc)_ijkl

    T_i  = rstμ + rstg_i + rt Σ_j a_j + rs Σ_k b_k + r Σ_j Σ_k (ab)_jk
           + rt Σ_j (ga)_ij + rs Σ_k (gb)_ik + r Σ_j Σ_k (gab)_ijk
           + Σ_j Σ_k Σ_l c_jkl + Σ_j Σ_k Σ_l (gc)_ijkl

    T_j  = nrtμ + rt Σ_i g_i + nrta_j + nr Σ_k b_k + nr Σ_k (ab)_jk
           + rt Σ_i (ga)_ij + r Σ_i Σ_k (gb)_ik + r Σ_i Σ_k (gab)_ijk
           + n Σ_k Σ_l c_jkl + Σ_i Σ_k Σ_l (gc)_ijkl

    T    = nrstμ + rst Σ_i g_i + nrt Σ_j a_j + nrs Σ_k b_k
           + nr Σ_j Σ_k (ab)_jk + rt Σ_i Σ_j (ga)_ij + rs Σ_i Σ_k (gb)_ik
           + r Σ_i Σ_j Σ_k (gab)_ijk + n Σ_j Σ_k Σ_l c_jkl
           + Σ_i Σ_j Σ_k Σ_l (gc)_ijkl
:" ..
of the square of any of these T's is thb sum of the expecta-
tions of each term in the square. however, since all covariances among different
effects are Zbro (see statement of model) the expectations of all product terms in
the square of any T are also zero.
Thus only the expectations of the squares of
thG separate terms in the above expressions o.:mtribu'b<.; to the expectations we ar8
seeking.
These can be written directly from inspection of the terms.
For example,
2 2 2
'" r t tL
wh6re cr2
s~~bolizes
th~
population variance of the.bffect indicated by subscript
(bocause (1) the numbor of bls in the sum indicated
is t, (2) Eb~ = cr~, and (J) the expectation of the
product of two bls is zero)
Proceeding in this way the expectations can be written from the equations for the
T's as follows:

    E T_ij² = r²t²μ² + r²t²σ²_g + r²t²σ²_a + r²tσ²_b + r²tσ²_ab + r²t²σ²_ga
              + r²tσ²_gb + r²tσ²_gab + rtσ²_c + rtσ²_gc             (29a)
    E T_i²  = r²s²t²μ² + r²s²t²σ²_g + r²st²σ²_a + r²s²tσ²_b + r²stσ²_ab
              + r²st²σ²_ga + r²s²tσ²_gb + r²stσ²_gab + rstσ²_c + rstσ²_gc   (29b)

    E T_j²  = n²r²t²μ² + nr²t²σ²_g + n²r²t²σ²_a + n²r²tσ²_b + n²r²tσ²_ab
              + nr²t²σ²_ga + nr²tσ²_gb + nr²tσ²_gab + n²rtσ²_c + nrtσ²_gc   (29c)

    E T²    = n²r²s²t²μ² + nr²s²t²σ²_g + n²r²st²σ²_a + n²r²s²tσ²_b
              + n²r²stσ²_ab + nr²st²σ²_ga + nr²s²tσ²_gb + nr²stσ²_gab
              + n²rstσ²_c + nrstσ²_gc                               (29d)
Note that the first of these expressions is constant no matter which genotype-
location sum is in question (this is apparent since neither i nor j appears as a
subscript in the right hand side of the expression). The same sort of thing is
true for the second and third expressions as well. Therefore, equations (28) can
be rewritten as follows:

    E S_ij = (1/rt) [ ns E T_ij² ]
    E S_i  = (1/rst) [ n E T_i² ]
    E S_j  = (1/nrt) [ s E T_j² ]                                   (30)
    E S    = (1/nrst) E T²
The only remaining step is to substitute in (27) in terms of equations (29) and
(30). Collecting terms involving a common parameter at the same time that the
substitutions are made, we obtain,

    E(M_2) = [1/((n-1)(s-1))] [ μ²(nrst - nrst - nrst + nrst)
             + σ²_g(nrst - nrst - rst + rst) + σ²_a(nrst - nrt - nrst + nrt)
             + σ²_b(nrs - nrs - nrs + nrs) + σ²_ab(nrs - nr - nrs + nr)
             + σ²_ga(nrst - nrt - rst + rt) + σ²_gb(nrs - nrs - rs + rs)
             + σ²_gab(nrs - nr - rs + r) + σ²_c(ns - n - ns + n)
             + σ²_gc(ns - n - s + 1) ]

           = [1/((n-1)(s-1))] [ rt(ns - n - s + 1) σ²_ga
             + r(ns - n - s + 1) σ²_gab + (ns - n - s + 1) σ²_gc ]
Since (n-1)(s-1) = ns - n - s + 1 this reduces further to

    E(M_2) = rtσ²_ga + rσ²_gab + σ²_gc
It is worth noting that the mean square for locations is computed as
[1/(s-1)](S_j - S) and the one for strains as [1/(n-1)](S_i - S). Thus the
expectations of these mean squares could be quickly obtained in terms of
information developed in working out E(M_2).
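The bookkeeping in (27) through (30) is mechanical enough that it can be
delegated to a computer algebra system. The sketch below (assuming SymPy is
available; the coefficient tables are transcribed from (29a)-(29d), with 'mu2'
standing for μ²) collects the coefficient of each variance component in E(M_2)
and reproduces rtσ²_ga + rσ²_gab + σ²_gc.

    import sympy as sp

    n, r, s, t = sp.symbols('n r s t', positive=True)

    # Coefficient of each variance component (and of mu^2) in E(T_ij^2),
    # E(T_i^2), E(T_j^2), and E(T^2), transcribed from (29a)-(29d).
    ET2 = {
        'Tij': {'mu2': r**2*t**2, 'g': r**2*t**2, 'a': r**2*t**2, 'b': r**2*t,
                'ab': r**2*t, 'ga': r**2*t**2, 'gb': r**2*t, 'gab': r**2*t,
                'c': r*t, 'gc': r*t},
        'Ti': {'mu2': r**2*s**2*t**2, 'g': r**2*s**2*t**2, 'a': r**2*s*t**2,
               'b': r**2*s**2*t, 'ab': r**2*s*t, 'ga': r**2*s*t**2,
               'gb': r**2*s**2*t, 'gab': r**2*s*t, 'c': r*s*t, 'gc': r*s*t},
        'Tj': {'mu2': n**2*r**2*t**2, 'g': n*r**2*t**2, 'a': n**2*r**2*t**2,
               'b': n**2*r**2*t, 'ab': n**2*r**2*t, 'ga': n*r**2*t**2,
               'gb': n*r**2*t, 'gab': n*r**2*t, 'c': n**2*r*t, 'gc': n*r*t},
        'T': {'mu2': n**2*r**2*s**2*t**2, 'g': n*r**2*s**2*t**2,
              'a': n**2*r**2*s*t**2, 'b': n**2*r**2*s**2*t,
              'ab': n**2*r**2*s*t, 'ga': n*r**2*s*t**2, 'gb': n*r**2*s**2*t,
              'gab': n*r**2*s*t, 'c': n**2*r*s*t, 'gc': n*r*s*t},
    }

    # Combine per (30) and (27): E(M_2), term by term.
    for comp in ET2['T']:
        coef = (n*s*ET2['Tij'][comp]/(r*t) - n*ET2['Ti'][comp]/(r*s*t)
                - s*ET2['Tj'][comp]/(n*r*t) + ET2['T'][comp]/(n*r*s*t))
        coef = sp.factor(coef / ((n - 1)*(s - 1)))
        if coef != 0:
            print(comp, coef)   # only ga: r*t, gab: r, gc: 1 survive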
An important practical angle to note is that as one gains experience in working
out mean square expectations various short cuts become apparent (for an example,
see Crump, Biometrics 1946). However, no attempt will be made to describe such
short-cuts and when they can be used, as the novice will run less chance of
misapplications if he goes through the full procedure in detail until he
perceives the short-cuts and their rationale by himself. In doubtful cases it is
always best to proceed in a straight-forward manner working through the full
procedure described above.
Example 4

On occasion estimates of variance components are required from n-fold
classification data in which sub-classes are disproportionate and in which in
many instances a portion of the sub-classes are not represented at all in the
data. In the case of data available to the animal geneticist for estimation of
variances arising from genetic variation or genotype-environment interaction this
can almost be said to be the rule rather than the exception.

As a specific example suppose that data are available on the annual milk
production of cows that were by different sires and that were members of
different herds. It will be assumed that members of any particular sire family
may have been scattered through two or more herds but not necessarily all herds.
Herd effects will vary due to management practices (and perhaps for other
reasons), family effects will vary as a result of genotypic variation among
sires, and herd-family interaction effects may be presumed to exist. A rational
model on which to base analysis of the data would be as follows:
    Y_ijk = μ + g_i + a_j + (ga)_ij + e_ijk

where
    Y_ijk is the production of the k-th cow that is by the i-th sire and located
        in the j-th herd,
    g_i is the effect of the genotype of the i-th sire (on production by his
        daughters),
    a_j is the effect of the j-th herd,
    (ga)_ij is an effect due to interaction between average genotype of the i-th
        family and the environment to which cows are exposed in the j-th herd,
        and
    e_ijk is the deviation in production of the k-th cow from the population
        average for the i-th family in the j-th herd.

It will be assumed that all effects are random with population mean zero, that
all individual effects are random with respect to each other so that the
expectation for any product of two effects is zero, and finally that

    E(ga)_ij² is constant over all values of i and j, and
    E e_ijk²  is constant over all values of i and j.

If production were measured in various years a realistic model would include
other effects, but for the purpose of this example we will assume all records
were taken in a single year.
There are various computational approaches that may be taken in the use of
such data for estimation of variance components, but one of the easiest, and one
that is becoming increasingly popular for that reason, is as follows. In terms
of our example, four mean squares would be computed: mean squares for
(1) families, (2) herds, (3) herd-family subclasses, and (4) cows within
subclasses. The expectations of the first three of these will be linear
functions of the variances of all four of the variables in the model. The fourth
will have expectation σ²_e. Once computed, the four mean squares would be
equated to their respective expectations to provide four equations in four
unknowns (the variances of the four effects) that would then be solved
simultaneously to obtain estimates of the four variances.

We will consider the mean square for subclasses (M_sc) in detail. It would be

    M_sc = [1/(s-1)] [ Σ_i Σ_j T_ij²/n_ij - T²/N ]
where
    n_ij is the number of cows of the i-th family in the j-th herd,
    T_ij is the sum of production by all cows of the i-th family in the j-th
        herd,
    T is the grand total of production by all cows,
    N is the total number of cows, and
    s is the number of sub-classes represented by one or more cows.
Obviously,

    E M_sc = [1/(s-1)] [ Σ_i Σ_j E(T_ij²/n_ij) - E(T²/N) ]          (31)

In terms of the model,

    T_ij = n_ij μ + n_ij g_i + n_ij a_j + n_ij (ga)_ij + Σ_k e_ijk
Proceeding in accord with arguments presented in connection with the previous
example we can write directly

    E(T_ij²/n_ij) = n_ij μ² + n_ij σ²_g + n_ij σ²_a + n_ij σ²_ga + σ²_e   (32)

In contrast to example 3 this is not constant for all T_ij but varies with n_ij.
We must now find the expectation of T².

    T = Σ_i Σ_j T_ij = Nμ + Σ_i n_i g_i + Σ_j n_j a_j + Σ_i Σ_j n_ij (ga)_ij
        + Σ_i Σ_j Σ_k e_ijk

where n_i = total number of cows in the i-th family, and
      n_j = total number of cows in the j-th herd.
    E T² = N²μ² + σ²_g Σ_i n_i² + σ²_a Σ_j n_j² + σ²_ga Σ_i Σ_j n_ij² + Nσ²_e   (33)
As an example of the detail involved in writing E T² from the expression for T,
consider the term Σ_i n_i g_i.

    ( Σ_i n_i g_i )² = n_1²g_1² + n_2²g_2² + .... + n_f²g_f²
                       + product terms that need not be written out since all
                         have zero expectation,

where f is the number of families. Then

    E( Σ_i n_i g_i )² = σ²_g Σ_i n_i²

since the expectation of the square of any random g is σ²_g.
Substituting in (31) in terms of (32) and (33) we obtain,

    E M_sc = [1/(s-1)] [ Σ_i Σ_j ( n_ij μ² + n_ij σ²_g + n_ij σ²_a
                                   + n_ij σ²_ga + σ²_e )
                         - ( Nμ² + σ²_g Σ_i n_i²/N + σ²_a Σ_j n_j²/N
                             + σ²_ga Σ_i Σ_j n_ij²/N + σ²_e ) ]

           = [1/(s-1)] [ ( Nμ² + Nσ²_g + Nσ²_a + Nσ²_ga + sσ²_e )
                         - ( Nμ² + σ²_g Σ_i n_i²/N + σ²_a Σ_j n_j²/N
                             + σ²_ga Σ_i Σ_j n_ij²/N + σ²_e ) ]

           = σ²_e + [1/(s-1)] [ σ²_g ( N - Σ_i n_i²/N )
                                + σ²_a ( N - Σ_j n_j²/N )
                                + σ²_ga ( N - Σ_i Σ_j n_ij²/N ) ]

(Here we have used Σ_i Σ_j n_ij = N and the fact that there are s represented
subclasses, which gives the term sσ²_e.)
Expectations of the other mean squares are obtained by the same procedure as that
used for E M_sc. For any particular body of data N, the n_i, the n_j, and the
n_ij can be obtained by mere counting and hence, the coefficients of the several
variances in E M_sc can be computed.
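Once the four expectations have been written out, equating mean squares to
expectations gives a linear system that is solved in one step. The sketch below
(NumPy assumed) is a schematic of that last step only; the coefficients and mean
squares are hypothetical placeholders, not values derived from real data, since
they depend on the n_ij counts of the particular body of data.

    import numpy as np

    # Rows: families, herds, herd-family subclasses, cows within subclasses.
    # Columns: coefficients of sigma^2_g, sigma^2_a, sigma^2_ga, sigma^2_e in
    # the expectation of each mean square (hypothetical counting results).
    coef = np.array([
        [12.4,  3.1,  3.8, 1.0],
        [ 2.9, 15.2,  4.1, 1.0],
        [ 4.6,  4.9,  9.7, 1.0],
        [ 0.0,  0.0,  0.0, 1.0],
    ])
    ms = np.array([310.0, 505.0, 220.0, 40.0])   # observed mean squares

    # Equate mean squares to their expectations and solve simultaneously.
    est = np.linalg.solve(coef, ms)
    print(dict(zip(['sg2', 'sa2', 'sga2', 'se2'], est)))

Nothing in the procedure prevents a solution from coming out negative; a negative
estimate is usually taken as an indication that the corresponding component is
small or zero.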
Final Comments

The essence of working out mean square expectations can be summarized as
follows:

1.  It is necessary to know what is meant by expectation.

2.  It is necessary to know the values that the definition of expectation
    imposes on the expectations of (a) a constant, (b) a random variate,
    (c) the product of a constant and a random variate, (d) the square of a
    random variate, and (e) the product of two random variates (only the cases
    of random variates with population mean zero are of special importance).

3.  Fundamentally, the procedure is to write the mean square out symbolically
    in a form that is expanded to the point that it is a linear function of
    only terms of the type (a) to (e) of point 2 above.

4.  When this has been done, knowledge specified in point 2 above, together
    with the rule that the expectation of a linear function is equal to the
    same function of the expectations of the separate terms of the quantity
    for which the expectation is desired, provides the basis for writing the
    desired expectation.

5.  From the practical point of view, many of the steps can and will be
    performed only mentally (will not be written out). However, in case of
    doubt, writing steps out in detail is likely to insure against an
    occasional serious error. There are rules-of-thumb that can sometimes be
    used but their application involves risk of error unless the entire matter
    is so well understood that the reason why these rules work in specific
    cases is entirely clear. Otherwise they may be applied in cases where
    they do not work.
For supplementary reading on the derivation of mean square expectations see
Anderson and Bancroft (1952) and Kempthorne (1952).
Literature Cited

Anderson, R. L. and T. A. Bancroft (1952). Statistical Theory in Research.
    McGraw-Hill, New York.

Crump, S. Lee (1946). The Estimation of Variance Components in Analysis of
    Variance. Biometrics Bull. 2:7-11.

Kempthorne, Oscar (1952). The Design and Analysis of Experiments. John Wiley
    and Sons, Inc., New York.