THE EXPECTATIONS OF MEAN SQUARES

by R. E. Comstock
Institute of Statistics
Mimeograph Series No. 76
For Limited Distribution

Chapter VI

THE EXPECTATIONS OF MEAN SQUARES

The Expectation of a Variable

If individuals are drawn randomly from a population, their average value in terms of any specified measurement will be equal in the long run to the mean for that measurement in the population. We say that the value to be expected on the average is that of the population mean. In fact, in statistics the expectation of a variable quantity is defined as the mean for such quantities in the population to which the particular variate belongs.

For example, let X_1, X_2, ...., X_i, .... symbolize the values of the individuals in any univariate population. Then the X's constitute a population of quantities of which the expectation of any one chosen at random is mu_x, where mu_x is the population mean. This is stated symbolically as follows:

    E(X_i) = mu_x                                                        (1)

where X_i can be any of the X's depending on the value given i, and E(X_i) is read "the expectation of X_i".

As a second example, recall that the population variance is defined as

    sigma^2 = SUM_i (X_i - mu_x)^2 / N

where sigma^2 is the population variance, X_i symbolizes the value of any individual quantity in the population, N is the number of individuals in the population, and mu_x as before is the population mean. Thus the variance, sigma^2, is defined as the mean of all values, i.e. the population mean, of (X_i - mu_x)^2. In accord with the definition of expectation we see that

    E(X_i - mu_x)^2 = sigma^2                                            (2a)

or, if we wish to represent the deviation of X_i from its population mean by a single symbol, say x_i = X_i - mu_x, we can write

    E(x_i^2) = sigma^2                                                   (2b)

As a final example, recall that the population covariance of two variables, say X and Y, is defined as

    sigma_xy = SUM_i (X_i - mu_x)(Y_i - mu_y) / N
where sigma_xy is the covariance and the other symbols have meanings in conformity with those listed above when considering the variance of X. We see that the covariance, sigma_xy, is defined as the population mean of (X_i - mu_x)(Y_i - mu_y), and therefore that

    E(X_i - mu_x)(Y_i - mu_y) = sigma_xy                                 (3a)

Again, if we set x_i = X_i - mu_x and y_i = Y_i - mu_y, we can write

    E(x_i y_i) = sigma_xy                                                (3b)

Interest in expectations centers around the fact that by setting observed quantities equal to their expectations we find a basis for unbiased estimation of the parameters involved in the expectation. For example, it can be shown that

    E SUM_i (X_i - Xbar)^2 = (n - 1) sigma^2

where Xbar is the mean of a sample of X's and n is the number of individuals in the sample. It follows that

    E [ SUM_i (X_i - Xbar)^2 / (n - 1) ] = sigma^2

or, writing s^2 = SUM_i (X_i - Xbar)^2 / (n - 1),

    E(s^2) = sigma^2

From this we see that the sample variance obtained by dividing the sum of squares by degrees of freedom has sigma^2 as its expectation, i.e. that it provides an unbiased estimate of sigma^2.

Expectation of a Constant

This is mentioned specifically for completeness. Since a constant, by definition, is a quantity that always has the same value, the expectation of a constant could hardly be anything but that particular value. For example, a population mean is a constant and its expectation is the mean itself. Symbolically, if c is any constant,

    E(c) = c                                                             (4)

Expectation of the Product of a Constant and a Variable

Consider the product Y = cX, where X is a variable and c is a constant. We know that the population mean of Y is c mu_x and, therefore,

    E(Y) = E(cX) = c mu_x = c E(X)                                       (5)

In general, the expectation of such a product is the product of the constant and the expectation of the variable.
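The unbiasedness statement E(s^2) = sigma^2 can be checked exactly on a tiny finite population by enumeration: when samples of size n are drawn with replacement, every possible sample is equally likely, so averaging s^2 over all of them is exactly the expectation. A minimal sketch (the population values and the helper name s2 are arbitrary illustrations, not from the text):

```python
from fractions import Fraction
from itertools import product

# A small finite population; the values are arbitrary for illustration.
pop = [Fraction(v) for v in (2, 4, 6, 12)]
N = len(pop)

mu = sum(pop) / N                              # population mean, E(X)
sigma2 = sum((x - mu) ** 2 for x in pop) / N   # population variance, E(X - mu)^2

def s2(sample):
    """Sample variance: sum of squares about the sample mean over n - 1."""
    n = len(sample)
    xbar = sum(sample) / n
    return sum((x - xbar) ** 2 for x in sample) / (n - 1)

# Average s^2 over all equally likely samples of size n drawn with replacement.
n = 2
all_samples = list(product(pop, repeat=n))
mean_s2 = sum(s2(s) for s in all_samples) / len(all_samples)

assert mean_s2 == sigma2                        # E(s^2) = sigma^2 exactly
assert sum(3 * x for x in pop) / N == 3 * mu    # E(cX) = c E(X), rule (5)
```

Using exact fractions rather than floats makes the equalities hold exactly rather than to rounding error, which is the point of the check.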
The Expectation of a Linear Function

Consider the linear function F = a + b + cX_1 + X_2, in which a, b, and c are constants and X_1 and X_2 are variable quantities drawn randomly (but not necessarily independently) from two populations (one a population of quantities symbolized as X_1, the other a population of quantities symbolized as X_2). Two points are worth special attention.

(1) The specific manner in which F is defined may have the result that the values of X_1 and X_2 contributing to different values of the quantity F are correlated, or, on the other hand, are independent, i.e. uncorrelated. For example, suppose F is designed to reflect in some special way the height of married couples. Then any single value of F would involve the height of the husband (X_1) and that of his wife (X_2). If the couples are chosen randomly, both X_1 and X_2 are random values from their respective populations, but they are not necessarily independent in magnitude from one couple to another. In fact, evidence indicates that there is a degree of correlation in the stature of man and wife. On the other hand, suppose F were defined as the height of plants, X_1 as the effect of genotype, and X_2 as the effect of environment on height, and it were known that in the population of plants involved genotypes were distributed randomly with respect to environment. The magnitudes of X_1 and X_2 would then vary independently from plant to plant and, therefore, from one value of F to another.

(2) The different variables may actually belong to the same population, though it may be useful to think of them as coming from different ones. For example, in the function given above X_1 and X_2 could be a pair of values drawn randomly from the same population, X_1 being the first and X_2 the second drawn of any pair. In this case X_1 and X_2 would vary independently, i.e. be uncorrelated.

Corresponding to every possible pair of values of X_1 and X_2 there is obviously a value of F. These values comprise a population of F's. We know that the mean value of F in that population is a + b + c mu_1 + mu_2, where mu_1 and mu_2 are the population means for X_1 and X_2, respectively.
Hence

    E(F) = mu_F = a + b + c mu_1 + mu_2

where mu_F is the population mean of F. This serves to demonstrate the general fact that the expectation of a variable quantity that is a linear function of other variables is the same linear function of the expectations of those variables. By this rule

    E(F) = E(a) + E(b) + E(cX_1) + E(X_2)

and since E(a) = a, E(b) = b, E(cX_1) = c mu_1, and E(X_2) = mu_2, we have by substitution

    E(F) = a + b + c mu_1 + mu_2

as given above.

Expectations of Mean Squares

Any mean square can be written as a linear function in which the variable quantities are the squares of variables, products of a variable with a constant, or products of variables. Hence, the expectations can always be written in terms of what is presented above. This fact will be clarified by examples.

Example 1

Consider the case represented by the analysis of variance for comparing groups of equal size. The form of the analysis is as follows:

    Variance Source     d.f.        m.s.
    Groups              m-1         M_1
    Within groups       m(n-1)      M_2
    Total               mn-1

where m is the number of groups and n is the number of individuals within groups. The model on which the analysis is based can be stated symbolically as follows:

    Y_ij = mu + g_i + e_ij

where

    mu is the population mean taken over all groups,
    g_i is the effect of the i-th group (the amount by which the population mean for the i-th group deviates from mu), and
    e_ij is a random effect contributing to the value of Y for the j-th individual in the i-th group (the amount by which the individual deviates from the mean for its group).

One of two assumptions is usually made concerning the groups: (a) that they are random members of a population of groups, or (b) that the ones on which data are taken are of special interest in themselves rather than as a sample from a population.
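The model can be read as a recipe for generating data. A sketch of case (a), with the g_i and e_ij drawn independently from zero-mean populations, follows; the function name, the normal populations, and the particular values of mu and the variances are assumptions for illustration only, since the text requires only zero means and fixed variances:

```python
import random

def generate_one_way(m, n, mu, sigma_g, sigma_e, rng):
    """Generate Y_ij = mu + g_i + e_ij for m groups of n individuals each,
    with group effects g_i and residuals e_ij drawn independently from
    zero-mean normal populations (the random-effects case (a))."""
    data = []
    for _ in range(m):
        g_i = rng.gauss(0.0, sigma_g)                # group effect, same for the row
        row = [mu + g_i + rng.gauss(0.0, sigma_e)    # fresh e_ij per individual
               for _ in range(n)]
        data.append(row)
    return data

rng = random.Random(1)
y = generate_one_way(m=5, n=4, mu=10.0, sigma_g=2.0, sigma_e=1.0, rng=rng)
assert len(y) == 5 and all(len(row) == 4 for row in y)
```

Under case (b) one would instead fix a list of constants g_1, ...., g_m summing to zero and reuse it in every repetition of the experiment.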
In case (a) the assumption is frequently stated by saying that g_i is considered a random variable, in contrast to case (b), where it is alternatively said that the g_i are considered constant or fixed.

The g's assumed to be random variables

We will consider first the case where g_i is considered a random variable. Let G_i be the sum of Y's for the n individuals of the i-th group, and T the sum of Y's for all nm individuals on which data were collected. Then the mean square for groups is

    M_1 = (1/(m-1)) [ (1/n)(G_1^2 + G_2^2 + .... + G_m^2) - T^2/nm ]

This may be considered the product of a constant and a variable, where 1/(m-1) is the constant and the quantity in brackets is the variable. Hence its expectation may be written

    E(M_1) = (1/(m-1)) E [ (1/n)(G_1^2 + G_2^2 + .... + G_m^2) - T^2/nm ]

Note that (1/n)(G_1^2 + G_2^2 + .... + G_m^2) is what we commonly call the "uncorrected sum of squares", that T^2/nm is what we call the "correction factor", and that the whole quantity in brackets is the "corrected sum of squares". By the rule that the expectation of a linear function is the same function of the expectations of the variables in the function, we can write

    E(M_1) = (1/(m-1)) [ (1/n)(EG_1^2 + EG_2^2 + .... + EG_m^2) - (1/nm) ET^2 ]       (6)

Now the separate expectations in the expression can be considered one by one. Consider EG_i^2. In terms of our model,

    EG_i^2 = E(Y_i1 + Y_i2 + .... + Y_in)^2
           = E(n mu + n g_i + e_i1 + e_i2 + .... + e_in)^2

Squaring and taking expectations term by term, this can be written

    EG_i^2 = E n^2 mu^2 + E n^2 g_i^2 + E(e_i1 + e_i2 + .... + e_in)^2
             + E 2n^2 mu g_i + E 2n mu (e_i1 + e_i2 + .... + e_in)
             + E 2n g_i (e_i1 + e_i2 + .... + e_in)                                   (7)

Before going further, note that both the g's and the e's are defined as deviations from a mean and hence that the population mean of both the g's and the e's is zero. Thus E(g_i) = 0, E(e_ij) = 0, E(g_i^2) = sigma_g^2, and E(e_ij^2) = sigma_e^2, where sigma_g^2 is the population variance of the g's and sigma_e^2 is the population variance of the e's. It is common to assume that all e's are members of the same population and, therefore, that sigma_e^2 is homogeneous over all groups.
This assumption will be made for the purpose of our example, but it should be understood that special cases may arise where the variance of e varies from group to group. It should also be noted that all g's and e's are assumed to be random members of their populations. The significance of this is that in the population (the population that would be generated by repeating the experiment in identical fashion an infinity of times) the correlation between (1) any two g's, (2) any two e's, or (3) any g and any e would be zero. If the correlation is zero, so also is the covariance, and this means that the expectations of all products of two g's, two e's, or a g and an e are all equal to zero. Symbolically this is stated as follows:

    E(g_i g_i') = 0       (i not equal to i')
    E(e_ij e_i'j') = 0    (i not equal to i', or i = i' and j not equal to j')
    E(g_i e_i'j) = 0      (either when i = i' or when i not equal to i')

Now let us consider the several terms of EG_i^2 one by one.

    (a) E n^2 mu^2 = n^2 mu^2 (since n^2 mu^2 is a constant)
    (b) E n^2 g_i^2 = n^2 sigma_g^2 (since n^2 is a constant and E g_i^2 = sigma_g^2)
    (c) E(e_i1 + e_i2 + .... + e_in)^2 = E e_i1^2 + E e_i2^2 + .... + E e_in^2 = n sigma_e^2 (since the expectation of each of the e^2, n of them, is sigma_e^2, and the expectation of each product term is zero)
    (d) E 2n^2 mu g_i = 2n^2 mu E g_i = zero (since 2n^2 mu is a constant and E g_i = 0)
    (e) E 2n mu (e_i1 + e_i2 + .... + e_in) = 2n mu E(e_i1 + e_i2 + .... + e_in) = zero (since 2n mu is a constant, and the expectation of every e is zero so that of the sum of any set of e's is also zero)
    (f) E 2n g_i (e_i1 + e_i2 + .... + e_in) = 2n E g_i (e_i1 + e_i2 + .... + e_in) = zero (since 2n is a constant and the expectation of the product of any g and e is zero)

Substituting in (7) in terms of (a) to (f) we find that

    EG_i^2 = n^2 mu^2 + n^2 sigma_g^2 + n sigma_e^2                                   (8)

Now note that nothing in (8) is specific for the particular group in question (i does not appear as a subscript in the right-hand member).
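Result (8) can be checked by brute force: simulate G_i = n mu + n g_i + SUM_j e_ij many times and compare the average of G_i^2 with n^2 mu^2 + n^2 sigma_g^2 + n sigma_e^2. A sketch with arbitrary parameter values and normal populations (normality is an assumption of the sketch, not of the text):

```python
import random

rng = random.Random(42)
n, mu, sigma_g, sigma_e = 4, 2.0, 1.5, 1.0
reps = 200_000

total = 0.0
for _ in range(reps):
    g = rng.gauss(0.0, sigma_g)                        # group effect g_i
    e_sum = sum(rng.gauss(0.0, sigma_e) for _ in range(n))
    G = n * mu + n * g + e_sum                         # one simulated group total
    total += G * G

mc = total / reps                                      # Monte Carlo estimate of E(G_i^2)
theory = n**2 * mu**2 + n**2 * sigma_g**2 + n * sigma_e**2
assert abs(mc - theory) / theory < 0.02                # agreement within about 2%
```

With these values the right-hand side of (8) is 16(4) + 16(2.25) + 4 = 104, and the simulated average settles near it as reps grows.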
The significance is that the expectation of G^2 is the same for all groups, so that

    SUM_i EG_i^2 = m(n^2 mu^2 + n^2 sigma_g^2 + n sigma_e^2)

In order to evaluate E(M_1) it remains only to obtain ET^2. Substituting for the G's,

    T = G_1 + G_2 + .... + G_m
      = nm mu + n(g_1 + g_2 + .... + g_m) + e_11 + e_12 + .... + e_1n + .... + e_m1 + e_m2 + .... + e_mn          (9a)

Squaring, taking expectations term by term, and moving constants to the left of the sign for expectation (proper because the expectation of the product of a constant and a variable is equal to the product of the constant and the expectation of the variable), we get

    ET^2 = n^2 m^2 mu^2 + n^2 Eg_1^2 + n^2 Eg_2^2 + .... + n^2 Eg_m^2 + Ee_11^2 + Ee_12^2 + .... + Ee_mn^2
           + product terms of the types 2n^2 m mu Eg_1, 2n^2 Eg_1 g_2, 2n Eg_1 e_11, or 2 Ee_11 e_12             (9b)

Consider the various terms of this expression:

    (g) n^2 Eg_1^2 = n^2 Eg_2^2 = .... = n^2 Eg_m^2 = n^2 sigma_g^2 (since E g_i^2 = sigma_g^2)
    (h) Ee_11^2 = Ee_12^2 = .... = Ee_mn^2 = sigma_e^2 (since E e_ij^2 = sigma_e^2)
    (i) all product terms are of types shown to have zero expectation in the process of developing EG_i^2.

Substituting in (9b) in terms of (g) to (i) we obtain

    ET^2 = n^2 m^2 mu^2 + n^2 m sigma_g^2 + nm sigma_e^2                              (10)

Finally, substituting in (6) in terms of (8) and (10), we find

    E(M_1) = (1/(m-1)) [ (m/n)(n^2 mu^2 + n^2 sigma_g^2 + n sigma_e^2) - (1/nm)(n^2 m^2 mu^2 + n^2 m sigma_g^2 + nm sigma_e^2) ]
           = sigma_g^2 (mn - n)/(m-1) + sigma_e^2 (m - 1)/(m-1)
           = n sigma_g^2 + sigma_e^2                                                  (11)

The within-group mean square may be computed as follows:

    M_2 = (1/(m(n-1))) SUM_i [ Y_i1^2 + Y_i2^2 + .... + Y_in^2 - G_i^2/n ]
Remembering (a) that the expectation of the product of a constant and a variable is the product of the constant and the expectation of the variable, and (b) that the expectation of a variable that is a linear function of other variables is the same function of the expectations of those latter variables, we see that

    E(M_2) = (1/(m(n-1))) SUM_i [ EY_i1^2 + EY_i2^2 + .... + EY_in^2 - (1/n) EG_i^2 ]        (12)

Consider the expectation of Y_ij^2. Since Y_ij = mu + g_i + e_ij,

    EY_ij^2 = E(mu + g_i + e_ij)^2

Expanding and taking expectations of the individual terms separately, we obtain

    EY_ij^2 = E mu^2 + E g_i^2 + E e_ij^2 + E 2 mu g_i + E 2 mu e_ij + E 2 g_i e_ij          (13)

Taking the terms of this expression separately,

    (j) E mu^2 = mu^2 (because mu^2 is a constant),
    (k) E g_i^2 = sigma_g^2 (by definition when the g's are assumed random),
    (l) E e_ij^2 = sigma_e^2 (by definition),
    (m) E 2 mu g_i = 2 mu E g_i = zero (since 2 mu is a constant and E g_i = 0),
    (n) E 2 mu e_ij = 2 mu E e_ij = zero (since 2 mu is a constant and E e_ij = 0),
    (o) E 2 g_i e_ij = 2 E g_i e_ij = zero (since 2 is a constant and E g_i e_ij = 0).

Substituting in (13) in terms of (j) to (o) we obtain

    EY_ij^2 = mu^2 + sigma_g^2 + sigma_e^2                                                   (14)

We have already shown (8) that the expectation of G_i^2 is

    EG_i^2 = n^2 mu^2 + n^2 sigma_g^2 + n sigma_e^2                                          (8)

Note that both (14) and (8) are the same for all Y's and G's, respectively (all terms in the right-hand members are constants). Recognizing this and substituting in (12) in terms of (8) and (14), we obtain

    E(M_2) = (1/(m(n-1))) [ mn(mu^2 + sigma_g^2 + sigma_e^2) - (m/n)(n^2 mu^2 + n^2 sigma_g^2 + n sigma_e^2) ]
           = sigma_e^2 (mn - m)/(m(n-1))
           = sigma_e^2                                                                       (15)

Using (11) and (15) the analysis of variance can now be presented giving the expectations of the mean squares.

    Variance Source     d.f.        Expectation of m.s.
    Groups              m-1         sigma_e^2 + n sigma_g^2
    Within groups       m(n-1)      sigma_e^2
    Total               mn-1
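The random-model results (11) and (15) can be verified the same way: compute M_1 and M_2 from many simulated experiments and compare their averages with sigma_e^2 + n sigma_g^2 and sigma_e^2. A sketch, again assuming normal populations and arbitrary parameter values:

```python
import random

rng = random.Random(7)
m, n = 6, 5
mu, sigma_g, sigma_e = 3.0, 2.0, 1.0
reps = 20_000

sum_m1 = sum_m2 = 0.0
for _ in range(reps):
    # One simulated experiment under Y_ij = mu + g_i + e_ij, with g_i random.
    groups = []
    for _ in range(m):
        g = rng.gauss(0.0, sigma_g)
        groups.append([mu + g + rng.gauss(0.0, sigma_e) for _ in range(n)])
    G = [sum(row) for row in groups]          # group totals G_i
    T = sum(G)                                # grand total
    ss_groups = sum(Gi * Gi for Gi in G) / n - T * T / (m * n)
    ss_within = sum(y * y for row in groups for y in row) - sum(Gi * Gi for Gi in G) / n
    sum_m1 += ss_groups / (m - 1)             # M_1, groups mean square
    sum_m2 += ss_within / (m * (n - 1))       # M_2, within-groups mean square

m1_bar, m2_bar = sum_m1 / reps, sum_m2 / reps
assert abs(m1_bar - (sigma_e**2 + n * sigma_g**2)) < 0.5   # E(M_1) = sigma_e^2 + n sigma_g^2
assert abs(m2_bar - sigma_e**2) < 0.05                     # E(M_2) = sigma_e^2
```

With these values E(M_1) = 1 + 5(4) = 21 and E(M_2) = 1, and the simulated averages approach those figures as reps grows.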
The g's assumed to be constants

The differences occasioned by assuming the g's constant rather than random are listed below.

    g's random:     E g_i = 0       E g_i^2 = sigma_g^2     E c g_i = 0
    g's constant:   E g_i = g_i     E g_i^2 = g_i^2         E c g_i = c g_i

where c is any constant. The other expectations involved in (7), (9b), and (13) are not affected. With the above differences in mind we see that in this case (7) does not reduce to (8) but to

    EG_i^2 = n^2 mu^2 + n^2 g_i^2 + 2n^2 mu g_i + n sigma_e^2                         (16)

In like manner (9b) reduces to

    ET^2 = n^2 m^2 mu^2 + nm sigma_e^2                                                (17)

rather than to (10). The reason why no terms involving g's, or the squares or products of g's, occur in (17) is clarified by reference to (9a). Note that the g's enter (9a) in a term that is the sum of the g's for the m groups. In the case where the g's are assumed constant, mu is taken as the population mean for the m groups in question. Then, since the g's are defined as deviations from this mean, their sum must be zero. Hence the term n(g_1 + g_2 + .... + g_m) disappears from (9a), and correspondingly the terms involving g's disappear from (9b). Finally, (13) reduces to

    EY_ij^2 = mu^2 + g_i^2 + 2 mu g_i + sigma_e^2                                     (18)

rather than to (14).

Substituting in (6) in terms of (16) and (17) rather than in terms of (8) and (10), we obtain

    E(M_1) = (1/(m-1)) [ (1/n)(m n^2 mu^2 + n^2 SUM_i g_i^2 + 2n^2 mu SUM_i g_i + mn sigma_e^2) - (1/nm)(n^2 m^2 mu^2 + nm sigma_e^2) ]

Keeping in mind that SUM_i g_i = 0, as pointed out above, this reduces to

    E(M_1) = (n/(m-1)) SUM_i g_i^2 + sigma_e^2                                        (19)

Substituting in (12) in terms of (16) and (18) rather than in terms of (8) and (14), we obtain

    E(M_2) = (1/(m(n-1))) [ (mn mu^2 + n SUM_i g_i^2 + 2n mu SUM_i g_i + mn sigma_e^2) - (1/n)(m n^2 mu^2 + n^2 SUM_i g_i^2 + 2n^2 mu SUM_i g_i + mn sigma_e^2) ]
           = sigma_e^2 (mn - m)/(m(n-1))
           = sigma_e^2                                                                (20)

We have again used the fact that SUM_i g_i = 0. The analysis of variance with expectations is now as follows:

    Variance Source     d.f.        Expectation of m.s.
    Groups              m-1         sigma_e^2 + (n/(m-1)) SUM_i g_i^2
    Within groups       m(n-1)      sigma_e^2
    Total               mn-1

Example 2

As a variation of example 1, consider the analysis of variance for comparison of groups of unequal size. Let n_1, n_2, ...., n_m symbolize the number per group in groups 1, 2, ...., m, respectively.
The form of the analysis is as follows:

    Variance Source     d.f.                        m.s.
    Groups              m-1                         M_1
    Within groups       SUM_i (n_i - 1) = N - m     M_2
    Total               N-1

where N is the total number of individuals in all groups. Except for variation in group size the model will be the same as in example 1. We will consider only the case where g_i is considered a random variable. The mean square for groups is

    M_1 = (1/(m-1)) [ G_1^2/n_1 + G_2^2/n_2 + .... + G_m^2/n_m - T^2/N ]              (21)

Referring to (7) and (8) it is clear that

    EG_i^2 = n_i^2 mu^2 + n_i^2 sigma_g^2 + n_i sigma_e^2

and hence that

    E(G_i^2 / n_i) = n_i mu^2 + n_i sigma_g^2 + sigma_e^2                             (22)

T is now equal to

    N mu + n_1 g_1 + n_2 g_2 + .... + n_m g_m + e_11 + e_12 + .... + e_1n_1 + e_21 + e_22 + .... + e_2n_2 + .... + e_m1 + e_m2 + .... + e_mn_m

Squaring and taking expectations, but omitting terms with expectation zero, we obtain

    ET^2 = N^2 mu^2 + n_1^2 Eg_1^2 + n_2^2 Eg_2^2 + .... + n_m^2 Eg_m^2 + Ee_11^2 + Ee_12^2 + .... + Ee_mn_m^2

Evaluating the separate terms, this becomes

    ET^2 = N^2 mu^2 + n_1^2 sigma_g^2 + n_2^2 sigma_g^2 + .... + n_m^2 sigma_g^2 + N sigma_e^2

and hence

    E(T^2 / N) = N mu^2 + (SUM_i n_i^2 / N) sigma_g^2 + sigma_e^2                     (23)

We have N sigma_e^2 because there are a total of N terms of the type Ee_ij^2 that are equal to sigma_e^2. Writing E(M_1) in terms of (21), (22), and (23) we get

    E(M_1) = (1/(m-1)) [ SUM_i (n_i mu^2 + n_i sigma_g^2 + sigma_e^2) - N mu^2 - (SUM_i n_i^2 / N) sigma_g^2 - sigma_e^2 ]

Noting that SUM_i n_i = N, this reduces to

    E(M_1) = sigma_e^2 + [ (N - SUM_i n_i^2 / N) / (m-1) ] sigma_g^2                  (24)

The coefficient of sigma_g^2 in (24) is of the same form as that given by Snedecor (p. 234, 1948).

The within-group mean square is computed as

    M_2 = (1/(N-m)) [ SUM_i SUM_j Y_ij^2 - SUM_i G_i^2/n_i ]

Taking the expectation term by term we have

    E(M_2) = (1/(N-m)) [ SUM_i SUM_j EY_ij^2 - SUM_i EG_i^2/n_i ]                     (25)

The expectation of the square of any single Y is in no way affected by the number of individuals observed in each group. Therefore, it is given by (14).
Substituting in (25) in terms of (14) and (22) we obtain

    E(M_2) = (1/(N-m)) [ N(mu^2 + sigma_g^2 + sigma_e^2) - SUM_i (n_i mu^2 + n_i sigma_g^2 + sigma_e^2) ]

Remembering that mu and the sigma^2 are constants and that SUM_i n_i = N, this reduces to

    E(M_2) = sigma_e^2                                                                (26)

Referring to (24) and (26), the analysis of variance with mean square expectations can now be written as follows:

    Variance Source     d.f.        Expectation of m.s.
    Groups              m-1         sigma_e^2 + n_0 sigma_g^2
    Within groups       N-m         sigma_e^2
    Total               N-1

where

    n_0 = [ N - (SUM_i n_i^2)/N ] / (m-1)

General Procedures

Before turning to other examples it will be useful to summarize the general procedures demonstrated in the foregoing examples. The steps in the procedure are listed below.

1. Specification of the model. This includes a symbolic statement of the composition of the individual values that make up the data, assumptions as to whether the various effects are fixed or random, and assumptions concerning whether separate effects vary independently.

2. The composition of each mean square is written out in terms of the model and the steps followed in computing the mean square.

3. The expectation of the mean square is developed term by term.

The rules employed in step 3 may be summarized as follows.

1. The expectation of a constant is the constant itself.
2. The expectation of a variable is the population mean of the variable.
3. The expectation of the square of a variable that has population mean zero is the population variance of the variable.
4. The expectation of the product of a constant and a variable is the product of the constant and the expectation of the variable.
5. The expectation of the product of two variables that have population mean zero is the population covariance of the variables.
6. The population covariance of any two variable effects is zero whenever the particular two effects contributing to any one measurement in the data may be assumed to be drawn randomly from their respective populations.
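Returning to example 2, the equating-to-expectations idea can be sketched as code: compute M_1, M_2, and n_0 = (N - SUM n_i^2/N)/(m-1) from unbalanced one-way data, then solve M_2 = sigma_e^2 and M_1 = sigma_e^2 + n_0 sigma_g^2 for the two components. The function name and the toy data are arbitrary illustrations:

```python
def anova_components(groups):
    """Estimate sigma_e^2 and sigma_g^2 from unbalanced one-way data by
    equating the mean squares to their expectations (24) and (26)."""
    m = len(groups)
    sizes = [len(g) for g in groups]
    N = sum(sizes)
    G = [sum(g) for g in groups]                 # group totals G_i
    T = sum(G)                                   # grand total
    ss_groups = sum(Gi * Gi / ni for Gi, ni in zip(G, sizes)) - T * T / N
    ss_within = (sum(y * y for g in groups for y in g)
                 - sum(Gi * Gi / ni for Gi, ni in zip(G, sizes)))
    m1 = ss_groups / (m - 1)                     # groups mean square, (21)
    m2 = ss_within / (N - m)                     # within mean square
    n0 = (N - sum(ni * ni for ni in sizes) / N) / (m - 1)  # coefficient of sigma_g^2
    sigma_e2 = m2
    sigma_g2 = (m1 - m2) / n0
    return n0, sigma_e2, sigma_g2

# Arbitrary toy data: three groups of unequal size.
data = [[4.0, 6.0, 5.0], [8.0, 9.0], [3.0, 4.0, 2.0, 3.0]]
n0, se2, sg2 = anova_components(data)
assert abs(n0 - (9 - 29 / 9) / 2) < 1e-12        # n_0 = (N - SUM n_i^2/N)/(m-1)
```

Note that nothing prevents sigma_g2 from coming out negative in a particular data set; the method-of-moments estimate is unbiased but not constrained to the parameter space.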
Two points merit special attention.

1. It is desirable to write the model in terms of a general mean so that all effects will have zero as their population mean. This allows taking advantage of rules 3 and 5 above.

2. If rule 6 above is kept in mind, a great deal of labor can be saved, in writing out the composition of mean squares in expanded form, by omitting product terms that have expectation zero. For example, with this in mind (7) might have been written

    EG_i^2 = E n^2 mu^2 + E n^2 g_i^2 + E e_i1^2 + E e_i2^2 + .... + E e_in^2

for the case where g_i was considered a random variable.

In the case of more complicated analyses than those considered in the foregoing examples, expressions for the composition of the various mean squares may be very long. Rather than follow the procedure outlined above in just the form demonstrated by examples 1 and 2, it is more convenient in these cases to recognize that every mean square can be computed as a linear function of one or more "uncorrected" sums of squares and what is commonly called the correction factor. Thus the expectation of a mean square can be obtained by combining the expectations of the uncorrected sums of squares and the correction factor in the same way that the sums of squares and correction factor were combined to obtain the mean square. The procedure is to find the expectations of the uncorrected sums of squares that must be computed in the analysis and of the correction factor, and then to combine these appropriately to obtain the expectations of the mean squares.

Example 3

Consider the analysis of data obtained from comparison of n genetic strains of a particular annual crop in a randomized block design at each of s locations in each of t years. Assume r replications in each location each year, and that different land, or at least a new randomization, is used in successive years at each location. The form of the variance analysis is as follows:
    Variance Source                 d.f.                    m.s.
    Locations                       s-1
    Years                           t-1
    L x Y                           (s-1)(t-1)
    Reps in years and locations     st(r-1)
    Strains                         n-1
    L x Strains                     (s-1)(n-1)              M_2
    Y x Strains                     (t-1)(n-1)
    L x Y x Strains                 (s-1)(t-1)(n-1)
    Strains x reps in L and Y       st(r-1)(n-1)
    Total                           rstn-1

The model employed will be as follows:

    Y_ijkl = mu + g_i + a_j + b_k + (ab)_jk + (ga)_ij + (gb)_ik + (gab)_ijk + c_jkl + (gc)_ijkl

where

    mu is the population mean,
    g_i is the effect of the i-th strain,
    a_j is the effect of the j-th location,
    b_k is the effect of the k-th year,
    (ab)_jk is an effect arising from first order interaction between the environmental conditions of the j-th location and the k-th year,
    (ga)_ij is an effect arising from first order interaction of the i-th strain with the j-th location,
    (gb)_ik is an effect arising from first order interaction of the i-th strain with the k-th year,
    (gab)_ijk is an effect arising from second order interaction of the i-th strain with the j-th location and k-th year,
    c_jkl is the effect of the l-th block at the j-th location in the k-th year, as a deviation from the mean for that location and year, and
    (gc)_ijkl is the effect of the plot to which the i-th strain is assigned in the l-th block in the j-th location and k-th year (strictly speaking it also contains a plot x strain interaction effect and the error of measurement, but only in special cases would it be important to indicate this sub-division in the model).

All effects will be considered random variables with mean zero. This would be appropriate if the objective of the work was to compare the strains for use in locations and years of which those involved in the experiment were a random sample, and if the strains represented a random sample from a population from which other strains might have been taken for comparison. It will also be assumed that all effects vary randomly with respect to each other, so that all covariances among pairs of effects are zero. This is an appropriate assumption in consideration of the way work like this is usually conducted.

Finally, it will be assumed that
Finally, it will be assum6d that E(ga)~ , is constant OVtlr all values of i and j 1.J . 2 is constant over all values of i and k 2 is constant oVt:r all valu6s of j and k E(gb)ik E(ab)jk 2 E(gab), 'k is constant OVt;r all values of i, j, and k 1.J 2 is constant ovt::r all values of j" k" a.nd 1 E c jk1 2 E(gc) 0, kl is constant over all values of i, j, k, and 1. 1.J The sense of this is that all individual EJff6cts within anyone of th8 six kinds belong to a common population and have the variance of that population as the Qxpectation of their squares. This is an assumption very commonly made in connection with analyses of the typE; in question, though it may not always be justified. The letter T with appropriate subscripts is used to symboliZE: different sums of the Y's. T T i T. J T ij For example, = grand total = sum for the = sum for the i-th variety (over all locations, years, and blocks) j-th location (over all strains, years, and blocks) .. sum for thE: i-th strain at thE; j -th location (over all years and blocks) E:tc. -19Carriud to its ultimat6 this means but Yijkl will be used instead of Tijkle The uncorr~ct~d sums of squares will be symbolized by S with appropriate subscripts. For 8xample, = T2/nrst = the correction factor. S n = ~ T~ /rst i=l ~ n s S.. ... ~J ~ = uncorrected sum of squares for strains. ~ T~ ./rt ~J i=l j=l = uncorrected sum of squares for strain-location totals, etce The process of obtaining thE; expectations of the mean squar(;s can be amply illustrated by considering only one mean square, say ~ -- 1 [Sij (n-i{s-l) - = (n-l) (s-l) [Si j - S. - Sj ~ 1 E(M2 ) ... (n-i)(s-i) The SIS rLE S..kJ J + s] - E S. - E S. + E involved have th8 following composition SiJ' = r; ~ ~ T~~J. i j S. J =- 5 =- 1 nrt ~T~ j J 2 T nrst 1 It is computed us follows: (Si - S) - (Sj - 5) - 1 Consequently M2. ~ J sl - sl - (27) -20- It follows that their expectations arB, 1 E Sij =rt ~ i 1 E S.~ :: rst .~ ~ j 2 E Tij \ I E T~~ ~ i (28) 1 E S. 
" " - ~ E T~ nrt J J j E S "" 2 E T -l:...nrat the basis for obtaining the expccta tions of the position. The expectation of the square of any of 4S TJ.'j :: TIs, th~sc ~ ~ y, 'ld k 1 J.J :2~~ Yijld T.J. :: T. = 1 k j J must !mow their com- TIS w~ 2~~ Yijkl i 1 k T ... Expanding th8se sums in terms of the model for the analysis we have the following: ~ bk + r ~ (ab) 'k + rt(ga) .. TJ.' ' = rt~ + rtg. + rta. + r 1.. J J + r ~ k (gab) .. k + J.J T :: rstiJ. + rstg i + i + k ~~ c . kl J. k 1 rt ~ a. j J + + 1. j k J.J + r ~ k (gb). k 1. ~] (go)., kl k 1 1.J rs ~ b k + r .~ ~ (ab) 'k + rt ~ (ga) .. k j k J j J.J ~ (gb). k + r ~~ _ ~ (gab), 'k + rs ~ k J k 1.J ~ ~ ~ j k 1 c'kl + J ~ ~ ~ j k 1 (go)"kl J.J -21- = nrt~ T. J + T =s + rt + + ff r nrst~ + rst ~ gi i k J ~ .~ (gab) .. k + i k 1J nrt ~ a. + nrs j J ~ ~ (ga) .. + rs ~ ~ (gb) 'k + i 1J j j k 1 The expectation -,f' ~ (ab)'k nrta. + nr 2bk + nr + i (gb)ik + r ~ ~ ~ ~ i ~g.1 rt i k 1. r + n k ~ ~ c. kl k 1 J ~ bk k J + nr + + ~~ j k rt Zi (ga) 1J.. ~ ~ ~ (gc) .. kl i k 1 (ab) . k J Z Z~ (gab) .. k + n ij k 1.J 1J ~ ~ ~ CJ. kl j kl (gc)'jkl 1. :" .. of the square of any of these T's is thb sum of the expecta- tions of each term in the square. however, since all covariances among different effects are Zbro (see statement of model) the expectations of all product terms in the square of any T are also zero. Thus only the expectations of the squares of thG separate terms in the above expressions o.:mtribu'b<.; to the expectations we ar8 seeking. These can be written directly from inspection of the terms. For example, 2 2 2 '" r t tL wh6re cr2 s~~bolizes th~ population variance of the.bffect indicated by subscript (bocause (1) the numbor of bls in the sum indicated is t, (2) Eb~ = cr~, and (J) the expectation of the product of two bls is zero) Proceeding in this way the expectations can be written from the equations for the T's as follows: 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 .. r t lJ. 
    E T_i^2 = r^2 s^2 t^2 mu^2 + r^2 s^2 t^2 sigma_g^2 + r^2 s t^2 sigma_a^2 + r^2 s^2 t sigma_b^2 + r^2 s t sigma_ab^2 + r^2 s t^2 sigma_ga^2 + r^2 s^2 t sigma_gb^2 + r^2 s t sigma_gab^2 + rst sigma_c^2 + rst sigma_gc^2          (29b)

    E T_j^2 = n^2 r^2 t^2 mu^2 + n r^2 t^2 sigma_g^2 + n^2 r^2 t^2 sigma_a^2 + n^2 r^2 t sigma_b^2 + n^2 r^2 t sigma_ab^2 + n r^2 t^2 sigma_ga^2 + n r^2 t sigma_gb^2 + n r^2 t sigma_gab^2 + n^2 r t sigma_c^2 + n r t sigma_gc^2          (29c)

    E T^2 = n^2 r^2 s^2 t^2 mu^2 + n r^2 s^2 t^2 sigma_g^2 + n^2 r^2 s t^2 sigma_a^2 + n^2 r^2 s^2 t sigma_b^2 + n^2 r^2 s t sigma_ab^2 + n r^2 s t^2 sigma_ga^2 + n r^2 s^2 t sigma_gb^2 + n r^2 s t sigma_gab^2 + n^2 r s t sigma_c^2 + n r s t sigma_gc^2          (29d)

Note that the first of these expressions is constant no matter which strain-location sum is in question (this is apparent since neither i nor j appears as a subscript in the right-hand side of the expression). The same sort of thing is true for the second and third expressions as well. Therefore, equations (28) can be rewritten as follows:

    E S_ij = (ns/rt) E T_ij^2
    E S_i  = (n/rst) E T_i^2
    E S_j  = (s/nrt) E T_j^2
    E S    = (1/nrst) E T^2                                                           (30)

The only remaining step is to substitute in (27) in terms of equations (29) and (30). Collecting the terms involving a common parameter at the same time that the substitutions are made, we obtain

    E(M_2) = (1/((n-1)(s-1))) [ mu^2 (nrst - nrst - nrst + nrst) + sigma_g^2 (nrst - nrst - rst + rst)
             + sigma_a^2 (nrst - nrt - nrst + nrt) + sigma_b^2 (nrs - nrs - nrs + nrs)
             + sigma_ab^2 (nrs - nr - nrs + nr) + sigma_ga^2 (nrst - nrt - rst + rt)
             + sigma_gb^2 (nrs - nrs - rs + rs) + sigma_gab^2 (nrs - nr - rs + r)
             + sigma_c^2 (ns - n - ns + n) + sigma_gc^2 (ns - n - s + 1) ]

           = (1/((n-1)(s-1))) [ rt(ns - n - s + 1) sigma_ga^2 + r(ns - n - s + 1) sigma_gab^2 + (ns - n - s + 1) sigma_gc^2 ]

Since (n-1)(s-1) = ns - n - s + 1, this reduces further to

    E(M_2) = rt sigma_ga^2 + r sigma_gab^2 + sigma_gc^2
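The cancellation pattern above can be checked mechanically: take the coefficient of each variance component in (29a)-(29d), scale as in (30), and combine as in (27). A sketch with arbitrary illustrative sizes (the dictionary keys and function name are mine, not the document's notation):

```python
from fractions import Fraction as F

# Arbitrary illustrative sizes: n strains, r reps, s locations, t years.
n, r, s, t = 3, 2, 4, 5

# Coefficient of each variance component in E(T_ij^2), E(T_i^2), E(T_j^2),
# and E(T^2), read off from equations (29a)-(29d); "mu2" stands for mu^2.
comp = {
    "mu2": (r*r*t*t, r*r*s*s*t*t, n*n*r*r*t*t, n*n*r*r*s*s*t*t),
    "g":   (r*r*t*t, r*r*s*s*t*t, n*r*r*t*t,   n*r*r*s*s*t*t),
    "a":   (r*r*t*t, r*r*s*t*t,   n*n*r*r*t*t, n*n*r*r*s*t*t),
    "b":   (r*r*t,   r*r*s*s*t,   n*n*r*r*t,   n*n*r*r*s*s*t),
    "ab":  (r*r*t,   r*r*s*t,     n*n*r*r*t,   n*n*r*r*s*t),
    "ga":  (r*r*t*t, r*r*s*t*t,   n*r*r*t*t,   n*r*r*s*t*t),
    "gb":  (r*r*t,   r*r*s*s*t,   n*r*r*t,     n*r*r*s*s*t),
    "gab": (r*r*t,   r*r*s*t,     n*r*r*t,     n*r*r*s*t),
    "c":   (r*t,     r*s*t,       n*n*r*t,     n*n*r*s*t),
    "gc":  (r*t,     r*s*t,       n*r*t,       n*r*s*t),
}

def ems_m2(c_ij, c_i, c_j, c_T):
    """Combine per-T coefficients into the coefficient in E(M_2) via (27) and (30)."""
    e_sij = F(n * s, r * t) * c_ij     # ns terms, each divided by rt
    e_si  = F(n, r * s * t) * c_i      # n terms, each divided by rst
    e_sj  = F(s, n * r * t) * c_j      # s terms, each divided by nrt
    e_s   = F(1, n * r * s * t) * c_T  # the correction factor
    return (e_sij - e_si - e_sj + e_s) / ((n - 1) * (s - 1))

coeffs = {name: ems_m2(*cols) for name, cols in comp.items()}
assert coeffs["ga"] == r * t and coeffs["gab"] == r and coeffs["gc"] == 1
assert all(coeffs[k] == 0 for k in comp if k not in ("ga", "gab", "gc"))
```

Only sigma_ga^2, sigma_gab^2, and sigma_gc^2 survive, with coefficients rt, r, and 1, matching the result just derived.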
It is worth noting that the mean square for locations is computed as (1/(s-1))(S_j - S) and the one for strains as (1/(n-1))(S_i - S). Thus the expectations of these mean squares could be quickly obtained in terms of the information developed in working out E(M_2).

An important practical angle to note is that as one gains experience in working out mean square expectations, various short cuts become apparent (for an example, see Crump, Biometrics 1946). However, no attempt will be made to describe such short-cuts, as the novice will run less chance of misapplication if he goes through the full procedure in detail until he perceives the short-cuts and their rationale by himself. In doubtful cases it is always best to proceed in a straightforward manner, working through the full procedure described above.

Example 4

On occasion, estimates of variance components are required from n-fold classification data in which the sub-classes are disproportionate and in which, in many instances, a portion of the sub-classes are not represented at all in the data. In the case of data available to the animal geneticist for estimation of variances arising from genetic variation or genotype-environment interaction, this can almost be said to be the rule rather than the exception.

As a specific example, suppose that data are available on the annual milk production of cows that were by different sires and that were members of different herds. It will be assumed that members of any particular sire family may have been scattered through two or more herds, but not necessarily all herds. Herd effects will vary due to management practices (and perhaps for other reasons), family effects will vary as a result of genotypic variation among sires, and herd-family interaction effects may be presumed to exist. A rational model on which to base analysis of the data would be as follows:

    Y_ijk = mu + g_i + a_j + (ga)_ij + e_ijk

where

    Y_ijk is the production of the k-th cow that is by the i-th sire
and located in the j-th herd, g_i is the effect of the genotype of the i-th sire (on production by his daughters), a_j is the effect of the j-th herd, (ga)_ij is an effect due to interaction between the average genotype of the i-th family and the environment to which cows are exposed in the j-th herd, and e_ijk is the deviation in production of the k-th cow from the population average for the i-th family in the j-th herd. It will be assumed that all effects are random with population mean zero, that all individual effects are random with respect to each other so that the expectation for any product of two effects is zero, and finally that E(ga)²_ij is constant over all values of i and j and E e²_ijk is constant over all values of i, j and k. If production were measured in various years a realistic model would include other effects, but for the purpose of this example we will assume all records were taken in a single year.

There are various computational approaches that may be taken in the use of such data for estimation of variance components, but one that is becoming increasingly popular because of its ease is as follows. In terms of our example, four mean squares would be computed: mean squares for (1) families, (2) herds, (3) herd-family subclasses, and (4) cows within herds. The expectations of the first three of these will be linear functions of the variances of all four of the variables in the model. The fourth will have expectation σ²_e. Once computed, the four mean squares would be equated to their respective expectations to provide four equations in four unknowns (the variances of the four effects) that would then be solved simultaneously to obtain estimates of the four variances.

We will consider the mean square for subclasses (M_sc) in detail. It would be

M_sc = [1/(s-1)] [Σ_i Σ_j T²_ij/n_ij - T²/N]

where n_ij
is the number of cows of the i-th family in the j-th herd, T_ij is the sum of production by all cows of the i-th family in the j-th herd, T is the grand total of production by all cows, N is the total number of cows, and s is the number of sub-classes represented by one or more cows. Obviously,

E M_sc = [1/(s-1)] [Σ_i Σ_j E(T²_ij)/n_ij - E(T²)/N]        (31)

T_ij = n_ij μ + n_ij g_i + n_ij a_j + n_ij (ga)_ij + Σ_k e_ijk

Proceeding in accord with arguments presented in connection with the previous example we can write directly

E T²_ij = n²_ij μ² + n²_ij σ²_g + n²_ij σ²_a + n²_ij σ²_ga + n_ij σ²_e        (32)

In contrast to example 3 this is not constant for all T_ij but varies with n_ij. We must now find the expectation of T².

T = Σ_i Σ_j T_ij = Nμ + Σ_i n_i. g_i + Σ_j n_.j a_j + Σ_i Σ_j n_ij (ga)_ij + Σ_i Σ_j Σ_k e_ijk

where n_i. = total number of cows in the i-th family, and n_.j = total number of cows in the j-th herd. Then

E T² = N²μ² + σ²_g Σ_i n²_i. + σ²_a Σ_j n²_.j + σ²_ga Σ_i Σ_j n²_ij + Nσ²_e        (33)

As an example of the detail involved in writing E T² from the expression for T, consider the term Σ_i n_i. g_i. We have

(Σ_i n_i. g_i)² = n²_1. g²_1 + n²_2. g²_2 + ..... + n²_f. g²_f + product terms

where f is the number of families and the product terms need not be written out since all have zero expectation. Then E(Σ_i n_i. g_i)² = σ²_g Σ_i n²_i., since the expectation of the square of any random g is σ²_g.

Substituting in (31) in terms of (32) and (33) we obtain,

E M_sc = [1/(s-1)] [Nμ² + Nσ²_g + Nσ²_a + Nσ²_ga + sσ²_e - Nμ² - (Σ_i n²_i./N)σ²_g - (Σ_j n²_.j/N)σ²_a - (Σ_i Σ_j n²_ij/N)σ²_ga - σ²_e]

= σ²_e + [1/(s-1)] [(N - Σ_i n²_i./N)σ²_g + (N - Σ_j n²_.j/N)σ²_a + (N - Σ_i Σ_j n²_ij/N)σ²_ga]
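The coefficients in E M_sc involve nothing beyond the subclass counts. As an illustration of the counting, the sketch below uses a small hypothetical table of n_ij (the counts are invented for this example; rows are sire families, columns are herds, and a zero marks an empty subclass).

```python
# Coefficients of sigma2_g, sigma2_a and sigma2_ga in E(M_sc), computed from a
# hypothetical table of subclass counts n_ij.  The coefficient of sigma2_e is 1.
n = [
    [3, 0, 2],   # family 1
    [1, 4, 0],   # family 2
    [2, 2, 1],   # family 3
]

N = sum(sum(row) for row in n)                        # total number of cows
s = sum(1 for row in n for nij in row if nij > 0)     # filled subclasses
n_i = [sum(row) for row in n]                         # family totals n_i.
n_j = [sum(row[j] for row in n) for j in range(3)]    # herd totals n_.j

c_g  = (N - sum(ni ** 2 for ni in n_i) / N) / (s - 1)
c_a  = (N - sum(nj ** 2 for nj in n_j) / N) / (s - 1)
c_ga = (N - sum(nij ** 2 for row in n for nij in row) / N) / (s - 1)

print(N, s)                                           # 15 7
print(round(c_g, 4), round(c_a, 4), round(c_ga, 4))   # 1.6667 1.6 2.0667
```

With these counts, E M_sc = 1.6667 σ²_g + 1.6 σ²_a + 2.0667 σ²_ga + σ²_e.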
Expectations of the other mean squares are obtained by the same procedure as that used for E M_sc. For any particular body of data N, the n_ij, the n_i., and the n_.j can be obtained by mere counting and hence the coefficients of the several variances in E M_sc can be computed.

Final Comments

The essence of working out mean square expectations can be summarized as follows:

1. It is necessary to know what is meant by expectation.

2. It is necessary to know the values that the definition of expectation imposes on the expectations of (a) a constant, (b) a random variate, (c) the product of a constant and a random variate, (d) the square of a random variate, and (e) the product of two random variates (only the cases of random variates with population mean zero are of special importance).

3. Fundamentally, the procedure is to write the mean square out symbolically in a form that is expanded to the point that it is a linear function of only terms of the types (a) to (e) of point 2 above.

4. When this has been done, the knowledge specified in point 2 above, together with the rule that the expectation of a linear function is equal to the same function of the expectations of the separate terms of the quantity for which the expectation is desired, provides the basis for writing the desired expectation.

5. From the practical point of view, many of the steps can and will be performed only mentally (will not be written out). However, in case of doubt, writing steps out in detail is likely to insure against an occasional serious error. There are rules-of-thumb that can sometimes be used, but their application involves risk of error unless the entire matter is so well understood that the reason why these rules work in specific cases is entirely clear. Otherwise they may be applied in cases where they do not work.

For supplementary reading on the derivation of mean square expectations see Anderson and Bancroft (1952) and Kempthorne (1952).
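The final step of the estimation procedure described in Example 4, equating the four computed mean squares to their expectations and solving simultaneously, can be sketched as follows. The coefficient matrix and mean squares below are invented for illustration only; in practice each coefficient would be obtained from the n_ij counts as described above.

```python
# Sketch of the simultaneous solution for the four variance components.
# Rows: E(M_families), E(M_herds), E(M_subclasses), E(M_within);
# columns: coefficients of sigma2_g, sigma2_a, sigma2_ga, sigma2_e.
def solve_linear(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    m = [row[:] + [bi] for row, bi in zip(A, b)]
    k = len(m)
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(k):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[r][k] / m[r][r] for r in range(k)]

A = [
    [4.8, 0.3, 1.1, 1.0],   # hypothetical expectation coefficients
    [0.4, 5.1, 1.2, 1.0],
    [1.7, 1.6, 2.1, 1.0],
    [0.0, 0.0, 0.0, 1.0],   # E(M_within) = sigma2_e
]
mean_squares = [9.0, 14.0, 6.5, 2.0]  # hypothetical computed mean squares

estimates = solve_linear(A, mean_squares)
for name, est in zip(("sigma2_g", "sigma2_a", "sigma2_ga", "sigma2_e"),
                     estimates):
    print(name, "=", round(est, 3))
```

The estimates obtained this way are unbiased but, with disproportionate subclass numbers, they can occasionally be negative; that possibility is inherent in the method of equating observed mean squares to expectations.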
Literature Cited

Anderson, R. L. and T. A. Bancroft (1952) Statistical Theory in Research. McGraw-Hill, New York.

Crump, S. Lee (1946) The Estimation of Variance Components in Analysis of Variance. Biometrics Bull. 2:7-11.

Kempthorne, Oscar (1952) The Design and Analysis of Experiments. John Wiley and Sons, Inc., New York.