advertisement

Coding: With class variables, the most simple model is Y = + I where m is an overall constant (mean?) and each i is an effect of one level (i) of the class variable. Now suppose you know (or have estimates) + 1 = 12 + 2 = 21 + 3 = 15 From middle school algebra or just common sense, you know that there are lots of solutions for the Greek letter that will work, for example =0, 1=12, 2=21 3=15. The three equations above show that these are really estimates of + 1, + 2, and + 3 . This set of solutions, in which is arbitrarily set to 0, is called “cell means coding” but only takes care of data sets with one classificatory variable. That was one specific solution to the system. When we add the three equations above and divide both 1 1 0 12 sides by 3 we see that 16. Another solution is 1 0 1 1 21 where under 1 1 1 15 2 the restriction that 0 (which means the alphas sum to 0) we have =16, 1=-4, 2=5 3=-1. This has the nice property that is the average of 12, 21, and 15 and the deviations, -4, 5, -1, add up to 0. The restriction that 3=1-2 produces the last row of the matrix above and this particular solution is known as “deviations coding” or sometimes “effects coding.” For estimation, this solution can be obtained by repeating the rows of the left matrix for as many observations as are in each of the three groups and entering the responses Y in the right hand vector. Regression is then used to estimate the parameters. The parameters (or parameter estimates) now are 16, -4 and 5 where 16-4=12, 16+5=21, and 16 -(-4+5)=15 are the expected means for each of the three groups. In other words we see that 1 1 0 16 12 1 0 1 4 21 . 1 1 1 5 15 1 1 0 3 12 In GLM coding, the equation for the particular solution becomes 1 0 1 1 3 21 and the 1 0 0 15 3 2 1 1 0 15 12 solution vector entries are 15, -3, and 6 because 1 0 1 3 21 . In GLM coding, the arbitrary 1 0 0 6 15 assumption that 3 = 0 renders the other parameters estimable as can be seen by setting 3 to 0 everywhere in the parameter vector. In other words because +3 is (or is estimating) 12, the (arbitrary) assumption that 3 =0 (a fourth equation in the 4 unknowns) means that we know what is (or that is estimable in the case of data).