Topic 21: ANOVA and Linear Regression

Outline
• Review cell means and factor effects models
• Relationship between factor effects constraint and explanatory variables

Cell Means Model
• Y_ij = μ_i + ε_ij
  – where μ_i is the theoretical mean or expected value of all observations at level i
  – The ε_ij are iid N(0, σ²)
  – Y_ij ~ N(μ_i, σ²), independent

Factor Effects Model
• A reparameterization of the cell means model
• Useful way of looking at more complicated models
• Null hypotheses are easier to state
• Y_ij = μ + τ_i + ε_ij
  – the ε_ij are iid N(0, σ²)

Parameters
• The cell means model has r+1 parameters
  – r μ's and σ²
• The factor effects model has r+2 parameters
  – μ, the r τ's, and σ²
• Build a restriction on the τ's into the factor effects model to remove one degree of freedom (e.g., Σ_i τ_i = 0 or τ_r = 0)

Regression Approach
• We can use multiple regression to reproduce the results based on the factor effects model
• Y_ij = μ + τ_i + ε_ij and we will restrict Σ_i τ_i = 0

Coding for Explanatory Variables
• Σ_i τ_i = 0 implies τ_r = -τ_1 - τ_2 - … - τ_(r-1)
• Due to the restriction, only r-1 columns are needed: X_i for i = 1 to r-1
• X_ij =  1 if Y is an observation from level i
       = -1 if Y is an observation from level r
       =  0 if Y is from any other level

KNNL Example
• Recall KNNL p 687 from Topic 20
• It is a bit messy because n_i = 5, 5, 4, 5
• The grand mean, (Σ_i n_i μ_i)/n_T, is not necessarily the same as the mean of the group means, (Σ_i μ_i)/r
• We will calculate these two values
• You will have an easier example in the homework (n_i is constant) where they are the same value

Means
    * sample mean of cases overall and for each design;
    proc means data=a1 noprint;
      class design;
      var cases;
      output out=a2 mean=mclass;
    run;

    proc print data=a2;
    run;

Output
    Obs  design  _TYPE_  _FREQ_   mclass
     1      .       0      19    18.6316
     2      1       1       5    14.6000
     3      2       1       5    13.4000
     4      3       1       4    19.5000
     5      4       1       5    27.2000
• Obs 1 (_TYPE_ = 0) is the grand sample mean; it is not the average of the four trt sample means shown below it

The mean of the means
    * average the four design means (rows with _TYPE_ = 1);
    proc means data=a2 mean;
      where _TYPE_ eq 1;
      var mclass;
    run;

Output
    The MEANS Procedure
    Analysis Variable : mclass
            Mean
    ------------
      18.6750000
    ------------
• Not a big difference from the grand sample mean in this example

Generate explanatory variables for REG
    * effect coding: design 4 (the last level) is coded -1 in every column;
    data a1;
      set a1;
      x1 = (design eq 1) - (design eq 4);
      x2 = (design eq 2) - (design eq 4);
      x3 = (design eq 3) - (design eq 4);
    run;

    proc print data=a1;
    run;

Output (selected observations)
    Obs  cases  design  x1  x2  x3
      1     11       1   1   0   0
      6     12       2   0   1   0
     11     23       3   0   0   1
     15     27       4  -1  -1  -1

Output with parameters
    design  x1  x2  x3  mean
         1   1   0   0  μ + τ_1
         2   0   1   0  μ + τ_2
         3   0   0   1  μ + τ_3
         4  -1  -1  -1  μ - τ_1 - τ_2 - τ_3
• μ is the result of including an intercept

Run the regression
    proc reg data=a1;
      model cases = x1 x2 x3;
    run;

Output: Anova
    Source  DF   SS   MS      F       P
    Model    3  588  196  18.59  <.0001
    Error   15  158   10
    Total   18  746
• Same ANOVA table as GLM

Regression coefficients
    Var     Est
    Int  18.675   mean of the means
    x1   -4.075   Y1./n1 - Int
    x2   -5.275   Y2./n2 - Int
    x3    0.825   Y3./n3 - Int

Get same trt means
    18.675 - 4.075 = 14.6
    18.675 - 5.275 = 13.4
    18.675 + 0.825 = 19.5
    18.675 + 4.075 + 5.275 - 0.825 = 27.2

Last slide
• Read KNNL Chapter 16
• We used program topic21.sas to generate the output for today
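
Supplementary note: why the intercept is the mean of the means
The regression output reports an intercept of 18.675, the unweighted mean of the four design means, rather than the grand sample mean 18.6316. The short derivation below is a sketch of why the 1 / 0 / -1 coding produces this. The symbols β_0, β_1, …, β_(r-1) are names introduced here for the regression coefficients on the intercept and on x1, …, x(r-1); they are not notation from the slides. The last step assumes the usual least squares property that, with one parameter per factor level, the fitted value for each level equals that level's sample mean.

```latex
% Cell means implied by the 1 / 0 / -1 coding (beta notation is an assumption of this note)
\begin{align*}
\mu_i &= \beta_0 + \beta_i, \qquad i = 1, \dots, r-1, \\
\mu_r &= \beta_0 - \sum_{j=1}^{r-1} \beta_j .
\end{align*}

% Summing over all r levels, the beta_j terms cancel
\begin{align*}
\sum_{i=1}^{r} \mu_i = r\,\beta_0
\quad\Longrightarrow\quad
\beta_0 = \frac{1}{r} \sum_{i=1}^{r} \mu_i ,
\qquad
\beta_i = \mu_i - \beta_0 .
\end{align*}

% Plugging in the four sample means from the PROC MEANS output
\begin{align*}
b_0 &= \frac{14.6 + 13.4 + 19.5 + 27.2}{4} = \frac{74.7}{4} = 18.675, \\
b_1 &= 14.6 - 18.675 = -4.075, \\
b_2 &= 13.4 - 18.675 = -5.275, \\
b_3 &= 19.5 - 18.675 = 0.825 .
\end{align*}
```

So β_0 plays the role of μ and β_i the role of τ_i under the restriction Σ_i τ_i = 0. The group sizes n_i never enter these expressions, which is why the intercept matches the mean of the means and not the weighted grand mean when the n_i are unequal.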