PLS205 Lab 7
February 20, 2014

Laboratory Topics 10 & 11
∙ Random models (EMS tables by hand and in SAS)
∙ Generating random datasets to explore EMS tables
∙ Unbalanced designs and SS Types I-IV

Random Models in SAS

Another reason we have not encouraged the use of the more limited Proc ANOVA is that it cannot deal with random effects. For random and mixed models, use Proc GLM [or the even more general Proc Mixed, which we will not cover in this class]. The expected mean squares (EMS's) for the model effects are generated by the Random statement in Proc GLM. The Random statement designates which model effects are random, and it must appear after the Model statement. The generic syntax:

    Random effects / test;

And an example:

    Proc GLM;
      Class A B C;
      Model Response = A|B|C;
      Random B A*B B*C A*B*C / test;

With no options, the Random statement alone produces a table of EMS's for each effect in the model. When you include the "/ test" option, the Random statement also tells SAS to determine the appropriate F test for each effect, using the Satterthwaite approximation (Topic 10.5) where necessary. SAS then generates F ratios (approximate, when necessary) and p-values for each test. Very handy, needless to say.

It is now commonly accepted that an interaction should be treated as a random effect if any one of the factors involved in the interaction is random. Proc GLM, however, does not operate under this presumption. It is therefore your responsibility to explicitly declare both the random main effects and their interactions as random in the Random statement. In other words, if B is a random effect, the following code is incorrect:

    Proc GLM;
      Class A B;
      Model Response = A|B;
      Random B / test;

because it does not explicitly declare the A*B interaction to be random.
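Deciding which terms belong in the Random statement is mechanical: list every model effect that contains at least one random factor. The Python sketch below illustrates that rule. It is not part of SAS, and the function names `expand_bar`, `random_effects` and `random_statement` are hypothetical helpers. `expand_bar` imitates how bar notation such as A|B|C@2 expands into main effects and interactions. Its term order differs from SAS's, which does not matter for the Random statement.

```python
from itertools import combinations


def expand_bar(factors, max_order):
    """Hypothetical helper: expand SAS bar notation (e.g. A|B|C@2) into effect names.

    Returns main effects first, then interactions with up to max_order factors.
    """
    terms = []
    for order in range(1, max_order + 1):
        for combo in combinations(factors, order):
            terms.append("*".join(combo))
    return terms


def random_effects(terms, random_factors):
    """Keep every term that contains at least one random factor."""
    return [t for t in terms if any(f in random_factors for f in t.split("*"))]


def random_statement(factors, max_order, random_factors):
    """Build the text of a Proc GLM Random statement with the / test option."""
    chosen = random_effects(expand_bar(factors, max_order), random_factors)
    return "Random " + " ".join(chosen) + " / test;"


if __name__ == "__main__":
    # B random in the two-factor model A|B
    print(random_statement(["A", "B"], 2, {"B"}))
    # A and B random, C fixed, two-way model A|B|C@2 (the situation of Example 7.2)
    print(random_statement(["A", "B", "C"], 2, {"A", "B"}))
```

For the A|B model with B random, this yields the statement with both B and A*B listed.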
The correct code in this case is:

    Proc GLM;
      Class A B;
      Model Response = A|B;
      Random B A*B / test;

PLS205 2014 1 Lab Topics 10-11

Example 7.1 Random effects model [Lab7ex1.sas]

In this example, we use the data set from Lab 6 (Example 2), a three-way factorial CRD with one replication per three-way combination of factors. As before, there are not enough degrees of freedom to include the three-way interaction in the model. Unlike before, we will treat all three factors as random effects. Recall the general linear model for this design:

$$Y_{ijk} = \mu + \alpha_i + \beta_j + \gamma_k + (\alpha\beta)_{ij} + (\alpha\gamma)_{ik} + (\beta\gamma)_{jk} + \varepsilon_{ijk}$$

The table of expected mean squares (EMS) for this model is shown below:

    Source   Expected Mean Squares              F
    A        σ²ε + c·σ²αβ + b·σ²αγ + bc·σ²α     (MSA + MSE) / (MS(AB) + MS(AC))
    B        σ²ε + a·σ²βγ + c·σ²αβ + ac·σ²β     (MSB + MSE) / (MS(AB) + MS(BC))
    C        σ²ε + b·σ²αγ + a·σ²βγ + ab·σ²γ     (MSC + MSE) / (MS(AC) + MS(BC))
    AB       σ²ε + c·σ²αβ                       MS(AB) / MSE
    AC       σ²ε + b·σ²αγ                       MS(AC) / MSE
    BC       σ²ε + a·σ²βγ                       MS(BC) / MSE
    Error    σ²ε

with a = 3, b = 5, and c = 2.

The corresponding SAS code:

    Data Interac3;
      Input A B C Response @@;
      Cards;
    1 1 1 61  2 1 1 38  3 1 1 81  1 1 2 31  2 1 2 27  3 1 2 113
    1 2 1 39  2 2 1 61  3 2 1 49  1 2 2 68  2 2 2 103 3 2 2 143
    1 3 1 121 2 3 1 82  3 3 1 41  1 3 2 78  2 3 2 57  3 3 2 63
    1 4 1 79  2 4 1 68  3 4 1 59  1 4 2 122 2 4 2 127 3 4 2 167
    1 5 1 91  2 5 1 31  3 5 1 61  1 5 2 92  2 5 2 43  3 5 2 128
    ;
    Proc GLM Data = Interac3;
      Class A B C;
      Model Response = A|B|C@2;
      Random A|B|C@2 / test;  * Performs approximate hypothesis tests (i.e. F tests) for each effect specified in the model, using appropriate error terms as determined by the EMS's;
    Run;
    Quit;

The Output

    Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
    Model             21   38710.26667      1843.34603    635.64    <.0001
    Error              8      23.20000         2.90000
    Corrected Total   29   38733.46667

    R-Square   Coeff Var   Root MSE   Response Mean
    0.999401   2.198286    1.702939   77.46667

    Source   DF   Type III SS    Mean Square   F Value   Pr > F
    A         2   3599.266667    1799.633333    620.56   <.0001
    B         4   6423.133333    1605.783333    553.72   <.0001
    A*B       8   9675.066667    1209.383333    417.03   <.0001
    C         1   5333.333333    5333.333333   1839.08   <.0001
    A*C       2   5692.466667    2846.233333    981.46   <.0001
    B*C       4   7987.000000    1996.750000    688.53   <.0001
    !! WRONG !!

NOTE: The above table incorrectly uses the MSE (2.9) as the denominator for all F tests! This incorrect ANOVA table is immediately followed, however, by the table of EMS's needed to construct the correct F tests. SAS does all this as a result of the Random statement in Proc GLM.

    Source   Type III Expected Mean Square
    A        Var(Error) + 5 Var(A*C) + 2 Var(A*B) + 10 Var(A)
    B        Var(Error) + 3 Var(B*C) + 2 Var(A*B) + 6 Var(B)
    A*B      Var(Error) + 2 Var(A*B)
    C        Var(Error) + 3 Var(B*C) + 5 Var(A*C) + 15 Var(C)
    A*C      Var(Error) + 5 Var(A*C)
    B*C      Var(Error) + 3 Var(B*C)

SAS uses these EMS's to carry out approximate F tests:

    Source   DF       Type III SS   Mean Square   F Value   Pr > F
    A        2        3599.266667   1799.633333   0.44      0.6704
    Error    3.8798   15724         4052.716667
    Error: MS(A*B) + MS(A*C) - MS(Error)

    Source   DF       Type III SS   Mean Square   F Value   Pr > F
    B        4        6423.133333   1605.783333   0.50      0.7362
    Error    8.6986   27864         3203.233333
    Error: MS(A*B) + MS(B*C) - MS(Error)

    Source   DF       Type III SS   Mean Square   F Value   Pr > F
    C        1        5333.333333   5333.333333   1.10      0.3454
    Error    4.6414   22465         4840.083333
    Error: MS(A*C) + MS(B*C) - MS(Error)

    Source             DF   Type III SS   Mean Square   F Value   Pr > F
    A*B                 8   9675.066667   1209.383333   417.03    <.0001
    A*C                 2   5692.466667   2846.233333   981.46    <.0001
    B*C                 4   7987.000000   1996.750000   688.53    <.0001
    Error: MS(Error)    8     23.200000      2.900000

Notice how SAS uses the correct error term to calculate the approximate F value in each case. The specific error term, based on the EMS table, is listed beneath each result. The non-integer error df's are approximated using Satterthwaite's method.

Generating Random Datasets in SAS

In Example 7.2, you will learn a method for generating a dataset in SAS [Disclaimer: We do not recommend trying to get published using SAS-generated datasets, though it's probably happened]. This is a nice little program for exploring how the EMS's change with different combinations of effects (random and fixed).
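Returning to Example 7.1 for a moment: the synthesized error terms and their non-integer df can be checked by hand. The minimal Python sketch below takes the Type III mean squares and df from the SAS output as given. It applies the Satterthwaite approximation, df ≈ (Σ cᵢ·MSᵢ)² / Σ[(cᵢ·MSᵢ)² / dfᵢ], to each linear combination listed under "Error:". The dictionary names are illustrative only. p-values are not computed, because they would need an F distribution function that the Python standard library does not provide.

```python
# Type III mean squares and df from the Example 7.1 output (all factors random).
MS = {
    "A": (1799.633333, 2), "B": (1605.783333, 4), "C": (5333.333333, 1),
    "A*B": (1209.383333, 8), "A*C": (2846.233333, 2), "B*C": (1996.750000, 4),
    "Error": (2.900000, 8),
}

# Synthesized denominators, as listed by SAS: (sign, effect) pairs.
DENOMINATORS = {
    "A": [(1, "A*B"), (1, "A*C"), (-1, "Error")],
    "B": [(1, "A*B"), (1, "B*C"), (-1, "Error")],
    "C": [(1, "A*C"), (1, "B*C"), (-1, "Error")],
}


def synthesized_error(terms):
    """Return (MS, Satterthwaite df) for the linear combination sum(sign * MS)."""
    ms = sum(sign * MS[effect][0] for sign, effect in terms)
    denom = sum((sign * MS[effect][0]) ** 2 / MS[effect][1] for sign, effect in terms)
    return ms, ms ** 2 / denom


results = {}
for effect, terms in DENOMINATORS.items():
    ms_den, df_den = synthesized_error(terms)
    results[effect] = (MS[effect][0] / ms_den, df_den)
    print(f"{effect}: F = {results[effect][0]:.2f}, error MS = {ms_den:.6f}, error df = {df_den:.4f}")
```

The F values (0.44, 0.50, 1.10) and error df (3.8798, 8.6986, 4.6414) computed this way agree with the SAS tables above.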
Example 7.2 [Lab7ex2.sas]

    Data EMSPlay;
      Do A = 1 to 3;
        Do B = 1 to 5;
          Do C = 1 to 2;
            Response = 4*RanNor(0)+15;  * Picks RANdom numbers from a NORmal distribution with mean 15 and stdev 4;
            Output;
          End;
        End;
      End;
    Proc Print;
    Proc GLM Data = EMSPlay;
      Class A B C;
      Model Response = A|B|C@2;
      Random A B A*B A*C B*C / test;  * The model if C were a fixed effect;
    Run;
    Quit;

The EMS table

    Source   Type III Expected Mean Square
    A        Var(Error) + 5 Var(A*C) + 2 Var(A*B) + 10 Var(A)
    B        Var(Error) + 3 Var(B*C) + 2 Var(A*B) + 6 Var(B)
    A*B      Var(Error) + 2 Var(A*B)
    C        Var(Error) + 3 Var(B*C) + 5 Var(A*C) + Q(C)
    A*C      Var(Error) + 5 Var(A*C)
    B*C      Var(Error) + 3 Var(B*C)

Now, since C is a fixed effect, we do not include it in the Random statement, but its interactions with random factors (A*C and B*C) are included. Notice how, in the EMS expressions, SAS designates the fixed effect of C as "Q(C)" rather than as 15 Var(C). Don't be thrown by this. Since C is a fixed effect (i.e. the levels of C were not randomly sampled from a normal population), it technically has no variance. The "fixed effect" of C nevertheless has a computational form identical to that of a variance, namely:

$$Q(C) = r\,\frac{\sum_{k=1}^{c}\left(\mu_{..k} - \mu_{...}\right)^2}{c-1}$$

This formula is analogous to all the MS formulae you've seen until now. You can determine the value of the leading coefficient (r) in various ways:

1. Just thinking about it… The mean of each level of C is found by averaging 15 numbers (3 levels of A × 5 levels of B). So, to express the variance on a per-observation basis, one must multiply the "variance" of C by 15.
2. By hand (referencing an EMS table, like the one in example 10.1)… $r = a \cdot b = 15$

By the way, this is the same coefficient needed to calculate MSC by hand (using the Example 7.1 data):

$$MSC = ab\,\frac{\sum_{k=1}^{c}\left(\bar{Y}_{..k} - \bar{Y}_{...}\right)^2}{c-1} = 15\,\frac{(64.133 - 77.467)^2 + (90.8 - 77.467)^2}{2-1} = 5333.33$$

Notice this exactly matches MSC in the Example 7.1 ANOVA table.
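The "multiply by the number of observations behind each mean" rule can be checked numerically. The minimal Python sketch below uses the Example 7.1 data (typed in exactly as in its Cards block). It computes MSC with coefficient a·b = 15 and, as a second check, MSA with coefficient b·c = 10. The helper `main_effect_ms` is a hypothetical name. It assumes a balanced layout, which holds here.

```python
# Example 7.1 data, in the same order as the Cards block: A B C Response, repeated.
RAW = """
1 1 1 61  2 1 1 38  3 1 1 81  1 1 2 31  2 1 2 27  3 1 2 113
1 2 1 39  2 2 1 61  3 2 1 49  1 2 2 68  2 2 2 103 3 2 2 143
1 3 1 121 2 3 1 82  3 3 1 41  1 3 2 78  2 3 2 57  3 3 2 63
1 4 1 79  2 4 1 68  3 4 1 59  1 4 2 122 2 4 2 127 3 4 2 167
1 5 1 91  2 5 1 31  3 5 1 61  1 5 2 92  2 5 2 43  3 5 2 128
"""

values = [int(v) for v in RAW.split()]
ROWS = [tuple(values[i:i + 4]) for i in range(0, len(values), 4)]  # (A, B, C, Y)

a, b, c = 3, 5, 2
grand_mean = sum(row[3] for row in ROWS) / len(ROWS)


def main_effect_ms(position, n_levels, coef):
    """coef * sum over levels of (level mean - grand mean)^2 / (n_levels - 1).

    position: column of the factor in each row (0 = A, 1 = B, 2 = C).
    Assumes a balanced design, so every level mean averages `coef` observations.
    """
    ss = 0.0
    for level in range(1, n_levels + 1):
        ys = [row[3] for row in ROWS if row[position] == level]
        ss += (sum(ys) / len(ys) - grand_mean) ** 2
    return coef * ss / (n_levels - 1)


msc = main_effect_ms(2, c, a * b)  # each C mean averages a*b = 15 observations
msa = main_effect_ms(0, a, b * c)  # each A mean averages b*c = 10 observations
print(f"grand mean = {grand_mean:.4f}, MSC = {msc:.4f}, MSA = {msa:.4f}")
```

Both values reproduce the Mean Square column of the Example 7.1 output (5333.333333 and 1799.633333).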
Unbalanced Designs

The following example demonstrates the effect an unbalanced design has on the computation of means and sums of squares (i.e. we're finally going to look at the difference between Types I and III SS!):

Example 7.3 [Lab7ex3.sas]

    Data SSBonanza;
      Input A B Response @@;
      Cards;
    1 1 5  1 1 6  1 2 2  1 2 3  1 2 5  1 2 6  1 3 3  1 2 7
    2 1 2  2 1 3  2 2 8  2 2 8  2 2 9  2 3 4  2 3 4  2 3 6  2 3 6  2 3 7
    ;
    Proc GLM Data = SSBonanza;
      Class A B;
      Model Response = A|B / SS1 SS2 SS3 SS4;  * Tells SAS to generate SS Types I-IV;
      Means A B;  * Generates ordinary arithmetic means;
      LSMeans A B / pdiff lines;  * Generates least-squares means. The PDIFF option requests p-values for all pairwise comparisons with H0: LSμi = LSμj ... but no error control!;
      LSMeans A B / pdiff Adjust = Tukey lines;  * Controls error. Can also do Dunnett, etc.;
    Run;
    Quit;

Output

    Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
    Model              5   51.04444444      10.20888889   4.70      0.0131
    Error             12   26.06666667       2.17222222
    Corrected Total   17   77.11111111

    R-Square   Coeff Var   Root MSE   Response Mean
    0.661960   28.22258    1.473846   5.222222

    Source   DF   Type I SS     Mean Square   F Value   Pr > F
    A         1    5.13611111    5.13611111   2.36      0.1501
    B         2   15.68286517    7.84143258   3.61      0.0592
    A*B       2   30.22546816   15.11273408   6.96      0.0099

    Source   DF   Type II SS    Mean Square   F Value   Pr > F
    A         1    9.70786517    9.70786517   4.47      0.0561
    B         2   15.68286517    7.84143258   3.61      0.0592
    A*B       2   30.22546816   15.11273408   6.96      0.0099

    Source   DF   Type III SS   Mean Square   F Value   Pr > F
    A         1    3.59186992    3.59186992   1.65      0.2227
    B         2   21.00074906   10.50037453   4.83      0.0289
    A*B       2   30.22546816   15.11273408   6.96      0.0099

    Source   DF   Type IV SS    Mean Square   F Value   Pr > F
    A         1    3.59186992    3.59186992   1.65      0.2227
    B         2   21.00074906   10.50037453   4.83      0.0289
    A*B       2   30.22546816   15.11273408   6.96      0.0099

Wow! Notice how the interaction SS is the same in the Type I and Type III analyses.
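Type I and Type II SS are differences between the model SS of nested least-squares fits. This makes the order-dependence of Type I easy to reproduce. The minimal pure-Python sketch below works under stated assumptions. The Example 7.3 data are coded with 0/1 dummies (level 1 as reference), and each model is fitted by solving the normal equations. The helper names (`design`, `solve`, `model_ss`) are hypothetical. Type III SS, which tests hypotheses on unweighted cell means, is not reproduced here.

```python
# Example 7.3 data: (A, B, Response)
DATA = [(1, 1, 5), (1, 1, 6), (1, 2, 2), (1, 2, 3), (1, 2, 5), (1, 2, 6),
        (1, 3, 3), (1, 2, 7), (2, 1, 2), (2, 1, 3), (2, 2, 8), (2, 2, 8),
        (2, 2, 9), (2, 3, 4), (2, 3, 4), (2, 3, 6), (2, 3, 6), (2, 3, 7)]
Y = [row[2] for row in DATA]


def design(terms):
    """Design matrix: intercept plus 0/1 dummies for the requested terms."""
    X = []
    for a, b, _ in DATA:
        row = [1.0]
        if "A" in terms:
            row.append(float(a == 2))
        if "B" in terms:
            row += [float(b == 2), float(b == 3)]
        if "A*B" in terms:
            row += [float(a == 2 and b == 2), float(a == 2 and b == 3)]
        X.append(row)
    return X


def solve(m, v):
    """Solve m x = v by Gauss-Jordan elimination with partial pivoting."""
    n = len(v)
    aug = [list(m[i]) + [v[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                for k in range(col, n + 1):
                    aug[r][k] -= factor * aug[col][k]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def model_ss(terms):
    """Model SS (fitted values around the grand mean) of the least-squares fit."""
    X = design(terms)
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * y for row, y in zip(X, Y)) for i in range(p)]
    beta = solve(xtx, xty)
    ybar = sum(Y) / len(Y)
    fitted = [sum(bi * xi for bi, xi in zip(beta, row)) for row in X]
    return sum((f - ybar) ** 2 for f in fitted)


ss_a = model_ss({"A"})                    # A alone
ss_b = model_ss({"B"})                    # B alone
ss_add = model_ss({"A", "B"})             # additive model
ss_full = model_ss({"A", "B", "A*B"})     # full model (one parameter per cell)

type1 = {"A": ss_a, "B": ss_add - ss_a, "A*B": ss_full - ss_add}   # sequential, A first
type2 = {"A": ss_add - ss_b, "B": ss_add - ss_a, "A*B": ss_full - ss_add}
print("Type I :", {k: round(v, 4) for k, v in type1.items()})
print("Type II:", {k: round(v, 4) for k, v in type2.items()})
print("SS for B if B were entered first:", round(ss_b, 4))
```

The Type I and Type II values agree with the SAS tables. The last line shows that B would be credited with only about 11.11 if it entered the model before A. The A*B line comes out identical under Type I and Type II, just as it does under Type III.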
This is because the A*B interaction was included as the last term in the model, so the SS assigned to it was whatever was left after the main effects of A and B were accounted for. In this case, the Type III SS does a better job of dealing with the overlap caused by the broken orthogonality of the treatments, because it treats each effect (A, B, and their interaction) individually as if it were the last effect in the model.

Output of Means

    Level of A    N   Response Mean   Std Dev
    1             8   4.62500000      1.76776695
    2            10   5.70000000      2.35937845

    Level of B    N   Response Mean   Std Dev
    1             4   4.00000000      1.82574186
    2             8   6.00000000      2.50713268
    3             6   5.00000000      1.54919334

Output of LSMeans and p-values for unprotected (LSD-like) pairwise comparisons

    Least Squares Means

    A   Response LSMEAN   H0:LSMean1=LSMean2 Pr > |t|
    1   4.36666667        0.2227 NS
    2   5.41111111

    B   Response LSMEAN   LSMEAN Number
    1   4.00000000        1
    2   6.46666667        2
    3   4.20000000        3

    Least Squares Means for effect B
    Pr > |t| for H0: LSMean(i)=LSMean(j)
    Dependent Variable: Response

    i/j   1           2           3
    1                 0.0192 *    0.8579 NS
    2     0.0192 *                0.0376 *
    3     0.8579 NS   0.0376 *

    T Comparison Lines for Least Squares Means of B
    LS-means with the same letter are not significantly different.

        Response LSMEAN   B   LSMEAN Number
    A   6.4666667         2   2
    B   4.2000000         3   3
    B   4.0000000         1   1

    NOTE: To ensure overall protection level, only probabilities associated with pre-planned comparisons should be used.

As highlighted above, SAS offers good advice: use these p-values for planned comparisons only, because these pairwise comparisons are made without any control of the MEER. In other words, it is best to perform as few of these comparisons as absolutely necessary in order to keep the EER to a minimum.
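The difference between the Means and LSMeans output comes down to weighting. For this full A|B model, with every A×B cell filled, each LS mean is the unweighted average of the relevant cell means. The arithmetic mean, by contrast, weights every observation equally, so cells with more observations count more. The minimal Python sketch below illustrates this with the Example 7.3 data. `arithmetic_mean` and `ls_mean` are hypothetical helper names, and the unweighted-cell-mean shortcut is assumed to apply only to this complete two-way layout.

```python
# Example 7.3 data: (A, B, Response)
DATA = [(1, 1, 5), (1, 1, 6), (1, 2, 2), (1, 2, 3), (1, 2, 5), (1, 2, 6),
        (1, 3, 3), (1, 2, 7), (2, 1, 2), (2, 1, 3), (2, 2, 8), (2, 2, 8),
        (2, 2, 9), (2, 3, 4), (2, 3, 4), (2, 3, 6), (2, 3, 6), (2, 3, 7)]

# Group responses by A x B cell and compute each cell mean.
cells = {}
for a, b, y in DATA:
    cells.setdefault((a, b), []).append(y)
cell_mean = {key: sum(ys) / len(ys) for key, ys in cells.items()}


def arithmetic_mean(factor, level):
    """Ordinary mean (Means statement): every observation weighted equally."""
    idx = 0 if factor == "A" else 1
    ys = [row[2] for row in DATA if row[idx] == level]
    return sum(ys) / len(ys)


def ls_mean(factor, level):
    """LS mean for the full A|B model: unweighted average of the cell means."""
    if factor == "A":
        means = [cell_mean[(level, b)] for b in (1, 2, 3)]
    else:
        means = [cell_mean[(a, level)] for a in (1, 2)]
    return sum(means) / len(means)


for factor, levels in (("A", (1, 2)), ("B", (1, 2, 3))):
    for level in levels:
        print(f"{factor}{level}: mean = {arithmetic_mean(factor, level):.4f}, "
              f"LS mean = {ls_mean(factor, level):.4f}")
```

For example, level 2 of B has mean 6.0 but LS mean 6.4667. The cell (A2, B2), with a mean of 8.33, has only 3 observations, versus 5 for (A1, B2). Averaging the two cell means equally therefore pulls the LS mean up.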
Output of LSMeans and p-values for protected (Tukey-like) pairwise comparisons

    Least Squares Means
    Adjustment for Multiple Comparisons: Tukey-Kramer

    A   Response LSMEAN   H0:LSMean1=LSMean2 Pr > |t|
    1   4.36666667        0.2227
    2   5.41111111

    Least Squares Means
    Adjustment for Multiple Comparisons: Tukey-Kramer

    B   Response LSMEAN   LSMEAN Number
    1   4.00000000        1
    2   6.46666667        2
    3   4.20000000        3

    Least Squares Means for effect B
    Pr > |t| for H0: LSMean(i)=LSMean(j)
    Dependent Variable: Response

    i/j   1           2           3
    1                 0.0470 *    0.9817 NS
    2     0.0470 *                0.0887 NS
    3     0.9817 NS   0.0887 NS

    Tukey-Kramer Comparison Lines for Least Squares Means of B
    LS-means with the same letter are not significantly different.

            Response LSMEAN   B   LSMEAN Number
        A   6.4666667         2   2
        A
    B   A   4.2000000         3   3
    B
    B       4.0000000         1   1

Using the more conservative Tukey-Kramer test, which controls MEER, we lose significance between levels 2 and 3 of factor B.

The Take Home Message

For unbalanced designs, use LS Means and Type III SS. For unbalanced mixed models with crossed factors, it is necessary to use a different SAS procedure called Proc Mixed (ST&D 411) that will not be covered in this class. The syntax is similar to Proc GLM, but the output is substantially more complex. Information about Proc Mixed is available at: https://jukebox.ucdavis.edu/slc/sasdocs/sashtml/stat/chap41/index.htm
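Returning to the Tukey-Kramer comparison of factor B above, the loss of significance for levels 2 vs 3 can be checked by hand. The minimal Python sketch below makes several assumptions. Each LS mean of B averages two independent cell means, so the variance of a difference is MSE/4 · Σ 1/nᵢⱼ over the four cells involved. The two critical values are typed in from printed tables rather than computed: t(0.975; 12 df) ≈ 2.179 for the unprotected comparisons, and the studentized range q(0.05; 3 means, 12 df) ≈ 3.77, which Tukey-Kramer compares on the |t| scale as q/√2. The sketch only shows why the decisions change; it does not replace the SAS p-values.

```python
from math import sqrt

# Example 7.3 data: (A, B, Response)
DATA = [(1, 1, 5), (1, 1, 6), (1, 2, 2), (1, 2, 3), (1, 2, 5), (1, 2, 6),
        (1, 3, 3), (1, 2, 7), (2, 1, 2), (2, 1, 3), (2, 2, 8), (2, 2, 8),
        (2, 2, 9), (2, 3, 4), (2, 3, 4), (2, 3, 6), (2, 3, 6), (2, 3, 7)]
MSE = 26.06666667 / 12   # error mean square from the ANOVA (12 df)
T_CRIT_LSD = 2.179       # assumed: t(0.975; 12 df), from a printed t table
Q_CRIT = 3.77            # assumed: q(0.05; 3 means, 12 df), from a studentized range table

cells = {}
for a, b, y in DATA:
    cells.setdefault((a, b), []).append(y)


def ls_mean_b(j):
    """LS mean of level j of B: average of the two A x B cell means."""
    return sum(sum(cells[(i, j)]) / len(cells[(i, j)]) for i in (1, 2)) / 2


def t_stat(j, k):
    """t statistic for LSMean(Bj) - LSMean(Bk)."""
    var = MSE / 4 * sum(1 / len(cells[(i, j)]) + 1 / len(cells[(i, k)]) for i in (1, 2))
    return (ls_mean_b(j) - ls_mean_b(k)) / sqrt(var)


t_crit_tk = Q_CRIT / sqrt(2)  # Tukey-Kramer cutoff on the |t| scale
for j, k in ((1, 2), (1, 3), (2, 3)):
    t = abs(t_stat(j, k))
    lsd = "significant" if t > T_CRIT_LSD else "NS"
    tk = "significant" if t > t_crit_tk else "NS"
    print(f"B{j} vs B{k}: |t| = {t:.3f}  unprotected: {lsd}  Tukey-Kramer: {tk}")
```

The B2 vs B3 difference (|t| ≈ 2.34) clears the unprotected cutoff but not the Tukey-Kramer cutoff of about 2.67. This matches the change from p = 0.0376 to p = 0.0887 in the SAS output.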