Laboratory for Topics 10 & 11

PLS205
Lab 7
February 20, 2014
Laboratory Topics 10 & 11
∙ Random models (EMS tables by hand and in SAS)
∙ Generating random datasets to explore EMS tables
∙ Unbalanced designs and SS types I - IV
Random Models in SAS
One reason we have not encouraged the use of the more limited Proc ANOVA is that it cannot
handle random effects. For random and mixed models, one can use Proc GLM [or
the even more general Proc Mixed, which we will not cover in this class]. In Proc GLM, the expected
mean squares (EMS's) for the model effects are generated by the Random statement. The Random
statement designates the model effects that are random, and it must appear after the Model
statement. The generic syntax:
Random effects / test;
And an example:
Proc GLM;
Class A B C;
Model Response = A|B|C;
Random B A*B B*C / test;
With no specified options, the Random statement alone will produce a table of EMS's for each effect in
the model. When you include the "/ test" option, the Random statement tells SAS to determine the form
of the appropriate F test for each effect using an approach similar to the Satterthwaite approximation
(Topic 10.5). SAS will then generate F ratios (approximated, when necessary) and p-values for each test.
Very handy, needless to say.
It is now commonly accepted that an interaction should be treated as a random effect if any one of the
effects involved in the interaction is random. However, Proc GLM does not operate under this
presumption; therefore, it is your responsibility to explicitly designate main random effects and their
interactions as random using the Random statement.
In other words, if B is a random effect, the following code is incorrect:
Proc GLM;
Class A B;
Model Response = A|B;
Random B / test;
because it does not explicitly declare the A*B interaction to be random. The correct code in this case is:
Proc GLM;
Class A B;
Model Response = A|B;
Random B A*B / test;
Example 7.1
Random effects model [Lab7ex1.sas]
In this example, we're using the data set from Lab 6 (Example 2), a three-way factorial CRD with one
replication per three-way combination of factors. As before, there are not enough degrees of freedom to
include the three-way interaction in the model. Unlike before, we are going to treat all three factors as
random effects. Recall the general linear model for this design:
$$Y_{ijk} = \mu + \alpha_i + \beta_j + \gamma_k + (\alpha\beta)_{ij} + (\alpha\gamma)_{ik} + (\beta\gamma)_{jk} + \varepsilon_{ijk}$$
The table of expected mean squares (EMS) for this model is shown below:
Source   Expected Mean Squares                   F
A        σ²_ε + c∙σ²_αβ + b∙σ²_αγ + bc∙σ²_α      (MSA + MSE) / (MS(AB) + MS(AC))
B        σ²_ε + a∙σ²_βγ + c∙σ²_αβ + ac∙σ²_β      (MSB + MSE) / (MS(AB) + MS(BC))
C        σ²_ε + b∙σ²_αγ + a∙σ²_βγ + ab∙σ²_γ      (MSC + MSE) / (MS(AC) + MS(BC))
AB       σ²_ε + c∙σ²_αβ                          MS(AB) / MSE
AC       σ²_ε + b∙σ²_αγ                          MS(AC) / MSE
BC       σ²_ε + a∙σ²_βγ                          MS(BC) / MSE
Error    σ²_ε
With a = 3, b = 5, and c = 2.
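To see why the approximate ratio for A compares MSA + MSE against MS(AB) + MS(AC), it can help to write out the expectations of both sums from the EMS rows. The following is only a bookkeeping sketch that uses the symbols of the table and assumes the all-random model stated above:

```latex
\begin{aligned}
E[MSA + MSE] &= \left(\sigma^2_\varepsilon + c\,\sigma^2_{\alpha\beta} + b\,\sigma^2_{\alpha\gamma} + bc\,\sigma^2_\alpha\right) + \sigma^2_\varepsilon \\
E[MS(AB) + MS(AC)] &= \left(\sigma^2_\varepsilon + c\,\sigma^2_{\alpha\beta}\right) + \left(\sigma^2_\varepsilon + b\,\sigma^2_{\alpha\gamma}\right) \\
E[MSA + MSE] - E[MS(AB) + MS(AC)] &= bc\,\sigma^2_\alpha
\end{aligned}
```

The two sums therefore have the same expectation exactly when σ²_α = 0, which is the null hypothesis for A. The same bookkeeping, with the letters permuted, gives the ratios for B and C.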
The corresponding SAS code
Data Interac3;
Input A B C Response @@;
Cards;
1 1 1 61 2 1 1 38 3 1 1 81 1 1 2 31 2 1 2 27 3 1 2 113
1 2 1 39 2 2 1 61 3 2 1 49 1 2 2 68 2 2 2 103 3 2 2 143
1 3 1 121 2 3 1 82 3 3 1 41 1 3 2 78 2 3 2 57 3 3 2 63
1 4 1 79 2 4 1 68 3 4 1 59 1 4 2 122 2 4 2 127 3 4 2 167
1 5 1 91 2 5 1 31 3 5 1 61 1 5 2 92 2 5 2 43 3 5 2 128
;
Proc GLM Data = Interac3;
Class A B C;
Model Response = A|B|C@2;
Random A|B|C@2 / test;
* Performs approximate hypothesis tests (i.e. F tests) for each effect specified in
the model, using appropriate error terms as determined by the EMS's;
Run;
Quit;
The Output
Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              21      38710.26667    1843.34603    635.64   <.0001
Error               8         23.20000       2.90000
Corrected Total    29      38733.46667

R-Square   Coeff Var   Root MSE   Response Mean
0.999401    2.198286   1.702939        77.46667

Source   DF   Type III SS   Mean Square   F Value   Pr > F
A         2   3599.266667   1799.633333    620.56   <.0001
B         4   6423.133333   1605.783333    553.72   <.0001
A*B       8   9675.066667   1209.383333    417.03   <.0001
C         1   5333.333333   5333.333333   1839.08   <.0001
A*C       2   5692.466667   2846.233333    981.46   <.0001
B*C       4   7987.000000   1996.750000    688.53   <.0001
                                          !! WRONG !!
NOTE: The above table incorrectly uses the MSE (2.9) as the denominator for all F tests!
This incorrect ANOVA table is immediately followed, however, by a table of EMS's that are needed to
construct the correct F tests. SAS does all this as a result of the Random statement in Proc GLM.
Source   Type III Expected Mean Square
A        Var(Error) + 5 Var(A*C) + 2 Var(A*B) + 10 Var(A)
B        Var(Error) + 3 Var(B*C) + 2 Var(A*B) + 6 Var(B)
A*B      Var(Error) + 2 Var(A*B)
C        Var(Error) + 3 Var(B*C) + 5 Var(A*C) + 15 Var(C)
A*C      Var(Error) + 5 Var(A*C)
B*C      Var(Error) + 3 Var(B*C)
SAS uses these EMS's to carry out approximate F tests:
Source   DF       Type III SS   Mean Square   F Value   Pr > F
A        2        3599.266667   1799.633333      0.44   0.6704
Error    3.8798   15724         4052.716667
Error: MS(A*B) + MS(A*C) - MS(Error)

Source   DF       Type III SS   Mean Square   F Value   Pr > F
B        4        6423.133333   1605.783333      0.50   0.7362
Error    8.6986   27864         3203.233333
Error: MS(A*B) + MS(B*C) - MS(Error)

Source   DF       Type III SS   Mean Square   F Value   Pr > F
C        1        5333.333333   5333.333333      1.10   0.3454
Error    4.6414   22465         4840.083333
Error: MS(A*C) + MS(B*C) - MS(Error)

Source             DF   Type III SS   Mean Square   F Value   Pr > F
A*B                 8   9675.066667   1209.383333    417.03   <.0001
A*C                 2   5692.466667   2846.233333    981.46   <.0001
B*C                 4   7987.000000   1996.750000    688.53   <.0001
Error: MS(Error)    8     23.200000      2.900000
Notice how SAS uses the correct error terms to calculate the approximate F values in each case.
The specific error term is listed beneath each result and is based on the EMS table.
The non-integer error df's are approximated using Satterthwaite's method.
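The synthetic error terms and their non-integer df's can be checked by hand. The sketch below is in Python rather than SAS and assumes the standard Satterthwaite formula, df = (Σ cᵢ MSᵢ)² / Σ [(cᵢ MSᵢ)² / dfᵢ]. The function and variable names are ours, and the mean squares are simply copied from the output above. Under that assumption it appears to reproduce the error mean squares, error df's, and F values SAS reports.

```python
def satterthwaite_df(terms):
    """Approximate df for a linear combination of mean squares.

    terms: list of (coefficient, mean_square, df) tuples.
    """
    numerator = sum(coef * ms for coef, ms, _ in terms) ** 2
    denominator = sum((coef * ms) ** 2 / df for coef, ms, df in terms)
    return numerator / denominator


# Mean squares and df copied from the Example 7.1 output.
MS_AB, DF_AB = 1209.383333, 8
MS_AC, DF_AC = 2846.233333, 2
MS_BC, DF_BC = 1996.750000, 4
MS_E, DF_E = 2.900000, 8
effect_ms = {"A": 1799.633333, "B": 1605.783333, "C": 5333.333333}

# Each synthetic error term has the form MS(int1) + MS(int2) - MS(Error).
error_terms = {
    "A": [(1, MS_AB, DF_AB), (1, MS_AC, DF_AC), (-1, MS_E, DF_E)],
    "B": [(1, MS_AB, DF_AB), (1, MS_BC, DF_BC), (-1, MS_E, DF_E)],
    "C": [(1, MS_AC, DF_AC), (1, MS_BC, DF_BC), (-1, MS_E, DF_E)],
}

results = {}
for effect, terms in error_terms.items():
    error_ms = sum(coef * ms for coef, ms, _ in terms)
    error_df = satterthwaite_df(terms)
    f_value = effect_ms[effect] / error_ms
    results[effect] = (error_ms, error_df, f_value)
    print(f"{effect}: error MS = {error_ms:.6f}, "
          f"error df = {error_df:.4f}, F = {f_value:.2f}")
```

Because the df with only 2 degrees of freedom (A*C) dominates the denominator, the error df for A and C end up much closer to 2 to 4 than to the 8 df of A*B.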
Generating Random Datasets in SAS
In the following example, you will learn a method for generating a dataset in SAS [Disclaimer: We do not
recommend trying to get published using SAS-generated datasets, though it's probably happened]. This is
a nice little program for exploring how the EMS's change with different combinations of effects (random
and fixed).
Example 7.2
[Lab7ex2.sas]
Data EMSPlay;
  Do A = 1 to 3;
    Do B = 1 to 5;
      Do C = 1 to 2;
        Response = 4*RanNor(0)+15;   * Picks RANdom numbers from a NORmal;
                                     * distribution with mean 15 and stdev 4;
        Output;
      End;
    End;
  End;
Proc Print;
Proc GLM Data = EMSPlay;
  Class A B C;
  Model Response = A|B|C@2;
  Random A B A*B A*C B*C / test;     * The model if C were a fixed effect;
Run;
Quit;
The EMS table
Source   Type III Expected Mean Square
A        Var(Error) + 5 Var(A*C) + 2 Var(A*B) + 10 Var(A)
B        Var(Error) + 3 Var(B*C) + 2 Var(A*B) + 6 Var(B)
A*B      Var(Error) + 2 Var(A*B)
C        Var(Error) + 3 Var(B*C) + 5 Var(A*C) + Q(C)
A*C      Var(Error) + 5 Var(A*C)
B*C      Var(Error) + 3 Var(B*C)
Now, since C is a fixed effect, we do not include it in the Random statement; but its interactions with
random factors are included (A*C and B*C). Notice how, in the EMS expressions, SAS designates the
fixed effect of C as "Q(C)" rather than as 15 Var(C). Don't be thrown by this. Since C is a fixed effect
(i.e. the levels of C were not randomly sampled from a normal population), it technically has no variance,
though the "fixed effect" of C has a computational form that is identical to that for variances, namely:
$$Q(C) = r \cdot \frac{\sum_{k=1}^{c} (\mu_{..k} - \mu_{...})^2}{c - 1}$$
This formula is analogous to all the MS formulae you've seen until now. You can determine the value of
the leading coefficient (r) in various ways:
1. Just thinking about it…
The mean of each level of C is found by taking the average of 15 numbers (3 levels of A, 5 levels of B).
So, to express the variance on a per-observation basis, one must multiply the "variance" of C by 15.
2. By hand (referencing an EMS table, like the one in example 10.1)…
$$r = a \cdot b = 15$$
By the way, this is the same coefficient needed to calculate MSC by hand:
$$MSC = a \cdot b \cdot \frac{\sum_{k=1}^{c} (\bar{Y}_{..k} - \bar{Y}_{...})^2}{c - 1} = 15 \cdot \frac{(64.1\bar{3} - 77.4\bar{6})^2 + (90.8 - 77.4\bar{6})^2}{2 - 1} = 5333.\bar{3}$$
Notice this exactly matches MSC in the ANOVA table.
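The hand calculation can be repeated directly from the raw data of Example 7.1. The sketch below is in Python rather than SAS; the variable names are ours, and the data string is transcribed from the Cards block of Lab7ex1.sas. It computes the level means of C, the coefficient r = a∙b, and MSC.

```python
# Data transcribed from the Example 7.1 data step (A B C Response, repeated).
cards = """
1 1 1 61 2 1 1 38 3 1 1 81 1 1 2 31 2 1 2 27 3 1 2 113
1 2 1 39 2 2 1 61 3 2 1 49 1 2 2 68 2 2 2 103 3 2 2 143
1 3 1 121 2 3 1 82 3 3 1 41 1 3 2 78 2 3 2 57 3 3 2 63
1 4 1 79 2 4 1 68 3 4 1 59 1 4 2 122 2 4 2 127 3 4 2 167
1 5 1 91 2 5 1 31 3 5 1 61 1 5 2 92 2 5 2 43 3 5 2 128
"""

# Read four values at a time, like "Input A B C Response @@".
values = [int(token) for token in cards.split()]
rows = [tuple(values[i:i + 4]) for i in range(0, len(values), 4)]

a = len({row[0] for row in rows})  # number of levels of A
b = len({row[1] for row in rows})  # number of levels of B
c = len({row[2] for row in rows})  # number of levels of C

grand_mean = sum(row[3] for row in rows) / len(rows)

# Mean response at each level of C (each is an average of a*b observations).
c_means = {}
for level in sorted({row[2] for row in rows}):
    responses = [row[3] for row in rows if row[2] == level]
    c_means[level] = sum(responses) / len(responses)

r = a * b  # leading coefficient: observations per level of C
msc = r * sum((m - grand_mean) ** 2 for m in c_means.values()) / (c - 1)

print("C level means:", c_means)
print("grand mean:", grand_mean)
print("r =", r, " MSC =", msc)
```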
Unbalanced Designs
The following example demonstrates the effect an unbalanced design has on the computation of means
and sums of squares (i.e. we're finally going to look at the difference between Types I and III SS!):
Example 7.3
[Lab7ex3.sas]
Data SSBonanza;
Input A B Response @@;
Cards;
1 1 5 1 1 6
1 2 2 1 2 3 1 2 5 1 2 6
1 3 3
1 2 7
2 1 2 2 1 3
2 2 8 2 2 8 2 2 9
2 3 4 2 3 4 2 3 6 2 3 6 2 3 7
;
Proc GLM Data = SSBonanza;
  Class A B;
  Model Response = A|B / SS1 SS2 SS3 SS4;    * Tells SAS to generate SS Types I-IV;
  Means A B;                                 * Generates normal arithmetic means;
  LSMeans A B / pdiff lines;                 * Generates least-square means. The PDIFF option
                                               requests p-values for all pairwise comparisons
                                               with H0: LSμi = LSμj … but no error control!;
  LSMeans A B / pdiff Adjust = Tukey lines;  * Controls error. Can also do Dunnett, etc;
Run;
Quit;
Output
Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               5      51.04444444   10.20888889      4.70   0.0131
Error              12      26.06666667    2.17222222
Corrected Total    17      77.11111111

R-Square   Coeff Var   Root MSE   Response Mean
0.661960    28.22258   1.473846        5.222222

Source   DF   Type I SS     Mean Square   F Value   Pr > F
A         1    5.13611111    5.13611111      2.36   0.1501
B         2   15.68286517    7.84143258      3.61   0.0592
A*B       2   30.22546816   15.11273408      6.96   0.0099

Source   DF   Type II SS    Mean Square   F Value   Pr > F
A         1    9.70786517    9.70786517      4.47   0.0561
B         2   15.68286517    7.84143258      3.61   0.0592
A*B       2   30.22546816   15.11273408      6.96   0.0099

Source   DF   Type III SS   Mean Square   F Value   Pr > F
A         1    3.59186992    3.59186992      1.65   0.2227
B         2   21.00074906   10.50037453      4.83   0.0289
A*B       2   30.22546816   15.11273408      6.96   0.0099

Source   DF   Type IV SS    Mean Square   F Value   Pr > F
A         1    3.59186992    3.59186992      1.65   0.2227
B         2   21.00074906   10.50037453      4.83   0.0289
A*B       2   30.22546816   15.11273408      6.96   0.0099
Wow! Notice how the Interaction SS is the same in the Type I and Type III analyses. This is because the
A*B interaction was included as the last term in the model, so the SS assigned to it was whatever was left
after the main effects of A and B were accounted for. In this case, the Type III SS does a better job of
minimizing the overlap due to the broken orthogonality of the treatments by treating each effect (A, B,
and their interaction) individually as the last effect in the model.
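The order dependence of the Type I SS can be seen with a little arithmetic on the raw data. The sketch below is in Python rather than SAS; the function and variable names are ours, and the value of SS(B after A) is copied from the Type I table rather than recomputed. It relies on the identity SS(A) + SS(B after A) = SS(B) + SS(A after B), which holds because both sides equal the SS explained by A and B together.

```python
# Data transcribed from the Example 7.3 data step (A B Response, repeated).
cards = """
1 1 5 1 1 6
1 2 2 1 2 3 1 2 5 1 2 6
1 3 3
1 2 7
2 1 2 2 1 3
2 2 8 2 2 8 2 2 9
2 3 4 2 3 4 2 3 6 2 3 6 2 3 7
"""
values = [int(token) for token in cards.split()]
rows = [tuple(values[i:i + 3]) for i in range(0, len(values), 3)]


def between_group_ss(rows, factor_index):
    """Unadjusted SS for one factor: sum of n_i * (level mean - grand mean)^2."""
    responses = [row[2] for row in rows]
    grand_mean = sum(responses) / len(responses)
    groups = {}
    for row in rows:
        groups.setdefault(row[factor_index], []).append(row[2])
    return sum(len(ys) * (sum(ys) / len(ys) - grand_mean) ** 2
               for ys in groups.values())


ss_a_alone = between_group_ss(rows, 0)  # A fitted first: the Type I SS for A
ss_b_alone = between_group_ss(rows, 1)  # what B would get if it were fitted first

SS_B_AFTER_A = 15.68286517  # Type I SS for B, copied from the SAS output

# SS(A) + SS(B after A) = SS(B) + SS(A after B)
ss_a_after_b = ss_a_alone + SS_B_AFTER_A - ss_b_alone

print(f"SS(A), A fitted first:     {ss_a_alone:.8f}")
print(f"SS(B), B fitted first:     {ss_b_alone:.8f}")
print(f"SS(A after B), by identity: {ss_a_after_b:.8f}")
```

The last quantity agrees with the Type II SS that SAS reports for A, while the first agrees with the Type I SS for A. In a balanced design the two would coincide.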
Output of Means
Level of         -----------Response----------
A           N    Mean          Std Dev
1           8    4.62500000    1.76776695
2          10    5.70000000    2.35937845

Level of         -----------Response----------
B           N    Mean          Std Dev
1           4    4.00000000    1.82574186
2           8    6.00000000    2.50713268
3           6    5.00000000    1.54919334
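The arithmetic means above weight every observation equally, so levels with large cells dominate. For a two-way model with interaction and no empty cells, the LS mean of a level is instead the unweighted average of its cell means. The sketch below is in Python rather than SAS, and the names are ours. It computes both kinds of means from the raw data of Example 7.3 so they can be compared with the SAS output.

```python
# Data transcribed from the Example 7.3 data step (A B Response, repeated).
cards = """
1 1 5 1 1 6
1 2 2 1 2 3 1 2 5 1 2 6
1 3 3
1 2 7
2 1 2 2 1 3
2 2 8 2 2 8 2 2 9
2 3 4 2 3 4 2 3 6 2 3 6 2 3 7
"""
values = [int(token) for token in cards.split()]
rows = [tuple(values[i:i + 3]) for i in range(0, len(values), 3)]


def level_means(rows, factor_index):
    """Arithmetic mean of all observations at each level of one factor."""
    groups = {}
    for row in rows:
        groups.setdefault(row[factor_index], []).append(row[2])
    return {level: sum(ys) / len(ys) for level, ys in groups.items()}


# Mean of each A x B cell.
cells = {}
for a, b, y in rows:
    cells.setdefault((a, b), []).append(y)
cell_means = {cell: sum(ys) / len(ys) for cell, ys in cells.items()}

levels_a = sorted({a for a, _ in cell_means})
levels_b = sorted({b for _, b in cell_means})

arith_a = level_means(rows, 0)
arith_b = level_means(rows, 1)

# LS mean: unweighted average of the cell means across the other factor.
ls_a = {a: sum(cell_means[(a, b)] for b in levels_b) / len(levels_b)
        for a in levels_a}
ls_b = {b: sum(cell_means[(a, b)] for a in levels_a) / len(levels_a)
        for b in levels_b}

for a in levels_a:
    print(f"A={a}: arithmetic mean {arith_a[a]:.4f}, LS mean {ls_a[a]:.4f}")
for b in levels_b:
    print(f"B={b}: arithmetic mean {arith_b[b]:.4f}, LS mean {ls_b[b]:.4f}")
```

Level 1 of B has equal cell sizes (2 and 2), so its two means agree. Levels 2 and 3 do not, which is where the arithmetic and LS means part ways.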
Output of LSMeans and p-values for unprotected (LSD-like) pair-wise comparisons
Least Squares Means

                      H0:LSMean1=LSMean2
A   Response LSMEAN   Pr > |t|
1   4.36666667        0.2227 NS
2   5.41111111

B   Response LSMEAN   LSMEAN Number
1   4.00000000        1
2   6.46666667        2
3   4.20000000        3

Least Squares Means for effect B
Pr > |t| for H0: LSMean(i)=LSMean(j)
Dependent Variable: Response

i/j   1        2           3
1              0.0192 *    0.8579 NS
2     0.0192               0.0376 *
3     0.8579   0.0376

T Comparison Lines for Least Squares Means of B
LS-means with the same letter are not significantly different.

     Response LSMEAN   B   LSMEAN Number
A    6.4666667         2   2
B    4.2000000         3   3
B    4.0000000         1   1
NOTE: To ensure overall protection level, only probabilities associated with pre-planned comparisons
should be used.
As highlighted above, SAS offers the good advice to use p-values for planned comparisons only because
these particular pair-wise comparisons are being made without any control for MEER. In other words, it
is best to perform as few of these comparisons as absolutely necessary in order to keep the EER to a
minimum.
Output of LSMeans and p-values for protected (Tukey-like) pairwise comparisons
Least Squares Means
Adjustment for Multiple Comparisons: Tukey-Kramer

                      H0:LSMean1=LSMean2
A   Response LSMEAN   Pr > |t|
1   4.36666667        0.2227
2   5.41111111

Least Squares Means
Adjustment for Multiple Comparisons: Tukey-Kramer

B   Response LSMEAN   LSMEAN Number
1   4.00000000        1
2   6.46666667        2
3   4.20000000        3

Least Squares Means for effect B
Pr > |t| for H0: LSMean(i)=LSMean(j)
Dependent Variable: Response

i/j   1        2           3
1              0.0470 *    0.9817 NS
2     0.0470               0.0887 NS
3     0.9817   0.0887

Tukey-Kramer Comparison Lines for Least Squares Means of B
LS-means with the same letter are not significantly different.

          Response LSMEAN   B   LSMEAN Number
     A    6.4666667         2   2
B    A    4.2000000         3   3
B         4.0000000         1   1
Using the more conservative Tukey-Kramer test which controls MEER, we lose significance between
levels 2 and 3 of factor B.
The Take Home Message
For unbalanced designs, use LS Means and Type III SS.
For unbalanced mixed models with crossed factors, it is necessary to use a different SAS
procedure called Proc Mixed (ST&D 411) that will not be covered in this class. The syntax is
similar to Proc GLM, but the output is substantially more complex. Information about Proc
Mixed is available at:
https://jukebox.ucdavis.edu/slc/sasdocs/sashtml/stat/chap41/index.htm