Topic 21: ANOVA and Linear Regression

Outline
• Review cell means and factor
effects models
• Relationship between factor
effects constraint and
explanatory variables
Cell Means Model
• Yij = μi + εij
– where μi is the theoretical mean or
expected value of all observations
at level i
– The εij are iid N(0, σ²)
– Yij ~ N(μi, σ²), independent
Factor Effects Model
• A reparameterization of the cell means
model
• A useful way of looking at more
complicated models
• Null hypotheses are easier to state
• Yij = μ + τi + εij
– the εij are iid N(0, σ²)
Parameters
• The cell means model has r+1 parameters
– the r μ’s and σ²
• The factor effects model has r+2
parameters
– μ, the r τ’s, and σ²
• Impose a restriction on the τ’s in the factor effects
model to remove one degree of freedom
(e.g., Σi τi = 0 or τr = 0)
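The two example restrictions give different values of μ and the τ’s but the same cell means. A minimal Python sketch of that point, assuming the four treatment sample means reported in the KNNL example of these slides stand in for the μi; the function names are hypothetical and not part of SAS or KNNL:

```python
def sum_to_zero(cell_means):
    """Factor effects under the restriction sum(tau_i) = 0."""
    mu = sum(cell_means) / len(cell_means)   # unweighted mean of the cell means
    tau = [m - mu for m in cell_means]
    return mu, tau


def last_level_zero(cell_means):
    """Factor effects under the restriction tau_r = 0."""
    mu = cell_means[-1]                      # mu equals the last cell mean
    tau = [m - mu for m in cell_means]
    return mu, tau


# Assumption: sample means of the four designs, used here in place of mu_i.
cell_means = [14.6, 13.4, 19.5, 27.2]

mu_a, tau_a = sum_to_zero(cell_means)
mu_b, tau_b = last_level_zero(cell_means)

# Either parameterization recovers the same cell means mu + tau_i.
recovered_a = [mu_a + t for t in tau_a]
recovered_b = [mu_b + t for t in tau_b]

print(round(mu_a, 3), [round(t, 3) for t in tau_a])
print(round(mu_b, 3), [round(t, 3) for t in tau_b])
```

Only r of the r+1 mean parameters are free in either case, which is why the restricted factor effects model matches the cell means model in parameter count.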
Regression Approach
• We can use multiple regression to
reproduce the results based on the
factor effects model
• Yij = μ + τi + εij,
and we will impose the restriction Σi τi = 0
Coding for Explanatory Variables
• Σi τi = 0 implies τr = -τ1 - τ2 - … - τ(r-1)
• Because of the restriction, only r-1 columns are needed (i = 1 to r-1)
• Xij = 1 if Y is an observation from level i
= -1 if Y is an observation from level r
= 0 if Y is from any other level
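The coding rule above can be sketched as a small Python function. This is only an illustration; `effect_code` is a hypothetical helper name, and levels are assumed to be numbered 1 to r:

```python
def effect_code(level, r):
    """Return [X_1, ..., X_(r-1)] for an observation at `level` (1..r)."""
    if not 1 <= level <= r:
        raise ValueError("level must be between 1 and r")
    if level == r:
        # The last level is coded -1 in every column.
        return [-1] * (r - 1)
    # Otherwise a 1 in the column for this level and 0 elsewhere.
    return [1 if i == level else 0 for i in range(1, r)]


# One row of codes per level for a factor with r = 4 levels.
for level in range(1, 5):
    print(level, effect_code(level, 4))
```

With r = 4 this gives three columns, and the row for level 4 is -1 in all of them.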
KNNL Example
• Recall KNNL p 687 from Topic 20
• It is a bit messy because ni = 5, 5, 4, 5
• The grand mean, (Σi ni μi)/nT, is not necessarily
the same as the mean of the group
means, (Σi μi)/r; under the sum-to-zero restriction, μ is the latter
• We will calculate these two values
• You will have an easier example in
the homework (ni is constant) where
they are the same value
Means
proc means data=a1 noprint;
class design;
var cases;
output out=a2 mean=mclass;
run;
proc print data=a2;
run;
Output
Obs  design  _TYPE_  _FREQ_   mclass
  1       .       0      19  18.6316
  2       1       1       5  14.6000
  3       2       1       5  13.4000
  4       3       1       4  19.5000
  5       4       1       5  27.2000
Obs 1 is the grand sample mean, not the average of the
four treatment sample means shown below it
The mean of the means
proc means data=a2 mean;
where _TYPE_ eq 1;
var mclass;
run;
Output
The MEANS Procedure
Analysis Variable : mclass
       Mean
------------
 18.6750000
------------
Not a big difference from the grand sample mean in this example
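The two averages can be checked by hand from the group sizes and treatment means in the PROC MEANS output. A short Python sketch of the arithmetic; the variable names are only illustrative:

```python
# Group sizes and treatment sample means taken from the PROC MEANS output.
n = [5, 5, 4, 5]
ybar = [14.6, 13.4, 19.5, 27.2]

# Grand sample mean: each treatment mean is weighted by its sample size.
grand_mean = sum(ni * yi for ni, yi in zip(n, ybar)) / sum(n)

# Mean of the means: every treatment counts equally.
mean_of_means = sum(ybar) / len(ybar)

print(round(grand_mean, 4), round(mean_of_means, 4))
```

The weighted value is 354/19, which rounds to 18.6316, and the unweighted value is 74.7/4 = 18.675. The two agree exactly only when all ni are equal.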
Generate explanatory variables for REG
data a1; set a1;
  * each comparison (design eq k) evaluates to 1 or 0, so design 4 is coded -1;
  x1=(design eq 1)-(design eq 4);
  x2=(design eq 2)-(design eq 4);
  x3=(design eq 3)-(design eq 4);
run;
proc print data=a1;
run;
Output
Obs  cases  design  x1  x2  x3
  1     11       1   1   0   0
  6     12       2   0   1   0
 11     23       3   0   0   1
 15     27       4  -1  -1  -1
Output with parameters
design  x1  x2  x3  mean
     1   1   0   0  μ + τ1
     2   0   1   0  μ + τ2
     3   0   0   1  μ + τ3
     4  -1  -1  -1  μ - τ1 - τ2 - τ3
μ is the result of including an intercept
Run the regression
proc reg data=a1;
model cases=x1 x2 x3;
run;
Output Anova
Source  DF   SS   MS      F       P
Model    3  588  196  18.59  <.0001
Error   15  158   10
Total   18  746
Same ANOVA table as GLM
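The table matches GLM because the fitted values of this regression are the treatment sample means, so the sums of squares can be checked directly. A Python sketch under one loud assumption: the data values below are the package design data from the KNNL example as best recalled here, not copied from topic21.sas. They reproduce the group sizes and means shown in these slides, but should be verified against data set a1.

```python
# ASSUMPTION: cases sold by package design (KNNL example); verify against a1.
cases = {
    1: [11, 17, 16, 14, 15],
    2: [12, 10, 15, 19, 11],
    3: [23, 20, 18, 17],
    4: [27, 33, 22, 26, 28],
}

n_total = sum(len(y) for y in cases.values())
grand_mean = sum(sum(y) for y in cases.values()) / n_total
group_means = {d: sum(y) / len(y) for d, y in cases.items()}

# Model SS: variation of the treatment means around the grand mean.
ss_model = sum(len(y) * (group_means[d] - grand_mean) ** 2
               for d, y in cases.items())
# Error SS: variation of the observations around their own treatment mean.
ss_error = sum((v - group_means[d]) ** 2
               for d, y in cases.items() for v in y)

df_model = len(cases) - 1
df_error = n_total - len(cases)
ms_model = ss_model / df_model
ms_error = ss_error / df_error
f_stat = ms_model / ms_error

print("Model", df_model, round(ss_model, 2), round(ms_model, 2), round(f_stat, 2))
print("Error", df_error, round(ss_error, 2), round(ms_error, 2))
print("Total", df_model + df_error, round(ss_model + ss_error, 2))
```

Rounded to whole numbers, the sums of squares agree with the 588, 158, and 746 in the table.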
Regression coefficients
Var     Est
Int  18.675  mean of the means
x1   -4.075  Y1./n1 - Int
x2   -5.275  Y2./n2 - Int
x3    0.825  Y3./n3 - Int
We get the same treatment means:
18.675 - 4.075 = 14.6
18.675 - 5.275 = 13.4
18.675 + 0.825 = 19.5
18.675 + 4.075 + 5.275 - 0.825 = 27.2
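To see where these coefficients come from without SAS, the least squares fit can be sketched in plain Python by solving the normal equations (X'X)b = X'y. This is an illustration, not what PROC REG does internally. The data values are an assumption: they are the KNNL package design data as best recalled here and should be checked against data set a1. The `solve` helper is a hypothetical name for a small Gauss-Jordan routine.

```python
def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        # Partial pivoting: move the largest entry in this column to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


# ASSUMPTION: cases sold by package design (KNNL example); verify against a1.
cases = {
    1: [11, 17, 16, 14, 15],
    2: [12, 10, 15, 19, 11],
    3: [23, 20, 18, 17],
    4: [27, 33, 22, 26, 28],
}

rows, y = [], []
for design, values in cases.items():
    for v in values:
        # Same coding as the SAS data step: design 4 is -1 in every column.
        x = [-1, -1, -1] if design == 4 else [int(design == j) for j in (1, 2, 3)]
        rows.append([1] + x)          # leading 1 is the intercept column
        y.append(v)

p = 4
xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
b0, b1, b2, b3 = solve(xtx, xty)

# Treatment means implied by the fitted coefficients.
treatment_means = [b0 + b1, b0 + b2, b0 + b3, b0 - b1 - b2 - b3]

print([round(b, 3) for b in (b0, b1, b2, b3)])
print([round(m, 1) for m in treatment_means])
```

The intercept is the mean of the treatment means because the model has as many mean parameters as treatments, so the fit reproduces each treatment mean exactly.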
Last slide
• Read KNNL Chapter 16
• We used program topic21.sas to
generate the output for today