Analysis of Covariance
(Chapter 16)
• A procedure for comparing treatment means that incorporates
information on a quantitative explanatory variable, X, sometimes
called a covariate.
• The procedure, ANCOVA, is a combination of ANOVA with
regression.
Example: Calf Weight Gain
• An animal scientist wishes to examine the impact of a pair of new dietary
supplements on calf weight gain (response).
• Three treatments are defined: standard diet, standard diet + supplement Q,
and standard diet + supplement R.
• All new calves from a large herd are available for use as study units. She
selects 30 calves for study. The calves are assigned to the three diets at
random (completely randomized design).
• Initial weights are recorded, then calves are placed on the diets. At the
end of four weeks the final weight is taken and weight gain is computed.
• Simple analysis of variance and associated multiple comparisons
procedures indicate no significant differences in weight gain between the two
supplementary diets, but big differences between the supplemental diets and
the standard diet.
• Is this the end of the story? …
ANOVA Results
[Figure: dot plot of average weight gain (response, g/day) for each treatment: Standard Diet, + Supplement Q, + Supplement R.]
Simple ANOVA of a one-way classification would suggest no
difference between Supplements Q and R but both different from
Standard diet.
Initial Weights
[Figure: dot plot of initial weight for each treatment: Standard Diet, + Supplement Q, + Supplement R.]
Plotting of the initial weights by group shows that the groups were not
equal when it came to initial weights.
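Weight Gain to Initial Weight
[Figure: growth curve for the Standard Diet, weight (kg) against age, marking the initial weight (w_i), final weight (w_F) and weight gain for two animals (1 and 2) that enter the study at different ages.]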
If animals come into the study at different ages, they have
different initial weights and are at different points on the
growth curve. Expected weight gains will be different
depending on age at entry into study.
Regression of Initial Weight to Weight Gain
[Figure: weight gain (Y, g/day) plotted against initial weight (x), with the initial weights and weight gains of animals 1 and 2 marked.]
If we disregard the age of the animal and focus instead on its initial
weight, we see a linear relationship between initial weight and the
expected weight gain.
Covariates
Initial weight in the previous example is a covariable or covariate.
A covariate is a disturbing variable (confounder), that is, it is known to have
an effect on the response. Usually, the covariate can be measured but
often we may not be able to control its effect through blocking.
In the EXAMPLE, had the animal scientist known that the calves were very
variable in initial weight (or age), she could have:
• Created blocks of 3 or 6 equal-weight animals, and randomized
treatments to calves within these blocks.
• This would have entailed some cost in terms of time spent sorting the
calves and then keeping track of block membership over the life of the
study.
• It was much easier to simply record the calf initial weight and then use
analysis of covariance for the final analysis.
• In many cases, due to the continuous nature of the covariate, blocking is
just not feasible.
Expectations under Ho
Under Ho: no treatment effects.
If all animals had come in with the same initial weight, all three
treatments would produce the same weight gain.
[Figure: expected weight gain (Y, g/day) against initial weight (x): a single regression line shared by all treatments, with the average-weight animal marked on the x-axis.]
Expectations under HA
Under HA: significant treatment effects.
Different treatments produce different weight gains for animals of the
same initial weight.
[Figure: expected weight gain (Y, g/day) against initial weight (x): separate lines for + Supplement Q (q), + Supplement R (r) and Standard Diet (c), with the expected weight gains WGQ, WGR and WGs read off at the average-weight animal.]
Different Initial Weights
Under Ho: no treatment effects.
If the average initial weights in the treatment groups differ, the observed
weight gains will be different, even if treatments have no effect.
[Figure: expected weight gain (Y, g/day) against initial weight (x): one common line, with the initial weights of the c, q and r animals marked along the x-axis and the resulting group weight gains WGR, WGs and WGQ marked on the y-axis.]
Observed Responses under HA
Suppose now that different supplements actually do increase weight gain.
This translates to animals in different treatment groups following different,
but parallel, regression lines with initial weight.
Under HA: significant treatment effects.
[Figure: observed weight gain (Y, g/day) against initial weight (x) for the Standard Diet (c), + Supplement Q (q) and + Supplement R (r) animals, scattered about three parallel lines, with WGR, WGQ and WGs marked on the y-axis.]
What difference in weight gain is due to initial weight and what is due to treatment?
Observed Group Means
[Figure: observed weight gain (Y, g/day) against initial weight (x) for the Standard Diet (c), + Supplement Q (q) and + Supplement R (r) animals, with the unadjusted treatment means $\bar{y}_r$, $\bar{y}_q$ and $\bar{y}_c$ marked on the y-axis.]
Simple one-way classification ANOVA (without accounting for initial weight)
gives us the wrong answer!
Predicted Average Responses
[Figure: the same scatterplot of weight gain (Y, g/day) against initial weight (x), with the adjusted treatment means $\bar{y}_{q \mid X = \bar{x}}$, $\bar{y}_{r \mid X = \bar{x}}$ and $\bar{y}_{c \mid X = \bar{x}}$ read off the fitted lines at $X = \bar{x}$.]
Expected weight gain is computed for each treatment at the average initial
weight, and comparisons are then made.
ANCOVA: Objectives
The objective of an analysis of covariance is to compare the
treatment means after adjusting for differences among the
treatment groups in their covariate levels.
The analysis proceeds by combining a regression model with
an analysis of variance model.
Model
$E(y_{ij}) = \mu + \alpha_i + \beta x_{ij}$
The $\alpha_i$, i=1,…,t, measure how each of the t treatments modifies the
overall mean response. (The index j=1,…,n runs over the n replicates
for each treatment.)
The slope coefficient, $\beta$, is a measure of how the average response changes
as the value of the covariate changes.
The analysis proceeds by fitting a linear regression model with dummy
variables to code for the different treatment levels.
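One way the dummy-variable coding might be written out is sketched below. It assumes the last treatment is taken as the reference level (the choice PROC GLM makes in the output shown later in these slides); the indicator $d_{ijk}$ and the coefficients $\beta_0$ and $\gamma_k$ are notation introduced here for illustration.

```latex
E(y_{ij}) = \beta_0 + \beta x_{ij} + \sum_{k=1}^{t-1} \gamma_k \, d_{ijk},
\qquad
d_{ijk} =
\begin{cases}
1 & \text{if observation } ij \text{ received treatment } k,\\
0 & \text{otherwise,}
\end{cases}
\qquad
\beta_0 = \mu + \alpha_t, \quad \gamma_k = \alpha_k - \alpha_t .
```

Under this coding each $\gamma_k$ is the vertical distance between the regression line for treatment k and the line for the reference treatment, so a test that all $\gamma_k = 0$ is a test of no treatment effects.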
A Priori Assumptions
• The covariate is related to the response, and can account for variation
in the response.
  Check with a scatterplot of Y vs. X.
• The covariate is NOT related to the treatments.
  If X is related to the treatments, then the variance of the treatment
  differences is increased relative to that obtained from an ANOVA model
  without X, which results in a loss of precision.
• The treatments' regression equations are linear in the covariate.
  Check with a scatterplot of Y vs. X, for each treatment. Non-linearity
  can be accommodated (e.g. polynomial terms, transforms), but the
  analysis may be more complex.
• The regression lines for the different treatments are parallel.
  This means there is only one slope in the Y vs. X plots. Non-parallel
  lines can be accommodated, but this complicates the analysis since
  differences in treatments will now depend on the value of X.
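To see why non-parallel lines complicate the comparison, it may help to write the model with a separate slope for each treatment. The treatment-specific slope $\beta_i$ is notation assumed here; it does not appear in the slides.

```latex
E(y_{ij}) = \mu + \alpha_i + \beta_i x_{ij}
\quad\Longrightarrow\quad
E(y \mid \text{treatment } i, x) - E(y \mid \text{treatment } k, x)
  = (\alpha_i - \alpha_k) + (\beta_i - \beta_k)\, x .
```

When all slopes are equal ($\beta_i = \beta$), the last term vanishes and the treatment difference is the constant $\alpha_i - \alpha_k$ at every value of the covariate, which is what allows a single adjusted comparison.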
Example
Four different formulations of an industrial glue are
being tested. The tensile strength (response) of the
glue is known to be related to the thickness as
applied. Five observations on strength (Y) in
pounds, and thickness (X) in 0.01 inches are made
for each formulation.
Here:
• There are t=4 treatments (formulations of glue).
• Covariate X is thickness of applied glue.
• Each treatment is replicated n=5 times at different
values of X.
Formulation   Strength   Thickness
1             46.5       13
1             45.9       14
1             49.8       12
1             46.1       12
1             44.3       14
2             48.7       12
2             49.0       10
2             50.1       11
2             48.5       12
2             45.2       14
3             46.3       15
3             47.1       14
3             48.9       11
3             48.2       11
3             50.3       10
4             44.7       16
4             43.0       15
4             51.0       10
4             48.1       12
4             46.8       11
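Before fitting any model it can be useful to look at the raw (unadjusted) group means of both the response and the covariate. The following is a minimal Python sketch, not part of the original slides (which use SAS, Minitab and R); the names `GLUE` and `group_means` are made up for illustration.

```python
from statistics import mean

# (formulation, strength, thickness) for the 20 glue observations in the example
GLUE = [
    (1, 46.5, 13), (1, 45.9, 14), (1, 49.8, 12), (1, 46.1, 12), (1, 44.3, 14),
    (2, 48.7, 12), (2, 49.0, 10), (2, 50.1, 11), (2, 48.5, 12), (2, 45.2, 14),
    (3, 46.3, 15), (3, 47.1, 14), (3, 48.9, 11), (3, 48.2, 11), (3, 50.3, 10),
    (4, 44.7, 16), (4, 43.0, 15), (4, 51.0, 10), (4, 48.1, 12), (4, 46.8, 11),
]


def group_means(rows):
    """Return {formulation: (mean strength, mean thickness)}."""
    out = {}
    for f in sorted({r[0] for r in rows}):
        grp = [r for r in rows if r[0] == f]
        out[f] = (mean(r[1] for r in grp), mean(r[2] for r in grp))
    return out


means = group_means(GLUE)
overall_thickness = mean(r[2] for r in GLUE)

for f, (strength, thickness) in means.items():
    print(f"Formulation {f}: mean strength {strength:.2f}, "
          f"mean thickness {thickness:.2f}")
print(f"Overall mean thickness: {overall_thickness:.2f}")
```

If the mean thickness differs between formulations, part of any difference in raw mean strength may simply reflect thickness, which is the situation ANCOVA is meant to adjust for.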
Formulation Profiles
[Figure: strength (Y, axis from 40.0 to 52.0) plotted against thickness (X), one profile per formulation (Form_1, Form_2, Form_3, Form_4).]
SAS Program
The basic model is a combination of regression and one-way
classification.
data glue;
  input Formulation Strength Thickness;
  datalines;
1 46.5 13
1 45.9 14
1 49.8 12
1 46.1 12
1 44.3 14
2 48.7 12
2 49.0 10
2 50.1 11
2 48.5 12
2 45.2 14
3 46.3 15
3 47.1 14
3 48.9 11
3 48.2 11
3 50.3 10
4 44.7 16
4 43.0 15
4 51.0 10
4 48.1 12
4 46.8 11
;
run;

* ANCOVA: one covariate (thickness) plus one class factor (formulation);
proc glm data=glue;
  class formulation;
  model strength = thickness formulation / solution;
  * adjusted (least squares) means with standard errors and pairwise p-values;
  lsmeans formulation / stderr pdiff;
run;
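As a cross-check on what the software does with this model, the same fit can be sketched with ordinary least squares in plain Python. This is an illustrative sketch only, not how SAS computes its estimates internally: it solves the normal equations directly, and the helper names (`solve`, `design_row`) are invented here. Formulation 4 is used as the reference level so the coefficients are comparable with the PROC GLM solution.

```python
# (formulation, strength, thickness) for the 20 glue observations in the example
GLUE = [
    (1, 46.5, 13), (1, 45.9, 14), (1, 49.8, 12), (1, 46.1, 12), (1, 44.3, 14),
    (2, 48.7, 12), (2, 49.0, 10), (2, 50.1, 11), (2, 48.5, 12), (2, 45.2, 14),
    (3, 46.3, 15), (3, 47.1, 14), (3, 48.9, 11), (3, 48.2, 11), (3, 50.3, 10),
    (4, 44.7, 16), (4, 43.0, 15), (4, 51.0, 10), (4, 48.1, 12), (4, 46.8, 11),
]


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def design_row(formulation, thickness):
    """Intercept, covariate, and 0/1 dummies for formulations 1-3 (4 is the reference)."""
    dummies = [1.0 if formulation == k else 0.0 for k in (1, 2, 3)]
    return [1.0, float(thickness)] + dummies


X = [design_row(f, t) for f, _, t in GLUE]
y = [s for _, s, _ in GLUE]
p = len(X[0])

# Normal equations: (X'X) b = X'y
xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
coef = solve(xtx, xty)

fitted = [sum(c * v for c, v in zip(coef, row)) for row in X]
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # error sum of squares
mse = sse / (len(y) - p)                                # error mean square

names = ["Intercept", "Thickness", "Formulation 1", "Formulation 2", "Formulation 3"]
for name, c in zip(names, coef):
    print(f"{name:<14} {c:10.5f}")
print(f"SSE = {sse:.4f} on {len(y) - p} df, MSE = {mse:.4f}")
```

Solving the normal equations is adequate for a small, well-conditioned problem like this one; statistical packages generally use more numerically stable decompositions.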
Output: Use Type III SS to test significance of each variable

Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               4      66.31065753   16.57766438     10.17   0.0003
Error              15      24.44684247    1.62978950
Corrected Total    19      90.75750000

The Error mean square (1.62978950) is the MSE.

R-Square   Coeff Var   Root MSE   Strength Mean
0.730636    2.691897   1.276632        47.42500

Source         DF     Type I SS   Mean Square   F Value   Pr > F
Thickness       1   63.50120135   63.50120135     38.96   <.0001
Formulation     3    2.80945618    0.93648539      0.57   0.6405

Source         DF   Type III SS   Mean Square   F Value   Pr > F
Thickness       1   53.20115753   53.20115753     32.64   <.0001
Formulation     3    2.80945618    0.93648539      0.57   0.6405

Divide each mean square by the MSE to get its F value.
Regression on thickness is significant. No significant formulation differences.

Parameter            Estimate       Standard Error   t Value   Pr > |t|
Intercept         58.93698630 B         2.21321008     26.63     <.0001
Thickness         -0.95445205           0.16705494     -5.71     <.0001
Formulation 1     -0.00910959 B         0.80810401     -0.01     0.9912
Formulation 2      0.62554795 B         0.82451389      0.76     0.4598
Formulation 3      0.86732877 B         0.81361075      1.07     0.3033
Formulation 4      0.00000000 B          .               .        .
Least Squares Means
(Adjusted Formulation means computed at the
average value of Thickness [=12.45])

The GLM Procedure
Least Squares Means

Formulation   Strength LSMEAN   Standard Error   Pr > |t|   LSMEAN Number
1                  47.0449486        0.5782732     <.0001               1
2                  47.6796062        0.5811616     <.0001               2
3                  47.9213870        0.5724527     <.0001               3
4                  47.0540582        0.5739134     <.0001               4

Least Squares Means for effect Formulation
Pr > |t| for H0: LSMean(i)=LSMean(j)
Dependent Variable: Strength

i/j        1        2        3        4
1               0.4574   0.3011   0.9912
2     0.4574             0.7695   0.4598
3     0.3011   0.7695             0.3033
4     0.9912   0.4598   0.3033
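The least squares means can be recovered by hand from the parameter estimates: each is the fitted value for that formulation at the average thickness of 12.45. A small Python sketch of that arithmetic follows (Python is not used in the original slides; the variable names are illustrative, and the numbers are copied from the PROC GLM solution table, where Formulation 4 is the reference level).

```python
# Estimates copied from the PROC GLM solution table (Formulation 4 = reference)
intercept = 58.93698630
slope = -0.95445205          # common slope on Thickness
effect = {1: -0.00910959, 2: 0.62554795, 3: 0.86732877, 4: 0.0}
mean_thickness = 12.45       # average Thickness over all 20 observations

# Adjusted mean = fitted value for each formulation at the average covariate
lsmeans = {f: intercept + a + slope * mean_thickness for f, a in effect.items()}

for f, value in lsmeans.items():
    print(f"Formulation {f}: adjusted mean strength {value:.4f}")
```

Because every formulation is evaluated at the same thickness, differences between these adjusted means equal the differences between the formulation effects.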
ANCOVA in Minitab
Stat > ANOVA > General Linear Model …
> Responses: Strength
> Model: Formulation
> Covariates: Thickness
> Options: Adjusted (Type III) Sums of Squares
Worksheet columns: Formulation, Strength, Thickness (the same 20 observations listed in the Example).

General Linear Model: Strength versus Formulation

Factor     Type    Levels   Values
Formulat   fixed        4   1 2 3 4

Source     DF   Seq SS   Adj SS   Adj MS       F       P
Thicknes    1   63.501   53.201   53.201   32.64   0.000
Formulat    3    2.809    2.809    0.936    0.57   0.640
Error      15   24.447   24.447    1.630
Total      19   90.758

Term          Coef   SE Coef       T       P
Constant    59.308     2.099   28.25   0.000
Thicknes   -0.9545    0.1671   -5.71   0.000
Formulat
  1        -0.3801    0.5029   -0.76   0.462
  2         0.2546    0.5062    0.50   0.622
  3         0.4964    0.4962    1.00   0.333
Factor Plots… > Main Effects Plot > Formulation
[Figure: Main Effects Plot - LS Means for Strength. Least squares mean strength (axis from 47.0 to 47.9) for Formulations 1 to 4.]
ANCOVA in R
> glue <- read.table("glue.txt",header=TRUE)
> glue$Formulation <- as.factor(glue$Formulation)
> # fit linear models: full, thickness only, formulation only
> full.lm <- lm(Strength ~ Formulation + Thickness, data=glue)
> thick.lm <- lm(Strength ~ Thickness, data=glue)
> formu.lm <- lm(Strength ~ Formulation, data=glue)
>
> anova(thick.lm,full.lm)     # test for Formulation differences
Analysis of Variance Table

Model 1: Strength ~ Thickness
Model 2: Strength ~ Formulation + Thickness
  Res.Df     RSS Df Sum of Sq      F Pr(>F)
1     18 27.2563
2     15 24.4468  3    2.8095 0.5746 0.6405

> anova(formu.lm,full.lm)     # test for significance of Thickness
Analysis of Variance Table

Model 1: Strength ~ Formulation
Model 2: Strength ~ Formulation + Thickness
  Res.Df    RSS Df Sum of Sq      F    Pr(>F)
1     16 77.648
2     15 24.447  1    53.201 32.643 4.105e-05 ***
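The F statistics in these two comparisons come from the extra sum of squares between a reduced and a full model. A minimal Python sketch of that arithmetic follows, using the residual sums of squares and degrees of freedom printed in the R output; the function name `extra_ss_f` is made up for illustration, and the p-values are not recomputed here.

```python
def extra_ss_f(rss_reduced, df_reduced, rss_full, df_full):
    """F statistic for comparing a reduced model nested inside a full model."""
    extra_ms = (rss_reduced - rss_full) / (df_reduced - df_full)  # extra SS per df
    mse_full = rss_full / df_full                                 # full-model MSE
    return extra_ms / mse_full


# Residual SS and residual df as printed by anova() in the R session
f_formulation = extra_ss_f(27.2563, 18, 24.4468, 15)  # thickness only vs. full
f_thickness = extra_ss_f(77.648, 16, 24.447, 15)      # formulation only vs. full

print(f"F for Formulation (3 and 15 df): {f_formulation:.4f}")
print(f"F for Thickness (1 and 15 df):   {f_thickness:.3f}")
```

The same ratios appear as the Type III F values in the SAS output and the adjusted F values in the Minitab output.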
> summary(full.lm)     # full model (can be refined by omitting formulation)
Call: lm(formula = Strength ~ Formulation + Thickness, data = glue)

Residuals:
    Min      1Q  Median      3Q     Max
-1.6380 -1.0398  0.1873  0.6966  2.3255

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  58.92788    2.24551  26.243 5.97e-14 ***
Formulation2  0.63466    0.83193   0.763    0.457
Formulation3  0.87644    0.81840   1.071    0.301
Formulation4  0.00911    0.80810   0.011    0.991
Thickness    -0.95445    0.16706  -5.713 4.11e-05 ***

> summary(thick.lm)    # reduced model (formulation omitted)
Call: lm(formula = Strength ~ Thickness, data = glue)

Residuals:
    Min      1Q  Median      3Q     Max
-2.0813 -0.7324  0.1274  0.9090  1.9230

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  59.9294     1.9504  30.726  < 2e-16 ***
Thickness    -1.0044     0.1551  -6.476 4.32e-06 ***

Residual standard error: 1.231 on 18 degrees of freedom
Multiple R-Squared: 0.6997,     Adjusted R-squared: 0.683
F-statistic: 41.94 on 1 and 18 DF,  p-value: 4.317e-06
Plot lines for full model; but these can all be replaced by single
line for reduced model (blue).
Check fit of reduced model (with just thickness).