Analysis of Covariance

• Allows comparison between groups, allowing for the effect of another variate
– e.g. comparison of nitrate levels across regions, allowing for differences in salinity
• Regression with a combination of a categorical (factor) variate and a continuous variate
• Additive model
– simple interpretation
– separate effects
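For a factor with two levels, the additive model can be written with a single 0/1 indicator. The symbols below ($y$ response, $x$ continuous variate, $d$ group indicator) are our own notation, not taken from the slides:

```latex
\begin{aligned}
y_i &= \beta_0 + \beta_1 x_i + \beta_2 d_i + \varepsilon_i, \qquad d_i \in \{0, 1\} \\
d_i = 0:&\quad \mathrm{E}[y \mid x] = \beta_0 + \beta_1 x \\
d_i = 1:&\quad \mathrm{E}[y \mid x] = (\beta_0 + \beta_2) + \beta_1 x
\end{aligned}
```

Both groups share the slope $\beta_1$ (parallel lines), and $\beta_2$ is the group difference at any fixed value of $x$, which is one way to read "separate effects".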
Example – Mountain Possums
• trapped at seven different sites
• nine morphometric measurements
• gender and age also recorded
1. Is there any gender difference in total length?
2. Relationship between total length and head length: is it the same for males and females?
Prediction of animal size from skulls
Gender difference?
sex   N   Mean  StDev  SE Mean
f    43  87.91   4.18     0.64
m    61  86.51   4.34     0.56

Difference = mu (f) - mu (m)
Estimate for difference: 1.39550
95% CI for difference: (-0.28452, 3.07552)
T-Test of difference = 0 (vs not =): T-Value = 1.65  P-Value = 0.102  DF = 92
Conclusion?
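The output reports DF = 92 rather than the pooled 43 + 61 - 2 = 102, which suggests the unpooled (Welch) version of the two-sample t-test was used. A minimal sketch that recomputes the statistic and the Welch-Satterthwaite degrees of freedom from the summary table; the function name is our own, and because the table means are rounded it gives t ≈ 1.66 rather than the 1.65 obtained from the unrounded estimate 1.39550:

```python
import math

def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Unpooled two-sample t statistic and Welch-Satterthwaite df from summary stats."""
    v1 = sd1 ** 2 / n1  # squared standard error of mean 1
    v2 = sd2 ** 2 / n2  # squared standard error of mean 2
    se_diff = math.sqrt(v1 + v2)
    t = (mean1 - mean2) / se_diff
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df, se_diff

# Summary statistics from the table (f = females, m = males)
t, df, se = welch_t(87.91, 4.18, 43, 86.51, 4.34, 61)
print(f"t = {t:.2f}, df = {df:.1f} (reported as {math.floor(df)}), SE = {se:.4f}")
```

The fractional df is truncated to an integer in the printed output.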
Relationship between lengths
Linear regression:
Predictor     Coef  SE Coef     T      P
Constant     9.888    8.000  1.24  0.219
hdlngth    0.83367  0.08633  9.66  0.000
Fitted Line Plot
totlngth = 9.888 + 0.8337 hdlngth
S = 3.13075   R-Sq = 47.8%   R-Sq(adj) = 47.2%
Figure: fitted line plot of totlngth vs hdlngth
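The T column is Coef divided by SE Coef, and R-Sq(adj) adjusts R-Sq for the number of estimated parameters. A small Python check of both, assuming n = 104 animals (43 + 61 from the gender table) and p = 2 parameters (constant and slope); the helper names are hypothetical:

```python
def t_ratio(coef, se_coef):
    """t statistic for H0: coefficient = 0."""
    return coef / se_coef

def adjusted_r2(r2, n, p):
    """Adjusted R-squared for n observations and p estimated parameters (incl. constant)."""
    return 1 - (1 - r2) * (n - 1) / (n - p)

n = 43 + 61  # total number of possums
print(round(t_ratio(9.888, 8.000), 2))      # constant: 1.24
print(round(t_ratio(0.83367, 0.08633), 2))  # hdlngth slope: 9.66
print(round(adjusted_r2(0.478, n, p=2), 3)) # 0.473, reported as 47.2% from unrounded R-Sq
```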
Head length and gender?
The regression equation is
totlngth = 8.26 + 0.864 hdlngth - 2.06 male

Predictor     Coef  SE Coef      T      P
Constant     8.261    7.615   1.08  0.281
hdlngth    0.86432  0.08249  10.48  0.000
male       -2.0646   0.5957  -3.47  0.001
• Both terms (hdlngth and male) are significant
• Additive model
– Total length increases by 0.86 for each unit increase in head length
– For the same head length, males are about 2 units shorter than females
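Under the additive model the two sexes get parallel fitted lines, so the male-female difference equals the male coefficient at every head length. A short sketch using the coefficients above; the head lengths 85 and 95 are arbitrary illustrative values inside the plotted range:

```python
# Coefficients of the additive (main-effects) model, from the output above
B0, B_HEAD, B_MALE = 8.261, 0.86432, -2.0646

def predict_additive(hdlngth, male):
    """Fitted total length; male is a 0/1 indicator (1 = male)."""
    return B0 + B_HEAD * hdlngth + B_MALE * male

for head in (85.0, 95.0):
    f = predict_additive(head, male=0)
    m = predict_additive(head, male=1)
    # The difference m - f is B_MALE regardless of head length
    print(f"hdlngth={head}: female {f:.2f}, male {m:.2f}, diff {m - f:.4f}")
```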
Figure: Scatterplot of totlngth and main-effects fitted values vs hdlngth, by sex (f, m)
Simple interpretation, but not a convincing fit!
Evidence of different slopes: INTERACTION
Interaction model
• Need the product of the indicator and the continuous variable (here: m_hdlen = hdlngth * male)
The regression equation is
totlngth = -28.7 + 1.27 hdlngth + 45.1 male - 0.511 m_hdlen

Predictor     Coef  SE Coef      T      P
Constant    -28.72    15.98  -1.80  0.075
hdlngth     1.2657   0.1733   7.30  0.000
male         45.08    18.06   2.50  0.014
m_hdlen    -0.5107   0.1955  -2.61  0.010
• All predictor terms (hdlngth, male, m_hdlen) are significant at the 5% level
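With the product term, each sex gets its own intercept and slope: females (male = 0) use the constant and the hdlngth coefficient, males add the male and m_hdlen coefficients to them. A sketch using the coefficients above; `design_row` and `predict_interaction` are hypothetical helper names:

```python
# Coefficients of the interaction model, from the output above
B0, B_HEAD, B_MALE, B_INT = -28.72, 1.2657, 45.08, -0.5107

def design_row(hdlngth, male):
    """Predictor values: constant, hdlngth, male, m_hdlen = hdlngth * male."""
    return (1.0, hdlngth, male, hdlngth * male)

def predict_interaction(hdlngth, male):
    coefs = (B0, B_HEAD, B_MALE, B_INT)
    return sum(b * x for b, x in zip(coefs, design_row(hdlngth, male)))

# Separate straight lines implied for each sex: (intercept, slope)
female_line = (B0, B_HEAD)
male_line = (B0 + B_MALE, B_HEAD + B_INT)
# Head length where the two fitted lines cross: B_MALE + B_INT * x = 0
crossing = -B_MALE / B_INT

print(f"female: totlngth = {female_line[0]:.2f} + {female_line[1]:.4f} hdlngth")
print(f"male:   totlngth = {male_line[0]:.2f} + {male_line[1]:.4f} hdlngth")
print(f"fitted lines cross at hdlngth = {crossing:.1f}")
```

According to these fitted lines, males are predicted longer than females below a head length of about 88.3 and shorter above it.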
Use a scatterplot with fitted lines for each group to display this
Figure: Scatterplot of totlngth vs hdlngth, with separate fitted lines by sex (f, m)
Analysis of covariance
• Output as for a standard ANOVA:
Source           DF       SS      MS      F      P
Regression        3  1077.41  359.14  42.94  0.000
Residual Error  100   836.41    8.36
Total           103  1913.83
• Total DF: N - 1 = 104 - 1 = 103
• Regression DF = p - 1 = 3, with p = 4 estimated parameters
• Residual DF = N - p = 104 - 4 = 100
• MS = SS/DF
• F = MS(Regression) / MS(Residual Error)
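The arithmetic in these bullets can be checked directly from the SS column. A sketch in Python; R-Sq and S are extra quantities derived here from the same sums of squares, not part of the printed table:

```python
import math

# Sums of squares and sample size from the ANOVA table above
N, p = 104, 4  # observations; estimated parameters in the interaction model
ss_reg, ss_res = 1077.41, 836.41
df_reg, df_res = p - 1, N - p

ms_reg = ss_reg / df_reg
ms_res = ss_res / df_res
f_stat = ms_reg / ms_res
r_sq = ss_reg / (ss_reg + ss_res)  # proportion of total SS explained
s = math.sqrt(ms_res)              # residual standard deviation S

print(f"MS(Regr) = {ms_reg:.2f}, MS(Resid) = {ms_res:.2f}, F = {f_stat:.2f}")
print(f"R-Sq = {100 * r_sq:.1f}%, S = {s:.3f}")
```

R-Sq comes out at about 56.3%, compared with 47.8% for head length alone.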
Unusual Observations
Obs  hdlngth  totlngth     Fit  SE Fit  Residual  St Resid
 31       93    93.000  86.419   0.370     6.581      2.29 R
 39       85    75.000  78.479   1.364    -3.479     -1.36 X
 48       99    85.000  90.797   0.633    -5.797     -2.05 R
 51       96    85.000  92.275   0.743    -7.275     -2.60 R
 55      103    92.500  94.194   0.993    -1.694     -0.62 X
 73       83    82.000  78.643   1.013     3.357      1.24 X
 82       86    82.000  80.125   1.153     1.875      0.71 X
 85       90    92.000  83.927   0.483     8.073      2.83 R
R denotes an observation with a large standardized residual.
X denotes an observation whose X value gives it large influence.
Interpretation? Look at these observations in the scatterplot!
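The St Resid column and both flags can be reproduced from the table, under two assumptions of ours: SE Fit = S·sqrt(h), where h is the leverage of the observation, and the flag rules are |St Resid| > 2 for R and h > 3p/n for X. These rules are not stated in the output, but with S taken from the ANOVA residual mean square they reproduce every St Resid value and flag above. A minimal sketch:

```python
import math

MS_RESID = 836.41 / 100  # residual mean square from the ANOVA table
S = math.sqrt(MS_RESID)  # residual standard deviation
N, P = 104, 4            # observations, estimated parameters

# (Obs, Residual, SE Fit) from the Unusual Observations table
rows = [(31, 6.581, 0.370), (39, -3.479, 1.364), (48, -5.797, 0.633),
        (51, -7.275, 0.743), (55, -1.694, 0.993), (73, 3.357, 1.013),
        (82, 1.875, 1.153), (85, 8.073, 0.483)]

def diagnose(resid, se_fit):
    h = (se_fit / S) ** 2                      # leverage, assuming SE Fit = S * sqrt(h)
    st_resid = resid / (S * math.sqrt(1 - h))  # internally standardized residual
    flags = ("R" if abs(st_resid) > 2 else "") + ("X" if h > 3 * P / N else "")
    return h, st_resid, flags

for obs, resid, se_fit in rows:
    h, st, flags = diagnose(resid, se_fit)
    print(f"{obs:3d}  h={h:.3f}  St Resid={st:5.2f}  {flags}")
```

R points are poorly fitted by the model; X points are fitted closely but pull the fitted lines towards themselves because of their position in hdlngth (within their sex group).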