R Notes 2011 LAB 3

Topics covered:
- Orthogonal contrasts
- Class comparisons
- Trend analysis with contrasts and multiple regression
- Multiple mean comparisons (fixed and multiple range tests)
> setwd("G:/Courses/A205/R/Lab3")
CLASS COMPARISONS USING CONTRASTS
> lab3a<-read.table('Lab3a.txt', header=T)
> lab3a
   trtmt growth
1    L08   15.0
2    L08   17.5
3    L08   11.5
4    L12   18.0
5    L12   14.0
6    L12   17.5
7    L16   19.0
8    L16   21.5
9    L16   22.0
10   H08   32.0
11   H08   28.0
12   H08   28.0
13   H12   22.0
14   H12   26.5
15   H12   29.0
16   H16   33.0
17   H16   27.0
18   H16   35.0
> str(lab3a)
'data.frame': 18 obs. of 2 variables:
$ trtmt : Factor w/ 6 levels "H08","H12","H16",..: 4 4 4 5 5 5 6 6 6 1 ...
$ growth: num 15 17.5 11.5 18 14 17.5 19 21.5 22 32 ...
> model<-lm(growth~trtmt, lab3a)
> anova(model)
Analysis of Variance Table
Response: growth
          Df Sum Sq Mean Sq F value    Pr(>F)
trtmt      5 718.57 143.714  16.689 4.881e-05 ***
Residuals 12 103.33   8.611
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
PLS205 2011
# To define the groups of factors to compare, we need to create a matrix of
# orthogonal contrasts. We need to follow these rules:
# 1. Treatments to be lumped together get the same sign (plus or minus).
# 2. Groups of means to be contrasted get opposite sign.
# 3. Factor levels to be excluded get a contrast coefficient of 0.
# 4. The contrast coefficients must add up to 0.
From the SAS lab:
Contrast ‘Temp’ H0: Mean plant growth under low temperature conditions
is the same as under high temperature conditions.
Contrast ‘Light Linear’ H0: Mean plant growth under 8 hour days is the
same as under 16 hour days (OR: The response of growth to light has
no linear component).
Contrast ‘Light Quadratic’ H0: Mean plant growth under 12 hour days is
the same as the average mean growth under 8 and 16 hour days
combined (OR: The growth response to light is perfectly linear; OR:
The response of growth to light has no quadratic component).
Contrast ‘Temp * Light Linear’ H0: The linear component of the response
of growth to light is the same at both temperatures.
Contrast ‘Temp * Light Quadratic’ H0: The quadratic component of the
response of growth to light is the same at both temperatures.
# Contrast ‘Temp’:                     1, 1, 1,-1,-1,-1
# Contrast ‘Light Linear’:             1, 0,-1, 1, 0,-1
# Contrast ‘Light Quadratic’:          1,-2, 1, 1,-2, 1
# Contrast ‘Temp * Light Linear’:      1, 0,-1,-1, 0, 1
# Contrast ‘Temp * Light Quadratic’:   1,-2, 1,-1, 2,-1
# We create five vectors, one for each contrast, and bind them together
# using the cbind function, which groups vectors into a matrix where each
# vector is a separate column.
> contrastmatrix<-cbind(c(1,1,1,-1,-1,-1),c(1,0,-1,1,0,-1),c(1,-2,1,1,-2,1),
c(1,0,-1,-1,0,1), c(1,-2,1,-1,2,-1))
> contrastmatrix
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    1    1    1    1
[2,]    1    0   -2    0   -2
[3,]    1   -1    1   -1    1
[4,]   -1    1    1   -1   -1
[5,]   -1    0   -2    0    2
[6,]   -1   -1    1    1   -1
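A quick way to confirm that a set of contrasts obeys the rules listed earlier is to inspect the column sums and the cross-products of the matrix. The following is a minimal sketch that rebuilds the same matrix; the column names are added here only for readability and are not part of the original lab.

```r
# Rebuild the contrast matrix from the lab (column names are illustrative)
cm <- cbind(Temp      = c(1,  1,  1, -1, -1, -1),
            LightLin  = c(1,  0, -1,  1,  0, -1),
            LightQuad = c(1, -2,  1,  1, -2,  1),
            TempxLin  = c(1,  0, -1, -1,  0,  1),
            TempxQuad = c(1, -2,  1, -1,  2, -1))

# Rule 4: the coefficients of every contrast must sum to zero
colSums(cm)

# Orthogonality: every off-diagonal entry of t(cm) %*% cm must be zero
xp <- crossprod(cm)
xp
all(xp[upper.tri(xp)] == 0)
```

If any off-diagonal entry were non-zero, the corresponding pair of contrasts would share information and their sums of squares would no longer add up to the treatment sum of squares.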
# Now, we use this contrast matrix to define the contrasts in the factor
# “trtmt”. We use the command contrasts:
> contrasts(lab3a$trtmt)<-contrastmatrix
# If we now look at the factor trtmt again, we see that the contrasts have
# been attached to it as an attribute:
> lab3a$trtmt
[1] L08 L08 L08 L12 L12 L12 L16 L16 L16 H08 H08 H08 H12 H12
[15] H12 H16 H16 H16
attr(,"contrasts")
    [,1] [,2] [,3] [,4] [,5]
H08    1    1    1    1    1
H12    1    0   -2    0   -2
H16    1   -1    1   -1    1
L08   -1    1    1   -1   -1
L12   -1    0   -2    0    2
L16   -1   -1    1    1   -1
Levels: H08 H12 H16 L08 L12 L16
> model_contrast<-lm(growth~trtmt, lab3a)
> summary(model_contrast)
Call:
lm(formula = growth ~ trtmt, data = lab3a)
Residuals:
    Min      1Q  Median      3Q     Max
-4.6667 -1.7083  0.6667  1.4583  3.3333

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  23.1389     0.6917  33.454 3.23e-13 ***
trtmt1        5.8056     0.6917   8.394 2.29e-06 ***
trtmt2       -2.1250     0.8471  -2.509   0.0275 *
trtmt3        0.9861     0.4891   2.016   0.0667 .
trtmt4        0.9583     0.8471   1.131   0.2800
trtmt5        0.5694     0.4891   1.164   0.2669
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.934 on 12 degrees of freedom
Multiple R-squared: 0.8743,    Adjusted R-squared: 0.8219
F-statistic: 16.69 on 5 and 12 DF,  p-value: 4.881e-05
In SAS

Contrast                  DF   Contrast SS   Mean Square   F Value   Pr > F
Temp                       1   606.6805556   606.6805556     70.45   <.0001 ***
Light linear               1    54.1875000    54.1875000      6.29   0.0275 *
Light quadratic            1    35.0069444    35.0069444      4.07   0.0667
Temp * Light linear        1    11.0208333    11.0208333      1.28   0.2800
Temp * Light quadratic     1    11.6736111    11.6736111      1.36   0.2669
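The Contrast SS column of the SAS table can be reproduced directly from the treatment means, which may help clarify what each contrast measures. The sketch below retypes the lab3a data from the listing earlier in this lab and applies the usual formula for a balanced design, SS = n (sum of c_i * mean_i)^2 / (sum of c_i^2); the object names are illustrative and not part of the original lab.

```r
# Data retyped from the lab3a listing
growth <- c(15.0, 17.5, 11.5,  18.0, 14.0, 17.5,  19.0, 21.5, 22.0,
            32.0, 28.0, 28.0,  22.0, 26.5, 29.0,  33.0, 27.0, 35.0)
trtmt  <- factor(rep(c("L08", "L12", "L16", "H08", "H12", "H16"), each = 3))

# Treatment means in alphabetical level order: H08 H12 H16 L08 L12 L16
means <- as.vector(tapply(growth, trtmt, mean))
n     <- 3   # replications per treatment

cm <- cbind(Temp      = c(1,  1,  1, -1, -1, -1),
            LightLin  = c(1,  0, -1,  1,  0, -1),
            LightQuad = c(1, -2,  1,  1, -2,  1),
            TempxLin  = c(1,  0, -1, -1,  0,  1),
            TempxQuad = c(1, -2,  1, -1,  2, -1))

# Sum of squares for each contrast
L  <- drop(crossprod(cm, means))    # value of each contrast applied to the means
ss <- n * L^2 / colSums(cm^2)

# Residual mean square of the one-way ANOVA, used as the F denominator
mse <- sum((growth - ave(growth, trtmt))^2) / (length(growth) - nlevels(trtmt))

round(cbind(SS = ss, F = ss / mse), 4)
sum(ss)   # orthogonal contrasts partition the treatment sum of squares
```

Because the five contrasts are mutually orthogonal, their sums of squares add up to the treatment sum of squares from the ANOVA table, and each F value is the square of the corresponding t value in the R summary.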
TREND ANALYSIS WITH CONTRASTS
# We are interested in the overall relationship between plant spacing and
# yield (i.e. characterizing the response of yield to plant spacing).
> lab3b<-read.table("Lab3b.txt", header=T)
> head(lab3b, 3)
Sp Yield
1 18 33.6
2 18 37.1
3 18 34.1
> str(lab3b)
'data.frame': 30 obs. of 2 variables:
$ Sp   : int 18 18 18 18 18 18 24 24 24 24 ...
$ Yield: num 33.6 37.1 34.1 34.6 35.4 36.1 31.1 34.5 30.5 32.7 ...
> lab3b$Sp<-as.factor(lab3b$Sp)
> str(lab3b)
'data.frame': 30 obs. of 2 variables:
$ Sp   : Factor w/ 5 levels "18","24","30",..: 1 1 1 1 1 1 2 2 2 2 ...
$ Yield: num 33.6 37.1 34.1 34.6 35.4 36.1 31.1 34.5 30.5 32.7 ...
> anova(lm(Yield~Sp, lab3b))
Analysis of Variance Table
Response: Yield
          Df  Sum Sq Mean Sq F value    Pr(>F)
Sp         4 125.661 31.4153  9.9004 6.079e-05 ***
Residuals 25  79.328  3.1731
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
What questions are we asking here exactly? As before it is helpful to articulate the
null hypothesis for each contrast:
Contrast ‘Linear’ H0: The response of yield to spacing has no linear component.
Contrast ‘Quadratic’ H0: The response of yield to spacing has no quadratic component.
Contrast ‘Cubic’ H0: The response of yield to spacing has no cubic component.
Contrast ‘Quartic’ H0: The response of yield to spacing has no quartic component.
# Linear      -2, -1,  0,  1,  2
# Quadratic    2, -1, -2, -1,  2
# Cubic       -1,  2,  0, -2,  1
# Quartic      1, -4,  6, -4,  1
> contrasts(lab3b$Sp)<-cbind(c(-2, -1, 0, 1, 2), c(2, -1, -2, -1, 2), c(-1,
2, 0, -2, 1), c(1, -4, 6, -4, 1))
> lab3b$Sp
[1] 18 18 18 18 18 18 24 24 24 24 24 24 30 30 30 30 30 30 36
[20] 36 36 36 36 36 42 42 42 42 42 42
attr(,"contrasts")
   [,1] [,2] [,3] [,4]
18   -2    2   -1    1
24   -1   -1    2   -4
30    0   -2    0    6
36    1   -1   -2   -4
42    2    2    1    1
Levels: 18 24 30 36 42
> summary(lm(Yield~Sp, lab3b))
Call:
lm(formula = Yield ~ Sp, data = lab3b)
Residuals:
    Min      1Q  Median      3Q     Max
-2.6333 -1.1333 -0.5417  1.0375  3.3667

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 31.30333    0.32522  96.251  < 2e-16 ***
Sp1         -1.23333    0.22997  -5.363 1.46e-05 ***
Sp2          0.63333    0.19436   3.259  0.00322 **
Sp3         -0.09167    0.22997  -0.399  0.69357
Sp4          0.02167    0.08692   0.249  0.80519
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.781 on 25 degrees of freedom
Multiple R-squared: 0.613,    Adjusted R-squared: 0.5511
F-statistic:   9.9 on 4 and 25 DF,  p-value: 6.079e-05
In SAS

Dependent Variable: Yield

                              Sum of
Source            DF         Squares    Mean Square   F Value   Pr > F
Model              4     125.6613333     31.4153333      9.90   <.0001
Error             25      79.3283333      3.1731333
Corrected Total   29     204.9896667

R-Square   Coeff Var   Root MSE   Yield Mean
0.613013    5.690541   1.781329     31.30333

Source   DF   Type III SS   Mean Square   F Value   Pr > F
Sp        4   125.6613333    31.4153333      9.90   <.0001

Contrast    DF   Contrast SS   Mean Square   F Value   Pr > F
Linear       1   91.26666667   91.26666667     28.76   <.0001 ***
Quadratic    1   33.69333333   33.69333333     10.62   0.0032 **
Cubic        1    0.50416667    0.50416667      0.16   0.6936
Quartic      1    0.19716667    0.19716667      0.06   0.8052
Interpretation
There is a quadratic relationship between row spacing and yield.
Why?
Because there is a significant quadratic component to the response but no
significant cubic or quartic components. Please note that we are only able
to carry out trend comparisons in this way because the treatments are equally
spaced. Now, exactly the same result can be obtained through a regression
approach, as shown in the next example.
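On the point about equal spacing: base R can generate orthogonal polynomial contrasts itself with `contr.poly()`, and its `scores` argument accepts the actual level values when they are not equally spaced. The sketch below shows that, for five equally spaced levels, the columns of `contr.poly(5)` are the standard integer coefficients rescaled to unit length. The unequal spacings in the second part are hypothetical and chosen only for illustration.

```r
# Orthogonal polynomial contrasts for 5 equally spaced levels (unit-length columns)
cp <- contr.poly(5)

# Standard integer coefficients for 5 equally spaced levels
int_coef <- cbind(Linear    = c(-2, -1,  0,  1,  2),
                  Quadratic = c( 2, -1, -2, -1,  2),
                  Cubic     = c(-1,  2,  0, -2,  1),
                  Quartic   = c( 1, -4,  6, -4,  1))

# Multiply each column of cp by the length of the matching integer vector
rescaled <- sweep(cp, 2, sqrt(colSums(int_coef^2)), "*")
round(rescaled, 10)

# Hypothetical unequally spaced levels: pass the real spacings as scores
cp_unequal <- contr.poly(5, scores = c(18, 24, 30, 42, 60))
round(colSums(cp_unequal), 10)     # each contrast still sums to zero
round(crossprod(cp_unequal), 10)   # and the columns are still orthonormal
```

A matrix produced this way can be assigned with `contrasts(lab3b$Sp) <- ...` in the same manner as the hand-built matrix; the t tests are unaffected by the rescaling, although the coefficient estimates change scale.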
TREND ANALYSIS WITH MULTIPLE REGRESSION
# We use the same data set lab3b as before, but for the multiple
# regression we cannot use a factor so we need to create a numeric vector
# with the spacing information.
> sp<-rep(c(18, 24, 30, 36, 42), each=6)
> sp
[1] 18 18 18 18 18 18 24 24 24 24 24 24 30 30 30 30 30 30 36
[20] 36 36 36 36 36 42 42 42 42 42 42
# Calculate the quadratic, cubic and quartic vectors
> sp2<-sp^2
> sp3<-sp^3
> sp4<-sp^4
# Run the multiple regression, using + to separate the multiple variables
> anova(lm(Yield~sp+sp2+sp3+sp4, lab3b))
Analysis of Variance Table
Response: Yield
          Df Sum Sq Mean Sq F value    Pr(>F)
sp         1 91.267  91.267 28.7623 1.461e-05 ***
sp2        1 33.693  33.693 10.6183  0.003218 **
sp3        1  0.504   0.504  0.1589  0.693568
sp4        1  0.197   0.197  0.0621  0.805187
Residuals 25 79.328   3.173
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
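The helper vectors sp2, sp3 and sp4 are a convenience rather than a requirement: powers can also be written inside the formula with `I()`. The sketch below uses simulated yields (made-up numbers, not the Lab3b data) on the same spacing design, and checks that the sequential sums of squares from the regression equal the sums of squares of the orthogonal polynomial contrasts computed from the level means.

```r
set.seed(1)
sp    <- rep(c(18, 24, 30, 36, 42), each = 6)
# Simulated response, for illustration only
yield <- 50 - 1.5 * sp + 0.02 * sp^2 + rnorm(length(sp), sd = 1.8)

# Regression on raw powers, written inline with I()
fit_reg <- lm(yield ~ sp + I(sp^2) + I(sp^3) + I(sp^4))
ss_reg  <- anova(fit_reg)[1:4, "Sum Sq"]   # sequential (Type I) sums of squares

# Orthogonal polynomial contrasts applied to the five level means
cm <- cbind(Linear    = c(-2, -1,  0,  1,  2),
            Quadratic = c( 2, -1, -2, -1,  2),
            Cubic     = c(-1,  2,  0, -2,  1),
            Quartic   = c( 1, -4,  6, -4,  1))
means  <- as.vector(tapply(yield, sp, mean))
ss_con <- 6 * drop(crossprod(cm, means))^2 / colSums(cm^2)   # 6 reps per level

cbind(regression = ss_reg, contrasts = ss_con)
```

The agreement depends on the terms entering the regression in increasing order of power, since the sums of squares are sequential, and on the design being balanced with equally spaced levels.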
In SAS

Dependent Variable: Yield

                              Sum of
Source            DF         Squares    Mean Square   F Value   Pr > F
Model              4     125.6613333     31.4153333      9.90   <.0001
Error             25      79.3283333      3.1731333
Corrected Total   29     204.9896667

R-Square   Coeff Var   Root MSE   Yield Mean
0.613013    5.690541   1.781329     31.30333

Source         DF     Type I SS   Mean Square   F Value   Pr > F
Sp              1   91.26666667   91.26666667     28.76   <.0001 ***
Sp*Sp           1   33.69333333   33.69333333     10.62   0.0032 **
Sp*Sp*Sp        1    0.50416667    0.50416667      0.16   0.6936
Sp*Sp*Sp*Sp     1    0.19716667    0.19716667      0.06   0.8052
Multiple comparison tests
# All tests are run using the lab3c data.
> lab3c<-read.table("Lab3c.txt", header=T)
> str(lab3c)
# Install the required package
# LSD and other post hoc tests are not in the default packages; the package
# “agricolae” contains functions for LSD, Scheffe, Duncan, and SNK tests, among
# others. Agricolae was developed by Felipe de Mendiburu as part of his
# master's thesis "A statistical analysis tool for agricultural research" –
# Univ. Nacional de Ingenieria, Lima-Peru (UNI).
> install.packages("agricolae")
# How can we find out which functions are included in a package?
> install.packages("cwhmisc")
# this cwhmisc package helps listing functions within packages
> library(agricolae)
> library(cwhmisc)
> libs(agricolae)
Information on package 'agricolae'
[…]
Index:
AMMI             AMMI Analysis
AMMI.contour     AMMI contour
BIB.test         Finding the Variance Analysis of the Balanced
                 Incomplete Block Design
CIC              Data for late blight of potatoes
Chz2006          Data amendment Carhuaz 2006
ComasOxapampa    Data AUDPC Comas - Oxapampa
DAU.test         Finding the Variance Analysis of the Augmented
                 block Design
[…]
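If installing an extra package just to list functions is not desirable, base R offers similar information. The sketch below uses the `stats` package, which is always attached, so that it runs without agricolae installed; the same calls should work with "package:agricolae" once `library(agricolae)` has been run.

```r
# List the first few objects exported by an attached package
head(ls("package:stats"), 10)

# Narrow the list with a pattern, e.g. anything with "Tukey" in its name
ls("package:stats", pattern = "Tukey")

# In an interactive session, the package index page can be opened with:
# library(help = "stats")
```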
# Or if you know (even vaguely) what you are looking for you can use the
# help.search function (equivalent to ??):
> help.search("Duncan")
Help files with alias or concept or title matching ‘duncan’ using fuzzy
matching:
agricolae::duncan.test     Duncan's new multiple range test
agricolae::waller.test     Multiple comparisons, Waller-Duncan
Fixed range tests
1. LSD
> library(agricolae)
> model<-aov(N_level~Culture, lab3c)
> LSD.test(model, "Culture")
Study: LSD t Test for N_level

Mean Square Error: 6.668833

Culture, means and individual ( 95 %) CI

       N_level   std.err replication      LCL      UCL
3DOk1    28.80 1.5254508           5 25.65162 31.94838
3DOk13   13.26 0.6384356           5 11.94233 14.57767
3DOk4    14.60 1.3586758           5 11.79583 17.40417
3DOk5    23.94 1.2540335           5 21.35180 26.52820
3DOk7    19.88 1.1560277           5 17.49408 22.26592
Comp     18.70 0.7162402           5 17.22175 20.17825

alpha: 0.05 ; Df Error: 24
Critical Value of t: 2.063899

Least Significant Difference 3.37088
Means with the same letter are not significantly different.

Groups, Treatments and means
a    3DOk1     28.8
b    3DOk5     23.94
c    3DOk7     19.88
c    Comp      18.7
d    3DOk4     14.6
d    3DOk13    13.26
2. Tukey
# Using the function TukeyHSD from default ‘stats’ package
> model<-aov(N_level~Culture, lab3c)
> TukeyHSD(model)
Fit: aov(formula = N_level ~ Culture)

$Culture
               diff         lwr         upr     p adj
3DOk13-3DOk1 -15.54 -20.5899227 -10.4900773 0.0000000
3DOk4-3DOk1  -14.20 -19.2499227  -9.1500773 0.0000001
3DOk5-3DOk1   -4.86  -9.9099227   0.1899227 0.0640326
3DOk7-3DOk1   -8.92 -13.9699227  -3.8700773 0.0001705
Comp-3DOk1   -10.10 -15.1499227  -5.0500773 0.0000293
3DOk4-3DOk13   1.34  -3.7099227   6.3899227 0.9608138
3DOk5-3DOk13  10.68   5.6300773  15.7299227 0.0000125
3DOk7-3DOk13   6.62   1.5700773  11.6699227 0.0054499
Comp-3DOk13    5.44   0.3900773  10.4899227 0.0295653
3DOk5-3DOk4    9.34   4.2900773  14.3899227 0.0000907
3DOk7-3DOk4    5.28   0.2300773  10.3299227 0.0367716
Comp-3DOk4     4.10  -0.9499227   9.1499227 0.1606296
3DOk7-3DOk5   -4.06  -9.1099227   0.9899227 0.1679830
Comp-3DOk5    -5.24 -10.2899227  -0.1900773 0.0388112
Comp-3DOk7    -1.18  -6.2299227   3.8699227 0.9772111
# Or HSD.test from the “agricolae” package:
> HSD.test(model, "Culture")
Study: HSD Test for N_level

Mean Square Error: 6.668833

Culture, means

       N_level   std.err replication
3DOk1    28.80 1.5254508           5
3DOk13   13.26 0.6384356           5
3DOk4    14.60 1.3586758           5
3DOk5    23.94 1.2540335           5
3DOk7    19.88 1.1560277           5
Comp     18.70 0.7162402           5

alpha: 0.05 ; Df Error: 24
Critical Value of Studentized Range: 4.372651

Honestly Significant Difference: 5.049923
Means with the same letter are not significantly different.

Groups, Treatments and means
a     3DOk1     28.8
ab    3DOk5     23.94
bc    3DOk7     19.88
cd    Comp      18.7
de    3DOk4     14.6
e     3DOk13    13.26
3. Scheffe
# scheffe.test from the “agricolae” package:
> scheffe.test(model, "Culture")
Study: Scheffe Test for N_level

Mean Square Error : 6.668833

Culture, means

       N_level   std.err replication
3DOk1    28.80 1.5254508           5
3DOk13   13.26 0.6384356           5
3DOk4    14.60 1.3586758           5
3DOk5    23.94 1.2540335           5
3DOk7    19.88 1.1560277           5
Comp     18.70 0.7162402           5

alpha: 0.05 ; Df Error: 24
Critical Value of F: 2.620654

Minimum Significant Difference: 5.912141
Means with the same letter are not significantly different.

Groups, Treatments and means
ab     3DOk1     28.8
b      3DOk5     23.94
bcd    3DOk7     19.88
cd     Comp      18.7
de     3DOk4     14.6
e      3DOk13    13.26
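The three critical differences reported by the fixed range tests above can be reproduced from the error mean square with base R quantile functions, which makes it easier to see why the tests differ in stringency. This is a minimal sketch, assuming the balanced design shown in the output (6 cultures, 5 replications, 24 error degrees of freedom); the variable names are illustrative.

```r
mse   <- 6.668833   # Mean Square Error reported in the outputs above
dfe   <- 24         # error degrees of freedom
n     <- 5          # replications per culture
k     <- 6          # number of cultures
alpha <- 0.05

se_diff <- sqrt(2 * mse / n)   # standard error of a difference between two means

# LSD: two-sided t quantile times the standard error of a difference
lsd <- qt(1 - alpha / 2, dfe) * se_diff

# Tukey HSD: studentized range quantile for k means times the standard error of a mean
hsd <- qtukey(1 - alpha, k, dfe) * sqrt(mse / n)

# Scheffe: sqrt((k - 1) * F quantile) times the standard error of a difference
scheffe <- sqrt((k - 1) * qf(1 - alpha, k - 1, dfe)) * se_diff

round(c(LSD = lsd, HSD = hsd, Scheffe = scheffe), 4)
```

Each test uses a single critical difference for every pair of means, which is why they are called fixed range tests; the values increase from LSD to Tukey to Scheffe as the protection against Type I error becomes stricter.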
Multiple Range Tests
1. Duncan
# duncan.test from the “agricolae” package:
> duncan.test(model, "Culture")
Study: Duncan's new multiple range test for N_level

Mean Square Error: 6.668833

Culture, means

       N_level   std.err replication
3DOk1    28.80 1.5254508           5
3DOk13   13.26 0.6384356           5
3DOk4    14.60 1.3586758           5
3DOk5    23.94 1.2540335           5
3DOk7    19.88 1.1560277           5
Comp     18.70 0.7162402           5

alpha: 0.05 ; Df Error: 24

Critical Range
       2        3        4        5        6
3.370880 3.540437 3.649301 3.726194 3.783592

Means with the same letter are not significantly different.

Groups, Treatments and means
a    3DOk1     28.8
b    3DOk5     23.94
c    3DOk7     19.88
c    Comp      18.7
d    3DOk4     14.6
d    3DOk13    13.26
2. SNK
# SNK.test from the “agricolae” package:
> SNK.test(model, "Culture")
Study: Student Newman Keuls Test for N_level

Mean Square Error: 6.668833

Culture, means

       N_level   std.err replication
3DOk1    28.80 1.5254508           5
3DOk13   13.26 0.6384356           5
3DOk4    14.60 1.3586758           5
3DOk5    23.94 1.2540335           5
3DOk7    19.88 1.1560277           5
Comp     18.70 0.7162402           5

alpha: 0.05 ; Df Error: 24

Critical Range
       2        3        4        5        6
3.370880 4.078715 4.505521 4.811627 5.049923

Means with the same letter are not significantly different.

Groups, Treatments and means
a    3DOk1     28.8
b    3DOk5     23.94
c    3DOk7     19.88
c    Comp      18.7
d    3DOk4     14.6
d    3DOk13    13.26
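The critical ranges printed by the two multiple range tests can also be reproduced with `qtukey()`. The sketch below assumes the same balanced design as the outputs above, and assumes that Duncan's test uses the protection level (1 - alpha)^(p - 1) for a comparison spanning p means, while SNK uses the studentized range at level alpha for each p; the variable names are illustrative.

```r
mse   <- 6.668833   # Mean Square Error reported in the outputs above
dfe   <- 24         # error degrees of freedom
n     <- 5          # replications per culture
alpha <- 0.05
p     <- 2:6        # number of ordered means spanned by a comparison

se_mean <- sqrt(mse / n)   # standard error of a treatment mean

# SNK: studentized range quantile at level alpha, one value per span p
snk <- sapply(p, function(k) qtukey(1 - alpha, k, dfe)) * se_mean

# Duncan: the protection level relaxes as the span grows
duncan <- sapply(p, function(k) qtukey((1 - alpha)^(k - 1), k, dfe)) * se_mean

round(rbind(p = p, Duncan = duncan, SNK = snk), 4)
```

For adjacent means (p = 2) both tests reduce to the LSD, and for the widest span (p = 6) the SNK range equals Tukey's HSD, which places the two tests between LSD and Tukey in stringency.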
                              Significance Groupings
Culture        LSD     Dunnett   Tukey   Scheffe   Duncan    SNK         REGWQ
3DOk1          A       ***       A       A         A         A           A
3DOk5          B       ***       AB      AB        B         B           B
3DOk7          C                 BC      BC        C         C           BC
Comp           C                 CD      BCD       C         C           CD
3DOk4          D                 DE      CD        D         D           DE
3DOk13         D       ***       E       D         D         D           E

Least Sig't    3.371   4.402     5.05    5.912     3.371-    3.371-      4.191-
Difference     fixed   fixed     fixed   fixed     3.784     5.05        5.05

EER Control    no      yes       yes     yes       no        EERC only   yes