ANalysis Of VAriance
(ANOVA)
© 2016 Dan Nettleton (Iowa State University), Statistics 510
Setup and Notation
y = Xβ + ε, ε ∼ N(0, σ²I)
Let X1 = 1, Xm = X, and Xm+1 = I.
Suppose X2 , . . . , Xm are matrices satisfying
C(X1 ) ⊂ C(X2 ) ⊂ · · · ⊂ C(Xm−1 ) ⊂ C(Xm ).
Let Pj = PXj and rj = rank(Xj ) ∀ j = 1, . . . , m + 1.
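The nesting condition can be checked numerically: C(A) ⊆ C(B) holds exactly when appending the columns of A to B leaves the rank unchanged. A small sketch (Python/NumPy here purely for illustration; the deck's own computations use R), built from the polynomial design matrices of the plant density example later in these slides:

```python
import numpy as np

# Design matrices from the polynomial regression example later in the slides:
# X1 = intercept only, X2 = [1, x], X3 = [1, x, x^2].
x = np.repeat([10.0, 20.0, 30.0, 40.0, 50.0], 3)   # 15 observations
X1 = np.ones((15, 1))
X2 = np.column_stack([X1, x])
X3 = np.column_stack([X2, x**2])

def is_nested(A, B):
    """C(A) is contained in C(B) iff adding A's columns to B keeps the rank."""
    return np.linalg.matrix_rank(np.column_stack([B, A])) == np.linalg.matrix_rank(B)

print(is_nested(X1, X2), is_nested(X2, X3))  # True True
```

The same check confirms the nesting fails in the other direction, since x² is not a linear combination of 1 and x here.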
The Total Sum of Squares
The total sum of squares (also known as the corrected total sum
of squares) is

∑_{i=1}^{n} (yi − ȳ·)² = [y − ȳ·1]′[y − ȳ·1]
= [y − P1y]′[y − P1y] = [Iy − P1y]′[Iy − P1y]
= [(I − P1)y]′[(I − P1)y] = y′(I − P1)′(I − P1)y
= y′(I − P1)(I − P1)y = y′(I − P1)y.
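As a numerical check of this identity (a Python/NumPy sketch for illustration, using the grain yield data from the R example later in these slides):

```python
import numpy as np

# Grain yield data from the plant density example later in the slides.
y = np.array([12.2, 11.4, 12.4, 16.0, 15.5, 16.5, 18.6, 20.2, 18.2,
              17.6, 19.3, 17.1, 18.0, 16.4, 16.6])
n = len(y)

# P1 = projection onto the column space of 1, so P1 y = ybar * 1.
P1 = np.full((n, n), 1.0 / n)

ss_deviations = np.sum((y - y.mean())**2)   # sum_i (y_i - ybar)^2
ss_quadratic  = y @ (np.eye(n) - P1) @ y    # y'(I - P1)y

print(ss_deviations, ss_quadratic)  # both equal 95.08, the SSTo reported later
```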
Partitioning the Total Sum of Squares
∑_{i=1}^{n} (yi − ȳ·)² = y′(I − P1)y = y′(Pm+1 − P1)y

= y′( ∑_{j=2}^{m+1} Pj − ∑_{j=1}^{m} Pj )y

= y′(Pm+1 − Pm + Pm − Pm−1 + · · · + P2 − P1)y

= y′(Pm+1 − Pm)y + · · · + y′(P2 − P1)y

= ∑_{j=1}^{m} y′(Pj+1 − Pj)y.
The sums of squares in the equation

y′(I − P1)y = ∑_{j=1}^{m} y′(Pj+1 − Pj)y

are often arranged in an ANOVA table.
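The telescoping partition can be verified directly. A Python/NumPy sketch (the slides use R; here the centered/scaled plant density design from later in the deck serves as the nested sequence X1 ⊂ · · · ⊂ X5, with X5 spanning the cell means space):

```python
import numpy as np

# Nested polynomial design matrices X1, ..., X5 for the plant density data.
x = np.repeat([-2.0, -1.0, 0.0, 1.0, 2.0], 3)      # centered/scaled density
y = np.array([12.2, 11.4, 12.4, 16.0, 15.5, 16.5, 18.6, 20.2, 18.2,
              17.6, 19.3, 17.1, 18.0, 16.4, 16.6])
n = len(y)

def proj(X):
    """Orthogonal projection onto C(X)."""
    return X @ np.linalg.pinv(X)

Xs = [np.column_stack([x**k for k in range(j + 1)]) for j in range(5)]
P = [proj(X) for X in Xs] + [np.eye(n)]            # P1, ..., P5, and P6 = I

pieces = [y @ (P[j + 1] - P[j]) @ y for j in range(5)]
total = y @ (np.eye(n) - P[0]) @ y                 # y'(I - P1)y
print(np.round(pieces, 2), round(total, 2))        # pieces sum to the total
```

The five pieces match the ANOVA table computed in R later in the slides (43.2, 42.0, 0.3, 2.1, 7.48), and they sum to the corrected total 95.08.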
Some Additional Sum of Squares Notation
Sum of Squares       Notation
y′(P2 − P1)y         SS(2 | 1)
y′(P3 − P2)y         SS(3 | 2)
⋮                    ⋮
y′(Pm − Pm−1)y       SS(m | m − 1)
y′(Pm+1 − Pm)y       SSE = y′(I − PX)y
y′(I − P1)y          SSTo = ∑_{i=1}^{n} (yi − ȳ·)²
Note that
SS(j + 1 | j) = y′(Pj+1 − Pj)y
= y′(Pj+1 − Pj + I − I)y
= y′(I − Pj − I + Pj+1)y
= y′(I − Pj)y − y′(I − Pj+1)y
= SSEj − SSEj+1.
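This "decrease in SSE" view is easy to confirm numerically. A Python/NumPy sketch (illustration only; the slides' computations use R), with the plant density data and the first two nested models:

```python
import numpy as np

x = np.repeat([-2.0, -1.0, 0.0, 1.0, 2.0], 3)      # centered/scaled density
y = np.array([12.2, 11.4, 12.4, 16.0, 15.5, 16.5, 18.6, 20.2, 18.2,
              17.6, 19.3, 17.1, 18.0, 16.4, 16.6])
n = len(y)

def proj(X):
    return X @ np.linalg.pinv(X)

X1 = np.ones((n, 1))
X2 = np.column_stack([X1, x])
P1, P2 = proj(X1), proj(X2)

sse1 = y @ (np.eye(n) - P1) @ y    # SSE for the intercept-only model
sse2 = y @ (np.eye(n) - P2) @ y    # SSE after adding x
ss21 = y @ (P2 - P1) @ y           # SS(2 | 1)
print(round(ss21, 2), round(sse1 - sse2, 2))   # 43.2 43.2
```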
Thus, SS(j + 1 | j) is the amount the error sum of squares
decreases when y is projected onto C(Xj+1) instead of C(Xj).

SS(j + 1 | j), j = 1, . . . , m − 1, are called Sequential Sums of
Squares.

SAS calls these Type I Sums of Squares.
Properties of the Matrices of the Quadratic Forms
The matrices of the quadratic forms in the ANOVA table have
several useful properties:
Symmetry
Idempotency
Rank=Trace
Zero Cross-Products
Symmetry and Idempotency
Note that ∀ j = 1, . . . , m
(Pj+1 − Pj)′ = P′j+1 − P′j = Pj+1 − Pj

and

(Pj+1 − Pj)(Pj+1 − Pj) = Pj+1Pj+1 − Pj+1Pj − PjPj+1 + PjPj
= Pj+1 − Pj − Pj + Pj
= Pj+1 − Pj.
Rank=Trace
Because rank is equal to trace for idempotent matrices, we have
rank(Pj+1 − Pj ) = tr(Pj+1 − Pj ) = tr(Pj+1 ) − tr(Pj )
= rank(Pj+1 ) − rank(Pj )
= rank(Xj+1 ) − rank(Xj )
= rj+1 − rj .
Zero Cross-Products
∀ j < ℓ,

(Pj+1 − Pj)(Pℓ+1 − Pℓ) = Pj+1Pℓ+1 − Pj+1Pℓ − PjPℓ+1 + PjPℓ
= Pj+1 − Pj+1 − Pj + Pj
= 0.

Transposing both sides and using symmetry gives

(Pℓ+1 − Pℓ)(Pj+1 − Pj) = 0.
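All four properties of the quadratic-form matrices can be checked at once on the plant density design. A Python/NumPy sketch for illustration (the deck's own code is R):

```python
import numpy as np

# Numerical check of symmetry, idempotency, rank = trace, and
# zero cross-products for P2 - P1 and P3 - P2.
x = np.repeat([-2.0, -1.0, 0.0, 1.0, 2.0], 3)
n = len(x)

def proj(X):
    return X @ np.linalg.pinv(X)

P1 = proj(np.ones((n, 1)))
P2 = proj(np.column_stack([np.ones(n), x]))
P3 = proj(np.column_stack([np.ones(n), x, x**2]))

D21, D32 = P2 - P1, P3 - P2
print(np.allclose(D21, D21.T))        # symmetry
print(np.allclose(D21 @ D21, D21))    # idempotency
print(round(np.trace(D21)))           # rank = trace = r2 - r1 = 1
print(np.allclose(D21 @ D32, 0))      # zero cross-products
```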
Distribution of Scaled ANOVA Sums of Squares
Because

[(Pj+1 − Pj)/σ²] σ²I = Pj+1 − Pj

is idempotent,

y′(Pj+1 − Pj)y / σ² ∼ χ²_{rj+1 − rj}( β′X′(Pj+1 − Pj)Xβ / (2σ²) )

for all j = 1, . . . , m.
ANOVA Table with Degrees of Freedom
Sum of Squares       Degrees of Freedom
y′(P2 − P1)y         rank(X2) − rank(X1) = r2 − 1
y′(P3 − P2)y         rank(X3) − rank(X2) = r3 − r2
⋮                    ⋮
y′(Pm − Pm−1)y       rank(Xm) − rank(Xm−1) = r − rm−1
y′(Pm+1 − Pm)y       rank(Xm+1) − rank(Xm) = n − r
y′(I − P1)y          rank(Xm+1) − rank(X1) = n − 1
Mean Squares
For j = 1, . . . , m − 1, define
MS(j + 1 | j) = SS(j + 1 | j)/(rj+1 − rj) = y′(Pj+1 − Pj)y/(rj+1 − rj).
These sums of squares divided by their degrees of freedom are
known as mean squares.
ANOVA Table with Mean Squares
Sum of Squares    Degrees of Freedom    Mean Square
SS(2 | 1)         r2 − 1                MS(2 | 1)
SS(3 | 2)         r3 − r2               MS(3 | 2)
⋮                 ⋮                     ⋮
SS(m | m − 1)     r − rm−1              MS(m | m − 1)
SSE               n − r                 MSE
SSTo              n − 1
Independence of ANOVA Sums of Squares
Because
(Pj+1 − Pj) σ²I (Pℓ+1 − Pℓ) = 0

for all j ≠ ℓ, any two ANOVA sums of squares (not including
SSTo) are independent.
It is also true that the ANOVA sums of squares (not including
SSTo) are mutually independent by Cochran’s Theorem, but that
stronger result is not usually needed.
ANOVA F Statistics
For j = 1, . . . , m − 1 we have

Fj = MS(j + 1 | j)/MSE = [y′(Pj+1 − Pj)y/(rj+1 − rj)] / [y′(I − PX)y/(n − r)]
   ∼ F_{rj+1 − rj, n − r}( β′X′(Pj+1 − Pj)Xβ / (2σ²) ).
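For concreteness, F1 for the plant density example can be computed straight from the projection matrices. A Python/NumPy sketch (illustration only; the R output later in the slides reports the same value, F = 57.754 on 1 and 10 df):

```python
import numpy as np

x = np.repeat([-2.0, -1.0, 0.0, 1.0, 2.0], 3)
y = np.array([12.2, 11.4, 12.4, 16.0, 15.5, 16.5, 18.6, 20.2, 18.2,
              17.6, 19.3, 17.1, 18.0, 16.4, 16.6])
n = len(y)

def proj(X):
    return X @ np.linalg.pinv(X)

P1 = proj(np.ones((n, 1)))
P2 = proj(np.column_stack([np.ones(n), x]))
PX = proj(np.column_stack([x**k for k in range(5)]))  # full (quartic) model

r = 5                                    # rank of the full model matrix
ms21 = y @ (P2 - P1) @ y / (2 - 1)       # MS(2 | 1)
mse  = y @ (np.eye(n) - PX) @ y / (n - r)
F1 = ms21 / mse
print(round(F1, 4))   # 57.754
```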
ANOVA Table with F Statistics
Sum of Squares    Degrees of Freedom    Mean Square      F Stat
SS(2 | 1)         r2 − 1                MS(2 | 1)        F1
SS(3 | 2)         r3 − r2               MS(3 | 2)        F2
⋮                 ⋮                     ⋮                ⋮
SS(m | m − 1)     r − rm−1              MS(m | m − 1)    Fm−1
SSE               n − r                 MSE
SSTo              n − 1
Relationship with Reduced vs. Full Model F Statistic
The ANOVA Fj statistic:

Fj = MS(j + 1 | j)/MSE = [y′(Pj+1 − Pj)y/(rj+1 − rj)] / [y′(I − PX)y/(n − r)]

The reduced vs. full model F statistic:

F = [y′(PX − PX0)y/(r − r0)] / [y′(I − PX)y/(n − r)]
What do ANOVA F statistics test?
In general, an F statistic is used to test
H0 : “The non-centrality parameter of the F statistic is zero.”
vs.
HA : “The non-centrality parameter of the F statistic is not zero.”
What do ANOVA F statistics test?
The ANOVA F statistic

Fj = MS(j + 1 | j)/MSE = [y′(Pj+1 − Pj)y/(rj+1 − rj)] / [y′(I − PX)y/(n − r)]

has non-centrality parameter

β′X′(Pj+1 − Pj)Xβ / (2σ²).

Thus, Fj can be used to test

H0j: β′X′(Pj+1 − Pj)Xβ/(2σ²) = 0   vs.   HAj: β′X′(Pj+1 − Pj)Xβ/(2σ²) ≠ 0.
What do ANOVA F statistics test?
The following are equivalent ways to write the null and
alternative hypotheses tested by Fj.

H0j                                    HAj
β′X′(Pj+1 − Pj)Xβ = 0                  β′X′(Pj+1 − Pj)Xβ ≠ 0
(Pj+1 − Pj)Xβ = 0                      (Pj+1 − Pj)Xβ ≠ 0
Pj E(y) = Pj+1 E(y)                    Pj E(y) ≠ Pj+1 E(y)
Pj+1 E(y) ∈ C(Xj)                      Pj+1 E(y) ∈ C(Xj+1) \ C(Xj)
What do ANOVA F statistics test?
H0j: (Pj+1 − Pj)Xβ = 0   vs.   HAj: (Pj+1 − Pj)Xβ ≠ 0

is of the form

H0j: C∗j β = 0   vs.   HAj: C∗j β ≠ 0,

where C∗j = (Pj+1 − Pj)X.

As written, H0j is not a testable hypothesis because C∗j has n
rows but rank rj+1 − rj < n (homework problem).

We can rewrite H0j as a testable hypothesis by replacing C∗j with
any matrix Cj whose q = rj+1 − rj rows form a basis for the row
space of C∗j.
Example: Multiple Regression
X1 = 1
X2 = [1, x1]
X3 = [1, x1, x2]
⋮
Xm = [1, x1, . . . , xm−1]

SS(j + 1 | j) is the decrease in SSE that results when the
explanatory variable xj is added to a model containing an
intercept and explanatory variables x1, . . . , xj−1.
Example: Polynomial Regression
X1 = 1
X2 = [1, x]
X3 = [1, x, x²]
⋮
Xm = [1, x, x², . . . , xm−1]

SS(j + 1 | j) is the decrease in SSE that results when the
explanatory variable xj is added to a model containing an
intercept and explanatory variables x, x², . . . , xj−1.
An Example in R
> #An example from "Design of Experiments: Statistical
> #Principles of Research Design and Analysis"
> #2nd Edition by Robert O. Kuehl
>
> d=read.delim("http://....~dnett/S510/PlantDensity.txt")
The Data
> d
   PlantDensity GrainYield
1            10       12.2
2            10       11.4
3            10       12.4
4            20       16.0
5            20       15.5
6            20       16.5
7            30       18.6
8            30       20.2
9            30       18.2
10           40       17.6
11           40       19.3
12           40       17.1
13           50       18.0
14           50       16.4
15           50       16.6
Renaming the Variables and Plotting the Data
> names(d)=c("x","y")
> head(d)
   x    y
1 10 12.2
2 10 11.4
3 10 12.4
4 20 16.0
5 20 15.5
6 20 16.5
>
> plot(d[,1],d[,2],col=4,pch=16,xlab="Plant Density",
+      ylab="Grain Yield")
[Scatterplot of Grain Yield (about 12 to 20) versus Plant Density (10 to 50).]
Matrices with Nested Column Spaces

X1 = 1 (a 15 × 1 column of ones), X2 = [1, x], X3 = [1, x, x²],

where x = (10, 10, 10, 20, 20, 20, 30, 30, 30, 40, 40, 40, 50, 50, 50)′
and x² = (100, 100, 100, 400, 400, 400, 900, 900, 900, 1600, 1600, 1600, 2500, 2500, 2500)′.
Matrices with Nested Column Spaces

X4 = [1, x, x², x³] and X5 = [1, x, x², x³, x⁴],

where x³ = (1000, 1000, 1000, 8000, 8000, 8000, 27000, 27000, 27000, 64000, 64000, 64000, 125000, 125000, 125000)′
and x⁴ = (10000, 10000, 10000, 160000, 160000, 160000, 810000, 810000, 810000, 2560000, 2560000, 2560000, 6250000, 6250000, 6250000)′.
Centering and Standardizing for Numerical Stability
It is typically best for numerical stability to center and scale a
quantitative explanatory variable prior to computing higher order
terms.
In the plant density example, we could replace x by (x − 30)/10
and work with the matrices on the next two slides.
Because these matrices have the same column spaces as the
original matrices, the ANOVA table entries are mathematically
identical for either set of matrices.
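Both claims — identical column spaces and better numerical behavior — can be verified directly. A Python/NumPy sketch for illustration (the deck's own code is R); the condition-number comparison is my addition, not stated in the slides:

```python
import numpy as np

# The raw and centered/scaled cubic design matrices span the same column
# space, so their projection matrices are identical and every ANOVA entry
# is unchanged; only the numerical conditioning differs.
raw = np.repeat([10.0, 20.0, 30.0, 40.0, 50.0], 3)
cen = (raw - 30.0) / 10.0

def proj(X):
    return X @ np.linalg.pinv(X)

X4_raw = np.column_stack([raw**k for k in range(4)])   # [1, x, x^2, x^3]
X4_cen = np.column_stack([cen**k for k in range(4)])

print(np.allclose(proj(X4_raw), proj(X4_cen), atol=1e-6))      # True
print(np.linalg.cond(X4_raw) > 1000 * np.linalg.cond(X4_cen))  # True
```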
Matrices with Centered and Scaled x

X1 = 1, X2 = [1, x], X3 = [1, x, x²],

where now x = (−2, −2, −2, −1, −1, −1, 0, 0, 0, 1, 1, 1, 2, 2, 2)′
and x² = (4, 4, 4, 1, 1, 1, 0, 0, 0, 1, 1, 1, 4, 4, 4)′.
Matrices with Centered and Scaled x

X4 = [1, x, x², x³] and X5 = [1, x, x², x³, x⁴],

where x³ = (−8, −8, −8, −1, −1, −1, 0, 0, 0, 1, 1, 1, 8, 8, 8)′
and x⁴ = (16, 16, 16, 1, 1, 1, 0, 0, 0, 1, 1, 1, 16, 16, 16)′.
Regardless of whether we center and scale x, the column space
of X5 is the same as the column space of the cell means model
matrix X: the 15 × 5 matrix whose jth column is the indicator of
the three observations at the jth plant density (equivalently,
X = I5 ⊗ 13).
ANOVA Table for the Plant Density Data
Source              Sum of Squares    DF
x | 1               y′(P2 − P1)y      2 − 1 = 1
x² | 1, x           y′(P3 − P2)y      3 − 2 = 1
x³ | 1, x, x²       y′(P4 − P3)y      4 − 3 = 1
x⁴ | 1, x, x², x³   y′(P5 − P4)y      5 − 4 = 1
Error               y′(I − P5)y       15 − 5 = 10
C. Total            y′(I − P1)y       15 − 1 = 14
Creating the Matrices in R
> y=d$y
> x=(d$x-mean(d$x))/10
> x
 [1] -2 -2 -2 -1 -1 -1  0  0  0  1  1  1  2  2  2
>
> n=nrow(d)
>
> x1=matrix(1,nrow=n,ncol=1)
> x2=cbind(x1,x)
> x3=cbind(x2,x^2)
> x4=cbind(x3,x^3)
> x5=matrix(model.matrix(~0+factor(x)),nrow=n)
> I=diag(rep(1,n))
Creating the Projection Matrices in R
> library(MASS)
> proj=function(x){
+   x%*%ginv(t(x)%*%x)%*%t(x)
+ }
> p1=proj(x1)
> p2=proj(x2)
> p3=proj(x3)
> p4=proj(x4)
> p5=proj(x5)
Computing the Sums of Squares in R
> t(y)%*%(p2-p1)%*%y
     [,1]
[1,] 43.2
> t(y)%*%(p3-p2)%*%y
     [,1]
[1,]   42
> t(y)%*%(p4-p3)%*%y
     [,1]
[1,]  0.3
> t(y)%*%(p5-p4)%*%y
     [,1]
[1,]  2.1
> t(y)%*%(I-p5)%*%y
     [,1]
[1,] 7.48
> t(y)%*%(I-p1)%*%y
      [,1]
[1,] 95.08
The ANOVA Table in R
> o=lm(y~x+I(x^2)+I(x^3)+I(x^4),data=d)
> anova(o)
Analysis of Variance Table

Response: y
          Df Sum Sq Mean Sq F value    Pr(>F)
x          1  43.20  43.200 57.7540 1.841e-05 ***
I(x^2)     1  42.00  42.000 56.1497 2.079e-05 ***
I(x^3)     1   0.30   0.300  0.4011    0.5407
I(x^4)     1   2.10   2.100  2.8075    0.1248
Residuals 10   7.48   0.748
---
Signif. codes:  0 *** 0.001 ** 0.01 * 0.05 . 0.1   1
What do these ANOVA F statistics test?
1st line: Does a linear mean function fit the data significantly
better than a constant mean function?
2nd line: Does a quadratic mean function fit the data
significantly better than a linear mean function?
3rd line: Does a cubic mean function fit the data significantly
better than a quadratic mean function?
4th line: Does a quartic mean function fit the data
significantly better than a cubic mean function?
To answer each question, the error variance σ 2 is estimated from
the fit of the full model with one mean for each plant density.
What do these ANOVA F statistics test?
In general, we have

H0j: (Pj+1 − Pj)Xβ = 0   vs.   HAj: (Pj+1 − Pj)Xβ ≠ 0,

which, in testable form, is

H0j: Cj β = 0   vs.   HAj: Cj β ≠ 0,

where Cj is any matrix whose q = rj+1 − rj rows form a basis for
the row space of (Pj+1 − Pj)X.
First Line of the ANOVA Table as Test of H0 : Cβ = 0
> X=x5
> (p2-p1)%*%X
      [,1] [,2] [,3] [,4] [,5]
 [1,]  0.4  0.2    0 -0.2 -0.4
 [2,]  0.4  0.2    0 -0.2 -0.4
 [3,]  0.4  0.2    0 -0.2 -0.4
 [4,]  0.2  0.1    0 -0.1 -0.2
 [5,]  0.2  0.1    0 -0.1 -0.2
 [6,]  0.2  0.1    0 -0.1 -0.2
 [7,]  0.0  0.0    0  0.0  0.0
 [8,]  0.0  0.0    0  0.0  0.0
 [9,]  0.0  0.0    0  0.0  0.0
[10,] -0.2 -0.1    0  0.1  0.2
[11,] -0.2 -0.1    0  0.1  0.2
[12,] -0.2 -0.1    0  0.1  0.2
[13,] -0.4 -0.2    0  0.2  0.4
[14,] -0.4 -0.2    0  0.2  0.4
[15,] -0.4 -0.2    0  0.2  0.4
First Line of the ANOVA Table as Test of H0 : Cβ = 0
Because rank((P2 − P1)X) = rank(X2) − rank(X1) = 2 − 1 = 1, any
nonzero constant times any one nonzero row of (P2 − P1)X forms a
basis for the row space of (P2 − P1)X.

For example, we could choose C to be the following one-row
matrix:

> 5*((p2-p1)%*%X)[15,]
[1] -2 -1  0  1  2

Some textbooks would describe these as "the coefficients of a
contrast to test for linear trend." (Note this is different from a test
for "lack of linear fit.")
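The same contrast row can be reproduced outside R. A Python/NumPy sketch for illustration, mirroring the computation 5*((p2-p1)%*%X)[15,] above:

```python
import numpy as np

# Reproduce the slide's R computation: the scaled last row of
# (P2 - P1) X5 is the linear-trend contrast (-2, -1, 0, 1, 2).
x = np.repeat([-2.0, -1.0, 0.0, 1.0, 2.0], 3)      # centered/scaled density
n = len(x)

def proj(X):
    return X @ np.linalg.pinv(X)

P1 = proj(np.ones((n, 1)))
P2 = proj(np.column_stack([np.ones(n), x]))
# Cell-means matrix: column j indicates the jth plant density group.
X5 = (x[:, None] == np.array([-2.0, -1.0, 0.0, 1.0, 2.0])).astype(float)

C = 5 * ((P2 - P1) @ X5)[n - 1]    # last row, scaled by 5 as in the slide
print(np.round(C, 10))   # [-2. -1.  0.  1.  2.]
```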
We can add consecutive lines in an ANOVA table.
Source              Sum of Squares    DF
x | 1               y′(P2 − P1)y      2 − 1 = 1
x² | 1, x           y′(P3 − P2)y      3 − 2 = 1
x³ | 1, x, x²       y′(P4 − P3)y      4 − 3 = 1
x⁴ | 1, x, x², x³   y′(P5 − P4)y      5 − 4 = 1
Error               y′(I − P5)y       15 − 5 = 10
C. Total            y′(I − P1)y       15 − 1 = 14
We can add consecutive lines in an ANOVA table.
Source              Sum of Squares    DF
x | 1               y′(P2 − P1)y      2 − 1 = 1
x², x³, x⁴ | 1, x   y′(P5 − P2)y      5 − 2 = 3
Error               y′(I − P5)y       15 − 5 = 10
C. Total            y′(I − P1)y       15 − 1 = 14
47 / 58
In this case, the combined rows test for lack of linear fit
relative to a model with one unrestricted mean for
each plant density.
Source              Sum of Squares    DF
x | 1               y′(P2 − P1)y      2 − 1 = 1
Lack of Linear Fit  y′(P5 − P2)y      5 − 2 = 3
Error               y′(I − P5)y       15 − 5 = 10
C. Total            y′(I − P1)y       15 − 1 = 14
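The lack-of-fit line can be computed directly from the projections. A Python/NumPy sketch for illustration; the R output on a later slide reports the same values (SS = 44.40, MS = 14.800, F = 19.786):

```python
import numpy as np

# Lack-of-linear-fit test from the combined ANOVA rows:
# SS = y'(P5 - P2)y on 5 - 2 = 3 df, tested against MSE.
x = np.repeat([-2.0, -1.0, 0.0, 1.0, 2.0], 3)
y = np.array([12.2, 11.4, 12.4, 16.0, 15.5, 16.5, 18.6, 20.2, 18.2,
              17.6, 19.3, 17.1, 18.0, 16.4, 16.6])
n = len(y)

def proj(X):
    return X @ np.linalg.pinv(X)

P2 = proj(np.column_stack([np.ones(n), x]))
P5 = proj((x[:, None] == np.unique(x)).astype(float))  # cell-means projection

ss_lof = y @ (P5 - P2) @ y                 # 44.4
mse = y @ (np.eye(n) - P5) @ y / (n - 5)   # 0.748
F = (ss_lof / 3) / mse
print(round(ss_lof, 2), round(F, 3))   # 44.4 19.786
```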
> #Let's add the best fitting simple linear regression
> #line to our plot.
>
> o=lm(y~x,data=d)
> u=seq(0,60,by=.01) #overkill here but used later.
> lines(u,coef(o)[1]+coef(o)[2]*u,col=2)
[Scatterplot of Grain Yield versus Plant Density with the fitted simple linear regression line added.]
> #The linear fit doesn't look very good.
> #Let's formally test for lack of fit.
>
> o=lm(y~x+factor(x),data=d)
> anova(o)
Analysis of Variance Table

Response: y
          Df Sum Sq Mean Sq F value    Pr(>F)
x          1  43.20  43.200  57.754 1.841e-05 ***
factor(x)  3  44.40  14.800  19.786 0.0001582 ***
Residuals 10   7.48   0.748
---
Signif. codes:  0 *** 0.001 ** 0.01 * 0.05 . 0.1   1
> #It looks like a linear fit is inadequate.
> #Let's try a quadratic fit.
>
> o=lm(y~x+I(x^2)+factor(x),data=d)
> anova(o)
Analysis of Variance Table

Response: y
          Df Sum Sq Mean Sq F value    Pr(>F)
x          1  43.20  43.200 57.7540 1.841e-05 ***
I(x^2)     1  42.00  42.000 56.1497 2.079e-05 ***
factor(x)  2   2.40   1.200  1.6043    0.2487
Residuals 10   7.48   0.748
---
Signif. codes:  0 *** 0.001 ** 0.01 * 0.05 . 0.1   1
> #It looks like a quadratic fit is adequate.
> #Let's estimate the coefficients for the best
> #quadratic fit.
>
> b=coef(lm(y~x+I(x^2),data=d))
>
> #Let's add the best fitting quadratic curve
> #to our plot.
>
> lines(u,b[1]+b[2]*u+b[3]*u^2,col=3)
[Scatterplot of Grain Yield versus Plant Density with the fitted quadratic curve added.]
> #Let's add the treatment group means to our plot.
>
> trt.means=tapply(d$y,d$x,mean)
>
> points(unique(d$x),trt.means,pch="X")
[Scatterplot of Grain Yield versus Plant Density with the treatment group means marked by "X".]
> #The quartic fit will pass through the treatment
> #means.
>
> b=coef(lm(y~x+I(x^2)+I(x^3)+I(x^4),data=d))
> lines(u,b[1]+b[2]*u+b[3]*u^2+b[4]*u^3+b[5]*u^4,col=1)
[Scatterplot of Grain Yield versus Plant Density with the fitted quartic curve passing through the treatment group means.]