Analysis of Variance
Analysis of Variance (AOV) was originally devised within the realm of agricultural statistics
for testing the yields of various crops under different nutrient regimes. Typically, a field is
divided into a regular array, in row and column format, of small plots of a fixed size. The
yield $y_{i,j}$ within each plot is recorded.

Row 1:   $y_{1,1}$   $y_{1,2}$   $y_{1,3}$   $y_{1,4}$   $y_{1,5}$
Row 2:   $y_{2,1}$   $y_{2,2}$   $y_{2,3}$
Row 3:   $y_{3,1}$   $y_{3,2}$   $y_{3,3}$
If the field is of irregular width, different crops can be grown in each row and we can regard
the yields as replicated results for each crop in turn. If the field is rectangular, we can grow
different crops in each row and supply different nutrients in each column and so study the
interaction of two factors simultaneously. If the field is square, we can incorporate a third
factor. By replicating the sampling over many fields, very sophisticated interactions can be
studied.
One-Way Classification
Model:
$$ y_{i,j} = \mu + \alpha_i + \epsilon_{i,j}, \qquad \epsilon_{i,j} \sim N(0, 1) $$
where
$\mu$ = overall mean
$\alpha_i$ = effect of the ith factor
$\epsilon_{i,j}$ = error term.
Hypothesis:
$$ H_0: \alpha_1 = \alpha_2 = \dots = \alpha_m $$
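Factor   Observations                                          Totals                    Means
(1)      $y_{1,1}$  $y_{1,2}$  $y_{1,3}$  ...  $y_{1,n_1}$     $T_1 = \sum_j y_{1,j}$    $\bar{y}_{1.} = T_1 / n_1$
(2)      $y_{2,1}$  $y_{2,2}$  $y_{2,3}$  ...  $y_{2,n_2}$     $T_2 = \sum_j y_{2,j}$    $\bar{y}_{2.} = T_2 / n_2$
...
(m)      $y_{m,1}$  $y_{m,2}$  $y_{m,3}$  ...  $y_{m,n_m}$     $T_m = \sum_j y_{m,j}$    $\bar{y}_{m.} = T_m / n_m$
Overall mean: $\bar{y} = \sum_i \sum_j y_{i,j} / n$, where $n = \sum_i n_i$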
Decomposition of Sums of Squares:
$$ \sum_i \sum_j (y_{i,j} - \bar{y})^2 = \sum_i n_i (\bar{y}_{i.} - \bar{y})^2 + \sum_i \sum_j (y_{i,j} - \bar{y}_{i.})^2 $$
Total Variation (Q) = Between Factors (Q1) + Residual Variation (QE)
Under H0 (with unit error variance, as in the model):
$$ Q \sim \chi^2_{n-1}, \qquad Q_1 \sim \chi^2_{m-1}, \qquad Q_E \sim \chi^2_{n-m} $$
$$ \frac{Q_1 / (m - 1)}{Q_E / (n - m)} \sim F_{m-1,\, n-m} $$
AOV Table:
Variation   D.F.     Sums of Squares                                      Mean Squares             F
Between     m - 1    $Q_1 = \sum_i n_i (\bar{y}_{i.} - \bar{y})^2$        $MS_1 = Q_1 / (m - 1)$   $MS_1 / MS_E$
Residual    n - m    $Q_E = \sum_i \sum_j (y_{i,j} - \bar{y}_{i.})^2$     $MS_E = Q_E / (n - m)$
Total       n - 1    $Q = \sum_i \sum_j (y_{i,j} - \bar{y})^2$            $Q / (n - 1)$
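The entries of the one-way AOV table can be computed directly from the definitions above. The following is a minimal sketch in plain Python; the function name `one_way_aov` and the yields are hypothetical illustrations (the data are not taken from the text), and the groups are deliberately of unequal size to show the role of the $n_i$.

```python
# One-way analysis of variance computed directly from the sums of squares.
# The yields below are hypothetical illustration data, not taken from the text.

def one_way_aov(groups):
    """Return the one-way AOV quantities for a list of groups of observations."""
    n = sum(len(g) for g in groups)              # total number of observations
    m = len(groups)                              # number of factor levels
    grand_mean = sum(sum(g) for g in groups) / n
    group_means = [sum(g) / len(g) for g in groups]

    # Between-factor variation Q1 and residual variation QE
    q1 = sum(len(g) * (gm - grand_mean) ** 2
             for g, gm in zip(groups, group_means))
    qe = sum((y - gm) ** 2
             for g, gm in zip(groups, group_means) for y in g)
    # Total variation Q, computed independently as a check on Q = Q1 + QE
    q = sum((y - grand_mean) ** 2 for g in groups for y in g)

    ms1 = q1 / (m - 1)                           # mean square between factors
    mse = qe / (n - m)                           # residual mean square
    return {"Q": q, "Q1": q1, "QE": qe, "MS1": ms1, "MSE": mse,
            "F": ms1 / mse, "df": (m - 1, n - m)}


# Three hypothetical crops with unequal numbers of plots (n1 = 4, n2 = 3, n3 = 5)
yields = [[20, 18, 21, 23], [19, 18, 17], [23, 21, 22, 23, 20]]
table = one_way_aov(yields)
for name, value in table.items():
    print(name, value)
```

The F value would then be compared with the upper tail of the F distribution on the degrees of freedom returned in `df`; that lookup is left out here to keep the sketch free of external libraries.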
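Two-Way Classification
                              Factor I                                Means
Factor II   $y_{1,1}$  $y_{1,2}$  $y_{1,3}$  ...  $y_{1,n}$           $\bar{y}_{1.}$
            ...
            $y_{m,1}$  $y_{m,2}$  $y_{m,3}$  ...  $y_{m,n}$           $\bar{y}_{m.}$
Means       $\bar{y}_{.1}$  $\bar{y}_{.2}$  $\bar{y}_{.3}$  ...  $\bar{y}_{.n}$   $\bar{y}$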
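Decomposition of Sums of Squares:
$$ \sum_i \sum_j (y_{i,j} - \bar{y})^2 = n \sum_i (\bar{y}_{i.} - \bar{y})^2 + m \sum_j (\bar{y}_{.j} - \bar{y})^2 + \sum_i \sum_j (y_{i,j} - \bar{y}_{i.} - \bar{y}_{.j} + \bar{y})^2 $$
Total Variation = Between Rows + Between Columns + Residual Variation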
Model:
$$ y_{i,j} = \mu + \alpha_i + \beta_j + \epsilon_{i,j}, \qquad \epsilon_{i,j} \sim N(0, 1) $$
H0: All $\alpha_i$ are equal and all $\beta_j$ are equal.
AOV Table:
Variation         D.F.          Sums of Squares                                                            Mean Squares                      F
Between Rows      m - 1         $Q_1 = n \sum_i (\bar{y}_{i.} - \bar{y})^2$                                $MS_1 = Q_1 / (m - 1)$            $MS_1 / MS_E$
Between Columns   n - 1         $Q_2 = m \sum_j (\bar{y}_{.j} - \bar{y})^2$                                $MS_2 = Q_2 / (n - 1)$            $MS_2 / MS_E$
Residual          (m-1)(n-1)    $Q_E = \sum_i \sum_j (y_{i,j} - \bar{y}_{i.} - \bar{y}_{.j} + \bar{y})^2$  $MS_E = Q_E / ((m - 1)(n - 1))$
Total             mn - 1        $Q = \sum_i \sum_j (y_{i,j} - \bar{y})^2$                                  $Q / (mn - 1)$
Two-Way AOV [Example]
                        Factor I
Factor II      1       2       3       4       5     Totals   Means
    1         20      18      21      23      20      102     20.4
    2         19      18      17      18      18       90     18.0
    3         23      21      22      23      20      109     21.8
    4         17      16      18      16      17       84     16.8
Totals        79      73      78      80      75      385
Means       19.75   18.25   19.50   20.00   18.75             19.25
Variation   d.f.     S.S.      F
Rows          3     76.95    18.86**
Columns       4      8.50     1.57
Residual     12     16.30
Total        19    101.75
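The sums of squares in this example can be reproduced from the two-way decomposition. The sketch below is plain Python with a hypothetical helper name, `two_way_aov`; the data are the 4 x 5 table of the example, with rows as the levels of Factor II and columns as the levels of Factor I. With unrounded mean squares the F ratios come out at about 18.88 and 1.56; the tabulated 18.86 and 1.57 are consistent with mean squares rounded to two decimals before dividing.

```python
# Two-way AOV (one observation per cell) applied to the 4 x 5 example table.
# Rows are the levels of Factor II, columns the levels of Factor I.

def two_way_aov(data):
    m, n = len(data), len(data[0])               # m rows, n columns
    grand = sum(map(sum, data)) / (m * n)
    row_means = [sum(row) / n for row in data]
    col_means = [sum(data[i][j] for i in range(m)) / m for j in range(n)]

    q1 = n * sum((r - grand) ** 2 for r in row_means)        # between rows
    q2 = m * sum((c - grand) ** 2 for c in col_means)        # between columns
    qe = sum((data[i][j] - row_means[i] - col_means[j] + grand) ** 2
             for i in range(m) for j in range(n))            # residual
    q = sum((y - grand) ** 2 for row in data for y in row)   # total

    mse = qe / ((m - 1) * (n - 1))
    return {"Q1": q1, "Q2": q2, "QE": qe, "Q": q,
            "F_rows": (q1 / (m - 1)) / mse,
            "F_cols": (q2 / (n - 1)) / mse}


example = [
    [20, 18, 21, 23, 20],
    [19, 18, 17, 18, 18],
    [23, 21, 22, 23, 20],
    [17, 16, 18, 16, 17],
]
result = two_way_aov(example)
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```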
Note that many statistical packages, such as SPSS, are designed for analysing data recorded
with variable values in columns and individual observations in rows. Thus the AOV data above
would be written as a set of columns or rows, laid out as shown:
Value of Variable   20 18 21 23 20 19 18 17 18 18 23 21 22 23 20 17 16 18 16 17
Factor 1             1  2  3  4  5  1  2  3  4  5  1  2  3  4  5  1  2  3  4  5
Factor 2             1  1  1  1  1  2  2  2  2  2  3  3  3  3  3  4  4  4  4  4
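The conversion from the two-way table to this one-observation-per-row layout is mechanical. A minimal sketch in plain Python follows; the variable names (`long_rows`, `values`, `factor1`, `factor2`) are illustrative choices and do not correspond to any particular package's conventions.

```python
# Convert the 4 x 5 example table into "long" format: one record per
# observation, holding the response value and the level of each factor.

example = [
    [20, 18, 21, 23, 20],   # Factor 2, level 1
    [19, 18, 17, 18, 18],   # Factor 2, level 2
    [23, 21, 22, 23, 20],   # Factor 2, level 3
    [17, 16, 18, 16, 17],   # Factor 2, level 4
]

# Walk the table row by row; factor levels are numbered from 1 as in the text.
long_rows = [
    (value, f1, f2)
    for f2, row in enumerate(example, start=1)
    for f1, value in enumerate(row, start=1)
]

values = [r[0] for r in long_rows]
factor1 = [r[1] for r in long_rows]
factor2 = [r[2] for r in long_rows]

print("value factor1 factor2")
for value, f1, f2 in long_rows:
    print(f"{value:5d} {f1:7d} {f2:7d}")
```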
Normal Regression Model (p independent variables) - AOV
Model:
$$ y = \beta_0 + \sum_{i=1}^{p} \beta_i x_i + \epsilon, \qquad \epsilon \sim N(0, \sigma) $$
$$ SSR = \sum_i (\hat{y}_i - \bar{y})^2, \qquad SSE = \sum_i (y_i - \hat{y}_i)^2, \qquad SST = \sum_i (y_i - \bar{y})^2 $$
Source       d.f.         S.S.   M.S.   F
Regression   p            SSR    MSR    MSR/MSE
Error        n - p - 1    SSE    MSE
Total        n - 1        SST    -      -
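For the simplest case of one independent variable (p = 1), the regression AOV table can be filled in by hand. The sketch below fits the line by least squares in plain Python and checks that SST = SSR + SSE; the function name `regression_aov` and the (x, y) values are hypothetical, chosen only for illustration.

```python
# Regression AOV for a single independent variable (p = 1), fitted by
# ordinary least squares. The data are hypothetical illustration values.

def regression_aov(x, y):
    n, p = len(y), 1
    x_bar = sum(x) / n
    y_bar = sum(y) / n

    # Least-squares slope and intercept
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * xi for xi in x]

    ssr = sum((f - y_bar) ** 2 for f in fitted)            # regression
    sse = sum((yi - f) ** 2 for yi, f in zip(y, fitted))   # error
    sst = sum((yi - y_bar) ** 2 for yi in y)               # total

    msr = ssr / p
    mse = sse / (n - p - 1)
    return {"SSR": ssr, "SSE": sse, "SST": sst,
            "MSR": msr, "MSE": mse, "F": msr / mse,
            "df": (p, n - p - 1)}


x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]
aov = regression_aov(x, y)
for name, value in aov.items():
    print(name, value)
```

For p > 1 the same table applies, but the fitted values would come from a multiple regression fit rather than the closed-form slope used here.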
Latin Squares
We can incorporate a third source of variation in our
models by the use of Latin squares. A Latin square is a
design with exactly one instance of each “letter” in
each row and column.
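Model:
$$ y_{i,j} = \mu + \alpha_i + \beta_j + \gamma_l + \epsilon_{i,j}, \qquad \epsilon_{i,j} \sim N(0, 1) $$
where $\alpha_i$ are the row effects, $\beta_j$ the column effects and $\gamma_l$ the Latin square component.
A  B  C  D
B  D  A  C
C  A  D  B
D  C  B  A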
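Decomposition of Sums of Squares (and degrees of freedom):
$$ \sum_i \sum_j (y_{i,j} - \bar{y})^2 = n \sum_i (\bar{y}_{i.} - \bar{y})^2 + n \sum_j (\bar{y}_{.j} - \bar{y})^2 + n \sum_l (\bar{y}_{l} - \bar{y})^2 + \sum_i \sum_j (y_{i,j} - \bar{y}_{i.} - \bar{y}_{.j} - \bar{y}_{l} + 2\bar{y})^2 $$
where $\bar{y}_{l}$ is the mean of the n plots carrying letter l.
Total Variation = Between Rows + Between Columns + Latin Square Variation + Residual Variation
$(n^2 - 1) = (n - 1) + (n - 1) + (n - 1) + (n - 1)(n - 2)$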
H0: All $\alpha_i$ are equal, all $\beta_j$ are equal and all $\gamma_l$ are equal.
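As an illustration of the design and its decomposition, the sketch below checks that the 4 x 4 square given above has exactly one instance of each letter in every row and column, then splits the total variation into row, column, Latin square and residual parts. It is plain Python; the function names and the yields are hypothetical, and only the letter arrangement comes from the text.

```python
# Latin square design: validity check and decomposition of sums of squares.
# The letter arrangement is the 4 x 4 square from the text; the yields are
# hypothetical illustration data.

square = ["ABCD", "BDAC", "CADB", "DCBA"]


def is_latin_square(sq):
    """True if every letter occurs exactly once in each row and each column."""
    n = len(sq)
    letters = set(sq[0])
    rows_ok = all(len(r) == n and set(r) == letters for r in sq)
    cols_ok = all({sq[i][j] for i in range(n)} == letters for j in range(n))
    return len(letters) == n and rows_ok and cols_ok


def latin_square_aov(sq, y):
    n = len(sq)
    cells = [(i, j) for i in range(n) for j in range(n)]
    grand = sum(map(sum, y)) / n ** 2
    row_m = [sum(y[i]) / n for i in range(n)]
    col_m = [sum(y[i][j] for i in range(n)) / n for j in range(n)]
    # Mean yield over the n plots carrying each letter
    let_m = {l: sum(y[i][j] for i, j in cells if sq[i][j] == l) / n
             for l in sorted(set(sq[0]))}

    return {
        "rows": n * sum((r - grand) ** 2 for r in row_m),
        "columns": n * sum((c - grand) ** 2 for c in col_m),
        "latin": n * sum((v - grand) ** 2 for v in let_m.values()),
        "residual": sum((y[i][j] - row_m[i] - col_m[j]
                         - let_m[sq[i][j]] + 2 * grand) ** 2 for i, j in cells),
        "total": sum((y[i][j] - grand) ** 2 for i, j in cells),
    }


yields = [[12, 15, 11, 14],
          [16, 13, 12, 10],
          [11, 14, 15, 13],
          [13, 10, 16, 12]]

print(is_latin_square(square))
parts = latin_square_aov(square, yields)
for name, value in parts.items():
    print(f"{name}: {value:.3f}")
```

Each of the row, column and Latin square sums of squares would be divided by its (n - 1) degrees of freedom and compared with the residual mean square on (n - 1)(n - 2) degrees of freedom, in the same way as in the two-way table.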
Experimental design is used heavily in management, educational and sociological
applications. Its popularity is based on the fact that the underlying normality conditions are
easy to justify, the concepts in the model are easy to understand and reliable software is
available.