3.0 ANOVA - ANALYSIS OF VARIANCE

advertisement
3.0 ANOVA - ANALYSIS OF VARIANCE
(updated Spring 2001)
ANOVA is extensively used in the science and engineering.
Study the coagulation time for samples of blood drawn from 24 animals
receiving 4 different diets. Does diet affect blood coagulation time? Twenty
four (24) animals were randomly allocated to four diets (A,B,C, and D).
The blood samples were taken and tested in random order.
The analysis that follows is exactly justified if the data are random samples
from four normal populations, and is approximately justified based on the
randomization distn idea presented previously.
2
We will assume that the four populations have equal variance, σ y . Is there
evidence to indicate real differences between the diets/treatments?
H0: µ A = µ B = µ C = µ D .
HA: At least one of the means differs from the others. In the following
table, test order is listed in the parenthesis.
DIET (Treatment)
A
B
C
D
----------------------------------------------------------------------63 (12)
62 (20)
68 (16)
56 (23)
67 (9)
60 (2)
66 (7)
62 (3)
71 (15)
63 (11)
71 (1)
60 (6)
64 (14)
59 (10)
67 (17)
61 (18)
65 (4)
68 (13)
63 (22)
66 (8)
68 (21)
64 (19)
63 (5)
59 (24)
nA =6
Treat. Avg. Y A = 66
Grand Avg. 64
nB = 4
nC = 6
nD = 8
Y B = 61
Y C = 68
Y D = 61
Are the differences between the treatment averages greater than expected
given the variation within treatments?
22
• Variation within treatments
NA
∑ ( Y Aj – Y A )
2
Sum of squares within treatment A
S
2
j=1
- = -----A- =
s A = -------------------------------------na – 1
νA
Degree of freedom for treatment A
2
2
2
S
2
( 63 – 66 ) + ( 67 – 66 ) + … + ( 66 – 66 )
40
s A = ----------------------------------------------------------------------------------------------------- = -----A- = ------ = 8
vA
6–1
5
Similarly,
S
2
10
s B = -----B- = ------ = 3.33
3
νB
S
2
14
s C = -----C- = ------ = 2.8
νC
5
S
2
48
s D = -----D- = ------ = 6.857
νD
7
2
2
2
2
Since s A , s B , s C , and s D all provide estimates of the unknown variance,
2
σ y . They may be combined to provide a pooled estimate.
2
2
sP
2
2
2
S A + SB + SC + SD
ν A s A + νB sB + νC sC + νD sD
= -------------------------------------------------------------------= -------------------------------------------ν A + νB + νC + νD
ν A + νB + νC + νD
We’ll refer to this sample variance as the within treatment sample variance.
SW
2
40 + 10 + 14 + 48
112
- = ------------------------------------------- = --------s w = -----νW
5+3+5+7
20
Within Treatment Sum of Squares
= ------------------------------------------------------------------------------------------Within Treatment Degree of Freedom
2
s w = 5.6. It is the within treatment mean square estimate of the true
2
variance, σ y .
One-Way ANOVA
In general,
23
nI
k
∑ ∑ ( Y ij – Y i )
Σ WithinTreatment =
2
i=1j=1
S WithinTreatment
2
sw =
Total number of observations - Number of treatment
Variation Between Treatments
2
If there is no difference between treatment means, a second estimate of σ y
can be obtained based on the variation of the treatment means about y .
=
( Y treatment – Y )
∑
OverAllTreatments
If we calculate ---------------------------------------------------------------------------------- , it can be used as a
#Treatment - 1
2
2
estimate of σ y .
2
2
2
But we want an estimate of σ y . Since σ y = n σ y , let’s scale
=
2
( y treatment – y ) by ntreat.
So, in general,
k
=
∑ ni ( Y treatment – Y
)
2
S
2
--------------------------------------------------------- = s BT = -----Bk–1
νB
.
i=1
where k is the number of treatments. It is the Between Treatment estimate
2
of σ y , or, the Between Treatment Mean Square.
For our Problem,
Treatment
Yi
=
Y
=
(Y i – Y )
ni
A
66
B
61
C
68
D
61
64
64
64
64
2
6
-3
4
4
6
-3
8
24
S B = 6(2)2 + 4(-3)2 +6(4)2 +8(-3)2 = 228, νB = 3,
2
228
S
and s B = ------ = --------- = 76.0
3
νB
Summary:
Within Treatments:
Between Treatments:
2
Sum of Squares:
Degrees of Freedom:
SW = 112
νW = 20
Mean Square:
s W = 5.6
Sum of Square
Degrees of Freedom
SB = 228
νB = 3
Mean Square
s B = 76.0
2
2
2
2
Under H0, both s W and s B are estimates of σ y
If there is a difference between treatment means, the between treatment
variance will be inflated. If a sample of size n is drawn from a normal
2
population with variance σ , The quantity,
2
s (n – 1)
--------------------2
σ
2
2
2
2
s (n – 1)
- can be
follows the χ distribution. If σ is postulated, χ calc = --------------------2
σ
2
computed and compared to the existing values. If σ is not postulated, a
2
100(1- α )% confidence interval for σ can be calculated.
2
2
s (n – 1)
2 s (n – 1)
--------------------- ≤ σ ≤ --------------------2
2
χ
χ
α
α
ν, 1 – --2
ν, --2
25
χ
χ
2
χ
α
ν, --2
2
2
α
ν, 1 – --2
• Sampling distn of Two Variances
2
If sample of size n1 is drawn from normal distn with a variance of σ 1 . and
2
sample of size n2 is drawn from normal distn with a variance of σ 2 ,
2
2
estimates of the population variances may be calculated: s 1 and s 2 .
2
2
2
s1
χ ν1
s2
The quantity -----2- is -------- distributed and -----2- is
ν1
σ1
σ2
2
χ v2
-------- distributed. The ratio
ν2
2
χ ν1 ⁄ ν 1
----------------- follows the F distn with ν1, ν2 degrees of freedom.
2
χ ν2 ⁄ ν 2
Therefore,
2
2
s1 ⁄ σ1
--------------~F
ν1, ν 2
2
2
s2 ⁄ σ2
or
2 2
s1 σ2
----2 -----2- ∼ F ν1, ν2
s2 σ1
2
2
If σ 1 = σ 2 , then
2
s1
----~F
ν1, ν2
2
s2
2
For the blood coagulation effect problem, s W = 5.6 (νw=20) and
2
s B = 76. 0 (νB=3)
2
2
2
Under H0, both are estimates of σ y , and F calc = s B ⁄ s W = 13.57 . Note
26
that this ratio should be distributed according to F ν1 = 3, ν2 = 20 under H0.
From a F-distn table, it is found that F (3,20,.95) = 3.10 and F(3,20,.99) =
4.94. Since both numbers are less than Fcalc, strong evidence shows that
2
Fcalc is not a typical value. This means that s B is inflated by differences
between µ A , µ B , µ C , and µ D . Therefore, reject H0. At least one of the
means is different.
To confirm ANOVA results, we would like to display Y’s on their reference
distn. It may be difficult, since each Y is based on different Degrees of
2
Freedom ( σ y differs from each other).
=
Y
2
= 64 , s y = 5.6 , sy = 2.366. We need to calculate a t for each Y.
A
B
C
D
---------------------------------------------------------------66
61
68
61
6
4
6
8
0.966
1.183
0.966
0.837
2.070
-2.536
4.141
-3.584
Y
n
sY
tcalc
t20
-2.54
2.07
-3.58
-4
-3
-2
-1
0
1
2
4.14
3
4
5
The four calculated t points are plotted on a t20 distribution plot. From the
plot, we can ask if these four t values be drawn randomly from a t20 distn.
Decomposition of the variance
The coagulation time for the ith treatment and jth trial within the treatment
is Yij where i goes from 1 to k (k is the number of treatments/diets) and j
goes from 1 to ni (ni is the number of trials for the ith treatment).
The coagulation time, Yij, may be expressed as
27
=
=
Yij = Y + ( Y i – Y ) + ( Y ij – Y i )
Squaring both sides and summing over all trials and treatments, and after
simplification,
ni
k
∑∑
ni
k
2
Y ij
=
∑ ∑Y
ni
∑∑
2
Y ij
= 2
= NY
i=1j=1
k
STOT =
k
ni
=
∑ ∑ (Y i – Y
+
i=1j=1
= 1 j= 1
k
= 2
2
) +
i=1j=1
k
+
k
ni
∑ ∑ ( Y ij – Y i )
2
i=1j=1
=
∑ ni ( Y i – Y
2
) +
k
ni
∑ ∑ ( Y ij – Y i )
2
i=1j=1
i=1
ni
∑ ∑ Y ij , is the Total Sum of Squares
2
i=1j=1
= 2
SAVG = N Y
k
SB =
=
∑ ni ( Y i – Y
i=1
ni
k
SW =
, is the Sum of Squares Due to the Mean
2
) , is the Between Treatment Sum of Squares.
∑ ∑ ( Y ij – Y i )
2
, is the Within Treatment Sum of Squares.
i=1j=1
For our blood coagulation time example, STOT = 98644, SAVG = 98304, SB
= 228, SW = 112.
We observe that in fact SAVG + SB + SW = STOT
Additionally, νTOT = νAVG + νB +νW
(24) = (1) + (3) + (20)
What we know can be summarized in an ANOVA Table
Source of Var.
SS.
D.of F.
Mean Sqr.
Fcalc
------------------------------------------------------------------------------------Average
98304
1
98304
17554.3
Between Treat. 228
3
76
13.57
Within Treat.
112
20
5.6
Total
98644
24
In the above table, the average mean square error (98304) is an estimate of
2
σ y under the assumption that µy=0. The Between Treatment Mean Square
2
Error (76) is an estimate of σ y under the assumption that
28
µ A = µB = µC = µD .
To test the hypothesis that µ A = µ B = µ C = µ D , compare Between/
Within Treatment Mean Square
2
sB
----- = 13.57 = F calc
2
sW
to F ν1 = 3, ν2 = 20 . F(3,20,.95) = 3.10 and F (3,20,.99) = 4.94.
To test the hypothesis that µ y = 0 , compare average and within treatment
mean squares
2
s AVG
----------- = 17554.3 to F V 1 = 3, V 2 = 20 . F (1,20,.95) = 4.35 and F (1,20,.99) =
2
sW
8.10.
So, the average is significant, so as are the treatments.
Earlier, we expressed the response as:
=
=
Y ij = Y + ( Y i – Y ) + ( Y ij – Y i )
=
=
where Y is the grand mean. ( Y i – Y ) is treatment effect, and ( Y ij – Y i )
is experimental error.
y = η + τi + ε
=
y is an estimate of η . A model for the response is:
ŷ = η̂ + τˆi
In blood coagulation experiment, we studied one variable, diet. We will
now look at two variables (1 blocking variables and 1 treatment variable or
2 blocking variables).
29
Download