12.4 An Introduction to Analysis of Variance
Analysis of Variance (ANOVA): a statistical technique that can be used to test
the hypothesis that the means of 3 or more populations are equal.
Motivating example:
Objective: we want to compare the mean scores of the employees at 3 different plants.
Let
$\mu_1$: the mean score for plant 1
$\mu_2$: the mean score for plant 2
$\mu_3$: the mean score for plant 3
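Let $x_{j,i}$ denote the score of the $i$'th sampled employee at plant $j$, where
$$n = n_1 = n_2 = n_3 = 6, \qquad n_T = n_1 + n_2 + n_3 = 18.$$
The sample means are
$$\bar{x}_1 = \frac{\sum_{i=1}^{n_1} x_{1,i}}{n_1} = \frac{\sum_{i=1}^{6} x_{1,i}}{6} = 79: \text{ the sample mean score for plant 1},$$
$$\bar{x}_2 = \frac{\sum_{i=1}^{n_2} x_{2,i}}{n_2} = \frac{\sum_{i=1}^{6} x_{2,i}}{6} = 74: \text{ the sample mean score for plant 2},$$
$$\bar{x}_3 = \frac{\sum_{i=1}^{n_3} x_{3,i}}{n_3} = \frac{\sum_{i=1}^{6} x_{3,i}}{6} = 66: \text{ the sample mean score for plant 3}.$$
The sample variances are
$$s_1^2 = \frac{\sum_{i=1}^{6} (x_{1,i} - \bar{x}_1)^2}{5} = 34: \text{ the sample variance of the scores for plant 1},$$
$$s_2^2 = \frac{\sum_{i=1}^{6} (x_{2,i} - \bar{x}_2)^2}{5} = 20: \text{ the sample variance of the scores for plant 2},$$
$$s_3^2 = \frac{\sum_{i=1}^{6} (x_{3,i} - \bar{x}_3)^2}{5} = 32: \text{ the sample variance of the scores for plant 3}.$$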
We want to test
$$H_0: \mu_1 = \mu_2 = \mu_3 \quad \text{vs.} \quad H_a: \text{not all population means are equal}$$
with $\alpha = 0.05$.
Three assumptions are made for the above problem:
1. The scores for the employees in each plant are normally distributed.
2. The variances of the scores for the 3 plants are the same.
3. The score for each employee must be independent of the scores for all
other employees.
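Intuitively, when $H_0$ is true, the scores for the employees at the 3 plants have the same
distribution, since they have the same means and variances. Thus, $\bar{x}_1, \bar{x}_2, \bar{x}_3$
can be considered as 3 possible values of $\bar{X}$, the sample mean of $n = 6$ scores drawn from
this common distribution. Furthermore, we can use $\bar{x}_1, \bar{x}_2, \bar{x}_3$ as sample values
of $\bar{X}$. Then the variance of $\bar{X}$, $\sigma_{\bar{X}}^2$, can be estimated by
$$s_{\bar{X}}^2 = \frac{\sum_{i=1}^{3} (\bar{x}_i - \bar{x})^2}{3 - 1} = \frac{(79 - 73)^2 + (74 - 73)^2 + (66 - 73)^2}{2} = \frac{86}{2} = 43, \qquad \text{where } \bar{x} = \frac{\bar{x}_1 + \bar{x}_2 + \bar{x}_3}{3} = 73.$$
Since
$$\sigma_{\bar{X}}^2 = \frac{\sigma^2}{n} \;\Longrightarrow\; \sigma^2 = n\,\sigma_{\bar{X}}^2,$$
where $\sigma^2$ is the common variance of the scores, the estimate of $\sigma^2$ is
$$n s_{\bar{X}}^2 = 6 \times 43 = 258.$$
$n s_{\bar{X}}^2$ is referred to as the between-samples estimate of $\sigma^2$.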
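The between-samples arithmetic can be checked with a few lines of Python. This is a minimal sketch that uses only the three sample means and the common sample size $n = 6$ from the example; `statistics.variance` from the standard library divides by $3 - 1$, which matches the formula for $s_{\bar{X}}^2$.

```python
from statistics import mean, variance

# Sample means of the three plants and the common sample size n (from the example)
sample_means = [79, 74, 66]
n = 6

grand_mean = mean(sample_means)       # (79 + 74 + 66) / 3 = 73
s2_xbar = variance(sample_means)      # (36 + 1 + 49) / (3 - 1) = 43, divides by 3 - 1
between_estimate = n * s2_xbar        # n * s^2_{X-bar} = 6 * 43 = 258

print(grand_mean, s2_xbar, between_estimate)
```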
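Note: $s_{\bar{X}}^2$ is an accurate estimate of $\sigma_{\bar{X}}^2$ only when $H_0$ is true. When
$H_0$ is not true, $s_{\bar{X}}^2$ is not a good estimate of $\sigma_{\bar{X}}^2$: it tends to be
larger than $\sigma_{\bar{X}}^2$ (an overestimate), because the spread of the sample means then also
reflects the differences among the population means. Thus, $n s_{\bar{X}}^2$ might not be an
accurate estimate of $\sigma^2$ when $H_0$ is not true.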
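The note can be illustrated by simulation. The sketch below assumes normally distributed scores with a hypothetical common standard deviation $\sigma = 5$ (so $\sigma^2 = 25$) and hypothetical true plant means: all equal to 73 when $H_0$ is true, and 79, 74, 66 when it is not. It averages $n s_{\bar{X}}^2$ over many simulated data sets, each made of three samples of size 6.

```python
import random

def between_estimate(groups):
    """n * s^2_{X-bar}: n times the sample variance of the group means (equal group sizes)."""
    n = len(groups[0])
    means = [sum(g) / n for g in groups]
    grand = sum(means) / len(means)
    return n * sum((m - grand) ** 2 for m in means) / (len(means) - 1)

def average_between_estimate(true_means, sigma, n, reps, seed):
    """Average of n * s^2_{X-bar} over `reps` simulated data sets of normal scores."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        groups = [[rng.gauss(mu, sigma) for _ in range(n)] for mu in true_means]
        total += between_estimate(groups)
    return total / reps

SIGMA = 5.0  # hypothetical common standard deviation, so sigma^2 = 25
h0_avg = average_between_estimate([73, 73, 73], SIGMA, n=6, reps=2000, seed=1)  # H0 true
ha_avg = average_between_estimate([79, 74, 66], SIGMA, n=6, reps=2000, seed=1)  # H0 not true
print(f"H0 true:     {h0_avg:.1f}  (sigma^2 = {SIGMA ** 2:.0f})")
print(f"H0 not true: {ha_avg:.1f}")
```

Under these assumptions the first average should land near 25, while the second is far larger: its expected value is $\sigma^2 + n\sum_{i=1}^{3}(\mu_i - \bar{\mu})^2/(3 - 1) = 25 + 258 = 283$, where $\bar{\mu}$ is the average of the three true means.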
The other estimate of $\sigma^2$, called the within-samples estimate of $\sigma^2$, is
$$\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 + (n_3 - 1)s_3^2}{(n_1 - 1) + (n_2 - 1) + (n_3 - 1)} = \frac{5s_1^2 + 5s_2^2 + 5s_3^2}{5 + 5 + 5} = \frac{s_1^2 + s_2^2 + s_3^2}{3} = \frac{34 + 20 + 32}{3} = 28.67.$$
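As a quick check of the arithmetic, the within-samples estimate can be computed directly from the sample sizes and sample variances of the example. This is a minimal sketch; the variable names are illustrative only.

```python
# Sample sizes and sample variances of the three plants (from the example)
sizes = [6, 6, 6]
sample_variances = [34, 20, 32]

# Within-samples (pooled) estimate: sum of (n_i - 1) s_i^2 over sum of (n_i - 1)
numerator = sum((n_i - 1) * s2 for n_i, s2 in zip(sizes, sample_variances))  # 170 + 100 + 160 = 430
denominator = sum(n_i - 1 for n_i in sizes)                                   # 5 + 5 + 5 = 15
within_estimate = numerator / denominator                                     # 430 / 15 = 28.67 (rounded)

print(round(within_estimate, 2))
```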
Note: the within-samples estimate of $\sigma^2$ is unbiased (accurate) whether or not
$H_0$ is true. The within-samples estimate of $\sigma^2$ is in fact the pooled
estimate of $\sigma^2$.
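The statistic
$$f = \frac{\text{between-samples estimate of } \sigma^2}{\text{within-samples estimate of } \sigma^2} \begin{cases} \approx 1 & \text{when } H_0 \text{ is true}, \\ > 1 & \text{when } H_0 \text{ is not true}, \end{cases}$$
can be used to test $H_0$. Thus, in this example,
$$f = \frac{n s_{\bar{X}}^2}{\sum_{i=1}^{3} (n_i - 1)s_i^2 / (n_T - 3)} = \frac{258}{28.67} \approx 9.$$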
General Case:
Suppose there are $k$ populations. The data are as follows:
Population 1: $x_{1,1}, x_{1,2}, \ldots, x_{1,n_1}$
Population 2: $x_{2,1}, x_{2,2}, \ldots, x_{2,n_2}$
$\vdots$
Population $k$: $x_{k,1}, x_{k,2}, \ldots, x_{k,n_k}$
Let
$$n_T = n_1 + n_2 + \cdots + n_k,$$
$x_{j,i}$, $i = 1, \ldots, n_j$; $j = 1, \ldots, k$: the $i$'th sample value from population $j$.
$\bar{x}_j = \dfrac{\sum_{i=1}^{n_j} x_{j,i}}{n_j}$, $j = 1, \ldots, k$: the sample mean for population $j$.
$\bar{x} = \dfrac{\sum_{j=1}^{k} \sum_{i=1}^{n_j} x_{j,i}}{n_T}$: the overall mean.
$s_j^2 = \dfrac{\sum_{i=1}^{n_j} (x_{j,i} - \bar{x}_j)^2}{n_j - 1}$, $j = 1, \ldots, k$: the sample variance for population $j$.
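The definitions above translate directly into code. The following is a minimal Python sketch, assuming the data are stored as one list of observations per population (a hypothetical layout in which `samples[j]` holds the sample from population $j + 1$, because Python indexes from 0) and that every sample has at least two values so that $s_j^2$ is defined.

```python
from typing import List, Sequence, Tuple

def group_summaries(samples: Sequence[Sequence[float]]) -> Tuple[List[float], float, List[float]]:
    """Return (sample means x-bar_j, overall mean x-bar, sample variances s_j^2).

    samples[j] holds the observations from population j + 1; each needs >= 2 values.
    """
    means = [sum(g) / len(g) for g in samples]
    n_total = sum(len(g) for g in samples)
    overall = sum(sum(g) for g in samples) / n_total  # divides by n_T, not by k
    variances = [
        sum((x - m) ** 2 for x in g) / (len(g) - 1)   # divisor n_j - 1
        for g, m in zip(samples, means)
    ]
    return means, overall, variances
```

For example, `group_summaries([[1, 2, 3], [4, 6]])` returns `([2.0, 5.0], 3.2, [1.0, 2.0])`. The overall mean 3.2 differs from the plain average of the two sample means (3.5) because the samples have different sizes.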
Two estimates of $\sigma^2$, Mean Square Between (MSB) and Mean Square
Within (MSW), can be used. MSB is the between-samples estimate
of $\sigma^2$, while MSW is the within-samples estimate of $\sigma^2$.
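MSB and MSW are
$$\text{MSB} = \frac{\sum_{j=1}^{k} n_j (\bar{x}_j - \bar{x})^2}{k - 1}$$
and
$$\text{MSW} = \frac{\sum_{j=1}^{k} (n_j - 1)s_j^2}{(n_1 - 1) + (n_2 - 1) + \cdots + (n_k - 1)} = \frac{\sum_{j=1}^{k} \sum_{i=1}^{n_j} (x_{j,i} - \bar{x}_j)^2}{n_T - k}.$$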
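A minimal sketch of MSB and MSW in Python, assuming a hypothetical layout with one list of observations per population and at least two observations per list. It computes MSW both ways and asserts that the two forms agree, which follows from $\sum_{i=1}^{n_j} (x_{j,i} - \bar{x}_j)^2 = (n_j - 1)s_j^2$ and $\sum_{j=1}^{k} (n_j - 1) = n_T - k$.

```python
from typing import Sequence, Tuple

def mean_squares(samples: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Return (MSB, MSW) for k groups of raw observations (each group needs >= 2 values)."""
    k = len(samples)
    sizes = [len(g) for g in samples]
    n_total = sum(sizes)
    means = [sum(g) / len(g) for g in samples]
    overall = sum(sum(g) for g in samples) / n_total

    # MSB: sum_j n_j (x-bar_j - x-bar)^2 / (k - 1)
    msb = sum(n_j * (m - overall) ** 2 for n_j, m in zip(sizes, means)) / (k - 1)

    # MSW, double-sum form: sum_j sum_i (x_{j,i} - x-bar_j)^2 / (n_T - k)
    ss_within = sum((x - m) ** 2 for g, m in zip(samples, means) for x in g)
    msw = ss_within / (n_total - k)

    # MSW, pooled form: sum_j (n_j - 1) s_j^2 / sum_j (n_j - 1); must agree with the above
    s2 = [sum((x - m) ** 2 for x in g) / (len(g) - 1) for g, m in zip(samples, means)]
    msw_pooled = sum((n_j - 1) * v for n_j, v in zip(sizes, s2)) / sum(n_j - 1 for n_j in sizes)
    assert abs(msw - msw_pooled) <= 1e-9 * max(1.0, abs(msw))

    return msb, msw
```

For example, `mean_squares([[1, 2, 3], [4, 6]])` gives MSB of about 10.8 and MSW of about 1.33.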
When $H_0$ is not true, MSB might not be an unbiased (accurate) estimate of $\sigma^2$. On the other
hand, MSW is an accurate estimate of $\sigma^2$ whether or not $H_0$ is true. Thus,
$$f = \frac{\text{between-samples estimate of } \sigma^2}{\text{within-samples estimate of } \sigma^2} = \frac{\text{MSB}}{\text{MSW}} = \frac{\sum_{j=1}^{k} n_j (\bar{x}_j - \bar{x})^2 / (k - 1)}{\sum_{j=1}^{k} (n_j - 1)s_j^2 / (n_T - k)}$$
can be used to test $H_0$.
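Because the last expression for $f$ involves only $n_j$, $\bar{x}_j$ and $s_j^2$, $f$ can be computed from summary statistics alone. In that case the overall mean has to be recovered as the weighted average $\bar{x} = \sum_{j=1}^{k} n_j \bar{x}_j / n_T$, which equals the definition of $\bar{x}$ given earlier. A minimal sketch (the function name is hypothetical):

```python
from typing import Sequence

def f_from_summaries(sizes: Sequence[int], means: Sequence[float], variances: Sequence[float]) -> float:
    """f = MSB / MSW computed from n_j, x-bar_j and s_j^2 only (no raw data needed)."""
    k = len(sizes)
    n_total = sum(sizes)
    # Weighted overall mean: sum_j n_j x-bar_j / n_T
    overall = sum(n_j * m for n_j, m in zip(sizes, means)) / n_total
    msb = sum(n_j * (m - overall) ** 2 for n_j, m in zip(sizes, means)) / (k - 1)
    msw = sum((n_j - 1) * v for n_j, v in zip(sizes, variances)) / (n_total - k)
    return msb / msw

# The plant example: three samples of size 6
print(round(f_from_summaries([6, 6, 6], [79, 74, 66], [34, 20, 32]), 2))
```

Applied to the plant example, it reproduces $f = 9$ (up to floating-point rounding), since there MSB $= 258$ and MSW $= 430/15 \approx 28.67$.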
$$f = \frac{\text{MSB}}{\text{MSW}} \begin{cases} \approx 1 & \text{when } H_0 \text{ is true}, \\ > 1 & \text{when } H_0 \text{ is not true}. \end{cases}$$
Next question: how large must $f$ be to reject $H_0$?
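Before answering that question, it helps to see the claim that $f$ is close to 1 when $H_0$ is true and larger otherwise. The sketch below is an illustration only, under hypothetical settings: normally distributed scores with common standard deviation $\sigma = 5$, three groups of 6 observations, and true means all equal to 73 under $H_0$ and equal to 79, 74, 66 otherwise.

```python
import random

def f_statistic(samples):
    """f = MSB / MSW for a list of groups of raw observations."""
    k = len(samples)
    n_total = sum(len(g) for g in samples)
    means = [sum(g) / len(g) for g in samples]
    overall = sum(sum(g) for g in samples) / n_total
    msb = sum(len(g) * (m - overall) ** 2 for g, m in zip(samples, means)) / (k - 1)
    msw = sum((x - m) ** 2 for g, m in zip(samples, means) for x in g) / (n_total - k)
    return msb / msw

def average_f(true_means, sigma, n, reps, seed):
    """Average f over `reps` simulated data sets of normal scores, n per group."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        groups = [[rng.gauss(mu, sigma) for _ in range(n)] for mu in true_means]
        total += f_statistic(groups)
    return total / reps

# Hypothetical settings: sigma = 5, three groups of 6 observations
h0_avg = average_f([73, 73, 73], 5.0, 6, 2000, seed=2)  # H0 true
ha_avg = average_f([79, 74, 66], 5.0, 6, 2000, seed=2)  # H0 not true
print(round(h0_avg, 2), round(ha_avg, 2))
```

Under these assumptions the first average comes out close to 1 (slightly above it: for normal data with equal variances, the mean of $f$ under $H_0$ is $(n_T - k)/(n_T - k - 2) = 15/13 \approx 1.15$ here), while the second is many times larger.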