One-way Anova: Inferences about More than Two Population Means

Always be mindful of the kindness and not the faults of others.
What is Anova?
One-Way Anova; F tests
Pairwise comparisons: Bonferroni procedure

Analysis of Variance & One-Factor Designs
Y= DEPENDENT VARIABLE
(“yield”)
(“response variable”)
(“quality indicator”)
X = INDEPENDENT VARIABLE
(A possibly influential FACTOR)

OBJECTIVE: To determine the impact of X on Y
Mathematical Model:
$Y = f(X, \varepsilon)$, where $\varepsilon$ = (impact of) all factors other than X
Ex:
Y = Forced expiratory volume in one second, FEV1 (liters)
X = Medical center (Johns Hopkins, Rancho Los Amigos, St. Louis)
$\varepsilon$ = Many other factors (possibly some we're unaware of)

Statistical Model

                 "LEVEL" OF Center
Replication     1      2     ...     C
     1         Y11    Y12    ...    Y1C
     2         Y21     .             .
     .          .      .             .
     i          .     Yij            .
     .          .      .             .
     n         Yn1     .     ...    YnC

(Center is, of course, represented as "categorical")

$$Y_{ij} = \mu + \tau_j + \varepsilon_{ij}, \qquad i = 1, \ldots, n_j, \quad j = 1, \ldots, C$$

Where
$\mu$ = OVERALL AVERAGE
j = index for FACTOR (center) LEVEL
i = index for "replication"
$\tau_j$ = Differential effect (response) associated with jth level of X
and
$\varepsilon_{ij}$ = "noise" or "error" associated with the (particular) (i,j)th data value.
Let $\mu_j$ = AVERAGE associated with jth level of X;
then $\tau_j = \mu_j - \mu$ and $\mu$ = AVERAGE of the $\mu_j$.

     1      2      3     ...     C
    Y11    Y12     .     ...    Y1C
    Y21     .      .             .
     .      .      .             .
    YR1     .      .     ...    YRC

Column means: $\bar{Y}_{\cdot 1}$, $\bar{Y}_{\cdot 2}$, ..., ($\bar{Y}_{\cdot j}$), ..., $\bar{Y}_{\cdot C}$
$\bar{Y}_{\cdot 1}$, $\bar{Y}_{\cdot 2}$, etc., are Column Means

$$\bar{Y}_{\cdot\cdot} = \sum_{j=1}^{C} \bar{Y}_{\cdot j} \,/\, C = \text{"GRAND MEAN"}$$
(assuming same # data points in each column)
(otherwise, $\bar{Y}_{\cdot\cdot}$ = mean of all the data)
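
The equal-column-size caveat can be checked numerically. The sketch below uses small made-up numbers (not the FEV1 data), and the helper names are illustrative only. It compares the average of the column means with the mean of all the data.

```python
from statistics import mean

# Hypothetical data, one list per column (factor level); not from the FEV1 study.
balanced = [[3.0, 5.0], [4.0, 6.0], [8.0, 10.0]]
unbalanced = [[3.0, 5.0], [4.0, 6.0], [8.0, 10.0, 12.0, 14.0]]

def mean_of_column_means(columns):
    """Average the C column means (the grand-mean formula for equal column sizes)."""
    return mean(mean(col) for col in columns)

def mean_of_all_data(columns):
    """Average every observation, ignoring column membership."""
    return mean(y for col in columns for y in col)

# Equal column sizes: the two definitions agree.
print(mean_of_column_means(balanced), mean_of_all_data(balanced))
# Unequal column sizes: they differ, so the mean of all the data is used.
print(mean_of_column_means(unbalanced), mean_of_all_data(unbalanced))
```

With unequal column sizes the larger column pulls the mean of all the data toward its own mean, which is why the two quantities separate.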

MODEL: $Y_{ij} = \mu + \tau_j + \varepsilon_{ij}$
$\bar{Y}_{\cdot\cdot}$ estimates $\mu$
$\bar{Y}_{\cdot j} - \bar{Y}_{\cdot\cdot}$ estimates $\tau_j$ (= $\mu_j - \mu$) (for all j)
These estimates are based on Gauss' (1796) PRINCIPLE OF LEAST SQUARES and on COMMON SENSE
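
MODEL: $Y_{ij} = \mu + \tau_j + \varepsilon_{ij}$
If you insert the estimates into the MODEL,
(1) $Y_{ij} = \bar{Y}_{\cdot\cdot} + (\bar{Y}_{\cdot j} - \bar{Y}_{\cdot\cdot}) + \hat{\varepsilon}_{ij}$,
it follows that our estimate of $\varepsilon_{ij}$ is
(2) $\hat{\varepsilon}_{ij} = Y_{ij} - \bar{Y}_{\cdot j}$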
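
As a rough illustration of these estimates, the sketch below computes the grand mean, the column effects and the residuals for a tiny made-up data set. The numbers and variable names are assumptions for illustration, not part of the FEV1 example.

```python
from statistics import mean

# Hypothetical data: columns[j] holds the observations at level j of X.
columns = [[3.0, 5.0], [4.0, 6.0], [8.0, 10.0]]

grand_mean = mean(y for col in columns for y in col)   # estimates mu
col_means = [mean(col) for col in columns]              # estimate each mu_j
effects = [m - grand_mean for m in col_means]           # estimate each tau_j
# Residuals estimate each eps_ij: observation minus its own column mean.
residuals = [[y - m for y in col] for col, m in zip(columns, col_means)]

print(grand_mean)   # 6.0
print(effects)      # [-2.0, -1.0, 3.0]
print(residuals)    # [[-1.0, 1.0], [-1.0, 1.0], [-1.0, 1.0]]
```

The residuals sum to zero within every column, and with equal column sizes the estimated effects sum to zero as well.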
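
Then, $Y_{ij} = \bar{Y}_{\cdot\cdot} + (\bar{Y}_{\cdot j} - \bar{Y}_{\cdot\cdot}) + (Y_{ij} - \bar{Y}_{\cdot j})$
or,
(3) $(Y_{ij} - \bar{Y}_{\cdot\cdot}) = (\bar{Y}_{\cdot j} - \bar{Y}_{\cdot\cdot}) + (Y_{ij} - \bar{Y}_{\cdot j})$
TOTAL VARIABILITY in Y = Variability in Y associated with X + Variability in Y associated with all other factors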

If you square both sides of (3), and double sum both sides (over i and j), you get [after some unpleasant algebra, but lots of terms which "cancel"]
$$\sum_{j=1}^{C}\sum_{i=1}^{n_j} (Y_{ij} - \bar{Y}_{\cdot\cdot})^2 = \sum_{j=1}^{C} n_j (\bar{Y}_{\cdot j} - \bar{Y}_{\cdot\cdot})^2 + \sum_{j=1}^{C}\sum_{i=1}^{n_j} (Y_{ij} - \bar{Y}_{\cdot j})^2$$
TSS = SSB + SSW (SSE)
TOTAL SUM OF SQUARES = SUM OF SQUARES BETWEEN COLUMNS + SUM OF SQUARES WITHIN COLUMNS
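
The terms that "cancel" are the cross products. A short derivation, writing $\bar{Y}_{\cdot j}$ for the mean of column j and $\bar{Y}_{\cdot\cdot}$ for the grand mean, and using nothing beyond identity (3):

```latex
\begin{aligned}
\sum_{j=1}^{C}\sum_{i=1}^{n_j}(Y_{ij}-\bar{Y}_{\cdot\cdot})^2
&= \sum_{j=1}^{C}\sum_{i=1}^{n_j}
   \bigl[(\bar{Y}_{\cdot j}-\bar{Y}_{\cdot\cdot})+(Y_{ij}-\bar{Y}_{\cdot j})\bigr]^2 \\
&= \sum_{j=1}^{C} n_j(\bar{Y}_{\cdot j}-\bar{Y}_{\cdot\cdot})^2
 + \sum_{j=1}^{C}\sum_{i=1}^{n_j}(Y_{ij}-\bar{Y}_{\cdot j})^2
 + 2\sum_{j=1}^{C}(\bar{Y}_{\cdot j}-\bar{Y}_{\cdot\cdot})
     \sum_{i=1}^{n_j}(Y_{ij}-\bar{Y}_{\cdot j})
\end{aligned}
```

The last term vanishes because, within each column, $\sum_{i=1}^{n_j}(Y_{ij}-\bar{Y}_{\cdot j}) = n_j\bar{Y}_{\cdot j} - n_j\bar{Y}_{\cdot j} = 0$. What remains is SSB + SSW.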

ANOVA TABLE

SOURCE OF VARIABILITY                    SSQ    DF     Mean square (M.S.)
Between Columns (due to center)          SSB    C-1    MSB = SSB/(C-1)
Within Columns (due to other factors)    SSW    N-C    MSW = SSW/(N-C)
TOTAL                                    TSS    N-1
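
The table entries can be computed directly from data. The following is a minimal sketch under stated assumptions: the function name `one_way_anova_table` and the data are hypothetical, and the columns are deliberately unequal in size to show that the bookkeeping still adds up.

```python
from statistics import mean

def one_way_anova_table(columns):
    """Return sums of squares, degrees of freedom and mean squares for a one-factor layout."""
    all_y = [y for col in columns for y in col]
    n_total, n_cols = len(all_y), len(columns)
    grand = mean(all_y)                      # mean of all the data
    col_means = [mean(col) for col in columns]
    ssb = sum(len(col) * (m - grand) ** 2 for col, m in zip(columns, col_means))
    ssw = sum((y - m) ** 2 for col, m in zip(columns, col_means) for y in col)
    tss = sum((y - grand) ** 2 for y in all_y)
    df_b, df_w = n_cols - 1, n_total - n_cols
    return {"SSB": ssb, "SSW": ssw, "TSS": tss,
            "dfB": df_b, "dfW": df_w, "dfT": n_total - 1,
            "MSB": ssb / df_b, "MSW": ssw / df_w}

# Hypothetical, unbalanced data (not the FEV1 measurements).
table = one_way_anova_table([[3.0, 5.0], [4.0, 6.0], [8.0, 10.0, 12.0, 14.0]])
for name, value in table.items():
    print(name, value)
```

Both the sums of squares and the degrees of freedom are additive: SSB + SSW = TSS and (C-1) + (N-C) = N-1.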

ANOVA TABLE

Source of Variability    SSQ       df                M.S.
CENTER                   1.583     2   (= 3 - 1)     0.791
ERROR                    14.480    57  (= 59 - 2)    0.254
TOTAL                    16.063    59  (= 60 - 1)

We can show:
$E(\text{MSB}) = \sigma^2 + V_{\text{COL}}$
$E(\text{MSW}) = \sigma^2$
This suggests that
if MSB/MSW > 1, there's some evidence of non-zero $V_{\text{COL}}$, or "level of X affects Y";
if MSB/MSW < 1, there's no evidence that $V_{\text{COL}} > 0$, or that "level of X affects Y".
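
A small simulation can make the two expectations plausible. This is only a sketch under assumptions that are not in the slides: normally distributed noise with sigma = 1, three levels with 20 observations each, and arbitrarily chosen level means. It checks the qualitative claim only, namely that the average MSW stays near sigma^2 while the average MSB exceeds it once the level means differ.

```python
import random

def mean_squares(columns):
    """Return (MSB, MSW) for a list of columns."""
    n_total = sum(len(c) for c in columns)
    grand = sum(sum(c) for c in columns) / n_total
    col_means = [sum(c) / len(c) for c in columns]
    ssb = sum(len(c) * (m - grand) ** 2 for c, m in zip(columns, col_means))
    ssw = sum((y - m) ** 2 for c, m in zip(columns, col_means) for y in c)
    return ssb / (len(columns) - 1), ssw / (n_total - len(columns))

def average_mean_squares(level_means, n_per_level=20, sigma=1.0, reps=2000, seed=1):
    """Average MSB and MSW over many simulated data sets (assumed normal noise)."""
    rng = random.Random(seed)
    msb_sum = msw_sum = 0.0
    for _ in range(reps):
        cols = [[rng.gauss(m, sigma) for _ in range(n_per_level)] for m in level_means]
        msb, msw = mean_squares(cols)
        msb_sum += msb
        msw_sum += msw
    return msb_sum / reps, msw_sum / reps

no_effect = average_mean_squares([0.0, 0.0, 0.0])    # all level means equal
with_effect = average_mean_squares([0.0, 0.0, 0.5])  # one level differs
print("no effect:   MSB ~ %.2f, MSW ~ %.2f" % no_effect)
print("with effect: MSB ~ %.2f, MSW ~ %.2f" % with_effect)
```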

With
$H_0$: Level of X has no impact on Y
$H_1$: Level of X does have impact on Y,
we need MSB/MSW >> 1 to reject $H_0$.

More Formally,
$H_0$: $\tau_1 = \tau_2 = \cdots = \tau_C = 0$
$H_1$: not all $\tau_j = 0$
OR
$H_0$: $\mu_1 = \mu_2 = \cdots = \mu_C$ (All column means are equal)
$H_1$: not all $\mu_j$ are EQUAL

The distribution of MSB/MSW = "Fcalc" is the F-distribution with ($df_B$, $df_W$) degrees of freedom, assuming $H_0$ true.
$C_\alpha$ = Table Value (the point that cuts off an upper-tail area of $\alpha$)

In our problem:
ANOVA TABLE

Source of Variability    SSQ       df                M.S.     Fcalc
CENTER                   1.583     2   (= 3 - 1)     0.791    3.12 (= 0.791/0.254)
ERROR                    14.480    57  (= 59 - 2)    0.254
TOTAL                    16.063    59  (= 60 - 1)

F table: Table A-5
$\alpha$ = .05, (2, 57) DF
$C_{.05}$ = 3.15
Fcalc = 3.12

Hence, at $\alpha$ = .05, do not reject $H_0$ (Fcalc = 3.12 < 3.15); i.e., conclude that the centers don't differ significantly on FEV1 at the 5% level. The P-value is .052, so the difference would be significant at the 6% level.
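
The test can be reproduced from the table values. The sketch below relies on a closed form that holds only when the numerator degrees of freedom equal 2: for an F(2, m) distribution, P(F > f) = (1 + 2f/m)^(-m/2). The function names are illustrative. The exact critical value for (2, 57) degrees of freedom comes out near 3.16, slightly different from the tabled 3.15, and the decision is the same.

```python
# Values taken from the ANOVA table: sums of squares and their degrees of freedom.
ssb, df_b = 1.583, 2
ssw, df_w = 14.480, 57
alpha = 0.05

msb, msw = ssb / df_b, ssw / df_w
f_calc = msb / msw

def f_upper_tail_df1_eq_2(f, df2):
    """P(F > f) for F(2, df2); this closed form is valid only for numerator df = 2."""
    return (1.0 + 2.0 * f / df2) ** (-df2 / 2.0)

def f_critical_df1_eq_2(alpha, df2):
    """Value c with P(F > c) = alpha for F(2, df2), from inverting the tail formula."""
    return (df2 / 2.0) * (alpha ** (-2.0 / df2) - 1.0)

p_value = f_upper_tail_df1_eq_2(f_calc, df_w)
critical = f_critical_df1_eq_2(alpha, df_w)

print(f"Fcalc = {f_calc:.2f}, critical value = {critical:.2f}, p-value = {p_value:.3f}")
print("reject H0" if f_calc > critical else "do not reject H0")
```

For other numerator degrees of freedom the tail probability has no such simple form, and a table or a statistics library would be needed.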

Multiple Comparison Procedures
Once we reject $H_0$: $\mu_1 = \mu_2 = \cdots = \mu_C$ in favor of $H_1$: NOT all $\mu$'s are equal, we don't yet know the way in which they're not all equal, but simply that they're not all the same. If there are 4 columns, are all 4 $\mu$'s different? Are 3 the same and one different? If so, which one? etc.

Overall Type I Error Rate
We set up $\alpha$ as the significance level for a hypothesis test. Suppose we test 3 independent hypotheses, each at $\alpha$ = .05; each test has a type I error rate (rejecting $H_0$ when it's true) of .05. However,
P(at least one type I error in the 3 tests) = 1 - P(accept all 3, given all true) = 1 - (.95)^3 ≈ .14
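
The same calculation for other numbers of tests shows how quickly the overall error rate grows. A minimal sketch, assuming independent tests as in the text (the function name is illustrative):

```python
def familywise_error_rate(alpha, n_tests):
    """P(at least one type I error) across n independent tests, each at level alpha."""
    return 1.0 - (1.0 - alpha) ** n_tests

# Overall type I error rate when every test is run at alpha = .05.
for k in (1, 3, 6, 10):
    print(k, round(familywise_error_rate(0.05, k), 3))
```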

Pairwise Comparisons
Bonferroni Correction: Do a series of pairwise t-tests, each at the specified $\alpha$ value divided by the # of comparisons.
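
For the three medical centers, the adjustment works out as follows. This sketch only does the arithmetic of the correction; the variable names are illustrative.

```python
from itertools import combinations

centers = [1, 2, 3]                        # the three medical centers
pairs = list(combinations(centers, 2))     # all pairwise comparisons
overall_alpha = 0.05

per_test_alpha = overall_alpha / len(pairs)     # Bonferroni-adjusted level
individual_confidence = 1.0 - per_test_alpha    # confidence level per interval

print(pairs)                                    # [(1, 2), (1, 3), (2, 3)]
print(round(per_test_alpha, 4))                 # 0.0167
print(round(100 * individual_confidence, 1))    # 98.3
```

The 98.3% figure is the individual confidence level that appears in the Minitab pairwise-comparison output.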

MINITAB INPUT
Two columns, one row per observation:
center: 1, 1, 1, 1, ..., 3, 3, 3, 3
fev1: 3.23, 3.47, 1.86, 2.47, ..., 2.85, 2.43, 3.20, 3.53, ...

ONE FACTOR ANOVA (MINITAB)
MINITAB: STAT >> ANOVA >> ONE-WAY
Click for comparisons

Minitab Outputs

Fisher 98.3% Individual Confidence Intervals
All Pairwise Comparisons among Levels of center
Simultaneous confidence level = 95.58%

center = 1 subtracted from:
center      Lower     Center     Upper
2         -0.0049     0.4063    0.8176
3         -0.1215     0.2525    0.6266

center = 2 subtracted from:
center      Lower     Center     Upper
3         -0.5572    -0.1538    0.2496

(Each interval is also plotted on a common axis with ticks at -0.35, 0.00, 0.35 and 0.70.)
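
One way to read these intervals: a pair of centers differs significantly only if its interval excludes zero. The sketch below applies that rule to the three intervals reported above; the dictionary layout and names are illustrative.

```python
# (center subtracted, other center): (lower, estimated difference, upper), from the Minitab output.
intervals = {
    (1, 2): (-0.0049, 0.4063, 0.8176),
    (1, 3): (-0.1215, 0.2525, 0.6266),
    (2, 3): (-0.5572, -0.1538, 0.2496),
}

significant = {}
for (a, b), (lower, diff, upper) in intervals.items():
    contains_zero = lower <= 0.0 <= upper
    significant[(a, b)] = not contains_zero
    verdict = "no significant difference" if contains_zero else "significant difference"
    print(f"center {b} - center {a}: {diff:+.4f} [{lower:+.4f}, {upper:+.4f}] -> {verdict}")
```

All three intervals contain zero, which agrees with the overall F test not rejecting at the 5% level. The interval for center 2 minus center 1 only barely includes zero.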