Analysis of Variance
11/6
Comparing Several Groups
Section: M 8-10, M 12-2, T 9-11, W 8-10, R 9-11
Exam 1 Mean: 76.6%, 79.5%, 82.5%, 83.7%, 82.9%
• Do the group means differ?
• Naive approach
– Independent-samples t-tests of all pairs
– Each test doesn't use all data → Less power
– 10 total tests → Greater chance of Type I error
• Analysis of Variance (ANOVA)
– Single test for any group differences
– Null hypothesis: All means are equal
– Works using variance of the sample means
– Also based on separating explained and unexplained variance
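To make the Type I error concern concrete, here is a minimal sketch that counts the pairwise tests for 5 groups and estimates the chance of at least one false positive. The per-test alpha of .05 and the independence of the tests are assumptions for illustration; pairwise t-tests on shared groups are not truly independent, so the figure is only a rough guide.

```python
from math import comb


def pairwise_test_count(k: int) -> int:
    """Number of distinct pairs among k groups."""
    return comb(k, 2)


def familywise_error_rate(num_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one Type I error, ASSUMING independent tests."""
    return 1 - (1 - alpha) ** num_tests


tests = pairwise_test_count(5)                    # 5 lab sections
print(tests)                                      # 10
print(round(familywise_error_rate(tests), 3))     # 0.401
```

Under these assumptions, ten tests at alpha = .05 give roughly a 40% chance of at least one spurious difference, which is why a single overall test is attractive.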
Variance of Sample Means
• Say we have k groups of n subjects each
– Want to test whether $\mu_i$ are all equal for i = 1,…,k
• Even if all population means are equal, sample means will vary
– Central limit theorem tells how much: $\sigma_M^2 = \sigma^2 / n$
• Compare actual variance of sample means to amount expected by chance
– $\mathrm{var}(M)$ vs. $\sigma^2 / n$, or $n \times \mathrm{var}(M)$ vs. $\sigma^2$
– Estimate $\sigma^2$ using MSresidual
– Test statistic: $F = \dfrac{n \times \mathrm{var}(M)}{MS_{\text{residual}}}$
• If F is large enough
– Sample means vary more than expected by chance
– Reject null hypothesis
[Figure: distribution of sample means around $\mu$ (axes: z, p), with sample means M1, M2, M3, …, Mk marked]
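The claim that sample means vary by about $\sigma^2 / n$ even when all population means are equal can be checked with a small simulation. This is a sketch only: the population mean (81), standard deviation (14), number of simulated groups (2000), and seed are made-up values chosen for illustration, not figures from the lecture.

```python
import random
import statistics


def variance_of_sample_means(n, num_groups, mu, sigma, seed=0):
    """Draw num_groups samples of size n from one normal population
    and return the variance of their sample means."""
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(num_groups)
    ]
    return statistics.variance(means)


n, sigma = 22, 14.0            # hypothetical group size and population SD
observed = variance_of_sample_means(n, num_groups=2000, mu=81.0, sigma=sigma)
expected = sigma ** 2 / n      # central limit theorem: sigma^2 / n
print(observed, expected)      # the two values should be close
```

With many simulated groups, the observed variance of the means settles near $\sigma^2 / n$; ANOVA asks whether the real group means vary noticeably more than that.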
Residual Mean Square
• Best estimate of population variance, $\sigma^2$
• Based on remaining variability after removing group differences
• Extends MS for independent-samples t-test
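2 groups:
$$MS_{\text{residual}} = \frac{\sum_{\text{group 1}} (X_1 - M_1)^2 + \sum_{\text{group 2}} (X_2 - M_2)^2}{n_1 + n_2 - 2}$$
k groups:
$$MS_{\text{residual}} = \frac{\sum_{\text{group 1}} (X_1 - M_1)^2 + \dots + \sum_{\text{group } k} (X_k - M_k)^2}{n_1 + \dots + n_k - k}$$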
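The k-group formula translates directly into code. The sketch below assumes the data arrive as one list of scores per group; the function name `ms_residual` and the sample scores are hypothetical.

```python
from statistics import fmean


def ms_residual(groups):
    """Pooled within-group variance: summed squared deviations from each
    group's own mean, divided by (total N - number of groups)."""
    ss = 0.0
    for scores in groups:
        m = fmean(scores)
        ss += sum((x - m) ** 2 for x in scores)
    df = sum(len(scores) for scores in groups) - len(groups)
    return ss / df


groups = [[1, 2, 3], [2, 4, 6]]   # made-up scores for two groups
print(ms_residual(groups))        # (2 + 8) / (6 - 2) = 2.5
```

With two groups this reduces to the pooled variance used in the independent-samples t-test.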
Example: Lab Sections
• Group means:
M = [76.6 82.5 83.7 79.5 82.9]
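• Variance of group means:
$$\mathrm{mean}(M) = \frac{\sum_i M_i}{5} = 81.0$$
$$\mathrm{var}(M) = \frac{\sum_i (M_i - 81.0)^2}{5 - 1} = 8.69$$
• Var(M) expected by chance: $\sigma^2 / 22$
– $\mathrm{var}(M)$ vs. $\sigma^2 / 22$
– Instead: $22 \times \mathrm{var}(M)$ vs. $\sigma^2$
– 191.1 vs. $\sigma^2$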
• Estimate of $\sigma^2$:
$$MS_{\text{residual}} = \frac{\sum_{\text{Sec 1}} (X - M_1)^2 + \sum_{\text{Sec 2}} (X - M_2)^2 + \sum_{\text{Sec 3}} (X - M_3)^2 + \sum_{\text{Sec 4}} (X - M_4)^2 + \sum_{\text{Sec 5}} (X - M_5)^2}{110 - 5} = 199.1$$
• Ratio of between-group variance to amount expected by chance:
$$F_{4,105} = \frac{191.1}{199.1} = .96$$
[Figure: density $P(F_{4,105})$ against F, marking F = .96 (p = .44) and Fcrit = 2.46]
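The arithmetic of this example can be reproduced in a few lines. The section means, the group size of 22, MSresidual = 199.1, and Fcrit = 2.46 are taken from the slides; the raw scores are not available, so MSresidual is entered as a given rather than computed.

```python
import statistics

section_means = [76.6, 82.5, 83.7, 79.5, 82.9]
n_per_section = 22
ms_residual = 199.1      # taken from the slide, not recomputed
f_crit = 2.46            # critical value quoted on the slide

grand_mean = statistics.fmean(section_means)
var_of_means = statistics.variance(section_means)   # divides by 5 - 1
between = n_per_section * var_of_means              # n x var(M)
f_stat = between / ms_residual
reject_null = f_stat > f_crit

print(round(grand_mean, 1), round(var_of_means, 2),
      round(between, 1), round(f_stat, 2))          # 81.0 8.69 191.1 0.96
print(reject_null)                                  # False
```

Because F = .96 falls below the critical value, the section means do not vary more than chance alone would produce.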
Partitioning Sum of Squares
• General approach to ANOVA uses sums of squares
– Works with unequal sample sizes and other complications
– Breaks total variability into explained and unexplained parts
• SStotal
– Sum of squares for all data, treated as a single sample
– Based on differences from grand mean, M
$$SS_{\text{total}} = \sum_{\text{all groups}} (X - M)^2$$
• SSresidual (SSwithin)
– Variability within groups, just like with independent-samples t-test
– Not explainable by group differences
$$SS_{\text{residual}} = \sum_{i=1}^{k} \Bigl( \sum_{\text{group } i} (X_i - M_i)^2 \Bigr) = \sum_{\text{group 1}} (X_1 - M_1)^2 + \dots + \sum_{\text{group } k} (X_k - M_k)^2$$
• SStreatment (SSbetween)
– Variability explainable by difference among groups
– Treat each datum as its group mean and take difference from grand mean
$$SS_{\text{treatment}} = \sum_{i=1}^{k} \bigl( n_i \times (M_i - M)^2 \bigr)$$
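The three sums of squares can be computed side by side. This is a minimal sketch assuming one list of scores per group; the function name `sums_of_squares` and the data (with deliberately unequal group sizes) are hypothetical.

```python
from statistics import fmean


def sums_of_squares(groups):
    """Return (SS_total, SS_residual, SS_treatment) for a list of groups."""
    all_scores = [x for scores in groups for x in scores]
    grand_mean = fmean(all_scores)
    ss_total = sum((x - grand_mean) ** 2 for x in all_scores)
    ss_residual = 0.0
    ss_treatment = 0.0
    for scores in groups:
        m = fmean(scores)
        # Within-group variability around the group's own mean.
        ss_residual += sum((x - m) ** 2 for x in scores)
        # Each datum replaced by its group mean, compared to the grand mean.
        ss_treatment += len(scores) * (m - grand_mean) ** 2
    return ss_total, ss_residual, ss_treatment


groups = [[1, 2, 3], [2, 4, 6], [5, 7]]   # made-up, unequal sample sizes
print(sums_of_squares(groups))            # (31.5, 12.0, 19.5)
```

In this example the residual and treatment parts (12.0 and 19.5) add up to the total (31.5).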
Magic of Squares
SStotal = SStreatment + SSresidual
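$$SS_{\text{total}} = \sum (X - M)^2$$
$$SS_{\text{residual}} = \sum_i \Bigl( \sum (X_i - M_i)^2 \Bigr)$$
$$SS_{\text{treatment}} = \sum_i \bigl( n_i (M_i - M)^2 \bigr)$$
[Figure: group means M1, M2, M3, M4 and grand mean M, with labels ×5, ×6, ×5, ×4]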
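The slides state the partition without proof. One standard way to see why it holds, sketched here in the same symbols, is to split each deviation from the grand mean into a within-group part and a between-group part:

```latex
\begin{align*}
X - M &= (X - M_i) + (M_i - M) \qquad \text{for a score } X \text{ in group } i \\
\sum_{\text{group } i} (X - M)^2
  &= \sum_{\text{group } i} (X - M_i)^2
   + 2 (M_i - M) \sum_{\text{group } i} (X - M_i)
   + n_i (M_i - M)^2 \\
  &= \sum_{\text{group } i} (X - M_i)^2 + n_i (M_i - M)^2
   \qquad \text{since } \sum_{\text{group } i} (X - M_i) = 0 \\
SS_{\text{total}} = \sum_{i} \sum_{\text{group } i} (X - M)^2
  &= SS_{\text{residual}} + SS_{\text{treatment}}
\end{align*}
```

The cross term vanishes because deviations from a group's own mean always sum to zero, which is what makes the squares add up exactly.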
Test Statistic for ANOVA
• Goal: Determine whether group differences account for more variability than expected by chance
• Same approach as with regression
• Calculate Mean Square: $MS_{\text{treatment}} = \dfrac{SS_{\text{treatment}}}{df_{\text{treatment}}}$
• H0: $MS_{\text{treatment}} \approx \sigma^2$
• Estimate $\sigma^2$ using $MS_{\text{residual}} = \dfrac{SS_{\text{residual}}}{df_{\text{residual}}}$
• Test statistic: $F = \dfrac{MS_{\text{treatment}}}{MS_{\text{residual}}}$
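Putting the pieces together, the test statistic can be sketched as one function that works from raw scores. The function name `one_way_anova_f` and the data are hypothetical; obtaining a p-value would additionally require the F distribution, which is left out here.

```python
from statistics import fmean


def one_way_anova_f(groups):
    """Return (F, df_treatment, df_residual) for a list of score lists."""
    k = len(groups)
    total_n = sum(len(scores) for scores in groups)
    grand_mean = fmean(x for scores in groups for x in scores)

    ss_treatment = 0.0
    ss_residual = 0.0
    for scores in groups:
        m = fmean(scores)
        ss_treatment += len(scores) * (m - grand_mean) ** 2
        ss_residual += sum((x - m) ** 2 for x in scores)

    df_treatment = k - 1
    df_residual = total_n - k
    ms_treatment = ss_treatment / df_treatment
    ms_residual = ss_residual / df_residual
    return ms_treatment / ms_residual, df_treatment, df_residual


groups = [[1, 2, 3], [2, 4, 6], [5, 7]]        # made-up scores
f_stat, df_t, df_r = one_way_anova_f(groups)
print(round(f_stat, 4), df_t, df_r)            # 4.0625 2 5
```

Here SStreatment = 19.5 on 2 degrees of freedom and SSresidual = 12.0 on 5, giving F = 9.75 / 2.4.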
Lab Sections Revisited
• Two approaches
– Variance of section means
– Mean Square for treatment (lab section)
• Variance of group means
– Var(76.6 82.5 83.7 79.5 82.9) = 8.69
– Expected by chance: $\sigma^2 / 22$
– $\mathrm{var}(M)$ vs. $\sigma^2 / 22$, or $22 \times \mathrm{var}(M)$ vs. $\sigma^2$
• Mean Square for treatment
$$MS_{\text{treatment}} = \frac{\sum_i n_i (M_i - M)^2}{5 - 1} = 22 \times \mathrm{var}(M)$$
where each $n_i = 22$ and the grand mean $M$ is mean(M)
$$= \frac{22 (76.6 - 81.0)^2 + 22 (82.5 - 81.0)^2 + 22 (83.7 - 81.0)^2 + 22 (79.5 - 81.0)^2 + 22 (82.9 - 81.0)^2}{4} = 191.1$$
$$F = \frac{191.1}{MS_{\text{residual}}} = \frac{191.1}{199.1} = .96$$
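The two approaches can be checked against each other numerically. This sketch uses the section means and group size from the slides; treating the mean of the section means as the grand mean is valid here only because every section has the same size.

```python
import statistics

section_means = [76.6, 82.5, 83.7, 79.5, 82.9]
n = 22                                   # students per section
k = len(section_means)

# Approach 1: n times the variance of the section means.
via_variance = n * statistics.variance(section_means)

# Approach 2: mean square for treatment from the sum-of-squares formula.
grand_mean = statistics.fmean(section_means)   # equal n, so this is the grand mean
ss_treatment = sum(n * (m - grand_mean) ** 2 for m in section_means)
ms_treatment = ss_treatment / (k - 1)

print(round(via_variance, 1), round(ms_treatment, 1))   # 191.1 191.1
```

Both routes give the same numerator for F, as the slide's algebra indicates.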
Two Views of ANOVA
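• Simple: Equal sample sizes
$$\frac{\mathrm{var}(M)}{\sigma^2 / n} = \frac{n \times \mathrm{var}(M)}{\sigma^2}, \qquad F = \frac{n \times \mathrm{var}(M)}{MS_{\text{residual}}}$$
[Figure: distribution of sample means around $\mu$ with variance $\sigma^2 / n$ (axes: z, p), with sample means M1, M2, M3, …, Mk marked]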
• General: Using sums of squares
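$$F = \frac{MS_{\text{treatment}}}{MS_{\text{residual}}}$$
$$MS_{\text{treatment}} = \frac{SS_{\text{treatment}}}{df_{\text{treatment}}} = \frac{\sum_i n_i (M_i - M)^2}{k - 1}$$
If sample sizes equal:
$$MS_{\text{treatment}} = n \times \frac{\sum_i (M_i - M)^2}{k - 1} = n \times \mathrm{var}(M)$$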
Degrees of Freedom
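• $SS_{\text{treatment}} \approx SS(M)$: $df_{\text{treatment}} = k - 1$
• $SS_{\text{total}} = \sum (X - M)^2$: $df_{\text{total}} = \sum n_i - 1$
• $SS_{\text{residual}} = \sum_i SS(X_i)$: $df_{\text{residual}} = \sum (n_i - 1) = \sum n_i - k$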
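The degrees of freedom follow from the group sizes alone, and they partition just as the sums of squares do. A small sketch, with the hypothetical helper name `anova_degrees_of_freedom` and the five sections of 22 students from the lab example:

```python
def anova_degrees_of_freedom(group_sizes):
    """Degrees of freedom for a one-way ANOVA, given each group's size."""
    k = len(group_sizes)
    total_n = sum(group_sizes)
    return {
        "treatment": k - 1,        # k group means, one grand mean estimated
        "residual": total_n - k,   # sum of (n_i - 1) over groups
        "total": total_n - 1,      # all data treated as a single sample
    }


df = anova_degrees_of_freedom([22] * 5)
print(df)   # {'treatment': 4, 'residual': 105, 'total': 109}
```

These match the F(4, 105) used for the lab sections, and the treatment and residual values sum to the total.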