Comparing Cell Means
Planned Comparisons & Post Hoc Tests
Questions

• What is the main difference between planned comparisons and post hoc tests?
• Generate numbers (like 0, 1, -1 or 1, -1/2, -1/2) to create a contrast appropriate for a given problem.
• How many independent comparisons can be made in a given design?
• What is the difference between a per comparison and a familywise error rate?
• How does Bonferroni deal with familywise error rate problems?
• What is the studentized range statistic? How is it used?
Questions (2)

• What is the difference between the Tukey HSD and the Newman-Keuls?
• What are the considerations when choosing a post hoc test (what do you need to trade off)?
• Describe (make up) a concrete example where you would use planned comparisons instead of an overall F test. Explain why the planned comparison is the proper analysis.
• Describe (make up) a concrete example where you would use a post hoc test. Explain why the post hoc test is needed (not the specific choice of post hoc test, but rather why a post hoc test at all).
Planned vs. Post Hoc

• Planned Comparisons or Contrasts
  – Use instead of the overall F test. Planned before the study.
• Post Hoc or Incidental tests
  – Use after a significant overall F test to investigate specific means. No specific plan before the study.

Example: a study with four conditions: Control, Comp Tutor, Comp Tutor + Lab, Comp Tutor + Lab + Quiz.
Planned Comparisons (1)

Population comparison:

$$\psi = c_1\mu_1 + c_2\mu_2 + \dots + c_J\mu_J = \sum_{j=1}^{J} c_j\mu_j$$

The weights are real numbers, not all zero, and the sum of the weights must equal zero:

$$\sum_{j=1}^{J} c_j = 0$$

Sample comparison:

$$\text{est. }\psi = \hat{\psi} = c_1\bar{y}_1 + c_2\bar{y}_2 + \dots + c_J\bar{y}_J = \sum_{j=1}^{J} c_j\bar{y}_j, \qquad \sum_{j=1}^{J} c_j = 0$$
Planned Comparison (2)
(3 possible comparisons)

Data:

       A1    A2    A3    A4
       22    26    28    21
       15    27    31    21
       17    24    27    26
       18    23    26    20
ȳj:    18    25    28    22

Comparison weights:

Comparison   A1     A2     A3     A4
1            1/2    1/2    -1/2   -1/2
2            1      -1     0      0
3            0      0      1      -1

Summary table:

Source          SS    df   MS   F
Cells (A1-A4)   219   3    73   12.17
Error           72    12   6
Total           291   15

Sample comparisons:

$$\hat{\psi}_1 = (.5)(18) + (.5)(25) - (.5)(28) - (.5)(22) = -3.5$$

$$\hat{\psi}_2 = 18 - 25 = -7$$

$$\hat{\psi}_3 = (0)(18) + (0)(25) + (28) - (22) = 6$$
Sampling Variance of Planned Comparisons

The sample comparison is an unbiased estimate of the population comparison:

$$E(\hat{\psi}) = \psi$$

The variance of the sampling distribution of the comparison is:

$$\operatorname{Var}(\hat{\psi}) = \sum_j c_j^2 \operatorname{var}(\bar{y}_j) = \sigma_e^2 \sum_j \frac{c_j^2}{n_j}$$

Sampling variance will be large when the within-cells variance is large, the weights are large, and the number of people in each cell is small. It is estimated by substituting MS error for $\sigma_e^2$:

$$\text{est. Var}(\hat{\psi}) = MS_{error} \sum_j \frac{c_j^2}{n_j}$$
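A small Python sketch of this estimate (the helper name est_var_contrast is just illustrative; the MS error of 6 and n = 4 per cell are taken from the example above):

```python
import numpy as np

def est_var_contrast(ms_error, weights, n_per_cell):
    """est. Var(psi_hat) = MS_error * sum_j (c_j**2 / n_j)."""
    weights = np.asarray(weights, dtype=float)
    n = np.asarray(n_per_cell, dtype=float)
    return ms_error * np.sum(weights**2 / n)

# MS_error = 6, weights (1/2, 1/2, -1/2, -1/2), n = 4 per cell
print(est_var_contrast(6.0, [0.5, 0.5, -0.5, -0.5], [4, 4, 4, 4]))  # 1.5
```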
Significance Test

Cell means from the earlier data: ȳ1 = 18, ȳ2 = 25, ȳ3 = 28, ȳ4 = 22; MS error = 6 with 12 df; n = 4 per cell.

$$\hat{\psi}_1 = (.5)(18) + (.5)(25) - (.5)(28) - (.5)(22) = -3.5$$

$$\text{est. Var}(\hat{\psi}) = MS_{error}\sum_j \frac{c_j^2}{n_j} = 6\,\frac{.5^2 + .5^2 + (-.5)^2 + (-.5)^2}{4} = 3/2$$

$$\text{est. SE}(\hat{\psi}) = \sqrt{3/2} = 1.2247$$

$$t = \frac{\hat{\psi}}{\text{est. SE}(\hat{\psi})} = \frac{-3.5}{1.2247} = -2.86$$

df = N - J = 16 - 4 = 12 = df error. The critical value is t(α = .05, df = 12) = 2.18, so t(12) = -2.86, p < .05.
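The same test in Python, as a hedged sketch assuming SciPy is available; it reproduces t = -2.86 on 12 df:

```python
import numpy as np
from scipy import stats

means = np.array([18.0, 25.0, 28.0, 22.0])     # cell means
c = np.array([0.5, 0.5, -0.5, -0.5])           # contrast weights
ms_error, df_error, n = 6.0, 12, 4             # from the summary table

psi_hat = c @ means                            # -3.5
se = np.sqrt(ms_error * np.sum(c**2 / n))      # sqrt(3/2) = 1.2247
t = psi_hat / se                               # -2.86
p = 2 * stats.t.sf(abs(t), df_error)           # two-tailed p value
t_crit = stats.t.ppf(1 - 0.05 / 2, df_error)   # 2.18

print(round(t, 2), round(p, 3), round(t_crit, 2))
```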
Significance Test (2)

Same data: ȳ1 = 18, ȳ2 = 25, MS error = 6 with 12 df, n = 4 per cell.

$$\hat{\psi}_2 = \text{A1 vs. A2} = 18 - 25 = -7$$

Fill in the remaining steps:

$$\text{est. Var}(\hat{\psi}) = MS_{error}\sum_j \frac{c_j^2}{n_j} = \;?$$

$$\text{est. SE}(\hat{\psi}) = \;?$$

$$t = \;?$$

df = N - J = ?,  t crit = ?
Review

• What is the main difference between planned comparisons and post hoc tests?
• Suppose I do a blind orange juice taste test and discover that my means are:

  Tropicana       7.3
  Florida Fresh   5.5
  Pulpmaster      6.4

  If my hypothesis is that Tropicana is better than all others, what are my contrast weights?
Independence of Planned Comparisons

You can make several planned comparisons on the same data. Some of these comparisons are independent; some are dependent. We want them independent. Two comparisons from a normal population with equal sample sizes in each cell are independent (orthogonal) if the sum of the products of their weights is zero:

$$\sum_j c_{1j} c_{2j} = 0$$

With unequal sample sizes, the condition is:

$$\sum_j \frac{c_{1j} c_{2j}}{n_j} = 0$$
Independence (2)

Comparison   A1     A2    A3     A4
1            -1/3   1     -1/3   -1/3
2            -1/2   0     -1/2   1
3            1/2    1/2   -1/2   -1/2

Comparisons 1 and 2:

$$\sum_j c_{1j}c_{2j} = (-1/3)(-1/2) + (1)(0) + (-1/3)(-1/2) + (-1/3)(1) = 0$$

Comparisons 1 and 3:

$$\sum_j c_{1j}c_{3j} = (-1/3)(1/2) + (1)(1/2) + (-1/3)(-1/2) + (-1/3)(-1/2) = 4/6 = 2/3$$

One and two are orthogonal; one and three are not. There are at most J - 1 mutually orthogonal comparisons. Use only what you need.
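A short sketch that checks the orthogonality condition for the comparisons in this table (equal cell sizes assumed, as on the slide; the helper name orthogonal is illustrative):

```python
import numpy as np

c1 = np.array([-1/3, 1, -1/3, -1/3])
c2 = np.array([-1/2, 0, -1/2, 1])
c3 = np.array([1/2, 1/2, -1/2, -1/2])

def orthogonal(a, b, n=None):
    """Sum of weight products (divided by n_j if cell sizes are unequal)."""
    s = np.sum(a * b / n) if n is not None else np.sum(a * b)
    return bool(np.isclose(s, 0.0)), s

print(orthogonal(c1, c2))  # (True, 0.0): comparisons 1 and 2 are orthogonal
print(orthogonal(c1, c3))  # (False, 0.666...): comparisons 1 and 3 are not
```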
Choosing Comparisons

Usually done on the basis of theory, but there are methods to generate all possible orthogonal comparisons. For example, with 5 groups:

Comparison   Group 1   Group 2   Group 3   Group 4   Group 5
1            4         -1        -1        -1        -1
2            0         3         -1        -1        -1
3            0         0         2         -1        -1
4            0         0         0         1         -1
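One way to generate such a set programmatically; this sketch (the function name orthogonal_set is illustrative) builds the "each group vs. all later groups" contrasts shown in the table, which is one of several possible orthogonal sets:

```python
import numpy as np

def orthogonal_set(J):
    """Build J-1 mutually orthogonal contrasts: group j vs. the mean of all later groups."""
    contrasts = []
    for j in range(J - 1):
        c = np.zeros(J)
        c[j] = J - 1 - j          # weight for the leading group
        c[j + 1:] = -1            # -1 for every later group
        contrasts.append(c)
    return np.array(contrasts)

C = orthogonal_set(5)
print(C)  # matches the 5-group table above
# Off-diagonal products are all zero, so the contrasts are pairwise orthogonal:
print(np.allclose(C @ C.T, np.diag(np.diag(C @ C.T))))  # True
```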
Error Rates

• With 1 test, we set alpha = Type I error rate.
• With multiple tests, the original (nominal) alpha is called the per comparison error rate, $\alpha_{PC}$.
• With comparisons, we have a family of tests on the same data. We want to know the probability of at least 1 Type I error in the family of tests. Such a probability is called the familywise error rate, $\alpha_{FW}$.
• For independent tests, $\alpha_{FW} = 1 - (1 - \alpha_{PC})^K$.
• E.g., 10 tests: $\alpha_{FW} = 1 - (1 - .05)^{10} = 1 - .95^{10} \approx 1 - .60 = .40$.
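The arithmetic behind that example, as a small Python check (the helper name familywise_alpha is illustrative):

```python
def familywise_alpha(alpha_pc, k):
    """alpha_FW = 1 - (1 - alpha_PC)**K for K independent tests."""
    return 1 - (1 - alpha_pc) ** k

print(round(familywise_alpha(0.05, 1), 3))   # 0.05  (one test)
print(round(familywise_alpha(0.05, 10), 3))  # 0.401 (ten tests: about .40)
```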
Bonferroni Tests

• Familywise error depends on the number of tests (K) and the nominal alpha, $\alpha_{PC}$.
• Bonferroni's solution is to set

$$\alpha_{BPC} = \frac{\alpha^{*}_{FW}}{K}$$

where $\alpha^{*}_{FW}$ is an aspiration level (the familywise error rate we are willing to accept).
• Suppose we want the FW error to be .05 and we will have 4 comparisons. Then

$$\alpha_{BPC} = \frac{.05}{4} = .0125$$

We use the adjusted alpha (.0125) for each of the 4 tests.
Bonferroni Test (2)

• Use the adjusted alpha (e.g., .0125) for each comparison.
• Look at the p value on the printout (use .0125 instead of .05).
• Use a statistical function (e.g., Excel, SAS) if you want to find the critical value.
• E.g., the Excel function TINV says that with p = .0125 and df = 12, t is 2.93.
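The same lookup in Python rather than Excel, assuming SciPy; TINV takes the two-tailed probability, so the equivalent quantile is 1 - p/2:

```python
from scipy import stats

alpha_fw, K, df = 0.05, 4, 12
alpha_bpc = alpha_fw / K                      # 0.0125 per comparison
t_crit = stats.t.ppf(1 - alpha_bpc / 2, df)   # two-tailed critical value
print(round(alpha_bpc, 4), round(t_crit, 2))  # 0.0125, about 2.93
```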
Review

• How many independent comparisons can be made in a given design?
• What is the difference between a per comparison and a familywise error rate?
• How does Bonferroni deal with familywise error rate problems?
Post Hoc Tests

• Given a significant F, where are the mean differences?
• Often do not have planned comparisons.
• Usually compare pairs of means.
• There are many methods of post hoc (after the fact) tests.
Scheffé

• Can be used for any contrast. Follows the same calculations, but uses a different critical value:

$$t = \frac{\hat{\psi}}{\sqrt{\text{est. var}(\hat{\psi})}}$$

• Instead of comparing the test statistic to a critical value of t, use

$$S = \sqrt{(J - 1)F}$$

where F is the critical value for the overall F test (J - 1 and N - J df).
Scheffé (2)

(Data and summary table from the earlier problem: MS error = 6, df error = 12.)

$$\hat{\psi}_1 = (.5)(18) + (.5)(25) - (.5)(28) - (.5)(22) = -3.5$$

$$\text{est. Var}(\hat{\psi}) = 6\,\frac{.5^2 + .5^2 + (-.5)^2 + (-.5)^2}{4} = 3/2$$

$$t = \frac{\hat{\psi}}{\sqrt{\text{est. var}(\hat{\psi})}} = \frac{-3.5}{1.2247} = -2.86$$

$$F_{(\alpha=.05,\;3,\;12)} = 3.49$$

$$S = \sqrt{(J-1)F} = \sqrt{(4-1)(3.49)} = 3.24$$

The comparison is not significant because |-2.86| < 3.24.
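A sketch of the Scheffé criterion in Python (SciPy assumed for the F quantile), reproducing S of about 3.24:

```python
import numpy as np
from scipy import stats

J, N = 4, 16
t = -2.86                                     # contrast t from above
F_crit = stats.f.ppf(1 - 0.05, J - 1, N - J)  # F(.05, 3, 12) = 3.49
S = np.sqrt((J - 1) * F_crit)                 # about 3.24
print(round(F_crit, 2), round(S, 2), abs(t) > S)  # 3.49, 3.24, False (not significant)
```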
Paired Comparisons

Newman-Keuls and Tukey HSD are two (of many) choices. Both depend on q, the studentized range statistic. Suppose we have J independent sample means and we find the largest and the smallest:

$$q = \frac{\bar{y}_{max} - \bar{y}_{min}}{\sqrt{MS_{error}/n}}$$

MS error comes from the ANOVA we did to get the J means. The n refers to the sample size per cell. If two cells have unequal sizes, use $2n_1 n_2/(n_1 + n_2)$ in place of n.

The sampling distribution of q depends on k, the number of means covered by the range (max - min), and on v, the degrees of freedom for MS error.
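A hedged sketch of computing q and its critical value in Python; it assumes scipy.stats.studentized_range, which is available in SciPy 1.7 and later (the group means are the ones used in the HSD example below):

```python
import numpy as np
from scipy import stats

means = np.array([63, 82, 80, 77, 70], dtype=float)
ms_error, n, df_error = 178.2, 12, 55

q = (means.max() - means.min()) / np.sqrt(ms_error / n)            # observed q
q_crit = stats.studentized_range.ppf(0.95, len(means), df_error)   # k = 5, v = 55
print(round(q, 2), round(q_crit, 2))  # about 4.93 and roughly 3.98
```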
Tukey HSD

HSD = honestly significant difference. For HSD, use k = J, the number of groups in the study. Choose alpha, and find the df for error. Look up the value qα. Then find:

$$HSD = q_{\alpha}\sqrt{\frac{MS_{error}}{n}}$$

Compare HSD to the absolute value of the difference between all pairs of means. Any difference larger than HSD is significant.
HSD (2)

Grp →   1    2    3    4    5
M →     63   82   80   77   70

Source   SS       df   MS      F      p
Grps     2942.4   4    735.6   4.13   <.05
Error    9801.0   55   178.2

K = 5 groups; n = 12 per group; v has 55 df. The tabled value of q with alpha = .05 is 3.98.

$$HSD = q\sqrt{\frac{MS_{error}}{n}} = 3.98\sqrt{\frac{178.2}{12}} = 15.34$$

Ordered means and pairwise differences (* = difference larger than HSD = 15.34):

Group (M)   1 (63)   5 (70)   4 (77)   3 (80)   2 (82)
1 (63)      0        7        14       17*      19*
5 (70)               0        7        10       12
4 (77)                        0        3        5
3 (80)                                 0        2
2 (82)                                          0
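A sketch that reproduces the HSD of about 15.34 and flags the same two significant pairs; the means, MS error, and the tabled q = 3.98 are taken from the slide above:

```python
import numpy as np
from itertools import combinations

means = {1: 63, 2: 82, 3: 80, 4: 77, 5: 70}   # group means
ms_error, n = 178.2, 12
q_crit = 3.98                                 # tabled q for k = 5, v = 55, alpha = .05

hsd = q_crit * np.sqrt(ms_error / n)          # about 15.34
print(round(hsd, 2))

for (g1, m1), (g2, m2) in combinations(means.items(), 2):
    diff = abs(m1 - m2)
    if diff > hsd:
        print(f"groups {g1} vs {g2}: |diff| = {diff} > HSD")  # 1 vs 2 (19), 1 vs 3 (17)
```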
Newman-Keuls

Same ordered means. Layer refers to how many means apart the two means being compared are in the ordered list (adjacent means are layer 1; the largest vs. the smallest here is layer 4).

Group (M)   1 (63)   5 (70)   4 (77)   3 (80)   2 (82)
1 (63)      0        7        14*      17*      19*
5 (70)               0        7        10       12
4 (77)                        0        3        5
3 (80)                                 0        2
2 (82)                                          0

Same as HSD except the value of q changes with layers. For layer k - 1 (here layer 4), use the HSD. For each layer down, subtract 1 from the value of k for the tabled value of q:

$$NK_4 = q\sqrt{\frac{MS_{error}}{n}} = 3.98\sqrt{\frac{178.2}{12}} = 15.34$$

$$NK_3 = 3.74\sqrt{\frac{178.2}{12}} = 14.40$$

$$NK_2 = 3.40\sqrt{\frac{178.2}{12}} = 13.09$$

$$NK_1 = 2.83\sqrt{\frac{178.2}{12}} = 10.90$$
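A sketch of the layer-by-layer comparison described above, plugging in the tabled q values (2.83, 3.40, 3.74, 3.98 for 55 error df); it flags the same three starred differences:

```python
import numpy as np

ms_error, n = 178.2, 12
q_by_k = {2: 2.83, 3: 3.40, 4: 3.74, 5: 3.98}   # tabled q for k means spanned, v = 55

# Ordered means (smallest to largest) with their group labels
ordered = [(1, 63), (5, 70), (4, 77), (3, 80), (2, 82)]

for i in range(len(ordered)):
    for j in range(i + 1, len(ordered)):
        k = j - i + 1                             # number of means spanned by the range
        crit = q_by_k[k] * np.sqrt(ms_error / n)  # critical value for this layer
        diff = ordered[j][1] - ordered[i][1]
        if diff > crit:
            print(f"{ordered[i][0]} vs {ordered[j][0]}: {diff} > {crit:.2f}")
```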
Comparing Post Hoc Tests

The Newman-Keuls found 3 significant differences in our example. The HSD found 2 differences. If we had used the Bonferroni approach, we would have found an interval of 15.91 required for significance (and therefore the same two significant differences as HSD). Thus, power descends from the Newman-Keuls to the HSD to the Bonferroni. The Type I error rates go just the opposite way: lowest for the Bonferroni, then the HSD, and finally the Newman-Keuls. Do you want to be liberal or conservative in your choice of tests? It is a trade-off of Type I error vs. power.
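For the Bonferroni figure quoted above, a hedged check in Python (assuming all 10 pairwise comparisons, so .05/10 per test, and SciPy for the t quantile); it gives roughly 15.9, matching the slide's 15.91 up to rounding of the tabled t value:

```python
import numpy as np
from scipy import stats

ms_error, n, df_error = 178.2, 12, 55
n_pairs = 10                                    # all pairs among 5 groups
alpha_bpc = 0.05 / n_pairs                      # 0.005 per comparison
t_crit = stats.t.ppf(1 - alpha_bpc / 2, df_error)
interval = t_crit * np.sqrt(2 * ms_error / n)   # minimum significant pairwise difference
print(round(interval, 2))                       # about 15.9
```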
Review

• What is the studentized range statistic? How is it used?
• What is the difference between the Tukey HSD and the Newman-Keuls?
• What are the considerations when choosing a post hoc test (what do you need to trade off)?
• Describe (make up) a concrete example where you would use planned comparisons instead of an overall F test. Explain why the planned comparison is the proper analysis.
• Describe (make up) a concrete example where you would use a post hoc test. Explain why the post hoc test is needed (not the specific choice of post hoc test, but rather why a post hoc test at all).