Multiple Comparisons

Topics:
• Control of the error rate
• Pairwise comparisons
• Comparisons to a control
• Linear contrasts
Multiple Comparison Procedures
Once we reject $H_0: \mu_1 = \mu_2 = \dots = \mu_c$ in favor of
$H_1$: NOT all $\mu$'s are equal, we don't yet
know the way in which they're not all
equal, only that they're not all the
same. If there are 4 columns, are all 4 $\mu$'s
different? Are 3 the same and one
different? If so, which one? etc.
These "more detailed" inquiries into the
process are called MULTIPLE
COMPARISON PROCEDURES.
Errors (Type I):
We set $\alpha$ as the significance level for a
hypothesis test. Suppose we test 3 independent
hypotheses, each at $\alpha = .05$; each test has a Type I
error probability (rejecting $H_0$ when it's true) of .05. However,
P(at least one Type I error in the 3 tests)
= 1 - P(accept all 3, given all true) = $1 - (.95)^3 \approx .14$
In other words, the probability is .14 that at
least one Type I error is made. For 5
tests, the probability is $1 - (.95)^5 \approx .23$.
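The arithmetic above can be checked with a few lines of Python. This is a minimal sketch that assumes the tests are independent; the function name `familywise_error_rate` is made up for illustration.

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """P(at least one Type I error) over n_tests independent tests,
    each run at significance level alpha, when every H0 is true."""
    return 1.0 - (1.0 - alpha) ** n_tests


# Per-test alpha = .05, for 1, 3 and 5 independent tests.
for k in (1, 3, 5):
    print(k, round(familywise_error_rate(0.05, k), 3))
```

For 3 and 5 tests this gives roughly .14 and .23, matching the values on the slide.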
Question: Should we choose $\alpha = .05$ for each test,
and suffer (for 5 tests) a .23
experimentwise error rate ("$a$" or $\alpha_E$)?
OR
Should we control the overall
error rate, $a$, to be .05, and find the
individual-test $\alpha$ from $1 - (1-\alpha)^5 = .05$ (which
gives $\alpha \approx .010$)?
The formula $1 - (1-\alpha)^5 = .05$
is valid only if the tests are
independent; often they're not.
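Going the other way, from a target experimentwise rate to the per-test level, means solving $1 - (1-\alpha)^k = a$ for $\alpha$. The sketch below does that under the same independence assumption; `per_test_alpha` is a hypothetical helper name.

```python
def per_test_alpha(family_alpha: float, n_tests: int) -> float:
    """Per-test significance level that yields the requested experimentwise
    error rate over n_tests independent tests (solves 1-(1-alpha)^k = a)."""
    return 1.0 - (1.0 - family_alpha) ** (1.0 / n_tests)


alpha_5 = per_test_alpha(0.05, 5)
print(round(alpha_5, 4))                        # per-test level for 5 tests
print(round(1.0 - (1.0 - alpha_5) ** 5, 4))     # plugs back in to the family rate
```

For 5 tests the per-test level comes out near .0102. When the tests are dependent this value is only a rough guide.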
[e.g., three tests: (1) $\mu_1 = \mu_2$, (2) $\mu_2 = \mu_3$, (3) $\mu_1 = \mu_3$.
IF (1) is accepted and (2) is rejected, isn't
it more likely that (3) is rejected?]
Error Rates
When the tests are not independent, it's
usually very difficult to arrive at the
correct $\alpha$ for an individual test so that a
specified value results for the
experimentwise error rate (also called the
family error rate).
There are many multiple comparison
procedures. We’ll cover only a few.
Pairwise Comparisons
Method 1: (Fisher Test) Do a series of
pairwise t-tests, each with a specified $\alpha$ value
(for the individual test).
This is called "Fisher's LEAST SIGNIFICANT
DIFFERENCE" (LSD).
Example: Broker Study
A financial firm would like to determine if brokers they use to
execute trades differ with respect to their ability to provide a
stock purchase for the firm at a low buying price per share. To
measure cost, an index, Y, is used.
Y=1000(A-P)/A
where
P=per share price paid for the stock;
A=average of high price and low price per share, for
the day.
“The higher Y is the better the trade is.”
Col: broker (R = 6 trades per broker)

Trade          1    2    3    4    5
1             12    7    8   21   24
2              3   17    1   10   13
3              5   13    7   15   14
4             -1   11    4   12   18
5             12    7    3   20   14
6              5   17    7    6   19
Column mean    6   12    5   14   17

Five brokers were in the study and six trades
were randomly assigned to each broker.
Source   SSQ     df   MSQ     F
Col      640.8    4   160.2   7.56
Error    530     25    21.2   (the Error MSQ is "MSW")

$\alpha = .05$, F table value (FTV) = 2.76
Since 7.56 > 2.76, reject equal column MEANS.
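The ANOVA table can be reproduced from the broker data listed earlier. The following is a plain-Python sketch using only the standard library; the function name `one_way_anova` and the dictionary layout are illustrative choices, not part of the original slides.

```python
from statistics import mean

# Index values Y for each broker (six trades each), from the broker study.
brokers = {
    1: [12, 3, 5, -1, 12, 5],
    2: [7, 17, 13, 11, 7, 17],
    3: [8, 1, 7, 4, 3, 7],
    4: [21, 10, 15, 12, 20, 6],
    5: [24, 13, 14, 18, 14, 19],
}


def one_way_anova(groups):
    """Return (SS_col, df_col, SS_error, df_error, MSW, F) for one-way ANOVA."""
    all_values = [y for ys in groups.values() for y in ys]
    grand_mean = mean(all_values)
    # Between-column sum of squares: group size times squared mean deviation.
    ss_col = sum(len(ys) * (mean(ys) - grand_mean) ** 2 for ys in groups.values())
    # Within-column (error) sum of squares around each column mean.
    ss_err = sum((y - mean(ys)) ** 2 for ys in groups.values() for y in ys)
    df_col = len(groups) - 1
    df_err = len(all_values) - len(groups)
    msw = ss_err / df_err
    f_calc = (ss_col / df_col) / msw
    return ss_col, df_col, ss_err, df_err, msw, f_calc


ss_col, df_col, ss_err, df_err, msw, f_calc = one_way_anova(brokers)
print(round(ss_col, 1), df_col, round(ss_err, 1), df_err, round(msw, 1), round(f_calc, 2))
```

The F table value 2.76 is taken from the slide rather than computed, so the comparison `f_calc > 2.76` is left to the reader.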
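For any comparison of 2 columns, the acceptance region (AR) for
$\bar{Y}_i - \bar{Y}_j$ runs from $C_L$ to $C_U$, centered at 0,
with area $\alpha/2$ in each tail:
AR: $0 \pm t_{\alpha/2} \sqrt{MSW \left( \frac{1}{n_i} + \frac{1}{n_j} \right)}$
where $t_{\alpha/2}$ has $df_W$ degrees of freedom
($n_i = n_j = 6$, here) and MSW is the
pooled variance, the estimate
of the common variance.
In our example, with $\alpha = .05$:
$0 \pm 2.060 \sqrt{21.2 \left( \frac{1}{6} + \frac{1}{6} \right)} = 0 \pm 5.48$
This value, 5.48, is called the Least
Significant Difference (LSD).
With the same number of data points, R,
in each column, $LSD = t_{\alpha/2} \sqrt{\frac{2 \, MSW}{R}}$.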
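As a quick numerical check of the LSD, here is a small sketch. The critical value 2.060 is copied from the slide (the t table at 25 df) rather than computed, and `lsd` is a hypothetical function name.

```python
from math import sqrt


def lsd(t_crit: float, msw: float, n_i: int, n_j: int) -> float:
    """Least significant difference for comparing columns i and j."""
    return t_crit * sqrt(msw * (1.0 / n_i + 1.0 / n_j))


# t_{.025} with 25 df = 2.060 (from the slide), MSW = 21.2, six trades per broker.
print(round(lsd(2.060, 21.2, 6, 6), 2))  # 5.48
```

With equal column sizes R the same value follows from `t_crit * sqrt(2 * msw / R)`.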
Underline Diagram
Summarize the comparison results. (p. 443)
Step 1: rank order the column means:
Col:    3    1    2    4    5
Mean:   5    6   12   14   17
Step 2: identify adjacent differences > 5.48, and
mark the breaks accordingly:
Col:    3    1  |  2    4    5
Mean:   5    6  | 12   14   17
Step 3: compare the pairs of means within
each subset:
Comparison   |difference|   vs. LSD
3 vs. 1          *             <
2 vs. 4          *             <
2 vs. 5          5             <
4 vs. 5          *             <
* Contiguous; no need to detail
Conclusion: two groups, {3, 1} and {2, 4, 5} (each group shares one underline).
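Can get "inconsistency": suppose the col 5 mean
were 18:
Col:    3    1    2    4    5
Mean:   5    6   12   14   18
Now:
Comparison   |difference|   vs. LSD
3 vs. 1          *             <
2 vs. 4          *             <
2 vs. 5          6             >
4 vs. 5          *             <
A single underline under 2, 4, 5 no longer works (???), because 2 and 5 now differ.
Conclusion: overlapping underlines, {3, 1}, {2, 4}, {4, 5}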
• Brokers 1 and 3 are not significantly
different from each other, but they are significantly
different from the other 3 brokers.
• Brokers 2 and 4 are not significantly different,
and brokers 4 and 5 are not significantly
different, but broker 2 is significantly different from (smaller
than) broker 5.
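The pairwise bookkeeping behind these conclusions can be sketched as follows. The helper `significant_pairs` is hypothetical; it simply flags every pair of column means whose absolute difference exceeds a threshold (here the LSD of 5.48), first for the observed means and then for the variant in which the col 5 mean is 18.

```python
from itertools import combinations


def significant_pairs(means, threshold):
    """Pairs of column labels (in rank order) whose means differ by more than threshold."""
    ranked = sorted(means, key=means.get)  # labels ordered by increasing mean
    return [(a, b) for a, b in combinations(ranked, 2)
            if abs(means[a] - means[b]) > threshold]


observed = {1: 6, 2: 12, 3: 5, 4: 14, 5: 17}
modified = dict(observed)
modified[5] = 18  # the "inconsistency" scenario

LSD = 5.48
print(significant_pairs(observed, LSD))
print(significant_pairs(modified, LSD))
```

The only pair that changes status between the two runs is (2, 5), which is why brokers 2, 4 and 5 can no longer share a single underline.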
Minitab: Stat>>ANOVA>>One-Way Anova then click "comparisons".
Fisher's pairwise comparisons (Minitab)
Family error rate = 0.268
Individual error rate = 0.0500
Critical value = 2.060  (this is $t_{\alpha/2}$; not given in version 16.1)
Intervals for (column level mean) - (row level mean)

            1          2          3          4
2     -11.476
       -0.524
3      -4.476      1.524
        6.476     12.476
4     -13.476     -7.476    -14.476
       -2.524      3.476     -3.524
5     -16.476    -10.476    -17.476     -8.476
       -5.524      0.476     -6.524      2.476

Reading the intervals: (-11.476, -0.524) excludes 0, so Col 1 < Col 2;
(-7.476, 3.476) contains 0, so Col 2 = Col 4.
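The intervals in this output are each (column mean) - (row mean) plus or minus the LSD. A short sketch, assuming the critical value 2.060 and MSW = 21.2 from the slides (`difference_interval` is an illustrative name):

```python
from math import sqrt


def difference_interval(mean_col, mean_row, half_width):
    """Interval for (column mean - row mean): point estimate +/- half_width."""
    diff = mean_col - mean_row
    return diff - half_width, diff + half_width


means = {1: 6, 2: 12, 3: 5, 4: 14, 5: 17}
half_width = 2.060 * sqrt(2 * 21.2 / 6)  # Fisher LSD with R = 6

for row in (2, 3, 4, 5):
    for col in range(1, row):
        lo, hi = difference_interval(means[col], means[row], half_width)
        # An interval that excludes 0 signals a significant difference.
        verdict = "differ" if lo > 0 or hi < 0 else "not significantly different"
        print(f"col {col} - row {row}: ({lo:.3f}, {hi:.3f})  {verdict}")
```

Swapping in a different half-width (for example Tukey's HSD) gives the corresponding intervals for that method.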
Minitab Output for Broker Data
Grouping Information Using Fisher Method

broker  N    Mean  Grouping
5       6  17.000  A
4       6  14.000  A
2       6  12.000  A
1       6   6.000  B
3       6   5.000  B

Means that do not share a letter are significantly
different.
Pairwise comparisons
Method 2: (Tukey Test) A procedure which
controls the experimentwise error rate is
"TUKEY'S HONESTLY SIGNIFICANT
DIFFERENCE TEST".
Tukey's method works in a similar way
to Fisher's LSD, except that the "LSD"
counterpart ("HSD") is not
$t_{\alpha/2} \sqrt{MSW \left( \frac{1}{n_i} + \frac{1}{n_j} \right)}$
(or, for an equal number of data points per column, $t_{\alpha/2} \sqrt{\frac{2 \, MSW}{R}}$),
but
$HSD = tuk_{\alpha/2} \sqrt{\frac{2 \, MSW}{R}}$,
where tuk has been computed to take
into account all the inter-dependencies
of the different comparisons.
_______________________________________
A more general approach is to write
$HSD = q_{\alpha} \sqrt{\frac{MSW}{R}}$
where
$q_{\alpha} = tuk_{\alpha/2} \times \sqrt{2}$
--- $q = (\bar{Y}_{largest} - \bar{Y}_{smallest}) / \sqrt{MSW / R}$
--- the probability distribution of q is called the
"Studentized Range Distribution".
--- $q = q(c, df)$, where c = number of columns,
and df = df of MSW
With c = 5 and df = 25,
from table (or Minitab):
q = 4.15
tuk = 4.15/1.414 = 2.93
Then,
$HSD = 4.15 \sqrt{21.2/6} = 7.80$
also $2.93 \sqrt{2(21.2)/6} = 7.80$ (up to rounding of tuk)
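The two equivalent forms of the HSD can be verified numerically. In this sketch the studentized range value q = 4.15 is taken from the slide (table lookup for c = 5, df = 25), not computed, and `hsd` is an illustrative name.

```python
from math import sqrt


def hsd(q_crit: float, msw: float, r: int) -> float:
    """Tukey's honestly significant difference for equal column sizes r."""
    return q_crit * sqrt(msw / r)


q_crit = 4.15            # q(c=5, df=25) at the 5% level, from the slide
tuk = q_crit / sqrt(2)   # the "tuk" multiplier on the t-like scale

print(round(tuk, 2))                            # 2.93
print(round(hsd(q_crit, 21.2, 6), 2))           # 7.8
print(round(tuk * sqrt(2 * 21.2 / 6), 2))       # same value via the tuk form
```

Both forms agree exactly when tuk is not rounded before use.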
In our earlier example, rank order:
Col:    3    1    2    4    5
Mean:   5    6   12   14   17
(No contiguous differences > 7.80)
Comparison   |difference|   > or < 7.80
3 vs. 1          *             <   (contiguous)
3 vs. 2          7             <
3 vs. 4          9             >
3 vs. 5         12             >
1 vs. 2          *             <
1 vs. 4          8             >
1 vs. 5         11             >
2 vs. 4          *             <
2 vs. 5          5             <
4 vs. 5          *             <
Conclusion: overlapping groups {3, 1, 2} and {2, 4, 5}.
2 is "same as 1 and 3, but also same as 4 and 5."
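The underline groups can be derived mechanically from the rank-ordered means. The sketch below is one way to do it, assuming equal column sizes so a single threshold applies to every pair; `underline_groups` is a hypothetical helper, not a Minitab or library routine.

```python
def underline_groups(means, threshold):
    """Maximal runs of rank-ordered columns whose smallest and largest
    means differ by no more than threshold (one run per underline)."""
    order = sorted(means, key=means.get)  # labels by increasing mean
    groups = []
    for start in range(len(order)):
        end = start
        # Extend the run while the range from its first member stays within threshold.
        while end + 1 < len(order) and means[order[end + 1]] - means[order[start]] <= threshold:
            end += 1
        run = order[start:end + 1]
        # Skip runs already covered by the previous underline.
        if not groups or not set(run) <= set(groups[-1]):
            groups.append(run)
    return groups


means = {1: 6, 2: 12, 3: 5, 4: 14, 5: 17}
print(underline_groups(means, 7.80))  # Tukey HSD
print(underline_groups(means, 5.48))  # Fisher LSD
```

With the HSD of 7.80 broker 2 lands in both groups, while the smaller LSD of 5.48 separates {3, 1} from {2, 4, 5}.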
Minitab: Stat>>ANOVA>>One-Way Anova then click "comparisons".
Tukey's pairwise comparisons (Minitab)
Family error rate = 0.0500
Individual error rate = 0.00706
Critical value = 4.15  (this is $q_{\alpha}$; not given in version 16.1)
Intervals for (column level mean) - (row level mean)

            1          2          3          4
2     -13.801
        1.801
3      -6.801     -0.801
        8.801     14.801
4     -15.801     -9.801    -16.801
       -0.199      5.801     -1.199
5     -18.801    -12.801    -19.801    -10.801
       -3.199      2.801     -4.199      4.801
Minitab Output for Broker Data
Grouping Information Using Tukey Method

broker  N    Mean  Grouping
5       6  17.000  A
4       6  14.000  A
2       6  12.000  A B
1       6   6.000    B
3       6   5.000    B

Means that do not share a letter are significantly
different.
Special Multiple Comp.
Method 3: Dunnett's test
Designed specifically for (and incorporating
the interdependencies of) comparing several
"treatments" to a "control."
Example (R = 6):
Col:    1 (CONTROL)    2    3    4    5
Mean:   6             12    5   14   17
Analog of LSD ($= t_{\alpha/2} \sqrt{2 \, MSW / R}$):
$D = Dut_{\alpha/2} \sqrt{\frac{2 \, MSW}{R}}$, with $Dut_{\alpha/2}$ from table or Minitab.
In our example:
$D = 2.61 \sqrt{\frac{2(21.2)}{6}} = 6.94$
Comparison   |difference|   > or < 6.94
1 vs. 2          6             <
1 vs. 3          1             <
1 vs. 4          8             >
1 vs. 5         11             >
- Cols 4 and 5 differ from the control [ 1 ].
- Cols 2 and 3 are not significantly different
from control.
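The comparisons to the control follow the same pattern as the LSD, with the Dunnett multiplier in place of t. A minimal sketch, assuming the critical value 2.61 from the slide (table or Minitab) and using a hypothetical helper name `dunnett_comparisons`:

```python
from math import sqrt


def dunnett_comparisons(means, control, d_crit, msw, r):
    """Return (critical difference D, {treatment: differs_from_control})."""
    critical_difference = d_crit * sqrt(2 * msw / r)
    differs = {
        col: abs(m - means[control]) > critical_difference
        for col, m in means.items()
        if col != control  # only treatment-vs-control comparisons are made
    }
    return critical_difference, differs


means = {1: 6, 2: 12, 3: 5, 4: 14, 5: 17}
d, differs = dunnett_comparisons(means, control=1, d_crit=2.61, msw=21.2, r=6)
print(round(d, 2))   # 6.94
print(differs)
```

Only four comparisons are made here instead of ten, which is why the Dunnett multiplier (2.61) is smaller than Tukey's (2.93).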
Minitab: Stat>>ANOVA>>General Linear Model then click "comparisons".
Dunnett's comparisons with a control (Minitab)
Family error rate = 0.0500  (controlled!!)
Individual error rate = 0.0152
Critical value = 2.61  (this is $Dut_{\alpha/2}$)
Control = level (1) of broker
Intervals for treatment mean minus control mean

Level    Lower   Center    Upper
2       -0.930    6.000   12.930
3       -7.930   -1.000    5.930
4        1.070    8.000   14.930
5        4.070   11.000   17.930

[Interval plot: each interval drawn as (----*----) on an axis running from -7.0 to 14.0]
What Method Should We Use?

• Fisher's procedure can be used only after the
F-test in the ANOVA is significant at 5%.
• Otherwise, use Tukey's procedure. Note that,
to avoid being too conservative, the
significance level of the Tukey test can be set
higher (e.g., 10%), especially when the number
of levels is large.
Contrast
Example 1
Col:         1         2               3               4
Treatment:   Placebo   Sulfa Type S1   Sulfa Type S2   Antibiotic Type A
Suppose the questions of interest are
(1) Placebo vs. Non-placebo
(2) S1 vs. S2
(3) (Average) S vs. A
In general, a question of interest can be
expressed by a linear combination of
column means such as
$C = \sum_j a_j \bar{Y}_{.j}$
with the restriction that $\sum_j a_j = 0$.
Such linear combinations are called
contrasts.
Test if a contrast has mean 0
The sum of squares for contrast C is
$SSC = R \times C^2 / \sum_j a_j^2$
where R is the number of rows (replicates).
The test statistic $F_{calc} = SSC/MSW$ is distributed
as F with 1 and (df of error) degrees of freedom.
Reject $E[C] = 0$ if the observed $F_{calc}$ is too large
(say, $> F_{0.05}(1, \text{df of error})$ at the 5% significance level).
Example 1 (cont.): $a_j$'s for the 3 contrasts
                          P    S1   S2    A
Col:                      1    2    3     4
P vs. not-P:     C1      -3    1    1     1
S1 vs. S2:       C2       0   -1    1     0
S vs. A:         C3       0   -1   -1     2
Calculating $\sum_j a_j^2$:
top row:     $(-3)^2 + 1^2 + 1^2 + 1^2 = 12$
middle row:  $0^2 + (-1)^2 + 1^2 + 0^2 = 2$
bottom row:  $0^2 + (-1)^2 + (-1)^2 + 2^2 = 6$
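Column means $\bar{Y}_{.j}$ and the value of $C^2 / \sum_j a_j^2$ for each contrast:

                      P     S1    S2    A
Column mean           5     6     7     10     C^2 / sum of a_j^2
Placebo vs. drugs    -3     1     1     1       5.33
S1 vs. S2             0    -1     1     0       0.50
Average S vs. A       0    -1    -1     2       8.17
                                       Total:  14.00

SSC = $8 \times C^2 / \sum_j a_j^2$ (R = 8 rows):

C^2 / sum of a_j^2     SSC
 5.33                  42.64
  .50                   4.00
 8.17                  65.36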
Tests for Contrasts
Source   SSQ      df   MSQ      F
C1        42.64    1   42.64    8.53
C2         4.00    1    4.00     .80
C3        65.36    1   65.36   13.07
Error    140      28    5

$F_{1-.05}(1, 28) = 4.20$
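The contrast tests can be recomputed directly from the column means. This is a sketch under the values given in the example (R = 8, MSW = 5, table value 4.20); `contrast_test` and the dictionary keys are illustrative names.

```python
def contrast_test(coeffs, means, r, msw):
    """Return (C, SSC, F) for a contrast with coefficients a_j on the column means."""
    if sum(coeffs) != 0:
        raise ValueError("contrast coefficients must sum to zero")
    c = sum(a * m for a, m in zip(coeffs, means))
    ssc = r * c ** 2 / sum(a * a for a in coeffs)
    return c, ssc, ssc / msw


means = [5, 6, 7, 10]  # column means for P, S1, S2, A
contrasts = {
    "C1 placebo vs. drugs": [-3, 1, 1, 1],
    "C2 S1 vs. S2": [0, -1, 1, 0],
    "C3 average S vs. A": [0, -1, -1, 2],
}
F_TABLE = 4.20  # F(1, 28) at the 5% level, from the slide

for name, coeffs in contrasts.items():
    c, ssc, f_calc = contrast_test(coeffs, means, r=8, msw=5.0)
    print(f"{name}: C = {c}, SSC = {ssc:.2f}, F = {f_calc:.2f}, reject = {f_calc > F_TABLE}")
```

The unrounded sums of squares for C1 and C3 (42.67 and 65.33) differ slightly from the table values (42.64 and 65.36), which multiply the rounded 5.33 and 8.17 by 8; the F values agree to two decimals.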
Example 1 (Cont.): Conclusions



• The mean response for Placebo is
significantly different from that for Non-placebo.
• There is no significant difference between
using Types S1 and S2.
• Using Type A is significantly different from using
Type S on average.