252solnF3 11/07/03 (Open this document in 'Page Layout' view!)

F. ANALYSIS OF VARIANCE
1. 1-Way Analysis of Variance
Text 11.1-11.6, 11.7**, 11.8 [11.1-11.7, 11.8*] (11.1-11.7, 11.8*) (Same problem, different numbers – both answers will be posted)
2. 2-Way Analysis of Variance
Text 11.15-11.18, 11.23, 11.29-11.32, 11.36 [11.15-11.18, 11.23, 11.28-11.30, 11.34] (11.15-11.18, 11.23, 11.28-11.30, 11.34), F1,
F2, F4
3. More than 2-Way analysis of Variance
F3
4. Kruskal-Wallis Test
Text 12.86-12.87, 12.89 [11.39-11.40, 11.42] (11.39-11.40, 11.42), Downing and Clark 18-12, 18-13 (in chapter 17 in D&C 3rd
edition),
5. Friedman Test
Text 12.93-12.95 [11.46-11.48] (11.65-11.67 on CD) Downing and Clark 18-4, 18-6 (in chapter 17 in D&C 3rd edition),
Graded Assignment 4 (Will be posted)
This document includes Problem F3, all problems in Chapter 12 and the four problems in Downing and
Clark.
-------------------------------------------------------------------------------------------------------------------------- -------
3-Way ANOVA Problem.
Problem F3: 48 measurements describe the time it took a group of truckers to get from their terminal to a
destination. The trip times were characterized by driver’s experience (Factor A – 2 levels), route (Factor B
– 3 levels) and season (Factor C – 2 levels). For each combination of factors there are 4 measurements. Set
up the ‘degrees of freedom’ column of an ANOVA table showing all interactions.
Solution: If we multiply the levels of the factors together and then multiply by the number of
measurements per cell, we find a total of $2 \times 3 \times 2 \times 4 = 48$ measurements.
Source               SS     DF   MS   F   F.05
Experience (A)       500     1
Route (B)            400     2
Season (C)           300     1
Interaction (AB)      50     2
Interaction (AC)      60     1
Interaction (BC)      70     2
Interaction (ABC)            2
Within               100    36
Total               1600    47
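The DF column follows a fixed pattern: each main effect has (levels − 1) degrees of freedom, each interaction has the product of the degrees of freedom of the factors involved, the within term has (number of cells) × (measurements per cell − 1), and the total is n − 1. A minimal Python sketch of that bookkeeping, in which the dictionary keys and variable names are illustrative choices rather than anything from the text:

```python
from itertools import combinations
from math import prod

# Factor levels from Problem F3 (the labels A, B, C follow the problem statement).
levels = {"A": 2, "B": 3, "C": 2}
reps = 4  # measurements per cell

df = {}
# Main effects and interactions: product of (levels - 1) over the factors involved.
for k in range(1, len(levels) + 1):
    for combo in combinations(levels, k):
        df["".join(combo)] = prod(levels[f] - 1 for f in combo)

cells = prod(levels.values())      # 2 * 3 * 2 = 12 cells
df["Within"] = cells * (reps - 1)  # error degrees of freedom
df["Total"] = cells * reps - 1     # n - 1

for source, d in df.items():
    print(f"{source:7s} {d:3d}")
```

The degrees of freedom of all sources other than Total should add up to the Total line, which is a quick check on the table.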
Question: I have put some numbers, pretty much at random, in the SS column. Are you ready to (i)
Calculate the missing number in the SS column? (ii) Compute the MS column? (iii) Get all the values in the
F column by dividing the within (error) mean square into the other mean squares? (iv) Look up the
appropriate values of F on the table? List the seven hypotheses that would be tested by these F tests and to
say which ones should be rejected.
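Steps (i) to (iii) are plain arithmetic and can be sketched in Python as below. The SS and DF numbers are copied from the table; the variable names are illustrative. Step (iv) still needs the F table, using (1, 36) or (2, 36) degrees of freedom depending on the source.

```python
# SS and DF values copied from the table in Problem F3; ABC is the missing SS entry.
ss = {"A": 500, "B": 400, "C": 300, "AB": 50, "AC": 60, "BC": 70, "Within": 100}
df = {"A": 1, "B": 2, "C": 1, "AB": 2, "AC": 1, "BC": 2, "ABC": 2, "Within": 36}
ss_total = 1600

# (i) The sums of squares must add up to the total.
ss["ABC"] = ss_total - sum(ss.values())

# (ii) Mean square = SS / DF.
ms = {src: ss[src] / df[src] for src in df}

# (iii) F = effect mean square / within (error) mean square.
f_ratio = {src: ms[src] / ms["Within"] for src in df if src != "Within"}

for src in df:
    f_text = f"{f_ratio[src]:8.2f}" if src in f_ratio else ""
    print(f"{src:7s} {ss[src]:5d} {df[src]:3d} {ms[src]:8.2f} {f_text}")
```

Each of the seven F ratios tests one null hypothesis: no effect of A, of B, of C, and no AB, AC, BC or ABC interaction.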
Kruskal-Wallis Test Problems
Exercise 12.86 [11.39 in 8th and 9th]: Solutions are repeated, edited, from the Instructor’s Solution Manual
11.39 For the 0.01 level of significance and 5 degrees of freedom, $\chi^2_U = 15.086$.
Exercise 12.87 [11.40 in 8th and 9th]: Assume that each group is too large for the K-W table.
11.40
(a) Decision rule: If $H > \chi^2_U = 15.086$, reject H0.
(b) Decision: Since Hcalc = 13.77 is below the critical bound of 15.086, do not reject H0.
Exercise 12.88 [11.41 in 8th and 9th]: This wasn’t assigned, but the Minitab printout should give you some
practice. NOBS means number of observations.
11.41
H0: $\eta_A = \eta_B = \eta_C$
H1: At least one of the medians differs.
Decision rule: If $H > \chi^2_U = 9.210$, reject H0.
Test statistic: H = 0.64
Decision: Since Hcalc = 0.64 is below the critical bound of 9.210 or because the p-value is above
$\alpha = .01$, do not reject H0. There is insufficient evidence to show any real difference in the median
reaction times for the three learning methods.
Minitab Output
Kruskal-Wallis Test
LEVEL     NOBS   MEDIAN   AVE. RANK   Z VALUE
1            9    10.00        11.6     -0.74
2            8    15.50        13.3      0.12
3            8    12.50        14.4      0.64
OVERALL     25                 13.0
H = 0.64 d.f. = 2 p = 0.728
Exercise 12.89 [11.42 in 8th and 9th]:
11.42 (a)
H0: $\eta_1 = \eta_2 = \eta_3 = \eta_4$ Where 1 is Low, 2 is Normal, 3 is High and 4 is Very High.
H1: At least one of the medians differs.
First we rank the data. The data appear below in columns marked x1 to x4 and the
ranks are in columns marked r1 to r4.
         Low          Normal       High         Very High
Row      x1     r1    x2     r2    x3     r3    x4     r4
1        8.0    11    7.6     8    6.0     4    5.1     1
2        8.1    12    8.2    13    6.3     5    5.6     2
3        9.2    15    9.8    17    7.1     7    5.9     3
4        9.4    16   10.9    18    7.7     9    6.7     6
5       11.7    19   12.3    20    8.9    14    7.8    10
SRi             73           76           39           22
ni               5            5            5            5
To check the ranking, note that the sum of the four rank sums is 73 + 76 + 39 + 22 = 210,
and that the sum of the first $n_1 + n_2 + n_3 + n_4 = 5 + 5 + 5 + 5 = n = 20$ numbers is
$\frac{n(n+1)}{2} = \frac{20(21)}{2} = 210$.
Now, compute the Kruskal-Wallis statistic
$H = \left[\frac{12}{n(n+1)} \sum_i \frac{SR_i^2}{n_i}\right] - 3(n+1) = \left[\frac{12}{20(21)}\left(\frac{73^2}{5} + \frac{76^2}{5} + \frac{39^2}{5} + \frac{22^2}{5}\right)\right] - 3(21) = .028571\left(\frac{13110}{5}\right) - 63 = 11.914$.
If we look up this result in the Kruskal-Wallis table (Table 9), we find that the problem is too
large for the table. If the size of the problem is larger than those shown in Table 9, use the
$\chi^2$ distribution, with $df = m - 1$, where m is the number of columns. Since there are
$m = 4$ columns, we have 3 degrees of freedom. If we try to locate $H = 11.914$ on the chi-squared
table, we find that $\chi^{2(3)}_{.01} = 11.3449$ and $\chi^{2(3)}_{.005} = 12.8382$, so the p-value is
between .01 and .005. In particular if our significance level is 5%, compare H with
$\chi^{2(3)}_{.05} = 7.8147$. Since $H_{calc}$ is larger than $\chi^{2(3)}_{.05}$, reject the null hypothesis.
This data set was run on Minitab with the following results.
————— 11/7/2003 6:36:24 PM ————————————————————
Welcome to Minitab, press F1 for help.
MTB > Retrieve "C:\Berenson\Data_Files-9th\Minitab\BATFAIL.MTW".
Retrieving worksheet from file: C:\Berenson\Data_Files-9th\Minitab\BATFAIL.MTW
# Worksheet was saved on Tue Mar 31 1998
Results for: 252BATFAIL.MTW
MTB > print c1 c2
Data Display
Row   Time   Pressure
  1    8.0   1
  2    8.1   1
  3    9.2   1
  4    9.4   1
  5   11.7   1
  6    7.6   2
  7    8.2   2
  8    9.8   2
  9   10.9   2
 10   12.3   2
 11    6.0   3
 12    6.3   3
 13    7.1   3
 14    7.7   3
 15    8.9   3
 16    5.1   4
 17    5.6   4
 18    5.9   4
 19    6.7   4
 20    7.8   4
MTB > Kruskal-Wallis c1 c2.
Kruskal-Wallis Test: Time versus Pressure
Kruskal-Wallis Test on Time
Pressure    N   Median   Ave Rank       Z
1           5    9.200       14.6    1.79
2           5    9.800       15.2    2.05
3           5    7.100        7.8   -1.18
4           5    5.900        4.4   -2.66
Overall    20                10.5
H = 11.91  DF = 3  P = 0.008
The p-value of .008 is below the 5% significance level, so we reject the null hypothesis.
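The hand calculation of the rank sums (73, 76, 39, 22) and of H = 11.914 can be reproduced with a short Python sketch. This is only an illustration under stated assumptions: the function names are hypothetical, no tie correction is applied (none is needed here because the 20 times are all distinct), and it is not Minitab's own routine.

```python
def average_ranks(values):
    """Rank values from 1 upward, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_wallis(groups):
    """Return (rank sums, H) for a list of samples, without a tie correction."""
    pooled = [x for g in groups for x in g]
    ranks = average_ranks(pooled)
    n = len(pooled)
    rank_sums, start = [], 0
    for g in groups:
        rank_sums.append(sum(ranks[start:start + len(g)]))
        start += len(g)
    weighted = sum(sr ** 2 / len(g) for sr, g in zip(rank_sums, groups))
    h = 12 / (n * (n + 1)) * weighted - 3 * (n + 1)
    return rank_sums, h


groups = [
    [8.0, 8.1, 9.2, 9.4, 11.7],   # Low
    [7.6, 8.2, 9.8, 10.9, 12.3],  # Normal
    [6.0, 6.3, 7.1, 7.7, 8.9],    # High
    [5.1, 5.6, 5.9, 6.7, 7.8],    # Very High
]
rank_sums, h = kruskal_wallis(groups)
print(rank_sums, round(h, 3))
```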
(b)
According to the Instructor’s Solution Manual, there is sufficient evidence to show there
is a significant difference in the four pressure levels with respect to median battery life. The
warranty policy should exploit the highest median battery life and explicitly specify that such
median battery life level can only be warranted when the batteries are operated under normal
pressure level.
252solnF3 4/15/02
Downing and Clark, Chapter 17, Application 12: The benefits paid to employees of three yo-yo
manufacturers appear below. Test the hypothesis that the benefits expenditures of the three companies have
the same distribution.
Solution: The original data appears in the left three columns and the rankings appear in the next three.
        Original Data                    Ranks of Data
Company   Company   Company      Company   Company   Company
A         B         C            A         B         C
x1        x2        x3           r1        r2        r3
10        25        16             1        16         7
26        12        24            17         3        15
29        20        13            20        11         4
21        11        22            12         2        13
17        27        15             8        18         6
23        19        28            14        10        19
30        14        18            21         5         9
31        38        35            22        29        26
39        32        37            30        23        28
33        36        34            24        27        25
SRi                              169       144       152
ni                                10        10        10
The null hypothesis is H0: Columns from same distribution or, if the parent distributions are
assumed non-normal, H0: $\eta_1 = \eta_2 = \eta_3$. We use a Kruskal-Wallis test instead of a Friedman test because
the data appear to be three independent random samples.
To check the ranking, note that the sum of the three rank sums is 169 + 144 + 152 = 465, and that
the sum of the first $n_1 + n_2 + n_3 = n = 30$ numbers is $\frac{n(n+1)}{2} = \frac{30(31)}{2} = 465$.
Now, compute the Kruskal-Wallis statistic
$H = \left[\frac{12}{n(n+1)} \sum_i \frac{SR_i^2}{n_i}\right] - 3(n+1) = \left[\frac{12}{30(31)}\left(\frac{169^2}{10} + \frac{144^2}{10} + \frac{152^2}{10}\right)\right] - 3(31) = .01290(7240.10) - 93 = 0.4206$.
If we look up this result in the Kruskal-Wallis table (Table 9), we find that the size of the data set is too large for the table. If the size
of the problem is larger than those shown in Table 9, use the $\chi^2$ distribution, with $df = m - 1$, where m is
the number of columns. Since there are $m = 3$ columns, we have two degrees of freedom. If we try to
locate $H = 0.4206$ on the chi-squared table, we find that $\chi^{2(2)}_{.10} = 4.6052$ and $\chi^{2(2)}_{.90} = 0.2107$, so the p-value
is between .10 and .90. In particular if our significance level is 5%, compare H with
$\chi^{2(2)}_{.05} = 5.9915$. Since H is smaller than $\chi^{2(2)}_{.05}$, do not reject the null hypothesis.
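For Application 12, the arithmetic can be checked from the rank sums alone. With two degrees of freedom the chi-squared upper tail has the closed form exp(−x/2), so an approximate p-value is available without a table. A minimal Python sketch, with illustrative variable names:

```python
from math import exp

rank_sums = [169, 144, 152]  # SRi for Companies A, B, C
sizes = [10, 10, 10]         # ni
n = sum(sizes)

# Ranking check: the rank sums must add up to n(n+1)/2.
assert sum(rank_sums) == n * (n + 1) // 2

weighted = sum(sr ** 2 / ni for sr, ni in zip(rank_sums, sizes))
h = 12 / (n * (n + 1)) * weighted - 3 * (n + 1)

# For 2 degrees of freedom, P(chi-squared > x) = exp(-x / 2).
p_value = exp(-h / 2)
print(round(h, 4), round(p_value, 3))
```

The resulting p-value falls between .10 and .90, in line with the table lookup.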
Downing and Clark, Chapter 17, Application 13: Four experimental precision scales (A, B, C, D) are
tested on a fixed weight with the results below. Test the null hypothesis that the distributions of values
given by the four scales are the same. (The text uses $\alpha = .10$ for this problem.)
Solution: The null hypothesis is H0: Columns from same distribution or, if the parent distributions are
assumed non-normal, H0: $\eta_1 = \eta_2 = \eta_3 = \eta_4$. We use a Kruskal-Wallis test instead of a Friedman test
because the data appear to be four independent random samples. In this case, there is no way the data could
be cross-classified, since the column lengths are unequal.
Original Data
Scale   Scale   Scale   Scale
A       B       C       D
x1      x2      x3      x4
103     112     131     129
121     105     104     111
106     132     130     122
120     136     108     137
114     109     123     107
128     138     119     110
116     135     113     139
        126     133     118
        124     127
        117     134
                125
                115

Ranks of Data (listed in ascending order within each column)
Scale   Scale   Scale   Scale
A       B       C       D
r1      r2      r3      r4
  1       3       2       5
  4       7       6       8
 12      10      11       9
 14      15      13      16
 18      22      17      20
 19      24      21      27
 26      30      23      35
         33      25      37
         34      28
         36      29
                 31
                 32
SRi    94     214     238     157
ni      7      10      12       8
To check the ranking, note that the sum of the four rank sums is 94 + 214 + 238 + 157 = 703, and
that the sum of the first $n_1 + n_2 + n_3 + n_4 = 7 + 10 + 12 + 8 = n = 37$ numbers is $\frac{n(n+1)}{2} = \frac{37(38)}{2} = 703$.
Now, compute the Kruskal-Wallis statistic
$H = \left[\frac{12}{n(n+1)} \sum_i \frac{SR_i^2}{n_i}\right] - 3(n+1) = \left[\frac{12}{37(38)}\left(\frac{94^2}{7} + \frac{214^2}{10} + \frac{238^2}{12} + \frac{157^2}{8}\right)\right] - 3(38) = .00853(13643.344) - 114 = 2.444$.
If we look up this result in the Kruskal-Wallis table (Table 9), we find that the size of the data set is too large for the table. If
the size of the problem is larger than those shown in Table 9, use the $\chi^2$ distribution, with $df = m - 1$,
where m is the number of columns. Since there are $m = 4$ columns, we have three degrees of freedom.
Since our significance level is 10%, compare H with $\chi^{2(3)}_{.10} = 6.2514$. Since H is smaller than $\chi^{2(3)}_{.10}$, do not
reject the null hypothesis.
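With unequal sample sizes each squared rank sum is divided by its own $n_i$, which is easy to get wrong by hand. The sketch below recomputes H from the rank sums and adds an approximate p-value. The helper `chi2_sf_3df` is a hypothetical name; it uses the closed-form upper tail of the chi-squared distribution that holds only for 3 degrees of freedom.

```python
from math import erfc, exp, pi, sqrt


def chi2_sf_3df(x):
    """Upper-tail probability of a chi-squared variable with 3 degrees of freedom."""
    return erfc(sqrt(x / 2)) + sqrt(2 * x / pi) * exp(-x / 2)


rank_sums = [94, 214, 238, 157]  # SRi for Scales A, B, C, D
sizes = [7, 10, 12, 8]           # ni
n = sum(sizes)

# Ranking check: the rank sums must add up to n(n+1)/2 = 703.
assert sum(rank_sums) == n * (n + 1) // 2

weighted = sum(sr ** 2 / ni for sr, ni in zip(rank_sums, sizes))
h = 12 / (n * (n + 1)) * weighted - 3 * (n + 1)
p_value = chi2_sf_3df(h)
print(round(h, 3), round(p_value, 3))
```

The p-value is well above .10, which agrees with the decision not to reject.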
Friedman Test Problems
Exercise 12.93 [11.46 in 9th] (11.65 on CD in 8th edition): Solutions are repeated, edited, from the
Instructor’s Solution Manual
11.46 d.f. = 5, $\alpha = 0.1$, $\chi^2_U = 9.2363$
Exercise 12.94 [11.47 in 9th edition] (11.66 on CD in 8th edition):
11.47 (a)
H0: $\eta_1 = \eta_2 = \eta_3 = \eta_4 = \eta_5 = \eta_6$ H1: At least one of the medians differs.
If the appropriate values cannot be found on the Friedman table, use $\chi^2$ and reject H0 if
$\chi^2_F > 9.2363$.
(b)
Since $\chi^2_F = 11.56 > 9.2363$, reject H0. There is enough evidence that the medians are
different.
Exercise 12.95 [11.48 in 9th] (11.67 on CD in 8th edition):
11.48
(a)
H0: $\eta_1 = \eta_2 = \eta_3 = \eta_4$ Where 1 is A, 2 is B, 3 is C and 4 is D.
H1: At least one of the medians differs.
First we rank the data within rows. The data appear below in columns marked x1 to x4
and the ranks are in columns marked r1 to r4.
        Brand A      Brand B      Brand C      Brand D
Row     x1    r1     x2    r2     x3    r3     x4    r4
1       24    2      26    4      25    3      22    1
2       27    3.5    27    3.5    26    2      24    1
3       19    2      22    4      20    3      16    1
4       24    2      27    4      25    3      23    1
5       22    2.5    25    4      22    2.5    21    1
6       26    3      27    4      24    1.5    24    1.5
7       27    4      26    3      22    1      23    2
8       25    3      27    4      24    2      21    1
9       22    3      23    4      20    2      19    1
SRi           25           34.5         20           10.5
To check the ranking, note that the sum of the four rank sums is 25 + 34.5 + 20 + 10.5 =
90, and that the sum of the rank sums should be $\sum SR_i = \frac{rc(c+1)}{2} = \frac{9(4)(5)}{2} = 90$.
Now compute the Friedman statistic
$\chi^2_F = \left[\frac{12}{rc(c+1)} \sum_i SR_i^2\right] - 3r(c+1) = \left[\frac{12}{9(4)(5)}\left(25^2 + 34.5^2 + 20^2 + 10.5^2\right)\right] - 3(9)(5) = \frac{1}{15}(2325.5) - 135 = 20.03$.
Since the size of the problem is larger than those shown in Table 8, use the
$\chi^2$ distribution, with $df = c - 1$, where c is the number of columns. Since $c = 4$, if
$\alpha = .05$, compare $\chi^2_F$ with $\chi^{2(3)}_{.05} = 7.8147$. Since $\chi^2_F = 20.03$ is larger than $\chi^{2(3)}_{.05}$, reject
the null hypothesis.
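The within-row ranking, including the average ranks given to tied ratings, can be reproduced with the Python sketch below. The function names are hypothetical, and the statistic is the unadjusted one computed by hand above, with no tie correction.

```python
def row_ranks(row):
    """Rank one row from 1 upward; tied values share their average rank."""
    ordered = sorted(row)
    # Values equal to x occupy 1-based positions index+1 .. index+count in sorted order,
    # so their average rank is index + (count + 1) / 2.
    return [ordered.index(x) + (ordered.count(x) + 1) / 2 for x in row]


def friedman(table):
    """Return (column rank sums, unadjusted Friedman statistic) for an r-by-c table."""
    r, c = len(table), len(table[0])
    rank_sums = [0.0] * c
    for row in table:
        for j, rk in enumerate(row_ranks(row)):
            rank_sums[j] += rk
    stat = 12 / (r * c * (c + 1)) * sum(sr ** 2 for sr in rank_sums) - 3 * r * (c + 1)
    return rank_sums, stat


# Ratings by expert (rows) for Brands A, B, C, D (columns), from the table above.
ratings = [
    [24, 26, 25, 22],
    [27, 27, 26, 24],
    [19, 22, 20, 16],
    [24, 27, 25, 23],
    [22, 25, 22, 21],
    [26, 27, 24, 24],
    [27, 26, 22, 23],
    [25, 27, 24, 21],
    [22, 23, 20, 19],
]
rank_sums, stat = friedman(ratings)
print(rank_sums, round(stat, 2))
```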
This problem was run on Minitab with the following results.
————— 11/7/2003 8:32:16 PM ————————————————————
Welcome to Minitab, press F1 for help.
MTB > Retrieve "C:\Berenson\Data_Files-9th\Minitab\COFFEE.MTW".
Retrieving worksheet from file: C:\Berenson\Data_Files-9th\Minitab\COFFEE.MTW
# Worksheet was saved on Thu Nov 06 2003
Results for: COFFEE.MTW
MTB > print c1 c2 c3
Data Display
Row   Expert   Brand   Rating
  1        1       1       24
  2        1       2       26
  3        1       3       25
  4        1       4       22
  5        2       1       27
  6        2       2       27
  7        2       3       26
  8        2       4       24
  9        3       1       19
 10        3       2       22
 11        3       3       20
 12        3       4       16
 13        4       1       24
 14        4       2       27
 15        4       3       25
 16        4       4       23
 17        5       1       22
 18        5       2       25
 19        5       3       22
 20        5       4       21
 21        6       1       26
 22        6       2       27
 23        6       3       24
 24        6       4       24
 25        7       1       27
 26        7       2       26
 27        7       3       22
 28        7       4       23
 29        8       1       25
 30        8       2       27
 31        8       3       24
 32        8       4       21
 33        9       1       22
 34        9       2       23
 35        9       3       20
 36        9       4       19
MTB > Friedman c3 c2 c1.
Friedman Test: Rating versus Brand, Expert
Friedman test for Rating by Brand blocked by Expert
S = 20.03  DF = 3  P = 0.000
S = 20.72  DF = 3  P = 0.000 (adjusted for ties)
                  Est    Sum of
Brand    N     Median     Ranks
1        9     25.000      25.0
2        9     26.750      34.5
3        9     24.000      20.0
4        9     22.250      10.5
Grand median = 24.500
(b)
Since the p-value is essentially zero, reject H0 at 0.05 level of significance. There is
evidence of a difference in the median summated ratings of the four brands of Colombian
coffee.
In (a), we conclude that there is evidence of a difference in the median summated ratings
of the four brands of Colombian coffee while in problem 11.23, we conclude that there is
evidence of a difference in the mean summated ratings of the four brands of Colombian
coffee.
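The Minitab output reports a second statistic, S = 20.72, "adjusted for ties". One common tie correction divides the unadjusted statistic by $1 - \sum (t^3 - t) / (rc(c^2 - 1))$, where t is the size of each group of tied ratings within a row. Whether Minitab uses exactly this formula is an assumption here, but the Python sketch below reproduces the printed value from the coffee data.

```python
from collections import Counter

# Ratings by expert (rows) for Brands 1-4 (columns), from the Minitab data display.
ratings = [
    [24, 26, 25, 22],
    [27, 27, 26, 24],
    [19, 22, 20, 16],
    [24, 27, 25, 23],
    [22, 25, 22, 21],
    [26, 27, 24, 24],
    [27, 26, 22, 23],
    [25, 27, 24, 21],
    [22, 23, 20, 19],
]
r, c = len(ratings), len(ratings[0])
rank_sums = [25.0, 34.5, 20.0, 10.5]  # Sum of Ranks column from the output

unadjusted = 12 / (r * c * (c + 1)) * sum(sr ** 2 for sr in rank_sums) - 3 * r * (c + 1)

# Sum t^3 - t over every group of t equal ratings within a row (t = 1 contributes 0).
tie_term = sum(t ** 3 - t for row in ratings for t in Counter(row).values())

# Assumed tie correction: divide by 1 - tie_term / (r c (c^2 - 1)).
adjusted = unadjusted / (1 - tie_term / (r * c * (c ** 2 - 1)))
print(round(unadjusted, 2), round(adjusted, 2))
```

Three rows contain one pair of tied ratings each, so the correction is small and does not change the decision.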
Downing and Clark, Chapter 17, Application 6: Six companies show the profits below from sales in four
different cities (A, B, C, D). Use the Friedman statistic to test the null hypothesis that the cities are equally
profitable for the companies. (Assume that the parent distribution is not Normal.)
Solution: The null hypothesis is H0: Columns from same distribution or H0: $\eta_1 = \eta_2 = \eta_3 = \eta_4$. We use
a Friedman test because the data are cross-classified by company. This time we rank our data only within
rows. There are $c = 4$ columns and $r = 6$ rows.
             Original Data              Ranked Data
         City  City  City  City     City  City  City  City
         A     B     C     D        A     B     C     D
         x1    x2    x3    x4       r1    r2    r3    r4
Firm 1   22    11    16    14        4     1     3     2
Firm 2   20    19    18    14        4     3     2     1
Firm 3   19    24    16    13        3     4     2     1
Firm 4   15    18    17    19        1     3     2     4
Firm 5   18    17    13    15        4     3     1     2
Firm 6   17    16    19    12        3     2     4     1
SRi                                 19    16    14    11
To check the ranking, note that the sum of the four rank sums is 19 + 16 + 14 + 11 = 60, and that the sum
of the rank sums should be $\sum SR_i = \frac{rc(c+1)}{2} = \frac{6(4)(5)}{2} = 60$.
Now compute the Friedman statistic
$\chi^2_F = \left[\frac{12}{rc(c+1)} \sum_i SR_i^2\right] - 3r(c+1) = \left[\frac{12}{6(4)(5)}\left(19^2 + 16^2 + 14^2 + 11^2\right)\right] - 3(6)(5) = \frac{1}{10}(934) - 90 = 3.40$.
Since the size of the problem is larger than those shown in Table 8, use the $\chi^2$ distribution, with
$df = c - 1$, where c is the number of columns. Since $c = 4$, if $\alpha = .05$, compare $\chi^2_F$ with $\chi^{2(3)}_{.05} = 7.8147$.
Since $\chi^2_F = 3.40$ is not larger than $\chi^{2(3)}_{.05}$, do not reject the null hypothesis.
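The whole Application 6 calculation, from profits to decision, fits in a few lines of Python. This sketch assumes no ties within a row (true for these data), so a rank is simply one plus the number of smaller values in the row. The p-value uses the closed-form chi-squared upper tail that is valid only for 3 degrees of freedom.

```python
from math import erfc, exp, pi, sqrt

# Profits by firm (rows) in Cities A, B, C, D (columns).
profits = [
    [22, 11, 16, 14],  # Firm 1
    [20, 19, 18, 14],  # Firm 2
    [19, 24, 16, 13],  # Firm 3
    [15, 18, 17, 19],  # Firm 4
    [18, 17, 13, 15],  # Firm 5
    [17, 16, 19, 12],  # Firm 6
]
r, c = len(profits), len(profits[0])

# No ties occur within a row here, so rank = 1 + number of smaller values in the row.
ranks = [[1 + sum(y < x for y in row) for x in row] for row in profits]
rank_sums = [sum(col) for col in zip(*ranks)]

stat = 12 / (r * c * (c + 1)) * sum(sr ** 2 for sr in rank_sums) - 3 * r * (c + 1)

# Upper tail of chi-squared with 3 degrees of freedom (closed form for df = 3 only).
p_value = erfc(sqrt(stat / 2)) + sqrt(2 * stat / pi) * exp(-stat / 2)
print(rank_sums, round(stat, 2), round(p_value, 3))
```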
Downing and Clark, Chapter 17, Application 4: Fifteen frequent fliers are asked to rate in order of
preference Aircat (A), Bluebird (B) and Condor (C) Airlines. The results are as below. Test the hypothesis
that there is no preference among fliers between the three Airlines.
Solution: The null hypothesis is H0: Columns from same distribution or H0: $\eta_1 = \eta_2 = \eta_3$. We use
a Friedman test because the data are cross-classified by flier. This time we rank our data only within
rows. There are $c = 3$ columns and $r = 15$ rows. It should be very obvious that the ranking has already
been done for you, but it is repeated here to remind you that it must be done before rank sums are
computed.
            Original Data                  Ranked Data
         Airline  Airline  Airline     Airline  Airline  Airline
         A        B        C           A        B        C
Flier    x1       x2       x3          r1       r2       r3
 1       1        3        2            1        3        2
 2       1        3        2            1        3        2
 3       1        3        2            1        3        2
 4       1        2        3            1        2        3
 5       2        1        3            2        1        3
 6       3        1        2            3        1        2
 7       3        2        1            3        2        1
 8       2        3        1            2        3        1
 9       1        3        2            1        3        2
10       1        2        3            1        2        3
11       1        3        2            1        3        2
12       3        2        1            3        2        1
13       2        3        1            2        3        1
14       3        1        2            3        1        2
15       1        3        2            1        3        2
SRi                                    26       35       29
To check the ranking, note that the sum of the three rank sums is 26 + 35 + 29 = 90, and that the
sum of the rank sums should be $\sum SR_i = \frac{rc(c+1)}{2} = \frac{15(3)(4)}{2} = 90$.
Now compute the Friedman statistic
$\chi^2_F = \left[\frac{12}{rc(c+1)} \sum_i SR_i^2\right] - 3r(c+1) = \left[\frac{12}{15(3)(4)}\left(26^2 + 35^2 + 29^2\right)\right] - 3(15)(4) = \frac{1}{15}(2742) - 180 = 182.8 - 180 = 2.8$.
Since the size of the problem is larger than those shown in Table 8, use the $\chi^2$ distribution, with
$df = c - 1$, where c is the number of columns. Since $c = 3$, if $\alpha = .05$, compare $\chi^2_F$ with $\chi^{2(2)}_{.05} = 5.9915$.
Since $\chi^2_F = 2.8$ is not larger than $\chi^{2(2)}_{.05}$, do not reject the null hypothesis.
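Because the fliers' answers are already rankings, it is worth confirming that every row is a complete ranking 1, 2, 3 before summing the columns. The Python sketch below does that and then recomputes the statistic. The p-value uses the fact that, for 2 degrees of freedom, the chi-squared upper tail is exp(−x/2). Variable names are illustrative.

```python
from math import exp

# Preference ranks by flier (rows) for Airlines A, B, C (columns).
prefs = [
    (1, 3, 2), (1, 3, 2), (1, 3, 2), (1, 2, 3), (2, 1, 3),
    (3, 1, 2), (3, 2, 1), (2, 3, 1), (1, 3, 2), (1, 2, 3),
    (1, 3, 2), (3, 2, 1), (2, 3, 1), (3, 1, 2), (1, 3, 2),
]
r, c = len(prefs), len(prefs[0])

# Each flier's row should already be a complete ranking 1..c.
assert all(sorted(row) == list(range(1, c + 1)) for row in prefs)

rank_sums = [sum(col) for col in zip(*prefs)]
stat = 12 / (r * c * (c + 1)) * sum(sr ** 2 for sr in rank_sums) - 3 * r * (c + 1)

# For 2 degrees of freedom, P(chi-squared > x) = exp(-x / 2).
p_value = exp(-stat / 2)
print(rank_sums, round(stat, 2), round(p_value, 3))
```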