252solnF3 11/07/03 (Open this document in 'Page Layout' view!)

F. ANALYSIS OF VARIANCE

1. 1-Way Analysis of Variance
   Text 11.1-11.6, 11.7**, 11.8 [11.1-11.7, 11.8*] (11.1-11.7, 11.8*) (Same problem, different numbers; both answers will be posted)
2. 2-Way Analysis of Variance
   Text 11.15-11.18, 11.23, 11.29-11.32, 11.36 [11.15-11.18, 11.23, 11.28-11.30, 11.34] (11.15-11.18, 11.23, 11.28-11.30, 11.34), F1, F2, F4
3. More than 2-Way Analysis of Variance
   F3
4. Kruskal-Wallis Test
   Text 12.86-12.87, 12.89 [11.39-11.40, 11.42] (11.39-11.40, 11.42), Downing and Clark 18-12, 18-13 (in chapter 17 in D&C 3rd edition)
5. Friedman Test
   Text 12.93-12.95 [11.46-11.48] (11.65-11.67 on CD), Downing and Clark 18-4, 18-6 (in chapter 17 in D&C 3rd edition)
Graded Assignment 4 (Will be posted)

This document includes Problem F3, all problems in Chapter 12 and the four problems in Downing and Clark.

--------------------------------------------------------------------------------

3-Way ANOVA Problem.

Problem F3: 48 measurements describe the time it took a group of truckers to get from their terminal to a destination. The trip times were characterized by driver's experience (Factor A, 2 levels), route (Factor B, 3 levels) and season (Factor C, 2 levels). For each combination of factors there are 4 measurements. Set up the 'degrees of freedom' column of an ANOVA table showing all interactions.

Solution: If we multiply the levels of the factors together and then multiply by the number of measurements per cell, we find a total of $2 \times 3 \times 2 \times 4 = 48$ measurements.

    Source               SS     DF    MS    F    F.05
    Experience (A)       500     1
    Route (B)            400     2
    Season (C)           300     1
    Interaction (AB)      50     2
    Interaction (AC)      60     1
    Interaction (BC)      70     2
    Interaction (ABC)            2
    Within               100    36
    Total               1600    47

Each main effect has (levels - 1) degrees of freedom, each interaction has the product of the degrees of freedom of the factors involved, the total has $48 - 1 = 47$, and Within has $2 \times 3 \times 2 \times (4 - 1) = 36$, which is also what remains after subtracting the other seven lines from 47.

Question: I have put some numbers, pretty much at random, in the SS column. Are you ready to
(i) Calculate the missing number in the SS column?
(ii) Compute the MS column?
(iii) Get all the values in the F column by dividing the within (error) mean square into the other mean squares?
(iv) Look up the appropriate values of F on the table?
Are you also ready to list the seven hypotheses that would be tested by these F tests and to say which ones should be rejected?

Kruskal-Wallis Test Problems

Exercise 12.86 [11.39 in 8th and 9th]: Solutions are repeated, edited, from the Instructor's Solution Manual.

11.39 For the 0.01 level of significance and 5 degrees of freedom, $\chi^2_U = 15.086$.

Exercise 12.87 [11.40 in 8th and 9th]: Assume that each group is too large for the K-W table.

11.40 (a) (b) Decision rule: If $H > \chi^2_U = 15.086$, reject H0.
Decision: Since Hcalc = 13.77 is below the critical bound of 15.086, do not reject H0.

Exercise 12.88 [11.41 in 8th and 9th]: This wasn't assigned, but the Minitab printout should give you some practice. NOBS means number of observations.

11.41 H0: $M_A = M_B = M_C$
H1: At least one of the medians differs.
Decision rule: If $H > \chi^2_U = 9.210$, reject H0.
Test statistic: H = 0.64
Decision: Since Hcalc = 0.64 is below the critical bound of 9.210, or because the p-value is above .01, do not reject H0. There is insufficient evidence to show any real difference in the median reaction times for the three learning methods.

Minitab Output

    Kruskal-Wallis Test
    LEVEL     NOBS   MEDIAN   AVE. RANK   Z VALUE
    1            9    10.00        11.6     -0.74
    2            8    15.50        13.3      0.12
    3            8    12.50        14.4      0.64
    OVERALL     25                 13.0
    H = 0.64   d.f. = 2   p = 0.728

Exercise 12.89 [11.42 in 8th and 9th]:

11.42 (a) H0: $M_1 = M_2 = M_3 = M_4$, where 1 is Low, 2 is Normal, 3 is High and 4 is Very High.
H1: At least one of the medians differs.
First we rank the data. The data appear below in columns marked x1 to x4 and the ranks are in columns marked r1 to r4.
    Row      Low           Normal         High         Very High
           x1     r1      x2     r2     x3     r3     x4     r4
    1      8.0    11      7.6     8     6.0     4     5.1     1
    2      8.1    12      8.2    13     6.3     5     5.6     2
    3      9.2    15      9.8    17     7.1     7     5.9     3
    4      9.4    16     10.9    18     7.7     9     6.7     6
    5     11.7    19     12.3    20     8.9    14     7.8    10
    SRi           73             76            39            22
    ni             5              5             5             5

To check the ranking, note that the sum of the four rank sums is 73 + 76 + 39 + 22 = 210, and that the sum of the first $n = n_1 + n_2 + n_3 + n_4 = 5 + 5 + 5 + 5 = 20$ numbers is $\frac{n(n+1)}{2} = \frac{20(21)}{2} = 210$.

Now, compute the Kruskal-Wallis statistic

$$H = \frac{12}{n(n+1)}\left[\sum_i \frac{SR_i^2}{n_i}\right] - 3(n+1) = \frac{12}{20(21)}\left[\frac{73^2}{5} + \frac{76^2}{5} + \frac{39^2}{5} + \frac{22^2}{5}\right] - 3(21) = 0.028571\left(\frac{13110}{5}\right) - 63 = 11.914 .$$

If we look up this result in the Kruskal-Wallis table (Table 9), we find that the problem is too large for the table. If the size of the problem is larger than those shown in Table 9, use the $\chi^2$ distribution, with $df = m - 1$, where $m$ is the number of columns. Since there are $m = 4$ columns, we have 3 degrees of freedom. If we try to locate $H = 11.914$ on the chi-squared table, we find that $\chi^{2(3)}_{.01} = 11.3449$ and $\chi^{2(3)}_{.005} = 12.8382$, so the p-value is between .01 and .005. In particular, if our significance level is 5%, compare $H$ with $\chi^{2(3)}_{.05} = 7.8147$. Since $H_{calc}$ is larger than $\chi^{2(3)}_{.05}$, reject the null hypothesis.

This data set was run on Minitab with the following results.

    ----- 11/7/2003 6:36:24 PM -----
    Welcome to Minitab, press F1 for help.
    MTB > Retrieve "C:\Berenson\Data_Files-9th\Minitab\BATFAIL.MTW".
    Retrieving worksheet from file: C:\Berenson\Data_Files-9th\Minitab\BATFAIL.MTW
    # Worksheet was saved on Tue Mar 31 1998
    Results for: 252BATFAIL.MTW
    MTB > print c1 c2

    Data Display
    Row    Time   Pressure
      1     8.0      1
      2     8.1      1
      3     9.2      1
      4     9.4      1
      5    11.7      1
      6     7.6      2
      7     8.2      2
      8     9.8      2
      9    10.9      2
     10    12.3      2
     11     6.0      3
     12     6.3      3
     13     7.1      3
     14     7.7      3
     15     8.9      3
     16     5.1      4
     17     5.6      4
     18     5.9      4
     19     6.7      4
     20     7.8      4

    MTB > Kruskal-Wallis c1 c2.
    Kruskal-Wallis Test: Time versus Pressure
    Kruskal-Wallis Test on Time
    Pressure     N   Median   Ave Rank       Z
    1            5    9.200       14.6    1.79
    2            5    9.800       15.2    2.05
    3            5    7.100        7.8   -1.18
    4            5    5.900        4.4   -2.66
    Overall     20                10.5
    H = 11.91   DF = 3   P = 0.008

The p-value of .008 is below the significance level, so we reject the null hypothesis.

(b) According to the Instructor's Solution Manual, there is sufficient evidence to show a significant difference among the four pressure levels with respect to median battery life. The warranty policy should exploit the highest median battery life and explicitly specify that this median battery life can only be warranted when the batteries are operated under the normal pressure level.

252solnF3 4/15/02

Downing and Clark, Chapter 17, Application 12: The benefits paid to employees of three yo-yo manufacturers appear below. Test the hypothesis that the benefits expenditures of the three companies have the same distribution.

Solution: The original data appear in the left three columns and the rankings appear in the next three.

                 Original Data                        Ranks of Data
    Company A   Company B   Company C    Company A   Company B   Company C
        x1          x2          x3           r1          r2          r3
        10          25          16            1          16           7
        26          12          24           17           3          15
        29          20          13           20          11           4
        21          11          22           12           2          13
        17          27          15            8          18           6
        23          19          28           14          10          19
        30          14          18           21           5           9
        31          38          35           22          29          26
        39          32          37           30          23          28
        33          36          34           24          27          25
    SRi                                     169         144         152
    ni                                       10          10          10

The null hypothesis is H0: Columns from same distribution or, if the parent distributions are assumed non-normal, H0: $M_1 = M_2 = M_3$. We use a Kruskal-Wallis test instead of a Friedman test because the data appear to be three independent random samples.

To check the ranking, note that the sum of the three rank sums is 169 + 144 + 152 = 465, and that the sum of the first $n = n_1 + n_2 + n_3 = 30$ numbers is $\frac{n(n+1)}{2} = \frac{30(31)}{2} = 465$.

Now, compute the Kruskal-Wallis statistic

$$H = \frac{12}{n(n+1)}\left[\sum_i \frac{SR_i^2}{n_i}\right] - 3(n+1) = \frac{12}{30(31)}\left[\frac{169^2}{10} + \frac{144^2}{10} + \frac{152^2}{10}\right] - 3(31) = 0.01290(7240.10) - 93 = 0.4206 .$$

If we look up this result in the Kruskal-Wallis table (Table 9), we find that the size of the data set is too large for the table. If the size of the problem is larger than those shown in Table 9, use the $\chi^2$ distribution, with $df = m - 1$, where $m$ is the number of columns. Since there are $m = 3$ columns, we have two degrees of freedom. If we try to locate $H = 0.4206$ on the chi-squared table, we find that $\chi^{2(2)}_{.10} = 4.6052$ and $\chi^{2(2)}_{.90} = 0.2107$, so the p-value is between .10 and .90. In particular, if our significance level is 5%, compare $H$ with $\chi^{2(2)}_{.05} = 5.9915$. Since $H$ is smaller than $\chi^{2(2)}_{.05}$, do not reject the null hypothesis.

Downing and Clark, Chapter 17, Application 13: Four experimental precision scales (A, B, C, D) are tested on a fixed weight with the results below. Test the null hypothesis that the distributions of values given by the four scales are the same. (The text uses $\alpha = .10$ for this problem.)

Solution: The null hypothesis is H0: Columns from same distribution or, if the parent distributions are assumed non-normal, H0: $M_1 = M_2 = M_3 = M_4$. We use a Kruskal-Wallis test instead of a Friedman test because the data appear to be four independent random samples. In this case, there is no way the data could be cross-classified, since the column lengths are unequal.

    Original Data
    Scale A (x1):  103 121 106 120 114 128 116
    Scale B (x2):  112 105 132 136 109 138 135 126 124 117
    Scale C (x3):  131 104 130 108 123 119 113 133 127 134 125 115
    Scale D (x4):  129 111 122 137 107 110 139 118

    Ranks of Data (listed in ascending order within each scale)
    Scale A (r1):  1 4 12 14 18 19 26                      SR1 =  94   n1 =  7
    Scale B (r2):  3 7 10 15 22 24 30 33 34 36             SR2 = 214   n2 = 10
    Scale C (r3):  2 6 11 13 17 21 23 25 28 29 31 32       SR3 = 238   n3 = 12
    Scale D (r4):  5 8 9 16 20 27 35 37                    SR4 = 157   n4 =  8

To check the ranking, note that the sum of the four rank sums is 94 + 214 + 238 + 157 = 703, and that the sum of the first $n = n_1 + n_2 + n_3 + n_4 = 7 + 10 + 12 + 8 = 37$ numbers is $\frac{n(n+1)}{2} = \frac{37(38)}{2} = 703$.

Now, compute the Kruskal-Wallis statistic

$$H = \frac{12}{n(n+1)}\left[\sum_i \frac{SR_i^2}{n_i}\right] - 3(n+1) = \frac{12}{37(38)}\left[\frac{94^2}{7} + \frac{214^2}{10} + \frac{238^2}{12} + \frac{157^2}{8}\right] - 3(38) = 0.00853(13643.344) - 114 = 2.444 .$$

If we look up this result in the Kruskal-Wallis table (Table 9), we find that the size of the data set is too large for the table. If the size of the problem is larger than those shown in Table 9, use the $\chi^2$ distribution, with $df = m - 1$, where $m$ is the number of columns. Since there are $m = 4$ columns, we have three degrees of freedom. Since our significance level is 10%, compare $H$ with $\chi^{2(3)}_{.10} = 6.2514$. Since $H$ is smaller than $\chi^{2(3)}_{.10}$, do not reject the null hypothesis.

Friedman Test Problems

Exercise 12.93 [11.46 in 9th] (11.65 on CD in 8th edition): Solutions are repeated, edited, from the Instructor's Solution Manual.

11.46 d.f. = 5, $\alpha = 0.1$, $\chi^2_U = 9.2363$

Exercise 12.94 [11.47 in 9th edition] (11.66 on CD in 8th edition):

11.47 (a) H0: $M_1 = M_2 = M_3 = M_4 = M_5 = M_6$
H1: At least one of the medians differs.
If the appropriate values cannot be found on the Friedman table, use $\chi^2$ and reject H0 if $\chi^2_F > 9.2363$.
(b) Since $\chi^2_F = 11.56 > 9.2363$, reject H0. There is enough evidence that the medians are different.

Exercise 12.95 [11.48 in 9th] (11.67 on CD in 8th edition):

11.48 (a) H0: $M_1 = M_2 = M_3 = M_4$, where 1 is A, 2 is B, 3 is C and 4 is D.
H1: At least one of the medians differs.
First we rank the data within rows. The data appear below in columns marked x1 to x4 and the ranks are in columns marked r1 to r4.

    Row     Brand A       Brand B       Brand C       Brand D
           x1    r1      x2    r2      x3    r3      x4    r4
    1      24    2       26    4       25    3       22    1
    2      27    3.5     27    3.5     26    2       24    1
    3      19    2       22    4       20    3       16    1
    4      24    2       27    4       25    3       23    1
    5      22    2.5     25    4       22    2.5     21    1
    6      26    3       27    4       24    1.5     24    1.5
    7      27    4       26    3       22    1       23    2
    8      25    3       27    4       24    2       21    1
    9      22    3       23    4       20    2       19    1
    SRi          25            34.5          20            10.5

To check the ranking, note that the sum of the four rank sums is 25 + 34.5 + 20 + 10.5 = 90, and that the sum of the rank sums should be $\sum SR_i = \frac{rc(c+1)}{2} = \frac{9(4)(5)}{2} = 90$.

Now compute the Friedman statistic

$$\chi^2_F = \frac{12}{rc(c+1)}\left[\sum_i SR_i^2\right] - 3r(c+1) = \frac{12}{9(4)(5)}\left[25^2 + 34.5^2 + 20^2 + 10.5^2\right] - 3(9)(5) = \frac{1}{15}(2325.5) - 135 = 20.03 .$$

Since the size of the problem is larger than those shown in Table 8, use the $\chi^2$ distribution, with $df = c - 1$, where $c$ is the number of columns. Since $c = 4$, if $\alpha = .05$, compare $\chi^2_F$ with $\chi^{2(3)}_{.05} = 7.8147$. Since $\chi^2_F = 20.03$ is larger than $\chi^{2(3)}_{.05}$, reject the null hypothesis.

This problem was run on Minitab with the following results.

    ----- 11/7/2003 8:32:16 PM -----
    Welcome to Minitab, press F1 for help.
    MTB > Retrieve "C:\Berenson\Data_Files-9th\Minitab\COFFEE.MTW".
    Retrieving worksheet from file: C:\Berenson\Data_Files-9th\Minitab\COFFEE.MTW
    # Worksheet was saved on Thu Nov 06 2003
    Results for: COFFEE.MTW
    MTB > print c1 c2 c3

    Data Display
    Row   Expert   Brand   Rating
      1      1       1       24
      2      1       2       26
      3      1       3       25
      4      1       4       22
      5      2       1       27
      6      2       2       27
      7      2       3       26
      8      2       4       24
      9      3       1       19
     10      3       2       22
     11      3       3       20
     12      3       4       16
     13      4       1       24
     14      4       2       27
     15      4       3       25
     16      4       4       23
     17      5       1       22
     18      5       2       25
     19      5       3       22
     20      5       4       21
     21      6       1       26
     22      6       2       27
     23      6       3       24
     24      6       4       24
     25      7       1       27
     26      7       2       26
     27      7       3       22
     28      7       4       23
     29      8       1       25
     30      8       2       27
     31      8       3       24
     32      8       4       21
     33      9       1       22
     34      9       2       23
     35      9       3       20
     36      9       4       19

    MTB > Friedman c3 c2 c1.

    Friedman Test: Rating versus Brand, Expert
    Friedman test for Rating by Brand blocked by Expert
    S = 20.03   DF = 3   P = 0.000
    S = 20.72   DF = 3   P = 0.000 (adjusted for ties)

    Brand    N   Est Median   Sum of Ranks
    1        9       25.000           25.0
    2        9       26.750           34.5
    3        9       24.000           20.0
    4        9       22.250           10.5
    Grand median = 24.500

(b) Since the p-value is essentially zero, reject H0 at the 0.05 level of significance. There is evidence of a difference in the median summated ratings of the four brands of Colombian coffee.
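As a cross-check on Exercise 12.95, the within-row ranking and the Friedman statistic can be recomputed from the coffee ratings in the Minitab data display. The sketch below is illustrative only: the function names `row_ranks` and `friedman_statistic` are made up for this example, it uses nothing beyond the Python standard library, and it computes the statistic without a tie adjustment, so it should be compared with the first S value in the Minitab output rather than the tie-adjusted one.

```python
def row_ranks(row):
    """Rank the values in one row (1 = smallest); tied values share the average rank."""
    ordered = sorted(row)
    return [ordered.index(v) + 1 + (ordered.count(v) - 1) / 2 for v in row]


def friedman_statistic(table):
    """Return (rank sums by column, Friedman statistic) for an r-by-c table.

    No adjustment for ties is applied (an assumption of this sketch).
    """
    r, c = len(table), len(table[0])
    ranks = [row_ranks(row) for row in table]
    rank_sums = [sum(col) for col in zip(*ranks)]
    # The rank sums must add up to r*c*(c+1)/2, the check used in the text.
    if abs(sum(rank_sums) - r * c * (c + 1) / 2) > 1e-9:
        raise ValueError("rank sums do not add up to rc(c+1)/2")
    stat = 12 / (r * c * (c + 1)) * sum(s ** 2 for s in rank_sums) - 3 * r * (c + 1)
    return rank_sums, stat


# Ratings by expert (rows) and brand A-D (columns), from the data display.
coffee = [
    [24, 26, 25, 22],
    [27, 27, 26, 24],
    [19, 22, 20, 16],
    [24, 27, 25, 23],
    [22, 25, 22, 21],
    [26, 27, 24, 24],
    [27, 26, 22, 23],
    [25, 27, 24, 21],
    [22, 23, 20, 19],
]

rank_sums, chi2_f = friedman_statistic(coffee)
print(rank_sums)         # rank sums for brands A, B, C, D
print(round(chi2_f, 2))  # compare with the 5% critical value 7.8147 for 3 df
```

Ranking inside each row, rather than across the whole data set, is what distinguishes this from the Kruskal-Wallis calculation: each expert acts as a block, so only the ordering of brands within an expert matters.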
In (a), we conclude that there is evidence of a difference in the median summated ratings of the four brands of Colombian coffee, while in problem 11.23 we conclude that there is evidence of a difference in the mean summated ratings of the four brands of Colombian coffee.

Downing and Clark, Chapter 17, Application 6: Six companies show the profits below from sales in four different cities (A, B, C, D). Use the Friedman statistic to test the null hypothesis that the cities are equally profitable for the companies. (Assume that the parent distribution is not Normal.)

Solution: The null hypothesis is H0: Columns from same distribution or H0: $M_1 = M_2 = M_3 = M_4$. We use a Friedman test because the data are cross-classified by company. This time we rank our data only within rows. There are $c = 4$ columns and $r = 6$ rows.

                    Original Data                      Ranked Data
              City   City   City   City       City   City   City   City
               A      B      C      D          A      B      C      D
               x1     x2     x3     x4         r1     r2     r3     r4
    Firm 1     22     11     16     14          4      1      3      2
    Firm 2     20     19     18     14          4      3      2      1
    Firm 3     19     24     16     13          3      4      2      1
    Firm 4     15     18     17     19          1      3      2      4
    Firm 5     18     17     13     15          4      3      1      2
    Firm 6     17     16     19     12          3      2      4      1
    SRi                                        19     16     14     11

To check the ranking, note that the sum of the four rank sums is 19 + 16 + 14 + 11 = 60, and that the sum of the rank sums should be $\sum SR_i = \frac{rc(c+1)}{2} = \frac{6(4)(5)}{2} = 60$.

Now compute the Friedman statistic

$$\chi^2_F = \frac{12}{rc(c+1)}\left[\sum_i SR_i^2\right] - 3r(c+1) = \frac{12}{6(4)(5)}\left[19^2 + 16^2 + 14^2 + 11^2\right] - 3(6)(5) = \frac{1}{10}(934) - 90 = 3.40 .$$

Since the size of the problem is larger than those shown in Table 8, use the $\chi^2$ distribution, with $df = c - 1$, where $c$ is the number of columns. Since $c = 4$, if $\alpha = .05$, compare $\chi^2_F$ with $\chi^{2(3)}_{.05} = 7.8147$. Since $\chi^2_F = 3.40$ is not larger than $\chi^{2(3)}_{.05}$, do not reject the null hypothesis.

Downing and Clark, Chapter 17, Application 4: Fifteen frequent fliers are asked to rate in order of preference Aircat (A), Bluebird (B) and Condor (C) Airlines. The results are as below. Test the hypothesis that there is no preference among fliers between the three Airlines.
Solution: The null hypothesis is H0: Columns from same distribution or H0: $M_1 = M_2 = M_3$. We use a Friedman test because the data are cross-classified by flier. This time we rank our data only within rows. There are $c = 3$ columns and $r = 15$ rows. It should be very obvious that the ranking has already been done for you, but it is repeated here to remind you that it must be done before rank sums are computed.

                  Original Data                 Ranked Data
             Airline  Airline  Airline    Airline  Airline  Airline
                A        B        C          A        B        C
    Flier       x1       x2       x3         r1       r2       r3
      1         1        3        2          1        3        2
      2         1        3        2          1        3        2
      3         1        3        2          1        3        2
      4         1        2        3          1        2        3
      5         2        1        3          2        1        3
      6         3        1        2          3        1        2
      7         3        2        1          3        2        1
      8         2        3        1          2        3        1
      9         1        3        2          1        3        2
     10         1        2        3          1        2        3
     11         1        3        2          1        3        2
     12         3        2        1          3        2        1
     13         2        3        1          2        3        1
     14         3        1        2          3        1        2
     15         1        3        2          1        3        2
    SRi                                      26       35       29

To check the ranking, note that the sum of the three rank sums is 26 + 35 + 29 = 90, and that the sum of the rank sums should be $\sum SR_i = \frac{rc(c+1)}{2} = \frac{15(3)(4)}{2} = 90$.

Now compute the Friedman statistic

$$\chi^2_F = \frac{12}{rc(c+1)}\left[\sum_i SR_i^2\right] - 3r(c+1) = \frac{12}{15(3)(4)}\left[26^2 + 35^2 + 29^2\right] - 3(15)(4) = \frac{1}{15}(2742) - 180 = 182.8 - 180 = 2.8 .$$

Since the size of the problem is larger than those shown in Table 8, use the $\chi^2$ distribution, with $df = c - 1$, where $c$ is the number of columns. Since $c = 3$, if $\alpha = .05$, compare $\chi^2_F$ with $\chi^{2(2)}_{.05} = 5.9915$. Since $\chi^2_F = 2.8$ is not larger than $\chi^{2(2)}_{.05}$, do not reject the null hypothesis.
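Looking back at the Kruskal-Wallis problems earlier in this document, the same kind of cross-check can be done for the pooled ranking and the H statistic. The sketch below is only an illustration under stated assumptions: the helper names `pooled_ranks` and `kruskal_wallis_h` are invented for this example, no tie correction is applied, and only the Python standard library is used. It ranks the battery-life data from Exercise 12.89 from scratch and then reuses the rank sums already given in the text for the two Downing and Clark applications.

```python
def pooled_ranks(groups):
    """Rank all observations together (1 = smallest); ties share the average rank."""
    pooled = sorted(x for g in groups for x in g)
    avg_rank = {}
    for v in set(pooled):
        first = pooled.index(v) + 1          # 1-based position of first occurrence
        avg_rank[v] = first + (pooled.count(v) - 1) / 2
    return [[avg_rank[x] for x in g] for g in groups]


def kruskal_wallis_h(rank_sums, sizes):
    """Kruskal-Wallis H from rank sums SR_i and sample sizes n_i (no tie correction)."""
    n = sum(sizes)
    # The rank sums must add up to n(n+1)/2, the check used in the text.
    if abs(sum(rank_sums) - n * (n + 1) / 2) > 1e-9:
        raise ValueError("rank sums do not add up to n(n+1)/2")
    total = sum(sr ** 2 / ni for sr, ni in zip(rank_sums, sizes))
    return 12 / (n * (n + 1)) * total - 3 * (n + 1)


# Battery life by pressure level (Low, Normal, High, Very High), Exercise 12.89.
battery = [
    [8.0, 8.1, 9.2, 9.4, 11.7],
    [7.6, 8.2, 9.8, 10.9, 12.3],
    [6.0, 6.3, 7.1, 7.7, 8.9],
    [5.1, 5.6, 5.9, 6.7, 7.8],
]
battery_rank_sums = [sum(r) for r in pooled_ranks(battery)]
h_battery = kruskal_wallis_h(battery_rank_sums, [len(g) for g in battery])

# Rank sums and sample sizes taken directly from the Downing and Clark solutions.
h_benefits = kruskal_wallis_h([169, 144, 152], [10, 10, 10])
h_scales = kruskal_wallis_h([94, 214, 238, 157], [7, 10, 12, 8])

print(battery_rank_sums, round(h_battery, 3))
print(round(h_benefits, 4), round(h_scales, 3))
```

Each H value is then compared with the chi-squared critical value for (number of columns - 1) degrees of freedom, exactly as in the worked solutions; the code deliberately stops short of computing p-values so that it depends only on the tables the text already uses.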