252solnE1 10/23/06 (Open this document in 'Page Layout' view!) Roger Even Bove

E. CHI-SQUARED AND RELATED TESTS.

1. Tests of Homogeneity and Independence
Text 12.18, 12.19 - 21, 12.26 [12.23*, 12.24, 12.27] (12.22, 12.27) E1, E2, E3
2. Tests of Goodness of Fit
Text 12.51, 12.54 [12.49*, 12.52*. Both on CD12_5], E4, E5, E6
a. Uniform Distribution
b. Poisson Distribution
c. Normal Distribution
3. Kolmogorov-Smirnov Test
E7, E8, E9, E10, E11
a. Kolmogorov-Smirnov One-Sample Test
b. Lilliefors Test.

Solutions to outline point 1 are in this document.

----------------------------------------------------------------

Problems involving Tests of Homogeneity and Independence.

Exercise 12.18 [12.23 in 9th]: The results of a Gallup phone survey appear below. Consumers were asked if they objected to having their medical records shared with different types of organizations.

| O | Ins Cos | Pharm | Research |
|---|---|---|---|
| Yes | 820 | 590 | 670 |
| No | 180 | 410 | 330 |

a) Is the proportion of people who object different for different institutions? ($\alpha = .05$)
b) If appropriate, use the Marascuilo procedure to determine which organizations are different. Discuss.

Solution: a) We are testing $H_0$: Homogeneity or $H_0: p_1 = p_2 = p_3$, where $p_1$ is the proportion saying 'yes' to an insurance company, $p_2$ is the proportion saying 'yes' to a pharmacy, etc.

| O | Ins Cos | Pharm | Research | Total | $p_r$ |
|---|---|---|---|---|---|
| Yes | 820 | 590 | 670 | 2080 | .6933 |
| No | 180 | 410 | 330 | 920 | .3067 |
| Total | 1000 | 1000 | 1000 | 3000 | 1.0000 |

The row proportions are obtained by dividing each row total by the overall total, for example $\frac{2080}{3000} = .6933$. We now get our expected table by multiplying the column totals by the row proportions, for example we replace 820 by $.6933(1000) = 693.3$. The expected array is

| E | Ins Cos | Pharm | Research | Total | $p_r$ |
|---|---|---|---|---|---|
| Yes | 693.3 | 693.3 | 693.3 | 2080 | .6933 |
| No | 306.7 | 306.7 | 306.7 | 920 | .3067 |
| Total | 1000 | 1000 | 1000 | 3000 | 1.0000 |

The formula for the chi-squared statistic is $\chi^2 = \sum \frac{(O-E)^2}{E}$ or $\chi^2 = \sum \frac{O^2}{E} - n$.
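The two forms of the chi-squared statistic given above are algebraically equivalent. A short derivation, using only the fact that the observed counts and the expected counts both sum to $n$:

```latex
\begin{align*}
\sum \frac{(O-E)^2}{E}
  &= \sum \frac{O^2 - 2OE + E^2}{E} \\
  &= \sum \frac{O^2}{E} - 2\sum O + \sum E \\
  &= \sum \frac{O^2}{E} - 2n + n \\
  &= \sum \frac{O^2}{E} - n .
\end{align*}
```

The second form needs only one column of arithmetic per cell, at the cost of being more sensitive to rounding in $E$.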
The first of these two formulas is shown below.

| $E$ | $O$ | $E-O$ | $(E-O)^2$ | $\frac{(E-O)^2}{E}$ |
|---|---|---|---|---|
| 693.3 | 820 | -126.7 | 16052.89 | 23.15432 |
| 306.7 | 180 | 126.7 | 16052.89 | 52.34069 |
| 693.3 | 590 | 103.3 | 10670.89 | 15.39145 |
| 306.7 | 410 | -103.3 | 10670.89 | 34.79260 |
| 693.3 | 670 | 23.3 | 542.89 | 0.78305 |
| 306.7 | 330 | -23.3 | 542.89 | 1.77010 |
| 3000.0 | 3000 | 0.0 | | 128.23221 |

(The Instructor's Solution Manual gets $\chi^2_{calc} = 128.24$; the small difference comes from rounding $E$ to one decimal place.) The degrees of freedom for this application are $(r-1)(c-1) = (2-1)(3-1) = (1)(2) = 2$. Since $\alpha = .05$, we compare the calculated chi-square with $\chi^{2(2)}_{.05} = 5.9915$. Since our $\chi^2_{calc}$ is larger than the table value we reject $H_0$.

The Instructor's Solution Manual puts it this way:
$H_0: p_1 = p_2 = p_3$, $H_1$: at least one proportion differs, where population 1 = insurance companies, 2 = pharmacies, 3 = medical researchers.
Decision rule: df = (c – 1) = (3 – 1) = 2. If $\chi^2 > 5.9915$, reject $H_0$.
Test statistic: $\chi^2 = 128.24$
Decision: Since $\chi^2_{calc} = 128.24$ is above the upper critical bound of 5.9915, reject $H_0$. There is enough evidence to show that there is a significant difference in the proportion of people who object to their medical records being shared.

b) The Marascuilo procedure says that, for 2 by c tests, if (i) equality is rejected and (ii) $\lvert p_a - p_b \rvert > \sqrt{\chi^2}\, s_{\Delta p}$, where $a$ and $b$ represent 2 groups, the chi-squared has $c-1$ degrees of freedom and the standard deviation is $s_{\Delta p} = \sqrt{\frac{p_a q_a}{n_a} + \frac{p_b q_b}{n_b}}$, you can say that you have a significant difference between $p_a$ and $p_b$.

For the three column proportions saying 'yes,' we have $p_1 = \frac{820}{1000} = .820$, $p_2 = \frac{590}{1000} = .590$, $p_3 = \frac{670}{1000} = .670$ and the variances $\frac{p_1 q_1}{n_1} = \frac{.820(.180)}{1000} = .0001476$, $\frac{p_2 q_2}{n_2} = \frac{.590(.410)}{1000} = .0002419$, $\frac{p_3 q_3}{n_3} = \frac{.670(.330)}{1000} = .0002211$. We get the following table.

| Pair | Critical Range $\sqrt{\chi^2}\, s_{\Delta p}$ | $\lvert p_a - p_b \rvert$ |
|---|---|---|
| 1 to 2 | $\sqrt{5.9915}\sqrt{.0001476 + .0002419} = .048$ | $\lvert .820 - .590 \rvert = .230$ |
| 2 to 3 | $\sqrt{5.9915}\sqrt{.0002419 + .0002211} = .053$ | $\lvert .590 - .670 \rvert = .080$ |
| 1 to 3 | $\sqrt{5.9915}\sqrt{.0001476 + .0002211} = .047$ | $\lvert .820 - .670 \rvert = .150$ |

Since, in every case, the difference between the proportions exceeds the critical range, we can say that there is a significant difference between each pair of proportions.

Exercise 12.19 - 12.21 [12.24 in 9th] (12.22 in 8th edition): On the basis of the first table of each pair posted below, are there significant differences between cities in these characteristics? Provide a p-value for each.

Solution: The solution is repeated from the Instructor's Solution Manual.

(a) Item #1: Use the guest's name:
$H_0: p_1 = p_2 = p_3$, $H_1$: Not all $p_j$ are equal, where population 1 = Hong Kong, 2 = New York, 3 = Paris.

Observed Frequencies:

| Finding | Hong Kong | New York | Paris | Total |
|---|---|---|---|---|
| Yes | 26 | 39 | 28 | 93 |
| No | 74 | 61 | 72 | 207 |
| Total | 100 | 100 | 100 | 300 |

Expected Frequencies:

| Finding | Hong Kong | New York | Paris | Total |
|---|---|---|---|---|
| Yes | 31 | 31 | 31 | 93 |
| No | 69 | 69 | 69 | 207 |
| Total | 100 | 100 | 100 | 300 |

| | |
|---|---|
| Level of Significance | 0.05 |
| Number of Rows | 2 |
| Number of Columns | 3 |
| Degrees of Freedom | 2 |
| Critical Value | 5.991476 |
| Chi-Square Test Statistic | 4.581581 |
| p-Value | 0.101186 |
| | Do not reject the null hypothesis |

Test statistic: $\chi^2 = \sum_{\text{All cells}} \frac{(f_o - f_e)^2}{f_e} = 4.582$
Decision: Since the measured test statistic of 4.582 is smaller than the critical value of 5.991, we do not reject the null hypothesis. There is not enough evidence to conclude that there is a difference in the proportion of hotels that use the guest's name among the three cities.
(b) The p value is 0.101. The probability of obtaining a sample that gives rise to a test statistic more extreme than 4.582 is 0.101 if the null hypothesis is true.
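As a check on the Item #1 figures above, the expected counts, the test statistic, and the p-value can be recomputed in a few lines. This is a minimal sketch using only the Python standard library; the function name `chi2_homogeneity` is hypothetical and not part of the original solution. The p-value line relies on the fact that, for exactly 2 degrees of freedom, the upper-tail chi-squared probability is $e^{-\chi^2/2}$; it would not apply to other degrees of freedom.

```python
import math


def chi2_homogeneity(observed):
    """Return (expected, chi2) for a contingency table given as a list of rows.

    Hypothetical helper for illustration: expected counts are
    row total * column total / n, and chi2 is the sum of (O - E)^2 / E.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum(
        (o - e) ** 2 / e
        for o_row, e_row in zip(observed, expected)
        for o, e in zip(o_row, e_row)
    )
    return expected, chi2


# Item #1 (use the guest's name): rows are Yes / No,
# columns are Hong Kong, New York, Paris.
observed = [[26, 39, 28], [74, 61, 72]]
expected, chi2 = chi2_homogeneity(observed)

# Upper-tail probability; this closed form holds only because df = 2.
p_value = math.exp(-chi2 / 2)

print(expected)
print(round(chi2, 3), round(p_value, 3))
```

The same function can be applied to the Item #2 and Item #3 tables by changing `observed`.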
(c) Item #2: Minibar charges correctly posted at check-out:
$H_0: p_1 = p_2 = p_3$, $H_1$: Not all $p_j$ are equal

Observed Frequencies:

| Minibar Charges Posted | Hong Kong | New York | Paris | Total |
|---|---|---|---|---|
| Yes | 86 | 76 | 78 | 240 |
| No | 14 | 24 | 22 | 60 |
| Total | 100 | 100 | 100 | 300 |

Expected Frequencies:

| Minibar Charges Posted | Hong Kong | New York | Paris | Total |
|---|---|---|---|---|
| Yes | 80 | 80 | 80 | 240 |
| No | 20 | 20 | 20 | 60 |
| Total | 100 | 100 | 100 | 300 |

| | |
|---|---|
| Level of Significance | 0.05 |
| Number of Rows | 2 |
| Number of Columns | 3 |
| Degrees of Freedom | 2 |
| Critical Value | 5.991476 |
| Chi-Square Test Statistic | 3.499998 |
| p-Value | 0.173774 |
| | Do not reject the null hypothesis |

Test statistic: $\chi^2 = \sum_{\text{All cells}} \frac{(f_o - f_e)^2}{f_e} = 3.50$
Decision: Since the measured test statistic of 3.5 is smaller than the critical value of 5.991, we do not reject the null hypothesis. There is not sufficient evidence to conclude that there is a difference in the proportion of hotels that correctly post Minibar charges among the three cities.
(d) The p value is 0.174. The probability of obtaining a sample that gives rise to a test statistic more extreme than 3.5 is 0.174 if the null hypothesis is true.

(e) Item #3: Bathroom tub and shower spotlessly clean:
$H_0: p_1 = p_2 = p_3$, $H_1$: Not all $p_j$ are equal

Observed Frequencies:

| Bathroom and Shower Clean | Hong Kong | New York | Paris | Total |
|---|---|---|---|---|
| Yes | 81 | 76 | 79 | 236 |
| No | 19 | 24 | 21 | 64 |
| Total | 100 | 100 | 100 | 300 |

Expected Frequencies:

| Bathroom and Shower Clean | Hong Kong | New York | Paris | Total |
|---|---|---|---|---|
| Yes | 78.6667 | 78.6667 | 78.6667 | 236 |
| No | 21.3333 | 21.3333 | 21.3333 | 64 |
| Total | 100 | 100 | 100 | 300 |

| | |
|---|---|
| Level of Significance | 0.05 |
| Number of Rows | 2 |
| Number of Columns | 3 |
| Degrees of Freedom | 2 |
| Critical Value | 5.991476 |
| Chi-Square Test Statistic | 0.754766 |
| p-Value | 0.685653 |
| | Do not reject the null hypothesis |

Test statistic:
$\chi^2 = \sum_{\text{All cells}} \frac{(f_o - f_e)^2}{f_e} = 0.755$
Decision: Since the measured test statistic of 0.755 is smaller than the critical value of 5.991, we do not reject the null hypothesis and conclude that there is no significant relationship between item #3 and the city.
(f) The p value is 0.686. The probability of obtaining a sample that gives rise to a test statistic more extreme than 0.755 is 0.686 if the null hypothesis is true.
(g), (h) Since the null hypothesis is not rejected for any of the 3 items, it is not necessary to perform the Marascuilo procedure.

For the second table of each pair, in which each sample size is doubled to 200:

(a) Item #1: Use the guest's name:
$H_0: p_1 = p_2 = p_3$, $H_1$: Not all $p_j$ are equal
Test statistic: $\chi^2 = \sum_{\text{All cells}} \frac{(f_o - f_e)^2}{f_e} = 9.163$
Decision: Since the measured test statistic of 9.163 is greater than the critical value of 5.991, we reject the null hypothesis and conclude that there is a significant difference in the proportion of hotels that use the guest's name among the 3 cities.
(b) The p value is 0.01. The probability of obtaining a sample that gives rise to a test statistic more extreme than 9.163 is 0.01 if the null hypothesis is true.

(c) Item #2: Minibar charges correctly posted at check-out:
$H_0: p_1 = p_2 = p_3$, $H_1$: Not all $p_j$ are equal
Test statistic: $\chi^2 = \sum_{\text{All cells}} \frac{(f_o - f_e)^2}{f_e} = 7.0$
Decision: Since the measured test statistic of 7.0 is greater than the critical value of 5.991, we reject the null hypothesis and conclude that there is a significant relationship between item #2 and the city.
(d) The p value is 0.03. The probability of observing a sample that gives rise to a test statistic more extreme than 7.0 is 0.03 if the null hypothesis is true.

(e) Item #3: Bathroom tub and shower spotlessly clean:
$H_0: p_1 = p_2 = p_3$, $H_1$: Not all $p_j$ are equal
Test statistic: $\chi^2 = \sum_{\text{All cells}} \frac{(f_o - f_e)^2}{f_e} = 1.51$
Decision: Since the measured test statistic of 1.51 is smaller than the critical value of 5.991, we do not reject the null hypothesis and conclude that there is no significant relationship between item #3 and the city.
(f) The p value is 0.470.
The probability of obtaining a sample that gives rise to a test statistic more extreme than 1.51 is 0.470 if the null hypothesis is true.

(g) Marascuilo procedure for Item #1: $\sqrt{\chi^2_U} = 2.4478$;
Critical range $= \sqrt{\chi^2_U}\sqrt{\frac{p_{S_j}(1-p_{S_j})}{n_j} + \frac{p_{S_{j'}}(1-p_{S_{j'}})}{n_{j'}}}$

| Group | Sample Proportion | Sample Size |
|---|---|---|
| 1 | 0.26 | 200 |
| 2 | 0.39 | 200 |
| 3 | 0.28 | 200 |

| Comparison | Absolute Difference | Std. Error of Difference | Critical Range | Results |
|---|---|---|---|---|
| Group 1 to Group 2 | 0.13 | 0.04638426 | 0.114 | Means are different |
| Group 1 to Group 3 | 0.02 | 0.04438468 | 0.109 | Means are not different |
| Group 2 to Group 3 | 0.11 | 0.0468775 | 0.115 | Means are not different |

There is a difference between Hong Kong and New York in the proportion of hotels that use the guest's name.

Marascuilo procedure for Item #2:

| Group | Sample Proportion | Sample Size |
|---|---|---|
| 1 | 0.86 | 200 |
| 2 | 0.76 | 200 |
| 3 | 0.78 | 200 |

| Comparison | Absolute Difference | Std. Error of Difference | Critical Range | Results |
|---|---|---|---|---|
| Group 1 to Group 2 | 0.1 | 0.03891015 | 0.095 | Means are different |
| Group 1 to Group 3 | 0.08 | 0.03820995 | 0.094 | Means are not different |
| Group 2 to Group 3 | 0.02 | 0.04207137 | 0.103 | Means are not different |

There is a difference between Hong Kong and New York in the proportion of hotels that correctly post Minibar charges.

The larger the sample size, the higher the power of the test. When the sample size is doubled, the test is better able to detect a difference among the 3 cities in the proportion of hotels that use the guest's name and in the proportion of hotels that correctly post Minibar charges. However, we still cannot conclude that there is a significant difference in the proportion of hotels with spotless bathroom tub and shower among the 3 cities at the 0.05 level of significance.
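The Marascuilo comparisons above follow one formula applied to every pair of groups, so they are easy to reproduce. The sketch below is an illustration only: the function name `marascuilo` and its return layout are hypothetical, and the critical value 5.991476 is simply the chi-squared table value for 2 degrees of freedom quoted in the solution.

```python
import math
from itertools import combinations


def marascuilo(props, sizes, chi2_crit):
    """Compare every pair of sample proportions with the Marascuilo procedure.

    Hypothetical helper for illustration. Returns a list of tuples
    (group_a, group_b, abs_difference, critical_range, is_significant).
    """
    results = []
    for a, b in combinations(range(len(props)), 2):
        diff = abs(props[a] - props[b])
        std_err = math.sqrt(
            props[a] * (1 - props[a]) / sizes[a]
            + props[b] * (1 - props[b]) / sizes[b]
        )
        crit_range = math.sqrt(chi2_crit) * std_err
        results.append((a + 1, b + 1, diff, crit_range, diff > crit_range))
    return results


# Item #1 (use the guest's name) with the doubled samples of 200 per city:
# group 1 = Hong Kong, 2 = New York, 3 = Paris.
results = marascuilo([0.26, 0.39, 0.28], [200, 200, 200], 5.991476)
for a, b, diff, crit_range, significant in results:
    print(f"Group {a} to Group {b}: |diff| = {diff:.2f}, "
          f"critical range = {crit_range:.3f}, significant = {significant}")
```

Passing the Item #2 proportions (0.86, 0.76, 0.78) instead gives the second set of comparisons.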

Exercise 12.26 [12.27 in 9th] (12.27 in 8th edition): On the basis of the first table shown below, are the time of year and numbers selected independent?

Solution: The solution is repeated from the Instructor's Solution Manual. This is a test of independence.

12.27 (a) $H_0$: There is no relationship between the quarter of the year in which draftable-aged men were born and the numbers assigned as their draft eligibilities during the Vietnam War.
$H_1$: There is a relationship between the quarter of the year in which draftable-aged men were born and the numbers assigned as their draft eligibilities.
Decision rule: If $\chi^2 > 12.592$, reject $H_0$.
Decision: Since $\chi^2_{calc} = 20.680$ is above the critical bound of 12.592, reject $H_0$. There is evidence of a relationship between the quarter of the year in which draftable-aged men were born and the numbers assigned as their draft eligibilities during the Vietnam War. It appears that the results of the lottery drawing are different from what would be expected if the lottery were random.

(b), (c) Test statistic: $\chi^2 = 9.803$
Decision: Since $\chi^2_{calc} = 9.803$ is below the critical bound of 12.592, do not reject $H_0$. There is not enough evidence to conclude there is any relationship between the quarter of the year in which draftable-aged men were born and the numbers assigned as their draft eligibilities during the Vietnam War. It appears that the results of the lottery drawing are consistent with what would be expected if the lottery were random.

Problem E1: (Sincich) An Ernst and Young survey of 126 warehouses operated by retail stores tests the independence of the number of deliveries to stores per week from warehouse size. Use $\alpha = .05$ for a test of independence.

Size (thousands of square feet)

| Deliveries/week | Below 100 | 100 - 249.9 | 250 - 400 | Above 400 |
|---|---|---|---|---|
| 1 or fewer | 5 | 13 | 9 | 5 |
| 2-3 | 12 | 11 | 13 | 6 |
| 4-5 | 9 | 14 | 13 | 16 |

Solution: This is an extremely basic chi-squared problem with $H_0$: Independence. First we total rows and columns and then compute the fraction of data in each row. For example, the first row contains 32 of the 126 warehouses, so the fraction in the first row is $\frac{32}{126} = .2540$, which is the first element in $p_r$. O (Observed) is the body of the table below.

O: Size (thousands of square feet)

| Deliveries/week | Below 100 | 100 - 249.9 | 250 - 400 | Above 400 | Total | $p_r$ |
|---|---|---|---|---|---|---|
| 1 or fewer | 5 | 13 | 9 | 5 | 32 | .2540 |
| 2-3 | 12 | 11 | 13 | 6 | 42 | .3333 |
| 4-5 | 9 | 14 | 13 | 16 | 52 | .4127 |
| Total | 26 | 38 | 35 | 27 | 126 | 1.0000 |

We use $p_r$ to get E (Expected) by multiplying the column totals. For example we get 6.6032 by multiplying the first column total, 26, by .2540. Row and column totals remain the same except for rounding error.

E: Size (thousands of square feet)

| Deliveries/week | Below 100 | 100 - 249.9 | 250 - 400 | Above 400 | Total | $p_r$ |
|---|---|---|---|---|---|---|
| 1 or fewer | 6.6032 | 9.6508 | 8.8889 | 6.8571 | 32 | .2540 |
| 2-3 | 8.6667 | 12.6667 | 11.6667 | 9.0000 | 42 | .3333 |
| 4-5 | 10.7302 | 15.6825 | 14.4444 | 11.1429 | 52 | .4127 |
| Total | 26.0001 | 38.0000 | 35.0000 | 27.0000 | 126 | 1.0000 |

We now place corresponding values of O and E together to get $\chi^2 = \sum \frac{O^2}{E} - n = 133.431 - 126 = 7.431$. Degrees of freedom are $(r-1)(c-1) = (3-1)(4-1) = 6$, where $r$ is the number of rows and $c$ is the number of columns. We compare this with $\chi^{2(6)}_{.05} = 12.5916$ from the $\chi^2$ table. Since our computed $\chi^2$ is less than the table $\chi^2$, we cannot reject $H_0$.

| $O$ | $E$ | $\frac{O^2}{E}$ |
|---|---|---|
| 5 | 6.6032 | 3.7860 |
| 12 | 8.6667 | 16.6153 |
| 9 | 10.7302 | 7.5488 |
| 13 | 9.6508 | 17.5115 |
| 11 | 12.6667 | 9.5526 |
| 14 | 15.6825 | 12.4980 |
| 9 | 8.8889 | 9.1125 |
| 13 | 11.6667 | 14.4857 |
| 13 | 14.4444 | 11.7000 |
| 5 | 6.8571 | 3.6459 |
| 6 | 9.0000 | 4.0000 |
| 16 | 11.1429 | 22.9743 |
| 126 | 126.0001 | 133.4306 |

Problem E2: A random sample of 64 cans of each of 3 brands of canned fruit is examined.
The proportion that are not as labeled is .1094 for brand 1, .0781 for brand 2 and .1563 for brand 3. Is the proportion the same for each brand? ($\alpha = .01$)

Solution: We are testing $H_0$: Homogeneity or $H_0: p_1 = p_2 = p_3$, where $p_1$ is the proportion not as labeled in brand 1, $p_2$ is the proportion not as labeled in brand 2, etc. Since the quantities in O must be whole numbers, we get the first row of O by multiplying 64, the sample size, by .1094 for brand 1, .0781 for brand 2 and .1563 for brand 3 and taking the nearest integer. The first value is thus $.1094(64) = 7.0016$ and we use 7. Our O table is thus the first table below. We use our row proportions to create E, the second table below.

| O | Brand 1 | Brand 2 | Brand 3 | Total | $p_r$ |
|---|---|---|---|---|---|
| Not as labeled | 7 | 5 | 10 | 22 | .1146 |
| As labeled | 57 | 59 | 54 | 170 | .8854 |
| Total | 64 | 64 | 64 | 192 | 1.0000 |

| E | Brand 1 | Brand 2 | Brand 3 | Total | $p_r$ |
|---|---|---|---|---|---|
| Not as labeled | 7.3333 | 7.3333 | 7.3333 | 21.9999 | .1146 |
| As labeled | 56.6667 | 56.6667 | 56.6667 | 170.0001 | .8854 |
| Total | 64 | 64 | 64 | 192.0000 | 1.0000 |

We now place corresponding values of O and E together to get $\chi^2 = \sum \frac{O^2}{E} - n = 193.9508 - 192 = 1.9508$. Degrees of freedom are $(r-1)(c-1) = (2-1)(3-1) = 2$, where $r$ is the number of rows and $c$ is the number of columns. We compare this with $\chi^{2(2)}_{.01} = 9.21034$ from the $\chi^2$ table. Since our computed $\chi^2$ is less than the table $\chi^2$, we cannot reject $H_0$.

| $O$ | $E$ | $\frac{O^2}{E}$ |
|---|---|---|
| 7 | 7.3333 | 6.6818 |
| 57 | 56.6667 | 57.3353 |
| 5 | 7.3333 | 3.4091 |
| 59 | 56.6667 | 61.4294 |
| 10 | 7.3333 | 13.6364 |
| 54 | 56.6667 | 51.4588 |
| 192 | 192 | 193.9508 |

Problem E3: A real estate firm wants to check whether selling price is related to the number of days a home is on the market. A random sample of 100 homes is taken and divided into three classes according to selling price. The realtor discovers that 57% of the 30 homes in the under $100,000 class were on the market for 60 days or fewer.
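38% of the 50 homes in the $100,000 - $200,000 class were on the market for 60 days or fewer. Finally, in the above $200,000 class, 35% of 20 homes were on the market for 60 days or fewer.
a. Do a test of the equality of proportions for the $100,000-$200,000 class and the above $200,000 class. Repeat this test as a chi-squared test.
b. Do a test of equality of proportions for all three classes.

Solution: Our data are $n = 100$, $p_1 = .57$, $p_2 = .38$, $p_3 = .35$ and $n_1 = 30$, $n_2 = 50$, $n_3 = 20$.

a) $H_0: p_2 = p_3$, $H_1: p_2 \ne p_3$. Let $\Delta p = p_2 - p_3$ and $\Delta\bar{p} = \bar{p}_2 - \bar{p}_3 = .38 - .35 = .03$. Then our hypotheses are $H_0: \Delta p = 0$ and $H_1: \Delta p \ne 0$.

The formula table entry for the difference between proportions ($q = 1 - p$) reads:

Confidence Interval: $\Delta p = \Delta\bar{p} \pm z_{\alpha/2}\, s_{\Delta p}$, where $\Delta\bar{p} = \bar{p}_1 - \bar{p}_2$ and $s_{\Delta p} = \sqrt{\frac{\bar{p}_1 \bar{q}_1}{n_1} + \frac{\bar{p}_2 \bar{q}_2}{n_2}}$
Hypotheses: $H_0: \Delta p = \Delta p_0$, $H_1: \Delta p \ne \Delta p_0$, where $\Delta p_0 = p_{01} - p_{02}$ or $\Delta p_0 = 0$
Test Ratio: $z = \frac{\Delta\bar{p} - \Delta p_0}{\sigma_{\Delta p}}$
Critical Value: $\Delta\bar{p}_{cv} = \Delta p_0 \pm z_{\alpha/2}\, \sigma_{\Delta p}$
If $\Delta p_0 = 0$: $\sigma_{\Delta p} = \sqrt{\bar{p}_0 \bar{q}_0 \left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$, where $\bar{p}_0 = \frac{n_1 \bar{p}_1 + n_2 \bar{p}_2}{n_1 + n_2}$
If $\Delta p_0 \ne 0$: $\sigma_{\Delta p} = \sqrt{\frac{p_{01} q_{01}}{n_1} + \frac{p_{02} q_{02}}{n_2}}$, or use $s_{\Delta p}$

We should replace 1 with 2 and 2 with 3 in these formulas.

$\bar{p}_0 = \frac{n_2 \bar{p}_2 + n_3 \bar{p}_3}{n_2 + n_3} = \frac{50(.38) + 20(.35)}{50 + 20} = \frac{26}{70} = .3714$

$\sigma_{\Delta p} = \sqrt{\bar{p}_0 \bar{q}_0 \left(\frac{1}{n_2} + \frac{1}{n_3}\right)} = \sqrt{.3714(.6286)\left(\frac{1}{50} + \frac{1}{20}\right)} = \sqrt{.3714(.6286)(.07)} = \sqrt{.016343} = 0.12784$

$z = \frac{\Delta\bar{p} - \Delta p_0}{\sigma_{\Delta p}} = \frac{.03 - 0}{.12784} = 0.2347$

Note: $\bar{q}_0 = 1 - \bar{p}_0 = 1 - .3714 = .6286$.

Since this is between $\pm z_{\alpha/2} = \pm z_{.025} = \pm 1.96$ we cannot reject $H_0$.

To do a $\chi^2$ test, use the numbers in columns 2 and 3 below, but instead of using $\chi^2 = \sum \frac{(O-E)^2}{E}$, use $\chi^2 = \sum \frac{\left(\lvert O-E \rvert - \frac{1}{2}\right)^2}{E}$.

b) We are testing $H_0$: Homogeneity or $H_0: p_1 = p_2 = p_3$, where $p_1$ is the proportion sold in 60 days among the homes in the under $100,000 class, $p_2$ is the same proportion among midrange homes, etc. To get the top line of O, multiply $p_1 = .57$, $p_2 = .38$ and $p_3 = .35$ by $n_1 = 30$, $n_2 = 50$ and $n_3 = 20$ respectively. As in the previous problem the results must be whole numbers. We use our row proportions to create E.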
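
| O | Under $100,000 | $100,000-$200,000 | Above $200,000 | Total | $p_r$ |
|---|---|---|---|---|---|
| Sold in 60 days | 17 | 19 | 7 | 43 | .43 |
| Not sold | 13 | 31 | 13 | 57 | .57 |
| Total | 30 | 50 | 20 | 100 | 1.00 |

| E | Under $100,000 | $100,000-$200,000 | Above $200,000 | Total | $p_r$ |
|---|---|---|---|---|---|
| Sold in 60 days | 12.9 | 21.5 | 8.6 | 43.0 | .43 |
| Not sold | 17.1 | 28.5 | 11.4 | 57.0 | .57 |
| Total | 30.0 | 50.0 | 20.0 | 100.0 | 1.00 |

Placing corresponding values of O and E together gives $\chi^2 = \sum \frac{O^2}{E} - n = 103.3184 - 100 = 3.31837$. Degrees of freedom are $(r-1)(c-1) = (2-1)(3-1) = 2$, where $r$ is the number of rows and $c$ is the number of columns. We compare this with $\chi^{2(2)}_{.05} = 5.9915$ from the $\chi^2$ table. Since our computed $\chi^2$ is less than the table $\chi^2$, we cannot reject $H_0$.

| $O$ | $E$ | $\frac{O^2}{E}$ |
|---|---|---|
| 17 | 12.9 | 22.4031 |
| 19 | 21.5 | 16.7907 |
| 7 | 8.6 | 5.6977 |
| 13 | 17.1 | 9.8830 |
| 31 | 28.5 | 33.7193 |
| 13 | 11.4 | 14.8246 |
| 100 | 100.0 | 103.3184 |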
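The Problem E3 arithmetic can be checked with a short script. This is a sketch under stated assumptions, not the author's worked method: the helper name `chi2_stat` is hypothetical, and the 2 by 2 chi-squared computed here is the ordinary statistic without the continuity correction suggested in part a), with expected counts recomputed from the two columns being compared. Without the correction the 2 by 2 statistic equals $z^2$, which is what the script illustrates; the corrected version would be somewhat smaller.

```python
import math


def chi2_stat(observed):
    """Uncorrected chi-squared statistic, sum of (O - E)^2 / E.

    Hypothetical helper for illustration; `observed` is a list of rows and
    expected counts are row total * column total / n.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return sum(
        (o - r * c / n) ** 2 / (r * c / n)
        for row, r in zip(observed, row_totals)
        for o, c in zip(row, col_totals)
    )


# Part a): z test for the $100,000-$200,000 class versus the above $200,000 class.
x2, n2 = 19, 50   # homes sold within 60 days, class size (38% of 50)
x3, n3 = 7, 20    # homes sold within 60 days, class size (35% of 20)
p0 = (x2 + x3) / (n2 + n3)                       # pooled proportion, 26/70
sigma = math.sqrt(p0 * (1 - p0) * (1 / n2 + 1 / n3))
z = (x2 / n2 - x3 / n3) / sigma

# Same comparison as an uncorrected 2 by 2 chi-squared test (columns 2 and 3 only).
chi2_2x2 = chi2_stat([[19, 7], [31, 13]])

# Part b): all three price classes.
chi2_all = chi2_stat([[17, 19, 7], [13, 31, 13]])

print(round(z, 4), round(z ** 2, 4), round(chi2_2x2, 4))
print(round(chi2_all, 5))
```

The first line of output shows $z$ alongside $z^2$ and the 2 by 2 statistic, and the second line should agree with the 3.31837 obtained above.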