252solnE2 10/25/05

E. CHI-SQUARED AND RELATED TESTS.
1. Tests of Homogeneity and Independence: Text 12.18, 12.19-21, 12.26 [12.23*, 12.24, 12.27] (12.22, 12.27); E1, E2, E3
2. Tests of Goodness of Fit: Text 12.51, 12.54 [12.49*, 12.52*; both on CD12_5]; E4, E5, E6
   a. Uniform Distribution
   b. Poisson Distribution
   c. Normal Distribution
3. Kolmogorov-Smirnov Test: E7, E8, E9, E10, E11
   a. Kolmogorov-Smirnov One-Sample Test
   b. Lilliefors Test

This document includes Exercises 12.49, 12.27 and Problems E4-E6.

---------------------------------------------------------------------------

Problems Involving Tests of Goodness of Fit.

Exercise 12.51 [12.49 in 9th] (On CD, not in 8th edition): The manager of a computer network has the following data on service interruptions per day over the last 500 days. Does it follow a Poisson distribution? Use α = .01.

  Interruptions per day   Number of days
            0                  160
            1                  175
            2                   86
            3                   41
            4                   18
            5                   12
            6                    8
          Total                500

Solution: $H_0$: Poisson. Our first step is to find a mean for the distribution.

    x      O     xO
    0    160      0
    1    175    175
    2     86    172
    3     41    123
    4     18     72
    5     12     60
    6      8     48
  Total  500    650

This gives us $\bar{x} = \frac{\sum xO}{n} = \frac{650}{500} = 1.3$, so we use a Poisson table with a mean of 1.3. (The probability in the x = 6 row is $P(x \ge 6)$, so that the probabilities sum to 1.)

    x      f(x)     E = fn     O      O²/E
    0    .272532    136.27    160   187.862
    1    .354291    177.15    175   172.876
    2    .230289    115.14     86    64.235
    3    .099792     49.90     41    33.687
    4    .032432     16.22     18    19.975
    5    .008432      4.22     12    34.123
    6    .002230      1.11      8    57.658
  Total  1.0000     500.01    500   570.414

There are 7 cells, but we have only 7 - 1 - 1 = 5 degrees of freedom, since we lost one degree of freedom because we estimated the mean from the data. From the table, $\chi^2_{.01}(5) = 15.0863$. Since our computed $\chi^2 = \sum \frac{O^2}{E} - n = 570.414 - 500 = 70.414$ is above the table value, we reject the null hypothesis.
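As a cross-check on the arithmetic, here is a minimal Python sketch (standard library only) of the same Poisson goodness-of-fit computation. It estimates the mean from the data, computes E = fn with the last cell taken as "that value or more", and uses the shortcut $\chi^2 = \sum O^2/E - n$. The helper name `poisson_gof` and its `merge_from` parameter are hypothetical, chosen for illustration only. Because the sketch carries full precision instead of the rounded table values, its $\chi^2$ differs slightly from the hand computation.

```python
import math


def poisson_gof(counts, merge_from=None):
    """Chi-squared goodness-of-fit test against a Poisson distribution whose
    mean is estimated from the data (hypothetical helper, for illustration).

    counts[k] is the number of days with k events.  If merge_from is given,
    the cells k >= merge_from are pooled into one "merge_from or more" cell.
    """
    n = sum(counts)
    # Estimate the mean from the unmerged data: sum(x * O) / n
    mean = sum(k * o for k, o in enumerate(counts)) / n
    last = len(counts) - 1 if merge_from is None else merge_from
    # Observed counts, with the final cell meaning "last or more"
    observed = counts[:last] + [sum(counts[last:])]
    # Poisson probabilities for 0 .. last-1, then the upper tail P(x >= last)
    probs = [math.exp(-mean) * mean ** k / math.factorial(k) for k in range(last)]
    probs.append(1.0 - sum(probs))
    expected = [p * n for p in probs]
    # Shortcut form of the statistic: sum(O^2 / E) - n
    chi2 = sum(o * o / e for o, e in zip(observed, expected)) - n
    # One df lost for the total, one more for estimating the mean
    df = len(observed) - 2
    return mean, expected, chi2, df


interruptions = [160, 175, 86, 41, 18, 12, 8]   # Exercise 12.51, x = 0..6

mean, expected, chi2, df = poisson_gof(interruptions)
print(f"mean = {mean:.2f}  chi2 = {chi2:.2f}  df = {df}")

# Pool the x = 5 and x = 6 cells into a single "5 or more" cell
_, _, chi2_m, df_m = poisson_gof(interruptions, merge_from=5)
print(f"merged: chi2 = {chi2_m:.2f}  df = {df_m}")
```

With full precision the unmerged statistic comes out at about 70.2 with 5 df, and the merged one at about 53.7 with 4 df; both are far above the .01 table values, so the conclusion is the same as in the hand calculation.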
However, the small values of E in the last two rows make me suspicious, so I will compute $\frac{(O-E)^2}{E} = \frac{(12-4.22)^2}{4.22} = 14.34$ and $\frac{(O-E)^2}{E} = \frac{(8-1.11)^2}{1.11} = 42.77$. Together these account for more than the entire difference between the computed value and the table value, so let's merge the last two rows and try again.

    x      f(x)     E = fn     O      O²/E
    0    .272532    136.27    160   187.862
    1    .354291    177.15    175   172.876
    2    .230289    115.14     86    64.235
    3    .099792     49.90     41    33.687
    4    .032432     16.22     18    19.975
    5+   .010662      5.33     20    75.047
  Total  1.0000     500.01    500   553.680

Now there are 6 - 1 - 1 = 4 degrees of freedom, and $\chi^2_{.01}(4) = 13.2767$. Since our computed $\chi^2 = 553.680 - 500 = 53.680$ is above the table value, we reject the null hypothesis.

Exercise 12.54 [12.52 in 9th] (On CD, not in 8th edition): A random sample of 500 car batteries revealed the following distribution of battery life in years. If $\bar{x} = 2.80$ and $s = 0.97$, does it follow a Normal distribution? Use α = .05.

  Life (years)      Frequency
  0 to under 1          12
  1 to under 2          94
  2 to under 3         170
  3 to under 4         188
  4 to under 5          28
  5 to under 6           8
  Total                500

Solution: Let's get the probabilities by subtracting the mean $\bar{x} = 2.80$ from the top of each class (x) and dividing by $s = 0.97$ to get $z = \frac{x - \bar{x}}{s}$. Then get $F(z)$ from the Normal table; each f is the difference between successive values of $F(z)$.

  Life in years   x      z      F(z)      f      E = fn     O      O²/E
  Under 1         1   -1.86   .0314   .0314     15.70     12     9.171
  1-2             2   -0.82   .2061   .1747     87.35     94   101.156
  2-3             3    0.21   .5832   .3771    188.55    170   153.275
  3-4             4    1.24   .8925   .3093    154.65    188   228.542
  4-5             5    2.27   .9884   .0959     47.95     28    16.350
  Over 5                     1.0000   .0116      5.80      8    11.034
  Sum                                1.0000    500.00    500   519.528

We have only 6 - 1 - 2 = 3 degrees of freedom, since we lost two degrees of freedom because we estimated the mean and variance from the data, and $\chi^2_{.05}(3) = 7.8147$. Since our computed $\chi^2 = 519.528 - 500 = 19.528$ is above the table value, reject the null hypothesis.

Problem E4: Check to see if the following 1000 tax payments come from the distribution N(25000, 10000).
  Amount in thousands   Number
  Below 10                 40
  10-15                    60
  15-20                   170
  20-25                   140
  25-30                   180
  30-35                   210
  35-40                    80
  Above 40                120

Solution: $H_0: N(25000, 10000)$, α = .01. Amounts are in thousands, so the hypothesized mean is 25 and the standard deviation is 10.

To find E, write the top of each interval in the x column. Then find $z = \frac{x - 25}{10}$. $F(z)$ is the cumulative distribution; for example, .1587 is $F(-1.0) = P(z \le -1.0)$. The $F(z)$ column can be found by looking up the value for z in the normal table and either adding the table value to .5 (if z is positive) or subtracting the table value from .5 (if z is negative). Each f is the difference between successive values of $F(z)$, and $E = fn$ with n = 1000.

  Amount in thousands    x      z      F(z)      f        E
  Below 10              10    -1.5   .0668    .0668    66.8
  10-15                 15    -1.0   .1587    .0919    91.9
  15-20                 20    -0.5   .3085    .1498   149.8
  20-25                 25     0.0   .5000    .1915   191.5
  25-30                 30     0.5   .6915    .1915   191.5
  30-35                 35     1.0   .8413    .1498   149.8
  35-40                 40     1.5   .9332    .0919    91.9
  40 up                              1.0000   .0668    66.8

  Amount in thousands     E       O     E - O    (E - O)²   (E - O)²/E
  Below 10              66.8     40     26.8     718.24     10.7521
  10-15                 91.9     60     31.9    1017.61     11.0730
  15-20                149.8    170    -20.2     408.04      2.7239
  20-25                191.5    140     51.5    2652.25     13.8499
  25-30                191.5    180     11.5     132.25      0.6906
  30-35                149.8    210    -60.2    3624.04     24.1925
  35-40                 91.9     80     11.9     141.61      1.5409
  40 up                 66.8    120    -53.2    2830.24     42.3689
  Total               1000.0   1000      0.0               107.1918

Since there are eight cells and no parameters were estimated from the data, DF = 8 - 1 = 7. From the chi-squared table $\chi^2_{.01}(7) = 18.475$, but the $\chi^2 = \sum \frac{(E-O)^2}{E}$ we compute is 107.1918, so we reject $H_0$. Note that this problem could be done using a Kolmogorov-Smirnov test.

Problem E5: See if Frunzi earthquakes fit a Poisson distribution with parameter of 1.

  Earthquakes per day        0    1    2    3    4 or more
  Number of days observed   25   17    5    2    1

Solution: $H_0$: Poisson(1), α = .05. f comes from the Poisson table.

    x      f(x)    E = fn     O
    0     .3679    18.395    25
    1     .3679    18.395    17
    2     .1839     9.195     5
    3     .0613     3.065     2
    4+    .0190     0.950     1
  Total  1.0000    50.000    50

Merged table (cells 2, 3 and 4+ combined):

    x      E = fn     O     O²/E
    0      18.395    25    33.977
    1      18.395    17    15.711
    2+     13.210     8     4.845
  Total    50.000    50    54.533

$\chi^2 = \sum \frac{O^2}{E} - n = 54.533 - 50 = 4.533$. There are only three cells now, so DF = 3 - 1 = 2. From the chi-squared table $\chi^2_{.05}(2) = 5.991$.
The last three cells had to be merged because of the small size of the E's in the last two cells: even together they give 3.065 + 0.950 = 4.015, which is still below 5, so the x = 2 cell must be pooled with them as well. Since the computed $\chi^2$ of 4.533 is below 5.991, we do not reject $H_0$. Note that this problem could be done using a Kolmogorov-Smirnov test.

Note: What if our $H_0$ was simply $H_0$: Poisson? Our first step would be to find a mean for the distribution.

    x      O     xO
    0     25      0
    1     17     17
    2      5     10
    3      2      6
    4+     1      4
  Total   50     37

This would give us $\bar{x} = \frac{\sum xO}{n} = \frac{37}{50} = .74$. We would then have to use a Poisson table with a mean of 0.7 unless a computer was available to generate a Poisson table with a mean of 0.74. In any case we would have lost a degree of freedom from estimating the mean. For example, using Minitab to generate the table we would get:

    x      f      E = fn
    0    .4771    23.86
    1    .3531    17.66
    2    .1305     6.53
    3    .0322     1.61
    4    .0060     0.30
    5    .0009     0.05
    6    .0001     0.01

But once again we would have to merge the cells with the smallest E's, so that the last line would be 2+ with an E of 8.51. Once again we would have only three cells, but because of our estimation of the mean, DF = 3 - 1 - 1 = 1.

Problem E6: Check to see if the earthquakes in Frunzi fit a Poisson distribution.

  Earthquakes per day        0    1    2    3    4    5    6    7
  Number of days observed   40   45    7    4    2    0    1    1

Solution: $H_0$: Poisson, α = .05. The table below shows our calculation of the mean for the sample, as was done in the previous problem.

    x      O     xO      f      E = fn
    0     40      0    .4066    40.66
    1     45     45    .3659    36.59
    2      7     14    .1647    16.47
    3      4     12    .0494     4.94
    4      2      8    .0111     1.11
    5      0      0    .0020     0.20
    6      1      6    .0003     0.03
    7      1      7    .0000     0.00
  Total  100     92   1.0000   100.00

When we find that the mean is $\bar{x} = \frac{\sum xO}{n} = \frac{92}{100} = .92$, we use the Poisson table with a mean of 0.9, but find values of E that are too small. To get an E of at least 5, we must merge the last 5 rows.

    x      O      E = fn     O²/E
    0     40     40.66     39.351
    1     45     36.59     55.343
    2      7     16.47      2.975
    3+     8      6.28     10.191
  Total  100    100.00    107.860

$\chi^2 = \sum \frac{O^2}{E} - n = 107.860 - 100 = 7.860$. There are only four cells now, so DF = 4 - 1 - 1 = 2. From the chi-squared table $\chi^2_{.05}(2) = 5.991$. So we reject $H_0$.

Exercise 17.9 from McClave et al.
(Not assigned, but left in because it's a good example of a test for uniformity.) This problem reads: The 1997 Equifax/Harris Consumer Privacy Survey asked 328 Internet users to indicate the level of their agreement with the statement "The government needs to be able to scan Internet messages and user communications to prevent fraud and other crimes." Of the respondents, 59 agreed strongly, 108 agreed somewhat, 82 disagreed somewhat and 79 disagreed strongly.
a) Specify the null and alternative hypotheses you would use to determine whether the opinions of Internet users are evenly divided among the four categories.
b) Test the hypothesis in a) using α = .05.
c) In the context of this exercise, what are Type I and Type II errors?

Solution:
a) $H_0$: Uniformity, or $H_0: p_1 = p_2 = p_3 = p_4$, where $p_i$ is the probability of being in category i.
b) α = .05. Since there are 4 categories, each $p_i$ must be 1/4, so we divide 328 into 4 equal parts of 82.

   Row      O      E       O²/E
    1      59     82     42.4512
    2     108     82    142.2439
    3      82     82     82.0000
    4      79     82     76.1098
  Total   328    328    342.8049

$\chi^2 = \sum \frac{O^2}{E} - n = 342.8049 - 328 = 14.8049$, with df = 4 - 1 = 3, and $\chi^2_{.05}(3) = 7.8147$. Since the computed $\chi^2$ is greater than the table $\chi^2$, reject the null hypothesis.
c) Type I error: wrongly rejecting uniformity. Type II error: wrongly failing to reject uniformity.
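The uniformity test in b) can be sketched in a few lines of Python: with k equally likely categories every E is n/k, and the statistic is again $\sum O^2/E - n$ with k - 1 degrees of freedom, since nothing is estimated from the data. This is a minimal sketch; `uniform_gof` is a hypothetical helper name, and the critical value 7.8147 is copied from the chi-squared table ($\chi^2_{.05}(3)$) rather than computed.

```python
def uniform_gof(observed):
    """Chi-squared test of H0: all categories equally likely (hypothetical helper)."""
    n = sum(observed)
    k = len(observed)
    expected = n / k                      # each p_i = 1/k, so E = n/k
    chi2 = sum(o * o / expected for o in observed) - n
    return expected, chi2, k - 1          # df = k - 1 (no parameters estimated)


survey = [59, 108, 82, 79]                # agree strongly ... disagree strongly
CRITICAL_05_DF3 = 7.8147                  # chi-squared table value, alpha = .05, df = 3

expected, chi2, df = uniform_gof(survey)
decision = "reject H0" if chi2 > CRITICAL_05_DF3 else "do not reject H0"
print(f"E = {expected:.0f}  chi2 = {chi2:.4f}  df = {df}  -> {decision}")
# prints: E = 82  chi2 = 14.8049  df = 3  -> reject H0
```

The same function would handle any test for uniformity in this section; only the critical value has to be looked up for the matching df and α.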