Chi-square Goodness of Fit Worksheet          Name _________________________

1) Pi: we know the approximation 3.14, but did you know that π is an irrational number whose decimal expansion goes on forever without any repeating pattern? Did you know that the sequence 123456789 appears for the first time at the 523,551,502nd digit of π? However, there are no 0's and only one 7 in the first 20 decimal places of π. Does that pattern persist, or do the digits show up with equal frequency? The table shows the number of times each digit appears in the first million digits. Test the hypothesis that the digits 0 through 9 are uniformly distributed in the decimal representation of π.

   Digit    Count      Expected count
   0         99959     100,000
   1         99758     100,000
   2        100026     100,000
   3        100229     100,000
   4        100230     100,000
   5        100359     100,000
   6         99548     100,000
   7         99800     100,000
   8         99985     100,000
   9        100106     100,000

   Assumptions:
      Representative sample of digits of pi
      All expected counts are greater than 5

   Ho: The digits 0-9 are uniformly distributed in the first million digits of pi.
   Ha: At least one proportion is different.

   Chi-square goodness-of-fit test
      χ² = Σ (obs − exp)² / exp = 5.509 with 9 df
      p-value = χ²cdf(5.509, ∞, 9) = .7879
      α = .05

   Since p-value > α, we fail to reject the null hypothesis. There is not sufficient evidence to suggest that the digits 0-9 are not uniformly distributed.

2) Find the p-value for the given chi-square test statistic and degrees of freedom. Give the decision that you would make at a significance level of α = .01.

   a) χ² = 7.5,  df = 2     p-value = .024       fail to reject
   b) χ² = 13.0, df = 6     p-value = .043       fail to reject
   c) χ² = 18.0, df = 9     p-value = .035       fail to reject
   d) χ² = 21.3, df = 4     p-value = .000276    reject

3) An article about the California lottery gave the following information on the age distribution of adults in California: 35% are between 18 and 34 years old, 51% are between 35 and 64 years old, and 14% are 65 years old or older.
The article also gave information on the age distribution of those who purchase lottery tickets, as recorded below (expected counts in parentheses).

   Age of purchaser    Frequency    (Expected)
   18-34                36          (70)
   35-64               130          (102)
   65 and older         34          (28)

   Suppose that the data resulted from a random sample of 200 lottery ticket purchasers. Based on these sample data, is it reasonable to conclude that one or more of these three age groups buy a disproportionate share of lottery tickets?

   Assumptions:
      Random sample of lottery ticket purchasers
      All expected counts are greater than 5 (in parentheses)

   Ho: Lottery tickets sold are proportionate to age groups.
   Ha: One or more of the age groups buy a disproportionate share of lottery tickets.

   Chi-square goodness-of-fit test
      χ² = Σ (obs − exp)² / exp = 25.486 with 2 df
      p-value = χ²cdf(25.486, ∞, 2) = .0000029
      α = .05

   Since p-value < α, we reject the null hypothesis. There is sufficient evidence to suggest that one or more age groups buy a disproportionate share of lottery tickets.

4) According to Census Bureau data, in 1998 the California population consisted of 50.7% whites, 6.6% blacks, 30.6% Hispanics, 10.8% Asians, and 1.3% other ethnic groups. Suppose that a random sample of 1000 students graduating from California colleges and universities in 1998 resulted in the following data on ethnic group (expected counts in parentheses).

   Ethnic group    Number in sample    (Expected)
   White           679                 (507)
   Black            51                 (66)
   Hispanic         77                 (306)
   Asian           190                 (108)
   Other             3                 (13)

   Do these data provide evidence that the proportion of students graduating from colleges and universities in California for these ethnic group categories differs from the respective proportions in the population for California? Test the appropriate hypotheses using α = .01.

   Assumptions:
      Random sample of students
      All expected counts are greater than 5 (in parentheses)

   Ho: The proportions of students of different ethnic groups graduating from college in California match the ethnic group proportions in the population.
   Ha: One or more of the proportions is different.
   Chi-square goodness-of-fit test
      χ² = Σ (obs − exp)² / exp = 303.09 with 4 df
      p-value = χ²cdf(303.09, ∞, 4) ≈ 0
      α = .01

   Since p-value < α, we reject the null hypothesis. There is sufficient evidence to conclude that one or more of the proportions of college graduates in California differs from the ethnic group population proportions.

5) Criminologists have long debated whether there is a relationship between weather and violent crime. The author of the article “Is There a Season for Homicide?” classified 1361 homicides according to season, resulting in the following data (expected counts in parentheses):

   Season    Homicides    (Expected)
   Winter    328          (340.25)
   Spring    334          (340.25)
   Summer    372          (340.25)
   Fall      327          (340.25)

   Do these data support the theory that the homicide rate is not the same over the four seasons? Test the relevant hypotheses using a significance level of .05.

   Assumptions:
      Representative sample of homicides
      All expected counts are greater than 5 (in parentheses)

   Ho: Homicides are distributed evenly throughout the seasons.
   Ha: Homicides are not distributed evenly throughout the seasons.

   Chi-square goodness-of-fit test
      χ² = Σ (obs − exp)² / exp = 4.03 with 3 df
      p-value = χ²cdf(4.03, ∞, 3) = .2577
      α = .05

   Since p-value > α, we fail to reject the null hypothesis. There is not sufficient evidence to conclude that homicides are not distributed evenly throughout the seasons.
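
Checking the answers without a calculator: the worksheet's answers use a calculator's χ²cdf function. As a rough cross-check, the same statistic and upper-tail p-value can be sketched in Python using only the standard library. The function names chi2_sf and goodness_of_fit are hypothetical helpers made up for this sketch, not part of the worksheet or of any library, and chi2_sf relies on a closed-form recurrence that is valid only for whole-number degrees of freedom.

```python
import math


def chi2_sf(x, df):
    """Upper-tail chi-square probability P(X >= x) for integer df >= 1.

    Hypothetical helper standing in for the calculator's chi-square cdf
    with an upper bound of infinity. Uses the recurrence
    Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1), with y = x / 2.
    """
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)              # Q(1, y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))   # Q(1/2, y)
    while a < df / 2.0:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return q


def goodness_of_fit(observed, expected):
    """Return (statistic, df, p-value) for a chi-square goodness-of-fit test."""
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    return stat, df, chi2_sf(stat, df)


# Problem 5: homicides by season, expected count 1361 / 4 = 340.25 each.
observed = [328, 334, 372, 327]
expected = [sum(observed) / len(observed)] * len(observed)
stat, df, p = goodness_of_fit(observed, expected)
print(f"chi-square = {stat:.2f}, df = {df}, p-value = {p:.4f}")
```

Run on the Problem 5 counts, this should land close to the worksheet's 4.03 and .2577; the other problems can be checked the same way by passing their observed and expected counts.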