Chapter 11 Inference for Tables: Chi-Square Procedures

11.1
Target Goal: I can compute expected counts, conditional distributions, and contributions to the chi-square statistic.
h.w: pg. 621: 1, 3, 5, 9, 11

Test for Goodness of Fit
To analyze categorical data, we construct two-way tables and examine the counts or percents of the explanatory and response variables.

Count and record the M&M colors in each bag.

Expected count: M&M's color distribution (%) according to the company website

                        Brown  Yellow  Red  Blue  Orange  Green
Plain                      13      14   13    24      20     16
Peanut                     12      15   12    23      23     15
Peanut Butter/Almond       10      20   10    20      20     20

The expected count for each color is the number of M&M's in the bag times that color's stated percentage.

We want to compare the observed counts to the expected counts. The null hypothesis is that the color distribution matches the company's stated distribution, so any difference between the observed and expected counts is due to chance. The alternative hypothesis is that the color distribution differs from the stated distribution.

Simulate the color counts for a bag, or use your own bag of M&M's.
Label (using the Plain percentages): 1-13 Brown, 14-27 Yellow, 28-40 Red, 41-64 Blue, 65-84 Orange, 85-00 Green
Calculator: MATH > PRB > randInt(0,99,50), store in L1. Sort L1 in ascending order and tally the colors.

Chi-square statistic
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
where O is an observed count, E is the corresponding expected count, and the sum runs over all categories.

The statistic measures how well the observed counts fit the expected counts, assuming that the null hypothesis is true. When H0 is true and the expected counts are large enough, the sampling distribution of the chi-square statistic is approximately a chi-square distribution, χ². A chi-square distribution is a density curve, so the total area under the curve is 1. The curve takes only nonnegative values, beginning at zero on the horizontal axis, and is skewed right. As the degrees of freedom increase, the curve becomes more symmetric.

Pg. 703 "Goodness of Fit Test."
Using your M&M Minis® chi-square statistic, find the probability of obtaining a χ² value at least this extreme, assuming the null hypothesis is true. Use your chi-square statistic and df = 6 - 1 = 5:
P-value = χ²cdf(lower bound, upper bound, df) = χ²cdf(your χ², E99, 5)

CONDITIONS for Individual Expected Counts: The Goodness of Fit Test may be used when all expected counts are at least 1 and no more than 20% of the expected counts are less than 5.
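The M&M simulation and the chi-square arithmetic can be sketched in Python. This is a minimal sketch, not the calculator procedure itself: Python's random.randint(0, 99) stands in for randInt(0,99,50), the color labels follow the Plain percentages, and color_for, simulate_bag, expected_counts, chi_square_contributions, and conditions_ok are hypothetical helper names introduced only for illustration.

```python
import random

# Colors in table order and the Plain M&M's percentages from the table above.
COLORS = ["Brown", "Yellow", "Red", "Blue", "Orange", "Green"]
PLAIN_PERCENTS = [13, 14, 13, 24, 20, 16]


def color_for(number):
    """Map a random integer 0-99 to a color using the labels
    1-13, 14-27, 28-40, 41-64, 65-84, 85-00 (00 counts as Green)."""
    if 1 <= number <= 13:
        return "Brown"
    if 14 <= number <= 27:
        return "Yellow"
    if 28 <= number <= 40:
        return "Red"
    if 41 <= number <= 64:
        return "Blue"
    if 65 <= number <= 84:
        return "Orange"
    return "Green"  # 85-99 and 00


def simulate_bag(n=50, rng=random):
    """Stand-in for randInt(0,99,n): draw n integers 0-99 and tally each color."""
    counts = {color: 0 for color in COLORS}
    for _ in range(n):
        counts[color_for(rng.randint(0, 99))] += 1
    return [counts[color] for color in COLORS]


def expected_counts(n, percents):
    """Expected count for each color = bag size * stated percent / 100."""
    return [n * p / 100 for p in percents]


def chi_square_contributions(observed, expected):
    """Each category's contribution (O - E)^2 / E to the chi-square statistic."""
    return [(o - e) ** 2 / e for o, e in zip(observed, expected)]


def conditions_ok(expected):
    """Expected-count rule from the notes: every expected count is at least 1
    and no more than 20% of the expected counts are less than 5."""
    small = sum(1 for e in expected if e < 5)
    return min(expected) >= 1 and small <= 0.2 * len(expected)


observed = simulate_bag(50, random.Random(2024))  # seeded so the run repeats
expected = expected_counts(sum(observed), PLAIN_PERCENTS)
contributions = chi_square_contributions(observed, expected)
chi_sq = sum(contributions)

print("Conditions met:", conditions_ok(expected))
for color, o, e, c in zip(COLORS, observed, expected, contributions):
    print(f"{color:<7} O = {o:2d}  E = {e:4.1f}  (O - E)^2/E = {c:.3f}")
print(f"chi-square = {chi_sq:.4f} with df = {len(COLORS) - 1}")
largest = COLORS[contributions.index(max(contributions))]
print("Largest contribution:", largest)
```

With a bag of 50, the expected counts are 6.5, 7, 6.5, 12, 10, and 8, so every expected count is at least 5. Because the bag is simulated, the observed counts and χ² change with the seed; replacing observed with the tally from a real bag gives that bag's statistic.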
Following the Goodness of Fit Test, check which category made the largest contribution, (O - E)²/E, to the chi-square statistic; this shows where the observed counts differ most from the expected counts.

Conditions for Chi-Square Test
Random: The data come from a random sample or a randomized experiment.
Large sample size: All expected counts are at least 5.
Independent: Individual observations are independent. When sampling without replacement, check the 10% condition.

Ex: The Graying of America
It is believed that with better medicine and healthier lifestyles, people are living longer, and consequently a larger percentage of the population is of retirement age. Compare the distribution of the 1980 population to the 1996 population.

Step 1: State - Identify the population of interest and the parameter you want to draw a conclusion about. State the hypotheses in words and symbols.
We want to determine whether the distribution of age groups in the United States in 1996 has changed significantly from the 1980 distribution.
H0: The age group distribution in 1996 is the same as the 1980 distribution.
Ha: The age group distribution in 1996 is different from the 1980 distribution.
Or, state the hypotheses as proportions:
H0: p0-24 = 0.4139, p25-44 = 0.2768, p45-64 = 0.1964, p65+ = 0.1128
Ha: At least one of the proportions differs from the stated values.

Goal of "Goodness of Fit Tests"
The more the observed counts differ from the expected counts, the more evidence we have to reject H0 and conclude that the 1996 population distribution is significantly different from the 1980 distribution. It is always a good idea to plot the data.

Step 2: Plan - Choose the appropriate inference procedure and verify the conditions for using it. If the conditions are met, conduct a chi-square goodness of fit test.
Random: We must assume the age group data come from a random sample.
Large sample size: Calculate the expected count in each age category and verify that they are large enough (see conditions).
Yes, all expected counts are greater than 5, so we proceed with the chi-square calculations.
Independent: We clearly have two independent age groups, one from 1980 and one from 1996. We must check the 10% condition: there are at least 10(286,598) U.S. citizens in 1980 and at least 10(500) U.S. citizens in 1996.

Step 3: Do - If the conditions are met, carry out the inference procedure.
Calculate the χ² statistic to measure how far the observed counts (O) are from the expected counts (E) under H0:
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
A large value of χ² is stronger evidence against H0 and gives a small P-value.

Calculate the P-value.
df: use (number of categories) - 1 degrees of freedom. The χ² distributions form a family of curves, one for each df, and we use the curve with the correct df to assess the evidence against H0. Since the percentages must add to 100%, 3 of the 4 percentages are free to vary and the 4th is then determined, so df = 4 - 1 = 3.
Table C: with df = 3, the critical value for an upper-tail probability of 0.05 is 7.81.
Calc: 2nd VARS: χ²cdf(8.2275, E99, 3) = 0.0415

Step 4: Conclude - Interpret the results in the context of the problem.
Since our value of 8.2275 is greater than the critical value 7.81 (equivalently, the P-value 0.0415 is less than 0.05), we reject H0 and conclude that the 1996 population distribution is significantly different from the 1980 distribution at the 5% level.
To be continued.
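The P-value step can also be checked without a calculator. The sketch below is a stand-in for χ²cdf(lower, E99, df), not the calculator's own algorithm: chi2_sf and goodness_of_fit_decision are hypothetical helper names, and chi2_sf uses the standard closed-form upper-tail area of a chi-square distribution with a whole-number df, built up two df at a time from the df = 1 and df = 2 cases.

```python
import math


def chi2_sf(x, df):
    """Upper-tail area P(X >= x) for a chi-square distribution with integer df >= 1.

    Starts from the closed forms Q(x; 1) = erfc(sqrt(x/2)) and Q(x; 2) = exp(-x/2),
    then climbs two df at a time with
    Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1).
    """
    if x <= 0:
        return 1.0
    if df % 2 == 1:
        q, k = math.erfc(math.sqrt(x / 2)), 1
    else:
        q, k = math.exp(-x / 2), 2
    while k < df:
        q += (x / 2) ** (k / 2) * math.exp(-x / 2) / math.gamma(k / 2 + 1)
        k += 2
    return q


def goodness_of_fit_decision(chi_sq, n_categories, alpha=0.05):
    """Return (df, P-value, reject H0?) using df = (number of categories) - 1."""
    df = n_categories - 1
    p_value = chi2_sf(chi_sq, df)
    return df, p_value, p_value < alpha


# Graying of America: chi-square = 8.2275 with 4 age categories.
df, p, reject = goodness_of_fit_decision(8.2275, 4)
print(f"df = {df}, P-value = {p:.4f}, reject H0 at 5%: {reject}")
# prints: df = 3, P-value = 0.0415, reject H0 at 5%: True

# Table C check: the area to the right of 7.81 with df = 3 is about 0.05.
print(f"P(X >= 7.81) = {chi2_sf(7.81, 3):.4f}")
```

The result matches the calculator value of 0.0415. For the M&M example, chi2_sf(your statistic, 5) plays the role of χ²cdf(your χ², E99, 5).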