13.1 Goodness of Fit Test
AP Statistics

Chi-Square Distributions

The chi-square distributions are a family of distributions that take on only positive values and are skewed to the right. A specific chi-square distribution is determined by its degrees of freedom.

Properties:
1. The total area under a chi-square curve is equal to 1.
2. Each chi-square curve (except when df = 1 or 2) begins at 0 on the horizontal axis, increases to a peak, and then approaches the horizontal axis asymptotically from above.
3. Each chi-square curve is skewed to the right. As the number of degrees of freedom increases, the curve becomes more and more symmetrical and looks more like a normal curve (see Figure 13.2, page 732).

According to the m&m/Mars company, in 1995 “…the new mix of colors of m&m’s plain chocolate candies will contain 30 percent browns, 20 percent yellows and reds, and 10 percent each of oranges, greens, and blues.” (Yellow and red are 20 percent each.) However, the mix of colors has been known to change every few years. Your task today is to determine whether or not the current mix of colors matches that of 1995. We want to see if there is sufficient evidence to reject the company’s 1995 claim. To do this, we’ll be introduced to a new type of test: the Chi-Square Goodness of Fit Test.

A Goodness of Fit Test is used to determine whether a population has a certain hypothesized distribution. The null hypothesis is that the population proportions are equal to the hypothesized proportions. The alternative is that at least one of the proportions differs from its hypothesized value.

If all expected counts are at least 1 and at least 80% of them are at least 5, then

$$X^2 = \sum \frac{(O - E)^2}{E}$$

has approximately a chi-square distribution with df = k - 1, where k is the number of categories, O is an observed count, and E is the corresponding expected count.

• Open a bag of milk chocolate m&m’s and carefully count how many of each color are in the sample. Record the observed data in the “observed” row of the table below.
• Using the statement from the m&m/Mars company, determine how many of each color you expected to see.
  Note, you’ll have to figure this out using the total number of m&m’s in your sample bag. Enter these counts in the “expected” row below.
• If your bag reflects the distribution advertised in 1995, there should be little difference between the observed and expected counts. To quantify the difference we’ll calculate a total which we’ll call “Chi-Square” or $X^2$.
• For each color, perform this calculation: $\frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}$. Enter each value in the last row of the table. Add up all of these “component” values to find $X^2$.
• If this total value is small, we have little evidence to suggest a difference in distributions. However, the larger $X^2$ gets, the more evidence we have to suggest the company’s claim may no longer be applicable to bags of milk chocolate m&m’s.

                 Brown    Yellow   Red      Orange   Green    Blue     Total
Observed (O)     5        5        2        2        3        4        21
Expected (E)     6.3      4.2      4.2      2.1      2.1      2.1      21
(O - E)^2 / E    0.2683   0.1524   1.1524   0.0048   0.3857   1.7190   3.6825

To determine the likelihood of observing a difference between observed and expected as extreme as the one we observed, we must look up the p-value on a Chi-Square table. Chi-square distributions are skewed right and specified by degrees of freedom. In a Goodness of Fit test, the degrees of freedom equal one less than the number of categories.

• Find the p-value for our test by looking up $X^2$ for 5 degrees of freedom. Sketch the curve and observed $X^2$ below. Interpret the result in the context of the problem.

χ²cdf(X², 1E99, df)

Since p is large (> α), there is not significant evidence to reject the 1995 claim.

Steps:
1. Identify the population of interest and the parameter(s) that you want to draw conclusions about. State hypotheses in words and symbols.
2. Choose the appropriate inference procedure and verify the conditions for using it.
   Chi-Square Conditions:
   1. All individual expected counts are at least 1.
   2. No more than 20% of the expected counts are less than 5.
3. Carry out the inference procedure (calculate the test statistic, df, and p-value).
4. Interpret your results in the context of the problem.

Example 1: (13.13 p. 744)
A “wheel of fortune” at a carnival is divided into four equal parts:

Part I:    Win a doll
Part II:   Win a candy bar
Part III:  Win a free ride
Part IV:   Win nothing

You suspect that the wheel is unbalanced (i.e., not all parts of the wheel are equally likely to be landed upon when the wheel is spun). The results of 500 spins of the wheel are as follows:

Part:        I     II    III   IV
Frequency:   95    105   135   165

Perform a goodness of fit test. Is there evidence that the wheel is not in balance?

Since the wheel is divided into four equal parts, if it is in balance, then the four outcomes should occur with approximately equal frequency. Here are the observed and expected values:

Part:        I     II    III   IV
Observed:    95    105   135   165
Expected:    125   125   125   125

Ho: The wheel is balanced (the four outcomes are uniformly distributed).
Ha: The wheel is not balanced.

We will use a chi-square goodness of fit test to measure the strength of the evidence against the hypothesis that the wheel is balanced. Since all expected counts are greater than 5, we can proceed with the test.

df = 3

$$X^2 = \frac{(95 - 125)^2}{125} + \frac{(105 - 125)^2}{125} + \frac{(135 - 125)^2}{125} + \frac{(165 - 125)^2}{125} = 7.2 + 3.2 + 0.8 + 12.8 = 24$$

$$P(\chi^2_3 > 24) = 2.4980 \times 10^{-5}$$

χ²cdf(24, 1E99, 3)

p < α, so reject Ho.

We have significant evidence to conclude that the wheel is not balanced. Since “Part IV: Win nothing” shows the greatest deviation from the expected result, there may be reason to suspect that the carnival game operator tampered with the wheel to make it harder to win.

Example 2:
A statistics student suspected that his 1982 penny was not a fair coin, so he held it upright on a table top with a finger of one hand and spun the penny repeatedly by flicking it with the index finger of his other hand. In 200 spins of the coin, it landed with tails side up 122 times.
(a) Perform a goodness of fit test to see if there is sufficient evidence to conclude that spinning the coin does not produce an equal proportion of heads and tails.

Ho: Heads and tails are equally likely when a 1982 penny is spun.
Ha: Heads and tails are not equally likely.

We will use a chi-square goodness of fit test to measure the strength of the evidence against the hypothesis that the penny is a fair coin. Since all expected counts are greater than 5, we can proceed with the test.

df = 1

$$X^2 = \frac{(78 - 100)^2}{100} + \frac{(122 - 100)^2}{100} = 4.84 + 4.84 = 9.68$$

$$P(\chi^2_1 > 9.68) = 0.00186$$

p < α, so reject Ho.

We have significant evidence to conclude that spinning a 1982 penny does not produce equally likely results.

(b) Use a one-proportion inference procedure to determine whether spinning the coin is equally likely to result in heads or tails.

p = probability of getting tails when the coin is spun

Ho: p = 0.5
Ha: p ≠ 0.5

Assume SRS. Assume population > 10(200). np = n(1 - p) = 100 > 10.

1-prop-z-test

$$z = \frac{0.61 - 0.50}{\sqrt{\frac{0.50(1 - 0.50)}{200}}} = 3.1113$$

$$P(z < -3.1113 \text{ or } z > 3.1113) = 0.00186$$

p < α, so reject Ho.

There is significant evidence (α = 0.05) to conclude that heads and tails are not equally likely.

(c) Compare your results for parts (a) and (b).

The p-values are identical, so both tests lead to the same conclusion. Note that $z^2 = 3.1113^2 \approx 9.68$, the value of $X^2$ from part (a).
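
The calculations in the m&m activity and in Examples 1 and 2 can be checked without a chi-square table or calculator. The following is a minimal sketch in Python using only the standard library. The function names `chi_square_statistic` and `chi_square_upper_tail` are hypothetical helpers written for this handout, not part of any library. The upper-tail area is computed with a closed-form series that is valid only for whole-number degrees of freedom; this is a stand-in for the calculator's χ²cdf(X², 1E99, df), not how the calculator itself works.

```python
import math


def chi_square_statistic(observed, proportions):
    """Return (X2, expected counts) for a goodness of fit test.

    observed    -- list of observed counts, one per category
    proportions -- hypothesized proportions (should sum to 1)
    """
    n = sum(observed)
    expected = [n * p for p in proportions]
    x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return x2, expected


def chi_square_upper_tail(x2, df):
    """Return P(chi-square with df degrees of freedom > x2).

    Assumes df is a positive whole number (hypothetical helper).
    """
    if x2 <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum of (x/2)^r / r! for r = 0 .. df/2 - 1
        term, total = 1.0, 1.0
        for r in range(1, df // 2):
            term *= (x2 / 2) / r
            total += term
        return math.exp(-x2 / 2) * total
    # Odd df: two-sided normal tail plus a correction series
    chi = math.sqrt(x2)
    normal_tail = math.erfc(chi / math.sqrt(2))  # P(|z| > chi)
    density = math.exp(-x2 / 2) / math.sqrt(2 * math.pi)
    term, total = chi, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x2 / (2 * r - 1)
        total += term
    return normal_tail + 2 * density * total


# m&m activity: 1995 claimed proportions, 6 colors, df = 5
mm_x2, mm_expected = chi_square_statistic(
    [5, 5, 2, 2, 3, 4], [0.30, 0.20, 0.20, 0.10, 0.10, 0.10]
)
mm_p = chi_square_upper_tail(mm_x2, df=5)

# Example 1: wheel of fortune, 4 equal parts, df = 3
wheel_x2, wheel_expected = chi_square_statistic([95, 105, 135, 165], [0.25] * 4)
wheel_p = chi_square_upper_tail(wheel_x2, df=3)

# Example 2(a): spun penny, 78 heads and 122 tails, df = 1
coin_x2, coin_expected = chi_square_statistic([78, 122], [0.5, 0.5])
coin_p = chi_square_upper_tail(coin_x2, df=1)

# Example 2(b): two-sided one-proportion z test on the same data
z = (0.61 - 0.50) / math.sqrt(0.50 * (1 - 0.50) / 200)
z_p = math.erfc(abs(z) / math.sqrt(2))

print(f"m&m:   X2 = {mm_x2:.4f}, p = {mm_p:.4f}")
print(f"wheel: X2 = {wheel_x2:.2f}, p = {wheel_p:.4e}")
print(f"coin:  X2 = {coin_x2:.2f}, p = {coin_p:.5f}")
print(f"z test: z = {z:.4f}, z^2 = {z * z:.2f}, p = {z_p:.5f}")
```

Run as written, the sketch should reproduce the values worked out by hand above: $X^2 = 24$ for the wheel, $X^2 = 9.68$ for the penny, and matching p-values for parts (a) and (b) of Example 2. For the m&m sample it gives a p-value of roughly 0.60, which agrees with the conclusion that p is large.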