“Goodness of Fit” the Fun Way… Chi Square Analysis

Have you ever wondered why the package of M&Ms you just bought never seems to have enough of your favorite color? Or why it seems that you can’t get enough green M&Ms? What’s going on at the Mars Company? Is the number of each color of M&Ms really different from one package to the next, or does the Mars Company do something to ensure that each package gets the correct number of each color? Have you stayed up nights wondering about this?

According to Mars, every bag of plain M&Ms contains:

One way that we could determine whether the Mars Co. is true to its word is to sample a package of M&Ms and do a type of statistical test known as a “goodness of fit” test. This test allows us to determine whether any differences between our observed measurements (counts of colors from our M&M sample) and our expected numbers (what Mars advertises to be true) are simply due to chance (sample error) or to some other reason (i.e., Mars Co.’s sorters aren’t really doing a very good job of putting the correct number of M&Ms in each package).

Hypothesis – Differences in measured ratios are not due to chance (but perhaps due to faulty workmanship).

Null Hypothesis – If the Mars Co. M&M sorters are doing their job correctly, then there should be no difference in M&M ratios between actual store-bought bags of M&Ms and what the Mars Co. claims are the actual ratios, and any differences found are due to chance.

What is this Null Hypothesis? It is a hypothesis that is the reverse of what we, the experimenters, actually believe; it is put forward so that the data have a chance to contradict it. We test it by asking how probable our results would be if the null hypothesis were true. We will either accept or reject the Null Hypothesis. If we accept the null hypothesis (that there is really no difference in M&M color ratios between what is actually in the bags and what Mars says is in the bags), then any differences we do see are by chance.
If we reject the Null Hypothesis, then we conclude that the differences are NOT due to chance alone but to something else (perhaps foul play). How can we decide whether the Null Hypothesis is viable just by counting the M&Ms in the bags? By using the deviations: we must determine the probability of getting the observed deviation from the expected results by chance. The statistical calculation of Chi Square will do this. The key is that Chi Square increases as the difference between expected and observed increases. The bigger the differences, the less likely it is that chance alone produced them, and the more likely it is that something else is causing them.

χ² = ∑(d²/e), where d = observed − expected (the deviation) and e = the expected count

Step 1: Calculating Chi Square is PLUG and CHUG:
1. Separate the M&Ms into color categories and count/record the number of each in Table 1.
2. Add your data to the class data so that we will have a large sample size.
3. Calculate the deviation (d), deviation squared (d²), and Chi Square (χ²) in Table 2.

Step 2: Determine the probability that the difference between the observed and expected values occurred simply by chance (sample error)
1. Determine the degrees of freedom: the number of colors (because we are counting different colors) minus 1 (because once the counts for all but one color are known, the last count is fixed by the total).
   a. # colors = ______
   b. # colors – 1 = ______
   c. degrees of freedom = ______
2. Compare the chi-square value to the appropriate value in the probability table below.

                Accept the null hypothesis ←   |   → Reject the null hypothesis
Degrees of                          Probability
Freedom         0.90     0.50     0.25     0.10    |   0.05     0.01
1               0.016    0.46     1.32     2.71    |   3.84     6.64
2               0.21     1.39     2.77     4.61    |   5.99     9.21
3               0.58     2.37     4.11     6.25    |   7.82     11.35
4               1.06     3.36     5.39     7.78    |   9.49     13.28
5               1.61     4.35     6.63     9.24    |   11.07    15.09

Step 3: Accept or Reject the null hypothesis
1. We will reject the null hypothesis for any Chi-square value that has a probability ≤ 0.05 (that is, a value at or above the 0.05 column for our degrees of freedom).
2. We will accept the null hypothesis for any Chi-square value that has a probability > 0.05 (a value below the 0.05 column).

Questions:
1. Based on our sample, should we accept or reject our null hypothesis?
2. If we reject it, what might be some possible explanations for this outcome? If we accept it, what might be some possible explanations for this outcome?

Expected ratios:

TABLE 1: ACTUAL RATIOS: CLASS DATA

                                      Groups
Color     1    2    3    4    5    6    7    8    9    10   11   12   Total
Green     ___  ___  ___  ___  ___  ___  ___  ___  ___  ___  ___  ___  _____
Yellow    ___  ___  ___  ___  ___  ___  ___  ___  ___  ___  ___  ___  _____
Red       ___  ___  ___  ___  ___  ___  ___  ___  ___  ___  ___  ___  _____
Orange    ___  ___  ___  ___  ___  ___  ___  ___  ___  ___  ___  ___  _____
Blue      ___  ___  ___  ___  ___  ___  ___  ___  ___  ___  ___  ___  _____
Brown     ___  ___  ___  ___  ___  ___  ___  ___  ___  ___  ___  ___  _____

TABLE 2: Calculating Chi Square values

Color                     Green   Yellow  Red     Orange  Blue    Brown   Total
Observed                  _____   _____   _____   _____   _____   _____   _____
Expected (e)              _____   _____   _____   _____   _____   _____   _____
Deviation (d)             _____   _____   _____   _____   _____   _____   _____
Deviation squared (d²)    _____   _____   _____   _____   _____   _____   _____
(d²/e)                    _____   _____   _____   _____   _____   _____   _____
χ² = ∑(d²/e)              XXX     XXX     XXX     XXX     XXX     XXX     _____
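
Checking your work: if you would like to double-check the hand calculations, Steps 1 through 3 can be sketched in a few lines of Python. This is only a rough sketch under stated assumptions. The function names `chi_square` and `decide` are made up for this example, and so are the sample counts and the equal one-sixth expected shares (they are NOT Mars's advertised percentages). Substitute the class totals from Table 1 and the expected ratios from the handout. The critical values are copied from the 0.05 column of the probability table.

```python
# Critical chi-square values at probability 0.05, copied from the worksheet's
# probability table (degrees of freedom 1 through 5 only).
CRITICAL_AT_05 = {1: 3.84, 2: 5.99, 3: 7.82, 4: 9.49, 5: 11.07}


def chi_square(observed, expected_share):
    """Return (chi-square, degrees of freedom) for observed color counts.

    observed:       dict of color -> counted number of candies
    expected_share: dict of color -> expected fraction of the total (sums to 1)
    """
    total = sum(observed.values())
    chi2 = 0.0
    for color, count in observed.items():
        e = total * expected_share[color]  # expected count (e)
        d = count - e                      # deviation (d) = observed - expected
        chi2 += d * d / e                  # add this color's d^2 / e
    df = len(observed) - 1                 # number of colors minus 1
    return chi2, df


def decide(chi2, df):
    """Apply the Step 3 rule: reject when the probability is <= 0.05."""
    return "reject" if chi2 >= CRITICAL_AT_05[df] else "accept"


# HYPOTHETICAL data for illustration only: invented counts and equal shares.
sample_counts = {"Green": 12, "Yellow": 8, "Red": 10,
                 "Orange": 14, "Blue": 9, "Brown": 7}
placeholder_shares = {color: 1 / 6 for color in sample_counts}

chi2, df = chi_square(sample_counts, placeholder_shares)
print(f"chi-square = {chi2:.2f}, degrees of freedom = {df}")
print("Decision:", decide(chi2, df), "the null hypothesis")
```

With real data, only `sample_counts` and `placeholder_shares` need to change; the table lookup covers up to six colors (5 degrees of freedom), which matches the six colors in Table 1.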