IB Biology M & M Statistics Chi-Square Analysis (X2) Name__________________________________ Period___________ Date_______________ Purpose: Use Chi-Square analysis and compare observed and expected ratios of M&Ms. Materials: Bags of M&Ms, clean hands, paper plates Problem Statement: Have you ever wondered why the package of M&Ms you just bought never seems to have enough of your favorite color? Or, why is it that you always seem to get the package of mostly brown M&Ms? What’s going on at the Mars Company? Is the number of the different colors of M&Ms in a package really different from one package to the next, or does the Mars Company do something to insure that each package gets the correct number of each color of M&M? I’ll bet that you have stayed up at night wondering about this! Well, I recently visited the M&M web page and found that the Mars Company claims that each package of M&Ms that they sell should have the stated percentages (see laminated sheet) of each color of M&M. Wow…that must be some job! One way to determine if the Mars Company is true to its word is to sample a package of M&Ms and conduct a statistical test known as a “goodness of fit” test. This type of statistical test allows one to determine if any differences between observed measurements (counts of colors form our M&M sample) and our expected (what the Mars Co. claims) are simply due to chance sample error, or some other reason (i.e. the Mars Co.’s sorters are not doing a very good job of putting the correct number of M&M’s in each package). The “goodness of fit” test that you will be using is known as a Chi square analysis. This test is generally used when one is dealing with discrete data (i.e. count data, or non-continuous data). You will be calculating a statistic (this time it is called the Chi-square or X2 test). In addition, you will be using a table to determine the probability of arriving at this particular (X2) value. Remember that the probability values tell us whether the differences observed in the data are due to chance alone. This is known as sample error. Let’s begin by stating the null hypothesis: If the Mars Company M&M sorters are doing their job correctly, then there should be no difference in M&M color ratios between actual store-bought bags of M&Ms and what the Mars Co. claims represents the actual ratios. To test this hypothesis we will need to calculate the (X2) statistic, which is calculated in the following way: X2 = Sum ( of (d2/e) Where d is the difference between the observed and expected for each color category, and e is the expected value for each color category. After the quantity d2/e is calculated for each category, the values of all categories are summed. 1 Another way to represent this formula is as follows: The main thing to note about this formula is that, when all else is equal, the value of (X2) increases as the difference between the observed and expected values increase. Degrees of Freedom: Once you have determined your X2 value, you must compare the calculated value to the appropriate value in a probability table. Note the term “degrees of freedom”. For this statistical test, the degree of freedom is equal to the color categories, minus one. For example, if there are 6 colors, the degrees of freedom = 6-1=5. The reason why it is important to consider degrees of freedom is that the value of the chi-square statistic is calculated as the sum of the squared deviations for all categories/colors. The natural increase in the value of chi-square with an increase in categories/colors must be taken into account. Procedure: 1. Wash your hands! You may want to munch on the food you will be handling! 2. Open a bag of M&Ms and split them up between members of your lab group (use paper plates here!) 3. DO NOT EAT ANY M&MS JUST YET! 4. Determine the expected number of M&Ms for your bag, using the laminated charts; record in Data Table A 5. Separate the M&Ms in to color categories and count them, to determine the actual number of M&Ms of each color; record in Data Table A. 6. Calculate the Chi square value in Data Table B and show your calculations. 7. Determine the degrees of freedom (number of color categories-1) 8. Use the Probability Table on page 4 and use a 0.05 probability level as the critical value. Read the parameters for using the table and decide between the two options below: If the calculated chi-square value is less than the 0.05 value, we reject the NULL hypothesis If the value is greater than the 0.05 value, we accept the NULL hypothesis 9. Share your data with the class! 2 Data Table A: Expected and Observed Color Ratios for M&Ms (type=_______) Red Brown Expected * (percent) Blue Yellow Green Orange **Observed (number/percent) *source: http://us.mms.com/us/about/products/ ** # M&Ms of one color x 100 Total # M&Ms Data Table B: Chi square Calculations Individual Data M&M’s Type___________ A Observed (o) B Expected (e) C (obs-exp) d=deviation D (obs-exp)2 d2=deviation squared E (obs-exp)2 exp d2 /e Red Brown Blue Yellow Green Orange X2 Degrees of Freedom Accept or Reject NULL hypothesis? (Use Probability Table) - 3 To use the Probability Table: Scan across the row corresponding to the correct degree of freedom that you determined for your bag of M&Ms. Values of the chi-square are given for several different probabilities, ranging from 0.90 on the left, to 0.01 on the right. Note that the chi-square increases as the probability decreases. Notice that a chi-square value as large as 1.61 would be expected by chance in 90% (0.90) of the cases, whereas one as large as 11.07 would only be expected by chance in 1% (0.01) of the cases. The column that we need to concern ourselves with is the one under “0.05”. Scientists, in general are willing to say that if their probability of getting the observed deviation from the expected results by chance is greater than 0.05 (5%)then we can accept the null hypothesis. In other words, there is really no difference in M&M color ratios between actual store-bought bags of M&Ms and what the Mars Co. claims are the actual ratios. Stated another way…any differences we see between what Mars claims and what is actually in a bag of M&Ms just happened by chance sampling error. Five percent! That’s not much, but its good enough for a scientist! If, however, the probability of getting the observed deviation from the expected results by chance is less than 0.05 (5%), then we should reject the null hypothesis. In other words, for our study, there is a difference in M&M color ratios between actual store-bought bags of M&Ms and what the Mars Co. claim are the actual ratios. Stated another way…any differences we see between what Mars Co. claims and what is actually in a bag of M&Ms, did not just happen by chance sampling error. Based on your sample, should you reject or accept the NULL hypothesis? You decide! Record your answer in DATA Table B. If you accept or reject the NULL hypothesis, what might be some possible explanations for this outcome? Accept the NULL Hypothesis Reject the NULL Hypothesis Probability Degrees of 0.90 Freedom 0.50 0.25 0.10 0.05 0.01 1 0.016 0.46 1.32 2.71 3.84 6.64 2 0.21 1.39 2.77 4.61 5.99 9.21 3 0.58 2.37 4.11 6.25 7.82 11.35 4 1.06 3.36 5.39 7.78 9.49 13.28 5 1.61 4.35 6.63 9.24 11.07 15.09 4