AP BIOLOGY Chi-square Analysis with Skittles INTRODUCTION The Chi-square test (X2) is a test that makes a comparison between the data collected in an experiment versus the data you expected to find. It is used beyond genetics studies and can be used whenever you want to compare the differences between expected results and experimental data. In the case of genetics (and coin tosses) the expected results can be calculated using the Laws of Probability (and possibly the help of a Punnett square). The Chi-square test helps you to decide if the difference between your observed results and your expected results is probably due to random chance alone, or if there is some other factor influencing the results. - Is the variance in your data probably due to random chance alone and therefore your hypothesis about the genetics of a trait is supported by the data? - Are the differences between the observed and expected results probably not due to random chance alone, and your hypothesis about the genetics of a trait is thereby not supported by the data? - Should you consider an alternative inheritance mechanism to explain the results? The Chi-square test will not, in fact, prove or disprove if random chance is the only thing causing observed differences, but it will give an estimate of the likelihood that chance alone is at work. In classical genetics research where you are trying to determine the inheritance pattern of a phenotype, you establish your predicted genetic explanation and the expected phenotype ratios in the offspring as your hypothesis. For example, you think a mutant trait in fruit flies is a simple dominant inheritance. To test this you would set up a cross between 2 true-breeding flies: white-eyed female x wild type male. You would then predict the ratios of phenotypes you would expect from this cross. This then establishes a hypothesis that any difference from these results will not be significant and will be due to random chance alone. This is refered to as your “null hypothesis”. It, in essence, says that you propose that nothing else – no other factors – are creating the variation in your results except for random chance differences. Have you ever wondered why the package of Skittles you just bought never seems to have enough of your favorite color? Or, why is it that you always seem to get the package of mostly Lemon or Lime? What’s going on at the Mars Company? Is the number of the different colors of Skittles in a package really different from one package to the next, or does the Mars Company do something to insure that each package gets the correct number of each color of Skittle? Personally, I’m more of an M&M person but I have opened my mind to chocolate/peanut allergy people so we can do Skittles too! After researching Skittle data (sad but true, I spent 37 minutes doing this), the Mars company claims they make each bag random but approximately 20% of each color should be in each package to make a total of 42 pieces. We can perform a Chi-square test on individual packages and then compare that to a larger population of the large package. I couldn’t find data on how many were supposed to be in a large bag. 1 OBJECTIVES 1. write a null hypothesis that pertains to this investigation 2. determine the degrees of freedom (df) for an investigation 3. calculate the X2 value for a given set of data 4. use the critical values table to determine if the calculated value is equal to or less than the critical value 5. determine if the Chi-square value exceeds the critical value and if the hypothesis is accepted or rejected MATERIALS Individual bag of Skittles Large bag of Skittles to be used by the whole class Paper towels Calculator PROCEDURE Null Hypothesis: __________________________________________________________________ __________________________________________________________________ NOTE: You cannot eat the Skittles until told to do so!! 1. Wash hands BEFORE starting the lab. 2. Place the contents of your individual bag of Skittles on a paper towel. 3. Record the number of different colors (classes) in Table 1 and Table 2 as “Number Observed” (o). 4. Calculate the number of each color expected in Table 1 and Table 2 as “Number Expected” (e). Hint: You must count all the colors and add the total number of Skittles before you can calculate the number of expected of each color. Round to the nearest whole number. 5. When everyone has completed the lab portion of their individual bag, work as a class to count and record the large bag. Record numbers in Table 3 and Table 4. 6. If you still have time after the lab portion of both the individual bag and the class bag, you may eat Skittles while you calculate the Chi-square. 7. Left over Skittles should be placed in a Ziploc bag. Don’t worry, you will get the rest at a later date! 2 Table 1- Individual Color of Candy Grape (purple) Number Observed (o) Percentage Expected 20% Lemon (yellow) 20% Strawberry (pink/red) 20% Orange (uh, orange!) 20% Lime (green) 20% Number Expected Total # Candies = 42 Table 2- Individual Chi-square Class (colors) Expected (e) Grape (purple) Observed (o) (o-e)2 o-e (o-e)2/e Lemon (yellow) Strawberry (pink/red) Orange (uh, orange!) Lime (green) Table 3- Class Data Color of Candy Grape (purple) Number Observed (o) Percentage Expected 20% Lemon (yellow) 20% Strawberry (pink/red) 20% Orange (uh, orange!) 20% Lime (green) 20% Number Expected Total # Candies = 3 Table 4- Class Chi-square Class (colors) Expected (e) Grape (purple) Observed (o) o-e (o-e)2 (o-e)2/e Lemon (yellow) Strawberry (pink/red) Orange (uh, orange!) Lime (green) DATA ANALYSIS With the Chi-square calculation table completed, you would look up your Chi-square value on the Chi-square Distribution table on the bottom of this lab. But to know which column and row to use on that chart, you must now determine the degrees of freedom to be used and the acceptable probability that the Chi-square you obtained is caused by chance alone or by other factors. The following two steps will help you to determine the degrees of freedom and the probability. The rows in the Chi-square Distribution table refer to degrees of freedom. The degrees of freedom are calculated as the one less than the number of possible results in your experiment. df = n-1 example: 5 classes of colors – 1 = 4 QUESTIONS 1. What is the X2 value for your data? 2. What is the critical value (p = 0.05) for your data? 3. Is your null hypothesis accepted or rejected? Explain why or why not. 4. If the null hypothesis is rejected, propose an alternate and recalculate the X2 value. 5. How many degrees of freedom are there? 6. Compare your individual data with the class data. What differences or similarities do you see? Explain which “population” is a better study. 4