Genetics Workshop Number Three: The Chi-Square

Welcome to your third Genetics Workshop. Here we will go slowly and systematically through the steps needed to do a chi-square analysis. This Workshop is divided into three problems and a summary of general ideas.

The first problem is an analysis of the results from a monohybrid cross, and I will walk you through it very slowly. We will go over the fundamentals of chi-square work and reinforce what you learned in the lesson. The second problem is another monohybrid cross. I will move a little faster while showing you how to organize your math, which should give you a better feel for how to actually do a chi-square. The last problem involves a dihybrid cross, so it is a little harder, but it uses the same ideas as the simpler problems, and you will see that it helps to set up a table to keep track of the numbers. We will conclude with a short series of questions to help you understand that you can use chi-square analysis for more than testing the ratios from crosses.

By the way, don't get the chi-square mixed up with Punnett squares. A Punnett square is a useful way to simulate how alleles come together in a mating; it gives the expected ratio of offspring genotypes, which you can now easily convert into a ratio of phenotypes. A chi-square is a statistical tool that helps us decide whether the observed numbers are close enough to the expected numbers to be acceptable. Chi-square analysis can be used in any area, not just genetics: whenever you have to determine whether observed results fit an expected ratio, you can use the chi-square.

Before we begin, pick up and print out a copy of your Chi-Square Worksheet. Fill it in as we work together and, when you finish each Part, check your answers against mine. (The hyperlink for the answer sheet is at the end of this page.)
There is a single worksheet for this entire three-part Workshop, and a single answer sheet too. Just do it a section at a time. After this Workshop you will be well prepared to do the SAQs from the chi-square lessons.

Chi-square of a monohybrid cross as a "walk through"

Mendel's data from one experiment was:

P = smooth seeds crossed with wrinkled seeds
F1 = all smooth seeds (so smooth is dominant and wrinkled is recessive)
F2 = 5,474 smooth seeds and 1,850 wrinkled seeds

1. What ratio did he observe? 5474 / 1850 = 2.9589189 : 1 = 2.96 : 1
2. What ratio did he expect? 3 : 1

You should understand that the chi-square compares the NUMBER (not ratio) observed to the NUMBER (not ratio) expected. You are given the observed numbers, and from that data you might guess what the ratio should be. You then use that "guessed" ratio to calculate what the expected numbers would be. Calculating the expected numbers is critical to doing the chi-square, and many students have trouble with that first step: they forget how to do it, do it backwards, or don't do it at all! Let's work through this important step together so you will understand the logic.

You already know the numbers observed: Smooth = 5474 and Wrinkled = 1850.

3. What is the total number of seeds? 5474 + 1850 = 7324
4. What number of wrinkled is expected? 7324 / 4 = 1831
5. What number of smooth is expected? 1831 X 3 = 5493, or 7324 X 3/4 = 5493

OK, you now have the expected numbers calculated from the expected ratio. The best (easiest) way to COMPARE two values is to find their DIFFERENCE (by SUBTRACTION).

6. What is the difference between observed and expected smooth? 5474 - 5493 = -19
7. What is the difference between observed and expected wrinkled? 1850 - 1831 = 19

For "statistical magnification" we INCREASE those differences by squaring them. Squaring also removes the minus sign, so a shortfall in one class cannot cancel an excess in another.

8. What is the square of the difference between the observed and expected smooth? (-19)² = 361, or -19 X -19 = 361
9. What is the square of the difference between the observed and expected wrinkled? 19² = 361, or 19 X 19 = 361

These "squares of the differences" are still on the scale of the raw counts, so they must be "NORMALISED" by dividing each by the number EXPECTED (NOT the number observed). This could be called the "squared difference per expected".

10. What is the square of the difference between the observed and expected smooth, divided by the expected number of smooth? 361 / 5493 = 0.06572 = 0.066
11. What is the square of the difference between the observed and expected wrinkled, divided by the expected number of wrinkled? 361 / 1831 = 0.19716 = 0.197

Lastly, we add together these "squared differences per expected" to give us the TOTAL "squared differences per expected".

12. What is the sum of the "squared differences per expected"? 0.066 + 0.197 = 0.263

Therefore, the chi-square for this experiment is χ² = 0.263.

OK - so what? Statisticians have developed chi-square tables, based upon the probabilities that a particular chi-square value will come about purely by chance. There are two "features" to consider.

A. Significance Level. We (scientists) like to use the level of 5% as our significance "cut-off". A chi-square larger than the value in the 5% table indicates an experiment in which the numbers observed are so far from the numbers expected that chance alone would produce such a gap less than 5% of the time, so we have to conclude that the expected ratio is wrong!

B. Degrees of Freedom. The more "classes" (categories) there are, the more terms are added into the chi-square, so the larger it can become through statistical "blips" alone. The acceptable limit therefore rises with the number of classes. The "degrees of freedom" are one less than the number of classes.

13. Name all the different classes in the experiment above. Smooth and Wrinkled
14. How many degrees of freedom were in that experiment? 2 - 1 = 1. One degree of freedom.

Here's a portion of the Chi Square Significance Table.

Degrees of Freedom    5% Significance Level
1                     3.84
2                     5.99
3                     7.81
4                     9.49

15. Is the chi-square you calculated within the boundary of "the possible"? Yes!
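If you like to check your calculator work on a computer, steps 3 to 12 of the walk-through can be sketched in a few lines of Python. This is only a minimal sketch: the variable names are illustrative assumptions, not part of the Worksheet, and the 3 : 1 ratio is written in directly as the fractions 3/4 and 1/4.

```python
# Observed F2 counts from the walk-through.
observed_smooth, observed_wrinkled = 5474, 1850

total = observed_smooth + observed_wrinkled              # step 3
expected_wrinkled = total * 1 / 4                        # step 4 (the "1" of 3 : 1)
expected_smooth = total * 3 / 4                          # step 5 (the "3" of 3 : 1)

diff_smooth = observed_smooth - expected_smooth          # step 6
diff_wrinkled = observed_wrinkled - expected_wrinkled    # step 7

# Steps 8-11: square each difference, then divide by the EXPECTED number.
term_smooth = diff_smooth ** 2 / expected_smooth
term_wrinkled = diff_wrinkled ** 2 / expected_wrinkled

chi_square = term_smooth + term_wrinkled                 # step 12

print(total, expected_smooth, expected_wrinkled)         # 7324 5493.0 1831.0
print(diff_smooth, diff_wrinkled)                        # -19.0 19.0
print(round(chi_square, 3))                              # 0.263
```

Notice that the code divides by the expected numbers, never by the observed ones, which is the step students most often get wrong.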
We calculated χ² = 0.263. With one degree of freedom we could have a chi-square up to 3.84 before we would become suspicious that the observed data were in a ratio too far removed from the ratio we tested.

Chi-square of a monohybrid cross as a quick table

When doing a chi-square it helps to set it up as a table and to understand that everything we have been doing is represented by the equation

$\chi^2 = \sum \frac{(O - E)^2}{E}$

where O is the observed number and E is the expected number for each class.

Consider these results among the F2s:

4,400 yellow seeds
1,624 green seeds

First, set up a table like the one below.

Phenotypes    O       E       O-E     (O-E)²    (O-E)²/E
Yellow
Green
Total

Second, enter the data. Remember, data is what is observed, so data goes in the "observed" (O) column.

Phenotypes    O       E       O-E     (O-E)²    (O-E)²/E
Yellow        4400
Green         1624
Total         6024

Next, fill in the "expected" (E) column. Starting from the total, split that number into the two parts that would produce the 3 : 1 ratio you expect. Note that it might be easier to do the "1" (green) of the 3 : 1 ratio first. However, if you are comfortable with fractions it shouldn't be too hard to do them in any order.

Phenotypes    O       E                     O-E     (O-E)²    (O-E)²/E
Yellow        4400    6024 X 3/4 = 4518
Green         1624    6024 X 1/4 = 1506
Total         6024    6024

Notice that the total expected is the same as the total observed. If they don't add up to the same number you have made an error in the math.

Now fill in the rest of the table. It's a lot of work but, now that you have it all organized, it should be just a matter of using your calculator correctly. There is no reason to "total" columns O-E or (O-E)², so leave them blank. However, it is very important to complete the "total" in the last column, (O-E)²/E, because that is the chi-square!

Phenotypes    O       E                     O-E                   (O-E)²              (O-E)²/E
Yellow        4400    6024 X 3/4 = 4518     4400 - 4518 = -118    (-118)² = 13,924    13924 / 4518 = 3.08
Green         1624    6024 X 1/4 = 1506     1624 - 1506 = 118     118² = 13,924       13924 / 1506 = 9.25
Total         6024    6024                                                            12.33

Is the chi-square you calculated here within the boundary of "the possible"?
(To answer that, first go back to the Chi Square Significance Table you saw earlier. Then page back down to here.)

NO! χ² is about 12.3 but, with one degree of freedom, we cannot accept any ratio that gives us a chi-square larger than 3.84.

Do we accept that these results are within an acceptable range of a 3 : 1 ratio? No! We must reject the 3 : 1 ratio. These data are far off the 3 : 1 ratio.

Chi-square of a dihybrid cross as a quick table

Consider these results from a dihybrid cross:

30 red tall
65 white tall
83 red short
206 white short

Before we dive into the chi-square we first have to determine what ratio we will test and which category (class) fits with each part of the ratio.

Based upon these numbers, which phenotypes are dominant and recessive for the two loci? (Remember, these are the F2s from a dihybrid cross, so they should be close to a specific ratio that you learned earlier. You also learned which traits end up in each part of that ratio.) Also, as best you can, assign genotypes to these phenotypes.

A dihybrid cross should produce a 9 : 3 : 3 : 1 ratio in the F2s, and a simple look at the numbers will give you an idea of which class belongs to each part. The biggest group is the white short class, so it must be the doubly dominant class. In other words, white short can be assigned the genotype W-S-. At the opposite end of the ratio, the least represented group would be the doubly recessive, so the red talls are the "1" in the 9 : 3 : 3 : 1 ratio and have the genotype wwss. You can deduce the other two classes, which make up the two "3"s in the ratio. The white talls have the genotype W-ss and the red shorts are wwS-.

Now that you have identified each category and assigned it to the ratio, we can begin the chi-square to determine if it fits. Let's begin by arranging our computation table. It will be twice the size of the previous table. It might help to arrange the classes in descending order to represent the 9 : 3 : 3 : 1 ratio.
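The "look at the numbers and rank them" idea can be sketched in Python as follows. This is only an illustration under stated assumptions: the dictionary layout and the names are hypothetical, and ranking by size is the rule of thumb described above, not a proof of which alleles are dominant.

```python
# Observed F2 counts from the dihybrid cross.
observed = {"red tall": 30, "white tall": 65, "red short": 83, "white short": 206}
ratio = [9, 3, 3, 1]                      # the ratio we intend to test

total = sum(observed.values())            # 384 offspring in all
unit = total / sum(ratio)                 # size of one "part": 384 / 16 = 24.0

# Rank the classes from largest to smallest so they line up with 9 : 3 : 3 : 1.
ranked = sorted(observed.items(), key=lambda item: item[1], reverse=True)

for (phenotype, count), part in zip(ranked, ratio):
    # count / unit shows how many "parts" each class actually holds.
    print(f"{phenotype:12s} {count:4d}  is {count / unit:.2f} parts (testing against {part})")
```

The printed "parts" are only roughly 9, 3, 3 and 1; whether they are close enough is exactly what the chi-square will decide.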
Draw the appropriate table, including the observed numbers.

Phenotypes                 O      E      O-E     (O-E)²    (O-E)²/E
White and short (W-S-)     206
Red and short (wwS-)       83
White and tall (W-ss)      65
Red and tall (wwss)        30
Total                      384

Great! We are ready to start. First determine the "expecteds". It might be easier to do the "1" part of the ratio first and work up the table. (The ratio has 9 + 3 + 3 + 1 = 16 parts, so one part is 384 / 16 = 24.) Regardless, take your time, calculate what the expected numbers should be and fill in the "E" column.

Phenotypes                 O      E                O-E     (O-E)²    (O-E)²/E
White and short (W-S-)     206    24 X 9 = 216
Red and short (wwS-)       83     24 X 3 = 72
White and tall (W-ss)      65     24 X 3 = 72
Red and tall (wwss)        30     24 X 1 = 24
Total                      384    384

I hope you were able to work through that and get these numbers too. Did you check your math by adding up the column to make sure the E column total equals the O column total?

Now it is time to fill in the rest of the table and calculate the chi-square. Go ahead and complete the calculations before paging down.

Phenotypes                 O      E                O-E                (O-E)²           (O-E)²/E
White and short (W-S-)     206    24 X 9 = 216     206 - 216 = -10    (-10)² = 100     100 / 216 = 0.463
Red and short (wwS-)       83     24 X 3 = 72      83 - 72 = 11       11² = 121        121 / 72 = 1.681
White and tall (W-ss)      65     24 X 3 = 72      65 - 72 = -7       (-7)² = 49       49 / 72 = 0.681
Red and tall (wwss)        30     24 X 1 = 24      30 - 24 = 6        6² = 36          36 / 24 = 1.500
Total                      384    384                                                  4.325

Did you get 4.325 for the answer? If you didn't, look over my answer and figure out where you went wrong - and try to learn from your error so you can do it right next time. [A common mistake occurs in the last column - many students divide by the observed number, or by the expected number from some other row. Remember to always divide by the expected number for that category.]

OK, you have calculated the chi-square and it is now time to do something with it. Here's a portion of the Chi Square Significance Table. How many "classes" (categories, groups) are in this experiment?
Degrees of Freedom    5% Significance Level
1                     3.84
2                     5.99
3                     7.81
4                     9.49

Four (Red and tall, White and tall, Red and short, White and short).

Some students get through the difficult chi-square but then make a simple mistake at this point. Some get confused, pick a number out of the ratio and say there are nine classes! Or three. Or some other number, and I cannot figure out where it came from. So, just to keep yourself thinking clearly, it is smart to list the categories.

Now, how many degrees of freedom are in this experiment? Three (4 - 1).

Does the 9 : 3 : 3 : 1 ratio fit the data? Yes! With three degrees of freedom you can have a chi-square as large as 7.81 before you would be beyond the 5% significance level. Notice that if you had been so foolish as to stick with the one degree of freedom (that we were using with the monohybrid crosses) you would have decided that the chi-square of 4.325 was too large, because it is above 3.84, and would have (WRONGLY) rejected the ratio!

The chi-square can be used whenever there is an expected ratio

What is the expected ratio of boys to girls? 1 : 1

How many degrees of freedom are in that example? There are two categories (classes) so there is one degree of freedom.

There are in vitro fertilization (IVF) methods that can increase the chances that a girl will be born or a boy will be born. You can use the chi-square to determine if a particular IVF clinic is really increasing the chances of having a boy or girl. You could count the girls and boys born to the clinic's patients who wanted, say, a girl, and calculate the chi-square against the expected 1 : 1 ratio.

If a particular IVF clinic can, indeed, increase the odds, would you expect the chi-square to be above or below the value of 3.84 (which I got from the table above)? If the IVF clinic can change the ratio from the expected 1 : 1 then the chi-square, calculated on the numbers of daughters and sons born, would be greater than 3.84.
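The table lookup used throughout this Workshop (count the classes, subtract one, compare with the 5% value) can be sketched in Python like this. It is a minimal sketch, not a statistics library: the function name is hypothetical and the dictionary holds only the four values from the portion of the table shown in this Workshop.

```python
# 5% values copied from the portion of the Chi Square Significance Table above.
CRITICAL_5_PERCENT = {1: 3.84, 2: 5.99, 3: 7.81, 4: 9.49}

def fits_expected_ratio(chi_square, number_of_classes):
    """Return True if the chi-square is within the 5% limit for these classes."""
    degrees_of_freedom = number_of_classes - 1      # one less than the classes
    return chi_square <= CRITICAL_5_PERCENT[degrees_of_freedom]

print(fits_expected_ratio(0.263, 2))   # smooth/wrinkled, 1 degree of freedom -> True
print(fits_expected_ratio(4.325, 4))   # dihybrid, 3 degrees of freedom -> True
print(fits_expected_ratio(4.325, 2))   # dihybrid with the WRONG class count -> False
```

The last line shows why listing the categories matters: the same chi-square passes with the correct three degrees of freedom and fails if you mistakenly use one.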
I hope you understand that here we are "hoping" that the ratio will NOT be 1 : 1. (In point of fact, scientists aren't supposed to "hope" for results, but the fact remains that they often hope a lot!)

Let's consider another situation. You are the district manager of three fast food restaurants and you are looking over the revenues. You see that store A made $1,000,000, store B made $3,000,000 and store C brought in $5,000,000. You wonder if that is just a statistical blip.

How would you use the chi-square to test the idea that these stores are different - beyond luck? (Don't do the chi-square - just tell me how you would set it up.)

You would "expect" a 1 : 1 : 1 ratio in the revenues if the stores were all the same. In other words, the total revenues of $9,000,000 would be distributed evenly. You would expect:

Store A = $3,000,000
Store B = $3,000,000
Store C = $3,000,000

You could now find, for each store, the difference between the observed and expected revenues, square the difference, divide that by the expected, and then add all three together to get a chi-square value.

Suppose the manager of store A complains that you are not being fair because you haven't taken into account the differences in local population around each store. His store serves a smaller community. So, you go to the population records and discover that store A serves a population that is only a quarter the size of the communities served by stores B and C.

Can you redo the chi-square? How?

The information about the populations tells you that there are four times as many likely customers for each of stores B and C as for store A. You can express that as a ratio of 1 : 4 : 4.
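Turning a ratio into expected numbers is the same step in every problem, whether the ratio is 3 : 1, 1 : 1 : 1 or 1 : 4 : 4. As a rough sketch (the function name is a hypothetical helper, not something from the Worksheet), it could look like this in Python:

```python
def expected_from_ratio(total, ratio):
    """Split a total into expected values in proportion to the given ratio."""
    parts = sum(ratio)                       # e.g. 1 + 4 + 4 = 9 parts
    return [total * r / parts for r in ratio]

# Observed revenues for stores A, B and C.
total_revenue = 1_000_000 + 3_000_000 + 5_000_000

print(expected_from_ratio(total_revenue, [1, 1, 1]))   # [3000000.0, 3000000.0, 3000000.0]
print(expected_from_ratio(total_revenue, [1, 4, 4]))   # [1000000.0, 4000000.0, 4000000.0]
```

Changing the ratio changes only the expected values; the observed values stay exactly as they were recorded.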
If revenues are dependent upon population you would expect ("expect" is the magic word that means "here comes a chi-square"):

Store A = $1,000,000
Store B = $4,000,000
Store C = $4,000,000

The observed revenues were:

Store A = $1,000,000
Store B = $3,000,000
Store C = $5,000,000

Now you would do another chi-square to determine whether these numbers fit a 1 : 4 : 4 ratio (a good fit would suggest that revenues are probably dependent upon population).

And finally, how many degrees of freedom are there in this three-store problem? There are three categories (Stores A, B and C) so there are two degrees of freedom.

These last few puzzles, about sex ratios and revenue ratios, are to show you that the chi-square has many uses and that all you have to do is identify how to think about the ratios, expectations and outcomes.

If you haven't done so already, pick up a copy of the answers to The Chi-Square and compare it to your own Worksheet. Make sure you understand it