Chi-square analysis

CHI-SQUARE EXERCISE

There are many methods of statistically testing scientific data, each applicable under different conditions. The Chi-square (X2) test, a measure of the discrepancy between the observed results and some hypothetically expected results, is well suited to the field of genetics. You will use this test in the analysis of your data. Before you actually apply the test, however, you must understand certain basics of statistical testing.

We all know by simple observation that nature is variable. When Mendel counted his unit characters, he did not find exact 3:1 or 9:3:3:1 ratios. He obtained very close approximations to the ratios, but values which varied slightly from the theoretical ones. The variability he saw can be accounted for by mistakes in counting or by the natural variability in living systems. The question we want to answer is, "When is the deviation from the expected value due to chance alone, and when is it due to an incorrect hypothesis or inappropriate testing of the hypothesis?" How much deviation from expected values will you accept before you start to question the accuracy of your underlying assumptions or the reliability of your sampling (testing) technique? This question can be answered only in an arbitrary manner.

Suppose that every day at lunch you and a friend toss a coin to determine who will buy the coffee. Assuming your friend and the coin are honest, you expect to buy the coffee about 50% of the time. If you find that you are buying coffee 55% of the time, you may not be bothered too much, but simply consider it bad luck. If, however, you find you are buying coffee 60, 70 or 80% of the time, you might begin to wonder about the honesty of your friend, the coin, or both. But when would you begin to wonder? By convention the cutoff point has been taken at the 0.05 (5%) level of probability.
That is, if the probability of obtaining a deviation as large as yours by chance is one in twenty or less, the deviation you observed from the expected value is generally considered to be too great to have occurred by chance alone.

What should also be apparent at this point is that the rejection or acceptance of an observed ratio is highly dependent on the number of events you have counted. Reconsider tossing the coin for coffee. If you have performed this exercise ten times and have won only three times, you may not be too concerned (although you certainly couldn't be sure the coin was fair). However, if you have played 100 times and won only 30 times (still a 7:3 ratio of losses to wins), you might have cause for alarm. In other words, sample size (or number of repetitions) can and should affect your decision.

How then do you carry out the Chi-square test? The mathematical process involved is given in the formula:

$$X^2 = \sum \frac{(O - E)^2}{E}$$

where O = the observed value and E = the expected value.

For each class of observations, calculate the deviation of the observed value from the expected value. The deviation is then squared and divided by the expected value. Then all of the values are summed. It should be immediately obvious that the smaller the differences between the observed and the expected values, the smaller the value of X2 will be and, conversely, the greater the difference, the greater the value of X2.

In order to understand and use the Chi-square test properly, you should know the meaning of null hypothesis (H0), degrees of freedom (df) and probability (P).

A null hypothesis is simply a general description of what we expect to happen according to a standard hypothesis; e.g., you would expect to get a 1:1 ratio of heads to tails if you tossed a coin 1000 times. Strictly stated, the "null hypothesis" is the hypothesis that the difference between the observed results and the predicted results could arise by chance, rather than by some specific process.
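The effect of sample size on the formula can be seen by working the coin example through it. The following is a minimal sketch in Python, not part of the original exercise; the function name chi_square is our own choice, and a fair coin (equal expected wins and losses) is assumed as the null hypothesis.

```python
def chi_square(observed, expected):
    """Sum (O - E)^2 / E over all classes of observations."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Classes are [wins, losses]; a fair coin predicts an even split.
small = chi_square([3, 7], [5, 5])        # 3 wins in 10 tosses
large = chi_square([30, 70], [50, 50])    # 30 wins in 100 tosses

print(f"10 tosses:  X2 = {small:.1f}")
print(f"100 tosses: X2 = {large:.1f}")
```

The ratio of losses to wins is 7:3 in both cases, yet X2 is ten times larger for 100 tosses (16.0 versus 1.6). For this two-class case the handout's Table 3 lists 3.84 as the value at the 0.05 cutoff, so only the larger sample gives grounds for suspicion.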
There is always one degree of freedom fewer than the total number of classes into which the data fall. With a coin you can have heads or tails, so there are two classes and df = 2 - 1 = 1. Why is this so? If you have a shoe in each hand, you can drop the shoe in the left hand or the right hand: you have a choice. But after you drop one shoe, you have no choice as to which shoe to drop next. In this case, as with the coins, you have one (1) degree of freedom.

Probability (P) is the percent of the time that a deviation at least as large as the one observed would occur by chance in a situation where the null hypothesis is really true.

Perhaps all of the above can be made more obvious by using examples.

EXAMPLE 1

Let us examine Mendel's F2 data for the Round/wrinkled and Yellow/green dihybrid cross. He counted a total of 556 peas in the observed ratio 315:108:101:32. These values can be seen in Row 1 of Table 1. We will use this table to test Mendel's data by the X2 method.

Now we must find what the expected values are. Based on Mendel's second law we expect a 9:3:3:1 ratio. That is, 9/16 of the total (312.75, rounded to 313 out of 556) should fall into the Round Yellow class; 3/16 (104 out of 556) should be Round green; 3/16 (104 out of 556) should be wrinkled Yellow; and finally 1/16 (35 out of 556) should be wrinkled green. These calculated values are put in Row 2 of Table 1. We now have the values Mendel observed and the expected values based on our null hypothesis.

In Row 3 we subtract the expected values from the observed values to determine the deviation between them. You can ignore the sign of these values because in Row 4 they will be squared. In Row 5 each one of the squared values is divided by its respective expected value from Row 2. Finally, all of the Row 5 values are summed in the lower right-hand corner of the table. This value is X2. In this particular instance it is equal to 0.510.

Now you have to decide whether this X2 value could have occurred easily by chance or not.
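The row-by-row arithmetic described for Table 1 can be checked with a short script. This is a minimal sketch in Python, not part of the original handout; the variable names are ours, and the expected counts are rounded to whole seeds because that is what the handout's table does.

```python
observed = [315, 108, 101, 32]   # Row 1: Round Yellow, Round green, wrinkled Yellow, wrinkled green
ratio = [9, 3, 3, 1]             # null hypothesis: 9:3:3:1
total = sum(observed)            # 556 peas

# Row 2: expected counts, rounded to whole seeds as in the handout.
expected = [round(total * r / sum(ratio)) for r in ratio]

# Rows 3 to 5: deviation, squared deviation, squared deviation over expected.
deviations = [o - e for o, e in zip(observed, expected)]
squared = [d ** 2 for d in deviations]
terms = [s / e for s, e in zip(squared, expected)]

x2 = sum(terms)
df = len(observed) - 1           # one fewer than the number of classes

# Same statistic without rounding the expected counts (312.75, 104.25, ...).
exact = [total * r / sum(ratio) for r in ratio]
x2_exact = sum((o - e) ** 2 / e for o, e in zip(observed, exact))

print(expected, f"X2 = {x2:.3f}", f"df = {df}", f"unrounded X2 = {x2_exact:.3f}")
```

With rounded expected counts the script reproduces the handout's 0.510. Leaving the expected counts unrounded gives about 0.470, a difference far too small to change the conclusion.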
Fortunately, the chances of getting various values of X2 by chance have been calculated by mathematicians (see Table 3). However, before you can check your value of X2, you must determine the degrees of freedom (df) involved. In Mendel's F2 data there are four classes. Therefore, there must be three (3) degrees of freedom: df = 4 - 1 = 3.

Table 1

Phenotype of Seeds              Round     Round     wrinkled  wrinkled  Total
                                Yellow    green     Yellow    green
==============================================================================
Observed numbers                315       108       101       32        556
Expected numbers                313       104       104       35        556
Deviation (O-E), sign ignored   2         4         3         3
Deviation squared (O-E)2        4         16        9         9
(O-E)2 / E                      0.013     0.154     0.086     0.257     X2 = 0.510

The probability (P) that the difference between the observed and expected values can be accounted for by chance alone can now be determined, since df and X2 are known. Look at Table 3, find the row for the appropriate df (3), and determine P for this X2 value (0.510). Since 0.58 is the entry closest to 0.510, P is close to (in fact slightly above) 90%, or 9 in 10. In other words, if a 9:3:3:1 process is actually operating, about 90% of experiments like this one would deviate from the expected values at least this much by chance alone.

An accepted practice is to reject a null hypothesis if the probability of the results occurring by chance is 0.05 (5%) or less. Otherwise, all we can say is that we failed to reject the null hypothesis. A more sensitive experiment might allow you to reject the null hypothesis, but for now you aren't able to do so. The important point is that statistics can be used to reject a null hypothesis but never to prove one. However, in this case the data give us no reason to reject (or even question) the null hypothesis that Mendel's data should fall in a 9:3:3:1 ratio.

EXAMPLE 2

Al Blop received 800 F2 tulip seeds from N. T. Careful, which he planted in a nice warm (37°C), moist place. Of the 800, only 652 germinated, grew and eventually produced flowers.
Careful had told Blop that the parental types were Red flowers/Smooth petal edges and yellow flowers/ruffled petal edges (Red dominant over yellow and Smooth over ruffled), so he could expect a 9:3:3:1 ratio in the F2's he was planting. Blop, however, got the following results: 440 Red/Smooth; 50 Red/ruffled; 147 yellow/Smooth; 15 yellow/ruffled. He checked the null hypothesis (that he should have a 9:3:3:1 ratio) by using the X2 test. Blop found X2 to be 78.6 (see Table 2).

Table 2

Phenotype of Seeds              Red       Red       yellow    yellow    Total
                                Smooth    ruffled   Smooth    ruffled
==============================================================================
Observed numbers                440       50        147       15        652
Expected numbers                367       122       122       41        652
Deviation (O-E), sign ignored   73        72        25        26
Deviation squared (O-E)2        5329      5184      625       676
(O-E)2 / E                      14.5      42.5      5.1       16.5      X2 = 78.6

Since there are 4 classes of data, the df is 3 (df = 4 - 1). Looking at Table 3, we can see that P < 0.001. This means that the probability of his results occurring by chance is much less than 0.001 and, therefore, the null hypothesis should be rejected.

But Blop wondered whether the 9:3:3:1 hypothesis was incorrect or whether some other factor was at work. While carefully studying his results he realized that if there had been 800 total plants (rather than the 652 he had), he would have expected about as many Smooth-edged plants as he actually obtained, but far more ruffled-edged plants than he found. (Try this.) Could the ruffled plants be unhealthy? Blop wrote Careful about his results and concerns. Careful telephoned to say that the tulip seeds should have been grown at 25°C, since the ones with ruffled edges were temperature sensitive and don't germinate well at 37°C. Thus the apparent need to reject the null hypothesis was explained. There was something other than random chance that made the observed results differ from those expected from a 9:3:3:1 ratio.
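Blop's calculation, and the "Try this" check against all 800 seeds, can be sketched as follows. This is an illustrative Python sketch rather than part of the original handout; the helper name expected_counts is our own, and the expected counts for 652 plants are rounded to whole plants as in Table 2.

```python
def expected_counts(total, ratio):
    """Split a total among classes in proportion to the given ratio."""
    return [total * r / sum(ratio) for r in ratio]

# Order: Red/Smooth, Red/ruffled, yellow/Smooth, yellow/ruffled
observed = [440, 50, 147, 15]
ratio = [9, 3, 3, 1]

# Table 2: expected counts for the 652 plants that flowered, rounded.
expected_652 = [round(e) for e in expected_counts(652, ratio)]
x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected_652))

# "Try this": expected counts if all 800 seeds had produced plants.
expected_800 = expected_counts(800, ratio)
shortfall = [e - o for o, e in zip(observed, expected_800)]

print(expected_652, f"X2 = {x2:.1f}")
print(expected_800, shortfall)
```

Against 800 seeds, the two Smooth classes fall short of expectation by only 10 and 3 plants, while the two ruffled classes fall short by 100 and 35. That pattern is what pointed Blop toward a problem specific to the ruffled plants.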
The use of X2 gave Blop the incentive to ask questions about his data, and to seek answers to those questions.

In summary, to determine and use the Probability Value:

1. Determine df.
2. Determine X2.
3. In Table 3, find the appropriate df and follow the numbers horizontally, matching your calculated X2 as closely as possible to the value given on that line.
4. Read the probability at the top of that column.
5. Reject the null hypothesis if the probability is equal to or less than 0.05 (5%), or fail to reject the null hypothesis if the probability is greater than 0.05.

Table 3. CHI-SQUARE VALUES

                            Probabilities (P)
___________________________________________________________________
df    .90    .70    .50    .30    .20    .10    .05    .01    .001
___________________________________________________________________
1     .016   .15    .46    1.07   1.64   2.71   3.84   6.64   10.83
2     .21    .71    1.39   2.41   3.22   4.61   5.99   9.21   13.82
3     .58    1.42   2.37   3.67   4.64   6.25   7.82   11.35  16.27
4     1.06   2.20   3.36   4.88   5.99   7.78   9.49   13.28  18.47
5     1.61   3.00   4.35   6.06   7.29   9.24   11.07  15.09  20.52
6     2.20   3.83   5.35   7.23   8.56   10.65  12.59  16.81  22.46
7     2.83   4.67   6.35   8.38   9.80   12.02  14.07  18.48  24.32
8     3.49   5.53   7.34   9.52   11.03  13.36  15.51  20.09  26.13
9     4.17   6.39   8.34   10.66  12.24  14.68  16.92  21.67  27.88
10    4.87   7.27   9.34   11.78  13.44  15.99  18.31  23.21  29.59

This test should never be used on data expressed as percentages or ratios, nor should it be used when the expected values in one or more classes fall below five.

The following exercise is designed to familiarize you with Chi-square analysis and to show you how the total sample size can affect your calculated X2 value.

The ear of corn shown above has seeds of two different colors; therefore, you have a case of phenotypic segregation. If you make the assumption that the "parents" of this ear of corn were heterozygous for a single trait, seed color, you can test this hypothesis using X2. Of course, to test this hypothesis you must count the kernels on the ear.
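Before you start counting, it may help to see the Table 3 lookup from the summary steps written out as code. The following Python sketch is our own illustration, not part of the handout. It copies only the first three rows of Table 3, and instead of picking the closest entry it brackets P between the two neighboring columns, which leads to the same reject or fail-to-reject decision. The names P_COLUMNS, TABLE_3, bracket_p and reject_null are hypothetical.

```python
P_COLUMNS = [0.90, 0.70, 0.50, 0.30, 0.20, 0.10, 0.05, 0.01, 0.001]

# First three rows of Table 3 (df = 1, 2, 3), copied from the handout.
TABLE_3 = {
    1: [0.016, 0.15, 0.46, 1.07, 1.64, 2.71, 3.84, 6.64, 10.83],
    2: [0.21, 0.71, 1.39, 2.41, 3.22, 4.61, 5.99, 9.21, 13.82],
    3: [0.58, 1.42, 2.37, 3.67, 4.64, 6.25, 7.82, 11.35, 16.27],
}

def bracket_p(x2, df):
    """Return (lower, upper) so that lower < P <= upper.

    None means the table gives no bound on that side.
    """
    lower, upper = None, None
    for p, critical in zip(P_COLUMNS, TABLE_3[df]):
        if x2 >= critical:
            upper = p          # X2 reaches this column's value, so P <= p
        else:
            lower = p          # X2 is below this column's value, so P > p
            break
    return lower, upper

def reject_null(x2, df, alpha=0.05):
    """Reject the null hypothesis when the table shows P <= alpha."""
    _, upper = bracket_p(x2, df)
    return upper is not None and upper <= alpha

print(bracket_p(0.510, 3), reject_null(0.510, 3))   # Example 1
print(bracket_p(78.6, 3), reject_null(78.6, 3))     # Example 2
```

For Example 1 the lookup reports that P is above 0.90, so the null hypothesis is not rejected. For Example 2 it reports that P is at or below 0.001, so the null hypothesis is rejected.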
Before you begin, look at the ear to determine whether there are more dark or light kernels. Which? _____________ The trait in the majority should be the dominant one. Once you have determined the dominant trait, go through the graduated series of countings and tests described on the next page. This exercise should demonstrate the importance of a large sample size in all genetic testing.

Count the number of kernels in a single row on your ear and classify them as to color. Use the dominance relationship determined previously and assume a 3:1 ratio from your hypothesis as to the type of cross which gave rise to this ear. Test your data using X2 and the table below.

Table for Calculating Chi-Square (X2): One Row
__________________________________________________________________
Phenotype                       __________   __________   Total
==================================================================
Observed numbers                __________   __________   __________
Expected numbers                __________   __________   __________
Deviations                      __________   __________
Deviations squared              __________   __________
Deviations squared, divided
  by expected numbers           __________   __________   X2 = ________

Probability = ________

If you get a probability of 5% or less, you may reject the null hypothesis. If, however, you get a probability greater than 5%, you cannot reject the null hypothesis; i.e., by convention your data are considered not to be inconsistent with the expected 3:1 ratio. Because you got that result, and because lots of other scientists have obtained it in the past, we tend to believe the 3:1 theory.

Do your data fit a 3:1 ratio? ____________________________________

Now count three more rows and repeat the Chi-square test on the total for the four rows you have counted. Then do the same for all the "kernels" found in all sixteen (16) rows.
Table for Calculating Chi-Square (X2): Four Rows
__________________________________________________________________
Phenotype                       __________   __________   Total
==================================================================
Observed numbers                __________   __________   __________
Expected numbers                __________   __________   __________
Deviations                      __________   __________
Deviations squared              __________   __________
Deviations squared, divided
  by expected numbers           __________   __________   X2 = ________

Probability = ________

Table for Calculating Chi-Square (X2): Sixteen Rows
__________________________________________________________________
Phenotype                       __________   __________   Total
==================================================================
Observed numbers                __________   __________   __________
Expected numbers                __________   __________   __________
Deviations                      __________   __________
Deviations squared              __________   __________
Deviations squared, divided
  by expected numbers           __________   __________   X2 = ________

Probability = ________

How do X2 and its associated probability vary as your sample size gets larger? Remember that a bigger sample size should get you closer to the "truth," whether "truth" is the null hypothesis or "truth" is some deviation from the null hypothesis.

Tables for Calculating Chi-Square (X2) for my Cross, which is ___________

My Data:
Phenotypes:                     __________   __________   Total
Observed Numbers                __________   __________   __________
Expected Numbers                __________   __________   __________
Deviations                      __________   __________
Deviations squared              __________   __________
Deviations squared, divided
  by expected numbers           __________   __________
X2 = ________   Probability = ________

Section's Data:
Phenotypes:                     __________   __________   Total
Observed Numbers                __________   __________   __________
Expected Numbers                __________   __________   __________
Deviations                      __________   __________
Deviations squared              __________   __________
Deviations squared, divided
  by expected numbers           __________   __________
X2 = ________   Probability = ________

Class Data:
Phenotypes:                     __________   __________   Total
Observed Numbers                __________   __________   __________
Expected Numbers                __________   __________   __________
Deviations                      __________   __________
Deviations squared              __________   __________
Deviations squared, divided
  by expected numbers           __________   __________
X2 = ________   Probability = ________

How does the accumulation of larger numbers of offspring affect the evaluation of your null hypothesis? Do you reject or fail to reject your null hypothesis? Explain.
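To check your hand calculations for the corn exercise, the 3:1 test can be sketched in Python as below. This is an illustration only: the kernel counts are HYPOTHETICAL numbers chosen to keep the same 70:30 split at three sample sizes, not real data, and the function name chi_square_3_to_1 is our own. The cutoff 3.84 is the Table 3 value for df = 1 at P = .05.

```python
CRITICAL_05_DF1 = 3.84  # Table 3, df = 1, P = .05

def chi_square_3_to_1(dominant, recessive):
    """X2 for two classes tested against a 3:1 expected ratio."""
    total = dominant + recessive
    expected = [total * 3 / 4, total * 1 / 4]
    observed = [dominant, recessive]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# HYPOTHETICAL counts (not real data): the same 70:30 split each time.
samples = {
    "one row": (14, 6),
    "four rows": (56, 24),
    "sixteen rows": (224, 96),
}

results = {name: chi_square_3_to_1(*counts) for name, counts in samples.items()}

for name, x2 in results.items():
    verdict = "reject" if x2 >= CRITICAL_05_DF1 else "fail to reject"
    print(f"{name}: X2 = {x2:.2f} -> {verdict} the 3:1 hypothesis")
```

With these made-up counts, the same proportions give X2 values of about 0.27, 1.07 and 4.27, and only the largest sample crosses the 0.05 cutoff. Substitute your own counts for the hypothetical ones to check your tables.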
