UNIT 6: MENDELIAN GENETICS
CHI SQUARE ANALYSIS
Ms. Gaynor
Honors Genetics

CHI SQUARE ANALYSIS
• The chi square analysis allows you to use statistics to determine whether your data are "good" or not. Are your data "good" enough to accept your hypothesis?
• It allows us to test for deviations of observed frequencies from expected frequencies.
• The following formula is used:
    χ² = Σ (o − e)² / e
  where o is the observed count in a category, e is the expected count in that category, and Σ means "sum over all categories."

You need 2 different hypotheses:
1. NULL Hypothesis
• Data are occurring by chance and it is all RANDOM! There is NO preference between the groups of data.
2. Alternative Hypothesis
• Data are occurring because of some outside force. It is NOT by chance and it is NOT RANDOM! There is a preference between the groups of data.

• The chi square value is compared to a theoretical probability distribution.
• These probability (p) values are on the chi square distribution table.

HOW DO YOU USE THIS TABLE PROPERLY?
• You need to determine the degrees of freedom.
• Degrees of freedom is the # of groups (categories) in your data minus one (1).
• If the probability (p) read from the table is less than .05 or 5%, then your NULL hypothesis is rejected and your alternative hypothesis is accepted…the deviation in the data is NOT due to randomness!

Two Types of Hypotheses:
1. NULL HYPOTHESIS
• States that there is no substantial statistical deviation between observed and expected data.
• A hypothesis of no difference (or no effect) is called a null hypothesis, symbolized H0.
• In other words, the results are totally random and occurred by chance alone. There is NO preference.
• The null hypothesis states that the two variables are independent, or that they have NO relationship to one another.

Null Hypothesis Example
• A scientist is studying bees and butterflies.
• Her hypothesis is that a single bee visiting a flower will pollinate with a higher efficiency than a single butterfly, which will help produce a greater number of seeds in the flower's bean pod.
• We will call this hypothesis H1, or an alternative hypothesis, because it is an alternative to the null hypothesis.
• What is the null hypothesis?
H0: There is no difference between bees and butterflies in the number of seeds produced by the flowers they pollinate.

Two Types of Hypotheses:
2. ALTERNATIVE HYPOTHESIS
• States that there IS a substantial statistical deviation between observed and expected data.
• A hypothesis of difference (or effect) is called an alternative hypothesis, symbolized H1.
• In other words, the results are affected by an outside force, are NOT random, and did NOT occur by chance alone. There is a preference.

2 Types of Chi Square Problems
1. Non-genetic
• Null Hypothesis: Data are due to chance and are completely random. There is no preference between the groups/categories.
• Alternative Hypothesis: Data are NOT due to chance and there IS a preference between the groups/categories. Data are not random.
2. Genetic
• Null Hypothesis: Data are due to chance and are random because independent assortment is random. Punnett square ratios are expected. If there are 2+ genes involved in the experiment…there is no gene linkage affecting independent assortment & segregation.
• Alternative Hypothesis: Data are NOT due to chance and are NOT random. Punnett square ratios are NOT expected. If there are 2+ genes…there IS gene linkage affecting independent assortment & segregation.

Let's look at a fruit fly cross and their phenotypes
• Wild type (BBEE) x Black body, eyeless (bbee)
• F1: all wild type (BbEe)
• F1 x F1 offspring:
    5610 Wild type
    1881 Normal body, eyeless
    1896 Black body, wild-type eyes
    622 Black body, eyeless

Analysis of the results
• Once the numbers are in, you have to determine the expected values for this cross.
• This is your hypothesis, called the null hypothesis (no gene linkage is occurring).
• What are the expected outcomes of this cross?

F1 Cross: BbEe x BbEe
• 9/16 should be wild type (normal body, wild-type eyes)
• 3/16 should be normal body, eyeless
• 3/16 should be black body, wild-type eyes
• 1/16 should be black body, eyeless
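The expected counts for the 9:3:3:1 ratio can be worked out with a few lines of Python. This is only a sketch: the dictionary names are made up for illustration, and the pairing of each observed count with a phenotype is an assumption taken from the worked calculation in these slides (the count near 1/16 of the total is treated as black body, eyeless).

```python
from fractions import Fraction

# Observed F1 x F1 offspring counts from the slides.
# ASSUMPTION: phenotype labels are matched to counts so that they agree
# with the worked chi square calculation in the slides.
observed = {
    "normal body, wild-type eyes": 5610,
    "normal body, eyeless": 1881,
    "black body, wild-type eyes": 1896,
    "black body, eyeless": 622,
}

# Punnett square ratios expected under the null hypothesis (no linkage).
ratios = {
    "normal body, wild-type eyes": Fraction(9, 16),
    "normal body, eyeless": Fraction(3, 16),
    "black body, wild-type eyes": Fraction(3, 16),
    "black body, eyeless": Fraction(1, 16),
}

total = sum(observed.values())

# Expected count = total offspring x expected fraction for that phenotype.
expected = {name: float(total * ratio) for name, ratio in ratios.items()}

for name, value in expected.items():
    print(f"{name}: expected {value:.1f}, observed {observed[name]}")
```

Rounded to whole flies, these expected counts are the 5630, 1877, 1877, and 626 used in the analysis.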
The following formula is used: χ² = Σ (o − e)² / e
If your null hypothesis is supported by data
• You are claiming that mating is random, as well as segregation and independent assortment.
If your null hypothesis is not supported by data
• You are seeing that the deviation (difference) between observed and expected is very large, so something non-random must be occurring…GENE LINKAGE!!!

Now Conduct the Analysis:
• To compute the expected values, take the total number of offspring divided by 16: 10009/16 = 626 (a.k.a. 1/16 of total offspring).
• Multiply by 9 to get 9/16 of the total (5630) and by 3 to get 3/16 of the total (1877).
1. Using the chi square formula, compute the chi square value (χ²) for this cross: calculate (o − e)²/e for EACH phenotype.
2. Sum all numbers to get your chi square value.
    (5610 − 5630)² / 5630 = .07
    (1881 − 1877)² / 1877 = .01
    (1896 − 1877)² / 1877 = .20
    (622 − 626)² / 626 = .02
    χ² = .30
• Determine how many degrees of freedom are in your experiment: 4 (phenotype) groups − 1 = 3

I Have my Chi Square Value (χ²)….What next?
• Figure out which hypothesis is accepted:
• Your NULL hypothesis = the 9:3:3:1 ratio is seen due to non-linkage genetics (independent assortment/segregation is occurring).
• The alternative hypothesis = any change from the expected is due to SOME OUTSIDE FORCE! IT IS NOT RANDOM! THE GENES ARE LINKED!
• To figure out which hypothesis is accepted, you need to use the CHI SQUARE TABLE, which lists CRITICAL VALUES!
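The calculation above can be checked with a short Python sketch. The function name chi_square is made up for illustration, and the expected counts here are left unrounded (an assumption), which reproduces the per-phenotype values of .07, .01, .20, and .02 shown on the slide.

```python
def chi_square(observed, expected):
    """Sum (o - e)^2 / e over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Observed counts from the fruit fly cross, in 9:3:3:1 order.
observed = [5610, 1881, 1896, 622]
total = sum(observed)

# Expected counts under the null hypothesis (9:3:3:1), left unrounded.
expected = [total * part / 16 for part in (9, 3, 3, 1)]

# Contribution of each phenotype to the chi square value.
terms = [(o - e) ** 2 / e for o, e in zip(observed, expected)]

chi2 = chi_square(observed, expected)

# Degrees of freedom = number of groups minus one.
df = len(observed) - 1

print([round(t, 2) for t in terms])
print(f"chi square = {chi2:.2f}, degrees of freedom = {df}")
```

Using the rounded expected counts (5630, 1877, 1877, 626) shifts individual terms slightly, but the total still rounds to .30.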
The chi square value is useful because we can use it to obtain the probability that the deviation between observed and expected data is due to chance.

CHI SQUARE TABLE
CHI-SQUARE DISTRIBUTION TABLE

Probability (p)
          Accept Null Hypothesis (chance ONLY)                  | Reject Null Hypothesis (NOT chance ONLY)
df         0.95   0.90   0.80   0.70   0.50   0.30   0.20   0.10 |   0.05   0.01  0.001
1         0.004   0.02   0.06   0.15   0.46   1.07   1.64   2.71 |   3.84   6.64  10.83
2          0.10   0.21   0.45   0.71   1.39   2.41   3.22   4.60 |   5.99   9.21  13.82
3          0.35   0.58   1.01   1.42   2.37   3.66   4.64   6.25 |   7.82  11.34  16.27
4          0.71   1.06   1.65   2.20   3.36   4.88   5.99   7.78 |   9.49  13.28  18.47
5          1.14   1.61   2.34   3.00   4.35   6.06   7.29   9.24 |  11.07  15.09  20.52
6          1.63   2.20   3.07   3.83   5.35   7.23   8.56  10.64 |  12.59  16.81  22.46
7          2.17   2.83   3.82   4.67   6.35   8.38   9.80  12.02 |  14.07  18.48  24.32
8          2.73   3.49   4.59   5.53   7.34   9.52  11.03  13.36 |  15.51  20.09  26.12
9          3.32   4.17   5.38   6.39   8.34  10.66  12.24  14.68 |  16.92  21.67  27.88
10         3.94   4.86   6.18   7.27   9.34  11.78  13.44  15.99 |  18.31  23.21  29.59
(df = Degrees of Freedom)

• In biological applications, a probability of 5% (0.05) is usually adopted as the standard.
• The conventional criteria for statistical significance range from 0.001 to 0.05.

What level of probability does one choose to decide whether two groups differ as a result of NON-CHANCE events or simply because of CHANCE?
• Most scientists have decided:
• If the difference between 2+ groups is so great that it would happen by chance fewer than 1 out of 20 times (P < 0.05), then the groups differ significantly.
• That is, the null hypothesis (due to chance/no difference in data) is rejected.
• If greater confidence in the results is desired, scientists will choose probability levels of less than 1 in 100 (P < 0.01) or 1 in 1000 (P < 0.001).

Looking statistical values up on the chi square distribution table tells us the following:
• With 3 degrees of freedom, our chi square value of 0.30 is just below 0.35, the entry in the 0.95 column, so the PROBABILITY (P) value read off the table is slightly greater than .95 or 95%.
• This value is far above 0.05, so the data give no evidence that our results are biased or due to gene linkage.
• In other words, if the null hypothesis is true, a deviation at least as large as ours would occur by chance more than 95% of the time.
• When our observed data are this close to our expected data, the deviation is readily explained by random chance.
• We therefore accept our null hypothesis.

When reporting chi square data, use the following formula sentence:
With ? degrees of freedom, my chi square value is ?, which gives me a p value between ?__% and ?__%. I therefore (accept or reject) my null hypothesis.

PRACTICE PROBLEMS
• What is the critical value at which we would reject the null hypothesis for the fruit fly example earlier?
  For 3 degrees of freedom, our chi square value must be > 7.815 (7.82 in the table) to accept the alternative hypothesis and support that gene linkage is occurring.
• What if our chi square value was 8.0 with 4 degrees of freedom? Do we accept or reject the null hypothesis?
  Accept, since 8.0 is less than the critical value of 9.49 for 4 degrees of freedom.
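The accept/reject decision in the practice problems can be sketched in Python. The names CRITICAL_05 and decide are made up for illustration, and the sketch assumes the usual p = 0.05 cutoff; the critical values are copied from the 0.05 column of the chi square distribution table.

```python
# Critical values from the p = 0.05 column of the chi square table,
# keyed by degrees of freedom.
CRITICAL_05 = {
    1: 3.84, 2: 5.99, 3: 7.82, 4: 9.49, 5: 11.07,
    6: 12.59, 7: 14.07, 8: 15.51, 9: 16.92, 10: 18.31,
}


def decide(chi2, df):
    """Return 'reject' if chi2 exceeds the 0.05 critical value, else 'accept'.

    The answer refers to the NULL hypothesis.
    """
    critical = CRITICAL_05[df]
    return "reject" if chi2 > critical else "accept"


# Fruit fly example: chi square = 0.30 with 3 degrees of freedom.
print(decide(0.30, 3))

# Practice problem: chi square = 8.0 with 4 degrees of freedom.
print(decide(8.0, 4))
```

The same chi square value of 8.0 would reject the null hypothesis with only 3 degrees of freedom, because the critical value drops to 7.82.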