The Chi-Square Test for Goodness of Fit

Learning Objectives
After completion of this module, the student will be able to
1. develop a statistical test for goodness of fit based on a mathematical model that is appropriate for the data
2. calculate the chi-square statistics
3. determine whether or not to reject the null hypothesis

Knowledge and Skills
1. Concepts: chi-square test, goodness of fit

Prerequisites
1. mean and variance
2. binomial distribution, uniform distribution, normal distribution

Citation: Neuhauser, C. The Chi-square Test for Goodness of Fit. Created: January 3, 2008 Revisions: December 5, 2009 Copyright: © 2009 Neuhauser. This is an open-access article distributed under the terms of the Creative Commons Attribution Non-Commercial Share Alike License, which permits unrestricted use, distribution, and reproduction in any medium, and allows others to translate, make remixes, and produce new stories based on this work, provided the original author and source are credited and the new work will carry the same license. Funding: This work was partially supported by a HHMI Professors grant from the Howard Hughes Medical Institute.

Applications
We begin with a number of applications that illustrate situations in which we need to find a statistical test that can assess goodness of fit.

High Blood Pressure
A study by Suh et al. (1987) examined the familial aggregation of blood pressure in 196 4-member families. The following table lists the number of families stratified according to the number of members with high blood pressure. What can you say about familial aggregation of blood pressure?

No. of high B.P. in the family    0     1     2     3     4     Total
No. observed families             108   66    19    2     1     196

Approach: Formulate a null hypothesis and describe the distribution under the null hypothesis.

How random is random?
Write down randomly 100 integers between 1 and 5 and count how many of each type you have.
Simulate and check whether your numbers are “random.”
Approach: Formulate a null hypothesis and describe the distribution under the null hypothesis.

Plant Distribution
The tree species Pinus mugo is the most important invader of abandoned subalpine grasslands in the Northern Calcareous Alps, Austria. The data below, from Dullinger et al. (2003), show the surveyed area and the number of observed trees for each grassland type. A question of interest is whether Pinus mugo invades the different grasslands indiscriminately or whether it shows a preference.

Grassland Type                                  Area (m2)   Observed
Carex firma grassland                           1018        61
Carex sempervirens grassland                    6188        264
Leontodon hispidus-Crepis aurea grassland       583         20
Agrostis alpina-Festuca pumila grassland        1243        29
Calamagrostis varia grassland                   408         7
Carex ferruginea grassland                      846         9
Nardus stricta pasture                          320         2
Deschampsia cespitosa pasture                   2013        11
Helictotrichon parlatorei grassland             1433        2
Tall herb community                             420         0

Approach: Formulate a null hypothesis and describe the distribution under the null hypothesis.

A Brief Description of the Chi-Square Test
The chi-square test for goodness of fit is designed to test whether observed frequencies differ significantly from expected frequencies. Here are the assumptions:
1. The data need to be grouped or binned, with each group or bin containing five or more observations. (The usual rule of thumb is stated for the expected frequency in each bin, which should be at least five.) Some data are already grouped into classes, such as the data on High Blood Pressure or Plant Distribution. The data you would generate in the example How random is random? can be naturally grouped into five groups according to the values 1, 2, …, 5. The bin sizes depend on the data.
2. The data must come from a univariate distribution whose cumulative distribution function is known.

The null hypothesis states that the data follow a specific distribution, and the alternative is that the data do not. We define a quantity that summarizes our data. This quantity is the chi-square statistic. This statistic requires that the data are grouped into “bins”:

$$\chi^2 = \sum_{j=1}^{k} \frac{(\mathrm{Obs}_j - \mathrm{Exp}_j)^2}{\mathrm{Exp}_j} \qquad (1)$$

where
$\mathrm{Obs}_j$ = observed frequency for bin j
$\mathrm{Exp}_j$ = expected frequency for bin j

The test statistic in Equation (1) is then approximately chi-square distributed with $k - 1 - m$ degrees of freedom, where k is the number of bins and m is the number of population parameters that need to be estimated. The value of chi-square depends on how the data are binned. The larger the sample size, the better the approximation.
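As a rough illustration of Equation (1), the following Python sketch computes the statistic from observed and expected frequencies, together with the right-tail probability of a chi-square distribution for a given number of degrees of freedom. The function names `chi_square_statistic` and `chi_square_tail` and the counts in the usage example are made up for illustration and are not part of the module. The tail probability is obtained from the recurrence Q(a + 1, x) = Q(a, x) + x^a e^(-x) / Γ(a + 1) for the regularized upper incomplete gamma function, which covers integer degrees of freedom only.

```python
import math


def chi_square_statistic(observed, expected):
    """Sum of (Obs_j - Exp_j)^2 / Exp_j over all bins, as in Equation (1)."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi_square_tail(x, df):
    """Right-tail probability P(X > x) for a chi-square variable X with
    an integer number df >= 1 of degrees of freedom."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half_x = x / 2.0
    if df % 2 == 1:
        a = 0.5
        q = math.erfc(math.sqrt(half_x))  # Q(1/2, x/2)
    else:
        a = 1.0
        q = math.exp(-half_x)             # Q(1, x/2)
    # Step a up by 1 until it reaches df / 2, adding one term each time.
    while a < df / 2.0:
        q += math.exp(a * math.log(half_x) - half_x - math.lgamma(a + 1.0))
        a += 1.0
    return q


# Hypothetical counts of the values 1..5 among 100 hand-written integers.
observed = [18, 22, 25, 19, 16]
expected = [20.0] * 5                     # uniform null hypothesis: 100 / 5 per value
statistic = chi_square_statistic(observed, expected)  # approximately 2.5
df = len(observed) - 1 - 0                # k - 1 - m with no estimated parameters
print(statistic, chi_square_tail(statistic, df))
```

The returned tail probability plays the role of the p-value: a small value means the observed frequencies would be unusual if the null hypothesis were true.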
The null hypothesis is rejected if the statistic in Equation (1) exceeds a critical value determined by the significance level. Critical values are tabulated (see, for instance, the NIST/SEMATECH e-Handbook of Statistical Methods http://www.itl.nist.gov/div898/handbook/eda/section3/eda3674.htm). In Excel, the function CHIDIST(x, degrees_freedom) returns the right-tail probability of the chi-square distribution at x, which is the p-value of the test.

Example
Gregor Mendel (1822-1884) was an Austrian monk and scientist who conducted careful experiments on pea plants to gain a better understanding of how traits are inherited. The pea plants he worked on exhibited two flower colors, red and white. If he crossed true-breeding red-flowered plants with true-breeding white-flowered plants, all offspring were red-flowered. When he then crossed plants in this offspring generation, he observed that among 929 plants, 705 had red flowers and 224 had white flowers. Mendel hypothesized that the fraction of red flowers is 3/4. We can thus formulate the null hypothesis and the alternative.
If we denote by q the fraction of red flowers, then

$$H_0: q = 0.75 \qquad H_1: q \neq 0.75$$

We find

                 Observed   Expected
Red Flowers      705        697
White Flowers    224        232

The expected counts are 929 × 0.75 = 696.75 and 929 × 0.25 = 232.25, rounded here to whole numbers. Hence, we obtain for the test statistic

$$\chi^2 = \frac{(705 - 697)^2}{697} + \frac{(224 - 232)^2}{232} = 0.0918 + 0.2759 = 0.3677$$

To calculate the degrees of freedom, observe that we have two groups, i.e., $k = 2$, and that we did not need to estimate any population parameters, i.e., $m = 0$. Hence, the degrees of freedom is $2 - 1 - 0 = 1$. Using Excel, we find

CHIDIST(0.3677,1) = 0.5443

This is not close to being statistically significant and we cannot reject the null hypothesis. Note that this does not mean that we accept the null hypothesis.

Why does it work?
The following is more advanced material and can be skipped if there is no need to provide a theoretical foundation for the chi-square test. We will illustrate how the chi-square test works in the case of two categories (success and failure) when we test a hypothesis of the form

$$H: p = p_0$$

where p denotes the probability of success. Assume that we count the number of successes in n independent trials where each trial has probability p of success. We denote the number of successes by $S_n$. This quantity is binomially distributed. We know from the properties of the binomial distribution that the expected value of $S_n$ is $np$ and its variance is $np(1 - p)$.
Using the central limit theorem, we find that

$$Z = \frac{S_n - np}{\sqrt{np(1 - p)}}$$

is approximately normally distributed with mean 0 and variance 1. We will need the following result: If Z is standard normally distributed (i.e., Z has mean 0 and variance 1), then $Z^2$ has a chi-square distribution with one degree of freedom, denoted by $\chi_1^2$. We thus conclude that

$$Z^2 = \frac{(S_n - np)^2}{np(1 - p)} \qquad (2)$$

is approximately chi-square distributed with one degree of freedom.

To transform Equation (2) into a formula that is useful for determining the goodness of fit, we will relabel our variables. Denote the number of successes in n trials by $X_1$ and the number of failures by $X_2$. Denote the success probability by $p_1$ and the failure probability by $p_2$. Then

$$X_1 + X_2 = n \quad \text{and} \quad p_1 + p_2 = 1$$

Using the fact that

$$(X_1 - np_1)^2 = \bigl(n - X_2 - n(1 - p_2)\bigr)^2 = (-X_2 + np_2)^2 = (X_2 - np_2)^2,$$

it follows that

$$\frac{(X_1 - np_1)^2}{np_1(1 - p_1)} = \frac{(X_1 - np_1)^2}{np_1} + \frac{(X_2 - np_2)^2}{np_2}$$

This allows us to rewrite the statistic in Equation (2) as

$$Z^2 = \frac{(X_1 - np_1)^2}{np_1} + \frac{(X_2 - np_2)^2}{np_2} \qquad (3)$$

which is approximately chi-square distributed with one degree of freedom. The statistic in Equation (3) is of the form

$$\frac{(\mathrm{Obs}_1 - \mathrm{Exp}_1)^2}{\mathrm{Exp}_1} + \frac{(\mathrm{Obs}_2 - \mathrm{Exp}_2)^2}{\mathrm{Exp}_2}$$

where $\mathrm{Obs}_j$ is the observed frequency in the jth category and $\mathrm{Exp}_j$ is the expected frequency in the jth category ($j = 1, 2$).
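The equality between the one-term form in Equation (2) and the two-term form in Equation (3) can be checked numerically. The short Python sketch below does this with the flower-color counts from the Example; the variable names are chosen here for illustration. Because the expected counts 696.75 and 232.25 are not rounded in this check, the resulting value (about 0.39) differs slightly from a statistic computed with rounded expected counts.

```python
import math

n = 929             # total number of plants in the flower-color cross
x1, x2 = 705, 224   # observed red-flowered and white-flowered plants
p1 = 0.75           # hypothesized probability of red flowers
p2 = 1.0 - p1       # hypothesized probability of white flowers

# Equation (2): squared standardized number of successes.
z_squared = (x1 - n * p1) ** 2 / (n * p1 * (1.0 - p1))

# Equation (3): sum over both categories of (Obs - Exp)^2 / Exp.
two_term_sum = (x1 - n * p1) ** 2 / (n * p1) + (x2 - n * p2) ** 2 / (n * p2)

print(z_squared, two_term_sum)
print(math.isclose(z_squared, two_term_sum))
```

The two printed values agree up to floating-point rounding, which reflects the algebraic identity 1/(n p1 p2) = 1/(n p1) + 1/(n p2) when p1 + p2 = 1.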
To generalize this to k categories,

$$\chi^2 = \sum_{j=1}^{k} \frac{(\mathrm{Obs}_j - \mathrm{Exp}_j)^2}{\mathrm{Exp}_j} \qquad (4)$$

we need that a sum of independent chi-square distributed random variables is again chi-square distributed. To determine the degrees of freedom for the statistic in Equation (4), note that because of the constraint that the observations add up to the sample size n, the number of degrees of freedom is reduced by 1. If, in addition, we need to estimate m population parameters to find the expected frequencies, the degrees of freedom is $k - 1 - m$.

References
Dullinger, S., T. Dirnböck, and G. Grabherr. 2003. Patterns of Shrub Invasion into High Mountain Grasslands of the Northern Calcareous Alps, Austria. Arctic, Antarctic, and Alpine Research 35: 434-441.
Il Suh, Il Soon Kim, and Young Moon Chae. 1987. Familial Aggregation of Blood Pressure. Yonsei Medical Journal 28: 199-208.
Oberhauser, K., I. Gebhard, C. Cameron, and S. Oberhauser. 2007. Parasitism of Monarch Butterflies (Danaus plexippus) by Lespesia archippivora (Diptera: Tachinidae). Am. Midl. Nat. 157: 312-328.
NIST/SEMATECH e-Handbook of Statistical Methods, http://www.itl.nist.gov/div898/handbook/
Resources
A handbook of statistics is available at NIST: http://www.itl.nist.gov/div898/handbook/index.htm
The chi-squared test for goodness of fit is explained in http://www.itl.nist.gov/div898/handbook/eda/section3/eda35f.htm

Homework
1. In another experiment, Mendel hypothesized that a certain breeding experiment should result in a 3:1 ratio of green versus yellow pods. He conducted the experiment and found that among 580 plants there were 428 plants with green pods and 152 with yellow pods. Test Mendel’s hypothesis.
2. Carry out the hypothesis testing for the three applications in the beginning of this module.