Methodology Glossary Tier 1

Chi-Square Tests of Association

Chi-square tests of association can be used to test whether there is a relationship between two categorical variables recorded as frequency counts. For example:

- Is there an association between the gender of applicants for a scheme and whether or not the applicant is accepted?
- Is there an association between the colour of a leaflet and whether or not the leaflet gets picked up?
- Is there an association between hair colour and support for political parties?

We always start with a null hypothesis that there is no association between the variables. For example, “There is no association between the gender of applicants and whether or not their application is accepted”. The following data have been made up to test this hypothesis.

Data for chi-square tests can be from observational studies, surveys, administrative systems, models or experiments. The data need to be frequency counts (not percentages) and categorical. In theory there is no limit on the number of categories you can have, but the more there are, the more difficult it will be to interpret the results.

The first step is to put the observed data (what was measured/recorded) into a contingency table as shown in Table 1.

Table 1: Observed frequencies

| | Application successful | Application not successful | TOTAL |
| --- | --- | --- | --- |
| Male | 23 | 40 | 63 |
| Female | 31 | 39 | 70 |
| TOTAL | 54 | 79 | 133 |

The second step is to calculate the expected frequencies. These are the frequencies that we would have expected to record, given the row and column totals. There are several different ways you can do this. The most basic involves calculating what you would observe if there was no association between the two variables. This is done, for each cell, by multiplying the row total by the column total and dividing the result by the table total.
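The rule for expected frequencies can also be sketched in a few lines of Python. This is only an illustration of the arithmetic described above; the function name `expected_frequencies` and the list-of-lists layout (rows are Male and Female, columns are successful and not successful) are assumptions made for the example.

```python
# Observed counts from Table 1.
# Rows: Male, Female. Columns: application successful, application not successful.
observed = [[23, 40], [31, 39]]


def expected_frequencies(table):
    """Expected count per cell under no association:
    (row total * column total) / table total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    table_total = sum(row_totals)
    return [[r * c / table_total for c in col_totals] for r in row_totals]


expected = expected_frequencies(observed)
for label, row in zip(["Male", "Female"], expected):
    print(label, [round(value, 2) for value in row])
# Male [25.58, 37.42]
# Female [28.42, 41.58]
```

The expected values sum to the same table total (133) as the observed values, which is a useful check on the arithmetic.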
The calculation for the example data is shown in Table 2.

Table 2: Calculation of expected frequencies

| | Application successful | Application not successful | TOTAL |
| --- | --- | --- | --- |
| Male | (63 × 54) / 133 = 25.58 | (63 × 79) / 133 = 37.42 | 63 |
| Female | (70 × 54) / 133 = 28.42 | (70 × 79) / 133 = 41.58 | 70 |
| TOTAL | 54 | 79 | 133 |

Be sure to check that your observed and expected values both sum to the same total.

The third step is to calculate the chi-square statistic. The formula for chi-square is:

$$\chi^2 = \sum \frac{(E - O)^2}{E}$$

where E is the expected value and O is the observed value for a cell. The sigma sign (∑) means that everything that follows is summed. So ‘(expected - observed)² / expected’ is calculated for each cell in the contingency table, as shown below. Inside each bracket, the first number is the expected value for that cell and the second is the observed value.

| | Application successful | Application not successful |
| --- | --- | --- |
| Male | (25.58 - 23)² / 25.58 = 0.26 | (37.42 - 40)² / 37.42 = 0.18 |
| Female | (28.42 - 31)² / 28.42 = 0.23 | (41.58 - 39)² / 41.58 = 0.16 |

The results from each cell are then summed:

0.26 + 0.18 + 0.23 + 0.16 = 0.83

That is the χ² value.

The next thing to do is calculate the degrees of freedom. This is:

(number of rows - 1) × (number of columns - 1)

In the example above there are two rows and two columns, so the degrees of freedom is (2 - 1) × (2 - 1) = 1.

The final step is to see whether the chi-square value, given the degrees of freedom, is statistically significant. This can be done by comparing the χ² value against a table of critical values that have been derived from the chi-square distribution: the result is significant only if χ² exceeds the critical value. Alternatively, most software packages will tell you the exact p-value, so you can see instantly whether it is below the standard threshold of 0.05 or not.

In this example, χ² = 0.83, which is less than 3.84, the critical value for 5% significance with 1 degree of freedom.
Hence the result is not significant and we cannot reject the null hypothesis that there is no association between the gender of applicants and whether or not their application is accepted. What this means is that, if there were no genuine association, differences between the observed and expected values at least as large as these would arise by chance more than 5% of the time.

Presenting and explaining results

When presenting results in reports it is usual to give one table and put the expected frequencies in brackets, like this:

| | Male | Female | TOTAL |
| --- | --- | --- | --- |
| Application successful | 23 (25.58) | 31 (28.42) | 54 |
| Application not successful | 40 (37.42) | 39 (41.58) | 79 |
| TOTAL | 63 | 70 | 133 |

The results for this example are then written as χ² = 0.83, df = 1, p = NS. A significant result would instead be reported with its significance level, for example p < 0.05.

To perform a chi-square test of association in Excel, see the embedded file: Chi Square Calculation
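For readers who prefer a script to a spreadsheet, the whole procedure can be sketched in Python using only the standard library. This is a minimal illustration, not a replacement for a statistics package. The function names `chi_square_statistic` and `p_value_one_df` are hypothetical, and the p-value shortcut used here (the complementary error function) is valid only when there is exactly 1 degree of freedom, as in the example.

```python
import math


def chi_square_statistic(observed):
    """Return (chi-square, degrees of freedom) for a table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    table_total = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            # Expected count if there were no association.
            e = row_totals[i] * col_totals[j] / table_total
            chi2 += (e - o) ** 2 / e

    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df


def p_value_one_df(chi2):
    """Upper-tail p-value of the chi-square distribution, 1 degree of freedom only."""
    return math.erfc(math.sqrt(chi2 / 2))


# Observed counts from Table 1 (rows: Male, Female).
observed = [[23, 40], [31, 39]]
chi2, df = chi_square_statistic(observed)
p = p_value_one_df(chi2)  # acceptable here because df == 1

print(f"chi-square = {chi2:.2f}, df = {df}, p = {p:.2f}")
# chi-square = 0.83, df = 1, p = 0.36
```

The p-value of about 0.36 is well above 0.05, which agrees with the conclusion reached from the critical value. For tables with more than 1 degree of freedom the shortcut does not apply, and a table of critical values or a statistics package is needed.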