Methodology Glossary Tier 1
Chi-Square Tests of Association
Chi-Square tests of association can be used to test whether there is a
relationship between two categorical frequency variables.
For example:

Is there an association between the gender of applicants for a scheme
and whether or not the applicant is accepted?

Is there an association between the colour of a leaflet and whether or
not the leaflet gets picked up?

Is there an association between hair colour and support for political
parties?
We always start with a null hypothesis that there is no association between
the variables.
For example, “There is no association between the gender of applicants and
whether or not their application is accepted”.
The following data have been made up to test this hypothesis. Data for
chi-square tests can be from observational studies, surveys, administrative
systems, models or experiments. The data need to be frequency counts (not
percentages) and categorical. In theory there is no limit on the number of
categories you can have but the more there are, the more difficult it will be to
interpret the results.
The first step is to put the observed data (what was
measured/recorded) into a contingency table as shown in Table 1.
Table 1: Observed frequencies
           Application    Application       TOTAL
           successful     not successful
Male       23             40                63
Female     31             39                70
TOTAL      54             79                133
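As an illustration, the row, column and table totals in Table 1 can be reproduced with a few lines of Python. This is only a sketch: the variable names (observed, row_totals, col_totals, grand_total) are made up for this example and are not part of the original method.

```python
# Observed counts from Table 1 (rows: gender; columns: application outcome).
observed = {
    "Male": {"successful": 23, "not successful": 40},
    "Female": {"successful": 31, "not successful": 39},
}
columns = ["successful", "not successful"]

# Row totals: one per gender.
row_totals = {row: sum(cells.values()) for row, cells in observed.items()}

# Column totals: one per application outcome.
col_totals = {col: sum(observed[row][col] for row in observed) for col in columns}

# Table total: the sum of every cell.
grand_total = sum(row_totals.values())

print(row_totals)
print(col_totals)
print(grand_total)
```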
The second step is to calculate the expected frequencies. These are the
frequencies that we would have expected to have recorded given the
row/column totals. There are several different ways you can do this. The
most basic involves calculating what you would observe if there was no
association between the two variables. This is done by multiplying the row
total by the column total and dividing the result by the table total, for each cell.
This is shown in Table 2:
Table 2: Calculation of expected frequencies
           Application successful     Application not successful   TOTAL
Male       (63 * 54) / 133 = 25.58    (63 * 79) / 133 = 37.42      63
Female     (70 * 54) / 133 = 28.42    (70 * 79) / 133 = 41.58      70
TOTAL      54                         79                           133
Be sure to check that your observed and expected values both sum up to the
same total.
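The expected-frequency rule (row total multiplied by column total, divided by the table total) can be sketched in Python as follows. The list-of-lists layout and the variable names are assumptions made for this example; the counts are the made-up data from Table 1.

```python
# Observed counts: rows are Male, Female; columns are successful, not successful.
observed = [[23, 40], [31, 39]]

row_totals = [sum(row) for row in observed]        # [63, 70]
col_totals = [sum(col) for col in zip(*observed)]  # [54, 79]
grand_total = sum(row_totals)                      # 133

# Expected count for each cell = row total * column total / table total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Check that observed and expected values sum to the same total.
totals_match = abs(sum(map(sum, expected)) - grand_total) < 1e-9

for row in expected:
    print([round(value, 2) for value in row])
print(totals_match)
```

The printed rows should agree with the values in Table 2 once rounded to two decimal places.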
The third step is to calculate the chi-square statistic.
The formula for chi-square is: χ² = ∑ (E – O)² / E
Where E is the expected value and O is the observed value for a cell. The
sigma sign means that everything that follows is summed. So ‘(expected –
observed)² / expected’ is calculated for each cell in the contingency table as
shown below.
In each cell, the first value in the brackets is the expected value for this
cell and the second is the observed value for this cell.
           Application successful          Application not successful
Male       (25.58 – 23)² / 25.58 = 0.26    (37.42 – 40)² / 37.42 = 0.18
Female     (28.42 – 31)² / 28.42 = 0.23    (41.58 – 39)² / 41.58 = 0.16
... and then the results from each cell are summed:
0.26 + 0.18 + 0.23 + 0.16 = 0.83
And that is the χ² value.
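The cell-by-cell calculation and the final sum can be written as a short Python function. This is a minimal sketch: the function name chi_square_statistic is hypothetical, and the expected values are the rounded figures from Table 2.

```python
def chi_square_statistic(observed, expected):
    """Sum (expected - observed)^2 / expected over every cell of the table."""
    return sum(
        (e - o) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )

# Rows are Male, Female; columns are successful, not successful.
observed = [[23, 40], [31, 39]]
expected = [[25.58, 37.42], [28.42, 41.58]]  # rounded values from Table 2

chi_sq = chi_square_statistic(observed, expected)
print(round(chi_sq, 2))  # 0.83
```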
The next thing to do is calculate the degrees of freedom. This is:
(number of rows – 1) x (number of columns – 1)
In the example above there are two rows and two columns, so the degrees of
freedom is 1.
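As a small illustration, the degrees of freedom can be read off the shape of the table. The function name degrees_of_freedom is hypothetical, and the table passed in is assumed to contain only the data cells, not the TOTAL row or column.

```python
def degrees_of_freedom(table):
    """(number of rows - 1) * (number of columns - 1), excluding totals."""
    n_rows = len(table)
    n_cols = len(table[0])
    return (n_rows - 1) * (n_cols - 1)

# The 2 x 2 table from the example: Male/Female by successful/not successful.
observed = [[23, 40], [31, 39]]
df = degrees_of_freedom(observed)
print(df)  # 1
```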
The final step is to see whether the chi-square value, given the degrees of
freedom, is statistically significant. This can be done by comparing the χ²
value against a table of critical values that have been derived from the
chi-square distribution. Alternatively, most software packages will tell you
the exact p-value so you can see instantly whether it is below the standard
threshold of 0.05 or not.
In this example, χ² = 0.83, which is less than 3.84, the critical value for
5% significance with 1 degree of freedom. Hence the result is not significant
and we cannot reject the null hypothesis that there is no association between
the gender of applicants and whether or not their application is accepted.
What this means is that, if there were no genuine association, differences
between the observed and expected values at least this large would arise by
chance more than 5% of the time.
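For readers without a statistics package to hand, the exact p-value for a table with 1 degree of freedom can be sketched using only the Python standard library. This relies on the fact that a chi-square variable with 1 degree of freedom is the square of a standard normal variable, so the shortcut below does not apply to larger tables. The function name p_value_one_df is hypothetical.

```python
import math

def p_value_one_df(chi_sq):
    """Upper-tail probability of the chi-square distribution with 1 df.

    With 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(chi_sq / 2))

chi_sq = 0.83  # the chi-square value from the worked example
p_value = p_value_one_df(chi_sq)
significant = p_value < 0.05  # the standard threshold

print(round(p_value, 2))
print(significant)
```

Here the p-value comes out well above 0.05, so the result is not significant.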
Presenting and explaining results
When presenting results in reports it is usual to give one table and put the
expected frequencies in brackets, like this:
           Application    Application       TOTAL
           successful     not successful
Male       23 (25.58)     40 (37.42)        63
Female     31 (28.42)     39 (41.58)        70
TOTAL      54             79                133
The results are then written as χ² = 0.83, df = 1, p=NS (or p<0.05 if the
result had been significant).
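If results are reported often, the summary line can be generated rather than typed. The sketch below is one possible way to do this; the function name format_result and the alpha parameter are hypothetical, and the p-value of 0.36 passed in is an approximate figure computed separately for the worked example rather than one stated in the text.

```python
def format_result(chi_sq, df, p_value, alpha=0.05):
    """Build a report line such as 'χ² = 0.83, df = 1, p=NS'."""
    # Report 'p<alpha' for a significant result and 'p=NS' otherwise.
    p_text = f"p<{alpha}" if p_value < alpha else "p=NS"
    return f"χ² = {chi_sq:.2f}, df = {df}, {p_text}"

# Values from the worked example; 0.36 is an approximate p-value (assumption).
line = format_result(0.83, 1, 0.36)
print(line)
```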
To perform a chi-square test of association in Excel, see the embedded file
"Chi Square Calculation".