How is the Chi-Square Test Statistic Calculated?

What follows is the math behind the chi-square test statistic for a test of independence between two categorical variables. The example tests whether Gender is independent of the type of high school attended (public or private). The sample data are shown below (the numbers are counts):

               Female    Male    Total
    Public         38      46       84
    Private         7       9       16
    Total          45      55      100

The basic premise of the test of independence is to see whether the distribution of percentages is the same for each level of a category. That is, is the percentage of males attending public schools “close enough” to the percentage of females attending public schools? We answer this by comparing what we “observe” (the sample data in the table) to what we would expect to see in our sample if there were no relationship, i.e. if the variables were independent.

To test the null hypothesis “Gender and School Type are independent” against the alternative hypothesis “Gender and School Type are dependent” (that is, “there is a relationship between Gender and School Type”), we use the chi-square method. Using O to denote “observed count” and E to denote “expected count”, the chi-square test statistic (symbol χ², where χ is the Greek letter “chi”) is calculated by:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

where the sum is taken over every cell in the table.

The observed counts are easy, but how do we get the expected count for each cell in the table? The idea of the expected counts is to show what the distribution would be if no relationship existed, so we use the observed row and column totals to work out how the counts would be spread across the cells if there were in fact no relationship. For each cell, we multiply its row total by its column total and then divide by the overall total.
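The rule just described (row total times column total, divided by the overall total) can be sketched in a few lines of Python. This is only an illustration: the dictionary layout and the helper name `expected_counts` are assumptions made for this sketch, not part of the original handout.

```python
# Observed counts from the sample table: rows are school type,
# columns are gender.
observed = {
    "Public": {"Female": 38, "Male": 46},
    "Private": {"Female": 7, "Male": 9},
}


def expected_counts(table):
    """Expected counts under independence (hypothetical helper).

    Each cell is (row total * column total) / overall total.
    """
    row_totals = {row: sum(cells.values()) for row, cells in table.items()}
    col_totals = {}
    for cells in table.values():
        for col, count in cells.items():
            col_totals[col] = col_totals.get(col, 0) + count
    grand_total = sum(row_totals.values())
    return {
        row: {
            col: row_totals[row] * col_totals[col] / grand_total
            for col in cells
        }
        for row, cells in table.items()
    }


expected = expected_counts(observed)
for row, cells in expected.items():
    print(row, {col: round(value, 2) for col, value in cells.items()})
```

The expected counts keep the same row and column totals as the observed table; only the way the counts are spread across the cells changes.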
This produces an expected count table of:

               Female                   Male                     Total
    Public     (45*84)/100 = 37.80      (55*84)/100 = 46.20         84
    Private    (45*16)/100 = 7.20       (55*16)/100 = 8.80          16
    Total      45                       55                         100

Substituting into the formula for the chi-square:

$$\chi^2 = \sum \frac{(O - E)^2}{E} = \frac{(38 - 37.80)^2}{37.80} + \frac{(7 - 7.20)^2}{7.20} + \frac{(46 - 46.20)^2}{46.20} + \frac{(9 - 8.80)^2}{8.80} \approx 0.012$$

As you can see, this test statistic is quite small, as you might have guessed given how close the observed values are to the expected values.

Reading the chi-square table is similar to reading the T-table: there is a degrees of freedom consideration, and the table provides right-tail probabilities. The degrees of freedom (DF) are found by multiplying the number of rows minus 1 by the number of columns minus 1, written as (R-1)*(C-1). For this example the degrees of freedom are (2-1)*(2-1) = 1.

If you have a chi-square table (there is one in the text), you will see that the test statistic of 0.012 is less than the first chi-square value presented in the row for 1 degree of freedom. This means that our p-value is greater than the right-tail probability for that entry, which in turn is greater than 0.05, so we do not reject H0. We conclude that there is not enough evidence to reject H0: we cannot say that a relationship exists between Gender and School Type.
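As a check on the arithmetic, here is a minimal Python sketch that recomputes the test statistic and the degrees of freedom from the observed and expected counts. The function names are hypothetical. The p-value step is an addition to the handout's table lookup: it relies on the fact that, for 1 degree of freedom only, the right-tail chi-square probability equals erfc(sqrt(x/2)), so it should not be reused for larger tables.

```python
import math

# Observed and expected counts, cell by cell, in the same order as the
# substitution above: Public/Female, Private/Female, Public/Male, Private/Male.
observed = [38, 7, 46, 9]
expected = [37.80, 7.20, 46.20, 8.80]


def chi_square_statistic(obs, exp):
    """Sum of (O - E)^2 / E over all cells (hypothetical helper)."""
    return sum((o - e) ** 2 / e for o, e in zip(obs, exp))


def p_value_df1(statistic):
    """Right-tail probability of a chi-square value, valid for 1 DF only.

    With 1 DF the statistic is a squared standard normal, so
    P(chi-square > x) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(statistic / 2))


rows, cols = 2, 2
df = (rows - 1) * (cols - 1)  # (R-1)*(C-1)

chi2 = chi_square_statistic(observed, expected)
p = p_value_df1(chi2)

print(f"chi-square = {chi2:.3f}, df = {df}, p-value = {p:.3f}")
```

The computed p-value should come out far above 0.05, which agrees with the decision not to reject H0 reached from the table.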