How is the Chi-Square Test Statistic Calculated?
What follows is the math behind calculating the chi-square test statistic for a test of
independence between two categorical variables. The example tests whether Gender is
independent of the type of high school attended (public or private). The sample data
are shown below (the numbers are counts).
          Public   Private   Total
Female        38         7      45
Male          46         9      55
Total         84        16     100
The basic premise of the test of independence is to check whether the distribution of
percentages for one variable is the same at each level of the other variable. That is, is
the percentage of males attending public schools “close enough” to the percentage of
females attending public schools? We answer this by comparing what we “observe” (i.e. the
sample data in the table) to what we would expect to see in our sample if there were no
relationship, i.e. if the variables were independent.
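To make the “close enough” comparison concrete, the short Python sketch below computes the share of each gender that attended public school, using the counts from the sample table. The dictionary layout and the `public_share` helper are illustrative assumptions, not part of the original handout.

```python
# Observed counts from the sample table: (public, private) for each gender.
observed = {
    "Female": (38, 7),
    "Male": (46, 9),
}


def public_share(counts):
    """Return the proportion of a row (one gender) that attended public school."""
    public, private = counts
    return public / (public + private)


# Percentage attending public school, computed separately for each gender.
shares = {gender: public_share(counts) for gender, counts in observed.items()}
for gender, share in shares.items():
    print(f"{gender}: {share:.1%} public")
```

For this sample the two shares are about 84.4% (female) and 83.6% (male), which already suggests that the observed counts will sit close to the counts expected under independence.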
To test the null hypothesis Ho: “Gender and School Type are independent” against the
alternative hypothesis Ha: “Gender and School Type are dependent” (that is, “there is a
relationship between Gender and School Type”), we use a chi-square test.
Writing O for an observed count and E for the corresponding expected count, the
chi-square test statistic (written χ², where χ is the Greek letter “chi”) is calculated by
summing over every cell in the table:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
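The formula translates directly into code. The sketch below is a minimal Python version that assumes the observed and expected counts are passed as two flat lists in the same cell order; the function name `chi_square_statistic` is hypothetical.

```python
def chi_square_statistic(observed, expected):
    """Sum (O - E)^2 / E over every cell.

    `observed` and `expected` are flat sequences that list the cells of the
    table in the same order (an assumption of this sketch).
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of cells")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Example: two cells, each observed count 2 away from its expected count of 10.
print(chi_square_statistic([12, 8], [10, 10]))
```

Each cell contributes its squared deviation scaled by the expected count, so a cell with a small expected count weighs more heavily for the same absolute difference.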
The observed counts are easy, but how do we get the expected count for each cell in the
table? Since the expected counts describe what the distribution would look like if no
relationship existed, we use the observed row and column totals to work out how the
counts would be spread across the cells under independence. For each cell, multiply its
row total by its column total and divide by the overall total. This produces the following
table of expected counts:
          Public                  Private                 Total
Female    (45*84)/100 = 37.80     (45*16)/100 = 7.20      45
Male      (55*84)/100 = 46.20     (55*16)/100 = 8.80      55
Total     84                      16                      100
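The row-total-times-column-total rule can be sketched in Python as follows, assuming the observed table is stored as a list of rows (here Female and Male) with one column per school type. `expected_counts` is a hypothetical helper name chosen for illustration.

```python
def expected_counts(table):
    """Expected count for each cell under independence:
    (row total * column total) / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Rows: Female, Male; columns: Public, Private.
observed = [[38, 7], [46, 9]]
expected = expected_counts(observed)
for label, row in zip(["Female", "Male"], expected):
    print(label, [f"{value:.2f}" for value in row])
```

Running it reproduces the expected counts in the table: 37.80 and 7.20 for females, 46.20 and 8.80 for males. Note that the expected counts keep the same row and column totals as the observed table.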
Substituting into the formula for the chi-square:
2 
 (O  E)
E
2

(38  37.80) 2 (7  7.20) 2 (46  46.20) 2 (9  8.80) 2



 0.012
37.80
7.20
46.20
8.80
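The same substitution, cell by cell, can be checked with a small Python sketch. It lists each cell's observed and expected count (taken from the two tables) and prints that cell's contribution (O - E)²/E before summing; the cell labels are only for readability.

```python
# Observed and expected counts, cell by cell, in the same order as the
# substitution: Female/Public, Female/Private, Male/Public, Male/Private.
cells = [
    ("Female, Public", 38, 37.80),
    ("Female, Private", 7, 7.20),
    ("Male, Public", 46, 46.20),
    ("Male, Private", 9, 8.80),
]

# Contribution (O - E)^2 / E of each cell to the test statistic.
contributions = {label: (o - e) ** 2 / e for label, o, e in cells}
for label, value in contributions.items():
    print(f"{label:16s} {value:.6f}")

chi_square = sum(contributions.values())
print(f"chi-square = {chi_square:.3f}")
```

Every contribution is tiny because each observed count differs from its expected count by only 0.2; the total is about 0.012025, which rounds to 0.012.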
As you can see, this test statistic is quite small, as you might have guessed given how
close the observed counts were to the expected counts. Reading the chi-square table is
similar to reading the t-table: you need the degrees of freedom (DF), and the table gives
right-tail probabilities. The DF is the number of rows minus 1 times the number of columns
minus 1, written as (R-1)*(C-1). For this example the degrees of freedom are
(2-1)*(2-1) = 1. If you have a chi-square table (there is one in the text), you will see that
the test statistic of 0.012 is less than the first chi-square value listed in the row for
1 degree of freedom. This means our p-value is greater than that value's right-tail
probability, which in turn is greater than 0.05, so we do not reject Ho. We conclude that
there is not enough evidence to reject Ho: we cannot say that a relationship exists
between Gender and School Type.
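In place of a printed chi-square table, the right-tail probability can also be computed directly. The Python sketch below relies on one assumption worth flagging: it uses the closed form P(X > x) = erfc(√(x/2)), which holds only for a chi-square distribution with exactly 1 degree of freedom (a 1-df chi-square variable is the square of a standard normal variable). For other degrees of freedom you would need a general routine such as SciPy's `scipy.stats.chi2.sf`. The helper names are hypothetical, and the 0.05 significance level matches the one used in the text.

```python
import math


def degrees_of_freedom(n_rows, n_cols):
    """Degrees of freedom for a test of independence: (R - 1) * (C - 1)."""
    return (n_rows - 1) * (n_cols - 1)


def p_value_df1(chi_square):
    """Right-tail probability P(X > chi_square) for a chi-square variable
    with 1 degree of freedom ONLY. With 1 df, X = Z**2 for a standard
    normal Z, so P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(chi_square / 2))


df = degrees_of_freedom(2, 2)
chi_square = 0.012025  # test statistic from the worked example (about 0.012)
alpha = 0.05
if df != 1:
    raise ValueError("p_value_df1 is only valid for 1 degree of freedom")
p = p_value_df1(chi_square)
print(f"df = {df}, p-value = {p:.3f}")
print("reject Ho" if p < alpha else "do not reject Ho")
```

With df = 1 and a statistic of about 0.012, the p-value is roughly 0.91, far above 0.05, which agrees with the table-based conclusion of not rejecting Ho.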