Chi-Square Tests of Association in two

Session 9
Tests of Association
in two-way tables
Learning Objectives
By the end of this session, you will be able to

• conduct and interpret results from a chi-square test
for comparing several proportions or (equivalently)
testing the association between two categorical
variables
• explain how the results above can be extended to the
study of associations in general r × c tables
• state the assumptions underlying the above test and
the actions to take if the assumptions fail
An example
Below is a 5x2 table of observed frequencies
showing animals who did or did not get diseased
after inoculation with one of five vaccines.
Vaccine   diseased   healthy   Total
A             43        237      280
B             52        198      250
C             25        245      270
D             48        212      260
E             57        233      290
Total        225       1125     1350

Question: Is there an association between occurrence of
disease and type of vaccine?
Null and alternative hypotheses
To answer the question, we need to test
H0: disease occurrence is independent of type of
vaccine (i.e. the proportions diseased are the same for all
vaccines)
H1: the two variables are associated
If H0 is true, the best estimate of the common proportion
diseased is the diseased total divided by the grand total
= 225/1350 = 0.167
We use this to compute expected values in each cell of
the table under the null hypothesis.
Computation of expected values
Expected values in the first row:
Expected value in cell 1 = (225 / 1350)*280
= (225*280) / 1350
= 46.67
Expected value in cell 2 = (1125 / 1350)*280
= (1125*280) / 1350
= 233.33
Can you calculate expected values in the next
row? Check that your 2 numbers add to 250.
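The same rule (row total × column total / grand total) gives the expected value in every cell. The following is a minimal Python sketch of that calculation for the vaccine table; the function name `expected_counts` and the dictionary layout are illustrative choices, not part of the session material.

```python
# Observed counts from the vaccine example: (diseased, healthy) per vaccine.
observed = {
    "A": (43, 237),
    "B": (52, 198),
    "C": (25, 245),
    "D": (48, 212),
    "E": (57, 233),
}


def expected_counts(table):
    """Expected counts under H0: row total * column total / grand total."""
    row_totals = {name: sum(row) for name, row in table.items()}
    col_totals = [sum(col) for col in zip(*table.values())]
    grand_total = sum(col_totals)
    return {
        name: tuple(row_totals[name] * c / grand_total for c in col_totals)
        for name in table
    }


expected = expected_counts(observed)
for vaccine, (diseased, healthy) in expected.items():
    # Each row of expected values adds up to the observed row total.
    print(f"{vaccine}: diseased={diseased:.2f}, healthy={healthy:.2f}, "
          f"total={diseased + healthy:.0f}")
```

Because the expected values are built from the margins, each row of expected values sums to the observed row total, which is the check suggested above.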
Table of expected values
Vaccine   diseased   healthy   Total
A           46.67     233.33     280
B           41.67     208.33     250
C           45.00     225.00     270
D           43.33     216.67     260
E           48.33     241.67     290
Total      225       1125       1350
Note:
Chi-square test
Now compute the chi-square test statistic given by
$$X^2 = \sum_{\text{all cells}} \frac{(O - E)^2}{E} = 16.56$$
If H0 is true, X² follows a χ² distribution with 4 d.f.
Note: d.f. = (r-1)(c-1), where r = number of rows and
c = number of columns in the table.
Comparing 16.56 to the χ² distribution with 4 d.f., we get
a p-value of 0.0024, a
highly significant result. We may conclude there is
strong evidence of an association between disease
occurrence and type of vaccine.
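The statistic and p-value quoted above can be reproduced with a short Python sketch. It uses only the standard library; the helper names are illustrative, and the tail-probability function relies on a closed form that holds only when the degrees of freedom are even (as they are here, with 4 d.f.). In practice a statistics package would normally be used instead.

```python
import math

# Observed (diseased, healthy) counts for vaccines A to E.
observed = [(43, 237), (52, 198), (25, 245), (48, 212), (57, 233)]


def chi_square_statistic(table):
    """Return (X^2, d.f.) where X^2 is the sum of (O - E)^2 / E over all cells."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected count under H0
            stat += (o - e) ** 2 / e
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df


def chi2_upper_tail_even_df(x, df):
    """P(chi-square with df d.f. > x), valid for EVEN df only.

    Uses exp(-x/2) * sum_{k < df/2} (x/2)^k / k!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive, even d.f.")
    half = x / 2
    return math.exp(-half) * sum(
        half ** k / math.factorial(k) for k in range(df // 2)
    )


stat, df = chi_square_statistic(observed)
p_value = chi2_upper_tail_even_df(stat, df)
print(f"X2 = {stat:.2f}, d.f. = {df}, p = {p_value:.4f}")
```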
Extensions to r × c tables
Survey results are often expressed in terms of
2-way tables. In general, such tables may contain
r rows and c columns. Questions of interest in such
tables centre on whether there is an association
between the two variables that have been tabulated.
For example, if the table tabulates education level of the
household head (none, primary, secondary, tertiary) by poverty
level (not poor, poor, very poor), the question “is
poverty related to education?” may be asked.
Chi-square test for an r × c table
To answer the above question, the null hypothesis is that
the two variables are NOT related, against the alternative
that they are.
Under the null hypothesis, comparison of expected
values with observed values leads to a chi-square test.
The d.f. associated with this test = (r-1)(c-1).
In the above example, the d.f.=(4-1)(3-1)=6
Assumption underlying the test

• The chi-square test is approximate
  - Validity relies on “large” samples
  - Small samples of unbalanced data (large and
    small counts together) may invalidate the
    approximation
• Rules of thumb for validity involve the expected
values, E
  - Need large expected values under H0
  - Say, most E ≥ 5 and none less than 1
  - If the rule of thumb is not satisfied, the p-value
    may be unreliable
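The rule of thumb can be checked mechanically once the expected values are available. The sketch below is one possible reading of it in Python: the slide says only "most", so the 80% threshold (`most=0.8`) is an assumption, and the `tiny` table is a made-up illustration rather than data from the session.

```python
def expected_counts(table):
    """Expected counts under H0 for a table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def check_rule_of_thumb(table, most=0.8):
    """Check 'most E >= 5 and none less than 1'.

    'most' is interpreted as a share of cells (default 80%); this
    threshold is an assumption, not something stated on the slide.
    """
    cells = [e for row in expected_counts(table) for e in row]
    share_at_least_5 = sum(e >= 5 for e in cells) / len(cells)
    return {
        "min_expected": min(cells),
        "share_at_least_5": share_at_least_5,
        "ok": min(cells) >= 1 and share_at_least_5 >= most,
    }


vaccine = [(43, 237), (52, 198), (25, 245), (48, 212), (57, 233)]
tiny = [(3, 1), (1, 4)]  # hypothetical small table, for illustration only

vaccine_check = check_rule_of_thumb(vaccine)
tiny_check = check_rule_of_thumb(tiny)
print("vaccine table:", vaccine_check)
print("tiny table:   ", tiny_check)
```

For the vaccine table every expected value is well above 5, so the approximation is not in doubt; the hypothetical small table fails the check because none of its expected values reaches 5.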
Actions when assumptions fail
(a) Simple approaches:

• Collect more data if this is possible
• Collapse rows or columns if the table has more than
two rows/columns. But need to recognise that
  - this leads to loss of information
  - with some types of variables, there may be no
    natural way of combining rows/columns
Actions when assumptions fail
(b) Use a continuity correction
This method is often called Yates' correction and
applies only to 2×2 tables.
First we show the standard chi-square value
corresponding to a table with cell counts a, b, c, d as
below. (Verify later that this is correct)
        col1   col2   Total
row1     a      b      r1
row2     c      d      r2
Total    n1     n2     N

$$X^2 = \frac{N(ad - bc)^2}{r_1 r_2 n_1 n_2}$$
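One way to carry out the suggested verification is numerically: compute X² for a 2×2 table from the general sum of (O − E)²/E and from the shortcut formula, and confirm that the two agree. The Python sketch below does this using the counts from the bus-data smoking table that appears later in the session; the function names are illustrative.

```python
def chi_square_general(a, b, c, d):
    """Sum of (O - E)^2 / E over the four cells of a 2x2 table."""
    observed = [[a, b], [c, d]]
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    n = a + b + c + d
    total = 0.0
    for i in range(2):
        for j in range(2):
            e = row_totals[i] * col_totals[j] / n
            total += (observed[i][j] - e) ** 2 / e
    return total


def chi_square_shortcut(a, b, c, d):
    """Shortcut form: N (ad - bc)^2 / (r1 r2 n1 n2)."""
    r1, r2 = a + b, c + d
    n1, n2 = a + c, b + d
    n = r1 + r2
    return n * (a * d - b * c) ** 2 / (r1 * r2 * n1 * n2)


# Non-smokers: 40 drivers, 52 conductors; smokers: 19 drivers, 14 conductors.
a, b, c, d = 40, 52, 19, 14
general = chi_square_general(a, b, c, d)
shortcut = chi_square_shortcut(a, b, c, d)
print(f"general = {general:.3f}, shortcut = {shortcut:.3f}")
```

A numerical check of this kind is not a proof, but agreement on real data is a quick guard against algebra slips.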
Actions when assumptions fail
(b) Continuity correction (continued)…
The approximation of X² to the chi-square distribution is
improved by reducing the absolute value of O-E by ½ before
calculating X². X² then takes the value
below.
$$X^2 = \frac{N\left(|ad - bc| - \tfrac{1}{2}N\right)^2}{r_1 r_2 n_1 n_2}$$
Note: The equivalent adjustment when comparing two
proportions p = r/n using a z-test is to reduce r by ½ for
the first proportion and increase r by ½ for the second
proportion.
Example of use of continuity correction
Whether                    Job
smoker?     Driver         Conductor      Total
No          40 (67.8%)     52 (78.8%)     92 (73.6%)
Yes         19 (32.2%)     14 (21.2%)     33 (26.4%)
Total       59 (100.0%)    66 (100.0%)    125 (100%)
Above is the example from the bus data used during the
practical sessions. The question of interest is whether the
proportion of smokers differs across job types.
Example of use of continuity correction
The usual chi-square test leads to X² = 1.937
Applying the continuity correction, we get
X² = 1.412
Here, there is little difference because the sample
sizes are reasonably large.
It is more important to apply the continuity correction for
small sample sizes.
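Both values can be reproduced from the 2×2 shortcut formula with and without the ½N adjustment. The Python sketch below is illustrative: the function names are made up, the `max(..., 0)` guard (so the adjusted difference never goes negative) is an added safeguard not mentioned on the slides, and the p-value helper uses the fact that a χ² variable with 1 d.f. is the square of a standard normal.

```python
import math


def chi_square_2x2(a, b, c, d, continuity=False):
    """X^2 for a 2x2 table, optionally with the continuity correction."""
    r1, r2 = a + b, c + d
    n1, n2 = a + c, b + d
    n = r1 + r2
    diff = abs(a * d - b * c)
    if continuity:
        # Reduce |ad - bc| by N/2; the floor at zero is an added guard.
        diff = max(diff - n / 2, 0.0)
    return n * diff ** 2 / (r1 * r2 * n1 * n2)


def p_value_1df(x):
    """P(chi-square with 1 d.f. > x) = P(|Z| > sqrt(x))."""
    return math.erfc(math.sqrt(x / 2))


# Non-smokers: 40 drivers, 52 conductors; smokers: 19 drivers, 14 conductors.
a, b, c, d = 40, 52, 19, 14
plain = chi_square_2x2(a, b, c, d)
corrected = chi_square_2x2(a, b, c, d, continuity=True)
print(f"uncorrected X2 = {plain:.3f}, p = {p_value_1df(plain):.3f}")
print(f"corrected   X2 = {corrected:.3f}, p = {p_value_1df(corrected):.3f}")
```

The correction always reduces X² and so increases the p-value, making the test more conservative.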
Actions when assumptions fail
(c) Use an Exact Test
• When actions suggested in (a) or (b) are not
possible, consider using an Exact Test.
• Details of such tests are beyond the scope of this
module.
• Some software packages (e.g. Stata) have the
facility to perform Fisher’s exact test. SPSS does
this only for 2×2 tables. Specialised software also exists
for such tests, e.g. StatXact.
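For readers curious about what such a test involves in the simplest case, here is a rough Python sketch of Fisher's exact test for a 2×2 table, applied to the bus data. It is an illustration under stated assumptions rather than a description of how any particular package works: the margins are treated as fixed, and the two-sided p-value is taken as the total probability of all tables no more likely than the observed one, which is one common convention among several.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    r1 = a + b
    n1, n2 = a + c, b + d
    n = n1 + n2
    denom = comb(n, r1)

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x,
        # given fixed row and column totals.
        return comb(n1, x) * comb(n2, r1 - x) / denom

    lowest, highest = max(0, r1 - n2), min(r1, n1)
    p_observed = prob(a)
    # Sum over tables at least as extreme (no more probable) than observed;
    # the small tolerance protects against floating-point ties.
    return sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )


# Non-smokers: 40 drivers, 52 conductors; smokers: 19 drivers, 14 conductors.
p_exact = fisher_exact_2x2(40, 52, 19, 14)
print(f"Fisher exact two-sided p = {p_exact:.3f}")
```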
Limitations

• Chi-square tests are limited, in that only two factors
are examined at a time.
• This may cause erroneous inferences to be made
(see Practical 15 for an example).
• The inter-relations between more than two factors
can be investigated using more sophisticated
statistical techniques, e.g. log-linear modelling.