Analysis of 2-Way Contingency Tables (WIP)

Contingency Tables
• Tables representing all combinations of
levels of explanatory and response
variables
• Numbers in table represent Counts of the
number of cases in each cell
• Row and column totals are called
Marginal counts
Example – EMT Assessment of Kids
• Explanatory Variable
– Child Age (Infant,
Toddler, Pre-school,
School-age,
Adolescent)
• Response Variable –
EMT Assessment
(Accurate, Inaccurate)
Source: Foltin, et al (2002)
               Assessment
Age        Acc    Inac     Tot
Inf        168      73     241
Tod        230      73     303
Pre        254      53     307
Sch        379      58     437
Ado        652     124     776
Tot       1683     381    2064
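The marginal counts in the table above can be recomputed from the ten cell counts. A minimal Python sketch, assuming the table is stored as a nested list (the variable names are illustrative and not from the source):

```python
# Observed counts from the EMT table: rows are age groups,
# columns are (Accurate, Inaccurate).
ages = ["Inf", "Tod", "Pre", "Sch", "Ado"]
observed = [
    [168, 73],
    [230, 73],
    [254, 53],
    [379, 58],
    [652, 124],
]

# Row (age-group) totals, column (assessment) totals, and the overall total.
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

for age, total in zip(ages, row_totals):
    print(age, total)
print("Tot", col_totals, grand_total)
```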
Pearson’s Chi-Square Test
• Can be used for nominal or ordinal explanatory
and response variables
• Variables can have any number of distinct levels
• Tests whether the distribution of the response
variable is the same for each level of the
explanatory variable (H0: No association
between the variables)
• r = # of levels of explanatory variable
• c = # of levels of response variable
Pearson’s Chi-Square Test
• Intuition behind test statistic
– Obtain marginal distribution of outcomes for
the response variable
– Apply this common distribution to all levels of
the explanatory variable, by multiplying each
proportion by the corresponding sample size
– Measure the difference between actual cell
counts and the expected cell counts in the
previous step
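The three steps above can be sketched in Python using the EMT counts from the earlier example. This is an illustrative sketch only; the variable names are assumptions, not part of the original slides.

```python
# Observed counts: rows are age groups (Inf, Tod, Pre, Sch, Ado),
# columns are EMT assessment (Accurate, Inaccurate).
observed = [
    [168, 73],
    [230, 73],
    [254, 53],
    [379, 58],
    [652, 124],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Step 1: marginal distribution of the response, pooled over all age groups.
p_hat = [c / n for c in col_totals]

# Step 2: apply that common distribution to each age group's sample size.
expected = [[r * p for p in p_hat] for r in row_totals]

# Step 3: differences between the actual and the expected cell counts.
diffs = [
    [o - e for o, e in zip(obs_row, exp_row)]
    for obs_row, exp_row in zip(observed, expected)
]

for exp_row, diff_row in zip(expected, diffs):
    print([round(e, 1) for e in exp_row], [round(d, 1) for d in diff_row])
```

Within each row the differences sum to zero, because the expected counts are built to reproduce the row totals.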
Pearson’s Chi-Square Test
• Notation to obtain test statistic
– Rows represent explanatory variable (r levels)
– Cols represent response variable (c levels)
         1      2      …      c      Total
1        n11    n12    …      n1c    n1.
2        n21    n22    …      n2c    n2.
…        …      …      …      …      …
r        nr1    nr2    …      nrc    nr.
Total    n.1    n.2    …      n.c    n..
Pearson’s Chi-Square Test
• Marginal distribution of response and expected cell
counts under hypothesis of no association:
$$\hat{p}_1 = \frac{n_{.1}}{n_{..}}, \quad \ldots, \quad \hat{p}_c = \frac{n_{.c}}{n_{..}}$$

$$E(n_{ij}) = n_{i.}\,\hat{p}_j = \frac{n_{i.}\,n_{.j}}{n_{..}}$$
Pearson’s Chi-Square Test
• H0: No association between variables
• HA: Variables are associated
• T.S.: $X^2 = \sum_{i}\sum_{j} \frac{(n_{ij} - E(n_{ij}))^2}{E(n_{ij})}$
• R.R.: $X^2 \ge \chi^2_{\alpha,(r-1)(c-1)}$
• P-value: $P(\chi^2 \ge X^2)$
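The test statistic, degrees of freedom, and P-value can be sketched with the Python standard library alone. The function names `chi2_sf` and `pearson_chi_square` are hypothetical helpers introduced here for illustration; the upper-tail probability uses the standard recurrence for the chi-square distribution with integer degrees of freedom.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(chi-square with df >= x), df a positive integer."""
    half_x = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-half_x)              # exact tail for df = 2
    else:
        a, q = 0.5, math.erfc(math.sqrt(half_x))   # exact tail for df = 1
    # Each pass raises the degrees of freedom by 2 until df is reached.
    while a < df / 2.0:
        q += half_x ** a * math.exp(-half_x) / math.gamma(a + 1.0)
        a += 1.0
    return q


def pearson_chi_square(table):
    """Return (X^2, df, P-value) for an r x c table of counts (rows = explanatory)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, n_ij in enumerate(row):
            e_ij = row_totals[i] * col_totals[j] / n   # E(n_ij) under H0
            stat += (n_ij - e_ij) ** 2 / e_ij
    df = (len(table) - 1) * (len(col_totals) - 1)
    return stat, df, chi2_sf(stat, df)


# Sanity check against a tabled critical value: the area above 9.488
# with 4 degrees of freedom should be close to 0.05.
print(round(chi2_sf(9.488, 4), 4))
```

H0 is rejected at level α when the returned P-value is at most α, which is equivalent to the rejection-region rule above.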
Example – EMT Assessment of Kids
Observed                              Expected
               Assessment                            Assessment
Age        Acc    Inac     Tot        Age        Acc    Inac     Tot
Inf        168      73     241        Inf        197      44     241
Tod        230      73     303        Tod        247      56     303
Pre        254      53     307        Pre        250      57     307
Sch        379      58     437        Sch        356      81     437
Ado        652     124     776        Ado        633     143     776
Tot       1683     381    2064        Tot       1683     381    2064
Example – EMT Assessment of Kids
• Note that each expected count is the row total
times the column total, divided by the overall
total. For the first cell in the table:
$$E(n_{11}) = \frac{n_{1.}\,n_{.1}}{n_{..}} = \frac{241(1683)}{2064} \approx 197$$
• The contribution to the test statistic for this cell is
$$\frac{(168 - 197)^2}{197} = 4.27$$
Example – EMT Assessment of Kids
• H0: No association between variables
• HA: Variables are associated
• T.S.: $X^2 = \frac{(168 - 197)^2}{197} + \cdots + \frac{(124 - 143)^2}{143} = 40.1$
• R.R.: $X^2 \ge \chi^2_{.05,(5-1)(2-1)} = \chi^2_{.05,4} = 9.488$
Reject H0, conclude that the accuracy of
assessments differs among age groups
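To see which cells drive the result, the per-cell contributions can be listed. The sketch below is illustrative and uses unrounded expected counts, so the first cell's contribution comes out near 4.14 rather than the 4.27 obtained with the expected count rounded to 197; the total still agrees with 40.1.

```python
ages = ["Inf", "Tod", "Pre", "Sch", "Ado"]
labels = ["Acc", "Inac"]
observed = [
    [168, 73],
    [230, 73],
    [254, 53],
    [379, 58],
    [652, 124],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Contribution of each cell: (observed - expected)^2 / expected.
contributions = {}
for i, age in enumerate(ages):
    for j, label in enumerate(labels):
        e_ij = row_totals[i] * col_totals[j] / n
        contributions[(age, label)] = (observed[i][j] - e_ij) ** 2 / e_ij

x2 = sum(contributions.values())
critical_value = 9.488            # chi-square .05 critical value, 4 df (from the slide)
reject_h0 = x2 >= critical_value
largest = max(contributions, key=contributions.get)

for cell, value in contributions.items():
    print(cell, round(value, 2))
print("X^2 =", round(x2, 1), "reject H0:", reject_h0, "largest cell:", largest)
```

The largest contribution comes from inaccurate assessments of infants, where the observed count (73) is well above the expected count (about 44).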
Example - SPSS Output
[SPSS output not recoverable from the source: crosstabulation of age group by assessment (counts and expected counts) and the chi-square tests table]
Example - Cyclones Near
Antarctica
• Period of Study: September 1973 - May 1975
• Explanatory Variable: Region (40-49, 50-59, 60-79) (Degrees South Latitude)
• Response: Season
(Aut (4), Wtr (5), Spr (4), Sum (8)) (Number of
months in parentheses)
• Units: Cyclones in the study area
• Treating the observed cyclones as a “random
sample” of all cyclones that could have
occurred
Source: Howarth (1983), “An Analysis of the Variability of Cyclones around Antarctica and Their Relation to Sea-Ice Extent”,
Annals of the Association of American Geographers, Vol. 73, pp. 519-537
Example - Cyclones Near
Antarctica
Region\Season   Autumn   Winter   Spring   Summer   Total
40-49S             370      452      273      422    1517
50-59S             526      624      513     1059    2722
60-79S             980     1200      995     1751    4926
Total             1876     2276     1781     3232    9165
For each region (row) we can compute the percentage of storms occurring during
each season, the conditional distribution. Of the 1517 cyclones in the 40-49S
band, 370 occurred in Autumn, a proportion of 370/1517 = .244, or 24.4%.
Region\Season   Autumn   Winter   Spring   Summer   Total% (n)
40-49S            24.4     29.8     18.0     27.8   100.0 (1517)
50-59S            19.3     22.9     18.8     38.9   100.0 (2722)
60-79S            19.9     24.4     20.2     35.5   100.0 (4926)
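The conditional distributions can be reproduced by dividing each region's counts by that region's total. A minimal Python sketch, assuming the counts are stored per region in a dictionary (an illustrative layout, not from the source):

```python
seasons = ["Autumn", "Winter", "Spring", "Summer"]
regions = ["40-49S", "50-59S", "60-79S"]

# Observed cyclone counts per region, in the season order above.
counts = {
    "40-49S": [370, 452, 273, 422],
    "50-59S": [526, 624, 513, 1059],
    "60-79S": [980, 1200, 995, 1751],
}

# Conditional distribution of season within each region, as percentages.
percent = {}
for region in regions:
    total = sum(counts[region])
    percent[region] = [100.0 * c / total for c in counts[region]]

print("Region", seasons)
for region in regions:
    print(region, [round(p, 1) for p in percent[region]])
```

If season and region were unrelated, the three rows of percentages would be roughly equal; here the 40-49S band has a visibly smaller Summer share than the other two.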
Example - Cyclones Near
Antarctica
[Figure: Graphical Conditional Distributions for Regions. Bar chart of regpct (percentage of cyclones, axis from 10 to 40) by season (Autumn, Winter, Spring, Summer), with separate bars for regions 40-49S, 50-59S, and 60-79S.]
Example - Cyclones Near
Antarctica
Observed Cell Counts (fo):
Region\Season   Autumn   Winter   Spring   Summer   Total
40-49S             370      452      273      422    1517
50-59S             526      624      513     1059    2722
60-79S             980     1200      995     1751    4926
Total             1876     2276     1781     3232    9165
Note that overall, (1876/9165)100% = 20.47%, or about 20.5%, of all cyclones occurred in Autumn. If
we apply that proportion to the 1517 cyclones that occurred in the 40-49S band, we would
expect (0.2047)(1517) = 310.5 to have occurred in the first cell of the table. The full
table of fe:
Region\Season   Autumn   Winter   Spring   Summer   Total
40-49S           310.5    376.7    294.8    535.0    1517
50-59S           557.2    676.0    529.0    959.9    2722
60-79S          1008.3   1223.3    957.3   1737.1    4926
Total             1876     2276     1781     3232    9165
Example - Cyclones Near
Antarctica
Computation of $\chi^2_{obs}$:
Region   Season     fo      fe    (fo-fe)^2   ((fo-fe)^2)/fe
40-49S   Autumn    370   310.5      3540.25       11.4017713
40-49S   Winter    452   376.7      5670.09       15.0520042
40-49S   Spring    273   294.8       475.24        1.61207598
40-49S   Summer    422   535.0     12769          23.8672897
50-59S   Autumn    526   557.2       973.44        1.74702082
50-59S   Winter    624   676.0      2704           4
50-59S   Spring    513   529.0       256           0.48393195
50-59S   Summer   1059   959.9      9820.81       10.2310762
60-79S   Autumn    980  1008.3       800.89        0.79429733
60-79S   Winter   1200  1223.3       542.89        0.44379138
60-79S   Spring    995   957.3      1421.29        1.4846861
60-79S   Summer   1751  1737.1       193.21        0.11122561
Total                                             71.2291706
Example - Cyclones Near
Antarctica
• H0: Seasonal distribution of cyclone
occurrences is independent of latitude band
• Ha: Seasonal distributions of cyclone
occurrences differ among latitude bands
• Test Statistic: $\chi^2_{obs} = 71.2$
• P-value: Area in chi-squared distribution with
(3-1)(4-1)=6 degrees of freedom above 71.2
From Table 8.5, $P(\chi^2 \ge 22.46) = .001 \Rightarrow P < .001$
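For an even number of degrees of freedom the chi-squared upper-tail probability has a closed form, so the size of this P-value can be sketched directly. This derivation is an addition to the slides, and the final figure is approximate:

```latex
P(\chi^2_6 \ge x) = e^{-x/2}\left(1 + \frac{x}{2} + \frac{(x/2)^2}{2}\right)
\quad\Rightarrow\quad
P(\chi^2_6 \ge 71.2) = e^{-35.6}\,(1 + 35.6 + 633.68) \approx 2.3 \times 10^{-13}
```

The same expression evaluated at x = 22.46 gives about .001, in agreement with the tabled value used above.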
SPSS Output - Cyclone Example
[SPSS output not recoverable from the source: crosstabulation of region by season (counts, expected counts, and percentages) and the chi-square tests table, with the P-value marked]
Data Sources
• Foltin, G., D. Markinson, M. Tunik, et al. (2002).
“Assessment of Pediatric Patients by Emergency
Medical Technicians: Basic,” Pediatric Emergency Care,
18:81-85.
• Howarth, D.A. (1983). “An Analysis of the Variability of
Cyclones around Antarctica and Their Relation to Sea-Ice
Extent,” Annals of the Association of American
Geographers, 73:519-537.