Contingency Tables
• Tables representing all combinations of levels of explanatory and response variables
• Numbers in the table represent counts of the number of cases in each cell
• Row and column totals are called marginal counts

Example – EMT Assessment of Kids
• Explanatory Variable – Child Age (Infant, Toddler, Pre-school, School-age, Adolescent)
• Response Variable – EMT Assessment (Accurate, Inaccurate)

         Assessment
Age     Acc   Inac    Tot
Inf     168     73    241
Tod     230     73    303
Pre     254     53    307
Sch     379     58    437
Ado     652    124    776
Tot    1683    381   2064

Source: Foltin, et al (2002)

Pearson’s Chi-Square Test
• Can be used for nominal or ordinal explanatory and response variables
• Variables can have any number of distinct levels
• Tests whether the distribution of the response variable is the same for each level of the explanatory variable (H0: No association between the variables)
• r = # of levels of explanatory variable
• c = # of levels of response variable

Pearson’s Chi-Square Test
• Intuition behind test statistic
– Obtain the marginal distribution of outcomes for the response variable
– Apply this common distribution to all levels of the explanatory variable, by multiplying each proportion by the corresponding sample size
– Measure the difference between the actual cell counts and the expected cell counts from the previous step

Pearson’s Chi-Square Test
• Notation to obtain test statistic
– Rows represent explanatory variable (r levels)
– Cols represent response variable (c levels)

          1     2     …     c    Total
1        n11   n12    …    n1c    n1.
2        n21   n22    …    n2c    n2.
…         …     …     …     …      …
r        nr1   nr2    …    nrc    nr.
Total    n.1   n.2    …    n.c    n..

Pearson’s Chi-Square Test
• Marginal distribution of response and expected cell counts under hypothesis of no association:

$$\hat{p}_1 = \frac{n_{.1}}{n_{..}}, \quad \ldots, \quad \hat{p}_c = \frac{n_{.c}}{n_{..}}$$

$$E(n_{ij}) = n_{i.}\,\hat{p}_j = \frac{n_{i.}\,n_{.j}}{n_{..}}$$

Pearson’s Chi-Square Test
• H0: No association between variables
• HA: Variables are associated

$$T.S.:\; X^2 = \sum_i \sum_j \frac{\left(n_{ij} - E(n_{ij})\right)^2}{E(n_{ij})}$$

$$R.R.:\; X^2 \ge \chi^2_{\alpha,(r-1)(c-1)}$$

$$P\text{-value}:\; P\left(\chi^2 \ge X^2\right)$$

Example – EMT Assessment of Kids

Observed                          Expected
         Assessment                        Assessment
Age     Acc   Inac    Tot         Age     Acc   Inac    Tot
Inf     168     73    241         Inf     197     44    241
Tod     230     73    303         Tod     247     56    303
Pre     254     53    307         Pre     250     57    307
Sch     379     58    437         Sch     356     81    437
Ado     652    124    776         Ado     633    143    776
Tot    1683    381   2064         Tot    1683    381   2064

Example – EMT Assessment of Kids
• Note that each expected count is the row total times the column total, divided by the overall total. For the first cell in the table:

$$E(n_{11}) = \frac{n_{1.}\,n_{.1}}{n_{..}} = \frac{241(1683)}{2064} \approx 197$$

• The contribution to the test statistic for this cell is

$$\frac{(168 - 197)^2}{197} = 4.27$$

Example – EMT Assessment of Kids
• H0: No association between variables
• HA: Variables are associated

$$T.S.:\; X^2 = \frac{(168 - 197)^2}{197} + \cdots + \frac{(124 - 143)^2}{143} = 40.1$$

(the value 40.1 is obtained from the unrounded expected counts)

$$R.R.:\; X^2 \ge \chi^2_{.05,(5-1)(2-1)} = \chi^2_{.05,4} = 9.488$$

Since 40.1 ≥ 9.488, reject H0 and conclude that the accuracy of assessments differs among age groups.

Example - SPSS Output
[SPSS crosstabulation and chi-square test output for the EMT example; not recoverable from the source]

Example - Cyclones Near Antarctica
• Period of Study: September 1973 - May 1975
• Explanatory Variable: Region (40-49, 50-59, 60-79) (Degrees South Latitude)
• Response: Season (Aut(4), Wtr(5), Spr(4), Sum(8)) (Number of months in parentheses)
• Units: Cyclones in the study area
• Treating the observed cyclones as a “random sample” of all cyclones that could have occurred

Source: Howarth (1983), “An Analysis of the Variability of Cyclones around Antarctica and Their Relation to Sea-Ice Extent”, Annals of the Association of American Geographers, Vol. 73, pp. 519-537

Example - Cyclones Near Antarctica

Season\Region   40-49S   50-59S   60-79S   Total
Autumn             370      526      980    1876
Winter             452      624     1200    2276
Spring             273      513      995    1781
Summer             422     1059     1751    3232
Total             1517     2722     4926    9165

For each region (a column of the table) we can compute the percentage of storms occurring during each season, the conditional distribution.
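The conditional distributions described here can be sketched in a few lines of Python. This is only an illustration using the cyclone counts from the table above; the names `counts` and `conditional_percentages` are made up for this example and are not part of the original slides.

```python
# Cyclone counts from the table above: one entry per season,
# with values ordered by region (40-49S, 50-59S, 60-79S).
regions = ["40-49S", "50-59S", "60-79S"]
counts = {
    "Autumn": [370, 526, 980],
    "Winter": [452, 624, 1200],
    "Spring": [273, 513, 995],
    "Summer": [422, 1059, 1751],
}


def conditional_percentages(table):
    """Return each cell as a percentage of its region (column) total."""
    n_cols = len(next(iter(table.values())))
    col_totals = [sum(row[j] for row in table.values()) for j in range(n_cols)]
    return {
        season: [100.0 * row[j] / col_totals[j] for j in range(n_cols)]
        for season, row in table.items()
    }


pct = conditional_percentages(counts)
print("Season ", " ".join(f"{r:>7}" for r in regions))
for season, row in pct.items():
    print(f"{season:<7}", " ".join(f"{p:7.1f}" for p in row))
```

Each region's percentages sum to 100, so the three columns can be compared directly even though the regions have very different numbers of cyclones.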
Of the 1517 cyclones in the 40-49S band, 370 occurred in Autumn, a proportion of 370/1517 = .244, or 24.4% as a percentage.

Season\Region   40-49S         50-59S         60-79S
Autumn            24.4           19.3           19.9
Winter            29.8           22.9           24.4
Spring            18.0           18.9           20.2
Summer            27.8           38.9           35.5
Total% (n)     100.0 (1517)   100.0 (2722)   100.0 (4926)

Example - Cyclones Near Antarctica
[Figure: bar chart of the percentage of cyclones (regpct) in each season (Autumn, Winter, Spring, Summer), with one bar per region (40-49S, 50-59S, 60-79S)]
Graphical Conditional Distributions for Regions

Example - Cyclones Near Antarctica
Observed Cell Counts (fo):

Season\Region   40-49S   50-59S   60-79S   Total
Autumn             370      526      980    1876
Winter             452      624     1200    2276
Spring             273      513      995    1781
Summer             422     1059     1751    3232
Total             1517     2722     4926    9165

Note that overall, (1876/9165)100% = 20.5% of all cyclones occurred in Autumn. If we apply that proportion (0.2047 before rounding) to the 1517 cyclones that occurred in the 40-49S band, we would expect (0.2047)(1517) = 310.5 in the first cell of the table. The full table of expected counts (fe):

Season\Region   40-49S   50-59S   60-79S   Total
Autumn           310.5    557.2   1008.3    1876
Winter           376.7    676.0   1223.3    2276
Spring           294.8    529.0    957.3    1781
Summer           535.0    959.9   1737.1    3232
Total             1517     2722     4926    9165

Example - Cyclones Near Antarctica
Computation of $\chi^2_{obs}$:

Region   Season     fo       fe      (fo-fe)^2   ((fo-fe)^2)/fe
40-49S   Autumn    370    310.5      3540.25        11.402
40-49S   Winter    452    376.7      5670.09        15.052
40-49S   Spring    273    294.8       475.24         1.612
40-49S   Summer    422    535.0     12769.00        23.867
50-59S   Autumn    526    557.2       973.44         1.747
50-59S   Winter    624    676.0      2704.00         4.000
50-59S   Spring    513    529.0       256.00         0.484
50-59S   Summer   1059    959.9      9820.81        10.231
60-79S   Autumn    980   1008.3       800.89         0.794
60-79S   Winter   1200   1223.3       542.89         0.444
60-79S   Spring    995    957.3      1421.29         1.485
60-79S   Summer   1751   1737.1       193.21         0.111
Total                                               71.229

Example - Cyclones Near Antarctica
• H0: Seasonal distribution of cyclone occurrences is independent of latitude band
• HA: Seasonal distributions of cyclone occurrences differ among latitude bands
• Test Statistic: $\chi^2_{obs} = 71.2$
• P-value: Area in the chi-squared distribution with (3-1)(4-1) = 6 degrees of freedom above 71.2. From Table 8.5, $P(\chi^2 \ge 22.46) = .001$, so $P < .001$

SPSS Output - Cyclone Example
[SPSS crosstabulation and chi-square test output for the cyclone data, with the P-value marked; not recoverable from the source]

Data Sources
• Foltin, G., D. Markinson, M. Tunik, et al. (2002). “Assessment of Pediatric Patients by Emergency Medical Technicians: Basic,” Pediatric Emergency Care, 18:81-85.
• Howarth, D.A. (1983). “An Analysis of the Variability of Cyclones around Antarctica and Their Relation to Sea-Ice Extent,” Annals of the Association of American Geographers, 73:519-537.
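As a check on the two worked examples in these slides, the following Python sketch recomputes the expected counts, the Pearson statistic, and the degrees of freedom from the observed tables. It is an illustration rather than the method used to produce the SPSS output: the function name `pearson_chi_square` is hypothetical, only the standard library is used, and the critical values 9.488 and 22.46 are simply copied from the slides rather than computed.

```python
def pearson_chi_square(observed):
    """Return (statistic, degrees of freedom, expected counts) for an r x c table.

    `observed` is a list of rows, one row per level of the explanatory variable.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # Expected count under no association: row total * column total / overall total.
    expected = [[rt * ct / n for ct in col_totals] for rt in row_totals]
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof, expected


# EMT example: rows are age groups (Inf, Tod, Pre, Sch, Ado),
# columns are (Accurate, Inaccurate).
emt = [[168, 73], [230, 73], [254, 53], [379, 58], [652, 124]]

# Cyclone example: rows are regions (40-49S, 50-59S, 60-79S),
# columns are seasons (Autumn, Winter, Spring, Summer).
cyclones = [
    [370, 452, 273, 422],
    [526, 624, 513, 1059],
    [980, 1200, 995, 1751],
]

emt_stat, emt_dof, emt_exp = pearson_chi_square(emt)
cyc_stat, cyc_dof, cyc_exp = pearson_chi_square(cyclones)

# Critical values as quoted in the slides (alpha = .05 with 4 df; .001 with 6 df).
print(f"EMT:      X^2 = {emt_stat:.1f}, df = {emt_dof}, reject H0: {emt_stat >= 9.488}")
print(f"Cyclones: X^2 = {cyc_stat:.1f}, df = {cyc_dof}, P < .001: {cyc_stat >= 22.46}")
```

Because the function keeps the expected counts unrounded, its statistics can differ in the second decimal place from hand calculations based on rounded expected counts.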