Categorical Data Analysis: Stratified Analyses, Matching, and Agreement Statistics
Biostatistics 510
13-15 March 2007
Carla Talarico

Overview
• Variable stratification
• Cochran-Mantel-Haenszel (CMH) statistics
• Matching and matched data
• Agreement statistics
  – McNemar's Test
  – Cohen's Kappa

Stratification by a Third Variable
• Exposure of interest (E)
• Disease outcome (D)
• Third variable (C), e.g., a confounder
[Diagram: confounder C and the exposure-disease relationship E ? D]

Confounding
• The estimated effect of exposure on disease may be distorted by a third variable (the "confounder") that is associated with both the exposure and the disease
• Reflects the fact that epidemiologic research is conducted among humans with unevenly distributed characteristics
• Results from a lack of comparability between the exposed and unexposed groups in the base population

Controlling for Confounding
• Design phase of studies
  – Randomization in experimental studies
  – Restriction
  – Matching
• Analysis phase
  – Stratified analysis
  – Model fitting

Stratified Analyses: The CMH Option in SAS
• Gives a stratified statistical analysis of the relationship between Exposure (E) and Disease (D), after controlling for a Confounder (C). The last two variables in the TABLES request define each E x D table; the variables before them define the strata:

    proc freq;
      /* C = strata; E = rows; D = columns */
      tables C * E * D / cmh;
    run;

• Can stratify by several confounders simultaneously (one stratum per combination of their levels):

    proc freq;
      tables C1 * C2 * E * D / cmh;
    run;

Estimates of Common Relative Risk for 2x2 Tables
• Adjusted odds ratio (OR) and relative risk (RR) for stratified 2x2 tables, with 95% confidence limits
• Obtain OR and RR estimates for the association between Exposure and Disease, adjusted for the Confounder
• For this course, report the Mantel-Haenszel estimate of the common odds ratio, OR_MH

Breslow-Day Test for Homogeneity of the Odds Ratios
• For stratified 2x2 tables
• Null hypothesis: the ORs are equal across all strata
  – χ² distribution with q – 1 df, where q is the number of strata
• Alternative hypothesis: at least one stratum-specific OR differs from the other stratum-specific ORs

χ²_BD (cont'd)
• If H0 is rejected for the χ²_BD test:
  – There is evidence of heterogeneity of ORs across strata, so it is not
    appropriate to report the adjusted common OR
  – Report the stratum-specific ORs when effect modification is present

CMH Statistic 1: Nonzero Correlation
• Tests the null hypothesis of no association vs. the alternative hypothesis that there is a linear association between the row and column variables in at least one stratum
• Both row and column variables must be ordinal
• Under H0, ~ χ² with 1 df

CMH Statistic 2: Row Mean Scores Differ
• Tests the null hypothesis of no association vs. the alternative hypothesis that the mean scores of the table rows are unequal for at least one stratum
• Useful only when the column variable is ordinal
• Under H0, ~ χ² with (r – 1) df, where r is the number of rows

CMH Statistic 3: General Association
• Tests the null hypothesis of no association vs. the alternative hypothesis that there is some kind of association between the row and column variables for at least one stratum
• Does not require the row or column variable to be ordinal
• Under H0, ~ χ² with (r – 1)(c – 1) df, where c is the number of columns

Matching
• Controls for confounding more efficiently than if the matching had not been performed
• Takes place in the design phase of a study
• Gains statistical efficiency in effect estimation

Matching (cont'd)
• Select comparison participants into a study so that they are the same (or nearly the same) on certain variable(s)
• A matched design requires a matched analysis
• Once you match on a variable, the effect of that variable cannot be estimated in your data set

Matched Data and the AGREE Option in SAS
• The AGREE option computes tests and measures of agreement for square tables (where the number of rows equals the number of columns):

    title "McNemar's Test for highchol and hibmi for pill and non-pill";
    proc freq data=pairs;
      /* agree: tests and measures of agreement; norow nocol: suppress row and column percentages */
      tables hichol1*hichol2 hibmi1*hibmi2 / agree norow nocol;
    run;

AGREE Option in SAS
• The AGREE option generates:
  – McNemar's Test
  – Kappa
  – Weighted Kappa

McNemar's Test of Symmetry for Matched Samples
• For 2x2 tables
• Appropriate when the data come from matched pairs of subjects with a dichotomous
    (yes/no) outcome
• Null hypothesis of marginal homogeneity
  – Werner data set of matched pairs: compares the proportion with high cholesterol among women who take the birth control pill to the proportion with high cholesterol among matched women who do not take the pill
• χ² distribution with 1 df

McNemar's Test for Matched Proportions: Werner Data Set with Age-Matched Pairs
• There are 92 pairs.
• 45.65% of the No Pill group have high cholesterol.
• 47.83% of the Pill group have high cholesterol.

$$\chi^2_M = \frac{(21 - 23)^2}{21 + 23} = \frac{4}{44} = 0.0909$$

Frequency (Percent)   No Pill: High Chol=1   No Pill: High Chol=2   Total
Pill: High Chol=1     21 (22.83)             23 (25.00)             44 (47.83)
Pill: High Chol=2     21 (22.83)             27 (29.35)             48 (52.17)
Total                 42 (45.65)             50 (54.35)             92 (100.00)

Simple Kappa Coefficient (Cohen's Kappa)
• Measure of inter-rater agreement, corrected for chance:

$$\kappa = \frac{P_o - P_e}{1 - P_e}$$

  where P_o is the observed proportion of agreement and P_e is the proportion of agreement expected by chance
• Scale from –1 to +1
  – κ = +1 when there is perfect agreement
  – κ = 0 when the agreement equals that expected by chance
• The magnitude of kappa reflects the strength of the agreement beyond chance

Cohen's Kappa (cont'd)
• SAS gives a 95% CI for kappa
• Kappa guidelines (Landis and Koch):

Kappa Statistic   Strength of Agreement
< 0.00            Poor
0.00 – 0.20       Slight
0.21 – 0.40       Fair
0.41 – 0.60       Moderate
0.61 – 0.80       Substantial
0.81 – 1.00       Almost perfect

Good Resources for Categorical Data Analysis and SAS
• SAS: Categorical Data Analysis Using the SAS System, by Maura E. Stokes, Charles S. Davis, and Gary G. Koch. 2nd ed. SAS Institute Inc., Cary, NC, 2000.
• See pages 155-156 of the Biostat 510 course pack
• Kappa: "The Measurement of Observer Agreement for Categorical Data," by J. Richard Landis and Gary G. Koch. Biometrics 33(1):159–174, 1977.
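The Mantel-Haenszel common odds ratio and the Breslow-Day homogeneity test from the stratified-analysis slides can be checked by hand with a short sketch. This is a minimal illustration under stated assumptions, not SAS's implementation: each stratum is assumed to be given as ((a, b), (c, d)) with rows = exposed/unexposed and columns = diseased/not diseased, no expected cell may be zero, and the Breslow-Day statistic is the uncorrected version computed with OR_MH as the common odds ratio. The function names `mantel_haenszel_or` and `breslow_day` are hypothetical.

```python
from math import sqrt


def mantel_haenszel_or(tables):
    """Mantel-Haenszel estimate of the common odds ratio, OR_MH.

    Each stratum is ((a, b), (c, d)): rows = exposed/unexposed,
    columns = diseased/not diseased (an assumed layout).
    """
    num = 0.0
    den = 0.0
    for (a, b), (c, d) in tables:
        n = a + b + c + d
        num += a * d / n  # sum of a_i * d_i / n_i
        den += b * c / n  # sum of b_i * c_i / n_i
    return num / den


def breslow_day(tables, common_or):
    """Uncorrected Breslow-Day statistic; compare to chi-square with q - 1 df."""
    stat = 0.0
    for (a, b), (c, d) in tables:
        n = a + b + c + d
        n1 = a + b  # exposed (row 1) total
        m1 = a + c  # diseased (column 1) total
        # Expected a-cell A, holding the margins fixed, solves
        # A * (n - n1 - m1 + A) = common_or * (n1 - A) * (m1 - A).
        qa = 1.0 - common_or
        qb = (n - n1 - m1) + common_or * (n1 + m1)
        qc = -common_or * n1 * m1
        lo, hi = max(0, n1 + m1 - n), min(n1, m1)
        if abs(qa) < 1e-12:  # common_or == 1: the equation is linear
            expected = -qc / qb
        else:
            disc = sqrt(qb * qb - 4.0 * qa * qc)
            roots = ((-qb + disc) / (2.0 * qa), (-qb - disc) / (2.0 * qa))
            # Keep the root that keeps every expected cell within the margins.
            expected = next(r for r in roots if lo - 1e-9 <= r <= hi + 1e-9)
        cells = (expected, n1 - expected, m1 - expected, n - n1 - m1 + expected)
        variance = 1.0 / sum(1.0 / x for x in cells)  # approximate Var(a_i)
        stat += (a - expected) ** 2 / variance
    return stat
```

A large Breslow-Day value relative to a χ² distribution with q – 1 df argues against reporting the single OR_MH, in line with the slide's advice to report stratum-specific ORs instead. In practice the SAS CMH output is the reference; the sketch only makes the formulas concrete.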
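McNemar's statistic and Cohen's kappa for the Werner age-matched pairs can be reproduced by hand from the 2x2 table of pair counts (21 and 27 concordant pairs, 21 and 23 discordant pairs). The sketch below is a hedged check of the formulas, not a replacement for the SAS AGREE output. It assumes rows = the Pill member of each pair and columns = the No Pill member; both statistics are unchanged if the table is transposed, so the orientation does not affect the results. The function names are hypothetical, and the p-value uses the identity that the upper tail of a 1-df χ² at x equals erfc(√(x/2)).

```python
from math import erfc, sqrt

# Werner age-matched pairs: counts of pairs by High Chol (1, 2).
# Rows = Pill member of the pair, columns = No Pill member (assumed layout).
WERNER = [[21, 23],
          [21, 27]]


def mcnemar(table):
    """McNemar chi-square (no continuity correction) and its 1-df p-value."""
    b, c = table[0][1], table[1][0]  # discordant pairs
    stat = (b - c) ** 2 / (b + c)
    p_value = erfc(sqrt(stat / 2))   # upper tail of chi-square with 1 df
    return stat, p_value


def cohen_kappa(table):
    """Simple (unweighted) Cohen's kappa for a square table of counts."""
    k = len(table)
    n = sum(sum(row) for row in table)
    p_o = sum(table[i][i] for i in range(k)) / n          # observed agreement
    row_tot = [sum(table[i]) for i in range(k)]
    col_tot = [sum(table[i][j] for i in range(k)) for j in range(k)]
    p_e = sum(row_tot[i] * col_tot[i] for i in range(k)) / n ** 2  # chance
    return (p_o - p_e) / (1 - p_e)


def landis_koch(kappa):
    """Strength-of-agreement label from the Landis and Koch guidelines."""
    if kappa < 0:
        return "Poor"
    for upper, label in ((0.20, "Slight"), (0.40, "Fair"),
                         (0.60, "Moderate"), (0.80, "Substantial")):
        if kappa <= upper:
            return label
    return "Almost perfect"


stat, p = mcnemar(WERNER)
kappa = cohen_kappa(WERNER)
print(f"McNemar chi-square = {stat:.4f}, p = {p:.4f}")
print(f"kappa = {kappa:.4f} ({landis_koch(kappa)})")
```

The McNemar statistic reproduces the 0.0909 on the slide. Kappa works out to 21/527 ≈ 0.040, which the Landis and Koch guidelines classify as "Slight" agreement.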