Note 17

Paired and Matched Binary Data

The data here are from a clinical trial of an allergy medication. Participants recorded use of decongestants before starting on study medication and then after using study medication for four weeks. deco0 records usage (0/1) before medication, deco1 after medication.

[17.1]  . use hepp-patient, clear
        . list

             +--------------------------+
             | id   deco0   deco1     n |
             |--------------------------|
          1. |  1       0       0   237 |
          2. |  2       0       1    57 |
          3. |  3       1       0    71 |
          4. |  4       1       1    14 |
             +--------------------------+

The first command below shows an incorrect analysis, since the data are actually paired. (That is, the test for independence takes no note of the fact that the table represents 379 pairs of measurements.) The correct analysis uses McNemar’s test, which looks only at the off-diagonal cells of the 2 × 2 table, the only cells which are informative about how the before and after probabilities differ.

[17.2]  . ta deco0 deco1 [fw=n], chi2 exact                    ** WRONG **

        Decongesta |
         nt before | Decongestant after rx
                rx |         0          1 |     Total
        -----------+----------------------+----------
                 0 |       237         57 |       294
                 1 |        71         14 |        85
        -----------+----------------------+----------
             Total |       308         71 |       379

                  Pearson chi2(1) =   0.3686   Pr = 0.544      ** WRONG **
                   Fisher's exact =                 0.637      ** WRONG **
           1-sided Fisher's exact =                 0.332      ** WRONG **

The following commands do a correct analysis. The first does McNemar’s test using the exact binomial probabilities, while the next two use the normal approximation. When the z value computed below is squared, it is very close to McNemar’s χ² statistic, which is calculated in the last display.

[17.3]  . bitesti 128 71 .5

                N   Observed k   Expected k   Assumed p   Observed p
        ------------------------------------------------------------
              128         71           64       0.50000      0.55469

          Pr(k >= 71)            = 0.125220  (one-sided test)
          Pr(k <= 71)            = 0.907659  (one-sided test)
          Pr(k <= 57 or k >= 71) = 0.250440  (two-sided test)

[17.4]  . display "z = " (71/128 - 0.5)/sqrt(71*57/(128*128*128))
        z = 1.2449056

[17.5]  . display (1.2449056)^2 "   " 2*(1-normprob(1.2449056))
        1.54979   .21316645

[17.6]  . display "McNemar's chi-squared statistic = " ((71-57)^2)/(71+57)
        McNemar's chi-squared statistic = 1.53125

The mcc command analyzes a matched case-control study, and is the way to calculate McNemar’s test for a paired binomial data set in Stata. Unfortunately, one must reinterpret the labels that Stata uses, as this is obviously not a case-control study. You can think of treatment status as defining case vs. control, so that cases correspond to the “after” measurement and controls to the “before.”

[17.7]  . mcc deco1 deco0 [fw=n]

                         | Controls               |
        Cases            |   Exposed   Unexposed  |      Total
        -----------------+------------------------+------------
                 Exposed |        14          57  |         71
               Unexposed |        71         237  |        308
        -----------------+------------------------+------------
                   Total |        85         294  |        379

        McNemar's chi2(1) =      1.53    Prob > chi2 = 0.2159
        Exact McNemar significance probability       = 0.2504

        Proportion with factor
                Cases       .1873351
                Controls    .2242744     [95% Conf. Interval]
                           ---------     --------------------
                difference -.0369393     -.0979673   .0240887
                ratio       .8352941      .6278769   1.111231
                rel. diff.  -.047619     -.1248173   .0295792

                odds ratio  .8028169      .5564015   1.153877   (exact)

More generally, we could have several “controls” for each “case”, in which case the approach above cannot readily be used. The clogit command does conditional logistic regression, which is a generalization of McNemar’s test. The data need to be in long format for this command, so we start by having Stata reshape the data set.

[17.8]  . reshape long deco, i(id) j(after)
        (note: j = 0 1)

        Data                               wide   ->   long
        -----------------------------------------------------------------------------
        Number of obs.                        4   ->       8
        Number of variables                   4   ->       4
        j variable (2 values)                     ->   after
        xij variables:
                                    deco0 deco1   ->   deco
        -----------------------------------------------------------------------------

[17.9]  . list, sep(0)

             +-------------------------+
             | id   after   deco     n |
             |-------------------------|
          1. |  1       0      0   237 |
          2. |  1       1      0   237 |
          3. |  2       0      0    57 |
          4. |  2       1      1    57 |
          5. |  3       0      1    71 |
          6. |  3       1      0    71 |
          7. |  4       0      1    14 |
          8. |  4       1      1    14 |
             +-------------------------+

[17.10] . clogit after deco [fw=n], group(id) nolog or

        Conditional (fixed-effects) logistic regression   Number of obs   =        758
                                                          LR chi2(1)      =       1.53
                                                          Prob > chi2     =     0.2155
        Log likelihood = -261.93562                       Pseudo R2       =     0.0029

        ------------------------------------------------------------------------------
               after | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
                deco |   .8028169   .1427759    -1.23   0.217     .5665467     1.13762
        ------------------------------------------------------------------------------

There is yet one more way to carry out these calculations. McNemar’s test can be thought of as a test of symmetry in the 2 × 2 table about the main diagonal. The symmetry command carries out this test, and it also handles larger square tables, which arise when we have paired observations of ordered categories. In the example below, antihistamine use before and after treatment is recorded in patients receiving an allergy treatment.

[17.11] . use hepp-anti-full, clear

[17.12] . symmetry before after [fw=n]

        -----------------------------------------------------------------------
                   |                         after
            before |       None  Occasional   Daily low  Daily full      Total
        -----------+-----------------------------------------------------------
              None |        199          56           8           5        268
        Occasional |         34          16           2           0         52
         Daily low |         15           6           3           1         25
        Daily full |         25           7           2           4         38
                   |
             Total |        273          85          15          10        383
        -----------------------------------------------------------------------

                                                  chi2      df      Prob>chi2
        ------------------------------------------------------------------------
        Symmetry (asymptotic)                  |   30.17     6       0.0000
        Marginal homogeneity (Stuart-Maxwell)  |   30.11     3       0.0000
        ------------------------------------------------------------------------

In this case, considerable information is lost if the data are simply reduced to antihistamine use (any or none). In the 2 × 2 case, both tests reported by symmetry are identical to McNemar’s χ² test.

[17.13] . symmetry antibefore antiafter [fw=n], exact

        -------------------------------
        antibefor |     antiafter
                e |      0       1   Total
        ----------+--------------------
                0 |    199      69     268
                1 |     74      41     115
                  |
            Total |    273     110     383
        -------------------------------

                                                  chi2      df      Prob>chi2
        ------------------------------------------------------------------------
        Symmetry (asymptotic)                  |    0.17     1       0.6759
        Marginal homogeneity (Stuart-Maxwell)  |    0.17     1       0.6759
        ------------------------------------------------------------------------
        Symmetry (exact significance probability)                    0.7381
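
As a check on the arithmetic in displays [17.4] to [17.6] and [17.13], McNemar’s statistic can be written in terms of the two discordant counts. The symbols b and c are notation introduced here for illustration and do not appear in the Stata output: b is the number of pairs that change in one direction and c the number that change in the other. A sketch of the calculation, assuming the amsmath package for the align* environment:

```latex
\begin{align*}
\chi^2_{\text{McNemar}}
  &= \frac{(b-c)^2}{b+c}
   = \left( \frac{b - (b+c)/2}{\sqrt{(b+c)/4}} \right)^{2} \\[4pt]
\text{decongestant data } (b = 71,\ c = 57):\quad
\chi^2 &= \frac{(71-57)^2}{71+57} = \frac{196}{128} = 1.53125 \\[4pt]
\text{antihistamine data } (b = 74,\ c = 69):\quad
\chi^2 &= \frac{(74-69)^2}{74+69} = \frac{25}{143} \approx 0.175
\end{align*}
```

The bracketed expression in the first line is a z statistic for b successes in b + c trials that uses the null variance (b + c)/4. The z in [17.4] instead builds its standard error from the observed proportions 71/128 and 57/128, which is why its square, 1.54979, is close to but not exactly 1.53125. The antihistamine value rounds to the 0.17 reported in [17.13].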