Paired data: McNemar's test and conditional logistic regression

In some experiments, cases are matched to controls. Examples:
Drug in one eye, placebo in other eye
Litter mates: some get drug, some get placebo
One patient gets both treatments sequentially

Sometimes subjects are matched on factors such as age and sex. Matching subjects often helps reduce the unexplained variability. For a t-test, we typically use a paired t-test to analyze matched pairs of subjects. When we move to regression analysis, we often describe the matched subjects as being in a stratum. While the paired t-test requires 1:1 matching, regression allows m:n matching. Here's an example.

Stratum            1          2          3          4
Drug subjects      A, B, C    F, G, H    J          N, O
Control subjects   D, E       I          K, L, M    P, Q, R, S

The statistical analysis should account for the matching. We'll examine two methods for analysis of matched data: McNemar's test and conditional logistic regression.

McNemar's test

The chi-square test assumes there is no matching of subjects. When we have matched pairs with a binary outcome, we use McNemar's test instead of the chi-square test.

Example data from Kleinbaum, Chapter 8.
paired.data=matrix(c(30,10,30,30),nr=2, dimnames=list("Case" = c("Present", "Absent"), "Control"= c("Present", "Absent")))
paired.data

         Control
Case      Present Absent
  Present      30     30
  Absent       10     30

# mcnemar test with continuity correction
mcnemar.test(paired.data,correct=TRUE)

        McNemar's Chi-squared test with continuity correction

data:  paired.data
McNemar's chi-squared = 9.025, df = 1, p-value = 0.002663

OR=30/10=3

## Example with no association of case with exposure
paired.data2=matrix(c(30,30,30,30),nr=2, dimnames=list("Case" = c("Present", "Absent"), "Control"= c("Present", "Absent")))
paired.data2

         Control
Case      Present Absent
  Present      30     30
  Absent       30     30

# mcnemar test with continuity correction
mcnemar.test(paired.data2,correct=TRUE)

        McNemar's Chi-squared test

data:  paired.data2
McNemar's chi-squared = 0, df = 1, p-value = 1

Conditional logistic regression

Conditional logistic regression is for matched data. Cases are matched to controls within strata. The likelihood calculation is conditional on the strata, hence conditional logistic regression. Within each stratum, we examine the difference in the exposure variable, X, between case and control(s). If case and control(s) have the same value of X, the stratum is not informative and has no effect on the analysis.

library(survival)
# Conditional logistic R code. Modified from Shoukri example, page 126
# z1 = difference of value of X for case vs control in a particular stratum. z1 can take values -1, 0, 1.
# count is the number of strata that have values -1, 0, 1.
# Strata where z1=0 do not affect the statistical analysis.

# Use values from Kleinbaum example (30,10,30,30) showing association of case with exposure
y=rep(1,3)
z1=c(-1,0,1)
count=c(10,60,30)   # all 100 pairs; the 60 pairs with z1=0 are uninformative
count=c(10,0,30)    # drop them; the estimate of the z1 coefficient is unchanged
match=glm(y~-1 + z1 , family=binomial("logit"), weights=count)
summary(match)

Call:
glm(formula = y ~ -1 + z1, family = binomial("logit"), weights = count)

Deviance Residuals:
    1      2      3
5.266  0.000  4.155

Coefficients:
   Estimate Std. Error z value Pr(>|z|)
z1   1.0986     0.3651   3.009  0.00262 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 55.452  on 2  degrees of freedom
Residual deviance: 44.987  on 1  degrees of freedom
AIC: 46.987

Number of Fisher Scoring iterations: 4

# Coefficient estimate is 1.0986, which is the log odds ratio.
# In this no-intercept model it is also the log odds that response y=1 when z1=1.
# Use coefficient to calculate probability that response y=1 when z1=1.
library(faraway)   # provides ilogit(); plogis() in base R is equivalent
ilogit(1.0986)     # = 0.75, which is 30/(30+10) = 0.75

# Make a graph
prob=c()
plot(prob, xlim=c(0,1), ylim=c(0,1), xlab="X variable", ylab="response")
x=c(0,1)
lines(x, ilogit(1.1*x), lty=1, col="black")

# Calculate the estimated odds ratio and 95%CI
sum.coef=summary(match)$coef
odds.ratio=exp(sum.coef[,1])
upper.ci= exp(sum.coef[,1] + 1.96*sum.coef[,2])
lower.ci= exp(sum.coef[,1] - 1.96*sum.coef[,2])
cbind(odds.ratio,upper.ci,lower.ci)

     odds.ratio upper.ci lower.ci
[1,]          3 6.136618 1.466606

# estimate = log(odds.ratio), so odds.ratio = exp(estimate) = exp(1.0986) = 3

# Example with values showing no association of case with exposure
y=rep(1,3)
z1=c(-1,0,1)
count=c(30,0,30)
match=glm(y~-1 + z1 , family=binomial("logit"), weights=count)
summary(match)

Call:
glm(formula = y ~ -1 + z1, family = binomial("logit"), weights = count)

Deviance Residuals:
    1      2      3
6.449  0.000  6.449

Coefficients:
    Estimate Std. Error z value Pr(>|z|)
z1 3.007e-16  2.582e-01       0        1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 83.178  on 2  degrees of freedom
Residual deviance: 83.178  on 1  degrees of freedom
AIC: 85.178

Number of Fisher Scoring iterations: 2

# How does CLR handle 1:m matching?
# Each case contributes m differences within that stratum. Likelihood is product over strata.

# First, show calculation for 1:1 matching.
# Use the clogit function from the survival package, as in Faraway, page 50.
# Kleinbaum data (30,10,30,30)
# Want 30 case present, control absent
# Want 10 case absent, control present
# Rows within each stratum are ordered control, case.
stratum= rep(1:40, each = 2)
disease=rep(0:1, times=40)
exposure=c(rep(0:1, times=30),rep(1:0, times=10))
kleinbaum.data= data.frame(stratum,disease,exposure)
head(kleinbaum.data)

  stratum disease exposure
1       1       0        0
2       1       1        1
3       2       0        0
4       2       1        1
5       3       0        0
6       3       1        1

tail(kleinbaum.data)

   stratum disease exposure
75      38       0        1
76      38       1        0
77      39       0        1
78      39       1        0
79      40       0        1
80      40       1        0

# Run the conditional logistic analysis with stratum as the strata variable.
clogit(disease~ exposure + strata(stratum),data=kleinbaum.data)

Call:
clogit(disease ~ exposure + strata(stratum), data = kleinbaum.data)

         coef exp(coef) se(coef)    z      p
exposure 1.10         3    0.365 3.01 0.0026

Likelihood ratio test=10.5  on 1 df, p=0.00122  n= 80

rm(kleinbaum.data)

# Now show 1:m matching.
# Use the clogit function, as in Faraway,
# page 50, for one case matched to two controls.
# For now, use strata in which both controls are exposed or both are unexposed.
# Want 30 case present, both controls absent
# Want 10 case absent, both controls present
# Rows within each stratum are ordered case, control, control.
stratum= rep(1:40, each = 3)
disease=rep(c(1,0,0), times=40)
exposure=c(rep(c(1,0,0), times=30),rep(c(0,1,1), times=10))
kleinbaum.data2= data.frame(stratum,disease,exposure)
head(kleinbaum.data2)

  stratum disease exposure
1       1       1        1
2       1       0        0
3       1       0        0
4       2       1        1
5       2       0        0
6       2       0        0

clogit(disease~ exposure + strata(stratum),data=kleinbaum.data2)

Call:
clogit(disease ~ exposure + strata(stratum), data = kleinbaum.data2)

         coef exp(coef) se(coef)    z       p
exposure 1.54      4.65    0.373 4.12 3.8e-05

Likelihood ratio test=19.8  on 1 df, p=8.68e-06  n= 120

For ordinary logistic regression, beta is the value that maximizes the likelihood of the observed data. The likelihood is calculated as the product of the likelihood of each individual observation.

For conditional logistic regression, beta is still the value that maximizes the likelihood of the observed data. However, the likelihood is calculated as the product of the likelihood of each stratum.

# Formula for CLR calculation of the likelihood for each stratum. Faraway page 49, Le page 182.
LB = exp(beta*case.exposure.as.0.1) / Sum of exp(beta*exposure.as.0.1) over the case and all controls in the stratum

The likelihood for the model is then the product of the likelihoods for each stratum. Find the beta that maximizes this likelihood.
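The stratum likelihood above can be sketched directly in R. This is a minimal illustration, not code from Faraway or Le: the function names stratum.lik and cond.loglik, and the list layout of the strata (one case exposure plus a vector of control exposures), are assumptions made for this example. The sketch rebuilds the 40 discordant pairs of the Kleinbaum data and uses optimize() to search for the beta that maximizes the log-likelihood.

```r
# Likelihood of one stratum that contains a single case:
# exp(beta * case exposure) / sum of exp(beta * exposure) over case and controls.
stratum.lik = function(beta, case.x, control.x) {
  exp(beta * case.x) / sum(exp(beta * c(case.x, control.x)))
}

# Log-likelihood of the model: sum of the log stratum likelihoods.
cond.loglik = function(beta, strata) {
  sum(sapply(strata, function(s) log(stratum.lik(beta, s$case, s$controls))))
}

# Kleinbaum discordant pairs (hypothetical list layout):
# 30 pairs with the case exposed, 10 pairs with the control exposed.
pair.strata = c(rep(list(list(case = 1, controls = 0)), 30),
                rep(list(list(case = 0, controls = 1)), 10))

# One-dimensional search for the maximizing beta.
fit = optimize(cond.loglik, interval = c(-5, 5), strata = pair.strata, maximum = TRUE)
beta.hat = fit$maximum
beta.hat        # close to log(3) = 1.0986
exp(beta.hat)   # close to the odds ratio 30/10 = 3
```

A concordant pair contributes a stratum likelihood of 0.5 whatever the value of beta, so adding the 60 concordant pairs to pair.strata would not move beta.hat.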
For Kleinbaum data (30,10,30,30), beta = 1.0986 (the log of 3)

# For 1:1 match
For strata with case exposure = 0, control exposure = 1
LB = exp(beta*0) / [exp(beta*0) + exp(beta*1)] = 1/[1+exp(beta)]

For strata with case exposure = 1, control exposure = 0
LB = exp(beta*1) / [exp(beta*1) + exp(beta*0)] = exp(beta)/[1+exp(beta)]

Strata where case and control have the same exposure contribute 1/2 regardless of beta, so they drop out.

From Le page 182:
LB across all strata = exp(beta*n01) / (1+exp(beta))^(n01+n10)
Here n01 is the number of strata with case exposed and control unexposed, and n10 is the number with case unexposed and control exposed.

For (30,10,30,30)
LB across all strata = exp(beta*30) / (1+exp(beta))^(30+10)
exp(1.0986*30) / (1+exp(1.0986))^(30+10)
exp(1.2*30) / (1+exp(1.2))^(30+10)
exp(1.0*30) / (1+exp(1.0))^(30+10)

# plot of beta vs. LB
beta.values=c(.6,.7,.8,.9,1,1.0986,1.2,1.3,1.4)
plot(beta.values, exp(beta.values *30) / (1+exp(beta.values))^(30+10))

# For 1:2 match
The denominator sums over the case and both controls.

For strata with case exposure = 0, control exposure = 1, control exposure = 0
LB = exp(beta*0) / [exp(beta*0) + exp(beta*1) + exp(beta*0)] = 1/[2+exp(beta)]

For strata with case exposure = 0, control exposure = 1, control exposure = 1
LB = exp(beta*0) / [exp(beta*0) + exp(beta*1) + exp(beta*1)] = 1/[1+2*exp(beta)]

For strata with case exposure = 1, control exposure = 1, control exposure = 0
LB = exp(beta*1) / [exp(beta*1) + exp(beta*1) + exp(beta*0)] = exp(beta)/[1+2*exp(beta)]

For strata with case exposure = 1, control exposure = 0, control exposure = 0
LB = exp(beta*1) / [exp(beta*1) + exp(beta*0) + exp(beta*0)] = exp(beta)/[2+exp(beta)]

The model likelihood is then the product of these likelihoods for each stratum. Beta is the value that maximizes the model likelihood.
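As a check on the 1:2 case, the sketch below maximizes the model likelihood for the counts described in the 1:2 example: 30 strata with the case exposed and both controls unexposed, and 10 strata with the case unexposed and both controls exposed. The function name loglik.1to2 and the use of optimize() are assumptions for illustration only; clogit itself uses a different fitting procedure. The stratum likelihoods include the case's own term in the denominator, as the formula "sum over case and controls" requires.

```r
# Log-likelihood for 1:2 matching with a binary exposure.
# n.case.only: strata with the case exposed and both controls unexposed,
#              each contributing exp(beta) / (exp(beta) + 2).
# n.ctrl.only: strata with the case unexposed and both controls exposed,
#              each contributing 1 / (1 + 2 * exp(beta)).
loglik.1to2 = function(beta, n.case.only, n.ctrl.only) {
  n.case.only * (beta - log(exp(beta) + 2)) -
    n.ctrl.only * log(1 + 2 * exp(beta))
}

# One-dimensional search for the maximizing beta.
fit = optimize(loglik.1to2, interval = c(-5, 5),
               n.case.only = 30, n.ctrl.only = 10, maximum = TRUE)
beta.hat = fit$maximum
beta.hat        # about 1.54
exp(beta.hat)   # about 4.65
```

Setting the derivative of this log-likelihood to zero gives exp(beta)^2 - 4*exp(beta) - 3 = 0, so exp(beta) = 2 + sqrt(7), about 4.65, which agrees with the exp(coef) reported by clogit for the 1:2 example.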