Paired data: McNemar's test and conditional logistic regression.
In some experiments, cases are matched to controls.
Examples:
  Drug in one eye, placebo in the other eye
  Litter mates: some get drug, some get placebo
  One patient gets both treatments sequentially
Sometimes subjects are matched on factors such as age and sex. Matching subjects
often helps reduce the unexplained variability. For a t-test, we typically use a paired
t-test to analyze matched pairs of subjects. When we move to regression analysis, we
often describe the matched subjects as being in a stratum. While the paired t-test
requires 1:1 matching, regression allows m:n matching. Here's an example.
Stratum   Drug subjects   Control subjects
1         A, B, C         D, E
2         F, G, H         I
3         J               K, L, M
4         N, O            P, Q, R, S
The statistical analysis should account for the matching. We'll examine two methods for
analysis of matched data: McNemar's test and conditional logistic regression.
McNemar's test
The chi-square test assumes there is no matching of subjects. When we have matching,
we use McNemar's test instead of the chi-square test.
Example data from Kleinbaum, Chapter 8.
# Rows give the case's exposure, columns give the matched control's exposure
paired.data=matrix(c(30,10,30,30),nrow=2, dimnames=list("Case" = c("Present", "Absent"),
"Control"= c("Present", "Absent")))
paired.data
> paired.data
         Control
Case      Present Absent
  Present      30     30
  Absent       10     30
>
# mcnemar test with continuity correction
mcnemar.test(paired.data,correct=TRUE)
        McNemar's Chi-squared test with continuity correction
data:  paired.data
McNemar's chi-squared = 9.025, df = 1, p-value = 0.002663
# The matched-pairs odds ratio is the ratio of the two discordant counts:
# OR = 30/10 = 3
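As a check on the output above, the McNemar statistic can be computed by hand from the two discordant cells alone. This is a minimal sketch; the variable names (n.case.only, n.control.only, and so on) are introduced here for illustration and do not appear in the original notes.

```r
# Discordant pairs from the Kleinbaum table (30,10,30,30)
n.case.only    <- 30   # case exposed, control unexposed
n.control.only <- 10   # case unexposed, control exposed

# McNemar chi-squared with continuity correction: (|b - c| - 1)^2 / (b + c)
mcnemar.stat <- (abs(n.case.only - n.control.only) - 1)^2 /
  (n.case.only + n.control.only)
mcnemar.p <- pchisq(mcnemar.stat, df = 1, lower.tail = FALSE)

# Matched-pairs odds ratio: ratio of the discordant counts
matched.or <- n.case.only / n.control.only

c(statistic = mcnemar.stat, p.value = mcnemar.p, odds.ratio = matched.or)
```

The 60 concordant pairs never enter the calculation, which is why only the off-diagonal cells are needed.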
## Example with no association of case with exposure
paired.data2=matrix(c(30,30,30,30),nrow=2, dimnames=list("Case" = c("Present",
"Absent"), "Control"= c("Present", "Absent")))
paired.data2
> paired.data2
         Control
Case      Present Absent
  Present      30     30
  Absent       30     30
>
# mcnemar test with continuity correction requested
# (the discordant counts are equal, so the statistic is 0 and the
# output header does not mention the correction)
> mcnemar.test(paired.data2,correct=TRUE)
        McNemar's Chi-squared test
data:  paired.data2
McNemar's chi-squared = 0, df = 1, p-value = 1
Conditional logistic regression
Conditional logistic regression is for matched data. Cases are matched to controls within
strata. The likelihood calculation is conditional on the strata, hence conditional logistic
regression.
Within each stratum, we examine the difference in the exposure variable, X, between
case and control(s). If case and control(s) have the same value of X, the stratum is not
informative and has no effect on the analysis.
library(survival)
# Conditional logistic R code. Modified from Shoukri example, page 126
# z1 = difference in the value of X for case vs control in a particular stratum.
# z1 can take values -1, 0, 1.
# count is the number of strata that have z1 = -1, 0, 1.
# Strata where z1=0 do not affect the statistical analysis.
# Use values from Kleinbaum example (30,10,30,30) showing association of case with
# exposure
y=rep(1,3)
z1=c(-1,0,1)
# count=c(10,60,30) would include the 60 concordant strata (z1=0).
# They do not change the estimate, so the fit below sets their count to 0.
count=c(10,0,30)
match=glm(y~-1 + z1 , family=binomial("logit"), weights=count)
summary(match)
Call:
glm(formula = y ~ -1 + z1, family = binomial("logit"), weights = count)
Deviance Residuals:
    1      2      3
5.266  0.000  4.155
Coefficients:
   Estimate Std. Error z value Pr(>|z|)
z1   1.0986     0.3651   3.009  0.00262 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
    Null deviance: 55.452  on 2  degrees of freedom
Residual deviance: 44.987  on 1  degrees of freedom
AIC: 46.987
Number of Fisher Scoring iterations: 4
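# Coefficient estimate is 1.0986, which is the log odds that response Y=1 when z1=1.
# Use the coefficient to calculate the probability that response Y=1 when z1=1.
# ilogit() is the inverse logit from the faraway package; plogis() is the base R equivalent.
library(faraway)
ilogit(1.0986)   # 0.75, which is 30/(30+10)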
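The claim that strata with z1=0 do not affect the analysis can be checked directly. The sketch below refits the same weighted glm twice, once with the 60 concordant strata counted and once with their weight set to 0, and compares the coefficients. The object names are illustrative and not from the original notes.

```r
y  <- rep(1, 3)
z1 <- c(-1, 0, 1)

# Same model as above, with and without the 60 concordant strata (z1 = 0)
fit.with.concordant    <- glm(y ~ -1 + z1, family = binomial("logit"),
                              weights = c(10, 60, 30))
fit.without.concordant <- glm(y ~ -1 + z1, family = binomial("logit"),
                              weights = c(10, 0, 30))

coef.with    <- unname(coef(fit.with.concordant))
coef.without <- unname(coef(fit.without.concordant))

# Both should agree with log(30/10)
c(with = coef.with, without = coef.without, log.or = log(30 / 10))
```

With no intercept, a stratum with z1=0 has a fitted probability of 0.5 whatever beta is, so it adds only a constant to the log-likelihood. The deviance and AIC do change between the two fits, but the estimate does not.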
# Make a graph (ilogit() requires the faraway package)
# Set up an empty plot, then draw the fitted curve
plot(NULL, xlim=c(0,1), ylim=c(0,1), xlab="X variable", ylab="response")
x=c(0,1)
lines(x, ilogit(1.1*x), lty=1, col="black")
# Calculate the estimated odds ratio and 95% CI
sum.coef=summary(match)$coef
odds.ratio=exp(sum.coef[,1])
upper.ci= exp(sum.coef[,1] + 1.96*sum.coef[,2])
lower.ci= exp(sum.coef[,1] - 1.96*sum.coef[,2])
cbind(odds.ratio,upper.ci,lower.ci)
     odds.ratio upper.ci lower.ci
[1,]          3 6.136618 1.466606
# estimate = log(odds.ratio)
# So odds.ratio = exp(estimate) = exp(1.0986) = 3
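For 1:1 matched pairs the same odds ratio and interval can be obtained without fitting a model, using the standard large-sample result that the standard error of the log odds ratio is sqrt(1/b + 1/c), where b and c are the discordant counts. This is a sketch for comparison with the glm output; the variable names are illustrative.

```r
n.case.only    <- 30   # case exposed, control unexposed
n.control.only <- 10   # case unexposed, control exposed

log.or    <- log(n.case.only / n.control.only)
se.log.or <- sqrt(1 / n.case.only + 1 / n.control.only)

# 95% confidence interval on the odds ratio scale
ci <- exp(log.or + c(-1.96, 1.96) * se.log.or)

c(odds.ratio = exp(log.or), se = se.log.or, lower.ci = ci[1], upper.ci = ci[2])
```

The standard error agrees with the 0.3651 reported by summary(match), and the interval agrees with the cbind() output to about three decimal places.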
# Example with values showing no association of case with exposure
y=rep(1,3)
z1=c(-1,0,1)
count=c(30,0,30)
match=glm(y~-1 + z1 , family=binomial("logit"), weights=count)
summary(match)
Call:
glm(formula = y ~ -1 + z1, family = binomial("logit"), weights = count)
Deviance Residuals:
    1      2      3
6.449  0.000  6.449
Coefficients:
    Estimate Std. Error z value Pr(>|z|)
z1 3.007e-16  2.582e-01       0        1
(Dispersion parameter for binomial family taken to be 1)
    Null deviance: 83.178  on 2  degrees of freedom
Residual deviance: 83.178  on 1  degrees of freedom
AIC: 85.178
Number of Fisher Scoring iterations: 2
# How does CLR handle 1:m matching?
# Each case contributes m differences within that stratum. The likelihood is the
# product over strata.
# First, show the calculation for 1:1 matching, using clogit() from the survival
# package (see Faraway, page 50).
# Kleinbaum data (30,10,30,30)
# Want 30 strata with case present, control absent
# Want 10 strata with case absent, control present
stratum= rep(1:40, each = 2)
disease=rep(0:1, times=40)
exposure=c(rep(0:1, times=30),rep(1:0, times=10))
kleinbaum.data= data.frame(stratum,disease,exposure)
> head(kleinbaum.data)
  stratum disease exposure
1       1       0        0
2       1       1        1
3       2       0        0
4       2       1        1
5       3       0        0
6       3       1        1
> tail(kleinbaum.data)
   stratum disease exposure
75      38       0        1
76      38       1        0
77      39       0        1
78      39       1        0
79      40       0        1
80      40       1        0
# Run the conditional logistic analysis with stratum as the strata variable.
clogit(disease~ exposure + strata(stratum),data=kleinbaum.data)
Call:
clogit(disease ~ exposure + strata(stratum), data = kleinbaum.data)
         coef exp(coef) se(coef)    z      p
exposure 1.10         3    0.365 3.01 0.0026
Likelihood ratio test=10.5  on 1 df, p=0.00122  n= 80
> rm(kleinbaum.data)
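To see why clogit returns exp(coef) = 3, it can help to collapse the long-format data back into the paired table and count the discordant strata. The sketch below rebuilds the same 80-row data set under the name long.data (an illustrative name) and uses only base R.

```r
# Rebuild the 1:1 matched data: 40 strata, control listed first, then case
stratum  <- rep(1:40, each = 2)
disease  <- rep(0:1, times = 40)
exposure <- c(rep(0:1, times = 30), rep(1:0, times = 10))
long.data <- data.frame(stratum, disease, exposure)

# One exposure value per stratum for the case and for the control.
# Rows are already sorted by stratum, so the two vectors line up.
case.exposure    <- long.data$exposure[long.data$disease == 1]
control.exposure <- long.data$exposure[long.data$disease == 0]

pair.table <- table(Case = case.exposure, Control = control.exposure)
pair.table

n.case.only    <- pair.table["1", "0"]   # case exposed, control unexposed
n.control.only <- pair.table["0", "1"]   # case unexposed, control exposed
unname(n.case.only / n.control.only)     # matched-pairs odds ratio
```

This data set contains only the 40 discordant strata, so the diagonal cells of pair.table are zero. The concordant strata were left out because they do not contribute to the conditional likelihood.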
# Now show 1:m matching, for one case matched to two controls.
# Use clogit() from the survival package (see Faraway, page 50).
# For now, use both controls present or both controls absent.
# Want 30 strata with case present, both controls absent
# Want 10 strata with case absent, both controls present
# Within each stratum the case (disease=1) is listed first, then the two controls.
stratum= rep(1:40, each = 3)
disease=rep(c(1,0,0), times=40)
exposure=c(rep(c(1,0,0), times=30),rep(c(0,1,1), times=10))
kleinbaum.data2= data.frame(stratum,disease,exposure)
> head(kleinbaum.data2)
  stratum disease exposure
1       1       1        1
2       1       0        0
3       1       0        0
4       2       1        1
5       2       0        0
6       2       0        0
clogit(disease~ exposure + strata(stratum),data=kleinbaum.data2)
Call:
clogit(disease ~ exposure + strata(stratum), data = kleinbaum.data2)
         coef exp(coef) se(coef)    z       p
exposure 1.54      4.65    0.373 4.12 3.8e-05
Likelihood ratio test=19.8  on 1 df, p=8.68e-06  n= 120
For ordinary logistic regression, beta is the value that maximizes the likelihood of the
observed data. The likelihood is calculated as the product of the likelihood of each
individual observation.
For conditional logistic regression, beta is still the value that maximizes the likelihood of
the observed data. However, the likelihood is calculated as the product of the likelihood
of each stratum.
# Formula for the CLR calculation of the likelihood for each stratum.
Faraway page 49, Le page 182.
LB = exp(beta*case.exposure.as.0.1) / Sum of exp(beta*exposure.as.0.1) for case and controls
The likelihood for the model is then the product of the likelihoods for each stratum.
Find the beta that maximizes the likelihood.
For Kleinbaum data (30,10,30,30), beta = 1.0986, the log of 3 (about 1.10)
# For 1:1 match
For strata with case exposure = 0, control exposure = 1
LB = exp(beta*case.exposure.as.0.1) / Sum of exp(beta*exposure.as.0.1) for case and controls
= exp(beta*0) / [exp(beta*0) + exp(beta*1)]
= 1/[1+exp(beta)]
For strata with case exposure = 1, control exposure = 0
LB = exp(beta*case.exposure.as.0.1) / Sum of exp(beta*exposure.as.0.1) for case and controls
= exp(beta*1) / [exp(beta*1) + exp(beta*0)]
= exp(beta)/[1+exp(beta)]
From Le page 182:
LB across all strata = exp(beta*n01) / (1+exp(beta))^(n01+n10)
For (30,10,30,30), n01 = 30 and n10 = 10:
LB across all strata = exp(beta*30) / (1+exp(beta))^(30+10)
# Evaluate LB at a few values of beta
exp(1.09*30) / (1+exp(1.09))^(30+10)
exp(1.2*30) / (1+exp(1.2))^(30+10)
exp(1.0*30) / (1+exp(1.0))^(30+10)
# plot of beta vs. LB
beta.values=c(.6,.7,.8,.9,1,1.09,1.2,1.3,1.4)
plot(beta.values, exp(beta.values *30) / (1+exp(beta.values))^(30+10))
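Rather than reading the maximum off a plot, the value of beta can be found numerically. The sketch below maximizes the log of the likelihood above with base R's optimize(); working on the log scale avoids very small numbers. The function name loglik.1to1 and the search interval are choices made here for illustration.

```r
n.case.only    <- 30   # strata with case exposed, control unexposed
n.control.only <- 10   # strata with case unexposed, control exposed

# Log of exp(beta*30) / (1 + exp(beta))^(30 + 10)
loglik.1to1 <- function(beta) {
  n.case.only * beta - (n.case.only + n.control.only) * log(1 + exp(beta))
}

# Search an interval assumed wide enough to contain the maximum
fit <- optimize(loglik.1to1, interval = c(-5, 5), maximum = TRUE, tol = 1e-8)
beta.hat <- fit$maximum

c(beta.hat = beta.hat, odds.ratio = exp(beta.hat))
```

The maximizing beta should agree with the clogit coefficient for the 1:1 data, that is log(30/10).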
# For 1:2 match
For strata with case exposure = 0, control exposure = 1, control exposure = 0
LB = exp(beta*case.exposure.as.0.1) / Sum of exp(beta*exposure.as.0.1) for case and controls
= exp(beta*0) / [exp(beta*0) + exp(beta*1) + exp(beta*0)]
= 1/[2+exp(beta)]
For strata with case exposure = 0, control exposure = 1, control exposure = 1
LB = exp(beta*case.exposure.as.0.1) / Sum of exp(beta*exposure.as.0.1) for case and controls
= exp(beta*0) / [exp(beta*0) + exp(beta*1) + exp(beta*1)]
= 1/[1+2*exp(beta)]
For strata with case exposure = 1, control exposure = 1, control exposure = 0
LB = exp(beta*case.exposure.as.0.1) / Sum of exp(beta*exposure.as.0.1) for case and controls
= exp(beta*1) / [exp(beta*1) + exp(beta*1) + exp(beta*0)]
= exp(beta)/[1+2*exp(beta)]
For strata with case exposure = 1, control exposure = 0, control exposure = 0
LB = exp(beta*case.exposure.as.0.1) / Sum of exp(beta*exposure.as.0.1) for case and controls
= exp(beta*1) / [exp(beta*1) + exp(beta*0) + exp(beta*0)]
= exp(beta)/[2+exp(beta)]
The model likelihood is then the product of these likelihoods for each stratum. Beta is
the value that maximizes the model likelihood.
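As a sketch of that final step for the 1:2 example (30 strata with the case exposed and both controls unexposed, 10 strata with the case unexposed and both controls exposed), the code below builds the per-stratum likelihood from the general formula, with the sum in the denominator taken over the case and its controls, and maximizes the product numerically. The helper names stratum.lik and loglik.1to2 are hypothetical and not from the original notes.

```r
# Per-stratum likelihood:
# exp(beta * case exposure) / sum of exp(beta * exposure) over case and controls
stratum.lik <- function(beta, case.exposure, control.exposure) {
  exp(beta * case.exposure) /
    (exp(beta * case.exposure) + sum(exp(beta * control.exposure)))
}

# Log-likelihood: sum of log stratum likelihoods, weighted by stratum counts
loglik.1to2 <- function(beta) {
  30 * log(stratum.lik(beta, 1, c(0, 0))) +   # case exposed, controls unexposed
  10 * log(stratum.lik(beta, 0, c(1, 1)))     # case unexposed, controls exposed
}

# Search an interval assumed wide enough to contain the maximum
fit <- optimize(loglik.1to2, interval = c(-5, 5), maximum = TRUE, tol = 1e-8)
beta.hat <- fit$maximum

c(beta.hat = beta.hat, odds.ratio = exp(beta.hat))
```

The result can be compared with the clogit output for the 1:2 data (coef 1.54, exp(coef) 4.65). Setting the derivative of this log-likelihood to zero gives exp(beta) = 2 + sqrt(7), about 4.65.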