Matched Pair Data Stat 557 Heike Hofmann

Outline
• Conditional Logit versus Random Effects
• Matched Pair Models for ordinal response
• Nominal Response
• Homogeneity
• Symmetric Models
• Quasi-Symmetry
• Quasi-Independence
Conditional Logit
coxph(formula = Surv(rep(1, 6600L), I(voted.for == "o")) ~ time +
    strata(id), data = votebigm, method = "exact")

  n= 6600, number of events= 3453

              coef exp(coef) se(coef)      z Pr(>|z|)
timesecond -0.2918    0.7469   0.1202 -2.429   0.0152 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

           exp(coef) exp(-coef) lower .95 upper .95
timesecond    0.7469      1.339    0.5902    0.9453
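For binary matched pairs with a single within-pair covariate, the conditional logit estimate has a closed form: only discordant pairs contribute, and the estimate is the log ratio of the two discordant counts. A minimal sketch in R, assuming hypothetical discordant counts (`n_10` and `n_01` are illustrative values, not taken from `votebigm`):

```r
# Hypothetical discordant-pair counts (illustrative only, not from votebigm):
# n_10: pairs with response 1 at the first time and 0 at the second
# n_01: pairs with response 0 at the first time and 1 at the second
n_10 <- 150
n_01 <- 120

# Conditional ML estimate of the time effect and its asymptotic standard error
beta_hat <- log(n_01 / n_10)
se_hat   <- sqrt(1 / n_01 + 1 / n_10)

# Wald z statistic and 95% confidence interval on the odds-ratio scale
z  <- beta_hat / se_hat
ci <- exp(beta_hat + c(-1, 1) * qnorm(0.975) * se_hat)
```

Concordant pairs carry no information about the time effect in this estimate, which is why only the two discordant counts appear.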
Random Effects
Generalized linear mixed model fit by the Laplace approximation
Formula: obama ~ time + (1 | id)
   Data: votebigm
  AIC  BIC logLik deviance
 7080 7100  -3537     7074
Random effects:
 Groups Name        Variance Std.Dev.
 id     (Intercept) 25.961   5.0952
Number of obs: 6600, groups: id, 3300

Fixed effects:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  0.61306    0.11320   5.416  6.1e-08 ***
timesecond  -0.16047    0.09415  -1.704   0.0883 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Why are the estimates different?
Estimates of fixed effects in the mixed effects model are biased by a factor of order T/(T-1), where T is the number of observations in a cluster (for pairs, T = 2) (Anderson 1980).
Matched Pairs: Ordinal
Models for Square Contingency Tables: Model for Ordinal Response
• Let $Y_1$ and $Y_2$ be ordinal variables with $J > 2$ categories.
• Then the proportional odds (POLR) model (marginal) is
\[ \mathrm{logit}\, P(Y_t \le j) = \alpha_j + \beta x_t \]
• Cumulative odds ratios are constant for all $j$:
\[ \log \theta_j = \log \frac{P(Y_2 \le j)/P(Y_2 > j)}{P(Y_1 \le j)/P(Y_1 > j)} = \beta (x_2 - x_1) = \beta, \]
for $x_2 = 1$ and $x_1 = 0$, independent of $j$.
Marginal Homogeneity in Ordinal Model
• Marginal homogeneity is equivalent to a zero log odds ratio:
\[ \beta = 0 \iff \mathrm{logit}\, P(Y_1 \le j) = \mathrm{logit}\, P(Y_2 \le j) \ \forall j \iff P(Y_1 \le j) = P(Y_2 \le j) \ \forall j \iff \pi_{j+} = \pi_{+j} \ \forall j \]
• Model fit is based on the marginal probabilities $\pi_{j+}, \pi_{+j}$, $j = 1, ..., J$:
  overall we have $2 (J - 1)$ degrees of freedom;
  the proportional odds model has
\[ \underbrace{(J - 1)}_{\alpha_j} + \underbrace{1}_{\beta} = J \]
  parameters;
  model fit is based on $df = J - 2$.
Matched Pairs: Nominal
Models for Square Contingency Tables
• For nominal Y with $J \ge 3$ categories, use J as baseline
• Baseline Logistic Regression
\[ \log \frac{P(Y_t = j)}{P(Y_t = J)} = \alpha_j + \beta_j x_t \]
• Then $\beta_j = 0$ for all $j$ is the test for marginal homogeneity
Testing for Marginal Homogeneity (Bhapkar 1966)
Let $d_a = p_{a+} - p_{+a}$ with $a = 1, ..., I - 1$, the sample version of $\pi_{a+} - \pi_{+a}$. The estimated covariance matrix of $\sqrt{n}\, d$ then has elements
\[ v_{aa} = p_{a+} + p_{+a} - 2 p_{aa} - (p_{a+} - p_{+a})^2 \]
\[ v_{ab} = -(p_{ab} + p_{ba}) - (p_{+a} - p_{a+})(p_{+b} - p_{b+}) \quad \text{for } a \ne b \]
Using asymptotic normality for a multinomial sample, we have
\[ \sqrt{n}\,(d - E[d]) \sim N(0, V) \]
Under marginal homogeneity $E[d] = 0$ and $W = n\, d^\top V^{-1} d \sim \chi^2_{I-1}$.
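The statistic W can be computed directly from a square table. The following is a minimal sketch, not code from the course: `bhapkar_test` is a hypothetical helper name, and the input is the 1980/1985 migration table used in the example that follows.

```r
# Migration table (rows: residence in 1980, columns: residence in 1985)
regions <- c("NE", "MW", "S", "W")
mig <- matrix(c(11607,   100,   366,   124,
                   87, 13677,   515,   302,
                  172,   225, 17819,   270,
                   63,   176,   286, 10192),
              nrow = 4, byrow = TRUE,
              dimnames = list(r80 = regions, r85 = regions))

# Hypothetical helper: Wald-type test of marginal homogeneity
bhapkar_test <- function(tab) {
  n  <- sum(tab)
  p  <- tab / n
  I  <- nrow(tab)
  dd <- rowSums(p) - colSums(p)            # d_a = p_{a+} - p_{+a}
  V  <- -(p + t(p)) - outer(dd, dd)        # off-diagonal elements v_ab
  diag(V) <- rowSums(p) + colSums(p) - 2 * diag(p) - dd^2   # v_aa
  keep <- seq_len(I - 1)                   # drop the redundant last category
  d <- dd[keep]
  W <- n * as.numeric(crossprod(d, solve(V[keep, keep], d)))
  c(W = W, df = I - 1, p.value = pchisq(W, df = I - 1, lower.tail = FALSE))
}

res <- bhapkar_test(mig)
```

Only the first I − 1 differences are used because the differences sum to zero, so the full I × I covariance matrix is singular.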
Example: Migration Data
                     Residence in 1985
Residence 1980     NE     MW      S      W   Total
NE              11607    100    366    124   12197
MW                 87  13677    515    302   14581
S                 172    225  17819    270   18486
W                  63    176    286  10192   10717
Total           11929  14178  18986  10888   55981
• 95% of the data is on the diagonal
• marginal homogeneity seems given,
  is the data even symmetric?
Stat 557 ( Fall 2008)
Matched Pair Data
November 4, 2008
multinom(formula = residence ~ time, data = mob.hom, weights = counts)

Coefficients:
   (Intercept)      timer85
MW   0.1785305 -0.005810538
S    0.4158239  0.048906665
W   -0.1293560  0.038043575

Std. Errors:
   (Intercept)    timer85
MW  0.01227069 0.01746227
S   0.01166544 0.01651006
W   0.01323998 0.01873421

Residual Deviance: 305344.6
AIC: 305356.6
> mob.hom
  time residence counts
1  r85        NE  11929
2  r85        MW  14178
3  r85         S  18986
4  r85         W  10888
5  r80        NE  12197
6  r80        MW  14581
7  r80         S  18486
8  r80         W  10717
Full Model
multinom(formula = residence ~ 1, data = mob.hom, weights = counts)

Coefficients:
   (Intercept)
MW   0.1756614
S    0.4403043
W   -0.1103649

Std. Errors:
   (Intercept)
MW 0.008730452
S  0.008254433
W  0.009366678
> anova(mobility.hom, mobility.main)
  Model Resid. df Resid. Dev   Test    Df LR stat.      Pr(Chi)
1     1        21   305361.2           NA       NA           NA
2  time        18   305344.6 1 vs 2     3 16.64982 0.0008341456

Marginal Homogeneity
> fit
        NE       MW        S        W
1 12196.99 14581.00 18485.99 10717.02
2 11929.00 14178.01 18986.02 10887.98
> fit.hom
        NE       MW        S        W
1 12063.00 14379.50 18736.00 10802.50
2 12063.00 14379.50 18736.00 10802.50
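The likelihood ratio statistic in the `anova` output can be checked by hand from the two margins. This is a sketch under the same simplification as the `multinom` fits above, namely that the 1980 and 1985 margins are treated as two separate samples; the object names are illustrative.

```r
# Marginal counts by time point (values as in mob.hom)
counts <- rbind(r80 = c(NE = 12197, MW = 14581, S = 18486, W = 10717),
                r85 = c(NE = 11929, MW = 14178, S = 18986, W = 10888))

# Under marginal homogeneity both rows share the same fitted counts:
# the column averages (both rows have the same total)
fit_hom <- matrix(colMeans(counts), nrow = 2, ncol = 4, byrow = TRUE)

# Likelihood ratio statistic against the model with a time effect
G2 <- 2 * sum(counts * log(counts / fit_hom))
df <- ncol(counts) - 1
p  <- pchisq(G2, df, lower.tail = FALSE)
```

The value of `G2` should agree with the LR statistic of about 16.65 on 3 degrees of freedom reported above, up to rounding.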
Symmetry Model
• $H_0: \pi_{ab} = \pi_{ba}$ for all a, b
• as logistic regression:
\[ \log (\pi_{ab} / \pi_{ba}) = 0 \]
• as loglinear model
\[ \log m_{ab} = \lambda + \lambda^X_a + \lambda^Y_b + \lambda^{XY}_{ab} \]
  with $\lambda^X_a = \lambda^Y_a$ and $\lambda^{XY}_{ab} = \lambda^{XY}_{ba}$
• impose constraints with help of design matrix
Migration
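library(plyr)  # provides ldply
# label each cell by its unordered pair of region codes,
# so that cells (a,b) and (b,a) share one factor level
mobility$symm <- ldply(1:nrow(mobility), function(i) {
  x <- c(as.integer(mobility$r80[i]), as.integer(mobility$r85[i]))
  return(paste(sort(x), collapse=","))
})$V1
# one Poisson mean per unordered pair gives the symmetry model
mob.symm <- glm(counts~symm-1, family=poisson(log), data=mobility)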
> delta(fitted(mob.symm), mobility$counts)
[1] 0.006252121
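The function `delta()` is not defined in these slides. The value printed above is consistent with the dissimilarity index, the share of observations that would have to move to another cell for the fit to be perfect, so the sketch below assumes that definition. The symmetric fit is written out directly as the average of each off-diagonal pair.

```r
# Assumed definition (not shown in the slides): dissimilarity index
delta <- function(fitted, observed) {
  sum(abs(observed - fitted)) / (2 * sum(observed))
}

# Migration table (rows: residence in 1980, columns: residence in 1985)
mig <- matrix(c(11607,   100,   366,   124,
                   87, 13677,   515,   302,
                  172,   225, 17819,   270,
                   63,   176,   286, 10192),
              nrow = 4, byrow = TRUE)

# Fitted values under symmetry: average of cells (a,b) and (b,a)
symm_fit <- (mig + t(mig)) / 2

delta_symm <- delta(symm_fit, mig)
```

Under this assumed definition, `delta_symm` is 700 / (2 · 55981), which matches the printed 0.006252121.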
Agresti argues that symmetry is
violated, but homogeneity is not ...
> mobility
   r85 r80 counts symm
1   NE  NE  11607  1,1
2   MW  NE    100  1,2
3    S  NE    366  1,3
4    W  NE    124  1,4
5   NE  MW     87  1,2
6   MW  MW  13677  2,2
7    S  MW    515  2,3
8    W  MW    302  2,4
9   NE   S    172  1,3
10  MW   S    225  2,3
11   S   S  17819  3,3
12   W   S    270  3,4
13  NE   W     63  1,4
14  MW   W    176  2,4
15   S   W    286  3,4
16   W   W  10192  4,4
Fitted Values
xtabs(counts~r80+r85, data=mobility)
    r85
r80     NE    MW     S     W
  NE 11607   100   366   124
  MW    87 13677   515   302
  S    172   225 17819   270
  W     63   176   286 10192
xtabs(fitted(mob.symm)~r80+r85, data=mobility)
    r85
r80       NE      MW       S       W
  NE 11607.0    93.5   269.0    93.5
  MW    93.5 13677.0   370.0   239.0
  S    269.0   370.0 17819.0   278.0
  W     93.5   239.0   278.0 10192.0
Symmetry Model: Testing for Symmetry
Likelihood Equations
\[ \hat\mu_{aa} = n_{aa}, \qquad \hat\mu_{ab} = (n_{ab} + n_{ba})/2. \]
Symmetry can be tested, e.g. using the Pearson statistic (Bowker 1948):
\[ X^2 = \sum_{a,b} \frac{(n_{ab} - \hat\mu_{ab})^2}{\hat\mu_{ab}} = \sum_{a<b} \frac{(n_{ab} - n_{ba})^2}{n_{ab} + n_{ba}} \]
with $df = I (I - 1)/2$.
Stat 557 ( Fall 2008)
Symmetric Contingency Tables
November 6, 2008
Coffee Brand Data
Example: Choice of Coffee. The American Market Association conducted a [...]feinated coffee at two different dates of purchase.
                              Second Purchase
1st Purchase      High Point  Taster’s  Sanka  Nescafe  Brim  Total
High Point                93        17     44        7    10    171
Taster’s Choice            9        46     11        0     9     75
Sanka                     17        11    155        9    12    204
Nescafe                    6         4      9       15     2     36
Brim                      10         4     12        2    27     55
Total                    135        82    231       33    60    541
Fitting the symmetry model can be done explicitly. $X^2 = 20.4$ with $df = 10$, indic[...] poorly.
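The Pearson statistic for symmetry needs only the off-diagonal pairs, so it can be computed without fitting a model. A minimal sketch for the coffee table (object names are illustrative):

```r
# Coffee table (rows: first purchase, columns: second purchase)
brands <- c("High Point", "Taster's", "Sanka", "Nescafe", "Brim")
coffee_tab <- matrix(c(93, 17,  44,  7, 10,
                        9, 46,  11,  0,  9,
                       17, 11, 155,  9, 12,
                        6,  4,   9, 15,  2,
                       10,  4,  12,  2, 27),
                     nrow = 5, byrow = TRUE,
                     dimnames = list(first = brands, second = brands))

# Pair up cells (a,b) and (b,a) for a < b
up   <- upper.tri(coffee_tab)
n_ab <- coffee_tab[up]
n_ba <- t(coffee_tab)[up]

# Bowker's statistic, its degrees of freedom and p-value
X2 <- sum((n_ab - n_ba)^2 / (n_ab + n_ba))
df <- nrow(coffee_tab) * (nrow(coffee_tab) - 1) / 2
p  <- pchisq(X2, df, lower.tail = FALSE)
```

For tables larger than 2 × 2, `mcnemar.test(coffee_tab)` in base R computes the same symmetry statistic and can serve as a cross-check.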
Call:
glm(formula = count ~ symm - 1, family = poisson(log), data = coffee)

Deviance Residuals:
       Min          1Q      Median          3Q         Max
-2.670e+00  -3.332e-08   0.000e+00   2.107e-08   2.291e+00

Coefficients:
        Estimate Std. Error z value Pr(>|z|)
symm1,1  4.53260    0.10370  43.711  < 2e-16 ***
symm1,2  2.56495    0.19612  13.079  < 2e-16 ***
symm1,3  3.41773    0.12804  26.693  < 2e-16 ***
symm1,4  1.87180    0.27735   6.749 1.49e-11 ***
symm1,5  2.30259    0.22361  10.297  < 2e-16 ***
symm2,2  3.82864    0.14744  25.967  < 2e-16 ***
symm2,3  2.39790    0.21320  11.247  < 2e-16 ***
symm2,4  0.69315    0.49999   1.386    0.166
symm2,5  1.87180    0.27735   6.749 1.49e-11 ***
symm3,3  5.04343    0.08032  62.790  < 2e-16 ***
symm3,4  2.19722    0.23570   9.322  < 2e-16 ***
symm3,5  2.48491    0.20412  12.174  < 2e-16 ***
symm4,4  2.70805    0.25820  10.488  < 2e-16 ***
symm4,5  0.69315    0.50000   1.386    0.166
symm5,5  3.29584    0.19245  17.126  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Null deviance: 3063.200  on 25  degrees of freedom
Residual deviance:   22.473  on 10  degrees of freedom
AIC: 157.01

Doesn’t fit particularly well
Marginal Homogeneity?
[Figure: two bar charts of count by brand (High Point, Taster’s, Sanka, Nescafe, Brim), one panel for the first purchase and one for the second]
market for High Point seems to have changed
Quasi-Symmetry Model
• Remove constraints on marginals:
• as loglinear model
\[ \log m_{ab} = \lambda + \lambda^X_a + \lambda^Y_b + \lambda^{XY}_{ab} \quad \text{with } \lambda^{XY}_{ab} = \lambda^{XY}_{ba} \]
Quasi-Symmetry Model
\[ \log \mu_{ab} = \lambda + \lambda^{Y_1}_a + \lambda^{Y_2}_b + \lambda^{Y_1 Y_2}_{ab}, \]
with $\lambda^{Y_1 Y_2}_{ab} = \lambda^{Y_1 Y_2}_{ba}$ for all a, b, i.e. the quasi-symmetry model allows as much symmetry as possible while accounting for different marginal distributions.
Likelihood equations:
i. $\hat\mu_{a+} = n_{a+}$ for all a,
ii. $\hat\mu_{+b} = n_{+b}$ for all b,
iii. $\hat\mu_{ab} + \hat\mu_{ba} = n_{ab} + n_{ba}$.
Equations i. and iii. imply equation ii:
\[ n_{a+} + n_{+a} = \sum_b (n_{ab} + n_{ba}) \overset{iii.}{=} \sum_b (\hat\mu_{ab} + \hat\mu_{ba}) = \hat\mu_{a+} + \hat\mu_{+a}. \]
Degrees of freedom:
\[ df = I^2 - 1 - \underbrace{(I - 1)}_{\lambda_{a+}} - \underbrace{(I - 1)}_{\lambda_{+a}} - \underbrace{I (I - 1)/2}_{\lambda_{ab},\ a<b} = (I - 1)(I - 2)/2. \]
Perfect match on the main diagonal, i.e. $\hat\mu_{aa} = n_{aa}$ for all a.
For all other effects:
\[ \pi_{ab} = \alpha_a \beta_b \gamma_{ab}, \quad \text{where } \gamma_{ab} = \gamma_{ba}. \]
glm(formula = count ~ first + symm - 1, family = poisson(log),
    data = coffee)

Deviance Residuals:
       Min          1Q      Median          3Q         Max
-1.842e+00  -1.925e-01  -3.942e-08   1.999e-01   1.022e+00

Coefficients:
                Estimate Std. Error z value Pr(>|z|)
firstHigh Point   4.5326     0.1037  43.711  < 2e-16 ***
firstTaster’s     3.9332     0.3006  13.084  < 2e-16 ***
firstSanka        3.8239     0.2426  15.760  < 2e-16 ***
firstNescafe      4.2393     0.3658  11.590  < 2e-16 ***
firstBrim         3.9372     0.3114  12.642  < 2e-16 ***
symm1,2          -1.7122     0.2433  -7.036 1.98e-12 ***
symm1,3          -0.8220     0.1799  -4.568 4.92e-06 ***
symm1,4          -2.5249     0.3319  -7.608 2.78e-14 ***
symm1,5          -1.9760     0.2677  -7.382 1.56e-13 ***
symm2,2          -0.1045     0.3348  -0.312 0.754932
symm2,3          -1.4821     0.3179  -4.662 3.14e-06 ***
symm2,4          -3.4047     0.5716  -5.957 2.57e-09 ***
symm2,5          -2.0634     0.3788  -5.447 5.12e-08 ***
symm3,3           1.2196     0.2556   4.772 1.83e-06 ***
symm3,4          -1.8558     0.3628  -5.116 3.12e-07 ***
symm3,5          -1.3972     0.3169  -4.409 1.04e-05 ***
symm4,4          -1.5312     0.4477  -3.420 0.000626 ***
symm4,5          -3.4064     0.5734  -5.941 2.83e-09 ***
symm5,5          -0.6413     0.3661  -1.752 0.079808 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Null deviance: 3063.200  on 25  degrees of freedom
Residual deviance:    9.974  on  6  degrees of freedom
AIC: 152.51
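The quasi-symmetry fit can be sketched from the raw coffee table as follows. The data frame construction is an assumption about how `coffee` was set up (the slides only show the model call), and the model keeps the intercept, which gives the same fitted values as the `- 1` parametrization above.

```r
# Coffee table (rows: first purchase, columns: second purchase)
brands <- c("High Point", "Taster's", "Sanka", "Nescafe", "Brim")
tab <- matrix(c(93, 17,  44,  7, 10,
                 9, 46,  11,  0,  9,
                17, 11, 155,  9, 12,
                 6,  4,   9, 15,  2,
                10,  4,  12,  2, 27),
              nrow = 5, byrow = TRUE,
              dimnames = list(first = brands, second = brands))

# Long format with one row per cell (assumed layout of the coffee data)
coffee <- as.data.frame(as.table(tab), responseName = "count")

# Symmetric factor: cells (a,b) and (b,a) share one level
a <- as.integer(coffee$first)
b <- as.integer(coffee$second)
coffee$symm <- factor(paste(pmin(a, b), pmax(a, b), sep = ","))

# Quasi-symmetry: symmetric association plus free row margins
qs <- glm(count ~ first + symm, family = poisson(log), data = coffee)

# Fitted table, for checking the likelihood equations
mu <- xtabs(fitted(qs) ~ first + second, data = coffee)
```

With this setup, `rowSums(mu)` should reproduce the observed row totals (equation i.) and `mu + t(mu)` the observed symmetric sums (equation iii.), with `df.residual(qs)` equal to (I − 1)(I − 2)/2 = 6.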
Quasi-Independence Model
• Independence in matched pair data is usually violated (because the diagonal is heavily loaded)
• Quasi-Independence: given off-diagonality, do we have independence?
• Quasi-independence model: fit independence for off-diagonal cells and fit cells on the main diagonal separately:
\[ \log \mu_{ab} = \underbrace{\lambda + \lambda^{Y_1}_a + \lambda^{Y_2}_b}_{\text{independence}} + \underbrace{\delta_a I(a = b)}_{\text{main diagonal}}, \]
where I is an indicator function, with
\[ I(a = b) = \begin{cases} 1 & \text{if } a = b, \\ 0 & \text{otherwise.} \end{cases} \]
glm(formula = counts ~ r80 + r85 + diag, family = poisson(link = log),
    data = mobility)

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  4.23642    0.07363  57.533   <2e-16 ***
r80MW        0.52906    0.05444   9.719   <2e-16 ***
r80S         0.65562    0.06160  10.643   <2e-16 ***
r80W         0.03168    0.06141   0.516    0.606
r85MW        0.60431    0.07344   8.229   <2e-16 ***
r85S         1.50949    0.06682  22.592   <2e-16 ***
r85W         0.77773    0.06873  11.316   <2e-16 ***
diag1        5.12294    0.07422  69.026   <2e-16 ***
diag2        4.15367    0.06391  64.989   <2e-16 ***
diag3        3.38648    0.06007  56.372   <2e-16 ***
diag4        4.18353    0.06493  64.430   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 131003.31  on 15  degrees of freedom
Residual deviance:     69.51  on  5  degrees of freedom
AIC: 221.71

Number of Fisher Scoring iterations: 4

> deviance(mob.qi)
[1] 69.5094
> delta(fitted(mob.qi), mobility$counts)
[1] 0.003282017
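The slides do not show how the `diag` variable was built. One way to set it up, sketched here as an assumption, is a factor that is 0 off the diagonal and carries the region index on the diagonal, so that each diagonal cell gets its own parameter $\delta_a$:

```r
# Migration table (rows: residence in 1980, columns: residence in 1985)
regions <- c("NE", "MW", "S", "W")
mig <- matrix(c(11607,   100,   366,   124,
                   87, 13677,   515,   302,
                  172,   225, 17819,   270,
                   63,   176,   286, 10192),
              nrow = 4, byrow = TRUE,
              dimnames = list(r80 = regions, r85 = regions))

# Long format with one row per cell
mobility <- as.data.frame(as.table(mig), responseName = "counts")

# Assumed coding of diag: 0 off the diagonal, region index a on the diagonal
a <- as.integer(mobility$r80)
b <- as.integer(mobility$r85)
mobility$diag <- factor(ifelse(a == b, a, 0))

# Quasi-independence: independence off the diagonal, diagonal fitted exactly
mob.qi <- glm(counts ~ r80 + r85 + diag, family = poisson(link = log),
              data = mobility)

dev_qi <- deviance(mob.qi)
df_qi  <- df.residual(mob.qi)
```

Because every diagonal cell has its own parameter, the diagonal is reproduced exactly and the residual degrees of freedom are (I − 1)² − I = 5 for this 4 × 4 table.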
Relationship between Models
[Diagram relating Quasi-Independence, Quasi-Symmetry, Homogeneity, Independence, and Symmetry; annotation: estimates for diagonal = 0]
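One relationship between these models can be put to work directly: symmetry is the special case of quasi-symmetry with marginal homogeneity, so the difference of the two residual deviances gives a test of marginal homogeneity conditional on quasi-symmetry holding. A minimal sketch using the deviances reported for the coffee data (variable names are illustrative):

```r
# Residual deviances and degrees of freedom from the coffee fits
g2_symm <- 22.473; df_symm <- 10   # symmetry model
g2_qs   <-  9.974; df_qs   <-  6   # quasi-symmetry model

# Conditional test of marginal homogeneity, given quasi-symmetry
g2_mh <- g2_symm - g2_qs
df_mh <- df_symm - df_qs           # equals I - 1 for an I x I table
p_mh  <- pchisq(g2_mh, df_mh, lower.tail = FALSE)
```

The test is only meaningful when the quasi-symmetry model itself fits adequately.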