Association Models Stat 557 Heike Hofmann

Outline
• Uniform Association
• Row/Column Association
• Fitting Association Models in R
• Some Examples
Association Models
• Framework of loglinear models, i.e. we want to model counts
• For ordinal variable X we have scores $u_1 \le u_2 \le \dots \le u_p$ (most commonly $u_i = i$); similarly, for ordinal variable Y assume scores $v_j$
• Describe the interaction term using the scores
• Note: main effects in X are still modeled in terms of the categories
Association Models
• Loglinear model with the XY interaction described by scores:
$\log m_{ijk} = \lambda + \lambda^X_i + \lambda^Y_j + \lambda^Z_k + \lambda^{XY}_{ij}$
• Uniform association: $\lambda^{XY}_{ij} = \beta u_i v_j$
• Row effects model: $\lambda^{XY}_{ij} = \beta_i v_j$
• Column effects model: $\lambda^{XY}_{ij} = \beta_j u_i$
• Row & Column effects model: $\lambda^{XY}_{ij} = \beta u_i v_j$ with the scores $u_i$, $v_j$ estimated as parameters (not linear)
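What the score-based interaction terms imply for odds ratios can be worked out directly. The derivation below is a sketch for a two-way table of X and Y (any further main effects cancel in the same way); $\theta_{ij}$ is notation introduced here for the local odds ratio of adjacent rows $i, i+1$ and adjacent columns $j, j+1$, and is not used elsewhere on the slides.

```latex
\begin{align*}
\log \theta_{ij}
  &= \log \frac{m_{ij}\, m_{i+1,j+1}}{m_{i,j+1}\, m_{i+1,j}}
   = \lambda^{XY}_{ij} + \lambda^{XY}_{i+1,j+1}
     - \lambda^{XY}_{i,j+1} - \lambda^{XY}_{i+1,j} \\
\text{uniform association } (\lambda^{XY}_{ij} = \beta u_i v_j):\quad
\log \theta_{ij} &= \beta\,(u_{i+1} - u_i)(v_{j+1} - v_j) \\
\text{row effects } (\lambda^{XY}_{ij} = \beta_i v_j):\quad
\log \theta_{ij} &= (\beta_{i+1} - \beta_i)(v_{j+1} - v_j)
\end{align*}
```

With unit-spaced scores ($u_i = i$, $v_j = j$) the uniform association model therefore gives the same local log odds ratio $\beta$ everywhere in the table, which is where the name comes from.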
Example: Mental Health
• SES of parents and mental health of 1660 students were recorded
• SES is measured from A = high to F = low
• Mental health is rated in four levels from well to poor (well, mild, moderate, impaired)
Mental Health
prodplot(data=mh, count~status+ses, c("vspine","hspine"))+aes(fill=status)
[Figure: product plot of status (impaired, moderate, mild, well) by ses (A to F), filled by status]
Mental Health
• Model of independence obviously does not fit
• But: there is some pattern discernible: as socio-economic status improves, mental health improves steadily
• Idea: make use of the ordinal structure in the variables
Prepping the data
> summary(mh)
 ses        status      count
 A:4   impaired:6   Min.   : 21.00
 B:4   moderate:6   1st Qu.: 54.00
 C:4   mild    :6   Median : 64.50
 D:4   well    :6   Mean   : 69.17
 E:4                3rd Qu.: 82.00
 F:4                Max.   :141.00
• Make two versions of the same variable: one as a factor, one as the numeric scores
mh$sesn <- as.numeric(mh$ses)
mh$statusn <- as.numeric(mh$status)
In the model formula, ses and status are the factors; sesn and statusn are the scores.
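The data preparation and the model fits on the following slides can be sketched end to end. The counts below are an assumption: they are the classic parents' SES by mental health table, which reproduces the summary shown above (24 cells, total 1660, minimum 21, maximum 141), but the slide's own data file is not available. If the assumed counts match, the residual deviances should agree with the values reported on the slides.

```r
# ASSUMED counts (classic SES x mental health table); not taken from the slides.
# Layout: one row of four counts per SES level, in the factor order of status.
counts <- c(
  # impaired, moderate, mild, well
  46, 58,  94, 64,   # A (high SES)
  40, 54,  94, 57,   # B
  60, 65, 105, 57,   # C
  94, 77, 141, 72,   # D
  78, 54,  97, 36,   # E
  71, 54,  71, 21    # F (low SES)
)

status_levels <- c("impaired", "moderate", "mild", "well")

# expand.grid varies its first argument fastest, matching the layout above
mh <- expand.grid(
  status = factor(status_levels, levels = status_levels),
  ses    = factor(LETTERS[1:6])
)
mh$count <- counts

# numeric scores u_i = i and v_j = j, taken from the factor level order
mh$sesn    <- as.numeric(mh$ses)
mh$statusn <- as.numeric(mh$status)

# independence, uniform association, and row effects (one slope per status level)
indep   <- glm(count ~ ses + status, family = poisson(link = log), data = mh)
uniform <- glm(count ~ ses + status + sesn:statusn,
               family = poisson(link = log), data = mh)
roweff  <- glm(count ~ ses + status + sesn:status,
               family = poisson(link = log), data = mh)

# sequential likelihood ratio tests between the nested models
anova(indep, uniform, roweff, test = "Chisq")
```

The uniform association model spends a single parameter on the interaction, so it has 14 residual degrees of freedom against 15 for independence; the row effects model spends three.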
Mental Health
glm(formula = count ~ ses + status + sesn:statusn, family = poisson(link = log),
    data = mh)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.2663  -0.3285   0.2025   0.3912   1.0820

Coefficients:
               Estimate Std. Error z value Pr(>|z|)
(Intercept)     3.83169    0.09545  40.143  < 2e-16 ***
sesB            0.17682    0.09881   1.790  0.07353 .
sesC            0.57035    0.11955   4.771 1.84e-06 ***
sesD            1.08802    0.14528   7.489 6.94e-14 ***
sesE            0.93471    0.17854   5.235 1.65e-07 ***
sesF            0.94357    0.20883   4.518 6.23e-06 ***
statusmoderate  0.26461    0.09399   2.815  0.00487 **
statusmild      1.08882    0.12912   8.432  < 2e-16 ***
statuswell      0.70991    0.17460   4.066 4.79e-05 ***
sesn:statusn   -0.09069    0.01501  -6.043 1.51e-09 ***   <- uniform association
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 217.3998  on 23  degrees of freedom
Residual deviance:   9.8951  on 14  degrees of freedom
AIC: 174.07

Number of Fisher Scoring iterations: 4
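The sesn:statusn estimate is easier to read on the odds-ratio scale. The sketch below uses only the estimate and standard error printed in the summary above; the variable names are introduced here for illustration, and the interval is a plain Wald interval, which is an assumption about how one would want to quantify the uncertainty.

```r
beta <- -0.09069   # estimate of sesn:statusn from the summary output
se   <-  0.01501   # its standard error

# local odds ratio: adjacent ses levels against adjacent status levels
local_or <- exp(beta)

# 95% Wald interval, computed on the log scale and then exponentiated
wald_ci <- exp(beta + c(-1, 1) * qnorm(0.975) * se)

# odds ratio for the corner cells: ses A vs F (5 steps), impaired vs well (3 steps)
corner_or <- exp(beta * (6 - 1) * (4 - 1))

round(c(local = local_or, lower = wald_ci[1], upper = wald_ci[2], corner = corner_or), 4)
```

Since sesn runs from A = 1 (high) to F = 6 (low) and statusn from impaired = 1 to well = 4, a local odds ratio below 1 says that each step down in SES lowers the odds of the better of two adjacent mental health levels.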
Raw vs Fitted
[Figure: product plots of the raw and the fitted counts; status (impaired, moderate, mild, well) by ses (A to F)]
Combine ses A and B?
Mental Health
glm(formula = count ~ ses + status + sesn:status, family = poisson(link = log),
    data = mh)

Deviance Residuals:
     Min        1Q    Median        3Q       Max
-0.91405  -0.32206   0.09134   0.28206   1.01943

Coefficients: (1 not defined because of singularities)
                    Estimate Std. Error z value Pr(>|z|)
(Intercept)          3.40215    0.15025  22.643  < 2e-16 ***
sesB                -0.20810    0.09325  -2.232 0.025634 *
sesC                -0.20056    0.10388  -1.931 0.053526 .
sesD                -0.06977    0.12221  -0.571 0.568039
sesE                -0.61074    0.15445  -3.954 7.68e-05 ***
sesF                -0.99023    0.18850  -5.253 1.49e-07 ***
statusmoderate       0.45446    0.18464   2.461 0.013844 *
statusmild           1.02664    0.16609   6.181 6.36e-10 ***
statuswell           0.82565    0.18492   4.465 8.01e-06 ***
statusimpaired:sesn  0.30682    0.04894   6.270 3.62e-10 ***
statusmoderate:sesn  0.16343    0.04888   3.343 0.000827 ***
statusmild:sesn      0.14509    0.04423   3.281 0.001036 **
statuswell:sesn           NA         NA      NA       NA
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 217.3998  on 23  degrees of freedom
Residual deviance:   6.2808  on 12  degrees of freedom
AIC: 174.45

Number of Fisher Scoring iterations: 4
The status:sesn terms are the row association.
Row effects model
[Figure: product plot of the fitted counts; status (impaired, moderate, mild, well) by ses (A to F)]
Difference to the uniform association model barely visible
Goodness of fit test
Analysis of Deviance Table

Model 1: count ~ ses + status + sesn:statusn
Model 2: count ~ ses + status + sesn:status
  Resid. Df Resid. Dev Df Deviance Pr(>Chi)
1        14     9.8951
2        12     6.2808  2   3.6144   0.1641
Conclusion: stick with the uniform association model, maybe collapse levels A & B into one
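The p-value in the deviance table can be checked by hand from the two residual deviances. This is a minimal sketch using only the numbers printed on the slides; the variable names are introduced here for illustration.

```r
# residual deviances and degrees of freedom as reported in the two summaries
dev_uniform <- 9.8951; df_uniform <- 14
dev_row     <- 6.2808; df_row     <- 12

# likelihood ratio statistic for the two extra row-effect parameters
lr_stat <- dev_uniform - dev_row
lr_df   <- df_uniform - df_row

# upper tail of the chi-square reference distribution
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)
c(statistic = lr_stat, df = lr_df, p = p_value)
```

A large p-value here means the two additional parameters of the row effects model do not buy a significantly better fit, which is the basis for staying with the uniform association model.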
Party Affiliation
    Gender     Race            Party       Number
 Female:6   Black:6   Democrat   :4   Min.   :  4.00
 Male  :6   White:6   Independent:4   1st Qu.: 14.25
                      Republican :4   Median : 91.50
                                      Mean   : 83.42
                                      3rd Qu.:130.50
                                      Max.   :176.00
[Figure: product plot of Party (Democrat, Independent, Republican) by Gender (Female, Male) and Race (Black, White)]
Introduce score variables
Any binary variable is automatically also an ordinal variable:
# two uniform associations?
party$Partyn <- as.numeric(party$Party)
# 'order' in gender and race doesn't matter, since it is binary:
party$Gendern <- as.numeric(party$Gender)
party$Racen <- as.numeric(party$Race)
party.uniform <- glm(Number~Party+Race+Gender+Racen:Partyn+Gendern:Partyn,
family=poisson(link=log), data=party)
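Why the order of a binary variable does not matter can be seen by fitting the interaction once with the numeric score and once with the factor: both span the same model. The sketch below uses HYPOTHETICAL counts invented purely for illustration (they are not the party data from the slides), and `toy`, `fit_score` and `fit_factor` are names introduced here.

```r
# HYPOTHETICAL 2 x 3 table, invented for illustration only
toy <- expand.grid(
  Party = factor(c("Democrat", "Independent", "Republican")),
  Race  = factor(c("Black", "White"))
)
toy$Number <- c(100, 30, 5,     # Black: Democrat, Independent, Republican
                150, 120, 160)  # White: Democrat, Independent, Republican

toy$Partyn <- as.numeric(toy$Party)
toy$Racen  <- as.numeric(toy$Race)

# binary variable entered as a numeric score (uniform association)
fit_score <- glm(Number ~ Party + Race + Racen:Partyn,
                 family = poisson(link = log), data = toy)

# binary variable entered as a factor (one slope per level; one is aliased)
fit_factor <- glm(Number ~ Party + Race + Race:Partyn,
                  family = poisson(link = log), data = toy)

# same fitted values and deviance: the two parameterisations are equivalent
max(abs(fitted(fit_score) - fitted(fit_factor)))
c(score = deviance(fit_score), factor = deviance(fit_factor))
```

In the factor version one of the two slopes comes back as NA, just as statuswell:sesn does in the row effects output, because the sum of the slopes is already covered by the Party main effects.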
Party Affiliation
glm(formula = Number ~ Party + Race + Gender + Racen:Partyn +
    Gendern:Partyn, family = poisson(link = log), data = party)

Deviance Residuals:
        1          2          3          4          5          6          7
 0.148681  -0.014165  -0.117567  -0.008349   0.107635   0.076275  -0.111428
        8          9         10         11         12
-0.165061  -0.163533  -0.380321   0.135880   0.449677

Coefficients:
                  Estimate Std. Error z value Pr(>|z|)
(Intercept)       2.600823   0.145496  17.876  < 2e-16 ***
PartyIndependent -2.862458   0.304212  -9.409  < 2e-16 ***
PartyRepublican  -5.410051   0.618489  -8.747  < 2e-16 ***
RaceWhite        -0.005983   0.228950  -0.026 0.979151
GenderMale       -0.576295   0.158528  -3.635 0.000278 ***
Racen:Partyn      1.135962   0.147876   7.682 1.57e-14 ***
Partyn:Gendern    0.289683   0.075876   3.818 0.000135 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 711.35267  on 11  degrees of freedom
Residual deviance:   0.48532  on  5  degrees of freedom
AIC: 82.555

Number of Fisher Scoring iterations: 3
Uniform association model is almost perfect!
Party & Ideology
Self-proclaimed ideology and Party affiliation
[Figure: product plot of Affil (Republican, Independent, Democrat) by Ideology (Conservative, Moderate, Liberal)]
Models
Analysis of Deviance Table

Model 1: count ~ Affil + Ideology
Model 2: count ~ Affil + Ideology + Affiln:Ideon
Model 3: count ~ Affil + Ideology + Affil:Ideon
  Resid. Df Resid. Dev Df Deviance  Pr(>Chi)
1         4    105.662
2         3     17.618  1   88.044 < 2.2e-16 ***
3         2      2.815  1   14.803 0.0001194 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
• Main effects model obviously does not fit (Affil and Ideology are not independent)
• Uniform association fits better, but is not a good fit yet
• Row effects for Affiliation: almost perfect fit
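The three verdicts above can be backed by comparing each residual deviance with its own degrees of freedom, i.e. a goodness-of-fit test against the saturated model. This sketch uses only the deviances and degrees of freedom from the table; the data frame `fits` and its column names are introduced here for illustration.

```r
# residual deviances and degrees of freedom from the analysis of deviance table
fits <- data.frame(
  model    = c("independence", "uniform association", "row effects"),
  deviance = c(105.662, 17.618, 2.815),
  df       = c(4, 3, 2)
)

# goodness of fit: residual deviance against chi-square with the residual df
fits$p_value <- pchisq(fits$deviance, df = fits$df, lower.tail = FALSE)
fits
```

A small p-value signals lack of fit, so only the row effects model is not rejected here, even though each step in the table is a significant improvement over the previous one.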
Party & Ideology
Raw vs Fit
[Figure: product plots of the raw and the fitted counts; Affil (Republican, Independent, Democrat) by Ideology (Conservative, Moderate, Liberal)]
Next time:
Poisson rate regression