Module 7: Multilevel Models for Binary Responses Concepts

advertisement
Module 7 (Concepts): Multilevel Models for Binary Responses
Introduction
Module 7: Multilevel Models for Binary
Responses
Module 7 (Concepts): Multilevel Models for Binary Responses
Introduction
C7.6
C7.7
Modul es 1-3, 5, 6
Contents
Introduction .............................................................................................. 2
Introduction to the Example Dataset ............................................................... 3
C7.1
Two-level Random Intercept Model for Binary Responses............................. 4
C7.1.1
C7.1.2
C7.1.3
Generalised linear random intercept model .......................................... 4
Random intercept logit model .......................................................... 5
Example: Between-state variation in voting intentions in the US ................ 7
C7.2 Latent Variable Representation of a Random Intercept Model for Binary
Responses............................................................................................... 11
C7.2.1
C7.2.2
C7.2.3
C7.2.4
C7.3
Introduction
Two-level random intercept threshold model ....................................... 12
Comparison of a single-level and multilevel threshold model .................... 12
Impact of adding a level 1 explanatory variable to a two-level model ......... 15
Variance partition coefficient in terms of y* ........................................ 16
m ial
A parti cul ar issue i n the anal ysiso f pr oporti ons is the pr esenceo f extrabi no
vari ati on, caus ed by a vi ol ati ono f the assu
m pti on that the bi nary respons es on
whi ch a proporti on isb as ed ar ei ndependent. I t was s uggestedi n Modul e6 that
one way to all ow for cl usteri ng (non-i ndependence) due to om itted gr oup-l evel
pr edi ctors is to fit a m ultil evel m odel wi t hg r oup-l evel rando
m effects. W e
p urs ue
thi s appr oach her e, but our focus is on showi ng ho w m ultil evel m odelsc a n be
appli ed m o
r e gener ally to t wo-l evel bi nary respons e dat a wi t h predi ctors that can
be defi ned at bot h l evel 1 and l evel 2.
So
m eexam ples of res earch questi ons that can bee x pl or ed through mul til evel
m odels for bi nary respons es are:
x
W hat is the ext ent o f bet ween-state vari ati on in US voting pr ef er ences
( Republi can vs. Dem ocrat)? C a n between-stated i f f er ences i n voting be
expl ai ned by di fferences i n the et hni c or reli gi ous com positi on of states? D o
i ndi vi dual-level vari abl es such as age and gender have di fferent eff ects i n
di fferent stat es?
x
Does the use of dental heal th servi ces (e.g. whet her a person visited a denti st
i n the l ast year) vary acr oss ar eas? To what e xt ent ar ea ny di fferences bet ween
ar eas attribut abl e tob et ween-ar ea di fferences i n the pr ovi si on of subsidi sed
m ic co
m positi on of
servi ces or di ff erences i n the dem ographi c and soci o-econo
resi dents?
Population-Averaged and Cluster-Specific Effects .................................... 17
C7.3.1
C7.3.2
C7.3.3
Marginal model for clustered binary data ............................................ 18
Interpretation of coefficients from random effects and marginal models ..... 19
Example: Comparison of marginal and random intercept models fitted to the
US election data.......................................................................... 23
C7.4
Predicted Probabilities from a Multilevel Model ...................................... 24
C7.5
A Two-level Random Slope Model ........................................................ 27
C7.5.1
C7.5.2
C7.5.3
A random slope logit model ............................................................ 27
Example: Allowing the relationship between income and voting intentions in
the US to vary across states ............................................................ 28
Two random coefficients: Allowing income and urban-rural differentials in
voting intentions to vary across states ............................................... 32
1
With many thanks to Rebecca Pillinger, George Leckie, Kelvyn Jones and Harvey Goldstein for
comments on earlier drafts.
Centre for Multilevel Modelling, 2009
Comparison of estimation procedures ................................................ 39
Some practical guidelines on the choice between estimation procedures ..... 40
I n Modul e 6 we saw ho w m ultipl e regr essi on m odelsf or conti nuous res pons es can
be gener alised to handl e bi nary respons es. At the end of them odul e ( C6. 8), we
then consider ed m odel s for grouped or clustered bi nary dat a wher e the respons e
vari abl e isa pr oporti on and the expl anatory vari abl es ar e defi ned at the gr oup
l evel. T h e appli cation of these m odels was ill ustrat ed i n an anal ysiso f the
pr oporti on of vot ers in each state i ntendi ng to vot e for George Bus h, i ncl udi ng as
pr edi ctors the pr oporti on of non- whi te respondents i n a state and the proporti on
who reported regul ar attendance at reli gi ous servi ces.
Pre-requisites box
x
A random intercept model with a level 2 explanatory variable .................. 35
Cross-level interactions ................................................................. 36
Estimation of Binary Response Models .................................................. 39
C7.7.1
C7.7.2
Concepts
Fiona Steele1
Centre for Mul til evel Modelli ng
Adding Level 2 Explanatory Variables: Contextual Effects .......................... 35
C7.6.1
C7.6.2
1
I n bot h of the above exa
m ples, the study popul ati ons have a t wo-l evel hi erarchi cal
structur e wi t h i ndi vidual s at l evel 1 and ar eas at l evel 2, but struct ur es can have
m ore than t wo l evel s and m ay be non- hi erarchi cal (see Modul e 4). I n thi s m odul e,
Centre for Multilevel Modelling, 2009
2
Module 7 (Concepts): Multilevel Models for Binary Responses
Introduction
Module 7 (Concepts): Multilevel Models for Binary Responses
Introduction
as i n Modul e 5 for conti nuous res pons es, we consi der onl y model s for two-l evel
hi erarchi cal structur es.
x
Type of regi on of residence (0 = rural, 1 = urban)
x
Mari tal status (1 = currentl y m arri ed or cohabi ti ng, 2 = wi do wed or di vorced,
3 = not currentl y li vi ng wi th a partner and neverm arried)
x
Fr equency of attendance at religi ous services (0 = l ess than weekl y or never,
1 = weekl y orm ore)
The aim of thi s m odul e is to bri ng together m ultilevel m odels for conti nuous
respons es (Modul e 5) and si ngl e-l evel m odels for bi nary responses ( Modul e 6). W e
shall see that m any of the extensi ons to the basi c m ultil evel m odel i ntroduced i n
Modul e 5 – for exa
m ple rando
m sl opes and cont extual eff ects – appl y al so to bi nary
respons es. H o wever, ther e are so
m eim portant new issues to consi der i n the
i nter pr etati on and estim ati on of m ultil evel bi nary respons em odels.
and one state-levele xpl anatoryv ari abl e, cal cul ated by aggr egati ng an i ndi vi duall evel vari abl e gi vi ng the frequency of attendance at reli gi ous servi ces:
Introduction to the Example Dataset
x
W ewi ll illustrate m e
t hods for anal ysi ng bi nary respons es usi ng dat a fro
m the 2004
Nati onal Annenberg El ecti on Study ( NAES04), a US survey desi gned to track the
dyna
m ics of publi co pi ni ono v er the 2004 presi denti al c a
m
pai gn.
S ee
http: //www. annenbergpubli cpol i cycent er. org for further details of the NAES.
C7.1 Two-level Random Intercept Model for Binary
Responses
I n thi s m o
dul e (as i nM o dul e 6) w e anal yse data from the Nati onal Rolli ng Cr ossSecti on of N A ES04. T h e respons e vari abl e for our anal ysis is bas ed on voti ng
i ntenti ons i n the 2004 gener al el ecti on (vari abl e cRC03), whi ch was asked of
respondents i ntervi ewed bet ween 7 October 2003a n d 27 January 2004. T he
questi on was wor ded as foll ows:
x
Thinking about the general election for president in November 2004, if that
election were held today, would you vote for George W. Bush or the
Democratic candidate?
C7.1.1
Generalised linear random intercept model
Consi der a t wo-l evel structur e wher e a total of ni ndi vi dual s (at level 1) ar en est ed
wi thi n Jgr oups (at l evel 2) wi th nji ndi vi dual s i n group j. Thr oughout thi s m odul e
we us e ‘group’ as a gener al term for any level 2 uni t, e. g. an area or a school. W e
denot e by yijt he respons e for i ndi vi dual ii n gr oup j, and by xij an i ndi vi dual -l evel
expl anat ory vari abl e. R e c all fro
m C5. 2, equati on (5. 4), the rando
m i ntercept
m odel for conti nuous y:
y ij
The response opti ons wer e: Bus h, De
m ocrat, Ot her, Woul d not vot e, or Depends.
A sm all num ber of res pondents r eported that they di d not kno w or ref used to
ans wer theq u esti on. Don’t knows and refusal s wer ee x cl uded fro
m the anal ysi s,
and the rem aini ng categori es wer e co
m bined to obtai n a bi nary vari abl e coded 1
for Bush and 0 other wise.
Pr oporti on of respondents who attend reli gious servi ces at l east once a week
E 0 E 1x ij u j eij
(7.1)
wher e theg r oup effects or l evel 2 resi dual s uja n d the l evel 1 resi dual s eija r e
assu
m ed to be i ndependent and to foll ow norm al distri buti ons wi th zerom e
ans:
u j ~ N(0,V u2 ) and eij ~ N(0,V e2 ) .
I n Modul e 6 we anal ysed data fro
m threes t at es. W eno w extend the anal ysis
sa
m ple toi ncl ude all 49 states i n thes t udy, cont ai ni ng a total of 1 4, 169
respondents.
W ecan al so expr ess the m odel in term sof the m ean or expected valueof yijf or an
:
i ndi vi dual in group j and wi th value xij on x
W econsi der si x individual-level expl anat ory vari abl es:
E(y ij | x ij , u j )
x
Annual househol d i ncom e, grouped i nto ni ne cat egories (1 = l esst han $10k,
2 = $10- 15K, 3 = $15-25K, 4 = $25- 35K, 5 = $35- 50K, 6 = $50- 75K, 7 = $75100K, 8 =$ 1 00- 150K, 9 = $150k or m o
r e). T hi s v ari abl e is treat ed as
conti nuous i n all anal yses and is centred around its sam plem ean of 5.23
x
Sex (0 =m al e, 1 = fem ale)
x
Age i n years (m ean centred)
Centre for Multilevel Modelling, 2009
E 0 E1x ij u j .
(7.2)
For a bi nary response yij, we have E( yij| xij, uj) = ij= Pr( yij= 1) and a generalised
linear random intercept modelf or the dependency of t he respons e pr obability ij
on xij i s written:
F 1(S ij )
E 0 E 1x ij u j
(7. 3)
wher e F-1 (“ Fi nverse”) is the link functi on, taken to be the inverse cum ulati ve
di stri buti on functi ono f a known di stri buti on (seeC 6. 3. 1). I n Modul e 6, we
consi der edt hr ee li nk functi ons: the l ogit, probi t and co
m ple
m entary l og-l og( cl og-
3
Centre for Multilevel Modelling, 2009
4
Module 7 (Concepts): Multilevel Models for Binary Responses
C7.1 Two-level Random Intercept Model for Binary Responses
Module 7 (Concepts): Multilevel Models for Binary Responses
C7.1 Two-level Random Intercept Model for Binary Responses
l og) functions. Her ew e wi ll focus on thel ogi t li nk, wi t h so
m e di scussi on of the
pr obi t, but everythi ng we sayf or the logi t appli es equall y to the ot her li nk
functi ons.
than the over all i ntercept dependi ng on wh et her uji s gr eat er or l ess than zer o. As
m )eff ect, gr oup
i n the conti nuous respons e case, we ref er to ujas the group (rando
resi dual, or l evel 2 resi dual. The vari ance of the int ercepts acr oss gr oups is
,
var(u j ) V u2 , whi ch is ref erred toa s the bet ween- group vari ance adj ustedf or x
The key poi nt to not ea bout (7. 3) is that, al though thel eft hand si de is a nonli near
transf orm tai on of ij, t he ri ght hand si de takes the sa
m eform as that of (7. 2) for
conti nuous y, i.e. it isl i near i n term sof the par a
m eters E 0 a nd E 1 a nd the l evel 2
resi dual s uj. Ther ef ore thi s sim pl e random i ntercept m odel for bi nary yc a n be
ext ended in the sa
m e ways that we consi der ed i nM o dul e 5f or continuous y,
i ncl udi ng the addi ti on of further expl anatory vari abl es defi ned at l evel 1 or 2,
cr oss-l evel i nteracti ons, and rando
m slopes (coeffi ci ents).
C7.1.2
Random intercept logit model
I n a l ogi tm odel F-1( ij) is the l og-odds that y= 1 (see C6. 3. 2), so (7. 3) beco
m es
§ S ij
log¨
¨1 S
ij
©
·
¸
¸
¹
E 0 E 1x ij u j
l evel 2 variance.)
W ecan obtai n estim a
t es of ujt hat can be pl otted wi th confi dence i nterval s to see
whi ch gr oups ar e si gnifi cantl y bel ow or above the aver age of zer o (a cater pill ar
pl ot). These estim ates ar e i nterpreted i n the sa
m eway as for conti nuous r es pons e
m odels (see C5. 1. 2 and C5. 2. 2); the onl y di fference is that i n a l ogi t m o
del they
repr esent gr oup eff ects on the log- odds scal e.
I n anal ysi ng m ultil evel data, we ar e often i nterested i n the am ount of vari ati on
that can be attri buted to the di fferent l evel s i n the dat a structur e and the ext ent
to whi ch vari ati on at a gi ven l evel can bee x pl ai nedb y expl anat ory vari abl es. I n
Modul e 5 (C5. 1. 1) we m et the variance partition coefficientw h i ch m easures the
pr oporti on of the total vari ance that is duet o di fferences bet ween gr oups. Ther e
is no uni que way of d efi ni ng a VPC for bi nary data, but we s hall consi der one
appr oach in C7. 2. 4. ( The pr obl m
e is anal ogous to the di ffi culty i n defi ning R 2f or
bi nary data - see C6. 4.)
(7.4)
wher e u j ~ N(0,V u2 ) .
Interpretation of E 0 and E 1
E 0 i s i nter pr eted as the log-oddst hat y= 1 when x= 0 and u= 0 andi s ref erredt o
as the overall intercepti n the linear rel ations hi p between the log- odds and x
. If
we take the exponenti al of E 0 , exp( E 0 ), we obtai n the oddst hat y =1 for x= 0
and u= 0.
As i n the si ngl e-l evel m odel, E 1 i s t he eff ect o f a 1- uni t change in xo n thel ogodds that y= 1, but i t is no wt he eff ect of xa f t er adj usti ng for (or hol di ng
constant) the gr oup effect u
. I f we ar e hol di ng uc onstant, then we ar e l ooki ng at
the eff ect of xf or i ndi vi dual s wi thi n the sam egr oup so E 1 i s us ually ref erred to as
a cluster-specific effect. I n C7.3 we wi ll co
m pare thi s cl uster-specifi c effect wi t h
the eff ect o f xa v er agi ng acr oss groups (the population-average effect) . T h es e
eff ects ar ee qual for a m ultil evel conti nuous response m odel, so that i n Modul e 5
we m ade no di sti ncti on bet ween the
m ,but they wi ll not be equal for a gener alised
li nearm ultil evel m odel (unl ess V u2 0 ).
As i n a singl e-l evel l ogit m odel, exp( E 1 ) can be i nterpr et ed as an odds rati o,
co
m pari ng the odds that y= 1 fort wo i ndi vidual s (i n the sa
m egroup) wi t h x
- val ues
spaced 1 uni t apart.
Interpretation of uj
W hile E 0 i s the over all i ntercept in the li near rel ati onshi p bet ween the l og-odds
and x
, thei nt ercept for a gi veng r oup ji s E 0 u j w hi ch wi ll be hi gher or l ower
Centre for Multilevel Modelling, 2009
the bet ween- group resi dual vari ance, or s m
i
ply the l evel 2 resi dual vari ance.
( Qui te often ‘resi dual ’ is o
m itted and we say ‘l evel 2 vari ance’, but re
m e
mber that
if the m odel cont ai ns e x pl anat ory vari abl es then V u2 i s a l ways the unexplained
5
Predicted response probabilities
As i n the singl e-l evel case, we can re- organi se (7.4) to obtai n ane x pr essi onf or the
respons e probability:
S ij
exp( E 0 E 1x ij u j )
1 exp( E 0 E 1x ij u j )
.
(7.5)
(See equati on (6.10) i n C6. 3. 2f or the si ngl e-l evel v er si on, i.e. wi t hout gr oup
eff ects.)
W ecan cal cul ate the pr edi cted res pons e probability for i ndi vi dual ii n gr oup jb y
substituti ng the estim ates of E 0 , E 1 a n d ujo bt ai ned fro
m the fitted m odel a s
foll ows:
Sˆij
exp( Eˆ0 Eˆ1x ij uˆ j )
.
1 exp( Eˆ0 Eˆ1x ij uˆ j )
W e can al so m ake predi cti ons for ‘i deal’o r ‘typi cal’ i ndi vi dual s wi t h specifi c
co
m binati ons of -x val ues, but we al so need to m akea deci si ono n what val ue to
substitute for uj. W ewi ll di scuss predi cted pr obabilities i n C7. 4.
Centre for Multilevel Modelling, 2009
6
Module 7 (Concepts): Multilevel Models for Binary Responses
C7.1 Two-level Random Intercept Model for Binary Responses
Module 7 (Concepts): Multilevel Models for Binary Responses
C7.1 Two-level Random Intercept Model for Binary Responses
C7.1.3
W etherefor e concl ude that ther e is si gnifi cant variati on bet ween states i n the
pr oporti on who i ntend to vote for Bush.
Example: Between-state variation in voting intentions in the
US
W eill ustrate the appli cati on and i nter pret ati on of t h e rando
m i ntercept l ogi t
m odel (7.4) i n an anal ysis of voti ng i ntentions i n the 2004 US gener al el ecti on. A
t wo-l evel m odel is us ed to al l ow for correl ati on bet ween voti ng i ntenti ons of
i ndi vi dual sl i vi ng i n the sa
m estat e, and to expl ore the ext ent o f bet ween-state
vari ati on i n voti ng i ntenti ons.
Null model (without explanatory variables)
Tabl e 7. 1 sho ws the res ul ts fro
m f i tti ng a mul til evel l ogi t m odel for the pr obability
of voti ng for Bus h wi th state rando
m eff ects but noe x pl anat ory vari abl es. Thi s
‘null’ m odel is so
m etim es ref erred to as a vari ance co
m ponentsm odel. The odds
of voti ng Bus h for an ‘average’ state ( wi th uj= 0) ar ee s tim ateda s exp(- 0.107) =
0. 90, and the correspondi ng probability is 0. 9/( 1+0. 9) = 0.47.
Table 7.1. Multilevel logit model for voting Bush, with state effects, US 2004
Parameter
E 0 (Constant)
V u2 (Between-state variance)
Estimate
-0.107
Standard error
0.049
0.091
0.023
The bet ween-state vari ance i n the l og- odds of voti ng Bus h is estim ated as 0. 091
wi th a standar d error of 0. 023. T h er e are vari ous ways that we m ight test the
si gnifi cance of the bet ween-state vari ance, and thea p pr oaches avail abl e to us
depend ont h e al gorith
m us edt o fit the m odel. W e di scuss al gorithm s and
soft war e i nC 7. 7 and in the Techni cal Appendi x. I deall y we woul d us e a li keli hood
rati o test (as i n the conti nuous r es pons e case), but t hi s opti on is onl y avail abl e
when m axim u
m li kelihood estim ati on is used. Because the estim ates i n Tabl e 7. 1
wer e obt ai ned usi ng a quasi-li keli hood procedur e2, we wi ll use a W ald test. The
W ald test was described i n C6.5. 5 for testi ng coeffici ents i n as i ngl e-l evel m odel,
but it can be us ed tot est hypothes es about any m odel par a
m eter. W hen us ed to
test a hypot hesi s about a vari ance para
m et er (e.g. the bet ween-state vari ance),
the test isc r ude because it depends ont he questi onabl e assu
m pti on that the
o
e
vari ance estim ate isn orm ally di stri buted.3 N e v erthel ess, it will gi ve us s m
i ndi cati on of the strengt h of thee vi dence for state effects. The W ald test statisti c
is the square of the Z-rati o, i.e.( 0. 091/0.023) 2= 15.65 whi ch isc o
m paredwi t h a
chi -squar ed di stri bution on 1 degr ee of freedo
m ,gi ving a p- val ue l ess than0 . 001.
2
Second order penalized quasi-likelihood (PQL2) – see C7.7 and the Technical Appendix for details.
We are referring here to the sampling distribution of the estimated variance. Imagine taking
repeated samples of respondents within states, and fitting a multilevel logit model to each sample.
You will get a different estimate of the between-state variance each time. The distribution of this
variance estimate across samples is the sampling distribution which, in a Wald test, is assumed
normal. The sampling distribution of a variance estimate is in fact positively skewed (the right tail
of the distribution is longer) because variances must be greater than zero. The Wald test performs
particularly poorly when the level 2 variance estimate is close to its boundary of zero.
3
Centre for Multilevel Modelling, 2009
7
Anot her issue to consi der when testi ng vari ance par am eters is that vari ances ar e
by defi nition non- negati ve. The n ul l hypothesi s is that V u2 0 , but the al ter nati ve
hypot hesi si s one-si ded ( V u2 ! 0 ) rather than t wo-si ded ( V u2 z 0 ). One suggested
appr oach to the pr obl m
e is to hal ve the p- val ue obtai ned from co
m paring the
li keli hood rati o statisti c wi t h a chi-squar ed di stri bution ( s ee Sni j ders and Bosker
(1999, Secti on 6. 2) for a di scussi on). N ot e that the above appli es tot ests of
vari ance para
m eters i n m ultil evel m odels for any type of outco
m evari abl e, not
just bi nary y.
V u2 i s the between-statev ari ance in the l og- odds of voting Bus h, but it is di ffi cul t
to assess the si ze of the state effects when usi ng thel og- odds scal e. I nstead we
can cal cul ate pr edi cted pr obabiliti es of voting Bus h, usi ng (7.5) with no x
- vari abl e,
assu
m ing different val ues for the state effect u.j W e
h ave al ready cal cul at ed the
pr edi cted pr obabilityf or an ‘aver age’ state wi t h uj= 0. Under the assu
m pti on that
ujf oll ow an orm al di stri buti on, w e woul de x pect appr oxim ately 95% of states to
have a val ue of ujwi thi n 2 standar d devi ati ons of the m ean of zer o, i.e. bet ween
appr oxim atel y 2Vˆu 2 0.091 0.603 a n d +0. 603. T h is type of i nt erval is
so
m etim esc al l ed a coverage interval. Substituti ng i n( 7. 5) these val ues for uja nd
our estim a
t e for E 0 from Tabl e 7.1 we obtain the foll owi ng predi cti ons.
For a state 2 standar d devi ati ons bel ow the m ean:
exp(0.107 0.603)
Sˆ
0.33
1 exp(0.107 0.603)
For a state 2 standar d devi ati ons above the m ean:
exp(0.107 0.603)
Sˆ
0.62
1 exp(0.107 0.603)
W ewoul d ther ef or e expect thep r oporti on voti ng Bus h to li e bet ween 0.33 and
0. 62 i n them iddl e 95% of states.
m t he null model .
W eno w exa
m ine estim ates of the state effects, û j , obtai ned fro
Fi gur e 7. 1 is a ‘cater pi ll ar pl ot’ with the state eff ects shown i n rank or der toget her
wi th 95% confi dence i nterval s. Thi s pl ot is i nter pr eted i n the sa
m eway as for a
conti nuous r es pons e m odel (seeC 5. 1. 2), but the l evel 2 resi dual s ar e now state
eff ects on the l og- odds scal e. As bef or e, a state whos e confi dence i nterval does
not overl ap the li ne at zer o (repr esenti ng the m ean log- odds of voti ng Bush acr oss
all states) is sai d to di ffer si gnifi cantl y from the average at the5 % l evel. I n thi s
case, m any of the confi dence i nterval s incl ude zero and there ar e no obvi ous
outli ers wi th es peci ally l arge û j . T h e threes t at es wi th the l owest pr obability of
voti ng Bush (l argest negati ve val ues of û j ) ar e W ashi ngt on DC, Rhode Isl and and
Massachus etts, whil et h e three wi t h the hi ghest response pr obability( l ar gest
Centre for Multilevel Modelling, 2009
8
This document is only the first few pages of
the full version.
To see the complete document please go to
learning materials and register:
http://www.cmm.bris.ac.uk/lemma
The course is completely free. We ask for a
few details about yourself for our research
purposes only. We will not give any details to
any other organisation unless it is with your
express permission.
Download