Module 7 (Concepts): Multilevel Models for Binary Responses Introduction Module 7: Multilevel Models for Binary Responses Module 7 (Concepts): Multilevel Models for Binary Responses Introduction C7.6 C7.7 Modul es 1-3, 5, 6 Contents Introduction .............................................................................................. 2 Introduction to the Example Dataset ............................................................... 3 C7.1 Two-level Random Intercept Model for Binary Responses............................. 4 C7.1.1 C7.1.2 C7.1.3 Generalised linear random intercept model .......................................... 4 Random intercept logit model .......................................................... 5 Example: Between-state variation in voting intentions in the US ................ 7 C7.2 Latent Variable Representation of a Random Intercept Model for Binary Responses............................................................................................... 11 C7.2.1 C7.2.2 C7.2.3 C7.2.4 C7.3 Introduction Two-level random intercept threshold model ....................................... 12 Comparison of a single-level and multilevel threshold model .................... 12 Impact of adding a level 1 explanatory variable to a two-level model ......... 15 Variance partition coefficient in terms of y* ........................................ 16 m ial A parti cul ar issue i n the anal ysiso f pr oporti ons is the pr esenceo f extrabi no vari ati on, caus ed by a vi ol ati ono f the assu m pti on that the bi nary respons es on whi ch a proporti on isb as ed ar ei ndependent. I t was s uggestedi n Modul e6 that one way to all ow for cl usteri ng (non-i ndependence) due to om itted gr oup-l evel pr edi ctors is to fit a m ultil evel m odel wi t hg r oup-l evel rando m effects. W e p urs ue thi s appr oach her e, but our focus is on showi ng ho w m ultil evel m odelsc a n be appli ed m o r e gener ally to t wo-l evel bi nary respons e dat a wi t h predi ctors that can be defi ned at bot h l evel 1 and l evel 2. So m eexam ples of res earch questi ons that can bee x pl or ed through mul til evel m odels for bi nary respons es are: x W hat is the ext ent o f bet ween-state vari ati on in US voting pr ef er ences ( Republi can vs. Dem ocrat)? C a n between-stated i f f er ences i n voting be expl ai ned by di fferences i n the et hni c or reli gi ous com positi on of states? D o i ndi vi dual-level vari abl es such as age and gender have di fferent eff ects i n di fferent stat es? x Does the use of dental heal th servi ces (e.g. whet her a person visited a denti st i n the l ast year) vary acr oss ar eas? To what e xt ent ar ea ny di fferences bet ween ar eas attribut abl e tob et ween-ar ea di fferences i n the pr ovi si on of subsidi sed m ic co m positi on of servi ces or di ff erences i n the dem ographi c and soci o-econo resi dents? Population-Averaged and Cluster-Specific Effects .................................... 17 C7.3.1 C7.3.2 C7.3.3 Marginal model for clustered binary data ............................................ 18 Interpretation of coefficients from random effects and marginal models ..... 19 Example: Comparison of marginal and random intercept models fitted to the US election data.......................................................................... 23 C7.4 Predicted Probabilities from a Multilevel Model ...................................... 24 C7.5 A Two-level Random Slope Model ........................................................ 27 C7.5.1 C7.5.2 C7.5.3 A random slope logit model ............................................................ 27 Example: Allowing the relationship between income and voting intentions in the US to vary across states ............................................................ 28 Two random coefficients: Allowing income and urban-rural differentials in voting intentions to vary across states ............................................... 32 1 With many thanks to Rebecca Pillinger, George Leckie, Kelvyn Jones and Harvey Goldstein for comments on earlier drafts. Centre for Multilevel Modelling, 2009 Comparison of estimation procedures ................................................ 39 Some practical guidelines on the choice between estimation procedures ..... 40 I n Modul e 6 we saw ho w m ultipl e regr essi on m odelsf or conti nuous res pons es can be gener alised to handl e bi nary respons es. At the end of them odul e ( C6. 8), we then consider ed m odel s for grouped or clustered bi nary dat a wher e the respons e vari abl e isa pr oporti on and the expl anatory vari abl es ar e defi ned at the gr oup l evel. T h e appli cation of these m odels was ill ustrat ed i n an anal ysiso f the pr oporti on of vot ers in each state i ntendi ng to vot e for George Bus h, i ncl udi ng as pr edi ctors the pr oporti on of non- whi te respondents i n a state and the proporti on who reported regul ar attendance at reli gi ous servi ces. Pre-requisites box x A random intercept model with a level 2 explanatory variable .................. 35 Cross-level interactions ................................................................. 36 Estimation of Binary Response Models .................................................. 39 C7.7.1 C7.7.2 Concepts Fiona Steele1 Centre for Mul til evel Modelli ng Adding Level 2 Explanatory Variables: Contextual Effects .......................... 35 C7.6.1 C7.6.2 1 I n bot h of the above exa m ples, the study popul ati ons have a t wo-l evel hi erarchi cal structur e wi t h i ndi vidual s at l evel 1 and ar eas at l evel 2, but struct ur es can have m ore than t wo l evel s and m ay be non- hi erarchi cal (see Modul e 4). I n thi s m odul e, Centre for Multilevel Modelling, 2009 2 Module 7 (Concepts): Multilevel Models for Binary Responses Introduction Module 7 (Concepts): Multilevel Models for Binary Responses Introduction as i n Modul e 5 for conti nuous res pons es, we consi der onl y model s for two-l evel hi erarchi cal structur es. x Type of regi on of residence (0 = rural, 1 = urban) x Mari tal status (1 = currentl y m arri ed or cohabi ti ng, 2 = wi do wed or di vorced, 3 = not currentl y li vi ng wi th a partner and neverm arried) x Fr equency of attendance at religi ous services (0 = l ess than weekl y or never, 1 = weekl y orm ore) The aim of thi s m odul e is to bri ng together m ultilevel m odels for conti nuous respons es (Modul e 5) and si ngl e-l evel m odels for bi nary responses ( Modul e 6). W e shall see that m any of the extensi ons to the basi c m ultil evel m odel i ntroduced i n Modul e 5 – for exa m ple rando m sl opes and cont extual eff ects – appl y al so to bi nary respons es. H o wever, ther e are so m eim portant new issues to consi der i n the i nter pr etati on and estim ati on of m ultil evel bi nary respons em odels. and one state-levele xpl anatoryv ari abl e, cal cul ated by aggr egati ng an i ndi vi duall evel vari abl e gi vi ng the frequency of attendance at reli gi ous servi ces: Introduction to the Example Dataset x W ewi ll illustrate m e t hods for anal ysi ng bi nary respons es usi ng dat a fro m the 2004 Nati onal Annenberg El ecti on Study ( NAES04), a US survey desi gned to track the dyna m ics of publi co pi ni ono v er the 2004 presi denti al c a m pai gn. S ee http: //www. annenbergpubli cpol i cycent er. org for further details of the NAES. C7.1 Two-level Random Intercept Model for Binary Responses I n thi s m o dul e (as i nM o dul e 6) w e anal yse data from the Nati onal Rolli ng Cr ossSecti on of N A ES04. T h e respons e vari abl e for our anal ysis is bas ed on voti ng i ntenti ons i n the 2004 gener al el ecti on (vari abl e cRC03), whi ch was asked of respondents i ntervi ewed bet ween 7 October 2003a n d 27 January 2004. T he questi on was wor ded as foll ows: x Thinking about the general election for president in November 2004, if that election were held today, would you vote for George W. Bush or the Democratic candidate? C7.1.1 Generalised linear random intercept model Consi der a t wo-l evel structur e wher e a total of ni ndi vi dual s (at level 1) ar en est ed wi thi n Jgr oups (at l evel 2) wi th nji ndi vi dual s i n group j. Thr oughout thi s m odul e we us e ‘group’ as a gener al term for any level 2 uni t, e. g. an area or a school. W e denot e by yijt he respons e for i ndi vi dual ii n gr oup j, and by xij an i ndi vi dual -l evel expl anat ory vari abl e. R e c all fro m C5. 2, equati on (5. 4), the rando m i ntercept m odel for conti nuous y: y ij The response opti ons wer e: Bus h, De m ocrat, Ot her, Woul d not vot e, or Depends. A sm all num ber of res pondents r eported that they di d not kno w or ref used to ans wer theq u esti on. Don’t knows and refusal s wer ee x cl uded fro m the anal ysi s, and the rem aini ng categori es wer e co m bined to obtai n a bi nary vari abl e coded 1 for Bush and 0 other wise. Pr oporti on of respondents who attend reli gious servi ces at l east once a week E 0 E 1x ij u j eij (7.1) wher e theg r oup effects or l evel 2 resi dual s uja n d the l evel 1 resi dual s eija r e assu m ed to be i ndependent and to foll ow norm al distri buti ons wi th zerom e ans: u j ~ N(0,V u2 ) and eij ~ N(0,V e2 ) . I n Modul e 6 we anal ysed data fro m threes t at es. W eno w extend the anal ysis sa m ple toi ncl ude all 49 states i n thes t udy, cont ai ni ng a total of 1 4, 169 respondents. W ecan al so expr ess the m odel in term sof the m ean or expected valueof yijf or an : i ndi vi dual in group j and wi th value xij on x W econsi der si x individual-level expl anat ory vari abl es: E(y ij | x ij , u j ) x Annual househol d i ncom e, grouped i nto ni ne cat egories (1 = l esst han $10k, 2 = $10- 15K, 3 = $15-25K, 4 = $25- 35K, 5 = $35- 50K, 6 = $50- 75K, 7 = $75100K, 8 =$ 1 00- 150K, 9 = $150k or m o r e). T hi s v ari abl e is treat ed as conti nuous i n all anal yses and is centred around its sam plem ean of 5.23 x Sex (0 =m al e, 1 = fem ale) x Age i n years (m ean centred) Centre for Multilevel Modelling, 2009 E 0 E1x ij u j . (7.2) For a bi nary response yij, we have E( yij| xij, uj) = ij= Pr( yij= 1) and a generalised linear random intercept modelf or the dependency of t he respons e pr obability ij on xij i s written: F 1(S ij ) E 0 E 1x ij u j (7. 3) wher e F-1 (“ Fi nverse”) is the link functi on, taken to be the inverse cum ulati ve di stri buti on functi ono f a known di stri buti on (seeC 6. 3. 1). I n Modul e 6, we consi der edt hr ee li nk functi ons: the l ogit, probi t and co m ple m entary l og-l og( cl og- 3 Centre for Multilevel Modelling, 2009 4 Module 7 (Concepts): Multilevel Models for Binary Responses C7.1 Two-level Random Intercept Model for Binary Responses Module 7 (Concepts): Multilevel Models for Binary Responses C7.1 Two-level Random Intercept Model for Binary Responses l og) functions. Her ew e wi ll focus on thel ogi t li nk, wi t h so m e di scussi on of the pr obi t, but everythi ng we sayf or the logi t appli es equall y to the ot her li nk functi ons. than the over all i ntercept dependi ng on wh et her uji s gr eat er or l ess than zer o. As m )eff ect, gr oup i n the conti nuous respons e case, we ref er to ujas the group (rando resi dual, or l evel 2 resi dual. The vari ance of the int ercepts acr oss gr oups is , var(u j ) V u2 , whi ch is ref erred toa s the bet ween- group vari ance adj ustedf or x The key poi nt to not ea bout (7. 3) is that, al though thel eft hand si de is a nonli near transf orm tai on of ij, t he ri ght hand si de takes the sa m eform as that of (7. 2) for conti nuous y, i.e. it isl i near i n term sof the par a m eters E 0 a nd E 1 a nd the l evel 2 resi dual s uj. Ther ef ore thi s sim pl e random i ntercept m odel for bi nary yc a n be ext ended in the sa m e ways that we consi der ed i nM o dul e 5f or continuous y, i ncl udi ng the addi ti on of further expl anatory vari abl es defi ned at l evel 1 or 2, cr oss-l evel i nteracti ons, and rando m slopes (coeffi ci ents). C7.1.2 Random intercept logit model I n a l ogi tm odel F-1( ij) is the l og-odds that y= 1 (see C6. 3. 2), so (7. 3) beco m es § S ij log¨ ¨1 S ij © · ¸ ¸ ¹ E 0 E 1x ij u j l evel 2 variance.) W ecan obtai n estim a t es of ujt hat can be pl otted wi th confi dence i nterval s to see whi ch gr oups ar e si gnifi cantl y bel ow or above the aver age of zer o (a cater pill ar pl ot). These estim ates ar e i nterpreted i n the sa m eway as for conti nuous r es pons e m odels (see C5. 1. 2 and C5. 2. 2); the onl y di fference is that i n a l ogi t m o del they repr esent gr oup eff ects on the log- odds scal e. I n anal ysi ng m ultil evel data, we ar e often i nterested i n the am ount of vari ati on that can be attri buted to the di fferent l evel s i n the dat a structur e and the ext ent to whi ch vari ati on at a gi ven l evel can bee x pl ai nedb y expl anat ory vari abl es. I n Modul e 5 (C5. 1. 1) we m et the variance partition coefficientw h i ch m easures the pr oporti on of the total vari ance that is duet o di fferences bet ween gr oups. Ther e is no uni que way of d efi ni ng a VPC for bi nary data, but we s hall consi der one appr oach in C7. 2. 4. ( The pr obl m e is anal ogous to the di ffi culty i n defi ning R 2f or bi nary data - see C6. 4.) (7.4) wher e u j ~ N(0,V u2 ) . Interpretation of E 0 and E 1 E 0 i s i nter pr eted as the log-oddst hat y= 1 when x= 0 and u= 0 andi s ref erredt o as the overall intercepti n the linear rel ations hi p between the log- odds and x . If we take the exponenti al of E 0 , exp( E 0 ), we obtai n the oddst hat y =1 for x= 0 and u= 0. As i n the si ngl e-l evel m odel, E 1 i s t he eff ect o f a 1- uni t change in xo n thel ogodds that y= 1, but i t is no wt he eff ect of xa f t er adj usti ng for (or hol di ng constant) the gr oup effect u . I f we ar e hol di ng uc onstant, then we ar e l ooki ng at the eff ect of xf or i ndi vi dual s wi thi n the sam egr oup so E 1 i s us ually ref erred to as a cluster-specific effect. I n C7.3 we wi ll co m pare thi s cl uster-specifi c effect wi t h the eff ect o f xa v er agi ng acr oss groups (the population-average effect) . T h es e eff ects ar ee qual for a m ultil evel conti nuous response m odel, so that i n Modul e 5 we m ade no di sti ncti on bet ween the m ,but they wi ll not be equal for a gener alised li nearm ultil evel m odel (unl ess V u2 0 ). As i n a singl e-l evel l ogit m odel, exp( E 1 ) can be i nterpr et ed as an odds rati o, co m pari ng the odds that y= 1 fort wo i ndi vidual s (i n the sa m egroup) wi t h x - val ues spaced 1 uni t apart. Interpretation of uj W hile E 0 i s the over all i ntercept in the li near rel ati onshi p bet ween the l og-odds and x , thei nt ercept for a gi veng r oup ji s E 0 u j w hi ch wi ll be hi gher or l ower Centre for Multilevel Modelling, 2009 the bet ween- group resi dual vari ance, or s m i ply the l evel 2 resi dual vari ance. ( Qui te often ‘resi dual ’ is o m itted and we say ‘l evel 2 vari ance’, but re m e mber that if the m odel cont ai ns e x pl anat ory vari abl es then V u2 i s a l ways the unexplained 5 Predicted response probabilities As i n the singl e-l evel case, we can re- organi se (7.4) to obtai n ane x pr essi onf or the respons e probability: S ij exp( E 0 E 1x ij u j ) 1 exp( E 0 E 1x ij u j ) . (7.5) (See equati on (6.10) i n C6. 3. 2f or the si ngl e-l evel v er si on, i.e. wi t hout gr oup eff ects.) W ecan cal cul ate the pr edi cted res pons e probability for i ndi vi dual ii n gr oup jb y substituti ng the estim ates of E 0 , E 1 a n d ujo bt ai ned fro m the fitted m odel a s foll ows: Sˆij exp( Eˆ0 Eˆ1x ij uˆ j ) . 1 exp( Eˆ0 Eˆ1x ij uˆ j ) W e can al so m ake predi cti ons for ‘i deal’o r ‘typi cal’ i ndi vi dual s wi t h specifi c co m binati ons of -x val ues, but we al so need to m akea deci si ono n what val ue to substitute for uj. W ewi ll di scuss predi cted pr obabilities i n C7. 4. Centre for Multilevel Modelling, 2009 6 Module 7 (Concepts): Multilevel Models for Binary Responses C7.1 Two-level Random Intercept Model for Binary Responses Module 7 (Concepts): Multilevel Models for Binary Responses C7.1 Two-level Random Intercept Model for Binary Responses C7.1.3 W etherefor e concl ude that ther e is si gnifi cant variati on bet ween states i n the pr oporti on who i ntend to vote for Bush. Example: Between-state variation in voting intentions in the US W eill ustrate the appli cati on and i nter pret ati on of t h e rando m i ntercept l ogi t m odel (7.4) i n an anal ysis of voti ng i ntentions i n the 2004 US gener al el ecti on. A t wo-l evel m odel is us ed to al l ow for correl ati on bet ween voti ng i ntenti ons of i ndi vi dual sl i vi ng i n the sa m estat e, and to expl ore the ext ent o f bet ween-state vari ati on i n voti ng i ntenti ons. Null model (without explanatory variables) Tabl e 7. 1 sho ws the res ul ts fro m f i tti ng a mul til evel l ogi t m odel for the pr obability of voti ng for Bus h wi th state rando m eff ects but noe x pl anat ory vari abl es. Thi s ‘null’ m odel is so m etim es ref erred to as a vari ance co m ponentsm odel. The odds of voti ng Bus h for an ‘average’ state ( wi th uj= 0) ar ee s tim ateda s exp(- 0.107) = 0. 90, and the correspondi ng probability is 0. 9/( 1+0. 9) = 0.47. Table 7.1. Multilevel logit model for voting Bush, with state effects, US 2004 Parameter E 0 (Constant) V u2 (Between-state variance) Estimate -0.107 Standard error 0.049 0.091 0.023 The bet ween-state vari ance i n the l og- odds of voti ng Bus h is estim ated as 0. 091 wi th a standar d error of 0. 023. T h er e are vari ous ways that we m ight test the si gnifi cance of the bet ween-state vari ance, and thea p pr oaches avail abl e to us depend ont h e al gorith m us edt o fit the m odel. W e di scuss al gorithm s and soft war e i nC 7. 7 and in the Techni cal Appendi x. I deall y we woul d us e a li keli hood rati o test (as i n the conti nuous r es pons e case), but t hi s opti on is onl y avail abl e when m axim u m li kelihood estim ati on is used. Because the estim ates i n Tabl e 7. 1 wer e obt ai ned usi ng a quasi-li keli hood procedur e2, we wi ll use a W ald test. The W ald test was described i n C6.5. 5 for testi ng coeffici ents i n as i ngl e-l evel m odel, but it can be us ed tot est hypothes es about any m odel par a m eter. W hen us ed to test a hypot hesi s about a vari ance para m et er (e.g. the bet ween-state vari ance), the test isc r ude because it depends ont he questi onabl e assu m pti on that the o e vari ance estim ate isn orm ally di stri buted.3 N e v erthel ess, it will gi ve us s m i ndi cati on of the strengt h of thee vi dence for state effects. The W ald test statisti c is the square of the Z-rati o, i.e.( 0. 091/0.023) 2= 15.65 whi ch isc o m paredwi t h a chi -squar ed di stri bution on 1 degr ee of freedo m ,gi ving a p- val ue l ess than0 . 001. 2 Second order penalized quasi-likelihood (PQL2) – see C7.7 and the Technical Appendix for details. We are referring here to the sampling distribution of the estimated variance. Imagine taking repeated samples of respondents within states, and fitting a multilevel logit model to each sample. You will get a different estimate of the between-state variance each time. The distribution of this variance estimate across samples is the sampling distribution which, in a Wald test, is assumed normal. The sampling distribution of a variance estimate is in fact positively skewed (the right tail of the distribution is longer) because variances must be greater than zero. The Wald test performs particularly poorly when the level 2 variance estimate is close to its boundary of zero. 3 Centre for Multilevel Modelling, 2009 7 Anot her issue to consi der when testi ng vari ance par am eters is that vari ances ar e by defi nition non- negati ve. The n ul l hypothesi s is that V u2 0 , but the al ter nati ve hypot hesi si s one-si ded ( V u2 ! 0 ) rather than t wo-si ded ( V u2 z 0 ). One suggested appr oach to the pr obl m e is to hal ve the p- val ue obtai ned from co m paring the li keli hood rati o statisti c wi t h a chi-squar ed di stri bution ( s ee Sni j ders and Bosker (1999, Secti on 6. 2) for a di scussi on). N ot e that the above appli es tot ests of vari ance para m eters i n m ultil evel m odels for any type of outco m evari abl e, not just bi nary y. V u2 i s the between-statev ari ance in the l og- odds of voting Bus h, but it is di ffi cul t to assess the si ze of the state effects when usi ng thel og- odds scal e. I nstead we can cal cul ate pr edi cted pr obabiliti es of voting Bus h, usi ng (7.5) with no x - vari abl e, assu m ing different val ues for the state effect u.j W e h ave al ready cal cul at ed the pr edi cted pr obabilityf or an ‘aver age’ state wi t h uj= 0. Under the assu m pti on that ujf oll ow an orm al di stri buti on, w e woul de x pect appr oxim ately 95% of states to have a val ue of ujwi thi n 2 standar d devi ati ons of the m ean of zer o, i.e. bet ween appr oxim atel y 2Vˆu 2 0.091 0.603 a n d +0. 603. T h is type of i nt erval is so m etim esc al l ed a coverage interval. Substituti ng i n( 7. 5) these val ues for uja nd our estim a t e for E 0 from Tabl e 7.1 we obtain the foll owi ng predi cti ons. For a state 2 standar d devi ati ons bel ow the m ean: exp(0.107 0.603) Sˆ 0.33 1 exp(0.107 0.603) For a state 2 standar d devi ati ons above the m ean: exp(0.107 0.603) Sˆ 0.62 1 exp(0.107 0.603) W ewoul d ther ef or e expect thep r oporti on voti ng Bus h to li e bet ween 0.33 and 0. 62 i n them iddl e 95% of states. m t he null model . W eno w exa m ine estim ates of the state effects, û j , obtai ned fro Fi gur e 7. 1 is a ‘cater pi ll ar pl ot’ with the state eff ects shown i n rank or der toget her wi th 95% confi dence i nterval s. Thi s pl ot is i nter pr eted i n the sa m eway as for a conti nuous r es pons e m odel (seeC 5. 1. 2), but the l evel 2 resi dual s ar e now state eff ects on the l og- odds scal e. As bef or e, a state whos e confi dence i nterval does not overl ap the li ne at zer o (repr esenti ng the m ean log- odds of voti ng Bush acr oss all states) is sai d to di ffer si gnifi cantl y from the average at the5 % l evel. I n thi s case, m any of the confi dence i nterval s incl ude zero and there ar e no obvi ous outli ers wi th es peci ally l arge û j . T h e threes t at es wi th the l owest pr obability of voti ng Bush (l argest negati ve val ues of û j ) ar e W ashi ngt on DC, Rhode Isl and and Massachus etts, whil et h e three wi t h the hi ghest response pr obability( l ar gest Centre for Multilevel Modelling, 2009 8 This document is only the first few pages of the full version. To see the complete document please go to learning materials and register: http://www.cmm.bris.ac.uk/lemma The course is completely free. We ask for a few details about yourself for our research purposes only. We will not give any details to any other organisation unless it is with your express permission.