Problems with infinite solutions in logistic regression Ian White MRC Biostatistics Unit, Cambridge UK Stata Users’ Group London, 12th September 2006 1 Introduction • I teach logistic regression for the analysis of case-control studies to Epidemiology Master’s students, using Stata • I stress how to work out degrees of freedom – e.g. if E has 2 levels and M has 4 levels then you get 3 d.f. for testing the E*M interaction • Our practical uses data on 244 cases of leprosy and 1027 controls – previous BCG vaccination is the exposure of interest – level of schooling is a possible effect modifier – in what follows I’m ignoring other confounders 2 Leprosy data -> tabulation of d outcome 0=control, | 1=case | Freq. Percent Cum. ------------+----------------------------------0 | 1,027 80.80 80.80 1 | 244 19.20 100.00 ------------+----------------------------------Total | 1,271 100.00 -> tabulation of bcg exposure BCG scar | Freq. Percent Cum. ------------+----------------------------------Absent | 743 58.46 58.46 Present | 528 41.54 100.00 ------------+----------------------------------Total | 1,271 100.00 -> tabulation of school possible effect modifier Schooling | Freq. Percent Cum. ------------+----------------------------------0 | 282 22.19 22.19 1 | 606 47.68 69.87 2 | 350 27.54 97.40 3 | 33 2.60 100.00 ------------+----------------------------------Total | 1,271 100.00 3 Main effects model . xi: logistic d i.bcg i.school i.bcg _Ibcg_0-1 i.school _Ischool_0-3 Logistic regression Log likelihood = -572.86093 (naturally coded; _Ibcg_0 omitted) (naturally coded; _Ischool_0 omitted) Number of obs LR chi2(4) Prob > chi2 Pseudo R2 = = = = 1271 97.50 0.0000 0.0784 -----------------------------------------------------------------------------d | Odds Ratio Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------_Ibcg_1 | .2908624 .0523636 -6.86 0.000 .204384 .4139314 _Ischool_1 | .7035071 .1197049 -2.07 0.039 .5040026 .9819836 _Ischool_2 | .4029998 .0888644 -4.12 0.000 .2615825 .6208704 _Ischool_3 | .09077 .0933769 -2.33 0.020 .0120863 .6816944 -----------------------------------------------------------------------------. estimates store main 4 Interaction model . xi: logistic d i.bcg*i.school i.bcg _Ibcg_0-1 i.school _Ischool_0-3 i.bcg*i.school _IbcgXsch_#_# Logistic regression Log likelihood = -570.90012 (naturally coded; _Ibcg_0 omitted) (naturally coded; _Ischool_0 omitted) (coded as above) Number of obs LR chi2(7) Prob > chi2 Pseudo R2 = = = = 1271 101.43 0.0000 0.0816 -----------------------------------------------------------------------------d | Odds Ratio Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------_Ibcg_1 | .2248804 .0955358 -3.51 0.000 .0977993 .5170913 _Ischool_1 | .6626409 .1234771 -2.21 0.027 .4599012 .9547549 _Ischool_2 | .4116581 .1027612 -3.56 0.000 .2523791 .6714598 _Ischool_3 | 1.28e-08 1.42e-08 -16.41 0.000 1.46e-09 1.12e-07 _IbcgXsch_~1 | 1.448862 .7046411 0.76 0.446 .5585377 3.758385 _IbcgXsch_~2 | 1.086848 .6226504 0.15 0.884 .3536056 3.340553 _IbcgXsch_~3 | 4.25e+07 . . . . . -----------------------------------------------------------------------------Note: 17 failures and 0 successes completely determined. . estimates store inter 5 The problem . table bcg school, by(d) ---------------------------------0=control | , 1=case | and BCG | Schooling scar | 0 1 2 3 ----------+----------------------0 | Absent | 141 257 129 17 Present | 57 229 182 15 ----------+----------------------1 | Absent | 77 93 29 Present | 7 27 10 1 ---------------------------------6 LR test . xi: logistic d i.bcg i.school LR chi2(4) = 97.50 Log likelihood = -572.86093 . estimates store main . xi: logistic d i.bcg*i.school LR chi2(7) = 101.43 Log likelihood = -570.90012 . estimates store inter . lrtest main inter Likelihood-ratio test (Assumption: main nested in inter) LR chi2(2) = Prob > chi2 = 3.92 0.1407 7 What is Stata doing? (guess) • Recognises the information matrix is singular • Hence reduces model df by 1 • In other situations Stata drops observations – if a single variable perfectly predicts success/failure – this happens if the problematic cell doesn’t occur in a reference category – then Stata refuses to perform lrtest, but we can force it to do so – Stata still gets df=2; can use df(3) option 8 . gen bcgrev=1-bcg . xi: logistic d i.bcgrev*i.school i.bcgrev _Ibcgrev_0-1 i.school _Ischool_0-3 i.bcg~v*i.sch~l _IbcgXsch_#_# (naturally coded; _Ibcgrev_0 omitted) (naturally coded; _Ischool_0 omitted) (coded as above) note: _IbcgXsch_1_3 != 0 predicts failure perfectly _IbcgXsch_1_3 dropped and 17 obs not used Logistic regression Log likelihood = -570.90012 Number of obs LR chi2(6) Prob > chi2 Pseudo R2 = = = = 1254 94.12 0.0000 0.0762 -----------------------------------------------------------------------------d | Odds Ratio Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------_Ibcgrev_1 | 4.446809 1.889136 3.51 0.000 1.933895 10.22502 _Ischool_1 | .9600749 .4312915 -0.09 0.928 .3980361 2.315729 _Ischool_2 | .4474097 .2307071 -1.56 0.119 .1628482 1.229215 _Ischool_3 | .5428571 .6013396 -0.55 0.581 .0619132 4.75979 _IbcgXsch_~1 | .6901971 .3356713 -0.76 0.446 .2660717 1.79039 _IbcgXsch_~2 | .920092 .5271167 -0.15 0.884 .2993516 2.82801 -----------------------------------------------------------------------------. est store interrev . lrtest interrev main observations differ: 1254 vs. 1271 r(498); . lrtest interrev main, force Likelihood-ratio test (Assumption: main nested in interrev) LR chi2(2) = Prob > chi2 = 3.92 0.1407 9 What’s right? • Zero cell suggests small sample so asymptotic c2 distribution may be inappropriate for LRT – true in this case: have a bcg*school category with only 1 observation – but I’m going to demonstrate the same problem in hypothetical example with expected cell counts > 3 but a zero observed cell count • Could combine or drop cells to get rid of zeroes – but the cell with zeroes may carry information • Problems with testing boundary values are well known – e.g. LRT for testing zero variance component isn’t c21 – but here the point estimate, not the null value, is on the boundary 10 Example to explain why LRT makes some sense . tab x y, chi2 exact | y x | 0 1 | Total -----------+----------------------+---------0 | 10 20 | 30 1 | 0 10 | 10 -----------+----------------------+---------Total | 10 30 | 40 Pearson chi2(1) = Fisher's exact = 1-sided Fisher's exact = 4.4444 Pr = 0.035 0.043 0.035 11 -20 -18 Model: logit P(y=1|x) = a + bx Difference in log lik = 3.4 -26 -24 -22 LRT = 6.8 on 0 df? 0 5 beta 10 12 Example to explore correct df using Pearson / Fisher as gold standard . tab x y, chi2 exact | y x | 0 1 | Total -----------+----------------------+---------1 | 6 0 | 6 2 | 3 6 | 9 3 | 3 6 | 9 -----------+----------------------+---------Total | 12 12 | 24 Pearson chi2(2) = Fisher's exact = 8.0000 Pr = 0.018 0.029 • All expected counts ≥3 13 • Don’t want to drop or merge category 1 - contains the evidence for association! . xi: logistic y i.x i.x _Ix_1-3 Logistic regression Log likelihood = -11.457255 (naturally coded; _Ix_1 omitted) Number of obs LR chi2(2) Prob > chi2 Pseudo R2 = = = = 24 10.36 0.0056 0.3113 -----------------------------------------------------------------------------y | Odds Ratio Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------_Ix_2 | 1.61e+08 1.61e+08 18.90 0.000 2.27e+07 1.14e+09 _Ix_3 | 1.61e+08 . . . . . -----------------------------------------------------------------------------Note: 6 failures and 0 successes completely determined. . est store x . xi: logistic y Logistic regression Log likelihood = -16.635532 Number of obs LR chi2(0) Prob > chi2 Pseudo R2 = = = = 24 0.00 . 0.0000 -----------------------------------------------------------------------------y | Odds Ratio Std. Err. z P>|z| [95% Conf. Interval] -------------+--------------------------------------------------------------------------------------------------------------------------------------------14 . est store null LRT . xi: logistic y i.x Log likelihood = -11.457255 . est store x . xi: logistic y Log likelihood = -16.635532 . est store null . lrtest x null Likelihood-ratio test (Assumption: null nested in x) LR chi2(1) = Prob > chi2 = 10.36 0.0013 15 Comparison of tests | y x | 0 1 | Total -----------+----------------------+---------1 | 6 0 | 6 2 | 3 6 | 9 3 | 3 6 | 9 -----------+----------------------+---------Total | 12 12 | 24 Pearson chi2(2) = Fisher's exact = LR chi2(1) = 8.0000 P P 10.36 P (using 2df: P Clearly LRT isn’t great. But 1df is even worse than 2df = = = = 0.018 0.029 0.0013 0.0056) 16 Note • In this example, we could use Pearson / Fisher as gold standard. • Can’t do this in more complex examples (e.g. adjust for several covariates). 17 My proposal for Stata • lrtest appears to adjust df for infinite parameter estimates: it should not • Model df should be incremented to allow for any variables dropped because they perfectly predict success/failure – Don’t need to increment log lik as it is 0 for the cases dropped • Can the ad hoc handling of zeroes by xi:logistic be improved? 18 Conclusions for statisticians • Must remember the c2 approximation is still poor for these LRTs – typically anti-conservative? (Kuss, 2002) • Performance of LRT can be improved by using penalised likelihood (Firth, 1993; Bull, 2006) - like a mildly informative prior – worth using routinely? • Gold standard: Bayes or exact logistic regression (logXact)? 19 The end 20 Output for example with 2-level x . logit y x Log likelihood = -19.095425 -----------------------------------------------------------------------------y | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------_cons | .6931472 .3872983 1.79 0.074 -.0659436 1.452238 ------------------------------------------------------------------------------ . estimates store x . logit y Log likelihood = -22.493406 -----------------------------------------------------------------------------y | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------_cons | 1.098612 .3651484 3.01 0.003 .3829346 1.81429 -----------------------------------------------------------------------------. estimates store null . lrtest x null df(unrestricted) = df(restricted) = 1 r(498); . lrtest x null, force df(1) Likelihood-ratio test (Assumption: null nested in x) LR chi2(1) = Prob > chi2 = 6.80 0.0091 21