Computational North-Holland Statistics & Data Analysis 1.5 (1993) l-11 Bartlett correction factors in logistic regression models Lawrence H. Moulton * University of Michigan, Ann Arbor, MI, USA Lisa A. Weissfeld University of Pittsburgh, Pittsburgh, PA, USA Roy T. St. Laurent ** Unicersity of Michigan, Ann Arbor, MI, USA Received March 1991 Revised July 1991 Abstract: Bartlett correction factors for likelihood ratio tests of parameters in conditional and unconditional logistic regression models are calculated. The resulting tests are compared to the Wald, likelihood ratio, and score tests, and a test proposed by Moolgavkar and Venzon in Modern Statistical Methods in Chronic Disease Epidemiology, (Wiley, New York, 1986). Keywords: Conditional logistic regression; Likelihood ratio test; Score test; Wald test. 1. Introduction Epidemiologic case-control studies frequently have small sample sizes due to limited numbers of cases. It is common in such studies for logistic regression analyses to be employed, using maximum likelihood estimation and likelihood ratio, score, or Wald tests of parameter effects. However, these procedures are only asymptotically valid, and depend on the accuracy of the large sample approximations employed. The Wald test is attractive due to the ease with which it may be constructed. However, it is not invariant to parameter transformations. In addition, as Jennings (1986) illustrates it can exhibit unreliable behavior in that for a given sample size, parameter estimates farther from the null value may be judged less significant than those closer to the null value (see also Hauck and Donner, 1977). Correspondence to: Roy St. Laurent, Ph.D., Department of Biostatistics, School of Public Health, University of Michigan, 109 South Observatory, Ann Arbor, MI 48109-2029, USA. * Now with the Department of International Health at The Johns Hopkins University. * * Authorship is randomly ordered. 0167-9473/93/$06.00 0 1993 - Elsevier Science Publishers B.V. All rights reserved 2 L.H. Moulton et al. / Bartlett correction factors Numerous methods for drawing inferences about the parameters in the logistic regression model have been developed for use in small sample situations. Cox (1970) has described an exact approach to inference about a single parameter of the model. This approach has been extended and rendered more feasible via efficient algorithms for calculating joint and conditional distributions of the sufficient statistics (Bayer and Cox, 1979; Tritchler, 1984; Kirji, Mehta and Patel, 1987). However, the method still may take hours of computation, and current software permits only binary covariates to be included in the regression model. Moolgavkar and Venzon (1986) have proposed a small sample correction to Wald-based confidence regions motivated by differential geometric considerations. These regions, based on transformations of the standard Wald regions, are obtained by solving a system of quasi-linear differential equations and may be inverted in the usual fashion to obtain a test of hypothesis. Another approach to parameter inference is to improve upon the small sample performance of the likelihood ratio test. Bartlett (1937) proposed multiplying the likelihood ratio test statistic by a correction factor so that its distribution under the null hypothesis would more closely follow that of a x2 random variable. This approach, generalized by Lawley (1956), consists of matching the first moment of the test statistic to that of a x2 random variable. The effect is to bring the moments of the corrected statistic to within O(n-*) of those of the appropriate chi-squared distribution (Lawley, 1956). In this paper we give explicit expressions for the Bartlett correction factor in tests of location parameters in conditional and unconditional logistic regression models. In addition, we evaluate the performance of the resulting corrected likelihood ratio test statistic and compare its performance to the usual likelihood ratio, Wald, and score test statistics as well as the Moolgavkar-Venzon modified Wald procedure. 2. Calculation of Bartlett correction factors 2.1. Introduction Let L(p) denote the log lik_elihood function for the p-vector p based on II observations, and let L and L, denote the value of the log likelihood evaluated at the true parameter vector /3, and the maximum likelihood estimate a, respectively. Lawley (1956) showed that under mild regularity conditions 2E(L, -L) =p + lp + O(n-*>, where the correction term cp is of order n-l and may be written (Cordeiro, 1983) Ep = KrSKtU( $Krst,- Kg; + Kzs"'} - KrsKIUKuW 3 L.H. Moulton et al. / Bartlett correctionfactors Here all of the indices run from 1 to p and the usual summation applied. The K’S are defined in terms of L(p) as convention is and so on, where p, is the rth component of p. In this notation, the information matrix A = { - K,~) and its inverse is denoted A - ’ = ( -Key). Suppose that PT = (pr,r,. . . , &, &, . . . , &_J and under the null hypothesis p2,r = * ** = &,_, = 0. Let L, be the maximized log likelihood under the null hypothesis, and lq be the correction term calculated und_er the null model. Then dividing the usual likelihood ratio test statistic 2(L, - LJ by the Bartlett factor 1 + (eP - eq)/(p - q) will improve the approximation of its distribution by a Xi_, random variable under the null model. Generally the correction factor will be a function of j3 and should be estimated from the null model via maximum likelihood. 2.2. Unconditional logistic regression The unconditional logistic regression model may be written i=l logit P( yi = 1 I xi) =x:/3, 7* * * 94 (1) where xi is the p-vector of covariates for the ith observation, the yi are independent Bernoulli random variables, and p is a p-dimensional parameter vector. The log likelihood is then L(P) = 2 [ Yi log{&/(l - Q} + lo@ - eJ] 9 (2) i=l with Oi= exp( .rTp)/{ 1 + exp( XT@)}, Cordeiro (1983, equation 81, gives an expression for lp applicable generalized linear model. Applying his result to (21, we have Ep = +tr( HZ:) + +I,TF(2Z’3’ + 3Z,ZZ,)Fl,, to any (3) where the n x n matrices F, H, Z, and Z, are given by F = V(l,, - 201, H= 1/(61/-L,), Z =X(XTVX-lXT, and Z, is a diagonal matrix composed of the diagonal elements of Z. Here I/= diag{8,(1 - O,)}and 8 = diag{O,} are n x n matrices and 1, is an n-vector of 1’s. 2.3. Conditional logistic regression Case-control epidemiologic studies often employ matching or stratification in order to increase efficiency and reduce bias. In a regression framework, the preferred analysis of highly stratified data involves conditioning on stratum 4 L.H. Moulton et al. / Bartlett correction factors membership (Breslow et al., 1978). The resulting conditional is then maximized to yield conditional maximum likelihood Consider a case-control study with K matched sets (or stratum consists of one case and R controls, yt = K( R + 11, case in each stratum, so that yOk = 1, and yjk = 0 for j # and j = 0,. . . , R. We consider the logistic model likelihood function estimates. strata), where each j = 0 will denote the 0. Here i = 1,. . ., K logit P( yjk = 1 I xjk) = ak +x$p, where (Ye depends on the values of the matching variables. Conditioning 1: R structure of the stratum leads to elimination of CX~ resulting conditional log likelihood function L(p) = ; 5 Yjk lOg(Pjk) k=l j=O = 2 l”g(POk)’ on the in the (4) k=l Here pjk depends upon the parameter vector p through the pair of transformations rjk = exp(xiT,p) and pj, = 5-jk/C5-mk (notation of Pregibon, 1984). Here xjk is the p-vector of covariates for the jth individual in the kth matched set and rjk is the corresponding odds ratio. The model depends upon /3 only through the odds ratio, therefore xsp need not include an unknown constant, as it is not estimable. The conditional logistic regression model is not a generalized linear model, hence the expressions for the correction term lp developed by Cordeiro (1983) cannot be applied to (4). Since pok + . . . +pRk = 1, R Xjk = Xjk - c pmkxmk, m=O can be thought of as ‘centered’ x’s. The Bartlett adjustment for the conditional model is a function of the covariates only through the ijk’s. For example, the p X p information matrix A, may be written A = { -K,,} = CA,, where A, = &Z..~~~_?~~ijTk =J?~W,~k. Here the (R + 1) xp matrix of centered covariates is and W, is an (R + 1) x (R + 1) diagonal matrix with ~~~ x, = t&k,. . . , &IT, matrix can be on the diagonal, j = 0,. .., R. It follows that the information written A =zTIV%, where W is block diagonal with blocks IV,, . . . , W,, and 2? is the K(R + 1) xp matrix of centered covariates combined over matched sets. The correction term for the one-to-R matched design is where Tk = tr(A,A-‘1, and the (R + 1) X (R + 1) matrix U = cw,<gk A,12?z)(2)Wk. In addition, P =_f(J?TW%-l%T and Pd = diag(pii>, where pii is the ith diagonal element of P, i = 1,. . . , K(R + 1). Pc3) is the matrix whose elements are the cube of the elements of P. A derivation of this result can be found in the Appendix. Note that our calculations are carried out with respect 5 L.H. Moulton et al. / Bartlett correction factors to the number of strata, K. The moments of the resulting statistics thus are corrected to order 0(K2> (Lawley, 1956). It is informative to write (5) as Ep = I-2 4P13 likelihood ratio ++p;,-+p,, where p:, = lnTpdTWPVT’dl,, pi, = l~WPC3)Wl,,, and p4 = l~P~WPdl, - 2(1:+, UIR+,) - CT:. The terms p:,, pt, measure the squared skewness, and p4 measures the kurtosis of the score vector (McCullagh and Cox, 1986). 3. Numerical comparisons 3.1. Study design To assess the agreement between the nominal and actual level of the adjusted likelihood ratio test statistic in unconditional and conditional logistic regression, and to explore the power of the test, a numerical study was conducted comparing the score, likelihood ratio, and Wald tests to the Bartlett corrected likelihood ratio test when testing a single element of p to be equal to zero. In addition, these results were compared to the confidence interval based test of Moolgavkar and Venzon (1986). For each of several sample sizes iz, B replicates of the basic experiment were performed. For each replicate, an n-vector x of independent pseudo-random Uniform (0, 1) deviates was generated. For y1I 10, the exact level and power were calculated eased on a complete enumeration of the possible 2” response vectors y, each weighted by its likelihood. Computing restrictions required simulation of the y vectors from the logistic model when y1> 10. These values were generated from another set of Uniform (0, 1) deviates. The multiplicativecongruential uniform pseudo-random generator of the GAUSSTM system was employed. For each sample size, we report the observed level and power as the mean rejection rate of the null hypothesis over the B replicates. 3.2. Unconditional logistic regression results Here we used the model XT@= PO + plxi, and compared the level and power of the five methods for testing the null hypothesis p1 = 0 at nominal level (Y= 0.05. For this model with p = 2, the Moolgavkar-Venzon limits for p require the simultaneous solution of the following equations: 6 L.H. Moulton et al. / Bartlett correction factors Table 1 Comparison of average rejection rates for five tests, at nominal level (Y= 0.05, of HO : PI = 0 in an unconditional logistic regression model. n = total number of observations. Columns 3-7, average rejection rate X 100. Column 8, maximum standard error across the five averages x 100. Wald MoolgavkarVenzon Score Likelihood ratio Bartlett corrected Maximum std. error PI n 0 10 20 40 0 3.35 4.56 9.00 7.22 6.15 5.68 5.58 5.31 9.01 6.42 5.73 6.35 5.28 5.14 0.27 0.25 0.15 1 10 20 40 0 4.94 12.52 10.96 12.72 16.22 7.42 9.73 14.31 11.76 11.22 15.37 8.44 9.52 14.21 0.28 1.05 0.70 2 10 20 40 0 6.09 30.24 13.97 24.71 39.45 11.01 15.99 34.77 18.15 19.19 37.41 13.35 16.33 35.25 0.79 3.56 1.45 with p(O) = p^, L(p) and oi defined in (1) and (2). Here Z,,, is the (1 - Lu/2)th percentage point of the standard normal distribution and sg = - 1, + 1 for the lower and upper confidence limits respectively. These equations are integrated from t = 0 to 1. The solution to these equations can be obtained numerically using the Adams-Gear algorithm (Moolgavkar and Venzon, 1986). Results are presented for 12= 10, 20, 40, with PO = 0 for all runs. In Table 1 we give the estimates of level, and power for /3r = 1, 2, for each of the five methods. Table 1 also includes the maximum standard error of the mean over the five tests, calculated from the B replicates. For y1= 10, B = 25 replicates were used, while for II = 20 and 40, we took B = 5 replicates of 5000 simulations each. We see that the poor performance of the Wald statistic was markedly improved by the Moolgavkar-Venzon correction. The score test and the Bartlett corrected likelihood ratio test both outperform the likelihood ratio test in the level (pr = 0) runs, the latter being too liberal. Even though the Bartlett correction factor is designed to improve performance in small samples, at the smallest sample size (n = 10) it performed slightly worse than the score test. However, for y1= 20 and 40, rejection rates for the Bartlett corrected likelihood ratio test were closest to the nominal level. The likelihood ratio test and Moolgavkar-Venzon test are both rather liberal, which translates into greater power, as seen in the & = 1 and & = 2 runs, where their rejection rates are all greater than those of the score and Bartlett corrected likelihood ratio test. For these runs, the power for the score and Bartlett corrected likelihood ratio test are very close. Considering the relative ease with which the score test may be calculated, and the similarity between its behavior and that of the Bartlett corrected likelihood ratio test, based on these results it would appear that the score test is the preferred method of testing in this situation. 7 L.H. Moulton et al. / Bartlett correction factors Table 2 Comparison of average rejection rates for five tests, at nominal level (Y= 0.05, of H,, : p, = 0 in a conditional logistic regression model. R = 1, K = number of strata. Columns 3-7, average rejection rate x 100. Column 8, maximum standard error across the five averages X 100. Pl 0 Wald K 5 0 6 7 8 9 0 10 20 40 0 MoolgavkarVenzon Score Likelihood ratio Bartlett corrected Maximum std. error 2.12 3.89 1.00 4.25 6.44 7.63 8.36 8.25 6.97 5.77 1.25 3.00 3.81 3.75 4.06 4.17 4.77 4.82 10.75 9.00 8.19 6.91 6.94 6.71 5.87 5.29 7.50 6.63 6.19 5.44 5.31 5.29 5.13 4.93 0.68 0.64 0.41 0.38 0.32 0.24 0.51 0.14 0 0 0 1 5 6 7 8 9 10 20 40 0 0 0 0 0 0 6.34 21.76 0.96 5.21 8.13 10.23 12.13 12.82 16.23 27.17 1.78 4.33 5.87 6.34 7.08 7.63 12.38 24.71 13.22 11.84 11.48 10.67 11.19 11.34 14.54 25.99 9.43 9.01 8.95 8.70 8.92 9.31 13.14 25.04 0.81 0.77 0.63 0.60 0.44 0.47 1.77 1.44 2 5 6 7 8 9 10 20 40 0 0 0 0 0 0 20.85 65.06 1.16 7.07 12.38 17.60 21.69 24.20 40.18 70.97 3.18 7.96 11.51 13.58 15.51 17.31 33.69 68.38 19.57 19.25 20.16 20.66 22.44 23.64 37.27 69.86 14.45 15.31 16.37 17.54 18.72 20.27 34.95 68.82 1.40 1.40 1.37 1.44 1.19 1.30 6.13 3.20 3.3. Conditional logistic regression results Here the Moolgavkar-Venzon related differential equations. integrating confidence interval is obtained by solving two For p = 1, the interval endpoints are found by from t = 0 to 1, with p(O) = p^ and Tj,(t> = exp{xiT,P(t>}. Numerical solutions are obtained using the Adams-Gear algorithm. For the conditional model, the performance of the five methods was compared in the one-to-one matched case-control (R = 1) model with one covariate via a test of the null hypothesis p, = 0 at nominal level (Y= 0.05. Results are presented for K = 5(1)10, 20, 40 matched sets. Table 2 gives the simulation estimates of level and power (for p, = 1, 2) for each of the five 8 L.H. Moulton et al. / Bartlett correction factors methods and the maximum standard error of the mean, across the five tests. For K I 10, B = 25 replicates were used, while for K = 20 and 40, B = 5 replicates were run of 5000 simulations each. In general, the simulation results for the conditional model parallels those of the unconditional model. For small sample sizes, the score and Wald tests are generally too conservative, while the likelihood ratio test and Bartlett corrected likelihood ratio test are too liberal. The Moolgavkar-Venzon procedure performs rather erratically (moving from conservative to liberal as K increases) and poorly overall when compared to most other methods. The rejection rates for the Bartlett corrected likelihood ratio test, with the exception of the K = 6 run, are closest to the nominal level. In the power runs, p, = 1, 2, the Bartlett corrected likelihood ratio test clearly performed better than the score test. The Moolgavkar-Venzon test and likelihood ratio test exhibited the highest power, but this performance should be discounted by their liberal results in the level runs. 4. Discussion For the conditional analysis of one-to-one matched data with a single covariate (p = l), the Bartlett correction for the null hypothesis /3i = 0 has a particularly simple form. Let zjk =xjk -X,k, the stratum mean centered covariate, and let S; be the variance of xjk within the kth stratum. Then & = & = Tf/K and skewness and kurtosis p4 = (f, - 3C*)/K, w h ere q1 and q2 are the standardized of Zjk (j=O )...) R,k=l,..., K), respectively, and C is the coefficient of variation of S; (k = 1, . . . , K). Thus pi = (59: - 3q2 + 9C2)/(12K). In the case of one-to-one matching (R = 11, q, = 0 and E, may be simplified to El = 1 - c @Ok_-Xlkj4 2 ‘Ok - ‘lk)*)* ’ where sums in the numerator and denominator run over k = 1,. . . , K. This expression may also be written in terms of deviations xjk -X,k. In either form, the correction is essentially a measure of the kurtosis of the distribution of the covariates. If in addition, xjk is a binary covariate, then pi = (2m)-’ where m is the number of strata in which xok + xlk (0 I m I K), so that the Bartlett factor becomes 1 + (2m)-‘. Here the likelihood ratio test statistic may be written 2m{log2 + r logr + (1 - r) log(1 - r)}, where r is the proportion of the m strata for which xok # xlk and xok = 1 (so that 1 - r is the proportion for which Xik = 1). Performing calculations similar to those of Frydenberg and Jensen (1989) for this likelihood, we found the order of the correction in this case to be less than O(Kp’). This agrees with their findings that in the case of fully discrete data, the Bartlett correction offers little, if any, improvement. More- L.H. Moulton et al. / Bartlett correction factors 9 over, for up to at least K = 50 strata, the tail-area probability of the Bartlett corrected likelihood ratio test is the same as for that of the score (McNemar) test at size (Y= 0.05. This stands in contrast to our results for the continuous covariate case (Table 2), where the Bartlett corrected likelihood ratio test performs better than the score test. The results of our runs for both the conditional and unconditional models indicate that the computational burden of obtaining the Moolgavkar-Venzon interval may not be worth its modest performance when used for testing. Instead of trying to improve the Wald test, better results are had by improving the likelihood ratio test via the relatively simple Bartlett correction. Many factors may conspire to render even a data set with a large number of observations one in which standard large-sample asymptotics do not provide sufficiently accurate approximations. These include extreme response probabilities, many estimated parameters with respect to the number of observations, sparse corners of the covariate space, and link functions that yield relatively flat likelihood functions. In such situations, it may be prudent to use a Bartlett corrected likelihood ratio test. For the logit link case we have studied here, the score test is also to be preferred over the Wald and likelihood ratio test. Appendix: Derivation of (5) From the definitions established: of pjk and ijk following (4), these identities are easily (A *I) 5 /-Q&k = 0, j=O a (A.2) a R rijk= w - C Pmkxmk’Tmk= - m=O C P,ki,k’FTmk’ m=O Taking second partial derivatives the identities above, gives of the log likelihood (A.3) in (4), and making use of where ijkr is the rth element of the p-vector +Cjk.As the elements of the second partial derivative matrix in (A.41 are non-stochastic, it follows that the right-hand side of (A.41 is -K,,. Hence A= 2 k=l ~~~~~~~~~~ j=O which also may be written A = X’JKf. L.H. Moulton et al. / Bartlett correction factors From (A.41 it also follows that &) = ~,,t, K:) = ~,,t~, the expression for the Bartlett correction may be written Ep = ~K~~K~~K,,~, - Kr’KtUKuW{~KrtuKSUW + ~K,~~K,,~}. and so on. Therefore (A.3 Taking partial derivatives of K,~ from (A.41, and employing the identities (A.lNA.31, the expression for a mixed third partial derivative of L(p) is written a”L(p) K K + ‘St = a&a&a& C k=l R C (A-6) Pjkijkrijks’j.4t’ j=() The expression for a mixed fourth partial derivative of L(p) is obtained from in a similar fashion, yielding K rSt K rStU = d”L(P) ap,ap,ap,ap, = f 5 k=l - pjkijkrijksijktijku j=O 5 k=l @krsAktu +AkrtAksu +AkruAkst), (A-7) where Akrs = Substituting the expressions (A.41, (A.6) and (A.7) into (AS) and employing a fair amount of matrix algebra inspired by McCullagh (1987, page 2181, and equation (11) of Cordeiro (19831, we obtain the matrix form for the Bartlett correction given in (5). CT=()pjk%jkr%jks. References Bartlett, M.S., Properties of sufficiency and statistical tests, Proceedings of the Royal Statistical Society A, 160 (1937) 268-282. Bayer, L., and C. Cox, Algorithm AS142: Exact tests of significance in binary regression models, Appl. Statist., 28 (1979) 319-324. Breslow, N.E., N.E. Day, K.T. Halvorsen, R.L. Prentice and C. Sabai, Estimation of multiple relative risk functions in matched case-control studies, Amer. J. Epidemiol., 108 (1978) 299-307. Cordeiro, G.M., Improved likelihood ratio statistics for generalized linear models, J. Roy. Statist. Sot. B, 45 (1983) 404-413. Cox, D.R., The Analysis of Binary Data (Methuen, London, 1970). Frydenberg, M., and J.L. Jensen, Is the ‘improved likelihood ratio statistic’ really improved in the discrete case?, Biometrika, 76 (1989) 655-661. Hauck, W.W., and A. Donner, Wald’s test as applied to hypotheses in logit analysis, J. Am. Statist. Assoc., 72 (1977) 851-853. Jennings, D.E., Judging inference adequacy in logistic regression, J. Am. Statist. Assoc., 81 (1986) 471-476. Kirji, K.F., C.R. Mehta and N.R. Patel, Computing distributions for exact logistic regression, J. Amer. Statist. Assoc., 82 (1987) 1110-1117. L.H. Moulton et al. / Bartlett correction factors Lawley, D.N., A general method for approximating to the distribution 11 of likelihood ratio criteria, Biomettika, 43 (1956) 295-303. McCullagh, P., Tensor Methods in Statistics (Chapman and Hall, London, 19871. McCullagh, P., and D.R. Cox, Invariants and likelihood ratio statistics, Ann. Statist., 14 (1986) 1419-1430. Moolgavkar, S.H., and D.J. Venzon, Confidence regions for case-control and survival studies with general relative risk functions, in: S.H. Moolgavkar and R.L. Prentice (Eds.), Modern Statistical Methods in Chronic Disease Epidemiology (Wiley, New York, 1986). Pregibon, D., Data analytic methods for matched case-control studies, Biometrics, 40 (1984) 639-651. Tritchler, 709-711. D., An algorithm for exact logistic regression, J. Amer. Statist. Assoc., 79 (1984)