Simple Adjustments to Reduce Bias and Mean Squared Error Associated With Regression-Based Odds Ratio and Relative Risk Estimators

Robert H. Lyles and Ying Guo

Technical Report 09-04, November 2009

Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, 1518 Clifton Road, N.E., Atlanta, Georgia 30322 (telephone: (404) 727-1310; fax: (404) 727-1370; e-mail: rlyles@emory.edu)

Acknowledgements and Funding Support

R.H.L. was supported in part by NIEHS grant 2R01-ES012458-5. Y.G. was partially supported under NIH grant R01-MH079448-01.

Abstract

In most practical situations, maximum likelihood estimators (MLEs) of regression coefficients stemming from standard logistic, Poisson, and Cox proportional hazards models are reasonably reliable in terms of their effective bias and approximate normal sampling distributions. While this generally leads to satisfactory coverage properties for the usual normal theory-based confidence intervals, a straightforward argument demonstrates that standard estimators of odds ratios and relative risks obtained by direct exponentiation are biased upward in an explicitly predictable way. This bias produces a propensity toward misleadingly large effect estimates in practice. We propose correction factors that apply in the same manner to each of these regression settings, such that the resulting estimators remain consistent and yield demonstrably reduced bias, variability, and mean squared error.
Our initial proposed estimator targets mean unbiasedness in the traditional sense, while the usual exponential transformation-based MLE is geared toward approximate median unbiasedness. We also propose a class of estimators that provide reduced mean bias and squared error, while allowing the investigator to control the risk of underestimating the true measure of effect. We discuss pros and cons of the usual estimator, and use simulation studies and real-data examples to compare its properties to those of the proposed alternatives.

Key Words: Bias; Odds ratio; Regression; Relative risk

Logistic, Poisson, and Cox proportional hazards regression are among the most traditional and widely used analytic techniques in many areas of research. Although useful for prediction and other purposes, these models are most often applied with the primary aim of estimating and making inferences about odds ratios or relative risks. Standard application of these techniques takes advantage of unconditional, conditional, or partial likelihood methods, yielding consistent estimators that support convenient inference based on asymptotically normal sampling distributions.(1-3)

As is well known, traditional estimators of regression coefficients and the resulting measures of effect are not typically unbiased outside the linear model setting. Most prior research devoted to bias reduction targets bias on the regression coefficient scale that arises due to small samples. For example, McCullagh(4) characterized the approximate bias in maximum likelihood estimators of generalized linear model regression coefficients. Firth(5) proposed penalized maximum likelihood to reduce such bias in logistic regression. King and Zeng(6) describe bias-reduced estimators for probabilities based on logistic regression with rare events data, and recommend using ratios of these to improve estimates of relative risk.
We present an approach aimed at bias reduction when estimating measures of effect in logistic, Poisson, or Cox regression settings, among others. Almost any frequent user of these statistical models has at times encountered suspiciously large odds or risk ratio estimates. We argue that a tendency toward large point estimates results directly from the source of bias addressed here, which is due to exponentiating estimated regression coefficients to transform to the measure-of-effect scale. In fact, statisticians are well aware that upward bias can be induced by exponentiation, since this follows as a direct consequence of Jensen's inequality.(7) Our goal is to demonstrate the nature and form of this bias, and to develop a class of reduced-bias estimators.

Our initial proposal targets mean unbiasedness, which inevitably requires a sacrifice of the approximate median unbiasedness that is a property of standard exponential transformation-based estimators. As a result, more than 50% of the proposed bias-corrected estimates would be expected to fall below the true odds or risk ratio upon repeated sampling from a study population. As a compromise, we also introduce bias-reduced estimators that provide mean bias, precision, and mean squared error (MSE) benefits while directly controlling the probability of underestimating a true measure of effect.

The focus of this article differs from that of previous work in the following respects. First, we directly target bias in estimated odds ratios and relative risks, as opposed to bias in estimated regression coefficients, because it is on the former scale that researchers generally report and interpret estimates. Thus, a strong case can be made in favor of the view that classical estimation criteria (mean bias, variability, MSE) remain particularly relevant on that scale.
Second, we assume the investigator is working in a scenario in which sampling anomalies [e.g., the separation problem in logistic regression(9)] are rare, there is relatively little bias in the usual regression coefficient estimators, and the typical normal approximations for their sampling distributions are reasonably adequate. This assumption is defensible in many real-world applications assuming appropriate models and well-designed studies.

Our proposal encourages thought about the merits of seeking mean versus median unbiasedness on the measure-of-effect scale, while offering alternatives when the former is deemed desirable and/or a compromise between these two performance measures is sought. We believe that the specific relevance of this distinction to standard regression settings is not widely recognized, and that most analysts naturally tend to associate the criterion of bias with the traditional view. Practicing statisticians and investigators who commonly apply these standard regression models may be surprised at the potential extent to which the usual odds and risk ratio estimators sacrifice unbiasedness, precision, and MSE.

METHODS

Bias-Corrected Point Estimation

We focus on the wide variety of problems in which an estimate of effect [e.g., an odds ratio (OR) or relative risk (RR)] is typically obtained by exponentiating an estimator whose sampling distribution is asymptotically normal. This includes fundamental problems based on 2×2 tables(10) and potentially extends to modeling exercises for longitudinal or otherwise correlated data.(11-13) However, we restrict attention here to traditional models for data obtained under independence of experimental units, e.g., logistic, Poisson, or Cox models,(1-3) where the estimators to be exponentiated correspond to regression coefficients. Thus, we consider the models

g[E(Y | X = x)] = β_0 + Σ_{j=1}^{k} β_j x_j ,   (1)

where the logit or log link functions g(.)
are standard for logistic or Poisson regression, respectively, or

ln[h(t | X = x)] = ln[h_0(t)] + Σ_{j=1}^{k} β_j x_j ,   (2)

where h(.) and h_0(.) represent the hazard and baseline hazard functions in Cox regression. When making inference about an OR or RR [say, ψ_j = exp(β_j)] based on model (1) or (2), we typically make use of the following standard asymptotic result pertaining to the sampling distribution of a maximum likelihood (ML) or partial ML estimator:

β̂_j ~ N(β_j, σ_j^2) ,   (3)

where σ_j^2 represents the variance of β̂_j across repeated samples. The estimator routinely employed is the MLE for the desired measure of effect, i.e.,

ψ̂_j = e^{β̂_j}  (j = 1, …, k).   (4)

Aside from the implications of Jensen's inequality and the potential for ψ̂_j in equation (4) to fail to exist (or "blow up") due to rare sampling outcomes,(9) it is easy to show that this standard estimator has a fundamental built-in tendency toward positive bias. In particular, the distributional result in equation (3) dictates that the sampling distribution of ψ̂_j is approximately lognormal, such that

E(ψ̂_j) = e^{β_j + σ_j^2/2}.   (5)

We note that expression (5) may also be derived via Taylor series arguments given that E(β̂_j) = β_j, without the assumption of normality for β̂_j. Although in large samples the sampling variability (σ_j^2) tends to zero so that ψ̂_j is consistent, this standard estimator is clearly biased upward in practice. In particular, ψ̂_j is geared toward median unbiasedness(14) rather than mean unbiasedness, as the median of the approximate lognormal sampling distribution is e^{β_j} but the mean is larger. This fact is also clear from the symmetry of the approximate normal distribution for β̂_j, which is (approximately) both mean and median unbiased. The implication is that the traditional estimator ψ̂_j is essentially equally likely to over- or underestimate the true measure of effect across repeated samples, but can be subject to overestimation errors of very large magnitude.
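The upward bias implied by expression (5) is easy to verify numerically. The sketch below is our own illustration (the values of β and σ are arbitrary, not from the paper): it draws β̂ values from the normal sampling distribution in (3) and compares the average of e^{β̂} against both e^β and the lognormal mean e^{β + σ²/2}.

```python
import numpy as np

# Illustrative values (assumed for demonstration): beta = 1.0, sigma = 0.8
beta, sigma = 1.0, 0.8
rng = np.random.default_rng(0)

# Draw many beta-hats from the approximate sampling distribution in (3)
beta_hat = rng.normal(beta, sigma, 1_000_000)

emp_mean = np.exp(beta_hat).mean()      # average of the usual estimator e^{beta-hat}
naive    = np.exp(beta)                 # the true measure of effect e^{beta}
lognorm  = np.exp(beta + sigma**2 / 2)  # the lognormal mean predicted by (5)
```

The empirical mean of e^{β̂} lands near the lognormal mean e^{β + σ²/2} ≈ 3.74 rather than near the true value e^β ≈ 2.72, in agreement with equation (5).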
These errors might be viewed as especially detrimental given the emphasis (e.g., in epidemiology) often placed on interpreting measure-of-effect point estimates.

Figure 1 is a plot of the bias factor (e^{σ_j^2/2}) against σ_j, the standard deviation of the normal sampling distribution associated with β̂_j. Note that the bias is minimal for small values of σ_j, but quickly increases. The expectation of ψ̂_j will be approximately 150%, 200%, and 300% of the true ψ_j if σ_j = 0.9, 1.2, and 1.5, respectively.

To approximately eliminate the positive bias of ψ̂_j, we propose the following straightforward "corrected" estimator based on expression (5):

ψ̂_{j,corr} = e^{-σ̂_j^2/2} ψ̂_j ,   (6)

where σ̂_j^2 is the square of the estimated standard error associated with β̂_j when fitting model (1) or (2). The estimator in (6) remains consistent, and should be preferred to the standard estimator if one values the ideal of traditional as opposed to median unbiasedness. Because the bias indicated in expression (5) is always positive, the proposed correction reduces the point estimate regardless of whether the standard estimate ψ̂_j is greater than or less than one.

One criticism of (6) might be that it incorporates exponentiation of the term -σ̂_j^2/2, thus arguably introducing some bias of a similar nature to that originally targeted. While corresponding adjustments to (6) could be contemplated, we maintain that this source of bias is generally ignorable because the sampling variability of σ̂_j^2 tends to be very small (much smaller than that of β̂_j itself). Our empirical studies have been uniformly consistent with this view, leading us to favor the simple adjustment in (6).

The result in (5) has some connection with what has been termed the "retransformation" problem. For example, if one fits a linear regression model to a log-transformed dependent variable (Y) assuming i.i.d. normal errors, the expectation of Y will clearly involve a multiplicative factor of the form e^{σ^2/2}.
There has been extensive discussion in the literature about the problem of correctly estimating E(Y) in such situations, with and without direct specification of the transformation or the distributional form of the errors.(15-16) Major distinctions in the current application are that the result in (5) applies to the sampling distribution of β̂_j rather than to the distribution of Y, it is inherently robust due to well-established large sample theory, and it is used here toward the estimation of measures of effect (i.e., ORs, RRs) rather than E(Y).

Bias-Reduced Estimators Controlling the Risk of Underestimation

The estimator in (6) is designed to produce minimal traditional (mean) bias, given that the normal approximation (3) is reasonable. It is asymptotically equivalent to the usual ML estimator, but is virtually guaranteed to yield lower sampling variability in practice because it multiplies by a correction factor constrained between 0 and 1. Naturally, these reductions in bias and variability also imply reduced MSE. The benefits in these three well-emphasized estimation criteria are made at the expense of sacrificing the approximate median unbiasedness that characterizes the usual point estimator ψ̂_j. Thus, while it is appealing to achieve potentially substantial reductions in mean bias, variance, and squared error, one may be reluctant to admit a drastic departure from median unbiasedness.

The approximate lognormal distribution characterizing ψ̂_j readily allows us to contemplate a class of estimators permitting access to some of the benefits inherent in the bias-corrected estimator (6), while exerting targeted control over the extent to which median unbiasedness is forfeited. In this direction, note that (3) implies that

Pr(ψ̂_{j,corr} < e^{β_j}) ≈ Φ(σ_j/2) ,   (7)

where Φ(.) represents the standard normal cumulative distribution function.
Thus, in practice Φ(σ̂_j/2), which always exceeds 0.5, provides a reasonable estimate of the probability that the use of ψ̂_{j,corr} would underestimate the true measure of effect. Larger values of σ̂_j imply more upward bias in ψ̂_j and a consequently larger downward adjustment via ψ̂_{j,corr}. This in turn suggests that ψ̂_{j,corr} will deviate more markedly from median unbiasedness; i.e., the probability in (7) may be substantially greater than 0.5.

To control this risk of underestimation, consider an estimator of the form ψ̂_{j,p} = e^{-c} ψ̂_j for some constant c and specified probability p (≥ 0.5), and suppose we wish to ensure that Pr(ψ̂_{j,p} < ψ_j) ≈ p. Here, (3) implies that c = σ_j z_p, where z_p is the 100p-th percentile of the standard normal distribution. This leads to a class of estimators, i.e., ψ̂_{j,p} = e^{-σ̂_j z_p} ψ̂_j, where consistency is maintained regardless of the value chosen for p. To maximize potential improvements in mean bias and squared error while controlling the risk of underestimating ψ_j, we propose the following bias-reduced estimator:

ψ̂*_{j,corr} = max(ψ̂_{j,corr}, ψ̂_{j,p}) ,   (8)

with p judiciously selected by the investigator.

For example, one who is unwilling to make any concession of median unbiasedness takes p = 0.5, so that ψ̂*_{j,corr} = ψ̂_{j,.50} = ψ̂_j, the usual estimator. On the other hand, suppose one is willing to tolerate approximately a 60% chance of underestimating the true effect in return for gains in bias, efficiency, and MSE. Then p = 0.6 yields ψ̂*_{j,corr} = max(e^{-σ̂_j^2/2} ψ̂_j, e^{-σ̂_j z_{.60}} ψ̂_j), where z_{.60} = 0.253. It is readily seen that (8) is equivalent to

ψ̂*_{j,corr} = ψ̂_{j,corr} if σ̂_j ≤ 2z_p ; ψ̂*_{j,corr} = ψ̂_{j,p} otherwise.

Thus, the bias-reduced estimator targets a full bias correction as in (6) as long as σ̂_j ≤ 2z_p; otherwise, it tempers the bias correction factor to a degree commensurate with the probability (p) of underestimation that is deemed acceptable.
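The estimators in (4), (6), and (8) can be sketched in a few lines of code. The function names and the illustrative inputs below are ours, not the authors'; only the formulas come from the text.

```python
from math import exp
from statistics import NormalDist

def psi_corr(beta_hat, se):
    """Bias-corrected estimator (6): e^{-se^2/2} * e^{beta_hat}."""
    return exp(beta_hat - se**2 / 2)

def psi_p(beta_hat, se, p):
    """Estimator holding the underestimation probability near p: e^{-se*z_p} * e^{beta_hat}."""
    z_p = NormalDist().inv_cdf(p)
    return exp(beta_hat - se * z_p)

def psi_star(beta_hat, se, p):
    """Bias-reduced estimator (8): the larger of the two adjusted estimates."""
    return max(psi_corr(beta_hat, se), psi_p(beta_hat, se, p))

# Hypothetical fit: beta_hat = 2.0 with standard error 1.0, and tolerance p = 0.60
mle  = exp(2.0)                  # usual estimator (4)
corr = psi_corr(2.0, 1.0)        # full bias correction (6)
star = psi_star(2.0, 1.0, 0.60)  # tempered correction (8)
```

Because the assumed standard error (1.0) exceeds 2z_{.60} ≈ 0.507 here, the max in (8) is attained by ψ̂_{j,p}, and the three estimates satisfy ψ̂_corr < ψ̂* < ψ̂, illustrating the compromise described above.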
The latter approach still promises mean bias, precision, and squared error improvements over ψ̂_j, albeit not to the full extent available via (6).

Notes Regarding Interval Estimation and Invariance

It is important to note that our proposals are geared toward point estimation on the OR or RR scale, and imply no adjustments to the usual regression coefficient estimate (β̂_j) or its standard error. That is, we would follow standard practice if reporting an estimate on the log scale, but use the correction factors in (6) or (8) to address the bias introduced due to exponentiating β̂_j before reporting or interpreting the estimate on the natural scale. As the proposed estimators are predicated on the normal approximation (3), they also imply no direct argument against the confidence interval that typically accompanies ψ̂_j in (4), obtained by exponentiating the following bounds:

β̂_j ± z_{1-α/2} σ̂_j ,   (9)

where σ̂_j is the usual standard error estimate. We focus on typical settings in which this interval should provide acceptable performance, and where alternative intervals such as those based on likelihood ratios(17) would be expected to perform similarly. Point estimates based on (6) or (8) will simply be shifted further to the left within these intervals.

We note in passing that an argument almost identical to that underlying (5) accurately predicts that the usual upper and lower limits obtained by exponentiating the bounds in (9) will be biased upward for the true population percentiles (e^{β_j ± z_{1-α/2} σ_j}) that they purport to estimate. This tendency, most pronounced for the upper limit, can contribute to exceedingly wide intervals in the same way that such bias can produce excessively large point estimates. An adjustment similar to that in (6) can be made to nearly eliminate this bias as well.
However, we find that attempts to take advantage of the resulting bias-corrected upper and lower limit estimators to yield narrower average interval widths require sacrifices in confidence interval coverage balance, as the adjusted interval is necessarily shifted to the left. In our opinion, the forfeiture of coverage balance is much harder to justify than, for example, some sacrifice of median unbiasedness with respect to point estimation.

If a covariate X_j is binary, then it is well known that the MLE (ψ̂_j) for the OR or RR possesses a certain invariance to coding changes. For example, if the coding of X_j is switched from (0,1) to (1,0), then the MLE for β_j changes sign and ψ̂_j is correspondingly inverted. In the case of logistic regression, such inversion occurs regardless of the nature of the covariate X_j if the coding of the outcome Y changes, e.g., from (0,1) to (1,0). Importantly, this type of invariance is not a property of the bias-corrected estimator (6), or of the bias-reduced estimator (8). In other words, neither ψ̂_{j,corr} nor ψ̂*_{j,corr} should be inverted to obtain an estimate of ψ_j upon recoding of a covariate or outcome. Doing so would produce an estimator that is no longer bias-corrected on the inverted scale (and in fact more biased on that scale than the usual MLE), thus negating the purpose of the estimation method. The proper approach when using (6) or (8) is to compute the corresponding point estimates directly after first selecting the scale for reporting.

SIMULATION STUDIES

Tables 1-3 summarize simulation experiments examining the proposed point estimators, with 5,000 independent replications generated in each scenario. In the case of logistic regression, we simulated three covariates as follows: X_1 ~ N(0, 0.2^2), X_2 ~ Bernoulli(p), and X_3 ~ Uniform(0, 0.5). X_3 was generated independently of X_1 and X_2, but we introduced correlation between X_1 and X_2 by taking p = 0.15 in the event that X_1 > 0 and p = 0.85 in the event that X_1 < 0.
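The covariate-generation scheme just described for the logistic simulations can be sketched as follows. This is our reconstruction of the stated design (seed and variable names are arbitrary), not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed
n = 200                          # sample size used in the logistic scenario

x1 = rng.normal(0.0, 0.2, n)           # X1 ~ N(0, 0.2^2)
p2 = np.where(x1 > 0, 0.15, 0.85)      # Bernoulli probability depends on the sign of X1
x2 = rng.binomial(1, p2)               # X2, negatively correlated with X1 by construction
x3 = rng.uniform(0.0, 0.5, n)          # X3 ~ Uniform(0, 0.5), independent of X1 and X2
```

Making the Bernoulli probability for X_2 depend on the sign of X_1 induces the intended dependence between those two covariates while leaving X_3 independent.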
For simulations under Poisson regression, data were generated under model (1) with a log link, and the covariate distributions remained the same except with X_1 ~ N(0, 0.1^2) and X_3 ~ Uniform(0, 0.25). For Cox regression, we generated survival times with constant baseline hazard, 33% random censoring, and the same covariate distributions used for the Poisson regression simulations. The parameters β_1, β_2, and β_3 were set as 2, 1, and -0.5, corresponding to ORs or RRs of ψ_1 = 7.39, ψ_2 = 2.72, and ψ_3 = 0.61. Assumed sample sizes varied somewhat depending on the model considered (n = 200, 100, and 250 for logistic, Poisson, and Cox regression, respectively). These sampling conditions produced similar average standard errors for the MLEs of the regression coefficients under each type of model, with those corresponding to X_1 and X_3 large enough to illustrate the potential for marked differences between the standard and proposed bias-corrected and bias-reduced estimators.

The second column of Table 1 illustrates minimal empirical bias in the MLEs of the logistic regression coefficients (β̂). However, the positive mean bias of the standard OR estimators obtained via exponentiation is quite marked, especially for those corresponding to larger sampling variances (i.e., for ψ̂_1 and ψ̂_3). In particular, the mean of ψ̂_1 (7.39 + 6.04 = 13.43) across the 5,000 simulations is nearly double the true value of 7.39, while the mean for ψ̂_3 (0.61 + 0.45 = 1.06) is actually on the wrong side of the null despite a sample size of 200. In contrast, we see dramatically reduced bias associated with the corrected estimator (ψ̂_corr) proposed in equation (6). In fact, most of the remaining positive bias in ψ̂_{1,corr} is attributable to the slight positive small-sample bias associated with β̂_1. As expected, the standard OR estimator comes close to achieving median unbiasedness, while the proposed bias-corrected estimator approaches mean unbiasedness.
In the case of ψ_1, the proportion of estimates falling below the true OR of 7.39 was 48% for the usual MLE, as compared with 66% for ψ̂_{1,corr}. The right-most column summarizes the performance of the alternative estimator ψ̂*_{1,corr} in eqn. (8), where we take p = 0.6. This estimator provides a clear compromise between ψ̂_1 and ψ̂_{1,corr} in terms of mean and median bias. Note that the method is quite effective at controlling the percentage of estimates falling below the true value of ψ_1 at a level less than or approximately equal to the desired threshold of 60%.

Conclusions in the case of ψ_3 are very similar to what we see with ψ_1, despite the negative value of β_3; that is, mean bias and MSE are dramatically reduced for ψ̂_{3,corr} relative to ψ̂_3, while a compromise that controls the resulting median bias via p = 0.60 meets the desired objective and provides an effective intermediate. In the case of ψ_2, the average standard error of approximately 0.38 is small enough that there is relatively little difference between ψ̂_2 and ψ̂_{2,corr}, and the slight median bias incurred with ψ̂_{2,corr} produces no distinction between ψ̂_{2,corr} and ψ̂*_{2,corr} in any of the 5,000 replications (i.e., in this case one can directly target mean unbiasedness without inducing median unbiasedness beyond the tolerated level).

We note that the simulation results agree remarkably well with the approximate result in (7). For example, in Table 1 we find the following empirical estimates: σ̂_1 ≈ 0.98, σ̂_2 ≈ 0.38, and σ̂_3 ≈ 1.07. Inserting these into the calculation Φ(σ̂_j/2) suggests that 69%, 58%, and 70% of the estimates based on ψ̂_{1,corr}, ψ̂_{2,corr}, and ψ̂_{3,corr}, respectively, should fall below the true ORs. These closely match the observed percentages of 66%, 56%, and 70% in Table 1, also suggesting that Φ(σ̂_j/2) provides a reasonable estimate of the probability of underestimation when using the bias-corrected estimator (6) in practice.
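The plug-in calculation Φ(σ̂_j/2) from (7) is a one-liner; the snippet below reproduces the percentages just quoted from the Table 1 standard errors.

```python
from statistics import NormalDist

Phi = NormalDist().cdf  # standard normal CDF

# Average estimated standard errors reported for Table 1
ses = (0.98, 0.38, 1.07)

# Predicted percentage of bias-corrected estimates falling below the true OR, per (7)
underest = {se: round(100 * Phi(se / 2)) for se in ses}
```

This returns 69%, 58%, and 70% for the three covariates, matching the values computed in the text.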
The results in Table 2 (Poisson regression) and Table 3 (Cox regression) are qualitatively very similar to those in Table 1, as were simulation results under a variety of other conditions (not shown). As in Tables 1-3, these experiments continued to confirm what is clear from equation (5), i.e., that the magnitude of positive bias associated with the standard estimator of ψ_j and the extent of the adjustments implemented via (6) or (8) depend upon the average standard error associated with β̂_j.

Figure 2 compares histograms representing 4,000 standard and bias-corrected OR estimates (ψ̂_3 and ψ̂_{3,corr}) based on a replication of the simulation study summarized in Table 1, where vertical lines mark the true OR of 0.61. Note the longer and heavier tail associated with the histogram of standard estimates, yielding an empirical mean of 1.06. In contrast, the mean of the 4,000 corrected estimates was 0.61, identical to the true OR. A plot comparing histograms for ψ̂_1 and ψ̂_{1,corr} was almost identical visually.

The proposed bias-corrected and bias-reduced estimators offer remarkable improvements in mean squared error (MSE), due to the dual benefits of reduced bias and reduced variation in the point estimate. For example, Table 1 reflects simulation-based MSE estimates of 413.81 and 2.55 for ψ̂_1 and ψ̂_3, respectively. Contrasting these with the values of 101.42 and 0.76 for ψ̂_{1,corr} and ψ̂_{3,corr} produces MSE efficiency estimates of only 25% and 30% when comparing the traditional to the bias-corrected OR estimator [eqn. (6)]. The estimated MSEs for ψ̂*_{1,corr} and ψ̂*_{3,corr} are 219.24 and 1.42, respectively, suggesting that the MLEs are only 53% and 56% MSE efficient relative to an estimator [eqn. (8)] that maintains a controlled and tolerable level of median bias. Tables 1-3 provide corresponding estimates for all simulation scenarios, further confirming substantial mean bias, precision, and MSE advantages of the proposed estimators.
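A stripped-down Monte Carlo conveys the qualitative pattern behind these tables. The sketch below is our illustration only: it skips model fitting entirely and instead draws β̂ directly from the normal approximation (3) with β = 2 and σ = 0.98, treating σ̂ as known; because a real logistic fit has heavier tails, the resulting figures will not match Table 1 exactly.

```python
import numpy as np
from statistics import NormalDist

beta, sigma, p = 2.0, 0.98, 0.60
true_psi = np.exp(beta)                      # true OR of 7.39
z_p = NormalDist().inv_cdf(p)                # ~0.253

rng = np.random.default_rng(7)
beta_hat = rng.normal(beta, sigma, 200_000)  # idealized sampling distribution of the MLE

psi_mle  = np.exp(beta_hat)                            # standard estimator (4)
psi_corr = np.exp(beta_hat - sigma**2 / 2)             # bias-corrected (6)
psi_star = np.maximum(psi_corr,
                      np.exp(beta_hat - sigma * z_p))  # bias-reduced (8), p = 0.60

def mean_bias(est):
    return est.mean() - true_psi

def mse(est):
    return np.mean((est - true_psi) ** 2)
```

Across the draws, ψ̂_mle shows large positive mean bias while ψ̂_corr is nearly mean-unbiased; roughly Φ(0.49) ≈ 69% of the corrected estimates fall below the true OR, while the ψ̂* draws underestimate it at close to the targeted 60%, and the MSEs order as corrected < bias-reduced < standard.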
EXAMPLE

Birth Weight Data

As a practical example, we analyze publicly available data from a well-known study of low birth weight. Data on 189 births recorded at Baystate Medical Center in Massachusetts were altered by the authors of the source text(8) to protect confidentiality. For this example, we restrict attention to 100 births for which the mother required no first-trimester physician visits. The binary outcome characterizes birth weight (1 if ≥ 2500 g, 0 if < 2500 g), and we consider two covariates: the natural log of the mother's weight at her last menstrual period (logLWT), and the mother's history (HX) of premature labor (1 if any, 0 if none).

Table 4 displays estimates of the adjusted ORs for logLWT and HX. Note particularly in the case of logLWT that the large positive regression parameter and its sizable standard error yield a remarkably high estimated OR of 6.02 based on standard methods. The bias-corrected OR estimate [eqn. (6)] of 3.29 is reduced substantially (by 45%) relative to the usual MLE. Based on the estimated standard error of 1.10, however, the approximation in (7) suggests that approximately 71% of repeated samples from this population would yield a bias-corrected estimate for logLWT that falls below the true OR. The right-most column of Table 4 provides the bias-reduced estimate [eqn. (8)], upon limiting this percentage to the more moderate value of approximately 60%. The resulting estimate of 4.56 is also based on a method that reduces mean bias and MSE, but reflects a tempered adjustment to limit the extent to which median unbiasedness is sacrificed. In the case of the second covariate (HX), the bias-corrected and bias-reduced estimates (ψ̂_corr and ψ̂*_corr, respectively) agree with each other to two decimal places based on eqn. (8), and are not dramatically different from the standard OR estimate.
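The logLWT figures in Table 4 can be reproduced from just the two reported quantities, the usual OR estimate (6.02) and the standard error of the coefficient (1.10); the short calculation below is ours, not the authors' code.

```python
from math import exp
from statistics import NormalDist

psi_hat, se, p = 6.02, 1.10, 0.60  # values reported for logLWT in Table 4

psi_corr = psi_hat * exp(-se**2 / 2)                # bias-corrected estimate, eqn. (6)
z_p = NormalDist().inv_cdf(p)
psi_star = max(psi_corr, psi_hat * exp(-se * z_p))  # bias-reduced estimate, eqn. (8)
underest = NormalDist().cdf(se / 2)                 # Pr(underestimation) for psi_corr, per (7)
```

This yields ψ̂_corr ≈ 3.29, ψ̂*_corr ≈ 4.56, and an underestimation probability of about 0.71, matching the values quoted above.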
Additional Simulations to Mimic Example

While the previous simulation studies demonstrate the properties of the bias-corrected and bias-reduced estimators in three different settings (logistic, Poisson, and Cox regression), those settings were selected for illustration as opposed to being based on real motivating data. It is thus informative to repeat the exercise under conditions similar to those in the low birth weight example. For the simulation studies summarized in Table 5, covariate data on "log(LWT)" and "HX of pre-term labor" were generated so as to closely mimic their observed joint distribution in the example data set consisting of 100 subjects. The outcome was generated based on a logistic regression of "high birth weight" on log(LWT) and HX, with adjusted ORs matching the bias-corrected estimates (3.29 and 0.14, respectively) from the example (see Table 4).

Results given in the top half of Table 5 are based on 5,000 simulated data sets, each of sample size 100 as in the example. The performances of the three adjusted OR estimators corresponding to log(LWT) continue to follow the general patterns seen in Tables 1-3. Specifically, substantial positive bias is seen to be associated with ψ̂, corresponding to an average estimate of (3.29 + 6.95) = 10.24. This standard OR estimator also displays extreme variability and empirical MSE, but minimal median bias as expected. The bias-corrected estimator (ψ̂_corr) nearly eliminates the mean bias and provides drastically reduced variability and MSE, at the expense of downward median bias. The bias-reduced estimate (ψ̂*_corr) fits nicely between the other two, while maintaining the risk of underestimating the true OR at or near the specified level of 60%. These general features are maintained upon increasing the sample size to 200 (lower half of Table 5), despite the expected overall reductions in bias, variability, and MSE.
The three estimators perform much more similarly in the case of the "HX" variable, though overall impressions remain the same.

DISCUSSION

We have presented a straightforward and readily justified approach to reducing bias when estimating common measures of effect (i.e., odds ratios, relative risks) based on regression analysis. In contrast to most prior research on bias reduction due to small or aberrant samples,(2,4-6) our approach is geared toward the common scenario in which the usual normal asymptotics associated with regression coefficient estimators are deemed to provide a reasonable approximation. We believe our proposal is unique relative to prior work in terms of the simplicity and ease of use characterizing the suggested estimators, as well as the magnitude of their impact upon bias, variability, and MSE on the measure-of-effect scale.

Our conclusions regarding the standard estimator (ψ̂_j = e^{β̂_j}) may be summarized in terms of a list of pros and cons relative to the alternatives in (6) and (8). On the plus side, ψ̂_j is familiar and convenient, and achieves approximate median unbiasedness. It is also an ML estimator, thus possessing familiar desirable asymptotic properties as well as transformation invariance characteristics (e.g., 1/ψ̂_j is the MLE for 1/ψ_j). The proposed alternative estimators are comparable in terms of ease of computation, and they share the same asymptotic properties as the MLE but lack the invariance property under typical sample sizes. On the other hand, ψ̂_j is subject to guaranteed and potentially extreme positive (mean) bias in practice. Its sampling variability is certain to be higher than that of the proposed alternative estimators. It follows that ψ̂_j will also carry a higher MSE, often markedly, given its relative deficiencies in terms of both bias and variability (see Tables 1-3 for illustration).
The key result in equation (5) clarifies the explicit form of the positive bias characterizing ψ̂_j, and the manner in which it sacrifices traditional mean unbiasedness in favor of median unbiasedness. In our view, the median unbiasedness criterion has the disadvantage of ignoring the magnitude of extreme estimates in the sampling distribution, thus producing an incomplete and perhaps misleading performance measure for an estimator.

We also note that the question of invariance holds further relevance for grasping the relative merits of the standard estimator as opposed to the proposed alternatives. Because of the invariance of the MLE and because ORs and RRs are ratio measures, for example, one may be prone to view a hypothetical point estimate that doubles the true value and one that halves it as equally erroneous. However, one can argue that the statistical criteria upon which we focus (bias, variability, MSE) do not lose their relevance simply because we are dealing with a ratio measure. These objective criteria do not view such proportionality errors as equivalent, but rather tend to give equal weight to equivalent additive deviations from the true value on the measure-of-effect scale. Doing so provides protection against the large errors in estimation that come into play due to the right-skewed sampling distribution on that scale.

When standard errors are large, eqn. (7) suggests that the bias-corrected estimator (6) may require a substantial deviation from median unbiasedness in order to achieve approximate mean unbiasedness. The plug-in estimator Φ(σ̂_j/2) provides a convenient way to assess the extent of such sacrifice. Concern over this may make the class of estimators in (8) particularly appealing, because they permit the investigator to exert explicit control over the approximate risk of underestimating true measures of effect in practice.
Our example and simulation studies demonstrate that marked gains in mean bias, variability, and MSE efficiency can be obtained while tolerating only a moderate forfeiture of median unbiasedness. The fact that clear improvements in these important estimation criteria are so readily available is eye-opening, and could prove quite valuable. We believe that the OR (or RR) scale is an appropriate one upon which to seek them, given that it is so commonly the scale of reporting and interpretation in research that relies on these types of regression analysis.

REFERENCES

1. Cox DR, Oakes D. Analysis of survival data. London: Chapman & Hall, 1984.
2. McCullagh P, Nelder JA. Generalized linear models, 2nd edition. New York: Chapman & Hall, 1989.
3. Agresti A. Categorical data analysis, 2nd edition. New York: John Wiley & Sons, 2002.
4. McCullagh P. Tensor methods in statistics. London: Chapman & Hall, 1987.
5. Firth D. Bias reduction of maximum likelihood estimates. Biometrika 1993;80:27-38.
6. King G, Zeng L. Logistic regression in rare events data. Political Analysis 2001;9:137-163.
7. Jensen JLWV. Sur les fonctions convexes et les inégalités entre les valeurs moyennes. Acta Mathematica 1906;30:175-193.
8. Hosmer DW, Lemeshow S. Applied logistic regression, 2nd edition. New York: John Wiley & Sons, 2000.
9. Allison PD. Convergence problems in logistic regression. In Numerical issues in statistical computing for the social scientist (Altman M, Gill J, McDonald MP, eds.). Hoboken, NJ: John Wiley & Sons, 2004 (pp. 238-252).
10. Rosner B. Fundamentals of biostatistics, 5th edition. Pacific Grove, CA: Duxbury, 2000.
11. Molenberghs G, Verbeke G. Models for discrete longitudinal data. New York: Springer-Verlag, 2005.
12. Davidian M, Giltinan DM. Nonlinear models for repeated measurement data. New York: Chapman & Hall, 1995.
13. Liang K-Y, Zeger SL. Longitudinal data analysis using generalized linear models. Biometrika 1986;73:13-22.
14. Read CB.
Median unbiased estimators. In Encyclopedia of statistical sciences, volume 5 (Kotz S, Johnson NL, eds.). New York: John Wiley & Sons, 1985 (pp. 424-426).
15. Duan N. Smearing estimate: A nonparametric retransformation method. Journal of the American Statistical Association 1983;79:605-610.
16. Manning WG. The logged dependent variable, heteroscedasticity, and the retransformation problem. Journal of Health Economics 1998;17:283-295.
17. Venzon DJ, Moolgavkar SH. A method for computing profile-likelihood based confidence intervals. Applied Statistics 1988;37:87-94.

Table 1. Simulation Results: Logistic Regression*
Entries for ψ̂†, ψ̂_corr‡, and ψ̂*_corr¶: Mean Bias [Median Bias] (SD) {Empirical MSE}; percentage of estimates < true value

X1 (β1 = 2, ψ1 = 7.39); β̂: mean 2.09 (SD 0.98)
    ψ̂†:       6.04 [0.51] (19.42) {413.81}; 48%
    ψ̂_corr‡:  0.60 [-2.36] (10.05) {101.42}; 66%
    ψ̂*_corr¶: 2.99 [-1.17] (14.50) {219.24}; 58%
X2 (β2 = 1, ψ2 = 2.72); β̂: mean 1.03 (SD 0.38)
    ψ̂†:       0.31 [0.06] (1.26) {1.69}; 48%
    ψ̂_corr‡:  0.09 [-0.14] (1.15) {1.33}; 56%
    ψ̂*_corr¶: 0.09 [-0.14] (1.15) {1.33}; 56%
X3 (β3 = -0.5, ψ3 = 0.61); β̂: mean -0.51 (SD 1.07)
    ψ̂†:       0.45 [0.01] (1.53) {2.55}; 51%
    ψ̂_corr‡:  0.00 [-0.26] (0.87) {0.76}; 70%
    ψ̂*_corr¶: 0.20 [-0.15] (1.17) {1.42}; 60%

* Based on 5,000 replications with n=200 in each case; covariate distributions described in text
† Usual MLE for adjusted OR
‡ Bias-corrected estimate [eqn. (6)]
¶ Bias-reduced estimate [eqn. (8)], using p=0.60 to limit the proportion of estimates falling below the true OR to approximately 60% or less

Table 2.
Simulation Results: Poisson Regression*
Entries for ψ̂†, ψ̂_corr‡, and ψ̂*_corr¶: Mean Bias [Median Bias] (SD) {Empirical MSE}; percentage of estimates < true value

X1 (β1 = 2, ψ1 = 7.39); β̂: mean 1.99 (SD 0.94)
    ψ̂†:       4.01 [0.08] (14.00) {212.18}; 51%
    ψ̂_corr‡:  0.19 [-2.73] (8.32) {69.24}; 68%
    ψ̂*_corr¶: 1.56 [-1.67] (10.78) {118.60}; 61%
X2 (β2 = 1, ψ2 = 2.72); β̂: mean 1.01 (SD 0.20)
    ψ̂†:       0.07 [0.00] (0.57) {0.34}; 50%
    ψ̂_corr‡:  0.02 [-0.05] (0.56) {0.31}; 53%
    ψ̂*_corr¶: 0.02 [-0.05] (0.56) {0.31}; 53%
X3 (β3 = -0.5, ψ3 = 0.61); β̂: mean -0.49 (SD 1.09)
    ψ̂†:       0.51 [0.00] (1.94) {4.02}; 50%
    ψ̂_corr‡:  0.00 [-0.27] (1.02) {1.05}; 71%
    ψ̂*_corr¶: 0.24 [-0.15] (1.46) {2.18}; 60%

* Based on 5,000 replications with n=100 in each case; covariate distributions described in text
† Usual MLE for adjusted RR
‡ Bias-corrected estimate [eqn. (6)]
¶ Bias-reduced estimate [eqn. (8)], using p=0.60 to limit the proportion of estimates falling below the true RR to approximately 60% or less

Table 3. Simulation Results: Cox Regression*
Entries for ψ̂†, ψ̂_corr‡, and ψ̂*_corr¶: Mean Bias [Median Bias] (SD) {Empirical MSE}; percentage of estimates < true value

X1 (β1 = 2, ψ1 = 7.39); β̂: mean 2.01 (SD 0.93)
    ψ̂†:       4.21 [0.08] (14.68) {233.09}; 49%
    ψ̂_corr‡:  0.14 [-2.44] (9.17) {84.02}; 67%
    ψ̂*_corr¶: 1.78 [-1.44] (11.47) {134.75}; 59%
X2 (β2 = 1, ψ2 = 2.72); β̂: mean 1.01 (SD 0.19)
    ψ̂†:       0.08 [0.02] (0.54) {0.30}; 48%
    ψ̂_corr‡:  0.03 [-0.03] (0.53) {0.28}; 52%
    ψ̂*_corr¶: 0.03 [-0.03] (0.53) {0.28}; 52%
X3 (β3 = -0.5, ψ3 = 0.61); β̂: mean -0.48 (SD 1.04)
    ψ̂†:       0.47 [0.00] (1.53) {2.57}; 50%
    ψ̂_corr‡:  0.02 [-0.25] (0.88) {0.78}; 69%
    ψ̂*_corr¶: 0.22 [-0.14] (1.18) {1.43}; 59%

* Based on 5,000 replications with n=250 in each case; covariate distributions described in text
† Usual MLE for adjusted RR
‡ Bias-corrected estimate [eqn. (6)]
¶ Bias-reduced estimate [eqn. (8)], using p=0.60 to limit the proportion of estimates falling below the true RR to approximately 60% or less

Table 4.
Logistic Regression: Analysis of Birth Weight Data*
Point Estimates

Variable              β̂ (SE)         ψ̂†     ψ̂_corr‡   ψ̂*_corr¶
logLWT                1.80 (1.10)    6.02    3.29      4.56
HX premature labor    -1.77 (0.64)   0.17    0.14      0.14

* Data from Hosmer and Lemeshow,(8) restricting to mothers with no first-trimester physician visits (N=100); OR for one-unit increase in logLWT
† Usual MLE for adjusted OR
‡ Bias-corrected estimate [eqn. (6)]
¶ Bias-reduced estimate [eqn. (8)], using p=0.60 to limit the proportion of estimates falling below the true OR to approximately 60% or less

Table 5. Simulation Results Mimicking Low Birth Weight Example*
Entries for ψ̂†, ψ̂_corr‡, and ψ̂*_corr¶: Mean Bias [Median Bias] (SD) {Empirical MSE}; percentage of estimates < true value

Sample size n = 100:
logLWT (β1 = 1.19, ψ1 = 3.29); β̂: mean 1.26 (SD 1.36)
    ψ̂†:       6.95 [0.14] (34.66) {1249.80}; 49%
    ψ̂_corr‡:  0.18 [-1.76] (7.58) {57.48}; 72%
    ψ̂*_corr¶: 3.73 [-0.77] (21.99) {497.23}; 58%
HX premature labor (β2 = -1.97, ψ2 = 0.14); β̂: mean -2.07 (SD 0.82)
    ψ̂†:       0.02 [0.02] (0.12) {0.02}; 54%
    ψ̂_corr‡:  0.00 [-0.03] (0.09) {0.01}; 65%
    ψ̂*_corr¶: 0.01 [-0.03] (0.10) {0.01}; 63%

Sample size n = 200:
logLWT (β1 = 1.19, ψ1 = 3.29); β̂: mean 1.22 (SD 0.91)
    ψ̂†:       1.94 [0.08] (6.62) {47.59}; 49%
    ψ̂_corr‡:  0.14 [-0.91] (3.90) {15.21}; 66%
    ψ̂*_corr¶: 0.86 [-0.59] (5.09) {26.66}; 58%
HX premature labor (β2 = -1.97, ψ2 = 0.14); β̂: mean -2.00 (SD 0.46)
    ψ̂†:       0.01 [0.00] (0.07) {0.006}; 53%
    ψ̂_corr‡:  0.00 [-0.02] (0.07) {0.004}; 61%
    ψ̂*_corr¶: 0.00 [-0.02] (0.07) {0.004}; 61%

* Based on 5,000 replications in each case; covariate distributions described in text
† Usual MLE for adjusted OR
‡ Bias-corrected estimate [eqn. (6)]
¶ Bias-reduced estimate [eqn. (8)], using p=0.60 to limit the proportion of estimates falling below the true OR to approximately 60% or less

Figure Legends:

Figure 1. Plot of the bias factor (e^(σ_j²/2)) characterizing the usual estimator e^β̂_j, versus the standard deviation (σ_j) of the approximate sampling distribution of β̂_j.

Figure 2.
Histograms of 4,000 standard and bias-corrected estimates (ψ̂_3 and ψ̂_3,corr), based on repeating the simulation study summarized in Table 1. Normal kernel density estimates accompany each histogram. The mean of the standard estimates is 1.06, markedly exceeding the true OR of 0.61; the mean of the corrected estimates is 0.61.

[Fig. 1: Plot of bias factor vs. σ_j (bias factor on the vertical axis; σ ranging from 0.2 to 1.5 on the horizontal axis)]

[Fig. 2: Histograms of standard and bias-corrected estimates]