The use of the bootstrap in the analysis of case-control studies with missing data

Volkert Siersma*, Christoffer Johansen**
* Department of Biostatistics, University of Copenhagen, Copenhagen, Denmark
** Institute of Cancer Epidemiology, The Danish Cancer Society, Copenhagen, Denmark

Abstract
Background. Valid inference and efficiency are concerns when risk factors in case-control studies have missing values. Complete-case analysis is inefficient and often biased for these studies. Multiple imputation is a more efficient approach, but valid confidence intervals require the complex assumption that the imputation is proper, which is sometimes hard to ascertain. Computationally intensive resampling methods assume less of the imputation method, in exchange for computation time.
Methods. A practical bootstrap method is presented for inference in multivariate case-control studies when risk factors have incomplete data. It is illustrated through two case studies with considerable missing data in some risk factors of interest. The first study illustrates the applicability of the bootstrap method compared with complete-case analysis and multiple imputation; the second illustrates its limitations.
Results. The bootstrap approach gives results very similar to those of multiple imputation.
Conclusion. The bootstrap approach can be preferable when the imputation procedure cannot be ascertained to be fully proper, but only to yield unbiased estimates.

Key words: nonparametric bootstrap, bootstrap confidence intervals, missing values, multiple imputation, matched case-control study

Missing values are common in epidemiological data, and even well-designed and well-executed studies can feature a considerable number of them. A study design often used to assess the effects of multiple factors on the risk of a relatively rare disease is the matched case-control design.
In this design, individuals with the disease (cases) are sampled, and for each case one or more controls are sampled who resemble the case in certain characteristics but do not have the disease. The effects of the risk factors are often assessed using conditional logistic regression [1,2]. In these studies the risk factors are ascertained retrospectively, and observations may be incomplete, often for reasons beyond the scope of the problem addressed in the study.

Many analysts exclude any subject with missing values from the data and proceed with methods for complete data. This so-called complete-case analysis is the default in many, if not all, statistical packages when faced with data containing missing values. It gives biased estimates in case-control studies if the occurrence of missing values depends on both the case-control indicator and the risk factors [3,4]. Complete-case analysis is also inefficient, because information is lost when subjects with incomplete observations are excluded; several incompletely observed risk factors can easily leave only a few subjects with complete data.

Multiple imputation [5,6] overcomes this inefficiency and claims valid inference when values are missing at random (MAR [5]), i.e. when the probability that a value is missing does not depend on the missing value itself, given the observed data. It is as general as the complete-case approach, computationally inexpensive, and therefore popular [7]. In this approach, several completed datasets are created by filling in the incomplete observations using information from the observed data. The completed datasets are analysed by standard methods, and the results are combined with Rubin's rule [6,8]: the point estimate is the mean of the M estimates, and its variance is estimated as the average within-imputation variance plus the between-imputation variance inflated by the factor (1 + 1/M). Few repetitions are needed because the simulation error is small relative to the overall uncertainty, and Rubin's rule accounts explicitly for this error [6,9].
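Rubin's rule as described above can be sketched in a few lines. This is a minimal illustration, assuming that each of the M completed-data analyses returns a point estimate (for example a log OR) and its squared standard error; the function name `rubin_combine` and the example numbers are hypothetical, and the normal-based interval ignores the t reference distribution with Rubin's degrees of freedom.

```python
import numpy as np

def rubin_combine(estimates, variances):
    """Pool M completed-data estimates with Rubin's rules.

    estimates : M point estimates (e.g. log odds ratios)
    variances : M squared standard errors from the completed-data analyses
    Returns the pooled estimate and its total variance.
    """
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    m = q.size
    q_bar = q.mean()                 # pooled point estimate
    w = u.mean()                     # average within-imputation variance
    b = q.var(ddof=1)                # between-imputation variance
    t = w + (1.0 + 1.0 / m) * b      # total variance
    return q_bar, t

# Hypothetical log(OR)s and squared standard errors from M = 5 completed datasets
log_or = [0.50, 0.55, 0.45, 0.52, 0.48]
se2 = [0.010, 0.011, 0.009, 0.010, 0.010]
est, var = rubin_combine(log_or, se2)
half = 1.96 * np.sqrt(var)           # normal approximation for a 95% interval
or_ci = (np.exp(est - half), np.exp(est + half))
print(f"OR {np.exp(est):.2f} ({or_ci[0]:.2f}-{or_ci[1]:.2f})")
```

The between-imputation term is what distinguishes this from simply averaging the completed-data variances; with few imputations the factor (1 + 1/M) accounts for the simulation error.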
The variance estimate obtained by multiple imputation has been criticised as inconsistent, with possibly progressive bias in certain settings [10]. Additionally, the condition that the imputations are proper [6], a complex requirement for valid inference, is often hard to establish in practice [9,11]. Multiple imputation was originally devised for large public-use datasets, where trained statisticians create a limited number of completed datasets, for computational and logistical reasons, for public distribution to possibly many end-users with access only to standard statistical software. In a case-control study, by contrast, there is often a single end-user and a single analysis, which on modern computers can use many more simulated datasets than originally envisaged. This opens the way for computationally intensive resampling methods that circumvent the above criticism.

The nonparametric bootstrap is a general method of inference for statistics whose distribution is in principle unknown [12-15]. The data-generating process is mimicked by sampling with replacement from the original sample to obtain a replica dataset of the same size. Assuming that the original data are representative of the total population, the parameter estimates from many resampled replica datasets form an empirical distribution for that estimate, which is used for inference. The use of the nonparametric bootstrap for inference on imputation estimators has been acknowledged before [7,16-18], but generally dismissed as computationally too cumbersome.

We present a practical bootstrap method for inference in multivariate case-control studies when risk factors have incomplete data. This is done through two case studies with considerable missing data in risk factors of interest. The first study illustrates the applicability of the bootstrap method compared with complete-case analysis and multiple imputation; the second illustrates its limitations.
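The nonparametric bootstrap described above, with the simple percentile interval, can be sketched as follows. This is a generic sketch for independent observations, assuming NumPy; the function name `percentile_bootstrap`, the simulated data and the choice of the median as statistic are illustrative only.

```python
import numpy as np

def percentile_bootstrap(data, statistic, n_boot=1000, level=0.95, seed=0):
    """Nonparametric percentile bootstrap for a statistic of i.i.d. observations."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    n = len(data)
    reps = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)      # sample n rows with replacement
        reps[b] = statistic(data[idx])        # replica estimate
    a = (1.0 - level) / 2
    lo, hi = np.quantile(reps, [a, 1.0 - a])  # empirical 2.5% and 97.5% percentiles
    return lo, hi, reps

# Hypothetical data: 200 skewed observations, statistic = sample median
x = np.random.default_rng(1).exponential(scale=2.0, size=200)
lo, hi, reps = percentile_bootstrap(x, np.median)
```

No distributional form is assumed for the median; its sampling distribution is approximated by the empirical distribution of the replica estimates in `reps`.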
RSV infection – the nonparametric bootstrap

Respiratory Syncytial Virus (RSV) infection causes hospitalisation during the first two years of life for some 2 percent of the children born each year. A matched case-control study [19] features all 1272 hospitalisations for RSV infection in two Danish counties in the 5-year period from 1990 to 1995. Whenever possible, five controls, matched on gender, age and municipality, are randomly chosen from the Danish central personal register; there are 6075 controls in the study. Potential risk factors are gestational age, birth weight, household size and space, the mother's smoking habits and the level of maternal antibodies against RSV. Two factors have a sizable proportion of missing observations. For the smoking factor, 38% of the entries are missing, since this information was collected only in the later part of the study. Maternal antibodies are thought to have a possible effect only in the first three months after birth. This measurement is therefore ordered only for the 286 children hospitalised during their first three months of life, and it is available for 233 of these. Additionally, because this information is expensive, it is obtained for only one randomly chosen matched control per case. As maternal antibodies have disappeared after three months, the titer is assumed to be zero for children over three months of age. A multivariate conditional logistic regression model is estimated for complete data using a Cox proportional hazards procedure available in many statistical software packages#. The estimated log(OR)s and their confidence intervals are subsequently transformed to the OR scale. To accommodate non-linearities, continuous factors are categorised; if no natural categories exist, four categories approximately corresponding to the quartiles of the distribution of the risk factor are chosen.
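The quartile categorisation just described, together with indicator coding relative to a chosen reference category, might look like this. It is a minimal sketch assuming NumPy; the helper names are hypothetical, and the fitting of the conditional logistic regression itself is not shown.

```python
import numpy as np

def quartile_categories(x):
    """Code a continuous factor as 0..3 by its empirical quartiles."""
    x = np.asarray(x, dtype=float)
    cuts = np.quantile(x, [0.25, 0.50, 0.75])
    # np.digitize: 0 below Q1, 1 in [Q1, Q2), 2 in [Q2, Q3), 3 at or above Q3
    return np.digitize(x, cuts), cuts

def indicator_columns(codes, reference, n_levels=4):
    """0/1 columns for every category except the reference category."""
    levels = [k for k in range(n_levels) if k != reference]
    X = np.column_stack([(codes == k).astype(int) for k in levels])
    return X, levels

# Hypothetical continuous factor with 100 distinct values
x = np.arange(100, dtype=float)
codes, cuts = quartile_categories(x)
X, levels = indicator_columns(codes, reference=3)   # top quartile as reference
```

The `reference` argument plays the role of the hypothesised lowest-risk category: the OR for each remaining category is then estimated relative to it, which is why the reference rows in Table 1 read 1.00.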
For the categorised risk factors, Table 1 lists three sets of ORs, corresponding to three approaches to incomplete data, each relative to the hypothesised lowest-risk category of the risk factor, with corresponding confidence intervals.

Table 1: Estimated Odds Ratios (OR) with corresponding 95% Confidence Intervals (CI) in a multivariate conditional logistic regression for the risk factors for hospitalisation for RSV infection. The estimates are constructed through Complete-case analysis (CC), Multiple Imputation (MI) and NonParametric Bootstrap (NPB), respectively.

Risk factor / level | CC OR (95% CI) | MI OR (95% CI) | NPB OR (95% CI)
Gestational age
  <33 weeks | 4.65 (2.44-8.85) | 3.88 (2.41-6.25) | 3.75 (2.74-7.75)
  33-35 weeks | 1.64 (0.96-2.81) | 1.73 (1.17-2.57) | 1.66 (1.20-2.82)
  36-37 weeks | 1.31 (0.91-1.89) | 1.43 (1.07-1.92) | 1.40 (1.10-1.97)
  38-39 weeks | 1.13 (0.92-1.39) | 1.18 (0.99-1.40) | 1.16 (1.00-1.40)
  >39 weeks | 1.00 | 1.00 | 1.00
Birth weight
  <3.0 kg | 1.61 (1.11-2.31) | 1.42 (1.06-1.91) | 1.46 (1.10-1.98)
  3.0-3.5 kg | 1.28 (0.94-1.75) | 1.15 (0.89-1.47) | 1.16 (0.90-1.51)
  3.5-4.0 kg | 1.12 (0.82-1.54) | 1.06 (0.83-1.26) | 1.07 (0.83-1.38)
  >4.0 kg | 1.00 | 1.00 | 1.00
Space per member of household
  <22 m² | 1.36 (1.01-1.82) | 1.10 (0.88-1.38) | 1.09 (0.87-1.42)
  22-28 m² | 1.27 (0.97-1.67) | 1.14 (0.91-1.42) | 1.14 (0.92-1.48)
  28-36 m² | 1.06 (0.81-1.39) | 1.03 (0.83-1.26) | 1.02 (0.82-1.29)
  >36 m² | 1.00 | 1.00 | 1.00
Age difference with next older sibling
  0-2 years | 1.70 (1.26-2.28) | 1.76 (1.40-2.20) | 1.74 (1.45-2.32)
  2-4 years | 1.45 (1.11-1.89) | 1.23 (0.99-1.52) | 1.22 (1.01-1.56)
  >4 years | 1.61 (1.26-2.05) | 1.64 (1.34-1.99) | 1.62 (1.40-2.07)
  adult (no sibs) | 1.00 | 1.00 | 1.00
Maternal antibody titer
  <210 | 1.35 (0.69-2.64) | 1.22 (0.71-2.11) | 1.57 (0.78-2.22)
  210-275 | 1.23 (0.63-2.40) | 1.68 (0.97-2.91) | 1.85 (1.08-2.74)
  275-330 | 1.65 (0.86-3.15) | 1.75 (1.04-2.95) | 1.92 (1.22-2.85)
  >330 | 1.00 | 1.00 | 1.00
Smoking status of the mother
  smoking | 1.64 (1.36-1.98) | 1.56 (1.19-2.05) | 1.57 (1.32-1.98)
  non-smoking | 1.00 | 1.00 | 1.00

# For example, phreg in SAS or coxph in S-PLUS.

Complete-case estimates are reported in the first column of Table 1 (CC). They are obtained by excluding from the data the children for whom either the mother's smoking status or the antibody titer is missing, and applying the estimation procedure to the reduced data. This approach is consistent here because missing smoking information depends solely on calendar time, and missing blood tests are due to administrative irregularities considered random [3,4]. The complete-case method is the default of most statistical packages and thus requires no extra implementation time and only one call of the estimation procedure. Complete-case analysis of the RSV data shows significant effects of many relevant risk factors and could support a final conclusion. Other methods, however, could through efficiency gains give more evidence for the effect of birth weight and better assess the influence of crowding and of the maternal antibody titer, the latter being the most costly information.

Multiple imputation estimates are reported in the second column of Table 1 (MI). In this approach, M datasets are simulated by replacing the missing values in the original data sample S by plausible values drawn through an imputation procedure imp(S). These completed datasets are analysed separately with the complete-data procedure, and the resulting M estimates are combined using Rubin's rule [6,8] to arrive at ORs and standard errors for the classes of the risk factors. The multiple imputation approach in Table 1 uses M = 10 simulated datasets. Imputation of the two incomplete factors is based on sequential random draws from the conditional probability distribution of each risk factor given all completely observed variables in the study [20,21]. First, the smoking factor is imputed by draws from a Bernoulli distribution for the probability of a smoking mother conditional on all complete information – i.e.
the risk factors without missing values, the case-control identifier and the matching variables, but not the maternal antibody titer – using a logistic regression model estimated from the part of the data for which the smoking information is not missing. Thereafter, the antibody titer is imputed by a draw from a linear regression model on all complete information, now including the newly imputed smoking factor, estimated on the part of the data for which the antibody titer is not missing. This imputation scheme focuses on easy implementation rather than on being proper [6]. Single-outcome procedures for linear and logistic regression are standard in statistical packages and usually produce model predictions for a missing outcome when all covariates are present; the imputation procedure is constructed by combining these standard procedures with functions that produce random draws from probability distributions. Such a scheme is proper for a single factor with missing values under an assumption of ignorability, i.e. that missing values in a factor depend on the other variables in the study in the same way as its observed values do [9]. The proposed sequential imputation algorithm cannot be proper, since not all available information is used to impute the smoking factor. Proper imputation would be approached by iterating the above procedure between the factors with missing information, i.e. by additionally using the imputed antibody titer in the logistic model for the smoking factor in the next iteration step [20,21]. The uniterated imputation procedure used here is rendered first-moment proper, i.e. it gives unbiased estimates, by an assumption that smoking and antibody titer are independent conditional on all other variables. The multiple imputation procedure takes some time to implement, although software that performs the iterations of the sequential method is available [20,22].
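The uniterated sequential scheme above can be sketched as follows. This is a minimal sketch on simulated data, not the code used for the RSV analysis: the function names, the column layout of `Z` and the plain Newton-Raphson logistic fit are assumptions. Like the scheme described, it conditions on the fitted regression parameters rather than drawing them, and it makes a single pass without iterating between the two factors.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25):
    """Maximum-likelihood logistic regression by Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (y - p)
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def sequential_impute(Z, smoke, titer, rng):
    """One uniterated pass: impute smoking first, then antibody titer.

    Z     : (n, k) completely observed columns (intercept, case-control
            indicator, matching variables, complete risk factors)
    smoke : (n,) 0/1 values, np.nan where missing
    titer : (n,) continuous values, np.nan where missing
    """
    smoke, titer = smoke.copy(), titer.copy()

    # Step 1: logistic model for smoking on Z only (titer is not used)
    obs = ~np.isnan(smoke)
    beta = fit_logistic(Z[obs], smoke[obs])
    p_mis = 1.0 / (1.0 + np.exp(-(Z[~obs] @ beta)))
    smoke[~obs] = rng.binomial(1, p_mis)              # Bernoulli draws

    # Step 2: linear model for titer on Z plus the (partly imputed) smoking factor
    X = np.column_stack([Z, smoke])
    obs = ~np.isnan(titer)
    coef, *_ = np.linalg.lstsq(X[obs], titer[obs], rcond=None)
    resid = titer[obs] - X[obs] @ coef
    sigma = np.sqrt(resid @ resid / (obs.sum() - X.shape[1]))
    n_mis = int((~obs).sum())
    titer[~obs] = X[~obs] @ coef + rng.normal(0.0, sigma, size=n_mis)  # prediction + noise
    return smoke, titer

# Hypothetical data: intercept, case indicator and one matching variable
rng = np.random.default_rng(42)
n = 500
case = rng.binomial(1, 0.2, n)
age = rng.normal(0.0, 1.0, n)
Z = np.column_stack([np.ones(n), case, age])
smoke = rng.binomial(1, 0.3, n).astype(float)
titer = 250.0 + 30.0 * age + rng.normal(0.0, 20.0, n)
smoke[rng.random(n) < 0.38] = np.nan
titer[rng.random(n) < 0.50] = np.nan
smoke_imp, titer_imp = sequential_impute(Z, smoke, titer, rng)
```

Calling `sequential_impute` M = 10 times with the same generator would give the M completed datasets; iterating steps 1 and 2, with the imputed titer added to the smoking model, would move towards the proper scheme mentioned above.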
The estimation and imputation procedures are called M = 10 times, and the results are combined only once. This usually gives short computing times, even for more elaborate imputation schemes. Multiple imputation is more efficient than complete-case analysis, as evidenced by the narrower confidence intervals in Table 1. In particular, the evidence supporting a (non-linear) effect of maternal antibodies is due to efficiency gains. The effects of birth weight and crowding are slightly lower than in the complete-case analysis, which might result from incorrectly assumed independences.

Estimates obtained by a nonparametric bootstrap are reported in the last column of Table 1 (NPB). The bootstrap method for inference on imputation estimates from an incomplete dataset S is determined by a resampling procedure res(S) and an imputation procedure imp(S). Bootstrap replica estimates $\hat\theta^{(b)} = \hat\theta(\mathrm{imp}(\mathrm{res}(S)))$, $b = 1, \dots, B$, are obtained by applying the imputation procedure to datasets obtained through the resampling procedure. A large number B of replicas form an empirical distribution that approximates the distribution of the parameter estimates $\hat\theta$.& A 95% confidence interval is estimated by the 2.5% and 97.5% percentiles of the replica estimates $\hat\theta^{(b)}$; improvements to this percentile method exist [24]. In the RSV study, B = 1000 bootstrap replicas are constructed to obtain confidence intervals [15]. A condition for the bootstrap approach to give valid inference is consistency of the estimate $\hat\theta$, which here is obtained through the imputation procedure imp(S) described above, argued to provide consistent estimates. Bootstrap point estimates can be obtained either by applying the imputation procedure to the original dataset, $\hat\theta(S) = \hat\theta(\mathrm{imp}(S))$, or by a measure of central tendency (mean or median) of the bootstrap replica estimates. The former, reported in Table 1, stays close to the original data, but the latter is more robust to imputation variance and generally less biased [18].
The focus of the bootstrap approach on inference, not estimation, is underlined by this ambivalence. The resampling scheme res(S) should mimic the actual sampling of the data as closely as possible [12,14]. In a matched case-control study, the controls are not chosen randomly from the population. Ideally, therefore, cases are sampled with replacement from the children hospitalised for RSV, and then, for each case, controls are sampled with replacement from all non-hospitalised children who match the case, i.e. have the same sex and birth month and live in the same municipality. Since the matching in the RSV study is tight, this can be approximated by sampling the matched sets (each case together with its controls) with replacement, such that a resampled dataset consists of as many matched sets as the original dataset. This approximate resampling by matched sets increases computation speed and is used here.

The nonparametric bootstrap described above is no more complex to implement than multiple imputation, but it is much more computer-intensive: the estimation, imputation and resampling procedures are all called B = 1000 times, which gives long computing times. Our implementation of the bootstrap procedure for the RSV study produced the results in Table 1 in approximately one hour*. Such computing times should not be an obstacle in practice, and the procedure can be sped up by a better implementation and faster computers.

The nonparametric bootstrap gives results very similar to those of multiple imputation. The largest difference is seen in the estimates for the levels of maternal antibodies: whereas the confidence intervals agree, or are even slightly narrower for the bootstrap approach, the point estimates differ more, probably because of imputation bias in the point estimate used. Missing values in a considerable part of the data for organisational and financial reasons, as in the RSV study, are often encountered.
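The bootstrap for an imputation estimator, $\hat\theta(\mathrm{imp}(\mathrm{res}(S)))$ with matched sets as resampling units, can be sketched generically. This is a minimal sketch under stated assumptions: `impute` and `estimate` are hypothetical callables standing in for imp and $\hat\theta$; the column-wise hot-deck and the case-minus-control mean difference in the toy example are chosen only for brevity, whereas the RSV analysis used sequential regression imputation and conditional logistic regression.

```python
import numpy as np

def matched_set_bootstrap(sets, impute, estimate, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap for an imputation estimator, resampling whole matched sets.

    sets     : list of matched sets (one case and its controls each)
    impute   : callable, list of sets -> completed data        (role of imp)
    estimate : callable, completed data -> 1-d parameter array  (role of theta-hat)
    """
    rng = np.random.default_rng(seed)
    k = len(sets)
    reps = []
    for _ in range(n_boot):
        idx = rng.integers(0, k, size=k)          # res(S): k sets with replacement
        replica = [sets[i] for i in idx]
        reps.append(estimate(impute(replica)))    # theta-hat(imp(res(S)))
    reps = np.asarray(reps, dtype=float)          # shape (n_boot, n_params)
    a = (1.0 - level) / 2
    lo, hi = np.quantile(reps, [a, 1.0 - a], axis=0)
    point = estimate(impute(sets))                # theta-hat(imp(S)), as in Table 1
    return point, lo, hi, reps

# Hypothetical toy data: each set = (case value, control 1, control 2), some missing
gen = np.random.default_rng(7)
sets = []
for _ in range(300):
    s = gen.normal([1.0, 0.0, 0.0], 1.0)
    s[gen.random(3) < 0.3] = np.nan
    sets.append(s)

def hot_deck(replica):
    """Fill each column's gaps by random draws from that column's observed values."""
    data = np.vstack(replica)
    out = data.copy()
    for j in range(data.shape[1]):
        miss = np.isnan(data[:, j])
        out[miss, j] = gen.choice(data[~miss, j], size=int(miss.sum()))
    return out

def case_minus_control(data):
    return np.array([data[:, 0].mean() - data[:, 1:].mean()])

point, lo, hi, reps = matched_set_bootstrap(sets, hot_deck, case_minus_control, n_boot=200)
```

Because imputation is repeated inside every replica, the spread of `reps` reflects both sampling and imputation variability, which is what allows the bootstrap to dispense with the proper-imputation requirement.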
Complete-case analysis gives unbiased estimates and valid inference if the missing values do not depend on the case-control indicator, at the expense of efficiency. The latter is seen in Table 1 from the wider confidence intervals compared with both other methods. Even so, the conclusions from the three analyses do not differ much. However, the information on maternal antibodies is the most costly, and the wide confidence intervals for this factor in the complete-case analysis are caused by inefficiency. Both multiple imputation and the nonparametric bootstrap give narrower confidence intervals, and their results are strikingly similar. This indicates that both approaches perform well, in spite of recent critique of multiple imputation inference [10], a possibly improper imputation procedure and approximate resampling techniques.

& Shao and Sitter [23] construct a bootstrap procedure for already imputed data, which is relevant when the imputed data are the only data available, but not when imputer and analyst are the same person.
* Pentium 600 MHz, 128 MB RAM.

Welding exposure – limitations of the nonparametric bootstrap

A second study assesses the risk of breast cancer from occupational exposure to welding [25], with considerable missing values in the risk factors. This matched case-control study features 1326 cases of breast cancer from two occupational cohorts, covering the periods 1985-1994 in Sweden and 1975-1993 in Denmark. For each case, one control who was free of breast cancer at the case's date of diagnosis is chosen randomly from these cohorts, matched on sex, age and nationality. Information on exposure to various forms of welding and to solvents is obtained by questionnaires sent to current or former employers. An overview is given in Table 2. Up to 52% of the exposure information is missing, as companies could not remember exactly what a certain person was working with, possibly decades ago.
In the Danish data, the total employment period up to diagnosis is identified from pension fund records, which have been compulsory since 1964. For the Swedish data, however, only information about employment after 1984 is available, as this was collected from tax returns for 1985 through 1994.

Table 2: Data description and estimated Odds Ratios (OR), relative to the baseline level of no exposure, with corresponding 95% Confidence Intervals (CI) in a multivariate conditional logistic regression for the increased risk of breast cancer from exposure to various forms of welding. The last column (# undefined) lists the number of bootstrap replica data samples out of 1000 that resulted in undefined estimates for the corresponding exposure.

Data description
Variable | Level | Cases | Controls | Missing
Matching variables
  Country | Denmark | 733 | 733 | 0 (0%)
  | Sweden | 593 | 593 |
  Sex | male | 14 | 14 | 0 (0%)
  | female | 1312 | 1312 |
Risk factors
  Resistance welding | yes | 14 | 31 | 1083 (41%)
  | no | 771 | 753 |
  Arc welding | yes | 13 | 3 | 1179 (44%)
  | no | 733 | 724 |
  Other welding | yes | 2 | 2 | 1191 (45%)
  | no | 734 | 723 |
  Solvents | yes | 57 | 74 | 1376 (52%)
  | no | 582 | 563 |

Estimated ORs
Risk factor | Exposure | OR (95% CI) | # undefined
  Resistance welding | low | 0.41 (0.13-0.95) | 0
  | high | 0.36 (0.11-0.76) | 0
  Arc welding | low | 5.10 (1.13-∞) | 1
  | high | 9.57 (1.71-∞) | 0
  Other welding | low | 0.98 (0.00-∞) | 32
  | high | 67.4 (0-∞) | 574
  Solvents | low | 0.75 (0.46-1.10) | 0
  | high | 0.91 (0.74-1.11) | 0

The missing values in this welding exposure study depend on seniority and duration of employment, but not on the disease indicator. Consistent complete-case inference is therefore possible. This is, however, not a real option because of the loss of efficiency: a complete-case analysis uses information from only 1227 (46%) individuals, spreading the already sparse positive indications of welding and solvent exposure thin. Both multiple imputation and the nonparametric bootstrap are efficient options; the latter is implemented as in the RSV study, and its limitations are illustrated below.
Following the paradigm of the nonparametric bootstrap, confidence intervals for the welding exposures are constructed from the imputation estimates of many datasets sampled with replacement from the available matched pairs. Note that this resampling procedure, approximate in the RSV study, is exact here. We aim to estimate ORs for low and high exposure to each welding factor, defined as less than and more than 2000 hours of total occupational exposure to welding or solvents, respectively. The exposure is calculated from the approximate weekly exposure, if exposed at all, and the total employment period. The total employment period is only partly known for the Swedish part of the data. The Danish employment data, transformed to be comparable with the Swedish data (i.e. with total employment times truncated to employment times after 1984), show no evidence of a significant difference from the Swedish data (Wilcoxon test, p = 0.3448). A similarity in employment times between the two countries is therefore assumed in the imputation procedure: the incomplete Swedish total employment time is replaced by a random draw from the distribution observed in the Danish data, restricted to values at least as large as the observed Swedish employment time after 1984. This non-standard part of the imputation procedure interferes with available general imputation software and increases the implementation effort.

Relatively few subjects in the welding exposure study indicate any exposure. This might imply that regressions including several welding exposure indicators are poorly identified. Even if this is not the case for the original dataset, there is a risk of data separation in resampled datasets. Sequential regression imputation [20,21], now using ordinal logistic regression for the three possible exposure classes, is again attempted in the welding exposure study.
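The employment-time draw and the low/high exposure classification described above can be sketched as follows. This is a minimal sketch with hypothetical names and data; the units (years), the 46 working weeks per year used to convert weekly hours into total hours, the assignment of exactly 2000 hours to the high class, and the fallback when no Danish time is long enough are all assumptions not stated in the study.

```python
import numpy as np

def draw_total_employment(swedish_after_1984, danish_total, rng):
    """Replace each Swedish total employment time by a random Danish total
    that is at least as long as the observed Swedish time after 1984."""
    danish_total = np.asarray(danish_total, dtype=float)
    out = np.empty(len(swedish_after_1984))
    for i, t_obs in enumerate(swedish_after_1984):
        pool = danish_total[danish_total >= t_obs]
        # Fallback (an assumption): keep the observed time if no Danish value is as long
        out[i] = rng.choice(pool) if pool.size else t_obs
    return out

def exposure_class(hours_per_week, years_employed, weeks_per_year=46.0, cutoff=2000.0):
    """0 = unexposed, 1 = low (< cutoff hours in total), 2 = high (>= cutoff)."""
    total = (np.asarray(hours_per_week, dtype=float)
             * np.asarray(years_employed, dtype=float) * weeks_per_year)
    return np.where(total == 0, 0, np.where(total < cutoff, 1, 2))

rng = np.random.default_rng(3)
danish = [5.0, 10.0, 20.0, 30.0]           # hypothetical Danish totals (years)
swedish = [8.0, 25.0, 40.0]                # hypothetical Swedish times after 1984 (years)
total_sw = draw_total_employment(swedish, danish, rng)
classes = exposure_class([0.0, 1.0, 2.0], [10.0, 10.0, 30.0])
```

In the bootstrap, a draw like this would be repeated inside the imputation step of every replica, so that uncertainty about the Swedish employment history enters the confidence intervals.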
Applied blindly to data in which two sparse factors are separated, i.e. never both present in the same subject, sequential regression imputation can never impute a positive indication for one factor when the other is present; this is a strong model assumption when the factors are sparse and their separation is often coincidental. The imputation procedure is therefore constructed such that no sparse factors are included as covariates in the imputation models, thereby circumventing separation problems. This is justified by an assumption that there are no systematic correlations between exposures to the various forms of welding or to solvents, all of which are sparse factors. It implies an imputation procedure in which the ordinal logistic models depend only on the case-control indicator, age and country of residence; sex is also left out of the imputation models, since the sample consists predominantly of women. To estimate the ORs for the various forms of welding and for solvents, all exposures are included in the multivariate conditional logistic regression for each bootstrap replica data sample. Because of the resampling, a certain exposure may be absent among the cases, the controls or both in a replica data sample, such that the estimates for this factor cannot be identified. The probability that at least one of the B bootstrap resamples contains no exposure indication at all for a given factor is

$$1-\left[1-\left(\frac{N-n}{N}\right)^{N}\right]^{B} \;\longrightarrow\; 1-\left[1-\exp(-n)\right]^{B} \qquad (N \to \infty),$$

where N is the total sample size and n the number of exposure indications. This probability converges relatively fast as N → ∞. It follows that, in studies with a decent sample size and using the suggested [15] B = 1000, the above probability is 4.4% for n = 10. If, as a rule of thumb, a 5% probability of having no observations with an exposure indication for a certain factor in at least one of the resampled datasets is accepted, the original dataset should contain at least 10 exposure indications, counting cases and controls together.
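The probability above and the resulting rule of thumb can be checked numerically. This is a small sketch; the function names are hypothetical, and N = 2652 in the check simply matches the number of subjects in the welding study.

```python
import math

def p_no_exposure_any(n, N, B=1000):
    """P(at least one of B resamples of size N contains none of the n exposed subjects)."""
    p_one = (1.0 - n / N) ** N            # a single resample misses all n exposed subjects
    return 1.0 - (1.0 - p_one) ** B

def p_no_exposure_limit(n, B=1000):
    """Large-N limit of the probability above: 1 - (1 - exp(-n))**B."""
    return 1.0 - (1.0 - math.exp(-n)) ** B

# Smallest number of exposure indications keeping the probability at or below 5%
min_n = next(n for n in range(1, 50) if p_no_exposure_limit(n) <= 0.05)
```

For n = 10 the limit gives about 0.044, while n = 9 already gives more than 10%, which is why at least 10 exposure indications are required under the 5% rule; the finite-N value for N = 2652 is within a few tenths of a percentage point of the limit.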
Applying the same argument to cases and controls separately, if either group has fewer than 10 exposure indications for a certain factor, there is a more than 5% probability of data separation in at least one resample, resulting in infinite parameter estimates. This rule of thumb on total exposure indications is violated in the welding exposure study, as there are only four indications of other forms of welding, of which only one is in the high exposure class. When no case or control in a bootstrap replica data sample has a positive indication for a certain exposure, the estimates for this exposure are undefined. To calculate correct confidence intervals, undefined bootstrap replica estimates have to be taken into account. This is done by calculating adjusted percentiles of the distribution of the defined estimates, such that with 95% confidence the estimate is both defined and within the interval. The adjustment takes the (100 − α)% confidence interval from the defined estimates, where

$$\alpha = 100 \cdot \max\left\{0,\; \frac{0.05\,B - u}{B - u}\right\},$$

with B the number of bootstrap resamples and u the number of undefined estimates. Undefined estimates appear for both exposure classes of other forms of welding and for the low exposure class of arc welding (Table 2), and their confidence intervals are adjusted accordingly. Exposure to other forms of welding is so sparse that its confidence interval effectively covers all possible values, i.e. no effect estimate can be given with any confidence. The bootstrap results in Table 2 show a significant protective effect of resistance welding on breast cancer, but a harmful effect of arc welding. The confidence intervals are generally wide, reflecting the low number of indications at each exposure level. We conclude that resistance welding and arc welding have an effect, but neither the effect size nor the existence of a dose-response relation can be determined. The welding exposure study illustrates the breakdown of the bootstrap procedure on several points.
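The adjusted percentile interval for partly undefined replica estimates can be sketched as follows. This is a minimal sketch, assuming undefined estimates are stored as `np.nan` and that the (100 − α)% interval is split equally between the two tails; the function name is hypothetical, α is returned as a fraction rather than a percentage, and infinite estimates from separation are not treated specially here (interpolating between infinite values can yield `nan`).

```python
import numpy as np

def adjusted_percentile_ci(reps, level=0.95):
    """Percentile interval that allows for undefined replica estimates (np.nan).

    With B replicas of which u are undefined, the (1 - alpha) interval of the
    B - u defined estimates is used, alpha = max(0, ((1 - level) B - u) / (B - u)).
    """
    reps = np.asarray(reps, dtype=float)
    B = reps.size
    defined = reps[~np.isnan(reps)]
    u = B - defined.size
    if defined.size == 0:
        return np.nan, np.nan, np.nan       # no defined estimate at all
    alpha = max(0.0, ((1.0 - level) * B - u) / (B - u))
    lo, hi = np.quantile(defined, [alpha / 2, 1.0 - alpha / 2])
    return lo, hi, alpha

# Hypothetical replicas: 1000 estimates, 574 of them undefined
reps = np.full(1000, np.nan)
reps[:426] = np.arange(426)
lo, hi, alpha = adjusted_percentile_ci(reps)
```

With u = 574 undefined out of B = 1000, α is 0 and the interval spans the full range of the defined estimates, mirroring the effectively uninformative interval for high exposure to other forms of welding; with u = 1, α is slightly below 5%.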
First, non-standard imputation and a larger number of exposure variables with missing values increase both the implementation effort and the computing time: for B = 1000, the implemented bootstrap took 3 hours¥. Second, imputation through sequential regression [20,21] using all available information, although possible even when complete separation exists, depends too much on how sparse factors enter these models. Leaving sparse factors out as covariates in the imputation models can overcome this problem, and seems defensible when mutual exclusion cannot be argued for, but it undermines consistency if real correlations are disregarded. Third, with very sparse exposure, some resampled datasets will have no exposure indications at all for a certain factor, resulting in unidentified estimates; bootstrap confidence intervals can nevertheless still be obtained. Of course, in the welding exposure study, discarding exposure to other forms of welding and deleting the few individuals with this exposure could be a justified response to this increase in complexity, without much loss of efficiency for the other estimates.

¥ Pentium 600 MHz, 128 MB RAM.

Discussion

Multiple imputation as a general approach for handling missing data has come under fire from critics who claim that proper imputations, necessary for valid inference, are difficult to produce, especially in data where multiple factors are incomplete [9,11], and that even then multiple imputation is biased in some cases [10]. Computer-intensive resampling techniques are a real alternative for many case-control studies with missing values, where imputer and analyst are one person and the analysis is performed on one computer. The nonparametric bootstrap needs only a first-moment proper imputation scheme for valid confidence intervals, i.e. an imputation resulting in unbiased estimates, which is much more easily obtained and assessed than a fully proper imputation.
The main criticism of using the bootstrap to obtain confidence intervals for imputation estimates has been the long computing times [7,18,26]. These times need not be unacceptable when a simple imputation procedure can be used and the bootstrap replica estimates are straightforward to compute, as in the RSV study; the drawbacks of multiple imputation stated above are then avoided. The welding exposure study shows that sparse data can, owing to separation in resampled data, force the imputation procedure to use less information than desired, thereby undermining the consistency of the imputation estimate. When the imputation can be assumed to give consistent estimates, sparse risk factors only inflate the bootstrap confidence intervals when exposure is very rare, in which case the analyst may prefer to eliminate that factor from the analysis.

A complete-case approach often gives a consistent analysis in matched case-control studies with missing values in the risk factors [3,4]. If the evidence from this analysis is not enough to support a final conclusion, a bootstrap approach can be used, provided that a first-moment proper imputation procedure and subsequent consistent estimation in each resampled dataset can be performed sufficiently fast. In this case, multiple imputation will also give consistent inference in a fraction of the computing time, provided that the imputation can additionally be assumed proper, which is sometimes the case for particularly simple missing-data mechanisms, and that the estimates are approximately normally distributed. Consequently, a bootstrap approach has an advantage over multiple imputation when the imputation procedure can be assessed as first-moment proper but not fully proper (hot-deck and random-draw regression methods usually satisfy the former condition [7] and are generally not hard to implement), or when the estimates are not approximately normal.
Acknowledgements

This work was supported by Public Health Service Grant 2R01-CA54706-10A1 from the National Cancer Institute. We thank H.E. Nielsen (Department of Paediatrics, Gentofte University Hospital) for his contributions and permission to use data from the RSV study in this paper, and B. Floderus and N. Håkansson (Division of Epidemiology, Karolinska Institutet) for their work and permission to use data from the Swedish arm of the welding study in this paper.

References

1. Breslow NE and Day NE: Statistical Methods in Cancer Research, Volume 1: The Analysis of Case-Control Studies. Lyon, International Agency for Research on Cancer, 1980.
2. Breslow NE: Statistics in epidemiology: the case-control study. Journal of the American Statistical Association 91: 14-28, 1996.
3. Breslow NE and Cain KC: Logistic regression for two-stage case-control data. Biometrika 75: 11-20, 1988.
4. Lipsitz SR, Parzen M and Ewell M: Inference using conditional logistic regression with missing covariates. Biometrics 54: 295-303, 1998.
5. Rubin DB: Inference and missing data. Biometrika 63: 581-592, 1976.
6. Rubin DB: Multiple Imputation for Nonresponse in Surveys. New York, Wiley & Sons, 1987.
7. Rubin DB: Multiple imputation after 18+ years. Journal of the American Statistical Association 91: 473-489, 1996.
8. Li KH, Raghunathan TE and Rubin DB: Large-sample significance levels from multiply imputed data using moment-based statistics and an F reference distribution. Journal of the American Statistical Association 86: 1065-1073, 1991.
9. Schafer JL: Analysis of Incomplete Multivariate Data. London, Chapman & Hall, 1997.
10. Robins JM and Wang N: Inference for imputation estimators. Biometrika 87: 113-124, 2000.
11. Binder DA and Sun W: Frequency valid multiple imputation for surveys with a complex design. Proceedings of the Survey Research Methods Section of the American Statistical Association 7: 281-286, 1996.
12. Efron B and Tibshirani R: An Introduction to the Bootstrap. New York, Chapman & Hall, 1993.
13. Manly BFJ: Randomization, Bootstrap and Monte Carlo Methods in Biology, 2nd edition. London, Chapman & Hall, 1997.
14. Davison AC and Hinkley DV: Bootstrap Methods and their Application. Cambridge, Cambridge University Press, 1997.
15. Carpenter J and Bithell J: Bootstrap confidence intervals: when, which, what? A practical guide for medical statisticians. Statistics in Medicine 19: 1141-1164, 2000.
16. Efron B: Missing data, imputation and the bootstrap (with discussion). Journal of the American Statistical Association 89: 463-479, 1994.
17. Laird NM and Louis TA: Empirical Bayes confidence intervals based on bootstrap samples (with discussion). Journal of the American Statistical Association 82: 739-757, 1987.
18. Little RJA and Rubin DB: Statistical Analysis with Missing Data, 2nd edition. New York, Wiley & Sons, 2002.
19. Nielsen HE, Siersma V, Andersen S, Gahn-Hansen B, Mordhorst CH, Nørgaard-Pedersen B, Røder B, Sørensen TL, Temme R and Vestergaard BF: Respiratory Syncytial Virus infection: risk factors for hospital admission, a population-based study. Acta Paediatrica 92: 1314-1321, 2003.
20. Kennickell AB: Imputation of the 1989 Survey of Consumer Finances: stochastic relaxation and multiple imputation. Proceedings of the Survey Research Methods Section of the American Statistical Association, 110, 1991.
21. Raghunathan TE, Lepkowski JM, Solenberger P and Van Hoewyk J: A multivariate technique for multiply imputing missing values using a sequence of regression models. Survey Methodology 27: 85-95, 2001.
22. Raghunathan TE, Solenberger P and Van Hoewyk J: IVEware: imputation and variance estimation software. Michigan, Survey Methodology Program, Survey Research Center, Institute for Social Research, University of Michigan. http://www.isr.umich.edu/src/smp/ive/, 2002.
23. Shao J and Sitter RR: Bootstrap for imputed survey data. Journal of the American Statistical Association 91: 1278-1288, 1996.
24. Efron B: Better bootstrap confidence intervals. Journal of the American Statistical Association 82: 171-200, 1987.
25. Johansen C, Floderus B, Håkansson N and Olsen JH: unpublished data, 2003.
26. Rubin DB: Comments on "Missing data, imputation and the bootstrap" by B Efron. Journal of the American Statistical Association 89: 475-478, 1994.