Published By Science Journal Publication
Science Journal of Medicine & Clinical Trial
ISSN: 2276-7487
http://www.sjpub.org/sjmct.html
© Author(s) 2013. CC Attribution 3.0 License.
Research Article
doi: 10.7237/sjmct/248

Hot Deck Propensity Score Imputation For Missing Values

Dr. Benjamin Mayer
Institute of Epidemiology and Medical Biometry, Ulm University, Schwabstr. 13, 89075 Ulm, Germany. e-mail: benjamin.mayer@uni-ulm.de, Phone: +49-731-5026896, Fax: +49-731-5026902

Accepted 22nd May, 2013

ABSTRACT
The adequate handling of missing data in medical research still constitutes a major problem of statistical data analysis. Although different imputation strategies and methods have been developed in recent years, none of them can be used unhesitatingly in arbitrary missing data situations. A commonly observed drawback of the established imputation methods is their sensitivity to the number of missing values in the analysis set, the underlying missing data mechanism, and the scale level of the incomplete variable; the extent of this sensitivity differs between individual methods. A novel approach for the imputation of missing data is proposed here which, in particular, does not depend on the scale level of the variable affected by missing values. The approach, set in a multiple imputation framework, is based on the calculation of propensity scores and their use in creating adequate imputation values according to a Hot Deck principle. In a real-data based simulation study, the proposed method was compared to a standard multiple imputation approach and a complete case analysis strategy. The results showed that the goodness of imputation also depended on the assumed proportion of missing data.
However, the developed imputation method generated consistent estimates of the width of confidence intervals and could be applied straightforwardly to different kinds of incomplete variables, which is generally not as easy with other imputation approaches.

KEYWORDS: Hot deck imputation; Missing values; Multiple Imputation; Propensity Score.

1.0 INTRODUCTION
In medical research, the adequate handling of missing values still constitutes a major problem. Despite extensive methodological research on this topic, there is still no uniform solution for the handling and imputation of missing data. This circumstance is also reflected in relevant guidelines for clinical trials, which meanwhile summarize the problematic aspects of missing values and present different approaches for their imputation; however, it has not been possible to formulate uniform advice so far. Hence, it is all the more important to improve the existing methods and to develop novel approaches for imputing missing data. For the latter, it is worthwhile to investigate a preferably wide range of missing data situations, which arise from different missing data mechanisms and different types of variables affected by missing values. The problems that arise when some values of a dataset cannot be observed are substantial. First of all, each missing value represents a loss of information. The generalizability of study results is therefore limited to a certain extent, depending on the amount of missing items in the dataset on the one hand and on which variables are affected on the other hand. Simply ignoring those subjects with missing information for one or more analysis variables may lead to a considerable loss of statistical power, especially when the number of analysis variables, and thus the probability of a missing observation, is high.
This so-called complete case analysis (CCA) strategy only leads to convincing results if the missing data arise independently of the underlying study conditions. The most serious problem with missing values, however, is biased parameter estimates. Missing values may, for example, destroy the structural equality generated between comparison groups, with the result that an investigated treatment effect includes some bias. With respect to the stated problems of data analysis and interpretation, a methodological examination of alternative approaches to handle the missing data problem, apart from only ignoring it, is necessary. There is a well-established and accepted missing data theory which can largely be traced back to Allison (2001), Little (2002) and Rubin (1987). According to this theory, the occurrence of missing data can be divided into three so-called missing data mechanisms, which describe the relation between observed and missing values. The term "Missing Completely At Random" (MCAR) characterizes a situation where the incomplete cases are a random sample of the whole dataset, so the occurrence of missing data is independent of other variables in the dataset. An example of MCAR would be a patient who dies in the course of a clinical trial, where this event is entirely unrelated to participation in the trial. The second mechanism is called "Missing At Random" (MAR) and describes the situation where the occurrence of missing data can be explained by other, observed variables of the dataset (e.g. a patient fulfils exclusion criteria in the course of a trial).
If one has to assume that the occurrence of missing values is additionally related to unobserved variables, the missing data mechanism is "Missing Not At Random" (MNAR). One may think of a patient diagnosed with depression who does not report his actual mental condition because he is afraid of potential consequences such as inpatient treatment. Numerous imputation methods have been developed, although none of them is truly applicable in a MNAR situation. They can be separated into Single Imputation (SI) and Multiple Imputation (MI) methods. The crucial distinction between the two strategies is that SI methods impute each missing value by a single value, whereas MI methods replace each missing value by m (m > 1) imputation values. The main advantage of MI over SI is that the imputation values are estimated more precisely, since the imputation itself, as an additional source of variability, is explicitly considered in the estimation of the variance. Therefore, MI approaches are considered to be state of the art today (Schafer and Graham, 2002). An example of a basic SI approach is imputation by the mean of the observed values. Often used MI methods are the Markov Chain Monte Carlo (MCMC) method and MI using chained equations (White et al., 2011). However, as already mentioned, there is still no uniform solution for the handling and imputation of missing data (European Medicines Agency, 2010). A popular SI method, in a simple version, is the so-called Hot Deck imputation, which comprises a variety of approaches. According to the Hot Deck principle, a missing value of an incomplete variable is imputed by an observed value of the same variable, where the selection of this observed value can be random or follow a defined criterion. Often the imputation value for a study participant is a random choice from a set of observed values of participants who show similarities with respect to particular characteristics.
Different options to create this so-called "donor pool" are summarized and discussed in Andridge and Little (2010). Defining the similarity of subjects may be a complex procedure in general; the investigated medical problem strongly determines the most relevant factors for such a definition, and from a methodological point of view multivariate approaches are usually required. For a descriptive example one may think of observational studies where patients are not randomized to the treatment arms. Covariables and potential confounders may not be equally distributed across the comparison groups, so appropriate analysis methods are required to avoid biased effect estimates. A well-known approach in such a situation, to make patients more comparable and to obtain adjusted effect estimates, is the propensity score (PS). It describes the conditional probability for a patient to be exposed (e.g. belonging to the treatment group) given a certain covariate structure. In this way the partially extensive information on a patient can be reduced to a scalar value. Patients with the same profile of covariables will have the same conditional probability to be exposed. The PS can then be used, for example, to adjust the effect estimator for potential confounding in the course of a regression analysis, or to conduct a PS-based matching of exposed to non-exposed patients. If the exposure status is defined as an indicator for the presence of missing values in an analysis variable, the PS can be used to assess the similarity of patients with and without missing values. Within this paper, it is described how the two concepts of propensity score estimation and Hot Deck imputation can be combined to create a flexible multiple imputation method. This approach is called the Hot Deck Propensity Score (HDPS) method. Related approaches are described in Andridge and Little (2010), Mittinty and Chacko (2004), Little (1986), Haziza and Beaumont (2007) and Allison (2000).
The procedure is not illustrated in detail in Andridge and Little, while Mittinty and Chacko provide a nearest neighbor imputation based on PS in a Single Imputation setting only. The papers of Little and of Haziza and Beaumont likewise describe Single Imputation approaches, in which the propensity score is used to create imputation classes, which is not the scope of this article. Allison does describe a PS-based Multiple Imputation procedure called "approximate Bayesian bootstrap", but this approach also rests upon imputation classes and not on a nearest neighbor approach.

How to Cite this Article: Dr. Benjamin Mayer, "Hot Deck Propensity Score Imputation For Missing Values," Science Journal of Medicine and Clinical Trials, Volume 2013 (2013), Article ID sjmct-248, Issue 2, 18 Pages, doi: 10.7237/sjmct/248

The paper is organized as follows. After a short presentation of the example data set which was used for the simulation study, some theoretical aspects of Hot Deck imputation and the calculation of propensity scores are given. The presentation of the HDPS method is supported by a simulation study which has been conducted to investigate the performance of the method with respect to effect estimation for different missing data situations. Corresponding simulation results for a complete case analysis and a standard multiple imputation approach, to which the HDPS results can be compared, are also given.

2.0 Data Example
The SPICE data set (Holubarsch et al., 2008) was derived from a cohort study in the field of cardiology. This clinical trial investigated the efficacy and safety of Crataegus extract in patients with congestive heart failure.
In an extended evaluation of the SPICE data (Mayer et al., 2013), a multivariate Cox regression analysis was performed to find risk factors for a cardiac event, which was a composite endpoint of cardiac death, non-fatal myocardial infarction and hospitalization due to progressive heart failure. This analysis was based on 2170 patients who had complete values in all variables required for the regression model. The mean age of the patients was 59.8 years (SD 10.5), with a predominance of male patients (84%). The most relevant risk factors for a cardiac event were the intake of cardioactive medications (glycosides, antiarrhythmics, nitrates, diuretics, beta blockers, and calcium antagonists), the extent of clinical parameters (left ventricular ejection fraction (LVEF), New York Heart Association (NYHA) functional class) and the age of the patients. The hazard ratios for those variables were significantly different from 1. The results of the multivariate Cox regression analysis are presented in more detail in Table 2. Parts thereof will be contrasted with the results of the simulation study described in section 3, to capture the effect of a missing value imputation via the proposed Hot Deck Propensity Score method and other strategies to handle missing data.

3.0 Materials and Methods
Before the concepts of Hot Deck imputation, propensity scores and their combination to impute missing values are discussed, an appropriate notation is introduced which supports a clear description of the algorithm for HDPS imputation. In the following, V=(V1,…,Vn)T denotes the incomplete target variable whose missing values have to be imputed, where n is the number of subjects in the dataset. For a first view of the method's properties, it is assumed that only one variable is affected by missing values. An application of the proposed method for more than one incomplete variable is discussed in section 5.
Further, Xi=(Xi1,…,Xip) denotes the vector of p completely observed variables for subject i=1,…,n. Thus, X=(X1,…,Xn)T is the (n×p) matrix of observed variables of the dataset. Furthermore, let R=(R1,…,Rn)T, with Ri=1 if Vi is missing and Ri=0 if Vi is observed for all i=1,…,n, be a missing data indicator variable for V. Beyond that, the target variable V can easily be separated into two components, the observed part and the incomplete part: V=(Vobs,Vmis). Consequently, the indicator variable R is also separated into R=(R0,R1). The main objective is to present a procedure for missing value imputation. The introduced notation should therefore not be interpreted in the sense that a regression of V on X is the aim; V is not necessarily the outcome variable of the final analysis model. Instead, the aim is to conduct a regression of the missing value indicator R on X to estimate the conditional probability for each subject to have a missing value in V. How the propensity score is subsequently used to multiply impute the missing values according to a Hot Deck approach is described in section 3.3. Based on the completed dataset, the final analysis model, for example a Cox regression as in section 2, can be conducted.

3.1 The Concept of Hot Deck Imputation
According to the Hot Deck principle, every missing value v* of a variable V (v* as a part of Vmis) is imputed by an observed value v (v as a part of Vobs). Here, the subject with the missing v* is often labelled "recipient", whereas the subject who provides the imputation value is called "donor" (Andridge and Little, 2010). In general, one has to differentiate two approaches: random Hot Deck procedures choose a donor value v randomly out of an underlying donor pool, while deterministic Hot Deck methods select v according to certain criteria, for example by searching for donors which are preferably similar to the recipient. For this decision, several approaches which are able to capture the similarity between subjects are possible. In contrast to the so-called Cold Deck principle, where the potential donor pool is based on data which do not originate from the actual analysis dataset, Hot Deck approaches go back to observed values v of Vobs for the generation of the imputation values. Since Hot Deck imputation is usually applied in a SI setting, its general drawback is that the amount of variability that can be traced back to the imputation itself is not explicitly considered, so the variance tends to be underestimated and confidence intervals are estimated too narrow. The HDPS approach therefore aims to implement the Hot Deck principle in a Multiple Imputation setting by creating an appropriate donor pool from which the imputation values are chosen randomly.

3.2 The Calculation and Usage of Propensity Scores
Basically, propensity scores describe the conditional probability P(T=1 | X=x) to be exposed with respect to a defined event, given the specific covariable structure of a subject (Rosenbaum and Rubin, 1983). The exposure status may indicate membership of an intervention group or the presence of a certain event like, for example, a missing value. According to D'Agostino (1998), the PS is usually estimated by a logistic regression model

logit P(T=1 | X=x) = ln(P/(1−P)) = β0 + β1x1 + … + βpxp.
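For illustration only, the fitting of such a logistic model to obtain propensity scores can be sketched on simulated data. The data-generating setup and all names below are invented (not taken from the SPICE data), and in practice the model would be fitted with standard statistical software rather than the hand-rolled gradient ascent shown here:

```python
import numpy as np

# Illustrative data: n subjects, p completely observed covariables
rng = np.random.default_rng(0)
n, p = 500, 3
X = rng.normal(size=(n, p))
# Simulate a missingness indicator R that depends on the first covariable
true_beta = np.array([1.5, 0.0, 0.0])
R = (rng.random(n) < 1 / (1 + np.exp(-(0.5 + X @ true_beta)))).astype(float)

# Fit logit P(R=1 | X=x) = b0 + b1*x1 + ... + bp*xp by gradient ascent
Xd = np.column_stack([np.ones(n), X])    # design matrix with intercept
beta = np.zeros(p + 1)
for _ in range(5000):
    prob = 1 / (1 + np.exp(-(Xd @ beta)))
    beta += 0.1 * Xd.T @ (R - prob) / n  # step along the log-likelihood gradient

ps = 1 / (1 + np.exp(-(Xd @ beta)))      # propensity score for every subject
```

The fitted vector ps then contains one conditional probability per subject, which is the scalar similarity measure used in the remainder of the paper.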
The propensity score summarizes the partially extensive information of a whole set of covariables into a scalar value. This reduction of dimension is beneficial if the PS is used within further analyses. In observational studies, for example, propensity scores are frequently used to correct for potential confounding variables which adversely affect the estimation of treatment effects (Guo and Fraser, 2010). A major problem of observational studies concerns the possible lack of structural equality of comparison groups when randomization is not feasible. By calculating an individual conditional probability (on the particular covariate structure) for each subject it is possible to adjust for an unequal distribution of confounders across comparison groups. The final analysis model, whatever it is, has to consider the calculated PS as an adjustment variable. Basically there are two procedures to make use of the propensity score (Guo and Fraser, 2010): either including it in the final analysis model (e.g. as an additional covariable in a regression model) or interposing a subject matching according to the PS before a post-matching analysis is done (e.g. stratification based on the matched sample). The PS approach is applicable to any type of multivariate data when a binary, or even multinomial, response variable is investigated. Defining a binary missing data indicator variable R as response, the PS methodology can be used to estimate the probability for each study participant to have a missing value in V, given the individual structure of covariables X. The validity of this estimate is essentially dependent on the quality and completeness of the covariables X. Study participants with a similar covariable structure have similar PS values, so a matching of study participants according to their calculated conditional probabilities can be made, for example.
Exactly this approach is applied to create an appropriate donor pool for study participants with missing values in the course of a Hot Deck imputation.

3.3 The Hot Deck Propensity Score (HDPS) Approach
The objective of the HDPS method is to find a respective donor element v of Vobs for each missing value v* of Vmis. The approaches of Hot Deck imputation and propensity score estimation are combined for this purpose. According to the Hot Deck principle, donors and recipients should be preferably similar. Using a missing data indicator variable R makes it possible to apply a PS approach to find appropriate pairs of donors and recipients, with R defining the dependent variable of the corresponding logistic regression model and all completely observed variables used as predictors. The similarity of subjects is then represented by their PS values. Since the missing data indicator variable only denotes the presence or absence of the values of the target variable V, the HDPS approach is not restricted to certain variables, in the sense that there are no limitations with respect to the scale level of V. To avoid the general drawback of deterministic Hot Deck procedures that each missing value is imputed by just one donor value, the HDPS proposal rests upon the creation of a donor pool from which an imputation value is randomly chosen. In this way, the HDPS approach can be implemented as a Multiple Imputation method and is given through the following algorithm:

1. Calculate the PS via P(R=1 | X=x), where X is the matrix of all completely observed variables of the original data set and R is the missing data indicator variable for the target variable V.
2. For each v* in Vmis, search for values vi in Vobs, i=1,…,10, such that Di = |PS(v*) − PS(vi)| → 0, with min {Di : i=1,…,10} := D1 ≤ D2 ≤ … ≤ D10.
3. Perform a MI (m = 5) by randomly choosing vi, i=1,…,10, as imputation values for v*.

Here, PS(v*) and PS(vi) are the corresponding propensity scores of v* and vi, respectively. For each missing element of V (i.e. for each subject with a missing value in V), one has to search among all subjects with observed values for V for a set of observed values of the target variable such that the absolute deviations of the calculated individual propensity scores tend to zero. The aim is to find those propensity scores PS(vi) which yield the ten smallest absolute differences when subtracting them from PS(v*). In doing so, a donor pool of ten observed values of V is created. For each missing value of the target variable, m = 5 elements of this pool are then chosen randomly as imputation values. The intended high similarity between subjects with missing and with observed values is characterized by an absolute deviation of their propensity scores which tends to zero. The establishment of a donor pool allows performing a MI, which has substantial advantages over deterministic Hot Deck procedures, which are only able to conduct a SI. At the same time, the intended similarity between donors and recipients is ensured by minimizing the absolute difference in PS. The size of the donor pool has been set to ten potential donor values to guarantee a sufficient variability among the imputation values on the one hand, and to preserve an acceptable extent of similarity on the other hand. The choice of m = 5 corresponds to the established view on a sufficient number of imputations (Rubin, 1987). The HDPS approach is very intuitive.
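Steps 2 and 3 of the algorithm can be sketched in a few lines. The following is a plain-Python illustration, not the author's SAS implementation; the function name and data layout are hypothetical, and the propensity scores from step 1 are assumed to be already calculated:

```python
import random

def hdps_impute(ps, values, m=5, pool_size=10, seed=1):
    """Multiply impute missing values following the HDPS principle.

    ps     -- dict: subject id -> propensity score (step 1 is assumed done)
    values -- dict: subject id -> observed value of V, or None if missing
    Returns a dict: recipient id -> list of m imputation values.
    """
    rng = random.Random(seed)
    donors = [j for j, v in values.items() if v is not None]
    imputations = {}
    for i, v in values.items():
        if v is not None:
            continue
        # Step 2: the pool_size donors with the smallest |PS(v*) - PS(vi)|
        pool = sorted(donors, key=lambda j: abs(ps[i] - ps[j]))[:pool_size]
        # Step 3: draw m imputation values at random from the donor pool
        imputations[i] = [values[rng.choice(pool)] for _ in range(m)]
    return imputations
```

Each recipient thus receives m = 5 values drawn from its ten nearest donors on the propensity score scale, yielding m completed datasets for the final analysis model.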
Initially, one investigates the relationship between the target variable (via the respective missing data indicator variable R) and all completely observed variables of the data set. Using the PS as a scalar measure of the similarity of subjects, adequate donors can be identified easily. Considering the absolute difference guarantees that the potential donor subjects may have either smaller or larger PS values. A priori, one would expect the ability of the logistic regression model to capture a relation between the missing data indicator and the observed variables to be high especially if the data are not missing completely at random (i.e. in case of MAR).

3.4 Simulation Study
To evaluate the proposed HDPS approach with respect to its ability to properly impute missing values of different scale levels, a real-data based simulation study has been performed. On the basis of the example dataset presented in section 2, missing values have been artificially simulated and subsequently imputed again. The original results of the Cox regression model (Table 2) can therefore be compared to the results after imputation. Different missing data situations have been investigated, arising from different missing data mechanisms and various proportions of missing values in the dataset. Two alternative approaches to handle missing data have also been applied to allow for a comparison of the HDPS results to established methods. The simulations have been performed with the statistical software SAS.

3.4.1 Missing Data Situations
The simulation study has been set up to investigate a number of different missing data situations. The two missing data mechanisms MCAR and MAR have been created, so the application of the HDPS approach is restricted to so-called "ignorable" missingness. Additionally, distinct proportions of missing values in the dataset (5%, 10%, 20%, 30%, 40%, and 50%) have been examined.
Although there is no recommendation up to which rate of missing values an imputation can or should be performed, a proportion higher than 50% seems problematic in general with respect to data quality. For that reason, higher percentages have not been considered for simulation. The investigated missing rates represent realistic scenarios which may arise in practice. For each missing data situation, 1000 simulation runs have been performed. Hot Deck Propensity Score imputation is a flexible approach since it is not restricted to a certain scale level of the variable to be imputed. To show its applicability for different types of variables, the simulations have been run for both one continuous (LVEF) and one binary variable (glycosides intake).

3.4.2 Creation of Missingness
To create missing values according to the MCAR mechanism, for both variables a simple random sample has been chosen from the complete dataset (2170 cases). For every patient a random number generator indicated whether the respective value of LVEF or glycosides should be deleted, with every patient having the same deletion probability. For generating the MAR mechanism a random number generator has been used, too, but here the probability for a value of LVEF or glycosides, respectively, to be deleted was based on the values of the variable NYHA class, and is therefore not completely random. It has been assumed that patients with a higher NYHA class are more likely to have a missing value in LVEF or glycosides.
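The two deletion schemes just described, equal deletion probability for MCAR and an NYHA-dependent deletion probability for MAR, can be sketched as follows. This is an illustrative Python sketch, not the original SAS code, and the per-class rates shown are invented for illustration:

```python
import random

def delete_mcar(values, rate, seed=0):
    """MCAR: every value has the same probability of being deleted."""
    rng = random.Random(seed)
    return [None if rng.random() < rate else v for v in values]

def delete_mar(values, nyha, rate_by_class, seed=0):
    """MAR: the deletion probability depends on the observed NYHA class."""
    rng = random.Random(seed)
    return [None if rng.random() < rate_by_class[c] else v
            for v, c in zip(values, nyha)]

# Hypothetical per-class rates: higher NYHA class -> higher missingness.
# In the study these would be calibrated to hit the target overall proportion.
example_rates = {1: 0.02, 2: 0.05, 3: 0.15, 4: 0.30}
```

Under the MAR scheme the missingness depends only on the fully observed NYHA class, which is exactly what makes the mechanism ignorable for the imputation methods compared here.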
This assumption might be realistic, since patients with a higher NYHA class are more severely ill and thus perhaps more likely to miss an appointment for an LVEF examination, or may not know exactly whether they take glycosides among all their other medications. Overall, the deletion probabilities were unequal across patients, but resulted in the desired missing data proportion for each simulation scenario.

3.4.3 Investigated Approaches to Handle Missing Data
The incomplete datasets were subsequently treated with three different approaches. The presented HDPS method has been applied to the data, as well as a standard multiple imputation approach (Markov Chain Monte Carlo (MCMC) method or Logistic method). Moreover, a complete case analysis has been conducted. For the HDPS method, the SAS macro %hdps.sas has been created, which implements the algorithm presented in section 3.3. The MCMC method is the default of the SAS procedure PROC MI for the imputation of continuous variables (LVEF). The imputation of binary variables (glycosides) requires the Logistic method, which is also available in PROC MI. As with the HDPS method, 5 imputations have been performed with the MCMC method and the Logistic method, respectively. The Cox regression, as the final analysis model, includes only completely observed cases by default, so CCA was performed straightforwardly by analyzing the incomplete data sets directly.

3.4.4 Evaluation of Simulation Results
To evaluate the applicability of the proposed HDPS method, the same Cox regression which created the original results of Table 2 has been performed again with all imputed datasets. The parameters of primary interest for the two incomplete variables were the hazard ratios with their corresponding 95% confidence intervals. The comparison between the original results and those created after missing value imputation was based on these parameters.
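The m = 5 per-imputation analysis results have to be combined into a single estimate. Although the text does not spell out the combining step, the standard procedure is Rubin's rules (Rubin, 1987), under which the pooled point estimate is the mean of the m estimates and the total variance adds the between-imputation variance to the mean within-imputation variance. A minimal sketch (function name illustrative; for hazard ratios the pooling is usually done on the log scale):

```python
import statistics

def rubin_pool(estimates, variances):
    """Combine m per-imputation estimates by Rubin's rules.

    estimates -- m point estimates (e.g. log hazard ratios, one per dataset)
    variances -- m within-imputation variances of those estimates
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    q_bar = statistics.mean(estimates)      # pooled point estimate
    w_bar = statistics.mean(variances)      # mean within-imputation variance
    b = statistics.variance(estimates)      # between-imputation variance
    return q_bar, w_bar + (1 + 1 / m) * b   # extra term reflects imputation noise
```

The (1 + 1/m)·B term is what lets MI, unlike SI, carry the imputation uncertainty into the confidence intervals examined below.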
Since for every missing data situation 1000 simulation runs have been conducted, the mean and standard deviation over all simulation results was calculated for every parameter estimate in the particular missing data situation. The c-statistic of the logistic regression model used to calculate the propensity scores has additionally been recorded for informative purposes. This value describes the ability of the logistic regression to model the relationship between the completely observed covariables and the missing data indicator variable; higher values of the c-statistic indicate a better discriminative ability of the logistic regression model. Therefore, in simulation scenarios where the completely observed variables fit the missing data indicator variable well, higher c-values and more precise imputations are expected. Evaluation of the results after imputing the binary variable (glycosides) is additionally based on the fraction of agreement between original and imputed values.

4.0 Results
4.1 General Remarks
In the following, the results of the real-data based simulation study described in the last section are presented. The main objective was to investigate the impact of a missing value imputation with respect to the estimated hazard ratios and corresponding 95% confidence intervals of the two target variables, LVEF and glycosides. No efforts have been undertaken in the generation of the HDPS results to explicitly influence the goodness of the logistic regression model (expressed by the c-statistic).
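The c-statistic is the usual concordance index of a logistic model: the probability that a randomly chosen subject with a missing value (R = 1) receives a higher propensity score than a randomly chosen subject with an observed value (R = 0), with ties counting one half. A minimal pairwise sketch (quadratic in n, fine for illustration):

```python
def c_statistic(scores, labels):
    """Concordance index (area under the ROC curve).

    scores -- propensity scores; labels -- missing data indicator (1/0).
    Counts the fraction of (missing, observed) pairs in which the subject
    with the missing value has the higher score; ties count one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]  # R = 1 (value missing)
    neg = [s for s, y in zip(scores, labels) if y == 0]  # R = 0 (value observed)
    concordant = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return concordant / (len(pos) * len(neg))
```

A value of 0.5 means the model cannot separate subjects with and without missing values at all, which is exactly what one expects under MCAR.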
The calculation of the propensity scores, which were subsequently used for choosing the surrogate values, was based on all observed variables of the example data set. First of all, salient differences between the particular simulation scenarios are investigated with respect to the different proportions of missing values and missing data mechanisms. Since the simulation has been conducted for two different types of variables, the results are presented in two parts. Besides the hazard ratios and 95% confidence intervals, the value of the c-statistic is also given for those simulation scenarios where the HDPS method has been used to impute missing data. The analysis of the binary variable is somewhat more extensive, additionally presenting the rate of correctly imputed missing items (i.e. the agreement of original and imputed values) for the HDPS and MI methods. It has to be kept in mind that the presented results are the mean values over all 1000 simulation runs.

4.2 Continuous Simulation Variable
The simulation results for the continuous variable (LVEF) confirmed some previously formulated expectations and brought out interesting findings with respect to the intended comparison of the HDPS method for imputing missing values with CCA and Multiple Imputation approaches. First of all, according to Table 3, the use of MI led to almost unbiased estimates of the hazard ratio and corresponding 95% confidence interval for LVEF. This holds for both investigated missing data mechanisms, MCAR and MAR, and confirmed the very good properties of the MCMC method for imputing missing values. Moreover, there seemed to be no considerable impact of the rate of missing values in the data set on the precision of estimation. The maximal deviation of the MI estimator from the original hazard ratio (0.985) was 0.9863 (MAR, 50% missing values).
The HDPS method, however, generated slightly inaccurate estimates of the hazard ratio and the confidence limits for missing value rates of 20% and higher. Further, the precision of estimation depended on the underlying missing data mechanism, with the estimation being more precise in case of an MCAR mechanism. The maximum difference between the original and simulated hazard ratio of LVEF was again observed for a MAR mechanism and a missing value rate of 50% (0.985 vs. 0.9933). The upper and lower confidence limits were estimated too large compared to the original limits. Overall, the strength of the estimated effect sizes for LVEF was too high. The CCA strategy resulted in relatively robust estimates of the hazard ratio for both MCAR and MAR, but much more sensitive estimates of the confidence limits. The maximal deviation of the estimated hazard ratio was 0.9688. While both confidence limits were estimated too large when applying the HDPS method, the CCA strategy yielded smaller estimates for the lower confidence limit and larger estimates for the upper confidence limit. The broadest confidence interval occurred for an MCAR mechanism and 50% missing values. The MI and HDPS methods were both able to generate consistent estimates of the confidence intervals' width in every simulation scenario. The MI estimates were generally very consistent, not only regarding the width of the interval but also with respect to both interval limits. Although both confidence limits were estimated too large by the HDPS method, the width of the interval stayed relatively consistent. As already described, the inaccurate estimation of the confidence limits by the CCA strategy differed considerably from the biased estimation by HDPS: whereas the HDPS method generated too large values of both confidence limits, CCA estimated confidence intervals which were too wide, especially for higher rates of missing values.
The significant impact of LVEF on the occurrence of a cardiac event was preserved when applying the MI approach to the incomplete data, since the estimate of the upper confidence limit was below 1 in each scenario. For missing value rates of 20% and higher, the HDPS and CCA strategies suggested that LVEF was not a significant predictor of a cardiac event, since the 95% confidence interval of the hazard ratio included 1. Table 4 shows the mean c-statistic of each HDPS simulation scenario. As expected, the c-value, which describes the ability of the logistic regression model to capture the relationship between the completely observed covariables and the missing data indicator variable, was only slightly above 0.5 (0.54-0.59) in case of a MCAR assumption. The c-value was negatively correlated with the number of missing values in the data set. Assuming a MAR mechanism, the expectation of higher values of the c-statistic was also confirmed (0.79-0.82), whereas there seemed to be no correlation with the rate of missing data. Another very important finding was that the HDPS and MI approaches were both able to estimate the hazard ratios and corresponding confidence intervals of all other variables consistently, while CCA led to severely biased estimates for higher rates of missing values (see Appendix).

4.3 Binary Simulation Variable

The simulations for the dichotomous variable 'glycosides intake' resulted in at least slightly biased estimates of the hazard ratio and the confidence interval limits. All investigated strategies to handle missing data revealed precision problems even for smaller rates of missing data.
HDPS tended to underestimate the effect size of glycosides intake, as well as the corresponding upper and lower limits of the confidence interval. The extent depended on the underlying missing data mechanism, with the results being somewhat more biased in case of MCAR (maximal deviation: 1.3919 vs. the original 1.970). The width of the confidence interval, however, was estimated consistently in almost every scenario. For the highest rates of missing values, the width was estimated slightly too narrow, and the interval no longer contained the true effect size. The MI approach also underestimated the effect size in all scenarios, as well as the confidence interval. As with HDPS, the Multiple Imputation method also produced consistent estimates of the width of the confidence interval, even for high amounts of missing values. Moreover, there seemed to be no substantial impact of the assumed missing data mechanism on the precision of estimation. In contrast to HDPS and MI, the CCA strategy generated effect sizes which were overestimated, especially if the missing data mechanism was MAR and the amount of missing data was high. Additionally, the corresponding confidence limits were biased and led to interval widths which were too large (maximum at 50% and MAR). Again, as in the case of the missing value imputation of LVEF, the HDPS and MI approaches led to nearly unbiased estimates of the hazard ratios and confidence intervals of the other predictor variables of the Cox regression model, whereas the CCA strategy generated substantially biased estimates, especially for higher amounts of missing data in 'glycosides intake' (see Appendix). The values of the c-statistic were similar to those from the imputation of LVEF. For MCAR, the range was again 0.54 to 0.59, confirming the expectation that the logistic regression model is not able to find a meaningful relationship between the predictor variables and the missing data indicator variable when the data are MCAR.
In case of MAR, the values ranged from 0.79 to 0.83. For the dichotomous variable 'glycosides intake', additionally the fraction of accordances between original and imputed values was investigated. This rate was negatively correlated with the amount of missing data for both HDPS (0.51 to 0.95) and MI (0.75 to 0.98), with the MI approach yielding higher accordance. Overall, these values were higher in case of a MAR mechanism.

Table 1: Patient characteristics of the analysis set for the extended evaluation of the SPICE study (complete cases of the SPICE data set, n=2170)

Gender (n (%)): Female 345 (16%), Male 1825 (84%)
Age (mean (SD)) [years]: 59.8 (10.5)
Smoking (n (%)): Non-smoker 986 (45%), Smoker 953 (44%), Ex-smoker 231 (11%)
NYHA class (n (%)): I 8 (0.4%), II 1225 (56%), III 937 (43%)
Body mass index (mean (SD)) [kg/m²]: 26.8 (3.7)
Left ventricular ejection fraction (mean (SD)) [%]: 23.8 (6.6)
Cardioactive medications (n (%)): Glycosides 1228 (57%), Nitrates 1224 (56%), Beta blockers 1395 (64%), Antiarrhythmics 476 (22%), Diuretics 1859 (86%), Calcium antagonists 26 (1%)

Table 2: Explanatory variables for the risk of a cardiac event in patients with congestive heart failure

Explanatory variable               Increased risk for   HR      95% CI           p-value
Glycosides (yes vs. no)            yes                  1.970   [1.636, 2.372]   <0.0001
NYHA class (III vs. II)            class III            1.602   [1.357, 1.892]   <0.0001
Diuretics (yes vs. no)             yes                  1.775   [1.249, 2.521]   0.0014
Antiarrhythmics (yes vs. no)       yes                  1.947   [1.636, 2.317]   <0.0001
Nitrates (yes vs. no)              yes                  1.426   [1.203, 1.690]   <0.0001
Age                                older age            1.015   [1.007, 1.023]   0.0004
Beta blockers (yes vs. no)         yes                  1.312   [1.105, 1.559]   0.0020
Calcium antagonists (yes vs. no)   yes                  2.233   [1.316, 4.136]   0.0037
LVEF                               low LVEF             0.985   [0.974, 0.997]   0.0177

Table 3: Simulation results for the continuous variable LVEF. Denoted are (averaged over all simulations) the hazard ratio and the corresponding 95% confidence interval (HR, LCL, UCL) for LVEF, under MCAR and MAR.

Original results: HR 0.985, LCL 0.974, UCL 0.997 (MCAR and MAR)
5% HDPS 0.9861 0.9742 0.9981; CCA 0.9854 0.9729 0.9981
10% 20% 30% CCA MI HDPS MI CCA HDPS MI 0.9855 0.9733 0.9978 0.9855 0.9736 0.9975 0.9867 0.9748 0.9988 0.9854 0.9735 0.9974 0.9855 0.9722 0.9989 0.9881 0.9761 1.0001 0.9854 0.9735 0.9975 CCA 0.9855 0.9712 0.9999 MI 0.9854 0.9734 0.9975 HDPS 0.9893 0.9774 1.0014 0.9856 0.9733 0.9980 0.9863 0.9744 0.9984 0.9855 0.9736 0.9976 0.9858 0.9731 0.9987 0.9874 0.9755 0.9994 0.9856 0.9737 0.9977 0.9862 0.9724 1.0002 0.9894 0.9774 1.0015 0.9859 0.9740 0.9979 0.9863 0.9714 1.0015 0.9905 0.9785 1.0027 0.9859 0.9740 0.9979
40% 50% CCA 0.9854 0.9701 1.0010 0.9868 0.9703 1.0036 MI 0.9852 0.9732 0.9973 0.9858 0.9739 0.9978 HDPS 0.9907 0.9787 1.0028 CCA 0.9923 0.9802 1.0045 0.9853 0.9685 1.0024 HDPS 0.9868 0.9703 1.0037 0.9919 0.9799 1.0040 MI 0.9933 0.9812 1.0055 0.9852 0.9733 0.9974 0.9863 0.9744 0.9983

Table 4: C-statistic from the logistic regression model. Denoted is (averaged over all simulations) the c-value of the logistic regression model before HDPS imputation for LVEF.
        5%      10%     20%     30%     40%     50%
MCAR    0.591   0.567   0.550   0.543   0.541   0.540
MAR     0.804   0.810   0.824   0.790   0.826   0.820

Table 5: Simulation results for the binary variable glycosides intake. Denoted are (averaged over all simulations) the hazard ratio and the corresponding 95% confidence interval (HR, LCL, UCL) for glycosides intake, under MCAR and MAR.

Original results: HR 1.970, LCL 1.636, UCL 2.372 (MCAR and MAR)
5% 10% 20% CCA HDPS MI CCA HDPS MI CCA HDPS MI CCA 1.9719 1.6297 2.3860 1.8938 1.5762 2.2753 1.9533 1.6226 2.3516 1.9726 1.6217 2.3995 1.8212 1.5189 2.1837 1.9369 1.6093 2.3311 1.9749 1.6040 2.4316 1.6956 1.4190 2.0261 1.8954 1.5761 2.2794 1.9780 1.5833 2.4712 1.9759 1.6321 2.3921 1.8974 1.5781 2.2812 1.9566 1.6252 2.3557 1.9836 1.6270 2.4185 1.8159 1.5129 2.1796 1.9371 1.6096 2.3312 2.0062 1.6192 2.4856 1.6797 1.4033 2.0106 1.9019 1.5821 2.2864 2.0196 1.6037 2.5434
30% 40% 50% HDPS 1.5832 1.3290 1.8859 1.5779 1.3212 1.8846 CCA 1.9855 1.5606 2.5262 2.0456 1.5867 2.6373 MI HDPS MI CCA HDPS MI 1.8649 1.5520 2.2410 1.4855 1.2503 1.7650 1.8421 1.5332 2.2134 1.9898 1.5277 2.5918 1.3919 1.1742 1.6499 1.8040 1.5017 2.1672 1.8768 1.5624 2.2544 1.4767 1.2387 1.7604 1.8423 1.5339 2.2128 2.0658 1.5608 2.7342 1.3959 1.1730 1.6612 1.8228 1.5173 2.1900

Table 6: Biased CCA estimates of confidence interval width due to 50% missing data in LVEF under MAR assumption. Denoted are HR [95% CI] per explanatory variable and imputation method.

Glycosides (yes vs. no), yes: original 1.970 [1.636, 2.372]; CCA 2.046 [1.585, 2.641]; HDPS 1.981 [1.645, 2.385]; MI 1.980 [1.644, 2.384]
Antiarrhythmics (yes vs. no), yes: original 1.947 [1.636, 2.317]; CCA 1.925 [1.502, 2.467]; HDPS 1.947 [1.636, 2.318]; MI 1.945 [1.634, 2.315]
NYHA class (III vs. II), class III: original 1.602 [1.357, 1.892]; CCA 1.617 [1.253, 2.086]; HDPS 1.618 [1.370, 1.912]; MI 1.603 [1.357, 1.894]
Nitrates (yes vs. no), yes: original 1.426 [1.203, 1.690]; CCA 1.381 [1.088, 1.754]; HDPS 1.425 [1.202, 1.690]; MI 1.419 [1.197, 1.682]
Diuretics (yes vs. no), yes: original 1.775 [1.249, 2.521]; CCA 1.777 [1.154, 2.739]; HDPS 1.797 [1.265, 2.553]; MI 1.777 [1.251, 2.524]
Age, older age: original 1.015 [1.007, 1.023]; CCA 1.015 [1.004, 1.026]; HDPS 1.014 [1.006, 1.022]; MI 1.014 [1.006, 1.022]
Beta blockers (yes vs. no), yes: original 1.312 [1.105, 1.559]; CCA 1.268 [0.992, 1.620]; HDPS 1.316 [1.108, 1.564]; MI 1.320 [1.111, 1.569]
Calcium antagonists (yes vs. no), yes: original 2.233 [1.316, 4.136]; CCA 2.540 [1.212, 5.347]; HDPS 2.333 [1.316, 4.137]; MI 2.322 [1.309, 4.119]
LVEF, low LVEF: original 0.985 [0.974, 0.997]; CCA 0.986 [0.970, 1.003]; HDPS 0.993 [0.981, 1.005]; MI 0.986 [0.974, 0.998]

Table 7: Biased CCA estimates of confidence interval width due to 50% missing data in glycosides intake under MAR assumption. Denoted are HR [95% CI] per explanatory variable and imputation method.

Glycosides (yes vs. no), yes: original 1.970 [1.636, 2.372]; CCA 2.065 [1.560, 2.734]; HDPS 1.395 [1.173, 1.661]; MI 1.822 [1.517, 2.190]
Antiarrhythmics (yes vs. no), yes: original 1.947 [1.636, 2.317]; CCA 1.921 [1.459, 2.529]; HDPS 1.866 [1.568, 2.221]; MI 1.894 [1.592, 2.255]
NYHA class (III vs. II), class III: original 1.602 [1.357, 1.892]; CCA 1.627 [1.192, 2.223]; HDPS 1.634 [1.383, 1.931]; MI 1.612 [1.365, 1.904]
Nitrates (yes vs. no), yes: original 1.426 [1.203, 1.690]; CCA 1.372 [1.053, 1.787]; HDPS 1.379 [1.163, 1.635]; MI 1.420 [1.197, 1.684]
Diuretics (yes vs. no), yes: original 1.775 [1.249, 2.521]; CCA 1.782 [1.120, 2.839]; HDPS 2.180 [1.546, 3.075]; MI 1.869 [1.318, 2.651]
Age, older age: original 1.015 [1.007, 1.023]; CCA 1.015 [1.003, 1.028]; HDPS 1.013 [1.005, 1.021]; MI 1.014 [1.005, 1.022]
Beta blockers (yes vs. no), yes: original 1.312 [1.105, 1.559]; CCA 1.256 [0.957, 1.647]; HDPS 1.221 [1.029, 1.449]; MI 1.291 [1.087, 1.534]
Calcium antagonists (yes vs. no), yes: original 2.233 [1.316, 4.136]; CCA 2.615 [1.175, 5.866]; HDPS 2.320 [1.308, 4.115]; MI 2.281 [1.286, 4.047]
LVEF, low LVEF: original 0.985 [0.974, 0.997]; CCA 0.987 [0.968, 1.005]; HDPS 0.984 [0.972, 0.996]; MI 0.984 [0.972, 0.996]

5.0 Discussion

The presence of missing data is a common problem in medical research. However, there is no unique proposal to handle missing values yet which is appropriate in every kind of 'missing data situation'. This circumstance is especially reflected in a respective guideline (European Medicines Agency, 2010), which merely summarizes the problems of missing values for data analysis and some established strategies and methods which can be applied to incomplete data.
Therefore, improving the existing methods and developing new ones should be an aim in this important research field of medical statistics. According to the literature, methods of the MI strategy are viewed as state of the art today (Engels and Diehr, 2003; Newman, 2003; Schafer and Graham, 2002; White et al., 2011) because they lead to less biased results with respect to the behaviour of parameter estimates and the respective confidence intervals. In contrast, pragmatic SI methods, like mean or regression imputation, are no longer recommended since they tend to underestimate variance. MI overcomes this problem by imputing missing values m > 1 times, so the imputation itself, as an additional source of variance, is explicitly considered in the estimation process (Rubin, 1987). Whenever missing values are present, an imputation strategy is indeed much more reasonable than just deleting incomplete cases from the data set, although Vach and Blettner (1991) showed that this CCA strategy generates unbiased estimates in case of a MCAR mechanism. However, the loss of statistical power can be tremendous when the number of examined variables is high. Major restrictions of the available imputation methods are their inappropriateness in case of a MNAR mechanism and their sensitivity to the number of missing values in a data set, although common MI methods work well even for higher amounts of missing values, up to 50% (Engels and Diehr, 2003; Mayer, 2011). Further, each of the established imputation methods is only applicable to a certain scale level. The MCMC method, for example, can only be applied to continuous variables with missing items, and the Logistic method is only appropriate for imputing binary variables. In practice, often more than one variable is affected by missing values, so one has to apply different imputation methods to fill the data set.
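The combination step just mentioned can be written down in a few lines. Below is a minimal sketch of Rubin's combining rules for m completed-data analyses; the function name is illustrative and not code from the paper.

```python
import numpy as np

def pool_rubin(estimates, variances):
    """Pool m completed-data estimates by Rubin's rules: the point
    estimate is the mean, and the total variance adds a between-
    imputation component, so the imputation step itself contributes
    to the reported uncertainty."""
    estimates = np.asarray(estimates, dtype=float)
    variances = np.asarray(variances, dtype=float)
    m = len(estimates)
    q_bar = estimates.mean()              # pooled point estimate
    w_bar = variances.mean()              # average within-imputation variance
    b = estimates.var(ddof=1)             # between-imputation variance
    t_total = w_bar + (1 + 1 / m) * b     # total variance of q_bar
    return q_bar, t_total
```

The (1 + 1/m) factor is the finite-m correction; with a single imputation the between-imputation component cannot be estimated at all, which is exactly why SI methods understate variance.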
These distinct methods, however, make different assumptions with respect to the structure of the data set, which may lead to a complex imputation process in total. Moreover, many researchers are not very familiar with the possibilities to impute missing data, so the utilization of different imputation methods may be confusing. On the other hand, there is the Fully Conditional Specification (FCS) method (van Buuren, 2007), which can perform a MI for variables with either continuous or categorical scale level, but this approach is something of a black box since the underlying algorithm is quite complex. Knowing these facts, the development of new and easily understandable methods for imputing missing values should preferably include a MI methodology which can be applied to variables of different scale levels. The proposed HDPS method for missing value imputation combines both. A propensity score approach is used to create a respective donor pool of possible imputation values for each missing observation of the data set, so the HDPS method can be implemented in a MI setting by multiply choosing one donor value (at random) out of the pool. On the one hand, the donor pool, which is generated by the 10 donors whose propensity scores are closest to the respective PS of the recipient, must not be too large, because then the similarity assumption does not hold anymore. On the other hand, the donor pool must not be too small, since there should be enough variability in the donor data to get valid MI estimates. Choosing the 10 donors whose PS values are closest to the respective recipient PS seemed to be an acceptable approach for the initial implementation of the HDPS procedure. The simulation study showed that the HDPS method can easily be applied to different types of incomplete variables. To apply a standard MI approach instead, the implementation of different methods (MCMC or Logistic) is necessary.
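The donor-pool mechanism described above can be sketched as follows. This is a hypothetical Python rendering of the idea (the study used a SAS macro); the function name is illustrative, and `pool_size=10` mirrors the 10-donor choice discussed in the text.

```python
import numpy as np

def hdps_impute(values, ps, missing_mask, m=5, pool_size=10, seed=0):
    """Hot deck propensity score imputation: for each recipient, form a
    donor pool of the pool_size observed cases with the closest
    propensity scores, then draw one donor value at random; repeating
    the draw m times yields m completed data sets for a MI analysis."""
    rng = np.random.default_rng(seed)
    donors = np.flatnonzero(~missing_mask)
    completed = [values.copy() for _ in range(m)]
    for r in np.flatnonzero(missing_mask):
        # donors ranked by closeness of their PS to the recipient's PS
        pool = donors[np.argsort(np.abs(ps[donors] - ps[r]))[:pool_size]]
        for k in range(m):
            completed[k][r] = values[rng.choice(pool)]
    return completed
```

Because the imputed value is always an observed donor value, the scheme works identically for continuous and categorical recipients, which is the scale-level independence emphasized in the text.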
However, the validity of parameter estimation is much better for standard MI approaches, which are almost unbiased even in case of high amounts of missing data. Especially the simulations for the binary variable show that HDPS and MI tend to generate rather conservative estimates, in contrast to a CCA approach, which estimates higher hazard ratios and confidence intervals in nearly all scenarios. But although HDPS, as well as MI, leads to hazard ratio estimates that are too low, the width of the corresponding confidence intervals is validly estimated. In comparison, CCA produces severely biased estimates of the confidence interval widths. This mainly traces back to the reduced sample size for the respective analysis. Another very important finding is that HDPS, just like standard MI approaches, is able to generate unbiased parameter estimates for all other variables which are included in the original Cox regression model (see Appendix), whereas CCA produces invalid estimates for most of the other variables. For both simulation variables, the c-values are negatively correlated with the number of missing values in the data set in case of a MCAR mechanism. The values are much higher in case of a simulated MAR mechanism, but this was expected in advance, since the mechanism has been created artificially from the data. However, there seems to be no relation between the goodness of fit of the logistic regression model (i.e. the ability of the model to separate cases with missing items from those with complete data by regressing the missing data indicator variable on all completely observed covariates) and the validity of estimation.
Besides, the fraction of correctly imputed values in case of the binary simulation variable is negatively correlated with the number of missing values in the data set. Thus, the HDPS method can produce valid imputations for binary variables only in case of lower amounts of missing data. In practice, it is unlikely that only one variable is affected by missing values, so the handling of missing data in multiple variables is an important issue, and an application of the proposed HDPS approach in such data situations should at least be discussed. For this initial implementation of the method and the conducted simulation study, only one variable was assumed to have missing items. The main focus of the article was to present the newly developed procedure for missing value imputation, so the first setting of the simulation study was restricted to only one partially missing variable in order to study the properties of HDPS imputation. From a theoretical point of view, the conception of HDPS quite easily allows its application in situations with multiple incomplete variables. As already shown, the method can deal with different types of missing data variables by using a missing data indicator variable to create propensity scores on the basis of all completely observed variables of a data set. This procedure can be repeated for each incomplete variable without further theoretical problems regarding the imputation. However, one has to discuss the assumptions with respect to the available data in general. The presented data set includes a huge number of cases, and therefore the implementation of the logistic model is unproblematic. If the investigated data set is much smaller, however, a valid estimation of the PS may be doubtful. It is, however, very difficult to state a minimum number of observations above which the HDPS method can be applied validly.
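Under the stated idea that each incomplete variable is handled in turn with its own missing-data indicator, the repeated application could be sketched like this. This is a hypothetical illustration only: the data frame layout, model settings, and helper name are assumptions, and a real application would embed the donor draws in a full MI setting rather than producing a single completed data set.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

def impute_each_incomplete(df, pool_size=10, seed=0):
    """For every column with missing values: fit a logistic model of its
    missing-data indicator on the completely observed columns, and fill
    each gap with a random draw from the pool_size observed cases whose
    propensity scores are closest to the recipient's."""
    rng = np.random.default_rng(seed)
    out = df.copy()
    complete_cols = [c for c in df.columns if df[c].notna().all()]
    X = df[complete_cols].to_numpy()
    for col in df.columns:
        miss = df[col].isna().to_numpy()
        if not miss.any():
            continue
        ps = LogisticRegression(max_iter=1000).fit(X, miss).predict_proba(X)[:, 1]
        donors = np.flatnonzero(~miss)
        for r in np.flatnonzero(miss):
            pool = donors[np.argsort(np.abs(ps[donors] - ps[r]))[:pool_size]]
            out.iloc[r, df.columns.get_loc(col)] = df[col].iloc[int(rng.choice(pool))]
    return out
```

Each incomplete variable gets its own indicator model, so no assumption about its scale level is needed; the completely observed columns serve as the PS covariates throughout.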
Further research is necessary here, where the proposed method should be applied to data sets of different sizes. The simulations showed that there seems to be no relation between the goodness of fit of the logistic regression model and the validity of the estimations of the HDPS procedure. Otherwise, one should have been able to observe an increase in estimation precision in case of MAR compared to MCAR, since the c-values were much higher for MAR. However, it should at least be examined in further investigations what happens to the estimation precision if the specification of the logistic regression model is modified with respect to the included predictors. Moreover, the performance of the HDPS procedure in a MNAR setting would give useful guidance regarding the extent of bias of the parameter estimates. Beyond that, efforts to prove the method's ability to produce acceptable estimations in case of multiple incomplete analysis variables should be undertaken. The main conclusion of this article is that the proposed HDPS approach can easily be applied to incomplete variables of different scale levels, which is a major advantage over most of the established imputation methods, which are only applicable to certain types of variables. Although the simulation results showed that the HDPS estimates for the imputed variable were rather conservative, in the sense that the real effect tended to be underestimated, the method was able to produce unbiased estimates for all other included covariates of the original analysis model. Further investigations are needed, however, to check for possible improvements with respect to estimation precision, for the method's performance in data sets of different structures, and for its application in estimating distinct outcome variables.
Acknowledgements

The study was funded by the young scientists program of the German network 'Health Services Research Baden-Wuerttemberg' of the Ministry of Science, Research and Arts in collaboration with the Ministry of Employment and Social Order, Family, Women and Senior Citizens, Baden-Wuerttemberg. The data for the simulation study have been kindly provided by Dr. Willmar Schwabe Pharmaceuticals, Karlsruhe, Germany. I also appreciate the support of Silvia Sander in the implementation of the SAS macro for HDPS imputation. Finally, I thank the independent reviewers for their valuable comments, which really improved the manuscript.

References

1. Allison, P.D. Multiple imputation for missing data: A cautionary tale. Sociological Methods and Research 2000; 28: 301-309
2. Allison, P.D. Missing Data. Sage University Papers Series on Quantitative Applications in the Social Sciences, Thousand Oaks, CA, 2001, 07-136
3. Andridge, R.R. and Little, R.J.A. A Review of Hot Deck Imputation for Survey Non-response. International Statistical Review 2010, 78, 1: 40-64
4. D'Agostino, R.B. Propensity Score Methods for Bias Reduction in the Comparison of a Treatment to a non-randomized Control Group. Statistics in Medicine 1998, 17: 2265-2281
5. Engels, J.M. and Diehr, P. Imputation of missing longitudinal data: a comparison of methods. Journal of Clinical Epidemiology 2003, 56: 968-976
6. European Medicines Agency. Guideline on Missing Data in Confirmatory Clinical Trials. 2010
7. Guo, S., Fraser, M.W. Propensity score analysis: statistical methods and applications. SAGE Publications, Inc., Thousand Oaks, 2010
8. Haziza, D., Beaumont, J.F. On the Construction of Imputation Classes in Surveys. International Statistical Review 2007, 75: 25-43
9. Holubarsch, C.J.F., Colucci, W.S., Meinertz, T., Gaus, W., Tendera, M. The efficacy and safety of Crataegus extract WS® 1442 in patients with heart failure: The SPICE trial. European Journal of Heart Failure 2008, 10: 1255-1263
10. Little, R.J.A. Survey Nonresponse Adjustments for Estimates of Means. International Statistical Review 1986, 54: 139-157
11. Little, R.J.A. and Rubin, D.B. Statistical Analysis with Missing Data. John Wiley & Sons, Hoboken, New Jersey, 2002
12. Mayer, B. Missing Values in clinical longitudinal studies – Handling of Drop-outs. Dissertation at the University of Ulm (in German), 2011 (http://vts.uni-ulm.de/docs/2011/7633/vts_7633_10939.pdf)
13. Mayer, B., Holubarsch, C., Muche, R., Gaus, W. Prognostic Factors for Patients with Congestive Heart Failure – An Extended Evaluation of the SPICE-Study. In revision at British Journal of Medicine and Medical Research 2013
14. Mittinty, M., Chacko, E. Imputation by Propensity Matching. Proceedings of the Survey Research Methods Section, ASA 2004: 4022-4028
15. Newman, D.A. Longitudinal Modelling with Randomly and Systematically Missing Data: A Simulation of Ad Hoc, Maximum Likelihood and Multiple Imputation Techniques. Organizational Research Methods 2003, 6: 328
16. Rosenbaum, P.R., Rubin, D.B. The central role of the propensity score in observational studies for causal effects. Biometrika 1983, 70: 41-55
17. Rubin, D.B. Multiple Imputation for Nonresponse in Surveys. John Wiley & Sons, Hoboken, New Jersey, 1987
18. Schafer, J.L., Graham, J.W. Missing data: our view of the state of the art. Psychological Methods 2002, 7: 147-177
19. Vach, W., Blettner, M. Biased estimation of the odds ratio in case-control studies due to the use of ad hoc methods of correcting for missing values for confounding variables. American Journal of Epidemiology 1991, 134: 895-907
20. van Buuren, S. Multiple imputation of discrete and continuous data by fully conditional specification. Statistical Methods in Medical Research 2007, 16: 219-242
21. White, I.R., Royston, P., Wood, A.M. Multiple imputation using chained equations: Issues and guidance for practice. Statistics in Medicine 2011, 30: 377-399