Supplementary material for: Confidence intervals after multiple imputation: combining profile likelihood information from logistic regressions

Georg Heinze (1), Meinhard Ploner (2) and Jan Beyea (3)

(1) Section for Clinical Biometrics, Center for Medical Statistics, Informatics and Intelligent Systems, Medical University of Vienna, Spitalgasse 23, A-1090 Vienna, Austria, georg.heinze@meduniwien.ac.at
(2) data-ploner.com, Brunico, Italy
(3) Consulting In The Public Interest, 53 Clinton Street, Lambertville, NJ, 08530, USA

Contents

1. Additional results of simulation study
2. CLIP analysis of all parameters in the Alzheimer case-control study
3. MCMC analysis of Alzheimer case-control study
4. Example code for computation of CLIP confidence intervals and CLIP profiles by the R package logistf

Additional results of simulation study

Here we summarize some additional results of our simulation study referenced in the main paper.

5 imputations

Supplementary Tables 1 and 2 provide the coverage rates of two-sided confidence intervals if only the first five imputations are used in the analysis. Five imputations are clearly not enough for CLIP to reach the nominal level. RR, on the other hand, clearly overcovers. PVR falls between these two methods, but still tends to overcover. The actual coverage rates of APL are clearly below the nominal level.

200 imputations

Supplementary Tables 3 and 4 show the coverage rates of the two one-sided nominal 97.5% confidence intervals with 200 imputations. The results show adequate coverage rates for UPL and CLIP, overcoverage for RR and partly also for PVR, and undercoverage for APL.

Supplementary Table 1: Simulation study: results on coverage of two-sided nominal 95% confidence intervals for $\beta_2$, 5 covariates $X_2, \ldots, X_6$ with ~30% of the values missing completely at random (MCAR), 5 imputations, 1000 simulations.
                              Coverage (two-sided) x 1000 (expected = 950)
N     $\beta_j$  xj    y      UPL    CLIP   RR     APL    PVR
50    0          0.2   0.2    959    930    988    886    973
50    log2       0.2   0.2    953    937    988    898    965
50    log4       0.2   0.2    952    935    978    876    974
50    0          0.5   0.5    955    914    963    859    956
50    log2       0.5   0.5    967    950    985    921    963
50    log4       0.5   0.5    952    946    980    897    967
100   0          0.2   0.2    952    916    977    863    974
100   log2       0.2   0.2    949    912    980    850    972
100   log4       0.2   0.2    932    920    990    853    954
100   0          0.5   0.5    957    913    956    868    950
100   log2       0.5   0.5    951    917    954    864    953
100   log4       0.5   0.5    942    917    961    844    963

UPL, undeleted-data profile penalized likelihood; CLIP, combination of likelihood profiles; RR, Rubin’s rules; APL, averaging of profile penalized likelihood confidence limits; PVR, pseudo-variance modification of Rubin’s rules. Coverage is defined as the proportion of simulated confidence intervals covering the true value.

Supplementary Table 2: Simulation study: results on two-sided coverage of nominal 95% confidence intervals for $\beta_2$, 5 covariates $X_2, \ldots, X_6$, missingness of $X_2$ depending on $Y$ and $X_3$ (MAR), missingness of $X_3, \ldots, X_6$ completely at random, 5 imputations, 1000 simulations.

                              Coverage (two-sided) x 1000 (expected = 950)
N     $\beta_j$  xj    y      UPL    CLIP   RR     APL    PVR
50    0          0.2   0.2    967    952    984    916    979
50    log2       0.2   0.2    956    932    988    899    969
50    log4       0.2   0.2    955    937    989    890    977
50    0          0.5   0.5    953    932    983    867    975
50    log2       0.5   0.5    955    917    976    838    961
50    log4       0.5   0.5    950    912    976    833    961
100   0          0.2   0.2    956    940    980    895    970
100   log2       0.2   0.2    954    929    971    880    966
100   log4       0.2   0.2    950    917    970    859    954
100   0          0.5   0.5    948    930    959    869    959
100   log2       0.5   0.5    946    930    976    869    966
100   log4       0.5   0.5    950    923    975    856    967

UPL, undeleted-data profile penalized likelihood; CLIP, combination of likelihood profiles; RR, Rubin’s rules; APL, averaging of profile penalized likelihood confidence limits; PVR, pseudo-variance modification of Rubin’s rules. Coverage is defined as the proportion of simulated confidence intervals covering the true value.
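When reading the coverage counts in Supplementary Tables 1 and 2, it helps to keep the Monte Carlo error of a coverage estimate in mind. The following R sketch is an illustration added here, not part of the original analysis; it assumes the 1000 simulation runs are independent, so that the count of covering intervals is binomial, and the helper name `within_mc_error` is hypothetical. Under these assumptions, counts between roughly 936 and 964 per 1000 are compatible with a true coverage of 95%.

```r
# Monte Carlo standard error of an estimated coverage rate
n_sim   <- 1000   # simulated data sets per scenario (from the table captions)
nominal <- 0.95   # nominal two-sided coverage

mc_se <- sqrt(nominal * (1 - nominal) / n_sim)

# Approximate 95% range for the number of covering intervals per 1000
expected_range <- 1000 * (nominal + c(-1, 1) * qnorm(0.975) * mc_se)

# Hypothetical helper: TRUE if a count lies within the Monte Carlo range
within_mc_error <- function(count) {
  count >= expected_range[1] & count <= expected_range[2]
}

# First row of Supplementary Table 1 (N = 50, beta = 0, 0.2, 0.2)
row1 <- c(UPL = 959, CLIP = 930, RR = 988, APL = 886, PVR = 973)

print(round(mc_se, 4))
print(round(expected_range, 1))
print(within_mc_error(row1))
```

Counts outside this range point to systematic over- or undercoverage rather than simulation noise.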
Supplementary Table 3: Simulation study: results on coverage of one-sided nominal 97.5% left-tailed/right-tailed confidence intervals for $\beta_2$, 5 covariates $X_2, \ldots, X_6$ with ~30% of the values missing completely at random (MCAR), 200 imputations, 1000 simulations.

                              Coverage of one-sided left/right-tailed confidence intervals x 1000 (expected = 975)
N     $\beta_j$  xj    y      UPL        CLIP       RR         APL        PVR
50    0          0.2   0.2    977/982    976/979    995/999    940/959    982/998
50    log2       0.2   0.2    980/973    981/981    999/996    949/958    984/990
50    log4       0.2   0.2    977/975    980/983    990/993    938/957    987/990
50    0          0.5   0.5    981/974    978/976    995/982    937/936    973/982
50    log2       0.5   0.5    975/992    985/993    993/1000   957/982    975/986
50    log4       0.5   0.5    967/985    974/993    983/1000   942/968    987/980
100   0          0.2   0.2    973/979    977/973    994/991    936/941    977/998
100   log2       0.2   0.2    964/985    962/980    986/991    905/955    981/991
100   log4       0.2   0.2    966/966    966/984    999/991    912/952    982/983
100   0          0.5   0.5    974/983    974/977    976/984    935/942    976/980
100   log2       0.5   0.5    974/977    964/976    983/981    915/950    974/976
100   log4       0.5   0.5    967/975    975/983    993/985    925/932    984/982

UPL, undeleted-data profile penalized likelihood; CLIP, combination of likelihood profiles; RR, Rubin’s rules; APL, averaging of profile penalized likelihood confidence limits; PVR, pseudo-variance modification of Rubin’s rules. Coverage is defined as the proportion of simulated confidence intervals covering the true value.
Supplementary Table 4: Simulation study: results on coverage of one-sided nominal 97.5% left-tailed/right-tailed confidence intervals for $\beta_2$, 5 covariates $X_2, \ldots, X_6$, missingness of $X_2$ depending on $Y$ and $X_3$ (MAR), missingness of $X_3, \ldots, X_6$ completely at random, 200 imputations, 1000 simulations.

                              Coverage of one-sided left/right-tailed confidence intervals x 1000 (expected = 975)
N     $\beta_j$  xj    y      UPL        CLIP       RR         APL        PVR
50    0          0.2   0.2    978/989    981/991    992/1000   954/976    982/997
50    log2       0.2   0.2    978/978    974/986    994/999    945/962    986/993
50    log4       0.2   0.2    981/974    974/985    996/996    953/959    993/987
50    0          0.5   0.5    975/978    980/983    995/994    935/937    985/985
50    log2       0.5   0.5    982/973    970/974    989/986    915/938    984/980
50    log4       0.5   0.5    975/975    968/981    996/988    918/943    989/984
100   0          0.2   0.2    974/982    975/988    977/999    947/960    975/997
100   log2       0.2   0.2    973/981    971/986    981/997    936/956    976/992
100   log4       0.2   0.2    979/971    977/978    994/989    940/939    983/983
100   0          0.5   0.5    973/975    977/973    985/984    940/943    981/978
100   log2       0.5   0.5    974/972    972/985    980/987    928/947    975/986
100   log4       0.5   0.5    979/971    977/982    996/983    911/950    989/983

UPL, undeleted-data profile penalized likelihood; CLIP, combination of likelihood profiles; RR, Rubin’s rules; APL, averaging of profile penalized likelihood confidence limits; PVR, pseudo-variance modification of Rubin’s rules. Coverage is defined as the proportion of simulated confidence intervals covering the true value.

CLIP analysis of all variables in the Alzheimer case-control study

This section provides some complementary material on the CLIP analysis of the Alzheimer case-control study. Supplementary Figures 1-7 contain plots of a CLIP analysis, similar to Fig. 2 of the main paper, describing the posterior cumulative distribution function estimated by combining profile likelihoods from imputed data for the parameters corresponding to age, sex, OCCU, SELF, FAMI, LEIS and the intercept, respectively.
In each of these plots, panel (A) shows the approximated posterior cumulative distribution function $F(\beta)$ and the completed-data approximated posteriors $F^{(l)}(\beta)$, $l = 1, \ldots, 200$, corresponding to the 200 imputations. Panel (B) compares deviates from a normal approximation with the mean and standard error estimated by Rubin’s rules (x-axis) with the corresponding deviates from $F(\beta)$. Panel (C) shows the back-transformed pooled relative profile penalized likelihood function $-\hat{D}^{*}(\beta)/2$, with $\hat{D}^{*}(\beta) = [\Phi^{-1}(F(\beta))]^2$, which is useful to detect asymmetry in the posterior distribution. Finally, panel (D) shows the normalized posterior density $f(\beta)$, given by numerical estimation of $\partial F(\beta) / \partial \beta$.

These plots reveal almost perfect coincidence with a Gaussian for the variable age (Supplementary Fig. 1), and only modest deviation from a Gaussian distribution for the intercept (Supplementary Fig. 7) and for the parameters corresponding to sex (Supplementary Fig. 2), OCCU (Supplementary Fig. 3) and FAMI (Supplementary Fig. 4). For these variables, application of Rubin’s rules would be approximately justified. However, for SELF (Supplementary Fig. 5) the normal approximation is questionable, and for LEIS (Supplementary Fig. 6) it is clearly unreliable. The plots also reveal that the higher proportion of missing values in variable LEIS (Supplementary Fig. 6) causes a broader variation of the completed-data posteriors than for the other variables, where no or only few missing values occurred.

Supplementary Fig. 1: Alzheimer study: CLIP analysis of the regression parameter corresponding to variable age. The regression parameter is denoted by $\beta$.
(A) completed-data (gray lines) and averaged (black line) posterior cumulative distribution function obtained by chi-squared approximation from profile penalized likelihood; (B) Q-Q plot of normal deviates by Rubin’s rules approximation and deviates from the averaged posterior cumulative distribution function; (C) completed-data (gray lines) and pooled (black line) relative profile penalized likelihood; (D) completed-data (gray lines) and averaged (black line) posterior density (derivative of (A)), normalized to a maximum of 1.

Supplementary Fig. 2: Alzheimer study: CLIP analysis of the regression parameter corresponding to variable sex. The regression parameter is denoted by $\beta$.
(A) completed-data (gray lines) and averaged (black line) posterior cumulative distribution function obtained by chi-squared approximation from profile penalized likelihood; (B) Q-Q plot of normal deviates by Rubin’s rules approximation and deviates from the averaged posterior cumulative distribution function; (C) completed-data (gray lines) and pooled (black line) relative profile penalized likelihood; (D) completed-data (gray lines) and averaged (black line) posterior density (derivative of (A)), normalized to a maximum of 1.

Supplementary Fig. 3: Alzheimer study: CLIP analysis of the regression parameter corresponding to variable OCCU. The regression parameter is denoted by $\beta$.
(A) completed-data (gray lines) and averaged (black line) posterior cumulative distribution function obtained by chi-squared approximation from profile penalized likelihood; (B) Q-Q plot of normal deviates by Rubin’s rules approximation and deviates from the averaged posterior cumulative distribution function; (C) completed-data (gray lines) and pooled (black line) relative profile penalized likelihood; (D) completed-data (gray lines) and averaged (black line) posterior density (derivative of (A)), normalized to a maximum of 1.

Supplementary Fig. 4: Alzheimer study: CLIP analysis of the regression parameter corresponding to variable SELF. The regression parameter is denoted by $\beta$.
(A) completed-data (gray lines) and averaged (black line) posterior cumulative distribution function obtained by chi-squared approximation from profile penalized likelihood; (B) Q-Q plot of normal deviates by Rubin’s rules approximation and deviates from the averaged posterior cumulative distribution function; (C) completed-data (gray lines) and pooled (black line) relative profile penalized likelihood; (D) completed-data (gray lines) and averaged (black line) posterior density (derivative of (A)), normalized to a maximum of 1.

Supplementary Fig. 5: Alzheimer study: CLIP analysis of the regression parameter corresponding to variable FAMI. The regression parameter is denoted by $\beta$.
(A) completed-data (gray lines) and averaged (black line) posterior cumulative distribution function obtained by chi-squared approximation from profile penalized likelihood; (B) Q-Q plot of normal deviates by Rubin’s rules approximation and deviates from the averaged posterior cumulative distribution function; (C) completed-data (gray lines) and pooled (black line) relative profile penalized likelihood; (D) completed-data (gray lines) and averaged (black line) posterior density (derivative of (A)), normalized to a maximum of 1.

Supplementary Fig. 6: Alzheimer study: CLIP analysis of the regression parameter corresponding to variable LEIS. The regression parameter is denoted by $\beta$.
(A) completed-data (gray lines) and averaged (black line) posterior cumulative distribution function obtained by chi-squared approximation from profile penalized likelihood; (B) Q-Q plot of normal deviates by Rubin’s rules approximation and deviates from the averaged posterior cumulative distribution function; (C) completed-data (gray lines) and pooled (black line) relative profile penalized likelihood; (D) completed-data (gray lines) and averaged (black line) posterior density (derivative of (A)), normalized to a maximum of 1.

Supplementary Fig. 7: Alzheimer study: CLIP analysis of the intercept parameter, denoted by $\beta$.
(A) completed-data (gray lines) and averaged (black line) posterior cumulative distribution function obtained by chi-squared approximation from profile penalized likelihood; (B) Q-Q plot of normal deviates by Rubin’s rules approximation and deviates from the averaged posterior cumulative distribution function; (C) completed-data (gray lines) and pooled (black line) relative profile penalized likelihood; (D) completed-data (gray lines) and averaged (black line) posterior density (derivative of (A)), normalized to a maximum of 1.

MCMC analysis of the Alzheimer case-control study

This section describes the MCMC analysis of the case-control study on Alzheimer’s disease. MCMC analysis was carried out in two steps. First, the Markov chain was run on the first 10 imputed data sets only, but with a chain length of 100,000. From these runs, we examined Raftery-Lewis [1] diagnostics and autocorrelation to decide on 1) the number of burn-in iterations, 2) the number of effective iterations needed to estimate the 2.5th percentile of the posterior with adequate precision, and 3) the amount of thinning needed to eliminate autocorrelation.
Since the Raftery and Lewis diagnostics typically need a longer chain to arrive at reliable estimates of the minimum number of iterations required, we evaluated them only for the first 10 imputed data sets, but with a chain length of 100,000. Moreover, we monitored the autocorrelation time $1 + 2 \sum_{k=1}^{k^*} \rho_k$, where $\rho_k$ denotes the autocorrelation at lag $k$ and $k^*$ is such that all higher-order autocorrelation terms are lower than 0.05 [2]. The rounded estimated autocorrelation time was used as the thinning factor of the Markov chains, ensuring that no relevant autocorrelation remained in the chains that were finally evaluated.

The initial run was called using:

    proc genmod data=dataalz descending;
      ods output autocorr=alz.autocorr_alz ess=alz.ess_alz;
      model a1 = age10 a3 a7 a12 a14 a15 / d=bin link=logit;
      bayes seed=17 coeffprior=jeffreys nmc=100000 thin=1 nbi=1000
            outpost=posterior10alz
            diag=(autocorr(lags=1 2 3 4 5) ess);
      by X_imputation_;
      where X_imputation_<= 10;
    run;

Supplementary Tables 5 and 6 summarize the MCMC diagnostics for the first 10 imputed versions of the Alzheimer data set. Across the six parameters corresponding to risk factors, the maximum number of burn-in iterations was 6 (possibly because with a Jeffreys prior, SAS can start the chain at the analytically determined posterior mode), and the maximum number of effective iterations was 8,908. The maximum autocorrelation time was 3.2. These numbers were not noticeably different for the intercept parameter. For the analysis of the 200 imputed versions of the Alzheimer data set, we decided to use 100 burn-in iterations, 9,000 effective iterations and a thinning factor of 3. This yields a total of 100 + 3 × 9,000 = 27,100 MCMC iterations.
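The autocorrelation time defined above can be illustrated with a few lines of base R. This is a minimal sketch under stated assumptions, not the computation performed by SAS: the chain is a simulated AR(1) series standing in for a posterior sample, the function name `autocorr_time` and its arguments `max_lag` and `cutoff` are hypothetical, and $k^*$ is taken as the last lag (up to `max_lag`) at which the sample autocorrelation is not below the cutoff.

```r
# Autocorrelation time 1 + 2 * sum_{k=1}^{k*} rho_k for a single chain.
# max_lag and cutoff are illustrative arguments, not SAS options.
autocorr_time <- function(chain, max_lag = 50, cutoff = 0.05) {
  # Sample autocorrelations at lags 1..max_lag (the first element is lag 0)
  rho <- as.vector(acf(chain, lag.max = max_lag, plot = FALSE)$acf)[-1]
  # k* = last lag whose autocorrelation is not below the cutoff
  above <- which(rho >= cutoff)
  if (length(above) == 0) return(1)
  k_star <- max(above)
  1 + 2 * sum(rho[seq_len(k_star)])
}

# Simulated stand-in for a posterior sample: AR(1) with coefficient 0.3
set.seed(42)
chain <- as.numeric(arima.sim(model = list(ar = 0.3), n = 20000))

tau  <- autocorr_time(chain)
thin <- max(1, round(tau))  # rounded autocorrelation time as thinning factor
print(c(tau = tau, thin = thin))
```

Rounding the estimate gives the thinning factor in the same way as the maximum autocorrelation time of 3.2 led to a thinning factor of 3 above.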
The final run was called using:

    proc genmod data=dataalz descending;
      ods output autocorr=alz.autocorr200_alz geweke=alz.geweke200_alz
                 gelman=alz.gelman200_alz heidelberger=alz.heidelberger200_alz
                 ess=alz.ess200_alz;
      model a1 = age10 a3 a7 a12 a14 a15 / d=bin link=logit;
      bayes seed=17 coeffprior=jeffreys nmc=27000 thin=3 nbi=100
            outpost=posterior10_alz
            diag=(autocorr(lags=1 2 3 4 5) heidelberger geweke gelman ess);
      by X_imputation_;
    run;

Supplementary Table 5: Results of Raftery-Lewis diagnostics as output by SAS/PROC GENMOD over 10 imputed versions of the Alzheimer data set, each with a chain length of 100,000. The table gives the median and maximum of nBurn (necessary number of burn-in iterations) and of nTotal (necessary number of total iterations).

Parameter     N Obs   Variable   Label     N    Median      Maximum
SELF (A12)    10      nBurn      Burn-in   10   5.0000000   6.0000000
                      nTotal     Total     10   8273.00     8908.00
FAMI (A14)    10      nBurn      Burn-in   10   4.0000000   5.0000000
                      nTotal     Total     10   7884.50     8593.00
LEIS (A15)    10      nBurn      Burn-in   10   3.0000000   6.0000000
                      nTotal     Total     10   4351.50     8507.00
Sex (A3)      10      nBurn      Burn-in   10   2.0000000   3.0000000
                      nTotal     Total     10   3956.50     4095.00
OCCU (A7)     10      nBurn      Burn-in   10   2.0000000   3.0000000
                      nTotal     Total     10   3953.50     4071.00
Intercept     10      nBurn      Burn-in   10   3.0000000   5.0000000
                      nTotal     Total     10   4161.00     7878.00
Age (age10)   10      nBurn      Burn-in   10   3.0000000   3.0000000
                      nTotal     Total     10   4231.50     4472.00

Supplementary Table 6: ‘Autocorrelation time’ [2] (median and maximum), expressed in units of number of iterations, over 10 imputed versions of the Alzheimer data set as output by SAS/PROC GENMOD.
Analysis Variable: CorrTime (Autocorrelation Time)

Parameter     N Obs   N    Median      Maximum
SELF (A12)    10      10   2.3191092   2.9136267
FAMI (A14)    10      10   1.7964058   2.4611236
LEIS (A15)    10      10   1.6125760   3.1823774
Sex (A3)      10      10   1.7323859   2.7173755
OCCU (A7)     10      10   1.9474537   2.0734939
Intercept     10      10   1.7780654   2.0161502
Age (age10)   10      10   1.6853984   1.9011540

After having determined the burn-ins, the number of iterations and the amount of thinning, we generated MCMC chains for all 200 imputed data sets. From these we determined the Geweke [3], Gelman-Rubin [4] and Heidelberger-Welch [5] statistics for assessing convergence of the chains. Supplementary Figures 8-13 contain the trace plots for the six variables from the first imputed data set. There is no apparent evidence of any convergence issues.

Heidelberger-Welch diagnostics employ a Cramér-von Mises test to assess whether the chain comes from a covariance stationary process. If the test fails, then the first 10%, say, of the chain could be discarded and the test repeated with the remaining 90%. This could be repeated until a stationary chain is obtained. As shown in Supplementary Table 7, the Cramér-von Mises test flagged only a few chains as non-stationary. Trace plots from chains where the stationarity test failed were reviewed (Supplementary Fig. 14-18). There was no apparent evidence of relevant convergence issues, and we attribute these rare occurrences of non-stationarity to random fluctuations.

Supplementary Table 7: Results from Stationarity Test (Heidelberger-Welch diagnostics)

Variable      Number of imputations where stationarity test failed (P<0.05)
Intercept     0 (0%)
Age           0 (0%)
Sex (A3)      0 (0%)
OCCU (A7)     2 (1%)
SELF (A12)    3 (1.5%)
FAMI (A14)    0 (0%)
LEIS (A15)    0 (0%)

Gelman and Rubin diagnostics are based on multiple chains and compare the variance within chains to the variance between chains. Essentially, the Gelman-Rubin statistic should be close to 1, indicating equality of the two variances.
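The comparison of within-chain and between-chain variance can be sketched in base R as follows. This is the basic form of the potential scale reduction factor, written here as an illustration under stated assumptions: it omits the degrees-of-freedom correction and the upper confidence bound that SAS reports, the function name `gelman_rubin` is hypothetical, and the chains are simulated rather than taken from the Alzheimer analysis.

```r
# Basic Gelman-Rubin statistic (potential scale reduction factor).
# chains: numeric matrix with one column per chain, one row per iteration.
gelman_rubin <- function(chains) {
  n <- nrow(chains)
  W <- mean(apply(chains, 2, var))    # mean within-chain variance
  B <- n * var(colMeans(chains))      # between-chain variance
  var_hat <- (n - 1) / n * W + B / n  # pooled estimate of posterior variance
  sqrt(var_hat / W)
}

set.seed(1)
# Three simulated chains sampling the same distribution
good <- matrix(rnorm(3 * 5000), ncol = 3)
# Three chains stuck at different locations (shift 0, 2 and 4 by column)
bad <- good + rep(c(0, 2, 4), each = 5000)

print(gelman_rubin(good))
print(gelman_rubin(bad))
```

For chains that sample the same distribution the statistic stays near 1, while chains centred at different values push it well above 1.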
To compute Gelman-Rubin diagnostics, at least two additional chains have to be run, which increases the computing time by a factor of 3. In our implementation, we used different initial values for each chain. In SAS, an upper 97.5% confidence bound for the Gelman-Rubin statistic is supplied. We depicted the distribution of the upper bound over the 200 imputations in histograms, which are shown in Supplementary Fig. 19-22. For none of the parameters and in none of the imputed data sets did the Gelman-Rubin statistic show any relevant deviation from its expected value of 1.

Finally, Geweke z tests were computed, which compare the mean parameter value between the first and the second half of the chain. Results are shown in Supplementary Table 8. Under the null hypothesis that there is no difference, 5% of the tests would be expected to be significant. Overall, 6.2% of the 1400 chains gave a significant result.

In summary, careful inspection of the various MCMC diagnostics offered by SAS/PROC GENMOD in all imputed data sets confirmed that the chains have reached their stationary distribution, i.e., they approximately converged to the posterior distribution to be estimated. Thus, we conclude that the results obtained from mixing the 200 chains should be reliable.

Supplementary Table 8: Results from Geweke test for equality of mean parameter value between first and second half of the chain

Variable      Number of imputations where Geweke test failed (P<0.05)
Intercept     15 (7.5%)
Age           16 (8%)
Sex (A3)      12 (6%)
OCCU (A7)     14 (7%)
SELF (A12)    9 (4.5%)
FAMI (A14)    8 (4%)
LEIS (A15)    13 (6.5%)
Overall       87 (6.2%)

Supplementary Fig. 8: Alzheimer example, trace plot of variable age in imputed data set 1.
[Trace plot: age10 against Iteration, 0 to 10000]

Supplementary Fig. 9: Alzheimer example, trace plot of variable sex (variable name A3) in imputed data set 1.
[Trace plot: A3 against Iteration, 0 to 10000]

Supplementary Fig. 10: Alzheimer example, trace plot of variable OCCU (A7) in imputed data set 1.
[Trace plot: A7 against Iteration, 0 to 10000]

Supplementary Fig. 11: Alzheimer example, trace plot of variable SELF (A12) in imputed data set 1.
[Trace plot: A12 against Iteration, 0 to 10000]

Supplementary Fig. 12: Alzheimer example, trace plot of variable FAMI in imputed data set 1.
[Trace plot: A14 against Iteration, 0 to 10000]

Supplementary Fig. 13: Alzheimer example, trace plot of variable LEIS in imputed data set 1.
[Trace plot: A15 against Iteration, 0 to 10000]

Supplementary Fig. 14: Trace plot of variable OCCU (A7) in imputed data set 93.
[Trace plot: A7 against Iteration, 0 to 10000]

Supplementary Fig. 15: Trace plot of variable OCCU (A7) in imputed data set 126.
[Trace plot: A7 against Iteration, 0 to 10000]

Supplementary Fig. 16: Trace plot of variable SELF (A12) in imputed data set 79.
[Trace plot: A12 against Iteration, 0 to 10000]

Supplementary Fig. 17: Trace plot of variable SELF (A12) in imputed data set 142.
[Trace plot: A12 against Iteration, 0 to 10000]

Supplementary Fig. 18: Trace plot of variable SELF (A12) in imputed data set 194.
[Trace plot: A12 against Iteration, 0 to 10000]

Supplementary Fig. 19: Alzheimer example, distribution of upper bound of Gelman-Rubin statistic over the 200 imputations for variables SELF (A12, top) and FAMI (A14, bottom)

Supplementary Fig. 20: Alzheimer example, distribution of upper bound of Gelman-Rubin statistic over the 200 imputations for variables LEIS (A15, top) and sex (A3, bottom)

Supplementary Fig. 21: Alzheimer example, distribution of upper bound of Gelman-Rubin statistic over the 200 imputations for OCCU (A7, top) and the intercept (bottom)

Supplementary Fig. 22: Alzheimer example, distribution of upper bound of Gelman-Rubin statistic over the 200 imputations for age

Example R code for CLIP analysis

CLIP analysis is facilitated by a new version of the R package logistf [6]. Here we provide some code examples illustrating how this package can be used to:
(a) obtain CLIP confidence intervals after multiple imputation,
(b) obtain confidence intervals based on the pseudo-variance modification of Rubin’s rules,
(c) display the profile of the combined posterior to determine whether the assumptions of Rubin’s rules are adequate.
We start by using some R code to generate the data for the toy example of Section 2.5 of the main paper:

    R> #generate data set with NAs
    R> freq=c(5,2,2,7,5,4)
    R> y<-c(rep(1,freq[1]+freq[2]), rep(0,freq[3]+freq[4]),
    +    rep(1,freq[5]), rep(0,freq[6]))
    R> x<-c(rep(1,freq[1]), rep(0,freq[2]), rep(1,freq[3]), rep(0,freq[4]),
    +    rep(NA,freq[5]), rep(NA,freq[6]))
    R> toy<-data.frame(x=x,y=y)

The toy data set now looks as follows:

    R> toy
        x y
    1   1 1
    2   1 1
    3   1 1
    4   1 1
    5   1 1
    6   0 1
    7   0 1
    8   1 0
    9   1 0
    10  0 0
    11  0 0
    12  0 0
    13  0 0
    14  0 0
    15  0 0
    16  0 0
    17 NA 1
    18 NA 1
    19 NA 1
    20 NA 1
    21 NA 1
    22 NA 0
    23 NA 0
    24 NA 0
    25 NA 0

Next, five imputed versions of the data set are generated:

    R> set.seed(169)
    R> toymi<-list(0)
    R> for(i in 1:5){
    +    toymi[[i]]<-toy
    +    # rows with missing x, separately for y=1 and y=0
    +    y1<-toymi[[i]]$y==1 & is.na(toymi[[i]]$x)
    +    y0<-toymi[[i]]$y==0 & is.na(toymi[[i]]$x)
    +    # draw x from the observed proportion of x=1 given y
    +    xnew1<-rbinom(sum(y1),1,freq[1]/(freq[1]+freq[2]))
    +    xnew0<-rbinom(sum(y0),1,freq[3]/(freq[3]+freq[4]))
    +    toymi[[i]]$x[y1==TRUE]<-xnew1
    +    toymi[[i]]$x[y0==TRUE]<-xnew0
    + }

In the imputed versions, the NAs in x are replaced by 0s and 1s drawn from the distribution of the observed x, conditional on y.
Here we print the first imputed data set:

    R> toymi[[1]]
       x y
    1  1 1
    2  1 1
    3  1 1
    4  1 1
    5  1 1
    6  0 1
    7  0 1
    8  1 0
    9  1 0
    10 0 0
    11 0 0
    12 0 0
    13 0 0
    14 0 0
    15 0 0
    16 0 0
    17 1 1
    18 0 1
    19 0 1
    20 0 1
    21 0 1
    22 0 0
    23 1 0
    24 0 0
    25 0 0

In the following code, each imputed data set is analysed using logistf to produce a list of logistf model fits:

    R> library(logistf)
    R> fit.list<-lapply(1:5, function(X) logistf(data=toymi[[X]], y~x, pl=TRUE))

For illustration, we summarize the results of the first completed-data analysis:

    R> fit.list[[1]]
    logistf(formula = y ~ x, data = toymi[[X]], pl = TRUE)
    Model fitted by Penalized ML
    Confidence intervals and p-values by Profile Likelihood

                      coef  se(coef) lower 0.95 upper 0.95     Chisq         p
    (Intercept) -0.4795731 0.5144434 -1.5161357  0.4801805 0.9500591 0.3297042
    x            1.0986124 0.8677860 -0.4873944  2.8248767 1.8270245 0.1764794

    Likelihood ratio test=1.827024 on 1 df, p=0.1764794, n=25

CLIP confidence limits

CLIP confidence intervals for the intercept and for the regression coefficient of variable x can be computed using a one-line command:

    R> CLIP.confint(fit.list)
    CLIP.confint(obj = fit.list)
    Number of imputations: 5
    Iterations, mean: 12.75 max: 17
    Confidence level, lower: 2.5 %, upper: 97.5 %
                  Estimate       Lower     Upper    P-value
    (Intercept) -0.9316852 -2.51968459 0.2696288 0.14081509
    x            1.7767921 -0.07298734 3.9234581 0.06041205

The output of the function first gives some general information on the number of imputations found in the input object, and the mean and maximum number of iterations needed to compute the four confidence limits (lower and upper for intercept and x). Then, it provides a table with the pooled regression coefficients (labelled ‘Estimate’), and the lower and upper confidence limits based on the pooled posterior. The P-value directly follows from inverting the confidence interval.
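The principle behind limits "based on the pooled posterior" can be sketched in base R without logistf. This is a simplified illustration under stated assumptions, not the package's implementation: the completed-data estimates and standard errors are hypothetical numbers, each completed-data profile is replaced by a quadratic stand-in for the likelihood ratio statistic, each completed-data CDF is taken as the normal CDF of the signed root of that statistic, and the function names `lr_stat`, `cdf_l`, `cdf_pooled`, `clip_limit` and `pooled_lr` are made up for this example. The pooled CDF is the average of the completed-data CDFs, and the confidence limits are the points where it equals 0.025 and 0.975.

```r
# Hypothetical completed-data estimates and standard errors (illustration only)
est <- c(1.10, 1.95, 1.60, 2.20, 2.05)
se  <- c(0.87, 0.95, 0.90, 1.00, 0.97)

# Completed-data likelihood ratio statistic: a quadratic stand-in for the
# profile penalized likelihood that logistf would compute
lr_stat <- function(beta, l) ((beta - est[l]) / se[l])^2

# Completed-data CDF from the signed root of the likelihood ratio statistic
cdf_l <- function(beta, l) {
  pnorm(sign(beta - est[l]) * sqrt(lr_stat(beta, l)))
}

# Pooled CDF: average of the completed-data CDFs over all imputations
cdf_pooled <- function(beta) {
  mean(sapply(seq_along(est), function(l) cdf_l(beta, l)))
}

# Confidence limit: the value of beta at which the pooled CDF equals p
clip_limit <- function(p) {
  uniroot(function(b) cdf_pooled(b) - p, interval = c(-20, 20), tol = 1e-8)$root
}

ci <- c(lower = clip_limit(0.025), upper = clip_limit(0.975))
print(ci)

# Back-transformed pooled likelihood ratio statistic at a given beta
pooled_lr <- function(beta) qnorm(cdf_pooled(beta))^2
print(pooled_lr(0))
```

Because the pooled CDF is a mixture of the completed-data CDFs, the resulting interval need not be symmetric around the pooled estimate, which is the feature that distinguishes this approach from a normal approximation.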
Pseudo-variance modification of Rubin’s rules

Using the R command PVR.confint, confidence intervals based on the pseudo-variance modification of Rubin’s rules can be obtained. The output contains not only the computed limits, but also the lower and upper pseudo-variances, which allows a quick check of their agreement. For example, for variable x the upper pseudo-variance is only about 18.5% higher than the lower one, which means that the assumptions for Rubin’s rules are roughly fulfilled.

    R> PVR.confint(fit.list)
    Pseudo-variance modification of Rubins Rules
    Confidence level: 95 %
                  Estimate     Lower     Upper Lower pseudo variance Upper pseudo variance
    (Intercept) -0.9316852 -2.489119 0.4337575             0.6314269             0.4853453
    x            1.7767921 -0.244709 3.9780441             1.0637799             1.2613724

Profile of the posterior

The profile of the posterior for the regression parameter x, using the CLIP method, can be obtained by the R command:

    R> xprof<-CLIP.profile(fit.list, variable="x", keep=TRUE)

The keep=TRUE directive requests the program to keep all five completed-data profiles in the output object. A convenient plot method allows plotting the profile:

    R> plot(xprof)

While this will display the profile as log likelihood ratio (relative to the maximum), one may alternatively plot the profile as the cumulative distribution function or as a density:

    R> plot(xprof, "cdf")
    R> plot(xprof, "density")

The results of the three plot commands are shown in Supplementary Figures 23-25.

Supplementary Fig. 23: CLIP estimate of the pooled posterior (solid black line), and completed-data profile likelihoods (dashed gray lines) for parameter x in the toy example. The scaling of the profiles is in terms of the likelihood ratio statistic (twice the difference to the maximized log likelihood).
[Plot: y-axis "Relative log profile penalized likelihood"]

Supplementary Fig. 24: CLIP estimate of the cumulative distribution function of the pooled posterior (solid black line), and completed-data cumulative distribution functions (dashed gray lines) for parameter x in the toy example.
[Plot: y-axis "Cumulative distribution function"]

Supplementary Fig. 25: CLIP estimate of the density of the pooled posterior (solid black line), and completed-data densities (dashed gray lines) for parameter x in the toy example.
[Plot: y-axis "Posterior density"]

While these graphs may already provide some guidance for checking the adequacy of Rubin’s rules in this example, many researchers are familiar with checking the normal distribution by means of Q-Q plots. By computing the pooled variance following Rubin’s rules with the pool.RR function of logistf, one may generate such a plot with a few simple commands. A pooled analysis by Rubin’s rules of the five completed-data analyses is obtained by

    R> RR.sum<-summary(pool.RR(fit.list))
    R> RR.sum
                       est        se         t       df  Pr(>|t|)     lo 95     hi 95 nmis       fmi    lambda
    (Intercept) -0.9316852 0.7466744 -1.247780 42.31076 0.2189735 -2.438207 0.5748368   NA 0.3074713 0.3074713
    x            1.7767921 1.0930820  1.625488 39.22362 0.1120679 -0.433772 3.9873562   NA 0.3193421 0.3193421

We extract the mean and standard error for the regression coefficient of x from this table, and use them to generate normal deviates according to the CDF values that are already contained in the profile object xprof created above. The normal deviates can then be plotted against the CLIP deviates to see if there is a relevant disagreement.

    R> m<-RR.sum[2,1]
    R> s<-RR.sum[2,2]
    R> normq<-qnorm(xprof$cdf)*s+m
    R> plot(normq, xprof$beta, xlab="Normal deviate", ylab="Pooled posterior deviate",
    +    xlim=quantile(c(normq, xprof$beta),c(0,1)),
    +    ylim=quantile(c(normq, xprof$beta),c(0,1)))
    R> lines(normq,normq,lty=1,col="gray")

Although the disagreement between the normal deviates and the CLIP deviates is not too large, the lower limit is considerably closer to 0 by the CLIP method than by the normal approximation.
Since the lower limit is in the area where the Q-Q plot shows the largest disagreement, it may be safer to prefer the CLIP method in this illustrative example. However, it should be emphasized again that a higher number of imputations (at least 100) is generally recommended, in particular with data sets as small as this one.

Supplementary Fig. 26: Q-Q plot of normal deviates based on Rubin’s rules and deviates from the CLIP estimate of the CDF of the posterior
[Plot: x-axis "Normal deviate", y-axis "Pooled posterior deviate"]

References

[1] Raftery, A. E. and Lewis, S. M. (1992), “One Long Run with Diagnostics: Implementation Strategies for Markov Chain Monte Carlo,” Statistical Science, 7, 493–497.
[2] Kass, R. E., Carlin, B. P., Gelman, A., and Neal, R. (1998), “Markov Chain Monte Carlo in Practice: A Roundtable Discussion,” The American Statistician, 52, 93–100.
[3] Geweke, J. (1992), “Evaluating the Accuracy of Sampling-Based Approaches to Calculating Posterior Moments,” in J. M. Bernardo, J. O. Berger, A. P. Dawid, and A. F. M. Smith, eds., Bayesian Statistics, volume 4, Oxford, UK: Clarendon Press.
[4] Gelman, A. and Rubin, D. B. (1992), “Inference from Iterative Simulation Using Multiple Sequences,” Statistical Science, 7, 457–472.
[5] Heidelberger, P. and Welch, P. D. (1981), “A Spectral Method for Confidence Interval Generation and Run Length Control in Simulations,” Communications of the ACM, 24, 233–245.
[6] Heinze, G., Ploner, M., Dunkler, D., and Southworth, H., logistf: Firth’s bias reduced logistic regression. R package version 1.20. Available at: http://cran.r-project.org/web/packages/logistf/index.html (16 May 2013).