Hierarchical Normal Linear Models With Group-Specific Intercepts: A Frequentist and Bayesian Perspective
22S:138 Bayesian Statistics
Lizette Ortega, Kristi Swanson, Mitch Thomann

Introduction

The goal of this analysis is to compare Frequentist and Bayesian regression analyses of hierarchical normal linear models with a group-specific intercept. The data will be simulated so that the true values of the regression parameters, as well as the variance components, are known. Comparisons of the biases of the regression parameter estimates and of their interval coverage will then be made for each method. The Bayesian analysis will be carried out with four different sets of priors to see how the priors affect the parameter estimates. In addition, 95% intervals for each parameter will be reported to determine how often the true parameter is contained in the interval. This will be repeated on 25 identically simulated datasets, and average results will be reported.

Methods

Simulation:
This simulation study used data generated with the R statistical package. We generated 25 datasets of simulated repeated-measures data using the following method. There were ten groups, each with five observations. Each group had a group-specific intercept, generated from a normal distribution with mean 0 and variance 100. Each observation also had a random error, drawn from a normal distribution with mean 0 and variance 25. There were two predictor variables for each observation: one continuous (x2) and one binary (x3). The response was a linear combination of an overall intercept of 50, five times the continuous variable, negative 25 times the binary variable, the group-specific intercept, and the random error.

Summary of the simulation:
α_i ~ N(0, 100)
e_ij ~ N(0, 25)
x2_ij ~ Uniform(10, 30)
x3_ij ~ Bernoulli(0.25)
y_ij = 50 + 5*x2_ij - 25*x3_ij + α_i + e_ij,    i = 1, ..., 10;  j = 1, ..., 5

Frequentist Analysis:
SAS version 9.2 was used to carry out the Frequentist analysis of the simulated data. The MIXED procedure was used to obtain parameter estimates, standard errors, and confidence intervals, as well as the covariance matrices of the parameter estimates. Because the subjects' data were generated in such a way as to suggest correlation between observations, we specified an unstructured correlation matrix in the procedure. The bias of each regression parameter and variance component was evaluated by comparing the average estimate across the 25 datasets to the true value, so the bias results were calculated as follows:

Bias(x2) = 5 - β̂1
Bias(x3) = -25 - β̂2

where β̂1 and β̂2 denote the average estimated coefficients of x2 and x3. The following code is an example of the PROC MIXED call that was used:

proc mixed data = mylib.set1 method = ml;
   class person;
   model y1 = x2_1 x3_1 / s covb covbi cl;
   random person;
run;

The CLASS statement specifies the variable identifying the ten different subjects in the dataset. The MODEL statement specifies the regression model and supplies the observed values of y, x2, and x3. The options listed after the model statement request, respectively, the fixed-effects parameter estimates, the covariance matrix, the inverse covariance matrix, and the confidence limits. Finally, the RANDOM statement adds the group-specific random intercept so that the random group effects are taken into account and estimates of the variance components are obtained.
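To make the data-generation step concrete, the following is a minimal R sketch of how one such dataset could be simulated under the design described above. The seed and object names are illustrative (chosen to match the variable names used in the SAS code), not the original simulation script.

# Simulate one dataset: 10 groups (persons) with 5 observations each
set.seed(1)                                     # illustrative seed, for reproducibility
n.groups <- 10
n.per    <- 5
person   <- rep(1:n.groups, each = n.per)       # group indicator

alpha <- rnorm(n.groups, mean = 0, sd = 10)     # group-specific intercepts, N(0, 100)
e     <- rnorm(n.groups * n.per, 0, 5)          # random errors, N(0, 25)
x2    <- runif(n.groups * n.per, 10, 30)        # continuous predictor
x3    <- rbinom(n.groups * n.per, 1, 0.25)      # binary predictor

y <- 50 + 5 * x2 - 25 * x3 + alpha[person] + e  # response

set1 <- data.frame(person = person, y1 = y, x2_1 = x2, x3_1 = x3)

Repeating this 25 times (with different seeds) and exporting each data frame to SAS and WinBUGS would reproduce the design used in this study.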
Bayesian Analysis:
Using R and WinBUGS, the 25 datasets were analyzed with a hierarchical normal linear model of the following form:

y_ij | α_0i, α_1i, α_2i ~ N(α_0i + α_1i*x2_ij + α_2i*x3_ij, σ_y²)
(α_0i, α_1i, α_2i)′ ~ N3((β0, β1, β2)′, Σ_α⁻¹)
(β0, β1, β2)′ ~ N3((μ0, μ1, μ2)′, Σ_0⁻¹)
σ_y² ~ Inv-Gamma(a, b)
Σ_α⁻¹ ~ Wishart(R[3×3], 6)

Here Σ_α⁻¹ and Σ_0⁻¹ are precision matrices, following the WinBUGS parameterization of the multivariate normal; the specific values of μ, Σ_0⁻¹, (a, b), and R used for each set of priors are given with the corresponding results table below. The parameters of primary interest in this analysis are the regression coefficients β0, β1, and β2.

Using WinBUGS, we checked convergence under every prior for several of the datasets. We found that all of the parameters converged after a burn-in of 20,000 iterations for every prior. Some chains may have converged earlier, but we chose this larger number as the burn-in since we knew it would work for all of the priors. The BGR diagnostics and autocorrelation plots were used to check convergence. Below is an example of convergence for the prior of Table 4; all of the priors converged similarly.

[Figure: BGR convergence diagnostics for mu.beta[1], mu.beta[2], and mu.beta[3] (chains 1:3), plotted against iteration (from 20,000 onward), for the prior of Table 4.]
[Figure: Autocorrelation plots for mu.beta[1], mu.beta[2], and mu.beta[3] (chains 1:3), lags 0-40, for the prior of Table 4.]
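The original WinBUGS scripts are not reproduced in this report. As a rough illustration of how a model of this form can be specified and fit from R, the following is a minimal sketch using the R2WinBUGS package; the file name, node names, use of WinBUGS-generated initial values, and the choice of the Table 4 hyperparameters are assumptions made for the example, not the code actually run for this analysis.

library(R2WinBUGS)   # R interface to WinBUGS (assumes a local WinBUGS installation)

# Hierarchical model with group-specific intercept and slopes, as specified above.
model.string <- "
model {
  for (k in 1:N) {
    y[k]  ~ dnorm(mu[k], tau.y)
    mu[k] <- alpha[group[k], 1] + alpha[group[k], 2] * x2[k] + alpha[group[k], 3] * x3[k]
  }
  for (i in 1:G) {
    alpha[i, 1:3] ~ dmnorm(mu.beta[], Omega.alpha[, ])   # group-level coefficients
  }
  mu.beta[1:3] ~ dmnorm(mu0[], Omega0[, ])               # prior on (beta0, beta1, beta2)
  Omega.alpha[1:3, 1:3] ~ dwish(R[, ], 6)                # Wishart prior on the precision matrix
  tau.y ~ dgamma(0.001, 0.001)                           # i.e., sigma.y^2 ~ Inv-Gamma(0.001, 0.001)
  sigma.y2 <- 1 / tau.y
}
"
writeLines(model.string, "hlm_model.txt")                # hypothetical file name

# Data list for one simulated dataset (set1 from the simulation sketch above),
# using the Table 4 hyperparameters as an example.
bugs.data <- list(N = nrow(set1), G = 10,
                  y = set1$y1, x2 = set1$x2_1, x3 = set1$x3_1, group = set1$person,
                  mu0 = rep(0, 3),
                  Omega0 = diag(1.0E-5, 3),
                  R = diag(100, 3))

fit <- bugs(data = bugs.data, inits = NULL,
            parameters.to.save = c("mu.beta", "sigma.y2"),
            model.file = "hlm_model.txt",
            n.chains = 3, n.iter = 25000, n.burnin = 20000)
print(fit)   # posterior summaries, including 95% credible intervals for mu.beta

In the full study, a loop over the 25 datasets and the four prior specifications (Tables 2-5) would collect the posterior means and 95% credible intervals of mu.beta, from which the average bias, MSE, coverage, and interval width reported below can be computed.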
Results

The Frequentist results are summarized in Table 1; the Bayesian results for each of the four sets of priors follow in Tables 2-5.

Table 1. Frequentist Regression Parameters and Estimate Bias Results

Parameter      True Value   Average Estimate   Average Bias   MSE           Interval Coverage   Average Interval Width
β0              50           49.81086           0.18914       14.55445      96%                 19.10194
β1               5            5.016578         -0.01651        0.01430536   100%                 0.54701
β2             -25          -25.7439            0.74386        3.7889       96%                  7.538995
σ²_between     100           90.23633           9.763672      n/a           n/a                 n/a
σ²_within       25           25.2558           -0.2558        n/a           n/a                 n/a

These results indicate that the Frequentist analysis does a fairly good job of estimating the parameters. Table 1 shows that the average bias of each regression parameter is fairly close to 0 and that the confidence intervals for all three regression parameters have near-universal coverage (96% for β0, 100% for β1, and 96% for β2).

Table 2. Bayesian Regression Parameters and Estimate Bias with prior:
(β0, β1, β2)′ ~ N3(μ, Σ_0⁻¹), where μ = (0, 0, 0)′ and Σ_0⁻¹ = diag(1.0E-6, 1.0E-6, 1.0E-6)
σ_y² ~ Inv-Gamma(0.001, 0.001)
Σ_α⁻¹ ~ Wishart(R[3×3], 6), where R = diag(0.01, 100, 100)

Parameter   True Value   Average Estimate   Average Bias   MSE           Interval Coverage   Average Interval Width
β0           50           35.631667          14.368333      222.902892    32%                 25.83636
β1            5            5.677392          -0.67739         0.51135     100%                 4.29651636
β2          -25          -23.76716           -1.23293        17.2062      100%                20.44518

Table 3. Bayesian Regression Parameters and Estimate Bias with prior:
(β0, β1, β2)′ ~ N3(μ, Σ_0⁻¹), where μ = (0, 0, 0)′ and Σ_0⁻¹ = diag(1.0E-6, 1.0E-6, 1.0E-6)
σ_y² ~ Inv-Gamma(0.001, 0.001)
Σ_α⁻¹ ~ Wishart(R[3×3], 6), where R = diag(100, 0.1, 0.1)

Parameter   True Value   Average Estimate   Average Bias   MSE           Interval Coverage   Average Interval Width
β0           50           50.621129          -0.621129      30.8059225    100%                24.495
β1            5            4.991935           0.00807        0.07933      100%                 1.15123
β2          -25          -26.995              1.99532       13.9455       100%                14.260267

Table 4. Bayesian Regression Parameters and Estimate Bias with prior:
(β0, β1, β2)′ ~ N3(μ, Σ_0⁻¹), where μ = (0, 0, 0)′ and Σ_0⁻¹ = diag(1.0E-5, 1.0E-5, 1.0E-5)
σ_y² ~ Inv-Gamma(0.001, 0.001)
Σ_α⁻¹ ~ Wishart(R[3×3], 6), where R = diag(100, 100, 100)

Parameter   True Value   Average Estimate   Average Bias   MSE            Interval Coverage   Average Interval Width
β0           50           49.7877844          0.2122156     33.17405895    100%                27.18472
β1            5            5.01447792        -0.0144779      0.088343731   100%                 4.01021084
β2          -25          -25.0088236          0.008824      12.94222       100%                18.7669

Table 5. Bayesian Regression Parameters and Estimate Bias with prior:
(β0, β1, β2)′ ~ N3(μ, Σ_0⁻¹), where μ = (0, 0, 0)′ and Σ_0⁻¹ = diag(1.0E-5, 1.0E-5, 1.0E-5)
σ_y² ~ Inv-Gamma(0.001, 0.001)
Σ_α⁻¹ ~ Wishart(R[3×3], 6), where R = diag(100, 0.01, 0.01)

Parameter   True Value   Average Estimate   Average Bias   MSE           Interval Coverage   Average Interval Width
β0           50           49.895778           0.104222      33.08919386   100%                26.118114
β1            5            4.89836184          0.10163916    0.1013463    100%                 1.2473464
β2          -25          -16.4872             -8.51281      79.99366       44%                19.1024

With the first two priors (Tables 2 and 3), the model does a good job of estimating β1 and β2 (100% interval coverage and fairly small average bias). The third prior (Table 4) did a good job of estimating all three parameters (100% interval coverage and small average bias). With the last prior, however, the Bayesian model still estimates β1 well (100% interval coverage) but does a very poor job of estimating β2 (44% interval coverage). Every estimate of β2 under this prior is greater than the true β2, with an average absolute difference of 8.51, indicating that this model is in some way improperly fit.

The bias estimates from the Frequentist model were tested for a significant difference against those from each Bayesian prior using two-sample t-tests. Variances were pooled for the comparison of the Frequentist model with the last prior (Table 5); all other comparisons had significantly different variances and were therefore carried out without pooling. The resulting p-values are summarized in Table 6.

Table 6. Two-Sample t-Test p-values Testing for a Significant Difference in Bias

Comparison                              Coefficient of x2       Coefficient of x3
                                        (H0: β2,F = β2,B)       (H0: β3,F = β3,B)
Frequentist vs. 1st prior (Table 2)     P < 0.0001              P = 0.03283
Frequentist vs. 2nd prior (Table 3)     P = 0.69535             P = 0.099683
Frequentist vs. 3rd prior (Table 4)     P = 0.97465             P = 0.37671
Frequentist vs. 4th prior (Table 5)     P = 0.083689            P < 0.001

*β2,F and β3,F denote the Frequentist estimates of the coefficients of x2 and x3 (β1 and β2 in Tables 1-5); β2,B and β3,B denote the corresponding Bayesian estimates. All alternatives are two-sided (H1: β2,F ≠ β2,B and H1: β3,F ≠ β3,B).
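As an illustration of how the comparisons in Table 6 can be computed, the following is a minimal R sketch of a two-sample t-test on the per-dataset bias estimates. The vectors bias.freq and bias.bayes are hypothetical placeholders standing in for the 25 per-dataset biases of one coefficient under each method (the placeholder means roughly mirror the Table 1 and Table 5 averages), and var.test() is shown only as one way to decide whether to pool the variances.

# Hypothetical per-dataset biases (25 values per method). In the actual study these
# would be the biases computed from each of the 25 fitted models, not random draws.
set.seed(2)
bias.freq  <- rnorm(25, mean =  0.74, sd = 2)   # placeholder: Frequentist biases of the x3 coefficient
bias.bayes <- rnorm(25, mean = -8.51, sd = 3)   # placeholder: Bayesian biases under the Table 5 prior

# Test for equal variances to decide whether to pool (as described above)
var.test(bias.freq, bias.bayes)

# Two-sample t-tests: pooled (equal variances) and Welch (unequal variances)
t.test(bias.freq, bias.bayes, var.equal = TRUE)    # pooled-variance test
t.test(bias.freq, bias.bayes, var.equal = FALSE)   # Welch test, used when the variances differ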
Discussion

These results indicate that there is a difference between the results obtained via Frequentist and Bayesian methods when fitting hierarchical linear models. The Frequentist models converged easily and gave more accurate estimates of the parameters. The Frequentist method had significantly smaller bias than the Bayesian analysis for the coefficient of x2 under the first prior (Table 2), and for the coefficient of x3 under both the first prior (Table 2) and the last prior (Table 5). All other comparisons showed no significant difference between the two methods (Table 6). The Frequentist estimates were also more precise, as the confidence intervals were, on average, narrower than the Bayesian credible intervals. Finally, as the results show, the Bayesian estimates varied greatly depending on which priors were chosen for the variance components. In fact, under the priors intended to most closely resemble the structure actually used to generate the data (the last two sets of priors), some of the estimates, particularly the estimate of β2 under the last prior, were worse than under the first prior.

Although these are valid criticisms of the Bayesian analysis, the model we fit in SAS for the Frequentist estimates had an inherent advantage over the Bayesian models. The Frequentist model allowed no group-to-group variation in the slopes; only group-specific intercepts were allowed. In the Bayesian models, however, we allowed the slopes to vary between groups because, in practice, we would not know the exact relationship between groups. We attempted to minimize this difference by choosing the Wishart prior on the precision matrix Σ_α⁻¹ (Omega) so that the group-specific slope components would have small prior variance. It is therefore not surprising that the Bayesian estimates suffered as a result. Another factor that may have hurt the Bayesian analysis is that the prior means of the beta parameters were set to zero. These values were chosen to simulate a real-data situation in which we have no prior information about the slopes and the intercept of the model. Consequently, all of the estimates of the beta coefficients were shrunk toward zero. For these reasons, we might have had more success with a different model and prior.

Furthermore, if we were to compare Frequentist and Bayesian modeling of hierarchical normal linear models again, it would be valuable to compare the estimated random-error variance and group-specific intercept variance between the two methods. We obtained estimates of these variance components from the MIXED procedure in SAS, but the corresponding estimates from WinBUGS were too incongruous to report. This could indicate a problem with our model fit, despite diagnostics indicating otherwise. Most likely, because we allowed the slopes to vary between groups, the slope variances absorbed some of the variability that should have been attributed to the intercept.

Lastly, the problems we encountered in choosing an appropriate prior influenced our results quite drastically. Knowing how strongly different priors can influence the resulting inference, one must be careful when fitting Bayesian models. If, in practice, we had no idea of the true values of the regression parameters and had simply fit the model using the last set of priors, we would have obtained very biased results for β2. An important lesson, therefore, is not only to assess convergence, but also to determine carefully what an appropriate prior is before fitting the model, and to be mindful of how drastically the choice of prior can affect the results.

Conclusion

Both the Frequentist and Bayesian models estimated the values of the regression parameters well in our simulation study. This is reinforced by how often the true parameters fell within the 95% intervals for both methods: apart from β0 under the first prior (Table 2) and β2 under the last prior (Table 5), all intervals had at least 95% coverage. In addition, if a different Bayesian model had been fit, one allowing only the intercept to vary between groups, the Bayesian approach would most likely have performed better; a sketch of such a model is given below. However, the Frequentist methods were somewhat more accurate in certain comparisons and simpler to carry out, so we would lean toward using them if we needed to fit a hierarchical linear model in the future.
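For reference, the following is a minimal sketch, under the same assumptions as the earlier R2WinBUGS example, of the random-intercept-only model suggested above, in which only the intercept varies between groups and the slopes are common to all groups. The node and file names are illustrative; this is not code that was run for this report.

# Random-intercept-only variant: group-specific intercepts, common slopes,
# matching the structure actually used to generate the data.
model.string.ri <- "
model {
  for (k in 1:N) {
    y[k]  ~ dnorm(mu[k], tau.y)
    mu[k] <- beta0 + alpha[group[k]] + beta1 * x2[k] + beta2 * x3[k]
  }
  for (i in 1:G) {
    alpha[i] ~ dnorm(0, tau.alpha)      # group-specific intercept deviations
  }
  beta0 ~ dnorm(0, 1.0E-5)              # diffuse priors on the regression coefficients
  beta1 ~ dnorm(0, 1.0E-5)
  beta2 ~ dnorm(0, 1.0E-5)
  tau.y     ~ dgamma(0.001, 0.001)      # residual precision
  tau.alpha ~ dgamma(0.001, 0.001)      # between-group (intercept) precision
  sigma.y2     <- 1 / tau.y
  sigma.alpha2 <- 1 / tau.alpha
}
"
writeLines(model.string.ri, "hlm_random_intercept.txt")   # hypothetical file name

Fitting this model with bugs() as in the earlier sketch would allow a more direct comparison of sigma.y2 and sigma.alpha2 with the variance-component estimates from PROC MIXED.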