Hierarchical Normal Linear Models With
Group Specific Intercepts:
A Frequentist and Bayesian Perspective
22S:138 Bayesian Statistics
Lizette Ortega
Kristi Swanson
Mitch Thomann
Introduction
The goal of this analysis is to compare Frequentist and Bayesian linear regression
analyses of hierarchical normal linear models with a subject-specific intercept. The data will be
simulated so that the true values of the regression parameters, as well as the variance
components, are known. Then, the biases of the regression parameter estimates and their
interval coverage will be assessed for each method. The Bayesian analysis will be carried out
with four different sets of priors to see how the priors affect the parameter estimates. Also, 95%
confidence intervals for each parameter will be reported to determine how often the true
parameter is contained in the interval. This will be repeated on 25 identically simulated datasets,
and average results will be reported.
Methods
Simulation:
This simulation study used data generated with the R statistical package. We
generated 25 different datasets of simulated repeated-measures data using the following method.
There were ten groups, each with five observations. Each group had a
subject-specific intercept, which was generated as a random normal with mean 0 and variance
100. Each observation also had a random error, which was drawn from a normal with mean 0
and variance 25. There were two predictor variables for each observation, one binary
and one continuous. The response variable was a linear combination of the
intercept, five times the continuous variable, negative 25 times the binary variable, the group-specific
intercept, and the random error.
Summary of the simulation:

$\alpha_i \sim N(0, 100)$
$e_{ij} \sim N(0, 25)$
$x_{2ij} \sim \text{Uniform}(10, 30)$
$x_{3ij} \sim \text{Bernoulli}(0.25)$
$y_{ij} = 50 + 5x_{2ij} - 25x_{3ij} + \alpha_i + e_{ij}, \qquad i = 1, \ldots, 10; \; j = 1, \ldots, 5$
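
The report does not include its simulation code; as an illustration, a minimal R sketch of this data-generating process (function and variable names are our own) might look as follows:

# simulate one repeated-measures dataset: 10 groups x 5 observations each
simulate_dataset <- function(n.groups = 10, n.obs = 5) {
  group <- rep(1:n.groups, each = n.obs)
  alpha <- rnorm(n.groups, mean = 0, sd = sqrt(100))         # group-specific intercepts
  e     <- rnorm(n.groups * n.obs, mean = 0, sd = sqrt(25))  # random errors
  x2    <- runif(n.groups * n.obs, min = 10, max = 30)       # continuous predictor
  x3    <- rbinom(n.groups * n.obs, size = 1, prob = 0.25)   # binary predictor
  y     <- 50 + 5 * x2 - 25 * x3 + alpha[group] + e
  data.frame(group, x2, x3, y)
}

set.seed(138)  # arbitrary seed, for reproducibility
datasets <- replicate(25, simulate_dataset(), simplify = FALSE)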
Frequentist Analysis:
SAS version 9.2 was used to carry out the Frequentist analysis of the simulated data. The
MIXED procedure was used to obtain parameter estimates, standard errors, and confidence
intervals, as well as the covariance matrices for the parameters. Because each subject's data
were generated in a way that induces correlation between observations, we specified an
unstructured correlation matrix in the procedure.
The bias of each parameter estimate and variance component was evaluated by
comparing the average value of the estimate to the true value. The bias results were therefore
calculated as follows:

$\text{Bias}(x_2) = 5 - \hat{\beta}_1$
$\text{Bias}(x_3) = -25 - \hat{\beta}_2$
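
As a sketch, these summaries could be computed in R along the following lines, where est, lower, and upper are illustrative vectors (not part of the original report) holding one estimate and one confidence limit pair per simulated dataset:

# summarize one parameter across the 25 simulated datasets
summarize_param <- function(est, lower, upper, true.value) {
  c(avg.estimate = mean(est),
    avg.bias     = true.value - mean(est),  # bias as defined above
    mse          = mean((est - true.value)^2),
    coverage     = mean(lower <= true.value & true.value <= upper),
    avg.width    = mean(upper - lower))
}

# e.g., for the coefficient of x3, whose true value is -25:
# summarize_param(b2.est, b2.lower, b2.upper, true.value = -25)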
The following is an example of the PROC MIXED call that was used:

proc mixed data = mylib.set1 method=ml;
  class person;                            /* the ten subjects (groups) */
  model y1 = x2_1 x3_1 / s covb covbi cl;  /* fixed effects and requested output */
  random person;                           /* group-specific random intercept */
run;
The CLASS statement specifies the ten different subjects in the dataset. The MODEL
statement specifies the regression model and supplies the observed values of y, x2, and x3. The
options listed after the slash request, respectively, the fixed-effects solutions, the
covariance matrix, the inverse covariance matrix, and the confidence limits. Finally,
the RANDOM statement is used so that the random group effects are taken into account and
estimates of the variance components are produced.
Bayesian Analysis:
Using R and WinBUGS, the 25 datasets were analyzed with the following hierarchical normal linear
model:
$y_{ij} \mid \alpha_{0i}, \alpha_{1i}, \alpha_{2i} \sim N(\alpha_{0i} + \alpha_{1i}x_{2ij} + \alpha_{2i}x_{3ij},\ \sigma_y^2)$

$(\alpha_{0i}, \alpha_{1i}, \alpha_{2i})^\top \mid \boldsymbol{\beta}, \Sigma_\alpha \sim N_3\big((\beta_0, \beta_1, \beta_2)^\top,\ \Sigma_\alpha^{-1}\big)$

$(\beta_0, \beta_1, \beta_2)^\top \mid \boldsymbol{\mu}, \Sigma_0 \sim N_3\big((\mu_0, \mu_1, \mu_2)^\top,\ \Sigma_0^{-1}\big)$

$\sigma_y^2 \sim \text{Inv-Gamma}(\alpha, \beta)$

$\Sigma_\alpha^{-1} \sim \text{Wishart}(R_{3\times 3},\ 6)$
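
The report does not reproduce its WinBUGS code. As a rough sketch, the model above could be written in the BUGS language and passed to WinBUGS from R, for example via the R2WinBUGS package (the interface actually used is an assumption, as are all names below):

model.string <- "
model {
  for (i in 1:10) {
    # group-specific regression coefficients (intercept and two slopes)
    alpha[i, 1:3] ~ dmnorm(beta[], Sigma.alpha.inv[, ])
    for (j in 1:5) {
      mu.y[i, j] <- alpha[i, 1] + alpha[i, 2] * x2[i, j] + alpha[i, 3] * x3[i, j]
      y[i, j] ~ dnorm(mu.y[i, j], tau.y)   # BUGS normals are parameterized by precision
    }
  }
  beta[1:3] ~ dmnorm(mu0[], Sigma0.inv[, ])   # prior on the mean coefficients
  Sigma.alpha.inv[1:3, 1:3] ~ dwish(R[, ], 6) # Wishart prior on the precision matrix
  tau.y ~ dgamma(0.001, 0.001)                # i.e., sigma_y^2 ~ Inv-Gamma(0.001, 0.001)
  sigma2.y <- 1 / tau.y
}
"
writeLines(model.string, "hlm.bug")
# data would supply y, x2, x3 (10 x 5 matrices) plus mu0, Sigma0.inv, and R; e.g.:
# fit <- R2WinBUGS::bugs(data, inits, c("beta", "sigma2.y"), "hlm.bug",
#                        n.chains = 3, n.iter = 25000, n.burnin = 20000)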
The parameters of primary interest in this analysis are the regression
coefficients, βi. Using WinBUGS, we checked convergence under every prior for several of the
datasets. We found that all of the parameters converged after a burn-in of 20,000 iterations for all of
the different priors. Some chains may have converged before 20,000 iterations, but we chose the largest
number as the burn-in since we knew it would work for all of the priors. Brooks-Gelman-Rubin (BGR)
diagnostics and autocorrelation plots were used to check convergence. Below is an example of the
convergence diagnostics for the prior from Table 4; all of the priors converged similarly.
[Figure: BGR plots for mu.beta[1], mu.beta[2], and mu.beta[3] (chains 1:3), iterations 20,000-24,000, for the results from Table 4.]

[Figure: Autocorrelation plots for mu.beta[1], mu.beta[2], and mu.beta[3] (chains 1:3), lags 0-40, for the results from Table 4.]
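
WinBUGS produces these diagnostics directly; for reference, equivalent checks can be run in R with the coda package, assuming the three chains have been exported as an mcmc.list (called chains below, an illustrative name):

library(coda)

# Brooks-Gelman-Rubin shrink factors; values near 1 indicate convergence
gelman.diag(chains)
gelman.plot(chains)    # shrink factor versus iteration, as in the BGR plots above

# within-chain autocorrelation by lag, as in the autocorrelation plots above
autocorr.plot(chains)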
Results
The following table contains the Frequentist results.
Table 1. Frequentist Regression Parameters and Estimate Bias Results

Parameter     Average Estimate   Average Bias   MSE          Interval Coverage   Average Interval Width
β0            49.81086           0.18914        14.55445     96%                 19.10194
β1            5.016578           -0.01651       0.01430536   100%                0.54701
β2            -25.7439           0.74386        3.7889       96%                 7.538995
σ²_between    90.23633           9.763672       —            —                   —
σ²_within     25.2558            -0.2558        —            —                   —
These results indicate that the Frequentist analysis does a fairly good job of
estimating the parameters. The results in Table 1 show that the average bias of both
regression slope parameters is close to 0, and the confidence intervals for all parameters have near-universal
coverage (96% for β0, 100% for β1, 96% for β2).
Table 2. Bayesian Regression Parameters and Estimate Bias with prior:

$(\beta_0, \beta_1, \beta_2)^\top \mid \boldsymbol{\mu}, \Sigma_0 \sim N_3(\boldsymbol{\mu},\ \Sigma_0^{-1})$, where $\boldsymbol{\mu} = (0, 0, 0)^\top$ and $\Sigma_0^{-1} = \operatorname{diag}(10^{-6}, 10^{-6}, 10^{-6})$
$\sigma_y^2 \sim \text{Inv-Gamma}(0.001, 0.001)$
$\Sigma_\alpha^{-1} \sim \text{Wishart}(R_{3\times 3},\ 6)$, where $R = \operatorname{diag}(0.01, 100, 100)$

Parameter     Average Estimate   Average Bias   MSE          Interval Coverage   Average Interval Width
β0            35.631667          14.368333      222.902892   32%                 25.83636
β1            5.677392           -0.67739       0.51135      100%                4.29651636
β2            -23.76716          -1.23293       17.2062      100%                20.44518
Table 3. Bayesian Regression Parameters and Estimate Bias with prior:

$(\beta_0, \beta_1, \beta_2)^\top \mid \boldsymbol{\mu}, \Sigma_0 \sim N_3(\boldsymbol{\mu},\ \Sigma_0^{-1})$, where $\boldsymbol{\mu} = (0, 0, 0)^\top$ and $\Sigma_0^{-1} = \operatorname{diag}(10^{-6}, 10^{-6}, 10^{-6})$
$\sigma_y^2 \sim \text{Inv-Gamma}(0.001, 0.001)$
$\Sigma_\alpha^{-1} \sim \text{Wishart}(R_{3\times 3},\ 6)$, where $R = \operatorname{diag}(100, 0.1, 0.1)$

Parameter     Average Estimate   Average Bias   MSE          Interval Coverage   Average Interval Width
β0            50.621129          -0.621129      30.8059225   100%                24.495
β1            4.991935           0.00807        0.07933      100%                1.15123
β2            -26.995            1.99532        13.9455      100%                14.260267
Table 4. Bayesian Regression Parameters and Estimate Bias with prior:

$(\beta_0, \beta_1, \beta_2)^\top \mid \boldsymbol{\mu}, \Sigma_0 \sim N_3(\boldsymbol{\mu},\ \Sigma_0^{-1})$, where $\boldsymbol{\mu} = (0, 0, 0)^\top$ and $\Sigma_0^{-1} = \operatorname{diag}(10^{-5}, 10^{-5}, 10^{-5})$
$\sigma_y^2 \sim \text{Inv-Gamma}(0.001, 0.001)$
$\Sigma_\alpha^{-1} \sim \text{Wishart}(R_{3\times 3},\ 6)$, where $R = \operatorname{diag}(100, 100, 100)$

Parameter     Average Estimate   Average Bias   MSE           Interval Coverage   Average Interval Width
β0            49.7877844         0.2122156      33.17405895   100%                27.18472
β1            5.01447792         -0.0144779     0.088343731   100%                4.01021084
β2            -25.0088236        0.008824       12.94222      100%                18.7669
Table 5. Bayesian Regression Parameters and Estimate Bias with prior:

$(\beta_0, \beta_1, \beta_2)^\top \mid \boldsymbol{\mu}, \Sigma_0 \sim N_3(\boldsymbol{\mu},\ \Sigma_0^{-1})$, where $\boldsymbol{\mu} = (0, 0, 0)^\top$ and $\Sigma_0^{-1} = \operatorname{diag}(10^{-5}, 10^{-5}, 10^{-5})$
$\sigma_y^2 \sim \text{Inv-Gamma}(0.001, 0.001)$
$\Sigma_\alpha^{-1} \sim \text{Wishart}(R_{3\times 3},\ 6)$, where $R = \operatorname{diag}(100, 0.01, 0.01)$

Parameter     Average Estimate   Average Bias   MSE           Interval Coverage   Average Interval Width
β0            49.895778          0.104222       33.08919386   100%                26.118114
β1            4.89836184         0.10163916     0.1013463     100%                1.2473464
β2            -16.4872           -8.51281       79.99366      44%                 19.1024
With the first two priors (Tables 2 and 3), the model does a good job of estimating β1 and
β2 (nearly 100% interval coverage and fairly small average bias). The third prior (Table 4) does
a good job of estimating all three parameters (100% interval coverage and small average
bias). With the last prior, however, the Bayesian method still estimates β1 well
(100% interval coverage) but does a very poor job of estimating β2 (44% interval coverage). Each
estimate of β2 under this prior is greater than the true β2, with an average absolute difference of
8.51, which indicates that this model is misspecified in some way.
The bias estimates of the Frequentist model were tested for a significant difference against
those from each Bayesian prior using two-sample t-tests. Variances were pooled for the comparison of the
Frequentist model with the last prior (Table 5). All other comparisons had significantly different variances
and thus were not pooled. The resulting p-values are summarized in Table 6.
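
For reference, each comparison amounts to a pair of calls along these lines in R, where freq.est and bayes.est are illustrative vectors (not in the original report) holding the 25 per-dataset estimates of a given coefficient under each method:

# F-test for equality of variances, used to decide whether to pool
var.test(freq.est, bayes.est)

# pooled-variance two-sample t-test (Frequentist vs. the last prior, Table 5)
t.test(freq.est, bayes.est, var.equal = TRUE)

# Welch (unpooled) t-test, used for all other comparisons
t.test(freq.est, bayes.est, var.equal = FALSE)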
Table 6: Two-sample t-test p-values testing for a significant difference

Comparison                            H0: β1,F = β1,B vs. H1: β1,F ≠ β1,B   H0: β2,F = β2,B vs. H1: β2,F ≠ β2,B
Frequentist vs. 1st prior (Table 2)   P < 0.0001                            P = 0.03283
Frequentist vs. 2nd prior (Table 3)   P = 0.69535                           P = 0.099683
Frequentist vs. 3rd prior (Table 4)   P = 0.97465                           P = 0.37671
Frequentist vs. 4th prior (Table 5)   P = 0.083689                          P < 0.001

*β1,F and β2,F denote the Frequentist estimates of β1 (the coefficient of x2) and β2 (the coefficient of x3); β1,B and β2,B denote the corresponding Bayesian estimates.
Discussion
These results indicate that there is a difference between the results obtained via
Frequentist and Bayesian methods when fitting hierarchical linear models. The Frequentist models
converged easily and had more accurate estimates of the parameters. We found that the
Frequentist method had significantly smaller bias for β1 than the first prior used in the Bayesian
analysis (Table 2). The Frequentist method also had significantly smaller bias for β2 than the
first prior (Table 2), as well as the last prior (Table 5). All other comparisons yielded no
significant difference between the two methods (Table 6). The Frequentist estimates were also
more precise, as the confidence intervals were narrower, on average, than the Bayesian credible
intervals. As the results show, the performance of the Bayesian methods varied greatly
depending on which priors were chosen for the variance components. In fact, when priors were
fit that closely resembled the structure actually used to generate the data (the last two
priors), the parameter estimates were worse than under the first prior, especially with respect to
estimating β2.
Although these are valid criticisms of the Bayesian analysis, the model that we fit in
SAS for the Frequentist estimates had an inherent advantage over the Bayesian models. The
Frequentist model allowed no group-to-group variation in the slopes; only group-specific
intercepts were allowed. With the Bayesian models, by contrast, we allowed the
slopes to vary between groups because, in practice, we would not know the exact relationship
between groups. We attempted to minimize this disadvantage by setting the prior on Σα⁻¹ so that
the slope components had small prior variances, so it is not surprising that the
Bayesian estimates suffered as a result. Another thing that may have hurt the Bayesian analysis
is that the prior means of the beta parameters were set to zero. These values were chosen
to simulate a realistic situation in which we have no prior information about
the slopes and intercepts of our model. Consequently, all of the estimates of the beta
coefficients were shrunk toward zero. For these reasons, we might have had more success
with a different model and prior.
Furthermore, if we were to compare Frequentist and Bayesian modeling of hierarchical
normal linear models again, it would be valuable to compare the estimated random error and
group-specific random error between the two methods. We were able to estimate these
quantities with the MIXED procedure in SAS, but when estimating them in WinBUGS we obtained
results that were too incongruous to report. This could indicate a problem in our model fit,
despite diagnostics that indicated otherwise. Most likely, because we allowed the group-specific
slopes to vary, their variances absorbed some of the variance that should have been attributed
to the intercept.
Lastly, the problems that we encountered in choosing an appropriate prior influenced our results
quite drastically. Knowing how greatly different priors can influence the resulting inference,
one must be careful when fitting Bayesian models. If, in practice, we had no idea of the true
values of the regression model and had simply fit the model using the last set of priors, we
would have obtained very biased results for β2. An important lesson, therefore, is not only to
assess convergence, but also to carefully determine an appropriate prior before fitting the
model, and to be mindful of how the choice of prior can drastically affect the results.
Conclusion
Both the Frequentist and Bayesian models estimated the values of the regression parameters
well in our simulation study. This is reinforced by how often the true parameters fell within the
95% intervals for both methods: apart from β0 under the first prior and β2 under the last prior,
all intervals had at least 95% coverage. Also, if a different Bayesian model had been fit, one that
allowed only the intercept to vary between groups, the Bayesian model would most likely have
performed better. However, the Frequentist methods were somewhat more accurate in certain
comparisons and simpler to carry out, so we would lean toward using them when fitting a
hierarchical linear model in the future.