Eliminating Aggregation Bias when Estimating Treatment

advertisement
Eliminating Aggregation Bias when Estimating Treatment
Effects on Combined Outcomes with Applications to Quality of
Life Assessment
Ian M. McCarthy∗
Emory University
Department of Economics†
November 2014
Abstract
Researchers are often interested in combined measures such as overall ratings, indices of physical
or mental health, or health-related quality-of-life (HRQoL) outcomes. Such measures are typically
composed of two or more underlying discrete variables. I show that estimating the effect of a
treatment on the combined measure is biased with non-random treatment selection. I provide a
solution to this problem by adopting an alternative estimator that first estimates treatment effects
on the underlying variables and then combines these effects into an overall effect on the combined
outcome of interest.
JEL Classification: C18, C21, I10
Keywords: program evaluation, treatment effects, quality of life, cost effectiveness, comparative effectiveness
Funding: This project was supported by grant number K99HS022431 from the Agency for Healthcare Research and Quality. The content is solely the responsibility of the author and does not necessarily represent
the official views of the Agency for Healthcare Research and Quality.
∗ I would like to thank Dann Millimet, Rusty Tchernis, John Mullahy, and Jon Skinner for comments on early drafts,
as well as the participants of the 2014 American Society of Health Economists Conference.
† Emory University, Rich Memorial Building, Room 306, Atlanta, GA 30322, Email: ian.mccarthy@Emory.edu
1
1
Introduction
In many areas of applied research, an average or some other weighting procedure is used to combine underlying outcomes into a single summary measure. For example, the County Health Rankings
from the University of Wisconsin Population Health Institute are based on a single summary measure
calculated as a weighted combination of five health-related outcome variables (Peppard et al., 2003,
2008; Courtemanche et al., 2013). The practice of combining several underlying outcome variables
into a single summary measure is particularly prevalent in the analysis of individual health outcomes
or health-related quality-of-life (HRQoL) data. For example, studies of physical limitations from the
Health and Retirement Study (HRS) and similar datasets such as the Medical Expenditure Panel
Survey often aggregate several individual discrete responses to generate some total number of limitations or some index of physical functioning, where the index or aggregated variable serves as the
primary outcome of interest (Loprest et al., 1995; Dor et al., 2006; Dave et al., 2008; Haas, 2008).
Researchers adopt a similar approach when analyzing common HRQoL assessments such as the EuroQol 5-dimension (EQ-5D) health outcome survey and the Short Form 6-dimension (SF-6D) health
outcome survey, where the empirical analysis is often based solely on the summary score derived from
aggregating the HRQoL profile into a single measure (Powell, 1984; Austin, 2002; Drummond et al.,
2005; Manca et al., 2005; Brazier & Ratcliffe, 2007; Basu & Manca, 2012).1
These and other aggregate index scores play an increasingly important role in the evaluation of
health care programs and policy, as demonstrated by numerous initiatives and legislative requirements
to collect HRQoL data both internationally and within the U.S. (Department of Health, 2008; Devlin
et al., 2010; Porter, 2010; Ahmed et al., 2012; PCORI, 2012; Selby et al., 2012; Landro, 2012). Examples
of cost and comparative effectiveness studies based on these types of measures abound both in the
health economics and health services research literature.2 And some widely-used datasets such as the
RAND HRS data automatically provide researchers with aggregated indices including a mobility and
activities of daily living index, formed based on the responses of several underlying binary outcomes.3
The implicit assumption is that estimating treatment effects on the aggregate outcome appropriately captures the combined treatment effects across the individual measures. However, the current
study illustrates that this is not necessarily the case, and when confronted with non-random treatment
assignment, an analysis based solely on index scores may yield biased treatment effect estimates. Importantly, this problem is separate from the common distributional issues encountered when analyzing
1 The EQ-5D is a five-dimensional HRQoL profile providing a score of 1 to 3 (or 1 to 5 in other versions of the
questionnaire) in each of five health domains. The SF-6D is a similar metric composed of six health domains (Brazier
et al., 2002; Brazier & Ratcliffe, 2007).
2 The development of cost and comparative effectiveness research in economics is reflected in Garber & Phelps (1997),
Brauer et al. (2006), and Chandra et al. (2011), among many others.
3 RAND HRS data available for download at rand.org/labor/aging/dataprod/hrs-data.html.
2
HRQoL data (Austin, 2002; Manca et al., 2005; Basu & Manca, 2012; Hernández Alava et al., 2012)
or from questions of which weights to adopt in the aggregation function. Rather, the problem arises
more fundamentally from the aggregation process itself.
I propose an alternative empirical approach that first estimates treatment effects on each underlying
outcome and then re-interprets these effects in terms of the overall summary score (McCarthy, 2014).
This two-stage estimator (2SE) therefore effectively reverses the order of the analysis, from aggregating
to a summary score and then estimating treatment effects, to estimating treatment effects and then
aggregating to the summary score. Through a series of Monte Carlo simulations, treatment effects
estimates based solely on the summary score are shown to be biased under non-random treatment
assignment, while the 2SE provides unbiased estimates of treatment effects. In the absence of treatment
selection (e.g., randomly assigned treatment), the 2SE is also shown to provide equivalent treatment
effects estimates to that of existing estimators used in the literature, with minor efficiency gains.
I then provide two applications of the 2SE to the estimation of average treatment effects (ATE)
with health outcomes data. The first application concerns the effect of retirement on physical health in
the U.S., and the second application concerns the effect of complex spine surgery on patient HRQoL.
The results reveal potentially large differences in the estimated ATE when relying solely on the index
versus the 2SE, with ATE estimates differing by as much as 100% in some cases.
The current paper contributes to a growing line of research on the differences in health outcomes
evaluation based on aggregated scores versus individual or joint evaluation of each health outcome
of interest (Mortimer & Segal, 2008; Devlin et al., 2010; Parkin et al., 2010; Gutacker et al., 2013).
However, the main concern in many of these studies is the choice of aggregation algorithm. Instead,
I focus on a more fundamental problem introduced by the aggregation process itself, regardless of
the weights adopted. The results show that the distinction between analyses based on the underlying
outcomes versus the combined outcome is not merely a normative issue surrounding the aggregation
technique (e.g., how and for whom to measure preferences across health domains); but rather, a reliance
solely on the combined outcome may yield biased treatment effects estimates when using observational
data. I discuss the broad empirical framework and 2SE in Section 2. Details of the Monte Carlo
exercise are presented in Section 3, with applications presented in Section 4. Section 5 concludes.
2
Methods
The primary goal of the current analysis is to estimate the average treatment effect (ATE) on some
combined outcome (or index score) when treatment assignment is non-random. Although several other
treatment effects estimates may ultimately be of interest, I consider the average treatment effect in
3
order to focus the analysis on the specific impact of selection on observables in estimating treatment
effects on combined outcomes. The empirical issues would naturally extend to other treatment effects
such as the average treatment effect on the treated (ATT) or the average treatment effect on the
untreated (ATU), with potentially larger bias for more recent estimators such as the person-specific
treatment effects developed in Basu (2013).
2.1
Two-stage Estimator (2SE)
Rather than rely on an aggregate outcome, I propose a two-stage estimator (2SE) which first estimates
a model of each individual outcome and then re-interprets the coefficients in terms of the effects on
the overall index. For example, a common outcome of interest in the HRS data is the mobility index
ranging from 0 to 5 based on the individual’s self-reported difficultly of walking one block, walking
several blocks, walking across a room, climbing one flight of stairs, and climbing several flights of stairs
Dave et al. (2008). The mobility index is therefore constructed from five individual, binary outcomes.
Denote a vector of D individual outcomes by y = (y1 , ..., yD ), and denote some aggregation function
of these outcomes by f (y). In practice, f (y) is often some form of weighted average, or in the case of
the mobility index discussed above, simply the sum of individual responses. The 2SE first estimates
separate models for each individual dichotomous outcome. I then form predicted probabilities, P̂id ,
for each person i and each outcome d. The overall ATE can then be estimated using the method
of recycled predictions, where I estimate predicted index scores under alternative covariate values in
order to estimate effects of interest (Oaxaca, 1973; Graubard & Korn, 1999; Basu & Rathouz, 2005;
Basu, 2005; Glick, 2007; Kleinman & Norton, 2009). Specifically, denoting treatment status by the
indicator, Ti , I estimate the ATE by:
AT E =
N
X
{f (ŷi |Ti = 1) − f (ŷi |Ti = 0)} ,
(1)
i=1
where f (ŷ|Ti = 1) and f (ŷ|Ti = 0) denote the aggregation function assigned to predicted values of
y1 , ..., yD with and without treatment, respectively.
Although equation 1 generally reflects the estimated ATE from the 2SE, the specific methods
underlying the 2SE will vary by application. In the current paper, I consider two empirical applications:
1) the effect of retirement on physical health; and 2) the effect of surgical treatment on health for
spine surgery patients. The second application is more in-line with a standard cost or comparative
effectiveness study, while the first application revisits a common health economics question in light of
the potential biases introduced through aggregation and highlighted in the Monte Carlo simulations
in Section 3. In the remainder of this section, I discuss the specific form of the 2SE for each of these
4
applications as well as the traditional empirical methods adopted for comparison purposes.
2.2
Retirement and Health
Measuring health based on the mobility index discussed previously, the aggregation function is simply
P5
the sum of the individual binary outcomes, f (y) =
d=1 yd . Denoting the index value by ỹ, and
assuming panel data consistent with the HRS data, a linear fixed or random effects model is a common
approach to estimating the effect of retirement on mobility (Mein et al., 2003; Van Solinge, 2007; Dave
et al., 2008). In this case, the regression equation is as follows:
ỹit = νi + xit β + εit ,
(2)
where νi denote individual-level random (fixed) effects, and for simplicity, εit is an idiosyncratic error
term with mean 0 and variance σ 2 .
In this setting, the 2SE estimates D = 5 separate regressions - one for each individual binary
outcome. Assuming the random effects, νi , follow a normal distribution with mean 0 and variance σν2 ,
the contribution to the likelihood from person i is:
Z
∞
Li =
−∞
2
2
e−νi /2σν
√
σν 2π
(T
i
Y
)
F (yidt , νi + xit β) dνi ,
t=1
where F (y, z) = 1/(1 + exp(−z)) if y = 1 and F (y, z) = 1/(1 + exp(z)) if y = 0. Maximum likelihood
estimation proceeds with adaptive Gauss-Hermite quadrature.
Although the HRS data are longitudinal by nature, the estimation of fixed effects models using
conditional maximum likelihood introduces practical issues that may cloud the comparison between the
2SE and the standard linear estimators. My analysis using the panel structure of the data is therefore
limited to random effects models. However, even in the case of random effects, the techniques required
for maximum likelihood estimation of the random effects logistic model rely on approximations that
may still call into question the comparability of the 2SE and linear models. I therefore consider an
additional pre-post analysis where I analyze physical health in period t as a function of physical health
in period t − 1, retirement status in period t, and other control variables. For this analysis, I estimate
separate cross-sectional models for each HRS wave, where the first stage of the 2SE estimates a logistic
regression model for each binary outcome, while the linear model is estimated using OLS.
5
2.3
Comparative Effectiveness
The second application I consider is the effect of surgical treatment on HRQoL for patients undergoing
spine surgery. This application relates directly to the growing cost and comparative effectiveness
literature. In this application, HRQoL is measured by the SF-6D, which is described in more detail in
Appendix A.
Details of the 2SE applied to the SF-6D are presented in McCarthy (2014). Generally, I estimate
separate ordered probit models for each HRQoL domain, and use the results of each model to form
predicted probabilities of responses. I denote the predicted probabilities by P̂ijd , for person i, response
j, and domain d. For example, in the physical functioning domain of the SF-6D, the first-stage results
PF
PF
PF
provide six predicted probabilities for each person, P̂i1
, P̂i2
, ..., P̂i6
. Continuing this process for
all domains and adopting the scoring algorithm in Table 1, the probability estimates from the ordered
probit estimation can then be converted to an SF-6D summary score as follows:4
PF
PF
PF
PF
PF
Ŝi = 1 − 0.035 × P̂i2
+ P̂i3
− 0.044 × P̂i4
− 0.056 × P̂i5
− 0.117 × P̂i6
RL
RL
RL
+ P̂i3
+ P̂i4
− 0.053 × P̂i2
(3)
SF
SF
SF
SF
− 0.057 × P̂i2
− 0.059 × P̂i3
− 0.072 × P̂i4
− 0.087 × P̂i5
P ain
P ain
P ain
P ain
P ain
− 0.042 × P̂i2
+ P̂i3
− 0.065 × P̂i4
− 0.102 × P̂i5
− 0.171 × P̂i6
MH
MH
MH
MH
+ P̂i3
− 0.042 × P̂i2
− 0.100 × P̂i4
− 0.118 × P̂i5
V
V
V
V
− 0.071 × P̂i2
+ P̂i3
+ P̂i4
− 0.092 × P̂i5
− 0.061 × P̂ (Most Severe) .
Equation 3 allows the researcher to use all of the available HRQoL information but still offers the
familiar interpretation of effects in terms of the composite score. Applied to HRQoL data, the 2SE
also avoids some of the statistical difficulties in the analysis of HRQoL data that are introduced by
the underlying scoring process (e.g., censoring at the boundaries of the summary score).
Existing methods for the analysis of HRQoL data include standard OLS, variations of the classic
Tobit model, censored least-absolute deviations, Beta MLE, and Beta QMLE (Powell, 1984; Austin,
2002; Basu & Manca, 2012). For comparison with the 2SE, I therefore consider OLS as well as the Beta
MLE and QMLE models proposed in Basu & Manca (2012), where I again estimate the ATE using
the method of recycled predictions. For the Beta MLE and QMLE models, note that the conditional
4 See
McCarthy (2014) for details regarding the estimated probability of a “most severe” outcome in the SF-6D.
6
mean function for estimating the ATE is (Basu & Manca, 2012):
exp xi β̂
.
µ̂i (yi |xi ) =
1 + exp xi β̂
3
(4)
Monte Carlo Simulations
3.1
General Case of Binary Individual Outcomes
I first highlight the problem with a simplified data generating process (DGP) in which a single index
is derived from a sum of five individual binary outcomes. Each individual outcome is generated from
an underlying latent continuous variable,
∗
yid,t=0
= αd + xi βd + εid,t=0 ,
(5)
where d denotes the domain or the individual outcome measure, i denotes a person, t = 0 denotes the
baseline or pre-treatment period, and εid is assumed to follow a normal distribution with µ = 0 and
∗
σ = 1, independent across d.5 The median of the empirical distribution functions of yd,t=0
, denoted
ȳd∗ , is taken as the threshold value for the observed binary outcome. For simplicity, x consists only of
a single, normally distributed covariate with µ = 0 and σ = 1.
Similarly, post-treatment (t = 1) latent outcomes are generated as follows:
∗
yid,t=1
= αd + xi βd + δd Ti + εid,t=1 ,
where δd denotes the treatment effect in domain d, Ti denotes treatment status, and the remaining
variables are similarly defined from equation (5). The observed outcome, yid,t=1 , is then calculated
based on the baseline threshold values, ȳd∗ , and the index score for baseline and post-treatment is
calculated as the sum of all 5 binary outcomes.
Within this structure, I simulate 50 different datasets with alternative degrees of selection (on
observables). First, I generate a random variable for each person from a uniform distribution with
support from 0 to 1, ri ∼ U [0, 1] for all i. Treatment status is then determined by
ρ Ti = 1 ri > Φ xi ∗
50
5 The estimation could be extended to allow for nonzero correlation across domains; however, such an approach would
only impact the efficiency of the estimated coefficients and would not impact the point estimates. Since the current
focus is on the bias introduced through the aggregation process itself, I simplify the DGP and assume the error terms
of the individual measures are independent.
7
for ρ = 0, ..., 50. Each value of ρ therefore represents a different DGP, with ρ = 0 representing the case
of no selection and ρ = 50 representing the highest extent of selection considered.
For each value of ρ, I simulate 100 datasets with 5,000 observations in each simulation. Following
common approaches in the applied literature, I estimate the ATE on the index scores using standard
OLS and compare this to the effect from the 2SE. Results from the simulations are summarized in
Figure 1. The top row of Figure 1 is based on a DGP with identical functional forms for each latent
outcome: αd = 0.5 and βd = 0.5 for all d, and δd = 1. The bottom row considers heterogeneous
treatment effects across domains, with δ = [1, 1.5, 2, .5, 0]0 and all other coefficients unchanged.
Figure 1 clearly illustrates the increasing bias introduced through the selection process. In both
DGPs, even though the common assumption of unconfoundedness holds, the influence of selection on
the individual outcomes is not appropriately accounted for when focusing solely on the index measure.
The 2SE, meanwhile, restores the unbiased estimation of the average treatment effect by estimating
the effects on each individual outcome and then converting into an effect on the index.
Intuitively, the bias when relying on the combined outcome derives from an inherent nonlinearity
that is not accounted for when relying solely on the combined outcome. An OLS model that is
linear in x is therefore misspecified. As a robustness check, I re-estimated the combined outcome
models with additional nonlinear terms in x, including x2 , x3 , and indicator variables for x based on
quartile. The results were largely unchanged from Figure 1. Therefore, although nonlinearities in x
may be the source of the bias, it is unclear how to sufficiently approximate this nonlinearity in a single
regression specification. The 2SE, meanwhile, avoids this problem altogether and explicitly estimates
the probabilities of the every individual outcome, each of which is nonlinear in x.
3.2
Specific Case of the SF-6D
I simulate SF-6D data beginning with a latent continuous variable for each HRQoL domain (d =
∗
1, ..., 6), denoted yid
, specified as a function of a 1 × 2 vector of covariates, xi , a constant, a treatment
indicator Ti , and a normally distributed error term, εid . I further denote by α the 6 × 1 vector of
intercept coefficients, by β the 6 × 2 vector of slope coefficients on x, and by δ the 6 × 1 vector of
treatment effect coefficients.
8
As a baseline DGP, I generate the 6 × 1 vector of latent HRQoL values, yi∗ , as follows:
yi∗ = α + βx0i + δTi + εi , where
ε ∼ N (06×1 , I6×6 ) ,
x1 , x2 ∼ N(0, 1),
α = 0.5 × I6×1 ,
β = I6×1 × [1.5, 1], and
δ = 1.5 × I6×1 .
I extend this baseline DGP by considering larger treatment effects, δ = 3 × I6×1 , and variable
treatment effects across domains, δ = [2, 1, 0.5, 2.5, 0, 1]0 .
I also consider the role of interaction
yi∗
= α + βx0i + δTi + γx0i × Ti + εi , where
terms in the estimated treatment effects by specifying
γ = [1.5, 1, 2.5, 0.5, 2, 0.5]0 - again for three different parameterizations of δ, δ = 1.5×I6×1 , δ = 3×I6×1 ,
and δ = [2, 1, 0.5, 2.5, 0, 1]0 . In all cases, I adopt two alternative possibilities for treatment status: 1)
random treatment assignment, Ti = 1(ri < 0.5) with ri ∼ U[0, 1] ∀i; and 2) selection on observed variables, Ti = 1 (ri < Φ (1.5 − 3x1i )). In total, this yields 12 simulated DGPs, 6 with random treatment
assignment and 6 with selection on observed variables.
Observed HRQoL values, yid for d ∈ (1, 2, 3, 4, 5, 6), are then generated based on the value of the
∗
latent value, yid
, relative to the Jd × 1 vector of threshold values in each domain:
γP F = [−2, −0.6, 0.5, 1.8, 3.2]0
γRL = [−1, 0.6, 2.2]0
γSF = [−1.6, 0, 1.2, 2.6]0
γP = [−1.8, −0.5, 0.5, 1.7, 3.1]0
γM H = [−1.6, −0.1, 1.3, 2.7]0
γV = [−1.7, −0.2, 1.2, 2.8]0 .
Threshold values were selected based on the respective quantile for each domain. For example, the
threshold values in the physical functioning domain are such that approximately 1/6th of the observations fall below -2 in that domain, 1/6th fall between -2 and -0.6, etc. The resulting distributions
of the summary scores are therefore well-behaved and relatively normally distributed between 0.3 and
1 (the minimum and maximum values based on the SF-6D scoring algorithm in Brazier & Ratcliffe
(2007), respectively).
9
For each DGP, I simulate 1,000 datasets consisting of N = 500 observations. As discussed in
Section 2, I estimate the ATE on the summary score with four alternative estimators: 1) 2SE; 2)
standard OLS; 3) the Beta MLE model proposed in Basu & Manca (2012); and 4) the Beta QMLE
also proposed in Basu & Manca (2012). The results are summarized in Table 2.
Table 2
The 2SE consistently provides unbiased estimates of the true ATE across a range of alternative
DGPs. By comparison, ATEs estimated with all other estimators are downward biased. The 2SE
also provides the lowest RMSE in all cases, although the differences in RMSE across estimators are
minimal and statistically insignificant.
4
Empirical Applications
4.1
Retirement and Physical Health
In my first application, I examine the effect of retirement on physical health using data from seven
longitudinal waves of the HRS. The HRS is a biannual survey conducted by the University of Michigan
beginning in 1992. I include all four HRS cohorts in my analysis; the original cohort consisting
of individuals born between 1931 and 1941, as well as additional cohorts consisting of individuals
born before 1924, between 1924 and 1930, and between 1942 to 1947. In order to focus the analysis
specifically on the problem of aggregation, I consider as the measure of physical health the mobility
index as described in Section 2. This index is constructed from the individual’s self-reported difficultly
of walking one block, walking several blocks, walking across a room, climbing one flight of stairs, and
climbing several flights of stairs. Since the underlying mobility measures are only available starting in
wave 2 (1994), my analysis is limited to 1994 through 2012.
In both the random effects and cross-sectional analysis, I consider mobility as a function of age,
race, gender, education, total household income, mother’s age and education, father’s age and education, whether the individual has any form of health insurance, and the individual’s retirement status.
Summary statistics are provided in Table 3, and I discuss the details of these variables and the dataset
construction in Appendix B.
Table 3
10
Results from the random effects model are summarized in Table 4. I focus on the bottom row of
Table 4 which presents the overall ATE for the 2SE and the linear random effects model. Note that a
positive effect implies a worsening of physical health (i.e., increase in mobility difficulties). The results
reveal large relative differences in the estimated ATE with the 2SE versus the linear model, with the
linear model estimating a 20% larger ATE relative to the 2SE.
Table 4
Results from the cross-sectional analysis by HRS wave are illustrated in Figure 2.6 The figure
presents the 90% confidence interval for the estimated ATE of retirement on the mobility index for
each wave, with results for the 2SE and OLS in the left and right panels, respectively. The magnitude
of the effects at each wave are less than those from the longitudinal analysis in Table 4 and the
absolute differences in ATEs between the 2SE and OLS are subsequently smaller; however, the relative
differences between the 2SE and OLS persist, with the estimated ATE from OLS exceeding that of
the 2SE by as much as 37% in 2006.
Figure 2
As has been noted in the literature, the direction of the relationship between retirement and health
is not clear ex ante. In particular, it may be that individuals of lesser health are more likely to retire,
as documented by Dwyer & Mitchell (1999), McGarry (2004), Jones et al. (2010), and others. To
address this source of endogeneity, I follow Dave et al. (2008) and limit the retired sample only to
those who had no reported health problems prior to retirement. The results from the random effects
model are summarized in Table 5, while the results from the cross-sectional analysis by HRS wave are
illustrated in Figure 3.
Table 5 and Figure 3
The results from the longitudinal analysis are of smaller magnitude than the full-sample results,
while larger relative differences between OLS and 2SE. The cross-sectional analysis, however, reveals
6 The pre-post analysis considers each HRS wave separately, and for each HRS wave, I estimate separate logit regression
models for each mobility measure as well as an overall linear model for the mobility index. The full results of these
regressions are omitted for brevity, and I focus instead on the estimated effect of full retirement on mobility. At each
HRS wave, the sample is limited to those ages 50 to 75 who are newly retired or otherwise not retired and in the work
force.
11
particularly large differences between the 2SE and OLS. Specifically, 2SE finds a relatively large improvement in mobility following retirement among individuals who were healthy prior to retirement
(up to a 26% improvement in mobility relative to the overall sample average of 0.75), while estimated
ATEs using OLS are consistently lower in magnitude, of opposite signs in 1996 and 2010, and even fall
outside of the 2SE confidence bands in 2002 and 2004.
4.2
HRQoL Data following Spine Surgery
As an additional application, I examine the effect of surgical intervention on HRQoL for adult spinal
deformity (ASD) patients. The data for this application were collected from a multi-center, prospective
database maintained by the International Spine Study Group (ISSG). One of the largest datasets of
its kind, the data consist of 362 consecutively enrolled adult scoliosis and spinal deformity patients at
a participating ISSG member (i.e., hospital or surgery center), with institutional review board (IRB)
approval obtained at all centers.
For purposes of this application, I include as covariates the patient’s age, gender, body mass index
(BMI), SF-6D scores at the time of the first physician visit, and a dummy variable for whether the
patient underwent surgical treatment. The outcome of interest is the patient’s HRQoL after one year.
Summary statistics are provided in Table 6.
Table 6
Coefficient estimates and ATEs are provided in Table 7. As with the prior analysis of retirement
and health, the treatment effect for each individual outcome measure is translated to an ATE on the
index score using the method of recycled predictions, and these ATE estimates are presented in the
bottom of Table 7. I also compared the model fit across all estimators as measured by the RMSE.
Table 7
The estimated effects in this application are relatively small, with an estimated ATE of 0.033
based on OLS compared to the effect of 0.029 using the 2SE. Although these differences are not
statistically significant, the relative difference is potentially meaningful, with the estimated ATEs from
OLS and Beta QMLE nearly 14% larger than the effects estimated from the 2SE. The magnitude of
these differences is perhaps more meaningful when put into context of a cost effectiveness study. For
example, if the incremental cost of surgery averages $100,000, a difference of 0.004 in the estimated
12
effect of surgery on QALYs equates to a difference of nearly $30,000 per QALY in terms of cost
effectiveness over a 20 year period (with QALYs discounted at 3% per year and all costs incurred at
period 0).
5
Discussion
This study considered the potential bias introduced when estimating treatment effects using traditional
regression methods based solely on combined outcomes, and illustrated this bias through a series of
Monte Carlo simulations with a variety of alternative DGPs and treatment assignment mechanisms. In
the presence of selection on observed variables, the source of the bias is not simply a matter of replacing
linear methods some other nonlinear model. Instead, the bias derives from a fundamental difference
between the impact of selection on the underlying domains versus the impact of selection on the overall
aggregated score. The results therefore indicate that an analysis based solely on summary scores
cannot appropriately control for the impact of selection, even under the standard unconfoundedness
assumptions.
The bias demonstrated in this paper would be compounded when studying health outcomes over
an extended period. For example, in the cost effectiveness literature, researchers typically calculate
the HRQoL index score and add up the individual scores over time (Drummond et al., 2005; Brazier
& Ratcliffe, 2007; Gray et al., 2011). To the extent treatment effects persist over time, the magnitude
of the bias under these conventional methods would only increase.
The Monte Carlo results illustrated the improved performance available via an alternative two-stage
estimator that first estimates the treatment effect on each underlying outcome and then re-interprets
these effects in terms of the index score based on predicted values from the first-stage regressions. The
2SE was shown to restore the unbiased estimation of treatment effects while maintaining the parsimony
of the summary score interpretation. In doing so, the proposed methodology can improve the economic
evaluation of health care programs when randomized controlled trials are not available. As funding
for clinical and health economics studies becomes more competitive, researchers (particularly in health
economics) are increasingly dependent on secondary data analysis. The proposed methodology is
therefore an important step in ensuring that economic analysis and inference performed outside of
randomized trials remain accurate and informative, while still allowing for the familiar interpretation
of results in terms of the composite score.
13
References
Ahmed, Sara, Berzon, Richard A, Revicki, Dennis A, Lenderking, William R, Moinpour, Carol M,
Basch, Ethan, Reeve, Bryce B, Wu, Albert W, et al. 2012. The use of patient-reported outcomes
(PRO) within comparative effectiveness research: implications for clinical practice and health care
policy. Medical Care, 50(12), 1060–1070.
Austin, P.C. 2002. A comparison of methods for analyzing health-related quality-of-life measures.
Value in Health, 5(4), 329–337.
Basu, A., & Manca, A. 2012. Regression Estimators for Generic Health-Related Quality of Life and
Quality-Adjusted Life Years. Medical Decision Making, 32(1), 56–69.
Basu, Anirban. 2005. Extended generalized linear models: simultaneous estimation of flexible link and
variance functions. Stata Journal, 5(4), 501–516.
Basu, Anirban. 2013. Estimating Person-centered Treatment (PeT) Effects using Instrumental Variables: An Application to Evaluating Prostate Cancer Treatments. Journal of Applied Econometrics.
Basu, Anirban, & Rathouz, Paul J. 2005. Estimating marginal and incremental effects on health
outcomes using flexible link and variance function models. Biostatistics, 6(1), 93–109.
Brauer, Carmen A., Rosen, Allison B., Greenberg, Dan, & Neumann, Peter J. 2006. Trends in the
Measurement of Health Utilities in Published Cost-Utility Analyses. Value in Health, 9(4), 213 –
218.
Brazier, J., & Ratcliffe, J. 2007. Measuring and valuing health benefits for economic evaluation. Oxford
University Press, USA.
Brazier, J., Roberts, J., & Deverill, M. 2002. The estimation of a preference-based measure of health
from the SF-36. Journal of health economics, 21(2), 271–292.
Chandra, A., Jena, A.B., & Skinner, J.S. 2011. The Pragmatist’s Guide to Comparative Effectiveness
Research. The Journal of Economic Perspectives, 25(2), 27–46.
Courtemanche, Charles, Soneji, Samir, & Tchernis, Rusty. 2013. Modeling Area-Level Health Rankings.
Tech. rept. National Bureau of Economic Research.
Dave, Dhaval, Rashad, Inas, & Spasojevic, Jasmina. 2008. The Effects of Retirement on Physical and
Mental Health Outcomes. Southern Economic Journal, 497–523.
14
Department of Health. 2008. Guidance on the Routine Collection of Patient Reported Outcome Measures (PROMs).
Devlin, N.J., Parkin, D., & Browne, J. 2010. Patient-reported outcome measures in the NHS: new
methods for analysing and reporting EQ-5D data. Health economics, 19(8), 886–905.
Dor, Avi, Sudano, Joseph, & Baker, David W. 2006. The effect of private insurance on the health of
older, working age adults: evidence from the Health and Retirement Study. Health services research,
41(3p1), 759–787.
Drummond, M.F., Sculpher, M.J., & Torrance, G.W. 2005. Methods for the economic evaluation of
health care programmes. Oxford University Press, USA.
Dwyer, Debra Sabatini, & Mitchell, Olivia S. 1999. Health problems as determinants of retirement:
Are self-rated measures endogenous? Journal of health economics, 18(2), 173–193.
Garber, A.M., & Phelps, C.E. 1997. Economic foundations of cost-effectiveness analysis. Journal of
Health Economics, 16(1), 1–31.
Glick, H. 2007. Economic evaluation in clinical trials. Oxford University Press, USA.
Graubard, Barry I, & Korn, Edward L. 1999. Predictive margins with survey data. Biometrics, 55(2),
652–659.
Gray, A.M., Clarke, P.M., Wolstenholme, J., & Wordsworth, S. 2011. Applied Methods of Costeffectiveness Analysis in Healthcare. Oxford Univ Pr.
Gutacker, Nils, Bojke, Chris, Daidone, Silvio, Devlin, Nancy, & Street, Andrew. 2013. Hospital
Variation in Patient-Reported Outcomes at the Level of EQ-5D Dimensions: Evidence from England.
Medical Decision Making.
Haas, Steven. 2008. Trajectories of functional health: the long armof childhood health and socioeconomic factors. Social Science & Medicine, 66(4), 849–861.
Hernández Alava, Mónica, Wailoo, Allan J, & Ara, Roberta. 2012. Tails from the peak district:
adjusted limited dependent variable mixture models of EQ-5D questionnaire health state utility
values. Value in Health, 15(3), 550–561.
Jones, Andrew M, Rice, Nigel, & Roberts, Jennifer. 2010. Sick of work or too sick to work? Evidence
on self-reported health shocks and early retirement from the BHPS. Economic Modelling, 27(4),
866–880.
15
Kleinman, Lawrence C, & Norton, Edward C. 2009. What’s the risk? A simple approach for estimating
adjusted risk measures from nonlinear models including logistic regression. Health services research,
44(1), 288–302.
Landro, L. 2012. The Simple Idea That Is Transforming Health Care. The Wall Street Journal.
Loprest, Pamela, Rupp, Kalman, & Sandell, Steven H. 1995. Gender, disabilities, and employment in
the health and retirement study. Journal of Human Resources, S293–S318.
Manca, A., Hawkins, N., & Sculpher, M.J. 2005.
Estimating mean QALYs in trial-based cost-
effectiveness analysis: the importance of controlling for baseline utility. Health economics, 14(5),
487–496.
McCarthy, I. 2014. Putting the Patient in Patient Reported Outcomes: A Robust Methodology for
Health Outcomes Assessment. Health Economics, 10.1002/hec.3113.
McGarry, Kathleen. 2004. Health and Retirement Do Changes in Health Affect Retirement Expectations? Journal of Human Resources, 39(3), 624–648.
Mein, Gill, Martikainen, Pekka, Hemingway, Harry, Stansfeld, Stephen, & Marmot, Michael. 2003. Is
retirement good or bad for mental and physical health functioning? Whitehall II longitudinal study
of civil servants. Journal of Epidemiology and Community Health, 57(1), 46–49.
Mortimer, D., & Segal, L. 2008. Comparing the incomparable? A systematic review of competing
techniques for converting descriptive measures of health status into QALY-weights. Medical decision
making, 28(1), 66.
Oaxaca, Ronald. 1973. Male-female wage differentials in urban labor markets. International economic
review, 14(3), 693–709.
Parkin, D., Rice, N., & Devlin, N. 2010. Statistical analysis of EQ-5D profiles: does the use of value
sets bias inference? Medical Decision Making, 30(5), 556–565.
PCORI. 2012. Draft National Priorities for Research and Research Agenda: version 1.
Peppard, P., Kindig, D., Riemer, A., Dranger, E., & Remington, P. 2003. Wisconsin County Health
Rankings. Tech. rept. Wisconsin Public Health and Health Policy Institute.
Peppard, Paul E, Kindig, David A, Dranger, Elizabeth, Jovaag, Amanda, & Remington, Patrick L.
2008. Ranking community health status to stimulate discussion of local public health issues: the
Wisconsin County Health Rankings. American Journal of Public Health, 98(2), 209–212.
16
Porter, Michael E. 2010. What Is Value in Health Care? New England Journal of Medicine, 363(26),
2477–2481. PMID: 21142528.
Powell, J.L. 1984. Least absolute deviations estimation for the censored regression model. Journal of
Econometrics, 25(3), 303–325.
Selby, J.V., Beal, A.C., & Frank, L. 2012. The Patient-Centered Outcomes Research Institute (PCORI)
national priorities for research and initial research agenda. JAMA: The Journal of the American
Medical Association, 307(15), 1583–1584.
Van Solinge, Hanna. 2007. Health Change in Retirement A Longitudinal Study among Older Workers
in the Netherlands. Research on Aging, 29(3), 225–256.
17
A
Appendix A: Description of the SF-6D
The SF-6D is a six-dimensional health profile derived from a subset of responses from the SF-36 or
SF-12 (Brazier et al., 2002; Brazier & Ratcliffe, 2007). The six dimensions of health classified by
the SF-6D are: 1) physical functioning; 2) role limitations; 3) social functioning; 4) pain; 5) mental
health; and 6) vitality. Each domain is characterized numerically with a range of integers, where a 1
indicates the best value in each domain. The worst value in each domain varies, with values up to 6
in the physical functioning and pain domains, values up to 5 in the social functioning, mental health,
and vitality domains, and values up to 4 in the role limitations domain. The patient’s full SF-6D
profile is therefore characterized by a series of six integers, with the best health state represented by
{1, 1, 1, 1, 1, 1} and the worst health state represented by {6, 4, 5, 6, 5, 5}.
Taking all possible combinations of responses, the SF-6D defines 18,000 unique health states. Each
health state can then be converted into a single index score using available scoring algorithms that essentially assign weights to each domain and interactions between domains. Following the algorithm in
Brazier & Ratcliffe (2007), the resulting SF-6D index score ranges from 0.30 to 1.0, with 0.30 representing the poorest health state, {6, 4, 5, 6, 5, 5}, and 1 representing the best health state, {1, 1, 1, 1, 1, 1}.7
The scoring algorithm from Brazier & Ratcliffe (2007) is reproduced in Table 1.
Table 1
7 With one HRQoL assessment at one-year follow-up and no discounting, the patient’s SF-6D index score is equivalent to the patient’s QALY over the follow-up period. QALYs and summary scores are therefore sometimes treated
synonymously in the empirical literature (e.g., Basu & Manca (2012), Gutacker et al. (2013), and others).
18
B
Appendix B: Construction of HRS Dataset
The RAND HRS data from years 1994 through 2012 include 207,816 total observations and 37,319
individual respondents. Following Dave et al. (2008), I exclude any observations in which the respondent is below 50 or older than 75 years of age, reducing the total dataset to 151,856 observations
and 31,809 respondents. I also exclude observations in which the mobility index or any underlying
mobility measure is missing. This leaves 126,620 total observations and 29,317 respondents. Finally, I
exclude all observations for the respondent if (at any time in the data) the individual is not in the labor
force other than being fully retired, resulting in the final dataset of 55,105 observations and 14,999
individuals. In my analysis of the subset of healthy individuals prior to retirement, I further exclude
individuals with “poor” self-reported health status prior to retirement or with any documented health
problem (i.e., diabetes, heart disease, stroke, high blood pressure, arthritis, cancer, lung disease, or
mental health problems).
For the cross-sectional analysis, I include observations in which individuals are partly retired, and
the regressions include a dummy variable capturing whether this person is partly retired during the
relevant HRS wave. These observations were excluded in the longitudinal analysis for two reasons: 1)
more direct comparison with the existing literature; and 2) inherent difficulty interpreting the effect
of full retirement when the individual reported being partly retired in the prior interview.
19
C
Tables and Figures
Table 1: Scoring Algorithm for SF-6Da
Starting value = 1.0 (perfect health)
Physical Functioning (PF)
PF=2 or PF=3
-0.035
PF=4
-0.044
PF=5
-0.056
PF=6
-0.117
Role Limitations (RL)
RL=2 or RL=3 or RL=4 -0.053
Social Functioning (SF)
SF=2
-0.057
SF=3
-0.059
SF=4
-0.072
SF=5
-0.087
Pain (P)
P=2 or P=3
-0.042
P=4
-0.065
P=5
-0.102
P=6
-0.171
Mental Health (MH)
MH=2 or MH=3
-0.042
MH=4
-0.100
MH=5
-0.118
Vitality (V)
V=2 or V=3 or V=4
-0.071
V=5
-0.092
Combination of Domains
“Most Severe”
-0.061
a Algorithm
based on Brazier & Ratcliffe (2007). “Most Severe” denotes any one of the following responses: a level
of 4 or more in the physical functioning, social functioning, mental health, or vitality domains; a level of 3 or more in
the role limitation domain; or a level of 5 or more in the pain domain.
20
-.1
-.05
Deviation from True Effect
-.05
0
.05
Deviation from True Effect
0
.05
.1
.1
.15
Figure 1: ATE Estimates with Binary Outcomesa
0
10
20
30
Degree of Selection
40
50
0
10
OLS
20
30
Degree of Selection
40
50
40
50
2SE
Deviation from True Effect
0
-.05
-.05
Deviation from True Effect
0
.05
.05
.1
αd = 0.5, βd = 0.5, δd = 1 ∀d
0
10
20
30
Degree of Selection
40
50
0
10
20
30
Degree of Selection
OLS
2SE
αd = 0.5, βd = .5, ∀d, and δ = [1, 1.5, 2, .5, 0]
a Solid
lines denote the average deviation from the true value, with dotted lines reflecting the 95% confidence bands.
21
Table 2: ATE Estimates with HRQoL Dataa
Model
Random Treatment Assignment
Treatment Effect
St. Dev.
RMSEb
DGP 1, δ = 1.5 × I6×1
True Effect
0.142
0.005
2SE
0.143
0.006
0.054
OLS
0.143
0.007
0.066
Beta MLE
0.169
0.012
0.082
Beta QMLE
0.143
0.007
0.067
DGP 2, δ = 3 × I6×1
True Effect
0.264
0.007
2SE
0.264
0.007
0.046
OLS
0.265
0.008
0.077
Beta MLE
0.296
0.010
0.075
Beta QMLE
0.264
0.008
0.061
DGP 3, δ = [2, 1, 0.5, 2.5, 0, 1]0
True Effect
0.104
0.004
2SE
0.104
0.005
0.055
OLS
0.104
0.006
0.063
Beta MLE
0.117
0.012
0.087
Beta QMLE
0.104
0.006
0.070
DGP 4, interaction terms with δ = 1.5 × I6×1
True Effect
0.122
0.006
2SE
0.122
0.007
0.048
OLS
0.122
0.008
0.084
Beta MLE
0.133
0.011
0.096
Beta QMLE
0.122
0.008
0.074
DGP 5, interaction terms with δ = 3 × I6×1
True Effect
0.220
0.007
2SE
0.220
0.007
0.043
OLS
0.220
0.008
0.096
Beta MLE
0.231
0.011
0.081
Beta QMLE
0.220
0.008
0.068
DGP 6, interaction terms with δ = [2, 1, 0.5, 2.5, 0, 1]0
True Effect
0.102
0.005
2SE
0.102
0.006
0.047
OLS
0.102
0.007
0.078
Beta MLE
0.114
0.012
0.109
Beta QMLE
0.102
0.008
0.081
a Results
b RMSE:
Selection on Observed Variables
Treatment Effect
St. Dev.
RMSE
0.142
0.143
0.151
0.174
0.146
0.005
0.007
0.010
0.021
0.011
0.054
0.068
0.080
0.066
0.264
0.263
0.284
0.378
0.320
0.007
0.009
0.010
0.018
0.013
0.046
0.091
0.067
0.056
0.104
0.104
0.088
0.083
0.079
0.004
0.007
0.009
0.023
0.011
0.055
0.064
0.083
0.070
0.122
0.122
0.137
0.234
0.165
0.006
0.010
0.014
0.023
0.015
0.048
0.094
0.085
0.073
0.220
0.220
0.266
0.332
0.272
0.007
0.010
0.014
0.022
0.015
0.043
0.132
0.080
0.065
0.102
0.102
0.098
0.210
0.137
0.005
0.009
0.013
0.024
0.015
0.047
0.079
0.090
0.081
based on 1,000 bootstrap iterations for N = 500 observations in each DGP.
Root mean squared error
22
Table 3: Summary Statistics for HRS Dataa
Standard
Deviation
Variable
Mean
Individual-level Variables
Female
Hispanic
White
Native Born
Educationb
Mother’s Education
Father’s Education
Fully Retiredc
People
0.476
0.106
0.736
0.880
12.780
9.862
9.615
0.538
14,999
0.499
0.308
0.441
0.325
3.062
3.687
4.012
0.499
Panel-level Variables
Age
Mother’s Age
Father’s Age
Married
Health Insurance
Household Income ($1,000s)
Mobility Index
Difficulty Walking one block
Difficulty Walking several blocks
Difficulty Walking across room
Difficulty Climbing one flight of stairs
Difficulty Climbing several flights of stairs
Fully Retired
Observations
61.961
75.614
71.715
0.699
0.926
74.238
0.752
0.079
0.188
0.031
0.106
0.348
0.430
55,105
6.977
13.570
13.949
0.459
0.262
302.210
1.238
0.270
0.391
0.173
0.308
0.476
0.495
a To
avoid differences in weights across panels, the statistics presented are the unweighted sample means and
standard deviations. Sample is limited to individuals ages 50 to 75 who are either actively working the labor force or
fully retired. Individuals who are otherwise not in the labor force (e.g., partial retirement, disability, or unemployed)
at any HRS wave are excluded.
b All education variables are measured in years.
c Reflects whether the individual was fully retired at any HRS wave
23
Table 4: Retirement and Health, Random Effects Modelsa
Dependent Variable
Retired
Female
Hispanic
White
Education
Native Born
Mother’s Education
Father’s Education
Married
Health Insurance
Household Income (log)
Age
Mother’s Age
Father’s Age
Constant
ATE on Mobility Indexb
1.92***
(0.10)
0.25***
(0.10)
-0.36*
(0.21)
-0.45***
(0.12)
-0.16***
(0.02)
0.89***
(0.20)
-0.00
(0.02)
-0.03*
(0.02)
-0.22**
(0.09)
0.27*
(0.15)
-0.28***
(0.04)
0.03***
(0.01)
-0.01***
(0.00)
-0.01**
(0.00)
-2.79***
(0.69)
Individual Mobility Measures
1.51***
2.23***
1.52***
(0.07)
(0.14)
(0.08)
0.48***
0.18
0.66***
(0.08)
(0.12)
(0.08)
-0.20
-0.27
0.13
(0.17)
(0.26)
(0.17)
-0.31*** -0.38***
-0.25**
(0.10)
(0.15)
(0.10)
-0.19*** -0.14***
-0.16***
(0.02)
(0.02)
(0.02)
1.06***
0.57**
0.71***
(0.16)
(0.25)
(0.16)
0.00
-0.04
0.00
(0.02)
(0.02)
(0.02)
-0.03**
0.01
-0.04***
(0.01)
(0.02)
(0.01)
-0.10
-0.50***
-0.28***
(0.07)
(0.12)
(0.08)
0.33***
0.11
0.01
(0.11)
(0.19)
(0.11)
-0.32*** -0.34***
-0.30***
(0.03)
(0.05)
(0.03)
0.05***
-0.02*
0.02***
(0.01)
(0.01)
(0.01)
-0.01***
-0.01**
-0.01***
(0.00)
(0.00)
(0.00)
-0.01**
-0.01
-0.01**
(0.00)
(0.00)
(0.00)
-1.52***
-0.64
-0.67
(0.55)
(0.85)
(0.56)
0.267***
(0.009)
a Each
0.82***
(0.06)
1.24***
(0.07)
0.00
(0.14)
-0.24***
(0.09)
-0.16***
(0.01)
0.93***
(0.13)
-0.01
(0.01)
-0.06***
(0.01)
-0.04
(0.06)
0.18**
(0.08)
-0.19***
(0.03)
0.05***
(0.00)
-0.01***
(0.00)
-0.01***
(0.00)
-0.66
(0.46)
Mobility
Index
0.32***
(0.02)
0.22***
(0.02)
-0.04
(0.04)
-0.12***
(0.03)
-0.05***
(0.00)
0.25***
(0.03)
-0.00
(0.00)
-0.01***
(0.00)
-0.06***
(0.02)
0.07***
(0.02)
-0.06***
(0.01)
0.01***
(0.00)
-0.00***
(0.00)
-0.00***
(0.00)
1.20***
(0.14)
0.321***
(0.018)
individual mobility measure regression equation is estimated using a random effects logit model. The overall
mobility index is estimated using a linear random effects model. The sample is limited to individuals between 50 and
75 years of age who are either in the labor force or fully retired. Robust standard errors are reported in parentheses.
All models include fixed effects for census region and HRS wave. * p<0.1. ** p<0.05. *** p<0.01.
b ATE: Average Treatment Effect. Bootstrap standard errors calculated for the 2SE based on 500 replications.
24
.2
0.148
.15
.15
.2
Figure 2: Cross-sectional Analysis of Retirement and Health
0.137
0.126
0.122
0.118
.1
.1
0.116
0.085
0.070
0.065
0.043
0.059
0.034
0
0.030
-.05
-.05
0
0.062
0.051
.05
.05
0.050
1996 1998 2000 2002 2004 2006 2008 2010
Two-stage Estimator
1996 1998 2000 2002 2004 2006 2008 2010
OLS
25
Table 5: Retirement and Health among Healthy Retirees, Random Effects Modelsa
Dependent Variable
Retired
Female
Hispanic
White
Education
Native Born
Mother’s Education
Father’s Education
Married
Health Insurance
Household Income (log)
Age
Mother’s Age
Father’s Age
Constant
ATE on Mobility Indexb
0.06
(0.36)
0.74***
(0.21)
-0.01
(0.38)
-0.05
(0.24)
-0.09**
(0.04)
0.64*
(0.37)
-0.00
(0.04)
-0.02
(0.03)
0.13
(0.21)
0.07
(0.25)
-0.27***
(0.09)
0.09***
(0.02)
-0.02**
(0.01)
-0.01
(0.01)
-9.07***
(1.67)
Individual Mobility Measures
0.45*
0.99**
0.26
(0.23)
(0.50)
(0.29)
0.73***
0.57**
0.93***
(0.14)
(0.29)
(0.17)
-0.00
-0.52
-0.02
(0.26)
(0.58)
(0.30)
0.03
0.08
-0.14
(0.17)
(0.35)
(0.19)
-0.13***
-0.17***
-0.10***
(0.03)
(0.06)
(0.03)
0.66***
0.14
0.48*
(0.25)
(0.51)
(0.28)
0.02
-0.03
-0.02
(0.03)
(0.06)
(0.03)
-0.05**
0.01
-0.07***
(0.02)
(0.05)
(0.03)
0.25*
0.01
0.06
(0.14)
(0.31)
(0.16)
0.18
0.18
-0.23
(0.16)
(0.40)
(0.18)
-0.28***
-0.34**
-0.27***
(0.06)
(0.14)
(0.07)
0.08***
0.03
0.07***
(0.01)
(0.03)
(0.01)
-0.02***
0.01
-0.02***
(0.01)
(0.01)
(0.01)
-0.00
-0.00
-0.01
(0.01)
(0.01)
(0.01)
-5.29*** -6.97***
-4.10***
(1.02)
(2.42)
(1.25)
0.023*
(0.012)
0.16
(0.18)
1.52***
(0.11)
0.02
(0.20)
-0.10
(0.13)
-0.11***
(0.02)
0.94***
(0.19)
-0.02
(0.02)
-0.09***
(0.02)
0.11
(0.10)
0.13
(0.12)
-0.22***
(0.05)
0.07***
(0.01)
-0.01***
(0.00)
-0.01**
(0.00)
-2.68***
(0.78)
Mobility
Index
0.06
(0.04)
0.21***
(0.02)
-0.01
(0.03)
-0.02
(0.02)
-0.02***
(0.00)
0.12***
(0.03)
-0.00
(0.00)
-0.01***
(0.00)
0.02
(0.02)
0.02
(0.02)
-0.04***
(0.01)
0.01***
(0.00)
-0.00***
(0.00)
-0.00*
(0.00)
0.33**
(0.15)
0.059
(0.184)
a Each individual mobility measure regression equation is estimated using a random effects logit model. The overall
mobility index is estimated using a linear random effects model. The sample is limited to individuals between 50 and
75 years of age who are either in the labor force or fully retired and who had no health problems in the waves prior
to retirement. Robust standard errors are reported in parentheses. All models include fixed effects for census region
and HRS wave. * p<0.1. ** p<0.05. *** p<0.01.
b ATE: Average Treatment Effect. Bootstrap standard errors calculated for the 2SE based on 500 replications.
26
.2
.2
.1
.1
Figure 3: Cross-sectional Analysis of Retirement and Health among Healthy Retirees
0.086
-0.016
-0.016
0
0
0.042
-0.016
-0.008
-0.048
-0.199 -0.193
-.1
-0.088
-0.120
-.3
-0.204
-0.099
-.2
-0.124
-.3
-.2
-.1
-0.101
-0.115
1996 1998 2000 2002 2004 2006 2008 2010
Two-stage Estimator
1996 1998 2000 2002 2004 2006 2008 2010
OLS
27
Table 6: Summary Statistics for ISSG Data (N=362)
Variable
Mean
Standard
Deviation
Age
BMI
Baseline SF-6D
Follow-up SF-6D
56.76
26.59
0.61
0.66
Count
14.51
5.84
0.12
0.12
Percent
Operative
Female
193
53%
309
85%
Baseline
Count
Percent
Physical Functioning Domain
PF=1
0
PF=2
35
PF=3
117
PF=4
96
PF=5
100
PF=6
14
Role Limitations Domain
RL=1
41
RL=2
115
RL=3
10
RL=4
196
Social Functioning Domain
SF=1
110
SF=2
72
SF=3
99
SF=4
56
SF=5
25
Pain Domain
P=1
5
P=2
34
P=3
79
P=4
85
P=5
109
P=6
50
Mental Health Domain
MH=1
76
MH=2
127
MH=3
89
MH=4
53
MH=5
17
Vitality Domain
V=1
13
V=2
73
V=3
107
V=4
94
V=5
75
28
Follow-up
Count
Percent
0%
10%
32%
27%
28%
4%
0
54
121
83
95
9
0%
15%
33%
23%
26%
2%
11%
32%
3%
54%
53
144
11
154
15%
40%
3%
42%
30%
20%
27%
15%
7%
156
77
86
30
13
43%
21%
24%
8%
4%
1%
9%
22%
23%
30%
14%
19
47
123
88
66
19
5%
13%
34%
24%
18%
5%
21%
35%
25%
15%
5%
130
132
61
32
7
36%
36%
17%
9%
2%
4%
20%
30%
26%
21%
15
123
108
74
42
4%
34%
30%
20%
12%
Table 7: Regression Resultsa
OLS
Outcome:
Surgery
Age
Female
BMI
Baseline HRQoL
SF-6D Index
QALY
Beta
MLE
QALY
Beta
QMLE
QALY
PF
RL
SF
P
MH
V
0.03***
(0.01)
0.00*
(0.00)
-0.02
(0.01)
-0.00
(0.00)
0.17***
(0.05)
0.00
(0.00)
-0.10
(0.07)
-0.00
(0.00)
0.15***
(0.05)
0.00*
(0.00)
-0.09
(0.07)
-0.00
(0.00)
-0.06
(0.12)
-0.00
(0.00)
-0.12
(0.16)
-0.00
(0.01)
-0.06
(0.12)
-0.01
(0.00)
-0.27
(0.17)
-0.02**
(0.01)
0.14
(0.12)
0.00
(0.00)
0.07
(0.17)
0.01
(0.01)
0.54***
(0.12)
0.01**
(0.00)
-0.09
(0.16)
-0.02
(0.01)
0.28**
(0.12)
0.00
(0.00)
-0.47***
(0.18)
-0.00
(0.01)
0.26**
(0.12)
0.00
(0.00)
-0.31*
(0.17)
0.00
(0.01)
0.58***
(0.05)
2.61***
(0.25)
2.63***
(0.24)
PF
Ordered Probit
0.56***
(0.07)
RL
0.41***
(0.06)
SF
0.51***
(0.05)
P
0.44***
(0.05)
MH
0.62***
(0.06)
V
ATEb
RMSEc
0.59***
(0.06)
0.033***
(0.011)
0.098
0.038***
(0.011)
0.111
0.032***
(0.011)
0.098
0.029***
(0.010)
0.097
a Results based on OLS, Beta MLE, Beta QMLE, and Ordered Probit regressions. Beta MLE and QMLE estimation
follows the procedure and code available from Basu & Manca (2012). Standard errors in parenthesis, * p<0.1. **
p<0.05. *** p<0.01.
b ATE: Average Treatment Effect.
c RMSE: root mean squared error.
29
Download