Putting the Patient in Patient Reported Outcomes:
A Robust Methodology for Health Outcomes
Assessment
May 2014
Abstract
When analyzing many health-related quality-of-life (HRQoL) outcomes, statistical inference is often based on the summary score formed by combining the
individual domains of the HRQoL profile into a single measure. Through a series of Monte Carlo simulations, this paper illustrates that reliance solely on the
summary score may lead to biased estimates of incremental effects, and I propose
a novel two-stage approach that allows for unbiased estimation of incremental
effects. The proposed methodology essentially reverses the order of the analysis, from one of “aggregate, then estimate” to one of “estimate, then aggregate.”
Compared to relying solely on the summary score, the approach also offers a more
patient-centered interpretation of results by estimating regression coefficients and
incremental effects in each of the HRQoL domains, while still providing estimated
effects in terms of the overall summary score. I provide an application to the estimation of incremental effects of demographic and clinical variables on HRQoL
following surgical treatment for adult scoliosis and spinal deformity.
Word, Table, and Figure Count: Approximately 4950 words of body text (excluding
footnotes), 6 tables, 2 figures
Running Head: Putting the Patient in PROMs
JEL Classification: I10, C24, C25, C34, C35, C51
Keywords: patient-reported outcome measures, quality-adjusted life-years, cost-effectiveness,
comparative-effectiveness
Funding: This project was supported by grant number XX from the Agency for Healthcare Research and Quality. The content is solely the responsibility of the author and
does not necessarily represent the official views of the Agency for Healthcare Research
and Quality.
1 Introduction
Improving the efficiency of health care delivery hinges on accurate methodologies for
economic evaluation and comparative effectiveness. Accompanying results must also be
sufficiently parsimonious so as to ensure the appropriate interpretation and dissemination of findings. To this end, substantial research has been devoted to the appropriate
analysis of health-related quality-of-life (HRQoL) and, more generally, patient-reported
outcome measures (PROMs). The U.K.’s National Health Service (NHS) explicitly
mandates the use of such data in health care decision making, and the U.S. appears
to be following suit with substantial investment in the Patient-Centered Outcomes Research Institute (PCORI) created under the Patient Protection and Affordable Care
Act (PCORI, 2012; Selby et al., 2012; Devlin et al., 2010; Department of Health, 2008).
PCORI was specifically created to promote and ultimately fund the development of
comparative effectiveness research in health care, although it is statutorily prohibited from funding cost-effectiveness research aimed at estimating costs per quality-adjusted life year (QALY).
For the purposes of economic evaluation and comparative effectiveness, PROMs are
of interest for several reasons. First, they are outcome measures rather than process
measures, the latter of which dominate the quality measures reported by the Centers for
Medicare and Medicaid Services (CMS) and the National Committee for Quality Assurance (NCQA, 2008). Only recently has CMS started closely tracking outcome measures
such as 30-day readmissions and mortality. Second, PROMs can be consistently studied
across a range of conditions and treatment options, offering a more appropriate comparison of treatments than is typically available with purely clinical outcome measures.
Third, a patient’s self-reported HRQoL is generally considered to be a valuable health
outcome measure and one which providers should routinely seek to improve (Porter,
2010; Ahmed et al., 2012). A recent article in the Wall Street Journal described HRQoL
data as “[helping] medical providers see the big picture...and makes for happier, healthier patients,” stating that increased reliance on HRQoL measures was “transforming
health care” (Landro, 2012). Finally, and perhaps most importantly, PROMs offer the
potential for truly patient-centered care, allowing providers to administer and evaluate
health care based on outcomes elicited directly from patients themselves (Porter, 2010).
Despite the growing awareness and use of PROMs, I argue in this paper that existing
methodologies for analyzing HRQoL data are deficient because they rely solely on the
HRQoL summary score in estimating incremental effects. Specifically, the most common
approach to analyzing HRQoL data is to combine individual HRQoL domains into a
single summary score using some existing scoring algorithm. These summary or index
scores are often then used as weights over time in order to estimate QALYs (Powell,
1984; Austin, 2002; Manca et al., 2005; Drummond et al., 2005; Brazier & Ratcliffe,
2007; Gray et al., 2011; Basu & Manca, 2012). Aside from normative concerns regarding
which weights to use, an analysis based solely on the summary scores is flawed for at
least three reasons.
First, relying on the summary scores comes with an inherent loss of information and
may ultimately bias incremental effects estimates (Mortimer & Segal, 2008; Gutacker
et al., 2012; Parkin et al., 2010). For example, in many HRQoL outcome measures, there
exists variation in the underlying domain scores that is not reflected in the summary
score (Brazier & Ratcliffe, 2007; Gray et al., 2011). This loss of variation is inherent
to the scoring process and not due to any specific algorithm. Second, the empirical
distribution of summary scores is often subject to significant floor or ceiling effects
and may also be multi-modal, necessitating empirical methodologies more complicated
than a simple linear regression (Austin, 2002; Manca et al., 2005; Basu & Manca,
2012; Hernández Alava et al., 2012). The extent to which alternative distributional
assumptions regarding the summary score approximate the true distribution will vary
by application. Third, and perhaps more importantly, the reliance on summary scores
reflects a fundamental divide between the outcomes actually affected and the outcomes
being analyzed. For researchers interested in the effect of some covariate on HRQoL,
these effects occur by definition at the individual domain level since this is the level at
which respondents are asked about their quality of life (e.g., the physical functioning
or mental health domains of a larger HRQoL profile). Effects on the summary score
are somewhat artificial as they exist only by combining the individual domains and
associated effects. It is unclear a priori whether the effects estimated at each domain
and then combined to form an effect on the summary score would yield the same result
as an analysis based solely on the summary score. In fact, as the findings in Section 3
indicate, the order of estimation and aggregation to the summary score is an important
(but unappreciated) aspect of statistical inference.
As a result, there is growing concern in the literature regarding the appropriateness
of HRQoL summary scores as the outcome of interest (Sculpher & Gafni, 2001; Brazier
et al., 2009). For example, Gutacker et al. (2012) consider an ordered probit model
in analyzing EQ-5D scores, accounting for baseline quality-of-life through the panel
structure and exploiting the ordered probit construct to explicitly model individual
domain scores. The authors avoid an analysis based solely on the summary scores.
Devlin et al. (2010) consider an alternative classification system and a health profile
grid, each of which exploits rankings of EQ-5D health states and attempts to summarize
patient outcomes based on those reporting an unequivocal improvement, worsening, or
no change in health. The studies of Gutacker et al. (2012), Devlin et al. (2010), and
others illustrate concern surrounding the appropriateness of relying solely on summary
scores in estimating the effects of an intervention and other covariates on a patient’s
well-being. However, in avoiding the summary scores entirely, these approaches are
silent as to the incremental effects on the summary score and offer little in terms
of comparing results across other studies (where summary scores remain the primary
outcome of interest).
This paper proposes a novel two-stage estimator (2SE) that first estimates regression coefficients and incremental effects based on the full HRQoL profile and then
re-interprets these effects in terms of the summary score. Through a series of Monte
Carlo simulations, the paper illustrates how a reliance solely on the summary score
may lead to biased incremental effects estimates, while the 2SE is shown to restore
the unbiased estimation of incremental effects. The proposed methodology essentially
reverses the order of the analysis, from one of “aggregate, then estimate” to one of
“estimate, then aggregate.” The 2SE also allows for a more patient-centered discussion
wherein the incremental effects of treatment or other covariates are domain-specific and
more applicable to areas of health deemed most important to a given patient. Importantly, by re-interpreting the incremental effects in terms of summary scores, the 2SE
maintains the parsimonious interpretation that has proven so valuable in the applied
cost- and comparative-effectiveness literature. I then apply the 2SE along with other
common estimators in the literature to a prospective, multi-center dataset on HRQoL
outcomes for adult scoliosis and spinal deformity patients.
The current paper therefore contributes to the growing empirical literature on the
appropriate analysis of HRQoL outcomes. This analysis is also broadly related to
theoretical econometric research surrounding the differences between marginal effects
calculated from multivariate estimation versus marginal effects calculated from univariate outcomes formed by collapsing the underlying multivariate outcomes (Mullahy,
2011). I discuss the empirical framework and 2SE in Section 2. Details of the Monte
Carlo exercise are presented in Section 3, with an application presented in Section 4.
Section 5 concludes.
2 Methodology
The primary goal of the current analysis is to accurately estimate the effect of a covariate, x, on a patient’s HRQoL summary score. For consistency with the empirical
application in Section 4, I adopt the SF-6D as the measure of HRQoL; however, the
intuition and methodological contribution of the paper extends to similar metrics such
as the EQ-5D.
2.1 Summary of the SF-6D
The SF-6D is a six-dimensional health profile derived from a subset of responses from the
SF-36 or SF-12 (Brazier et al., 2002; Brazier & Ratcliffe, 2007). The six dimensions of
health classified by the SF-6D are: 1) physical functioning; 2) role limitations; 3) social
functioning; 4) pain; 5) mental health; and 6) vitality. Each domain is characterized
numerically with a range of integers, where a 1 indicates the best value in each domain.
The worst value in each domain varies, with values up to 6 in the physical functioning
and pain domains, values up to 5 in the social functioning, mental health, and vitality
domains, and values up to 4 in the role limitations domain. The patient’s full SF-6D
profile is therefore characterized by a series of six integers, with the best health state
represented by {1, 1, 1, 1, 1, 1} and the worst health state represented by {6, 4, 5, 6, 5, 5}.
Taking all possible combinations of responses, the SF-6D defines 18,000 unique
health states. Each health state can then be converted into a single index score using
available scoring algorithms that essentially assign weights to each domain and interactions between domains. Following the algorithm in Brazier & Ratcliffe (2007), the
resulting SF-6D index score ranges from 0.30 to 1.0, with 0.30 representing the poorest
health state, {6, 4, 5, 6, 5, 5}, and 1 representing the best health state, {1, 1, 1, 1, 1, 1}.
The scoring algorithm from Brazier & Ratcliffe (2007) is reproduced in Table 1.
Table 1
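As a quick check on the size of the health-state space described above, the count of 18,000 profiles can be reproduced by multiplying the number of levels in each domain. The sketch below is illustrative only; the names `LEVELS`, `count_health_states`, and `enumerate_health_states` are hypothetical and do not come from the paper.

```python
from itertools import product
from math import prod

# Levels per SF-6D domain in the order used in the text:
# physical functioning, role limitations, social functioning,
# pain, mental health, vitality.
LEVELS = (6, 4, 5, 6, 5, 5)

def count_health_states(levels=LEVELS):
    """Number of distinct profiles: the product of the level counts."""
    return prod(levels)

def enumerate_health_states(levels=LEVELS):
    """Iterate over every profile as a tuple, from best (all 1s) to worst."""
    return product(*(range(1, n + 1) for n in levels))

states = list(enumerate_health_states())
print(count_health_states(), states[0], states[-1])
```

The first and last tuples correspond to the best and worst health states given in the text.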
The appropriate algorithm to calculate a summary score remains an area of debate
in the literature (Parkin et al., 2010). Importantly, the proposed methodology relies
on the scoring algorithm only to reinterpret the estimated incremental effects in terms
of the summary score. Although the estimated incremental effects will certainly differ
depending on the scoring algorithm adopted, the focus of this paper is on highlighting
the bias introduced when relying solely on the summary score. To this end, the intuition
underlying this analysis extends broadly to other scoring algorithms, including some
of the more recent literature on HRQoL crosswalks intended to convert responses from
one HRQoL instrument into those of another instrument (Dakin, 2013).
2.2 The Two-Stage Estimator
The proposed 2SE applies when one is interested in estimating the incremental effect
of some covariate on a summary score, which is itself derived from a combination of
individual responses. Several alternative models have also been proposed to estimate
such effects, including ordinary least squares (OLS), variations of the classic Tobit
model, censored least-absolute deviations models, Beta MLE, and Beta QMLE models
(Powell, 1984; Austin, 2002; Basu & Manca, 2012).[1] Rather than rely on the univariate
outcome, the 2SE first estimates the coefficients of interest based on the underlying
SF-6D responses and then re-interprets the coefficients in terms of the summary score.[2]
The 2SE first models each individual health domain using an ordered probit model
(Gutacker et al., 2012), where the response in each domain intuitively follows from a
latent index variable,
$$ y_{id}^{*} = x_i \beta_d + \varepsilon_{id}. \tag{1} $$
Here, xi denotes a set of independent variables possibly including a constant term,
d denotes the relevant health domain, d = 1, ..., 6, and εid is assumed to follow a
normal distribution with µ = 0 and σ = 1. In general, εid could be correlated across
domains. Such correlation could be accounted for in the proposed methodology (e.g.,
by adopting a composite marginal likelihood estimation for multivariate ordered probit
or logit models as in Bhat et al. (2010)); however, such an approach would only impact
the efficiency of the estimated coefficients and would not impact the point estimates.
As such, I simplify the analysis by assuming zero cross-equation correlation.
[1] I ignore issues of selection or the role of baseline HRQoL in order to focus solely on the estimation of incremental effects in settings where standard regression models are considered appropriate.
[2] Since QALYs generally reflect health states as well as the time spent in each health state, I do not treat QALYs as synonymous with the HRQoL summary scores; however, as Basu & Manca (2012) indicate, it is relatively common in practice that researchers estimate QALYs based on a single follow-up survey administered at one year after treatment, in which case the summary score is equivalent to a QALY.
Denote by $y_{id}$ the observed response for patient $i$ in domain $d$. For example, in the
physical functioning domain ($d = 1$), $y_{i1} \in \{1, ..., 6\}$. As $y_{i1}^{*}$ crosses several unknown
thresholds (denoted by $\alpha_j$), the observed response moves up the health status ranking
such that $y_{i1} = 1$ for $\alpha_0 < y_{i1}^{*} < \alpha_1$ and $y_{i1} = 6$ for $y_{i1}^{*} \geq \alpha_5$. Note that the ordering
from best to worst or worst to best is irrelevant provided the appropriate adjustments
are made when estimating summary scores. Since most statistical software programs
estimate ordered discrete choice models such that a higher value is better, I adopt a
worst to best ordering in the analysis, which I then convert to a best to worst ordering
to apply the scoring algorithm. More compactly, the observed dependent variable, yid ,
takes the form
$$ y_{id} = j \quad \text{if} \quad \alpha_{d,j-1} \leq y_{id}^{*} \leq \alpha_{d,j}, \qquad j = \{1, ..., J_d\}, \tag{2} $$
where $J_d$ differs across domains as discussed previously. Importantly, even with a
well-behaved distribution of latent variables, $y_{id}^{*}$, the ordered discrete choice framework
can generate distributions with strong floor and ceiling-effects via different threshold
values, αj . As a result, the estimation of ordered, discrete dependent variable models
can avoid the distributional and statistical difficulties present in models based solely on
the summary scores.
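To make the mapping from the latent index to response probabilities concrete, the following sketch computes the ordered probit category probabilities implied by equations (1) and (2) for a single domain, using only the Python standard library. It assumes a standard normal error, the lowest threshold at minus infinity, and the highest at plus infinity; the function name `ordered_probit_probs` and the example cutpoints are hypothetical and are not estimates from the paper.

```python
from statistics import NormalDist

def ordered_probit_probs(xb, cutpoints):
    """Return [P(y = 1), ..., P(y = J)] for a linear index xb.

    cutpoints holds the J - 1 interior thresholds in increasing order;
    the outer thresholds are taken to be -infinity and +infinity.
    """
    if list(cutpoints) != sorted(cutpoints):
        raise ValueError("cutpoints must be in increasing order")
    cdf = NormalDist().cdf  # standard normal CDF (mean 0, sd 1)
    # Cumulative probabilities P(y <= j), padded with 0 and 1.
    cumulative = [0.0] + [cdf(a - xb) for a in cutpoints] + [1.0]
    return [cumulative[j] - cumulative[j - 1] for j in range(1, len(cumulative))]

# Evenly spread thresholds give a symmetric response distribution.
balanced = ordered_probit_probs(xb=0.0, cutpoints=[-1.0, 0.0, 1.0])
# Thresholds bunched far to the left put most mass in the top category,
# i.e. a ceiling effect from a well-behaved latent variable.
ceiling = ordered_probit_probs(xb=0.0, cutpoints=[-3.0, -2.5, -2.0])
```

The second call illustrates the point made in the text: the same normal latent variable can produce a heavily censored observed distribution purely through the placement of the thresholds.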
I estimate separate ordered probit models for each HRQoL domain, and the results
of each model are used to form predicted probabilities of responses, denoted $\hat{P}_{ijd}$, for person $i$, response $j$, and domain $d$. In the physical functioning domain, the regression results therefore provide six predicted probabilities for each person, $\hat{P}_{i1}^{PF}, \hat{P}_{i2}^{PF}, ..., \hat{P}_{i6}^{PF}$.
Continuing this process across all six domains yields a total of 31 predicted probabilities
- one for each possible response in each domain - for each person. Applied to HRQoL
measures like the SF-6D, one difficulty surrounds the “most severe” category, where
Brazier et al. (2002) defines “most severe” as any one of the following responses: a level
of 4 or more in the physical functioning, social functioning, mental health, or vitality
domains; a level of 3 or more in the role limitation domain; or a level of 5 or more in the
pain domain. The probability of a “most severe” health status can then be calculated
following the principle of inclusion and exclusion for probability.[3]
With a slight abuse of notation, the inclusion-exclusion principle states that the
probability of the union of N non-mutually exclusive events is given as:
$$ P(A_1 \cup A_2 \cup ... \cup A_N) = P(A_1) + ... + P(A_N) + \sum_{n=2}^{N} (-1)^{n+1} P(\cap\ n \text{ events}). \tag{3} $$
Applied to the SF-6D, I denote by $A_{PF}$ the outcomes of the physical functioning domain
that enter into the “most severe” indicator, and similarly by $A_{RL}$ for the role limitations
domain, $A_{SF}$ for the social functioning domain, $A_{P}$ for the pain domain, $A_{MH}$ for the
mental health domain, and $A_{V}$ for the vitality domain. Since only one value can be
reported in each domain, these terms enter directly into equation 3, where
$$
\begin{aligned}
P(A_{PF}) &= \Pr(PF = 4) + \Pr(PF = 5) + \Pr(PF = 6), \\
P(A_{RL}) &= \Pr(RL = 3) + \Pr(RL = 4), \\
P(A_{SF}) &= \Pr(SF = 4) + \Pr(SF = 5), \\
P(A_{P}) &= \Pr(Pain = 5) + \Pr(Pain = 6), \\
P(A_{MH}) &= \Pr(MH = 4) + \Pr(MH = 5), \text{ and} \\
P(A_{V}) &= \Pr(V = 4) + \Pr(V = 5).
\end{aligned}
$$
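The inclusion-exclusion calculation can be sketched as follows. The sketch assumes that the six domain events are independent conditional on the covariates, in line with the zero cross-equation correlation assumed earlier, so that the probability of any intersection is the product of the corresponding marginal probabilities. Under that assumption the result coincides with one minus the probability that no domain is in a severe level, which the code uses as a cross-check. All names (`SEVERE_LEVELS`, `prob_union`, `prob_most_severe`) are hypothetical.

```python
from itertools import combinations
from math import prod

# Levels that trigger the "most severe" indicator, by domain (from the text).
SEVERE_LEVELS = {
    "PF": (4, 5, 6), "RL": (3, 4), "SF": (4, 5),
    "Pain": (5, 6), "MH": (4, 5), "V": (4, 5),
}

def prob_union(marginals):
    """P(A_1 or ... or A_N) by inclusion-exclusion, assuming independence."""
    total = 0.0
    for n in range(1, len(marginals) + 1):
        sign = (-1) ** (n + 1)
        # Independence: P(intersection of n events) = product of marginals.
        total += sign * sum(prod(c) for c in combinations(marginals, n))
    return total

def prob_most_severe(probs):
    """probs[domain] is a list of predicted probabilities for levels 1..J_d."""
    marginals = [
        sum(probs[d][level - 1] for level in levels)
        for d, levels in SEVERE_LEVELS.items()
    ]
    return prob_union(marginals)

example = [0.10, 0.20, 0.30, 0.05, 0.15, 0.25]
via_inclusion_exclusion = prob_union(example)
via_complement = 1.0 - prod(1.0 - p for p in example)
```

If cross-domain correlation were modeled, the intersection terms would no longer factor and would need to come from the joint model instead.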
An estimate of P (A1 ∪ A2 ∪ ... ∪ A6 ), denoted P̂ (Most Severe), can therefore be
obtained by applying the inclusion-exclusion principle to the individual estimates of
the probabilities of each outcome in each domain, P̂ijd . Based on the scoring algorithm
in Table 1, the probability estimates from the ordered probit estimation can then be
[3] A similar term which combines the scores across several individual domains also appears in the EQ-5D scoring algorithm (Shaw et al., 2005; Agency for Healthcare Research and Quality, 2005).
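converted to a predicted SF-6D summary score, $\hat{S}_i$:
$$
\begin{aligned}
\hat{S}_i = 1 &- 0.035 \times \left(\hat{P}_{i2}^{PF} + \hat{P}_{i3}^{PF}\right) - 0.044 \times \hat{P}_{i4}^{PF} - 0.056 \times \hat{P}_{i5}^{PF} - 0.117 \times \hat{P}_{i6}^{PF} \\
&- 0.053 \times \left(\hat{P}_{i2}^{RL} + \hat{P}_{i3}^{RL} + \hat{P}_{i4}^{RL}\right) \\
&- 0.057 \times \hat{P}_{i2}^{SF} - 0.059 \times \hat{P}_{i3}^{SF} - 0.072 \times \hat{P}_{i4}^{SF} - 0.087 \times \hat{P}_{i5}^{SF} \\
&- 0.042 \times \left(\hat{P}_{i2}^{Pain} + \hat{P}_{i3}^{Pain}\right) - 0.065 \times \hat{P}_{i4}^{Pain} - 0.102 \times \hat{P}_{i5}^{Pain} - 0.171 \times \hat{P}_{i6}^{Pain} \\
&- 0.042 \times \left(\hat{P}_{i2}^{MH} + \hat{P}_{i3}^{MH}\right) - 0.100 \times \hat{P}_{i4}^{MH} - 0.118 \times \hat{P}_{i5}^{MH} \\
&- 0.071 \times \left(\hat{P}_{i2}^{V} + \hat{P}_{i3}^{V} + \hat{P}_{i4}^{V}\right) - 0.092 \times \hat{P}_{i5}^{V} \\
&- 0.061 \times \hat{P}(\text{Most Severe}).
\end{aligned}
\tag{4}
$$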
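Equation (4) can be sketched as a short function that maps predicted probabilities to a predicted summary score. The decrements below are transcribed from equation (4); the grouping of levels that share a decrement is inferred from the layout of the extracted equation and is consistent with the stated score of 0.30 for the worst health state. The names `WEIGHTS`, `predicted_summary_score`, and `degenerate_probs` are hypothetical, and the probability of the most severe state is passed in as an argument so the block stands alone.

```python
# Decrements by domain and level (level 1 carries no decrement).
WEIGHTS = {
    "PF":   {2: 0.035, 3: 0.035, 4: 0.044, 5: 0.056, 6: 0.117},
    "RL":   {2: 0.053, 3: 0.053, 4: 0.053},
    "SF":   {2: 0.057, 3: 0.059, 4: 0.072, 5: 0.087},
    "Pain": {2: 0.042, 3: 0.042, 4: 0.065, 5: 0.102, 6: 0.171},
    "MH":   {2: 0.042, 3: 0.042, 4: 0.100, 5: 0.118},
    "V":    {2: 0.071, 3: 0.071, 4: 0.071, 5: 0.092},
}
MOST_SEVERE_WEIGHT = 0.061

def predicted_summary_score(probs, p_most_severe):
    """probs[domain] lists predicted probabilities for levels 1..J_d."""
    score = 1.0 - MOST_SEVERE_WEIGHT * p_most_severe
    for domain, decrements in WEIGHTS.items():
        for level, weight in decrements.items():
            score -= weight * probs[domain][level - 1]
    return score

def degenerate_probs(profile):
    """Put all probability mass on one profile, e.g. (1, 1, 1, 1, 1, 1)."""
    n_levels = {d: max(w) for d, w in WEIGHTS.items()}
    return {
        d: [1.0 if level == observed else 0.0 for level in range(1, n_levels[d] + 1)]
        for d, observed in zip(WEIGHTS, profile)
    }

best_score = predicted_summary_score(degenerate_probs((1, 1, 1, 1, 1, 1)), 0.0)
worst_score = predicted_summary_score(degenerate_probs((6, 4, 5, 6, 5, 5)), 1.0)
```

With all mass on a single profile the function reproduces the ordinary scoring of an observed health state, which gives a simple sanity check against the endpoints of the index.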
In and of itself, the predicted summary score is of little value. If researchers were
interested only in the value of a respondent’s summary score, then clearly the observed
summary score formed from the observed responses would be most relevant. The predicted summary score is instead critical to the estimation of incremental effects via
the method of recycled predictions (Oaxaca, 1973; Graubard & Korn, 1999; Basu &
Rathouz, 2005; Basu, 2005; Glick, 2007; Kleinman & Norton, 2009). For example, if
we are interested in the average effect of a one standard deviation increase in x on
respondents’ summary scores, the 2SE would proceed as follows. First, estimate ordered probit models in each domain and form the predicted summary score based on
the observed independent variables, $\hat{S}_i | x_i$. Second, replace $x_i$ with the hypothetical
values of interest, $x_i' = x_i + \sigma_x$, and based on the same coefficients estimated from
the ordered probit models, form the predicted summary scores for these hypothetical
values, $\hat{S}_i | x_i'$. Taking the difference in each predicted summary score, $\hat{S}_i | x_i' - \hat{S}_i | x_i$,
and averaging across all individuals provides an estimate of the average effect of a one
standard deviation change in x. This recycled predictions method (also referred to as
predictive margins) also avoids the difficulty of computing and interpreting marginal
effects in nonlinear models (Norton et al., 2004) and can be particularly valuable when
the variable of interest is interacted with other covariates.
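The recycled predictions step can be sketched generically: given any function that returns a predicted summary score for a covariate value, the incremental effect is the sample average of the difference between predictions at the shifted and observed values. The function `recycled_prediction_effect` and the toy prediction rule below are hypothetical stand-ins; in the 2SE the prediction function would be built from the fitted ordered probit models and the scoring algorithm.

```python
from statistics import fmean

def recycled_prediction_effect(predict, xs, shift):
    """Average of predict(x + shift) - predict(x) over the sample xs."""
    baseline = [predict(x) for x in xs]
    counterfactual = [predict(x + shift) for x in xs]
    return fmean([c - b for b, c in zip(baseline, counterfactual)])

# Toy prediction rule with a ceiling at 1.0 (purely illustrative).
def capped(x):
    return min(1.0, x)

effect = recycled_prediction_effect(capped, xs=[0.5, 0.9, 1.5], shift=0.2)
```

In this toy case the individual differences are roughly 0.2, 0.1, and 0, so the averaged effect is smaller than the shift itself because some observations are already at or near the ceiling.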
By definition, the predicted probabilities from the first stage regressions are estimates of the true probabilities and are therefore uncertain. To accommodate this
variation, standard errors and confidence intervals around the incremental effects are
estimated via bootstrap, where each iteration of the bootstrap includes both stages
of the 2SE. Uncertainty surrounding the parameters in the ordered probit model is
therefore incorporated into the final estimated effects.
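A minimal sketch of the bootstrap described above follows, assuming a simple nonparametric resampling of patients with replacement. The key design point is that the `estimator` callable must rerun both stages (domain-level estimation and the recycled predictions) on each resample. The function name, the default of 500 replications, and the percentile interval are assumptions for illustration; the paper does not state these details.

```python
import random
from statistics import fmean, stdev

def bootstrap_se(data, estimator, n_boot=500, seed=0):
    """Bootstrap standard error and 95% percentile interval for estimator(data)."""
    rng = random.Random(seed)
    n = len(data)
    draws = []
    for _ in range(n_boot):
        # Resample observations (patients) with replacement.
        resample = [data[rng.randrange(n)] for _ in range(n)]
        # The estimator is re-applied in full to each resample.
        draws.append(estimator(resample))
    draws.sort()
    lower = draws[int(0.025 * n_boot)]
    upper = draws[int(0.975 * n_boot) - 1]
    return stdev(draws), (lower, upper)

# Toy usage with the sample mean standing in for the two-stage estimator.
toy_data = [float(v) for v in range(100)]
se, (ci_low, ci_high) = bootstrap_se(toy_data, fmean)
```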
3 Simulation
I simulate data consistent with the latent index model discussed above. Alternatively,
authors sometimes simulate summary scores directly under a series of different distributional assumptions (e.g., Basu & Manca (2012)); however, in application, the level of
measurement is always at the individual HRQoL domain, and summary scores are only
generated after converting the individual domain scores. Simulation based on the underlying HRQoL domains is therefore more consistent with the likely DGPs encountered
in practice.
3.1 Data
In practice, the distribution of summary scores is often highly skewed, censored, and
multi-modal. For example, in a large study of laparoscopic-assisted versus abdominal
hysterectomy (the EVALUATE trial), the observed distributions in both treatment
arms were highly left-skewed with strong ceiling-effects at 1 (Basu & Manca, 2012;
Garry et al., 2004; Sculpher et al., 2004). Basu & Manca (2012) reproduces graphs from
several additional applications in which the summary score distributions are similarly
skewed, censored, or bi-modal.
To reflect the breadth of distributions encountered in practice, I simulate data under
several alternative DGPs. The DGPs are intentionally over-simplified in order to generate distributional properties of interest and to focus specifically on the estimation of
incremental effects. In all cases, I simulate a latent continuous variable for each HRQoL
domain ($d = 1, ..., 6$), denoted $y_{id}^{*}$, as a function of a single independent variable, $x_i$,
and a normal i.i.d. error term, $\varepsilon_{id}$.
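Denote by $\gamma$ the intercept coefficient and by $\beta$ the coefficient on $x$. Then the $D \times 1$
vector of latent HRQoL values, $y_i^{*}$, is as follows:
$$
\begin{aligned}
y_i^{*} &= \gamma + \beta x_i + \varepsilon_i, \text{ where} \\
\varepsilon &\sim N(0_{D \times 1}, I_{D \times D}), \\
x &\sim U[0, 1], \\
\gamma &= I_{D \times 1}, \text{ and} \\
\beta &= 1.5 \times I_{D \times 1}.
\end{aligned}
$$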
Observed HRQoL values, $y_{id}$ for $d \in (1, 2, 3, 4, 5, 6)$, are then generated based on the
value of the latent variable, $y_{id}^{*}$, relative to the $J_d \times 1$ vector of threshold values in each
domain, $\alpha_d$, where $J_d = 6$ in the physical functioning and pain domains, $J_d = 4$ in
the role limitations domain, and $J_d = 5$ in the social functioning, mental health, and
vitality domains.
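The latent-variable step of the data generating process can be sketched with the standard library alone. The sketch assumes that the intercept and slope vectors are constant across the six domains (an intercept of 1 and a slope of 1.5 in every domain), which is how the vectors above are read here, and that errors are independent across domains. The names `simulate_latent`, `GAMMA`, and `BETA` are hypothetical.

```python
import random

D = 6         # number of HRQoL domains
GAMMA = 1.0   # assumed: intercept of 1 in every domain
BETA = 1.5    # assumed: slope of 1.5 in every domain

def simulate_latent(n, beta=BETA, gamma=GAMMA, seed=0):
    """Return (xs, latent) where latent[i][d] = gamma + beta * x_i + eps_id."""
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n)]  # x ~ U[0, 1)
    latent = [
        [gamma + beta * x + rng.gauss(0.0, 1.0) for _ in range(D)]
        for x in xs
    ]
    return xs, latent

xs, latent = simulate_latent(n=500)
```

Observed responses would then be obtained by comparing each latent value with the domain-specific thresholds, which depend on the quantile specification chosen for the DGP.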
Alternative specifications of α are used to generate different distributional properties of the summary scores. Specifically, I consider five different threshold values
corresponding to each of five distributions of interest. In each domain, threshold values
are set to specific quantiles of the empirical distribution of the latent variable, $F(y_d^{*})$.
Denoting the $\tau_j$th quantile by $q_{y_d^{*}}(\tau_j)$ for all $j \in \{1, ..., J_d\}$, data are simulated under
the following alternative specifications of $\tau_j$:
1. $\tau = [.1, .3, .5, .7, .9, 1]'$ in the physical functioning and pain domains, $\tau = [.1, .3, .6, .8, 1]'$ in the social functioning, mental health, and vitality domains, and $\tau = [.1, .4, .8, 1]'$ in the role limitations domain. These values for $\tau$ generate a bell-shaped distribution between 0.3 and 1, illustrated in panel (a) of Figure 1.
2. $\tau_j = 0.5 \times \frac{j}{J_d}$, which generates a right-censored distribution, illustrated in panel (b) of Figure 1.
3. $\tau_j = 0.25 \times \frac{j}{J_d}$, which generates a heavily right-censored distribution, illustrated in panel (c) of Figure 1.
4. $\tau_j = 0.25 \times \left(1 - \frac{j}{J_d}\right) + \frac{j}{J_d}$, which generates a left-censored distribution, illustrated in panel (d) of Figure 1.
5. $\tau_j = 0.5 \times \left(1 - \frac{j}{J_d}\right) + \frac{j}{J_d}$, which generates a heavily left-censored distribution, illustrated in panel (e) of Figure 1.
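The five quantile specifications can be written as a single helper that returns the vector of quantile levels for a domain with a given number of response levels. This is only a transcription of the formulas listed above; the function name `tau` and the integer DGP labels are hypothetical conveniences, and the sketch does not take a position on how the resulting thresholds are mapped to observed responses.

```python
def tau(dgp, n_levels):
    """Quantile levels tau_1..tau_J for DGPs 1-5 and J = n_levels."""
    J = n_levels
    if dgp == 1:
        fixed = {
            6: [0.1, 0.3, 0.5, 0.7, 0.9, 1.0],  # physical functioning, pain
            5: [0.1, 0.3, 0.6, 0.8, 1.0],       # social functioning, mental health, vitality
            4: [0.1, 0.4, 0.8, 1.0],            # role limitations
        }
        return fixed[J]
    if dgp == 2:
        return [0.5 * j / J for j in range(1, J + 1)]
    if dgp == 3:
        return [0.25 * j / J for j in range(1, J + 1)]
    if dgp == 4:
        return [0.25 * (1 - j / J) + j / J for j in range(1, J + 1)]
    if dgp == 5:
        return [0.5 * (1 - j / J) + j / J for j in range(1, J + 1)]
    raise ValueError("dgp must be an integer from 1 to 5")

role_limitations_dgp2 = tau(2, 4)
pain_dgp5 = tau(5, 6)
```

Under specifications 2 and 3 the largest quantile level lies well below one, whereas under specifications 4 and 5 the smallest level lies well above zero, which is what pushes mass toward one end of the response scale.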
Figure 1
3.2 Monte Carlo Results
The focus of the Monte Carlo study is to compare incremental effects in the summary
score domain calculated with existing regression methods to the incremental effects calculated using the 2SE. The primary hypothesis is that an ordered discrete choice model
(e.g., an ordered probit or logit) can better accommodate the idiosyncratic properties of
distributions encountered in practice. By modeling HRQoL domains directly and then
re-interpreting in terms of the summary score, the results are therefore (arguably) more
robust to a wide range of distributions relative to models based solely on the summary
score.
For each of the five DGPs discussed above, I simulate 1,000 datasets consisting of
N = 500 observations (patients). I estimate coefficients with four alternative estimators: 1) 2SE; 2) standard OLS; 3) the Beta MLE model proposed in Basu & Manca
(2012); and 4) the Beta QMLE also proposed in Basu & Manca (2012). In all cases,
incremental effects are calculated using the method of recycled predictions as discussed
previously, interpreted as the average change in summary scores following a one standard deviation change in x. The results are summarized in Table 2.
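Summarizing the replications for a given estimator and DGP amounts to comparing the vector of estimated incremental effects with the true effect. A minimal sketch of the two summary measures, mean bias and root mean squared error (RMSE), is given below; the function name `bias_and_rmse` and the toy numbers are hypothetical and are not results from the paper.

```python
from math import sqrt
from statistics import fmean

def bias_and_rmse(estimates, truth):
    """Mean bias and root mean squared error across Monte Carlo replications."""
    errors = [e - truth for e in estimates]
    bias = fmean(errors)
    rmse = sqrt(fmean([err ** 2 for err in errors]))
    return bias, rmse

# Toy usage: four replications of an estimator whose target is 1.0.
bias, rmse = bias_and_rmse([0.9, 1.1, 1.0, 1.2], truth=1.0)
```

In the design described above, `estimates` would hold 1,000 values per estimator and DGP, one from each simulated dataset of 500 observations.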
Table 2
The 2SE consistently provides accurate estimates of the true incremental effect
across a range of alternative distributions. By comparison, incremental effects estimated
with OLS are downward (upward) biased in the presence of sufficient ceiling (floor)
effects. The Beta MLE and QMLE estimators perform better than OLS; however,
the Beta MLE estimator still provides biased estimates in the presence of uniformly
distributed summary scores with mild ceiling effects (DGP 2). In addition, Beta MLE
and Beta QMLE estimators are both less accurate relative to the 2SE, where estimates
from the latter are generally centered around the true effects while estimates from the
Beta MLE and Beta QMLE models differ from the true effect by 10% or more on
average. The 2SE also provides the lowest RMSE in all cases, although the differences
in RMSE across estimators are minimal and statistically insignificant.[4]
As discussed in Basu & Manca (2012), if the true marginal effect is relatively small
and the data are subject to strong ceiling or floor effects, biases in marginal effects
may be relatively minor. I therefore simulated additional datasets with β = 5 × ID×1
rather than β = 1.5 × ID×1 . I focus on DGPs 3 and 5 above (strong ceiling and floor
effects, respectively), where any bias would be most apparent. Results are summarized
in Table 3. Here, the 2SE provides accurate estimates of the true incremental effect,
while all other estimators yield biased estimates. Differences in RMSE are also larger
relative to those in Table 2, with the 2SE again providing the minimum RMSE in all
cases.
Table 3
[4] Although the efficiency of these estimates will clearly depend on the overall model fit, the results are
qualitatively unchanged when considering alternative simulations in which the model fit is intentionally
reduced (via a larger variance in the distribution of the error term, ε). Moreover, there would be no
reason in practice to propose a different set of independent variables for the 2SE compared to another
estimator such as standard OLS or Beta MLE. Concerns regarding the choice of covariates therefore
apply equally to all estimators considered in the analysis. Results are similarly unchanged when
allowing for non-zero cross-equation correlation across HRQoL domains. Results from these sensitivity
analyses are excluded for brevity but available upon request.
4 Application to Scoliosis Surgery
I apply the proposed 2SE to the estimation of the effect of observed pre-operative
variables on post-operative HRQoL and summary scores following surgical treatment
for adult spinal deformity (ASD). Surgical treatment of ASD is one of the lesser studied
but fastest growing and most expensive areas of spine surgery, with ASD affecting as much as 32%
of the adult population and up to 60% of the elderly (Robin et al., 1982; Schwab et al.,
2003, 2005, 2008).
4.1 Data
The data for this study were collected from a multi-center, prospective database maintained by the International Spine Study Group (ISSG). The dataset consists of 209
adult scoliosis and spinal deformity patients undergoing surgery at any participating
ISSG member site, with institutional review board approval obtained at all centers. For
purposes of this application, I limit the analysis to the following covariates: 1) age; 2)
gender; 3) baseline SF-6D scores; 4) total number of vertebrae fused at surgery (i.e.,
the number of “levels” fused); and 5) surgical approach. The outcome of interest is
patients’ HRQoL one year after surgery. Summary statistics are provided in Table 4.
Table 4
4.2 Results
Coefficient estimates are provided in Table 5. Although the coefficients in the ordered
probit regressions do not easily compare to those from the OLS, Beta MLE, and Beta
QMLE regressions, the ordered probit analysis immediately allows a more patient-centered interpretation than is provided by the other estimators. To the extent that
a given patient’s preferences are such that certain health domains are more important
than others, the results may support a more meaningful discussion for shared decision-making purposes. The ordered probit analysis also reveals important differences across
health domains that are not identified in the other estimators. Namely, the role of age,
gender, levels fused, and baseline HRQoL clearly differs across health domains, with
age having a significant positive impact in some domains and a significant negative impact
in others, but no significant impact on overall HRQoL. Similarly, gender and surgical
approach are estimated to have no significant impact on overall HRQoL despite having
a significant effect on the role limitations domain.
Table 5
The impact of baseline HRQoL is also more clearly represented with the ordered
probit results. For example, post-operative mental health scores are influenced heavily
by a patient’s baseline mental health score, much more so than in the other health
domains. This is consistent with the underlying nature of the disease, which can have
major negative effects on a patient’s daily activities and body image, but may not
generally impact a patient’s overall mental health. As such, for two patients with
an identical SF-6D index score, a patient with lower baseline mental health will have
relatively less opportunity for HRQoL improvement following surgery. This interpretation would not be available with the standard empirical framework based solely on the
summary scores (Manca et al., 2005).
Incremental effects estimated from the method of recycled predictions are summarized in Table 6. For binary variables such as “Female” and “Posterior Approach”,
the incremental effect represents the predicted change in summary scores for women
relative to men and for patients with a posterior approach relative to a combined anterior/posterior approach, respectively. For age, the incremental effect represents the
predicted change in the summary score following a one-year increase in age at surgery;
and for levels fused and each HRQoL domain, the incremental effects represent the predicted change in summary scores following a one-unit increase (improvement) from the
17
median (e.g., an increase from 9 to 10 levels fused or from a baseline physical functioning domain score of 4 to 3). As should be the case given the well-behaved distribution
of summary scores, the incremental effects for age, gender, levels fused, and surgical
approach are similar for all estimators considered.
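As a rough illustration of the method of recycled predictions for a binary covariate, the sketch below sets the covariate to each of its two values for every observation, predicts the outcome both ways, and averages the difference. The function `recycled_effect`, the logistic-mean `predict` function, its coefficients, and the three example rows are all hypothetical and are not taken from the ISSG data or from the fitted models reported here.

```python
import math

def recycled_effect(predict, rows, name, low, high):
    """Average change in the prediction when `name` is set to `high`
    rather than `low` for every row, holding other covariates as observed."""
    diffs = []
    for row in rows:
        at_low = dict(row, **{name: low})    # everyone assigned the low value
        at_high = dict(row, **{name: high})  # everyone assigned the high value
        diffs.append(predict(at_high) - predict(at_low))
    return sum(diffs) / len(diffs)

# Hypothetical fitted model with a logistic mean (coefficients are made up).
def predict(row):
    xb = 0.2 + 0.01 * (row["age"] - 60) - 0.10 * row["female"]
    return 1.0 / (1.0 + math.exp(-xb))

# Hypothetical sample of three patients.
rows = [
    {"age": 45, "female": 1},
    {"age": 62, "female": 0},
    {"age": 71, "female": 1},
]

effect_female = recycled_effect(predict, rows, "female", 0, 1)
print(round(effect_female, 4))
```

For a model that is linear in the covariate, this average difference collapses to the regression coefficient; the averaging step matters for nonlinear models such as the Beta and ordered probit specifications.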
Table 6
The results from Table 6 also illustrate the loss of variation when estimating effects
based solely on the summary score. For example, an improvement from 4 to 3 or from 3
to 2 in a patient’s baseline “role limitations” domain will have no impact on the patient’s
summary score because the scoring algorithm is such that the score does not vary along
these values of the role limitations domain. A similar scenario unfolds for certain
values of the physical functioning, pain, mental health, and vitality domains. Because
of this loss of variation due to the scoring algorithm, incremental effects estimates for
the role limitations or mental health domains are not available when relying solely on
the summary score in the current application. By modeling each domain separately,
the 2SE avoids this problem and allows for a more complete estimation of incremental
effects at all values of each baseline HRQoL domain.5
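The loss of variation can be checked directly from the decrements in Table 1. The following is a minimal sketch that transcribes those decrements into a scoring function; the names `DECREMENTS`, `MOST_SEVERE_FROM`, and `sf6d_score` are illustrative, and the example profile is hypothetical. The profile holds physical functioning at level 4, so the “Most Severe” term is already triggered and any change in the score can only come from the role limitations decrement itself.

```python
# Decrements transcribed from Table 1; level 1 in every domain carries no decrement.
DECREMENTS = {
    "PF": {2: 0.035, 3: 0.035, 4: 0.044, 5: 0.056, 6: 0.117},
    "RL": {2: 0.053, 3: 0.053, 4: 0.053},
    "SF": {2: 0.057, 3: 0.059, 4: 0.072, 5: 0.087},
    "P":  {2: 0.042, 3: 0.042, 4: 0.065, 5: 0.102, 6: 0.171},
    "MH": {2: 0.042, 3: 0.042, 4: 0.100, 5: 0.118},
    "V":  {2: 0.071, 3: 0.071, 4: 0.071, 5: 0.092},
}
# Lowest level in each domain that counts as "Most Severe" (Table 1 footnote).
MOST_SEVERE_FROM = {"PF": 4, "RL": 3, "SF": 4, "P": 5, "MH": 4, "V": 4}
MOST_SEVERE_DECREMENT = 0.061

def sf6d_score(profile):
    """Summary score for a dict mapping each of the six domains to its level."""
    score = 1.0  # starting value (perfect health)
    for domain, level in profile.items():
        score -= DECREMENTS[domain].get(level, 0.0)
    if any(profile[d] >= cut for d, cut in MOST_SEVERE_FROM.items()):
        score -= MOST_SEVERE_DECREMENT
    return score

# Hypothetical profile; only the role limitations level is varied below.
base = {"PF": 4, "RL": 4, "SF": 3, "P": 5, "MH": 2, "V": 4}
scores_by_rl = {rl: round(sf6d_score(dict(base, RL=rl)), 3) for rl in (2, 3, 4)}
print(scores_by_rl)
```

Because the three role limitations levels share a single decrement, the score is identical across them for this profile, which is why a regression on the summary score alone cannot recover an effect along those values.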
5
Discussion
This paper develops a new two-stage estimator (2SE) for analyzing HRQoL outcomes
which offers important benefits relative to existing methodologies. Primarily, the paper
illustrates how a reliance solely on the summary score may lead to biased incremental effects estimates, while the 2SE is shown to restore the unbiased estimation of
incremental effects. The proposed methodology essentially reverses the order of the
analysis, from one of “aggregate, then estimate” to one of “estimate, then aggregate.”
The 2SE also allows for a more patient-centered discussion wherein the incremental effects of treatment or other covariates are domain-specific and more applicable to areas
of health deemed most important to a given patient. Importantly, the 2SE offers a
unified framework by which to estimate incremental effects at the individual domain
level while still interpreting these same effects in terms of the overall summary score.
5 Such differences could be avoided somewhat by including each baseline HRQoL domain score as
a covariate in the OLS, Beta MLE, and Beta QMLE regressions; however, this is not the standard
approach adopted in the literature. Moreover, this approach would not fully resolve the differences, as
incremental effects under the 2SE remain higher in the mental health and vitality domains, and lower
in the pain domain. Results of this analysis are not included but are available upon request.
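The “estimate, then aggregate” order can be illustrated with a small sketch of the aggregation step. It assumes that a first stage (for example, one ordered probit per domain) has already produced predicted probabilities for each level of each domain, and it assumes independence across domains when evaluating the “Most Severe” term. Both assumptions are made here purely for illustration and are not a statement of the exact second-stage computation used in this paper; the function `expected_score` and all probabilities below are hypothetical.

```python
# Decrements transcribed from Table 1 (level 1 carries no decrement).
DECREMENTS = {
    "PF": {2: 0.035, 3: 0.035, 4: 0.044, 5: 0.056, 6: 0.117},
    "RL": {2: 0.053, 3: 0.053, 4: 0.053},
    "SF": {2: 0.057, 3: 0.059, 4: 0.072, 5: 0.087},
    "P":  {2: 0.042, 3: 0.042, 4: 0.065, 5: 0.102, 6: 0.171},
    "MH": {2: 0.042, 3: 0.042, 4: 0.100, 5: 0.118},
    "V":  {2: 0.071, 3: 0.071, 4: 0.071, 5: 0.092},
}
MOST_SEVERE_FROM = {"PF": 4, "RL": 3, "SF": 4, "P": 5, "MH": 4, "V": 4}
MOST_SEVERE_DECREMENT = 0.061

def expected_score(probs):
    """Expected summary score given per-domain level probabilities.

    probs[domain][k] is the predicted probability of level k + 1.
    Domains are treated as independent (an illustrative simplification).
    """
    expected = 1.0
    p_no_severe = 1.0
    for domain, p in probs.items():
        for level, p_level in enumerate(p, start=1):
            expected -= p_level * DECREMENTS[domain].get(level, 0.0)
        # Probability that this domain sits at a "Most Severe" level.
        p_no_severe *= 1.0 - sum(p[MOST_SEVERE_FROM[domain] - 1:])
    return expected - MOST_SEVERE_DECREMENT * (1.0 - p_no_severe)

# Hypothetical stage-one predictions under two values of some covariate.
probs_reference = {
    "PF": [0.05, 0.15, 0.30, 0.25, 0.20, 0.05],
    "RL": [0.10, 0.40, 0.05, 0.45],
    "SF": [0.35, 0.20, 0.25, 0.15, 0.05],
    "P":  [0.05, 0.15, 0.30, 0.25, 0.15, 0.10],
    "MH": [0.40, 0.30, 0.15, 0.10, 0.05],
    "V":  [0.05, 0.30, 0.30, 0.20, 0.15],
}
# Shift only the role limitations distribution toward better levels.
probs_shifted = dict(probs_reference, RL=[0.20, 0.45, 0.05, 0.30])

incremental_effect = expected_score(probs_shifted) - expected_score(probs_reference)
print(round(incremental_effect, 4))
```

The point of the sketch is that a covariate acting only on the role limitations domain still produces a nonzero effect on the expected summary score once the domain is modeled on its own scale first.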
The improvements offered by the 2SE come at some cost. Namely, the 2SE is analytically more difficult to implement than standard OLS and perhaps more complicated
than the Beta MLE, Beta QMLE, and other estimators relying solely on the summary
score. The 2SE also requires a sufficient sample size (larger than that needed for standard OLS) in order to estimate the ordered dependent variable models. However, as shown through
the Monte Carlo exercise, the standard estimators are less robust to the idiosyncratic
distributional properties of summary scores than is the 2SE. Moreover, the 2SE allows
for an interpretation in terms of summary scores just as the OLS, Beta MLE, and
Beta QMLE models do. The added computational burden therefore falls solely on the
analyst rather than the end-user of the results. As such, the proposed 2SE offers an improvement over existing estimators with no additional complexity for the end-user. In
light of the growing use of patient-reported outcome measures for purposes of provider
comparison and quality reporting (Nuttall et al., 2013), the proposed 2SE should be
considered as an alternative estimator for analysis of HRQoL outcomes in practice.
References
Agency for Healthcare Research and Quality. 2005. Calculating the U.S. Population-based EQ-5D Index Score.
Ahmed, Sara, Berzon, Richard A, Revicki, Dennis A, Lenderking, William R, Moinpour, Carol M, Basch, Ethan, Reeve, Bryce B, Wu, Albert W, et al. 2012. The use of
patient-reported outcomes (PRO) within comparative effectiveness research: implications for clinical practice and health care policy. Medical Care, 50(12), 1060–1070.
AHRQ. 2012. Healthcare Cost and Utilization Project (HCUP), National Inpatient
Sample.
Austin, P.C. 2002. A comparison of methods for analyzing health-related quality-of-life
measures. Value in Health, 5(4), 329–337.
Basu, A., & Manca, A. 2012. Regression Estimators for Generic Health-Related Quality
of Life and Quality-Adjusted Life Years. Medical Decision Making, 32(1), 56–69.
Basu, Anirban. 2005. Extended generalized linear models: simultaneous estimation of
flexible link and variance functions. Stata Journal, 5(4), 501–516.
Basu, Anirban, & Rathouz, Paul J. 2005. Estimating marginal and incremental effects
on health outcomes using flexible link and variance function models. Biostatistics,
6(1), 93–109.
Bhat, C.R., Varin, C., & Ferdous, N. 2010. A comparison of the maximum simulated
likelihood and composite marginal likelihood estimation approaches in the context of
the multivariate ordered-response model. Advances in Econometrics, 26, 65–106.
Brazier, J., & Ratcliffe, J. 2007. Measuring and valuing health benefits for economic
evaluation. Oxford University Press, USA.
Brazier, J., Roberts, J., & Deverill, M. 2002. The estimation of a preference-based
measure of health from the SF-36. Journal of health economics, 21(2), 271–292.
Brazier, John E, Dixon, Simon, & Ratcliffe, Julie. 2009. The role of patient preferences
in cost-effectiveness analysis. Pharmacoeconomics, 27(9), 705–712.
Dakin, Helen. 2013. Review of studies mapping from quality of life or clinical measures
to EQ-5D: an online database. Health and quality of life outcomes, 11(1), 151.
Department of Health. 2008. Guidance on the Routine Collection of Patient Reported
Outcome Measures (PROMs).
Devlin, N.J., Parkin, D., & Browne, J. 2010. Patient-reported outcome measures in
the NHS: new methods for analysing and reporting EQ-5D data. Health economics,
19(8), 886–905.
Drummond, M.F., Sculpher, M.J., & Torrance, G.W. 2005. Methods for the economic
evaluation of health care programmes. Oxford University Press, USA.
Garry, Ray, Fountain, Jayne, Mason, Su, Hawe, Jeremy, Napp, Vicky, Abbott, Jason, Clayton, Richard, Phillips, Graham, Whittaker, Mark, Lilford, Richard, et al.
2004. The eVALuate study: two parallel randomised trials, one comparing laparoscopic with abdominal hysterectomy, the other comparing laparoscopic with vaginal
hysterectomy. British Medical Journal, 328(7432), 129–133.
Glick, H. 2007. Economic evaluation in clinical trials. Oxford University Press, USA.
Graubard, Barry I, & Korn, Edward L. 1999. Predictive margins with survey data.
Biometrics, 55(2), 652–659.
Gray, A.M., Clarke, P.M., Wolstenholme, J., & Wordsworth, S. 2011. Applied Methods
of Cost-effectiveness Analysis in Healthcare. Oxford Univ Pr.
Gutacker, N., Bojke, C., Daidone, S., Devlin, N., & Street, A. 2012. Analysing Hospital
Variation in Health Outcome at the Level of EQ-5D Dimensions.
Hernández Alava, Mónica, Wailoo, Allan J, & Ara, Roberta. 2012. Tails from the peak
district: adjusted limited dependent variable mixture models of EQ-5D questionnaire
health state utility values. Value in Health, 15(3), 550–561.
Kleinman, Lawrence C, & Norton, Edward C. 2009. What’s the risk? A simple approach for estimating adjusted risk measures from nonlinear models including logistic
regression. Health services research, 44(1), 288–302.
Landro, L. 2012. The Simple Idea That Is Transforming Health Care. The Wall Street
Journal.
Manca, A., Hawkins, N., & Sculpher, M.J. 2005. Estimating mean QALYs in trial-based
cost-effectiveness analysis: the importance of controlling for baseline utility. Health
economics, 14(5), 487–496.
Mortimer, D., & Segal, L. 2008. Comparing the incomparable? A systematic review
of competing techniques for converting descriptive measures of health status into
QALY-weights. Medical decision making, 28(1), 66.
Mullahy, J. 2011. Marginal Effects in Multivariate Probit and Kindred Discrete and
Count Outcome Models, with Applications in Health Economics. Tech. rept. National
Bureau of Economic Research.
NCQA. 2008. National Committee for Quality Assurance (NCQA). HEDIS and quality
measurement: technical resources.
Norton, Edward C, Wang, Hua, & Ai, Chunrong. 2004. Computing interaction effects
and standard errors in logit and probit models. Stata Journal, 4, 154–167.
Nuttall, David, Parkin, David, & Devlin, Nancy. 2013. Inter-provider Comparison of
Patient-reported Outcomes: Developing an Adjustment to Account for Differences
in Patient Case Mix. Health Economics.
Oaxaca, Ronald. 1973. Male-female wage differentials in urban labor markets. International economic review, 14(3), 693–709.
Parkin, D., Rice, N., & Devlin, N. 2010. Statistical analysis of EQ-5D profiles: does
the use of value sets bias inference? Medical Decision Making, 30(5), 556–565.
PCORI. 2012. Draft National Priorities for Research and Research Agenda: version 1.
Porter, Michael E. 2010. What Is Value in Health Care? New England Journal of
Medicine, 363(26), 2477–2481. PMID: 21142528.
Powell, J.L. 1984. Least absolute deviations estimation for the censored regression
model. Journal of Econometrics, 25(3), 303–325.
Robin, G., Span, Y., Steinberg, R., Makin, M., & Menczel, J. 1982. Scoliosis in the
elderly: a follow-up study. Spine, 7(4), 355–359.
Schwab, Frank, Dubey, Ashok, Pagala, Murali, Gamez, Lorenzo, & Farcy, Jean P. 2003.
Adult scoliosis: a health assessment analysis by SF-36. Spine, 28(6), 602–606.
Schwab, Frank, Dubey, Ashok, Gamez, Lorenzo, El Fegoun, Abdelkrim Benchikh,
Hwang, Ki, Pagala, Murali, & Farcy, J-P. 2005. Adult scoliosis: prevalence, SF-36, and nutritional parameters in an elderly volunteer population. Spine, 30(9),
1082–1085.
Schwab, Frank J, Lafage, Virginie, Farcy, Jean-Pierre, Bridwell, Keith H, Glassman,
Stephen, & Shainline, Michael R. 2008. Predicting outcome and complications in the
surgical treatment of adult scoliosis. Spine, 33(20), 2243–2247.
Sculpher, Mark, & Gafni, Amiram. 2001. Recognizing diversity in public preferences:
The use of preference sub-groups in cost-effectiveness analysis. Health economics,
10(4), 317–324.
Sculpher, Mark, Manca, Andrea, Abbott, Jason, Fountain, Jayne, Mason, Su, & Garry,
Ray. 2004. Cost effectiveness analysis of laparoscopic hysterectomy compared with
standard hysterectomy: results from a randomised trial. British Medical Journal,
328(7432), 134–139.
Selby, J.V., Beal, A.C., & Frank, L. 2012. The Patient-Centered Outcomes Research
Institute (PCORI) national priorities for research and initial research agenda. JAMA:
The Journal of the American Medical Association, 307(15), 1583–1584.
Shaw, J.W., Johnson, J.A., & Coons, S.J. 2005. US valuation of the EQ-5D health
states: development and testing of the D1 valuation model. Medical care, 43(3), 203.
6
Tables and Figures
Table 1: Scoring Algorithm for SF-6D (a)
Starting value = 1.0 (perfect health)

Domain and level              Decrement
Physical Functioning (PF)
  PF=2 or PF=3                -0.035
  PF=4                        -0.044
  PF=5                        -0.056
  PF=6                        -0.117
Role Limitations (RL)
  RL=2 or RL=3 or RL=4        -0.053
Social Functioning (SF)
  SF=2                        -0.057
  SF=3                        -0.059
  SF=4                        -0.072
  SF=5                        -0.087
Pain (P)
  P=2 or P=3                  -0.042
  P=4                         -0.065
  P=5                         -0.102
  P=6                         -0.171
Mental Health (MH)
  MH=2 or MH=3                -0.042
  MH=4                        -0.100
  MH=5                        -0.118
Vitality (V)
  V=2 or V=3 or V=4           -0.071
  V=5                         -0.092
Combination of Domains
  “Most Severe”               -0.061

(a) Algorithm based on Brazier & Ratcliffe (2007). “Most Severe” denotes any one of the following
responses: a level of 4 or more in the physical functioning, social functioning, mental health, or
vitality domains; a level of 3 or more in the role limitation domain; or a level of 5 or more in the
pain domain.
Figure 1: Empirical QALY Distributions in Monte Carlo Study
[Five histograms of frequency (vertical axis) against SF-6D index score (horizontal axis), one panel per threshold specification:
(a) $\tau_{PF,Pain} = [.1, .3, .5, .7, .9, 1]'$, $\tau_{SF,MH,V} = [.1, .3, .6, .8, 1]'$, $\tau_{RL} = [.1, .4, .8, 1]'$
(b) $\tau_j = 0.5 \times \frac{j}{J_d}$
(c) $\tau_j = 0.25 \times \frac{j}{J_d}$
(d) $\tau_j = 0.25 \times \left(1 - \frac{j}{J_d}\right) + \frac{j}{J_d}$
(e) $\tau_j = 0.5 \times \left(1 - \frac{j}{J_d}\right) + \frac{j}{J_d}$]
Table 2: Incremental Effects Estimates under Alternative DGPs (a)

Model                 Incremental Effect  St. Dev.  Mean % Bias  Lower % Bias  Upper % Bias  RMSE

DGP 1: $\tau_{PF,Pain} = [.1, .3, .5, .7, .9, 1]'$, $\tau_{SF,MH,V} = [.1, .3, .6, .8, 1]'$, $\tau_{RL} = [.1, .4, .8, 1]'$
True Effect           0.070               0.002
Two-stage Approach    0.070               0.003     -0.73%       -11.85%       11.64%        0.0827
OLS                   0.073               0.004     3.79%        -8.89%        17.18%        0.0828
Beta MLE              0.077               0.004     9.49%        -4.84%        25.44%        0.0830
Beta QMLE             0.075               0.004     6.27%        -6.66%        19.96%        0.0829

DGP 2: $\tau_j = 0.5 \times \frac{j}{J_d}$
True Effect           0.093               0.003
Two-stage Approach    0.092               0.005     -0.64%       -12.62%       11.48%        0.1041
OLS                   0.089               0.005     -3.84%       -15.36%       8.39%         0.1043
Beta MLE              0.142               0.010     52.57%       28.34%        76.59%        0.1115
Beta QMLE             0.102               0.006     10.14%       -4.26%        25.24%        0.1043

DGP 3: $\tau_j = 0.25 \times \frac{j}{J_d}$
True Effect           0.076               0.003
Two-stage Approach    0.075               0.005     -1.34%       -15.60%       15.21%        0.0916
OLS                   0.065               0.004     -15.02%      -29.91%       -1.40%        0.0923
Beta MLE              0.075               0.008     -1.01%       -23.44%       23.44%        0.0935
Beta QMLE             0.086               0.006     12.71%       -5.97%        32.68%        0.0917

DGP 4: $\tau_j = 0.25 \times \left(1 - \frac{j}{J_d}\right) + \frac{j}{J_d}$
True Effect           0.075               0.002
Two-stage Approach    0.075               0.003     -0.22%       -10.58%       11.14%        0.0966
OLS                   0.083               0.004     10.32%       -2.40%        24.52%        0.0968
Beta MLE              0.083               0.005     10.71%       -2.67%        25.71%        0.0969
Beta QMLE             0.082               0.004     9.20%        -3.23%        22.88%        0.0968

DGP 5: $\tau_j = 0.5 \times \left(1 - \frac{j}{J_d}\right) + \frac{j}{J_d}$
True Effect           0.062               0.002
Two-stage Approach    0.061               0.003     -0.28%       -11.20%       11.19%        0.0916
OLS                   0.072               0.004     16.70%       2.21%         32.65%        0.0920
Beta MLE              0.070               0.004     13.03%       -1.05%        28.53%        0.0919
Beta QMLE             0.070               0.004     13.46%       -0.26%        28.56%        0.0919

(a) Results based on 1,000 bootstrap iterations for N = 500 observations in each DGP. Upper % bias
and lower % bias denote the upper and lower 95% confidence intervals of the percent difference between
the estimated incremental effect and the true incremental effect. RMSE=root mean squared error.
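The percent-bias columns in Table 2 can be illustrated with a short sketch. It assumes that the lower and upper bounds are taken as the 2.5th and 97.5th percentiles of the percent difference across iterations, which is one common reading of the table note and not necessarily the exact procedure used here. The function `percent_bias_summary` is hypothetical, and the draws are simulated placeholders rather than the Monte Carlo output behind the table.

```python
import random
import statistics

def percent_bias_summary(estimates, true_effect):
    """Mean and percentile bounds of the percent difference between
    each estimated incremental effect and the true incremental effect."""
    pct = [100.0 * (e - true_effect) / true_effect for e in estimates]
    # n=40 yields cut points at 2.5%, 5%, ..., 97.5%.
    cuts = statistics.quantiles(pct, n=40)
    return {"mean": statistics.mean(pct), "lower": cuts[0], "upper": cuts[-1]}

# Placeholder draws standing in for 1,000 estimates of a true effect of 0.070.
random.seed(0)
true_effect = 0.070
draws = [random.gauss(true_effect, 0.003) for _ in range(1000)]

summary = percent_bias_summary(draws, true_effect)
print({k: round(v, 2) for k, v in summary.items()})
```

An estimator whose interval excludes zero, such as OLS under DGP 3 in Table 2, is one whose percent bias is systematically of one sign across iterations.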
Table 3: Incremental Effects Estimates with Larger True Effect (a)

Model                 Incremental Effect  St. Dev.  Mean % Bias  Lower % Bias  Upper % Bias  RMSE

DGP 3: $\tau_j = 0.25 \times \frac{j}{J_d}$
True Effect           0.168               0.006
Two-stage Approach    0.167               0.005     -0.48%       -7.35%        6.17%         0.0676
OLS                   0.120               0.004     -28.43%      -35.49%       -20.96%       0.0945
Beta MLE              0.137               0.009     -18.27%      -29.84%       -6.09%        0.0940
Beta QMLE             0.216               0.008     29.11%       19.12%        39.04%        0.0698

DGP 5: $\tau_j = 0.5 \times \left(1 - \frac{j}{J_d}\right) + \frac{j}{J_d}$
True Effect           0.111               0.002
Two-stage Approach    0.112               0.002     0.28%        -3.65%        4.52%         0.0688
OLS                   0.155               0.004     39.58%       29.10%        50.09%        0.0887
Beta MLE              0.147               0.006     32.44%       21.11%        45.85%        0.0875
Beta QMLE             0.142               0.003     27.71%       18.95%        36.38%        0.0864

(a) Results based on 1,000 bootstrap iterations for N = 500 observations in each DGP, with data
simulated using $\beta = 5 \times I_{D \times 1}$ rather than $\beta = 1.5 \times I_{D \times 1}$. Upper % bias and lower % bias denote the
upper and lower 95% confidence intervals of the percent difference between the estimated incremental
effect and the true incremental effect. RMSE=root mean squared error.
Table 4: Summary Statistics for ISSG Data (N=209)

Variable                       Mean     Standard Deviation
Age                            58.65    13.56
Levels Fused                   10.36    4.34

Variable                       Count    Percent
Female                         175      84%
Posterior Approach             71       34%

                               Baseline            Post-operative
Domain and level               Count    Percent    Count    Percent
Physical Functioning Domain
  PF=1                         0        0%         0        0%
  PF=2                         10       5%         27       13%
  PF=3                         43       21%        61       29%
  PF=4                         65       31%        35       17%
  PF=5                         77       37%        74       35%
  PF=6                         14       7%         12       6%
Role Limitations Domain
  RL=1                         8        4%         23       11%
  RL=2                         68       33%        80       38%
  RL=3                         4        2%         7        3%
  RL=4                         129      62%        99       47%
Social Functioning Domain
  SF=1                         38       18%        79       38%
  SF=2                         38       18%        44       21%
  SF=3                         63       30%        50       24%
  SF=4                         47       22%        26       12%
  SF=5                         12       11%        10       5%
Pain Domain
  P=1                          1        0%         15       7%
  P=2                          12       6%         27       13%
  P=3                          25       12%        70       33%
  P=4                          50       24%        47       22%
  P=5                          75       36%        35       17%
  P=6                          46       22%        15       7%
Mental Health Domain
  MH=1                         37       18%        83       40%
  MH=2                         65       31%        64       31%
  MH=3                         56       27%        34       16%
  MH=4                         37       18%        23       11%
  MH=5                         14       7%         5        2%
Vitality Domain
  V=1                          4        2%         6        3%
  V=2                          29       14%        69       33%
  V=3                          53       25%        67       32%
  V=4                          61       29%        37       18%
  V=5                          62       30%        30       14%
Table 5: Regression Results (a)

                      OLS       Beta MLE  Beta QMLE Ordered Probit
Outcome:              QALY      QALY      QALY      PF        RL        SF        P         MH        V
Age                   0.00*     0.00      0.00*     0.01      -0.01**   0.01      0.01*     0.01*     0.00
                      (0.00)    (0.00)    (0.00)    (0.01)    (0.01)    (0.01)    (0.01)    (0.01)    (0.01)
Female                -0.02     -0.10     -0.07     0.01      -0.59***  0.21      -0.15     -0.40*    -0.22
                      (0.02)    (0.11)    (0.10)    (0.21)    (0.22)    (0.21)    (0.20)    (0.23)    (0.21)
Levels Fused          -0.00     -0.01     -0.01     -0.03*    -0.03*    -0.02     0.02      -0.01     0.01
                      (0.00)    (0.01)    (0.01)    (0.02)    (0.02)    (0.02)    (0.02)    (0.02)    (0.02)
Posterior Approach    0.01      0.07      0.07      0.23      0.40**    0.24      0.00      0.00      0.15
                      (0.02)    (0.09)    (0.09)    (0.18)    (0.19)    (0.19)    (0.18)    (0.19)    (0.18)
Baseline HRQoL
  SF-6D Index         0.57***   2.49***   2.54***
                      (0.07)    (0.40)    (0.37)
  PF                                                0.49***
                                                    (0.09)
  RL                                                          0.36***
                                                              (0.08)
  SF                                                                    0.39***
                                                                        (0.07)
  P                                                                               0.43***
                                                                                  (0.07)
  MH                                                                                        0.59***
                                                                                            (0.08)
  V                                                                                                   0.42***
                                                                                                      (0.07)
RMSE                  0.1103    0.1218    0.1100    0.1099 (ordered probit)

(a) Results based on OLS, Beta MLE, Beta QMLE, and Ordered Probit regressions. Beta MLE and
QMLE estimation follows the procedure and code available from Basu & Manca (2012). Standard
errors in parentheses. * p<0.1, ** p<0.05, *** p<0.01. RMSE: root mean squared error.
Table 6: Incremental Effects (a)

                      OLS       Beta MLE  Beta QMLE 2SE
Age                   0.001     0.001     0.001     0.001
                      (0.001)   (0.001)   (0.001)   (0.001)
Female                -0.016    -0.021    -0.016    -0.021
                      (0.022)   (0.024)   (0.022)   (0.021)
Levels Fused          -0.002    -0.003    -0.002    -0.001
                      (0.002)   (0.002)   (0.002)   (0.002)
Posterior Approach    0.015     0.016     0.015     0.018
                      (0.019)   (0.021)   (0.019)   (0.018)
Baseline HRQoL
  PF                  0.011     0.010     0.011     0.008
                      (0.002)   (0.002)   (0.001)   (0.002)
  RL                  0.000     0.000     0.000     0.005
                      –         –         –         (0.001)
  SF                  0.001     0.001     0.001     0.011
                      (0.000)   (0.000)   (0.000)   (0.002)
  P                   0.025     0.024     0.024     0.015
                      (0.004)   (0.004)   (0.003)   (0.003)
  MH                  0.000     0.000     0.000     0.015
                      –         –         –         (0.002)
  V                   0.004     0.003     0.003     0.005
                      (0.001)   (0.001)   (0.001)   (0.001)

(a) Incremental effects on QALYs estimated via the method of recycled predictions following OLS,
Beta MLE, Beta QMLE, and 2SE (Oaxaca, 1973; Graubard & Korn, 1999; Basu & Rathouz, 2005;
Basu, 2005; Glick, 2007; Kleinman & Norton, 2009). Beta MLE and QMLE estimation follows
the procedure and code available from Basu & Manca (2012). Bootstrapped standard errors in
parentheses based on 1,000 iterations.