Introduction to Selection Bias in Observational Studies Steven D. Pizer June 29, 2009

Outline
• Overview of selection bias: how it arises in
study designs and problems it causes.
• Methods
– Propensity score methods (Paul).
– Propensity score application (Matt).
– Instrumental variables (Steve).
• Discussion.
Overview
• Selection bias is well known.
– Randomized controlled trials eliminate it
• Why conduct observational studies?
– Cost of data collection.
– Ethical considerations.
– Faster results.
– “Real world” settings.
• But selection into treatment is often correlated with
the outcome.
– For example . . .
Study Suggests TV-watching Lowers Physical Activity
27 Aug 2006
A study of low-income housing residents has
documented that the more television people say they
watched, the less active they were, researchers from
Dana-Farber Cancer Institute and colleagues report.
The findings of television's effects on physical activity
are the first to be based on objective measurements
using pedometers, rather than the study subjects'
memories of their physical activity, say the
researchers. The study will be published online by the
American Journal of Public Health on July 27 and
later in the journal's September 2006 issue.
Overview: Source of Bias in RCTs
[Diagram: Flip of a Coin drives Sorting into Treatment group or Comparison group; Outcome]
• In RCTs, randomization ensures that
– Observed (and unobserved) covariates are balanced
between treatment and control groups
– Only difference is treatment assignment
– Thus, only cause of outcome difference is treatment
• No bias b/c coin flip is only driver of sorting and
coin flip has no impact on outcomes
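The coin-flip argument can be checked with a small simulation. This is a minimal sketch under assumed values: the variable `health`, the sample size, and the true effect of 1.0 are all hypothetical and not taken from the presentation.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical unobserved covariate that strongly affects the outcome.
health = rng.normal(size=n)

# Coin flip: treatment assignment is independent of health.
treated = rng.integers(0, 2, size=n).astype(bool)

true_effect = 1.0  # assumed for illustration
outcome = true_effect * treated + 2.0 * health + rng.normal(size=n)

# Balance check: the unobserved covariate has nearly the same mean in both arms.
balance_gap = health[treated].mean() - health[~treated].mean()

# A simple difference in means recovers the treatment effect.
estimate = outcome[treated].mean() - outcome[~treated].mean()

print(f"covariate gap: {balance_gap:.3f}, estimated effect: {estimate:.3f}")
```

Because the coin flip is unrelated to `health`, the covariate gap is close to zero and the difference in means is close to the assumed effect, even though `health` is never used in the estimate.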
Overview: Source of Bias in Observational Studies
[Diagram: patient characteristics, provider characteristics, and institutional factors drive Sorting into Treatment group or Comparison group; Outcome]
Patient characteristics. Observed: health, income, ed, dist. Unobserved: health, skills, attitudes.
Provider characteristics. Observed: staff, costs, congestion. Unobserved: culture, attitudes, leadership.
Institutional factors: laws, programs.
• In non-randomized studies, things get messy b/c there are
many drivers of sorting that also affect outcomes.
Observational Study Scenarios
• Scenario A: New clinical intervention. Self-care
training for CHF. Self-reported health &
satisfaction.
• Scenario B: Network-level study of guideline
adherence. Annual eye and foot exams for
diabetics. Amputations and retinopathy.
• Scenario C: National study of Medicaid HCBS.
NH admission, mortality.
Scenario A: CHF Self-Care Study
[Diagram: patient and provider characteristics drive Sorting into Training group or Comparison group; Outcome: health, satisfaction]
Patient characteristics. Observed: health, income, ed, dist., skills, attitudes.
Provider characteristics. Observed: staff, costs, congestion, culture, attitudes, leadership.
• Small study with primary data collection.
• All important factors are observed.
Scenario B: Guidelines Study
[Diagram: patient characteristics, provider characteristics, and institutional factors drive Sorting into Adherent group or Non-adherent group; Outcome: amputations, retinopathy]
Patient characteristics. Observed: health, income, ed, dist. Unobserved: health, skills, attitudes.
Provider characteristics. Observed: staff, costs, congestion. Unobserved: culture, attitudes, leadership.
Institutional factors: standards.
• Unobserved factors are important.
• Institutional factors are related to outcome.
Scenario C: HCBS Study
[Diagram: patient characteristics, provider characteristics, and institutional factors drive Sorting into HCBS group or Comparison group; Outcome: NH admits, mortality]
Patient characteristics. Observed: health, income, ed, dist. Unobserved: health, skills, attitudes.
Provider characteristics. Observed: staff, costs, congestion. Unobserved: culture, attitudes, leadership.
Institutional factors: program location.
• Unobserved factors are important.
• Institutional factors drive sorting w/o affecting outcome.
Scenarios: Lessons
• A: Small study w/o unobservables.
Propensity scores.
• B: Larger study w/ unobservables & w/o
sorting variable uncorrelated w/ outcome.
Fatally flawed.
• C: Large study w/ unobservables &
uncorrelated sorting variable. Instrumental
Variables.
Translating Diagram Into Equations
[Diagram: patient characteristics, provider characteristics, and institutional factors; Sorting; Treatment or Comparison; Outcome]
Eq 1: Outcome = Treatment + Xpatient + Xprovider + u1
Eq 2: Treatment = Xpatient + Xprovider + Xinstitutions + u2
Selection bias occurs in Eq 1 when u1 is correlated with u2,
and therefore with Treatment.
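The statement about correlated errors can be illustrated with a simulation of Eq 1 and Eq 2. This is a minimal sketch under assumed coefficients: the continuous treatment, the `shared` weight on a common unobserved factor, and the true effect of 1.0 are hypothetical choices made only for illustration.

```python
import numpy as np

def naive_estimate(shared, n=200_000, seed=1):
    """OLS coefficient on Treatment in Eq 1 when u1 and u2 share a common factor.

    `shared` is a hypothetical weight on an unobserved factor that enters both
    error terms; shared=0 means u1 and u2 are uncorrelated.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)       # observed covariate (stand-in for Xpatient)
    unobs = rng.normal(size=n)   # unobserved factor, e.g. underlying health
    u1 = shared * unobs + rng.normal(size=n)
    u2 = shared * unobs + rng.normal(size=n)

    treatment = 0.5 * x + u2                   # Eq 2 (continuous for simplicity)
    outcome = 1.0 * treatment + 0.5 * x + u1   # Eq 1, assumed true effect = 1.0

    X = np.column_stack([np.ones(n), treatment, x])
    beta = np.linalg.lstsq(X, outcome, rcond=None)[0]
    return beta[1]

unbiased = naive_estimate(shared=0.0)
biased = naive_estimate(shared=1.0)
print(f"u1, u2 uncorrelated: {unbiased:.3f}; correlated: {biased:.3f}")
```

With `shared=1.0` the covariance of u1 and u2 is 1 and the variance of u2 is 2, so the naive coefficient settles near 1.5 rather than the assumed 1.0. Controlling for the observed covariate does not help, because the bias comes from the unobserved factor.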
Intermission
This presentation adapted from:
Pizer, SD, “An Intuitive Review of Methods for Observational Studies
Of Comparative Effectiveness,” Health Services and Outcomes Research
Methodology, 9(1) (March 2009): 54-68.
Shameless Plug
Instrumental Variables
• Overview
• An application
Instrumental Variables: Overview
• Selection bias means naively estimated
effect of Treatment on Outcome includes
influence of unobservables correlated with
Treatment variable.
• So we have to either remove effect of
correlated unobservable or control for it.
IV Overview (2)
• Instrumental variables (IV) uses variables
that affect sorting but are not related to
patient or provider unobservables.
• These are often institutional factors.
– Example: Residence in county with HCBS
waiver program. HCBS recipients vs. others
=> bias. Waiver county residents vs. others =>
no bias.
IV: A General Approach
Eq 1: Outcome = O(Treat, u2hat, Xpatient, Xprovider) + u1
Eq 2: Treatment = T(Xpatient, Xprovider, Xinstitutions) + u2
• Applicable to linear or nonlinear models with
additive errors.
• Estimate Eq 2 (like propensity score estimation).
• Construct predicted value of u2 (u2hat).
• Add u2hat to outcome equation to control for
correlated unobservables.
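The three steps above can be sketched for the simplest case, where both equations are linear. All coefficients, the instrument `z` (standing in for Xinstitutions), and the sample size are assumptions made for illustration; the presentation does not specify them.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients of y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

rng = np.random.default_rng(2)
n = 200_000
ones = np.ones(n)

x = rng.normal(size=n)       # observed covariate
z = rng.normal(size=n)       # instrument: affects sorting, excluded from the outcome
unobs = rng.normal(size=n)   # unobserved factor shared by both errors
u1 = unobs + rng.normal(size=n)
u2 = unobs + rng.normal(size=n)

treat = 0.5 * x + 1.0 * z + u2         # Eq 2
outcome = 1.0 * treat + 0.5 * x + u1   # Eq 1, assumed true effect = 1.0

# Step 1: estimate Eq 2 and construct the residual u2hat.
X2 = np.column_stack([ones, x, z])
u2hat = treat - X2 @ ols(X2, treat)

# Step 2: add u2hat to the outcome equation as an extra regressor.
X1 = np.column_stack([ones, treat, u2hat, x])
coefs = ols(X1, outcome)
b_treat, b_resid = coefs[1], coefs[2]

print(f"effect of Treat: {b_treat:.3f}, coefficient on u2hat: {b_resid:.3f}")
```

In this linear setup the coefficient on `treat` is close to the assumed 1.0, and the coefficient on `u2hat` is close to 0.5, which is the part of u1 that moves with u2. A nonzero coefficient on the residual is itself a sign that selection on unobservables is present.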
IV: Issues
Eq 1: Outcome = O(Treat, u2hat, Xpatient, Xprovider) + u1
Eq 2: Treatment = T(Xpatient, Xprovider, Xinstitutions) + u2
• Must have identifying instrument(s): Xinstitutions in
this case.
• Identifying instrument(s) must be strongly
correlated with Treatment and excludable from
Outcome equation.
• Estimate only applies to those affected by
instrument(s).
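One common way to check the "strongly correlated with Treatment" requirement is an F test on the instruments in the Treatment equation. The sketch below assumes a linear first stage and simulated data; `first_stage_f` is a hypothetical helper, and the coefficients of 1.0 and 0.01 are chosen only to contrast a strong and a weak instrument.

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares from an OLS fit of y on X."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    return float(resid @ resid)

def first_stage_f(treat, controls, instruments):
    """F statistic for excluding the instruments from the Treatment equation."""
    n = len(treat)
    restricted = np.column_stack([np.ones(n), controls])
    unrestricted = np.column_stack([restricted, instruments])
    q = unrestricted.shape[1] - restricted.shape[1]   # number of instruments
    dof = n - unrestricted.shape[1]
    rss_r, rss_u = rss(restricted, treat), rss(unrestricted, treat)
    return ((rss_r - rss_u) / q) / (rss_u / dof)

rng = np.random.default_rng(3)
n = 5_000
x = rng.normal(size=n)
z = rng.normal(size=n)
u2 = rng.normal(size=n)

treat_strong = 0.5 * x + 1.0 * z + u2    # instrument moves treatment a lot
treat_weak = 0.5 * x + 0.01 * z + u2     # instrument barely moves treatment

f_strong = first_stage_f(treat_strong, x, z)
f_weak = first_stage_f(treat_weak, x, z)
print(f"strong instrument F = {f_strong:.1f}, weak instrument F = {f_weak:.1f}")
```

A frequently cited rule of thumb treats a first-stage F below about 10 as a warning sign of a weak instrument. Excludability from the Outcome equation cannot be checked this way; with a single instrument it rests on argument rather than a test.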
IV Application: Is Fragmented
Financing Bad For Your Health?
• Joint work with John A. Gardner.
• Funded by VA HSR&D.
Introduction
• Health care financing in America is
decentralized.
• People change health plans due to changes in
job, residence, marital status, retirement,
income and wealth, health status, etc.
• Good side: Lots of choices, flexible
coverage.
• Bad side: Continuity of care disrupted.
Introduction (2)
• Empirical goal: How bad is the bad side?
• Outcome: Hospitalization for ambulatory
care sensitive conditions (ACSC).
• Population: Low-income or disabled
veterans in 1999-2000 (minimizes private
insurance).
• Fragmentation: VA & M’care/M’caid
networks don’t overlap, so mixed financing →
potential discontinuity of care.
Conceptual Background
• Continuity of care: A relationship b/w a
patient and physician built over repeated
visits.
• CoC → lower risk of hospitalization and
ER use, better immunizations.
• Why? Better access to information,
investment in relationship, communication
w/specialists.
Modeling Challenge
• Fragmentation and outcomes are
simultaneously determined.
• (Unobserved) health change leads to:
– Enrollment in new programs (fragmentation)
– Higher risk of ACSC hospitalization
• Address this w/ instrumental variables.
IV Technique
• Two-Stage Residual Inclusion (Terza et al.,
2008) accounts for simultaneity.
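(1) $f_{it} = F(\alpha_0 + \alpha_1 \mathrm{Dist}_{it} + \alpha_2 \mathrm{Restrict}_{st} + \alpha_3 Z_{it} + \alpha_6 DX_{it-1}) + \varepsilon^{F}_{it}$
(2) $h_{it} = H(\gamma_0 + \gamma_1 f_{it} + \gamma_2 \hat{\varepsilon}^{F}_{it} + \gamma_3 Z_{it} + \gamma_4 DX_{it-1}) + \varepsilon^{H}_{it}$
$Z_{it}$ = Demog$_{it}$, Priority$_{it}$, $t$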
Data & Sample
• 30% sample of veterans w/any VHA care in
FY98-01.
• Exclude Priority 7 or 8, nursing home
residents, deaths, invalid data.
• Exclude those with any Medicare HMO
enrollment.
• Divide into 6-month periods for analysis:
1999a (lag for 99b), 1999b, 2000a.
Key Variables
• Fragmentation: 1 - Max{% MD visits VA, % MD
visits non-VA}; range = [0, 0.5]; higher values
imply more fragmentation.
• ACSC hospitalization: “Potentially preventable”
hosp for chronic conditions (AHRQ, 2001).
• Utilization: Count of MD visits in VA &
M’care/M’caid.
• Risk adjustment: 29 comorbidity indicators
(Elixhauser, 1998).
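The fragmentation measure defined above is simple enough to compute directly from visit counts. This is a minimal sketch; the function name and the choice to raise an error for patients with no visits are assumptions made for illustration.

```python
def fragmentation(va_visits: int, non_va_visits: int) -> float:
    """1 - max(share of MD visits in VA, share of MD visits outside VA)."""
    total = va_visits + non_va_visits
    if total == 0:
        # The measure is undefined without any MD visits.
        raise ValueError("fragmentation is undefined with zero MD visits")
    share_va = va_visits / total
    return 1.0 - max(share_va, 1.0 - share_va)

all_va = fragmentation(6, 0)       # every visit in one system
even_split = fragmentation(3, 3)   # visits split evenly
mostly_va = fragmentation(9, 1)    # one visit in ten outside VA

print(all_va, even_split, round(mostly_va, 2))
```

A patient seen entirely in one system scores 0, an even split scores 0.5, and the measure is symmetric: nine VA visits and one non-VA visit score the same as one VA visit and nine non-VA visits.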
Key Variables (2)
• M’caid restrictiveness:
– Follow Cutler and Gruber (QJE 1996).
– National micro data from MEPS 1998.
– Apply State-year eligibility rules from TRIM3
(Urban Institute) to simulate proportion
eligible.
– Restrictiveness = 1 - SimProp; higher = more restrictive.
Identification
• System identified by 2 instruments.
• 2 instruments for 1 endogenous variable permits
overidentifying restrictions tests.
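(1) $f_{it} = F(\alpha_0 + \alpha_1 \mathrm{Dist}_{it} + \alpha_2 \mathrm{Restrict}_{st} + \alpha_3 Z_{it} + \alpha_6 DX_{it-1}) + \varepsilon^{F}_{it}$
(2) $h_{it} = H(\gamma_0 + \gamma_1 f_{it} + \gamma_2 \hat{\varepsilon}^{F}_{it} + \gamma_3 Z_{it} + \gamma_4 DX_{it-1}) + \varepsilon^{H}_{it}$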
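With two instruments and one endogenous variable, one overidentifying restriction can be tested. The sketch below implements a textbook Sargan test for a linear model on simulated data. It is not necessarily the procedure used in the study, and the data-generating values, including the direct effect of 0.5 that makes the second instrument invalid, are assumptions.

```python
import math
import numpy as np

def ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

def sargan_test(y, treat, x, instruments):
    """Sargan statistic and p-value for 2 instruments and 1 endogenous variable."""
    if instruments.shape[1] != 2:
        raise ValueError("this sketch handles exactly two instruments")
    n = len(y)
    ones = np.ones(n)
    Z = np.column_stack([ones, x, instruments])   # all exogenous variables
    X = np.column_stack([ones, treat, x])         # outcome-equation regressors

    # Two-stage least squares: project X on Z, then regress y on the projection.
    X_hat = Z @ ols(Z, X)
    beta = ols(X_hat, y)
    resid = y - X @ beta                          # residuals use the actual X

    # Regress the 2SLS residuals on Z; n * R^2 is chi-squared with 1 df here.
    fitted = Z @ ols(Z, resid)
    r2 = 1.0 - np.sum((resid - fitted) ** 2) / np.sum((resid - resid.mean()) ** 2)
    stat = max(n * r2, 0.0)
    p_value = math.erfc(math.sqrt(stat / 2.0))    # chi-squared(1) upper tail
    return stat, p_value

rng = np.random.default_rng(4)
n = 20_000
x, z1, z2 = rng.normal(size=(3, n))
unobs = rng.normal(size=n)
u1 = unobs + rng.normal(size=n)
u2 = unobs + rng.normal(size=n)
treat = 0.5 * x + z1 + z2 + u2
inst = np.column_stack([z1, z2])

y_valid = treat + 0.5 * x + u1             # both instruments excludable
y_bad = treat + 0.5 * x + 0.5 * z2 + u1    # z2 affects the outcome directly

stat_valid, p_valid = sargan_test(y_valid, treat, x, inst)
stat_bad, p_bad = sargan_test(y_bad, treat, x, inst)
print(f"valid instruments: p = {p_valid:.3f}; invalid instrument: p = {p_bad:.3g}")
```

A small p-value rejects the overidentifying restriction, as in the deliberately invalid case here. A large p-value is consistent with valid instruments but does not prove it, since the test assumes at least one instrument is valid.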
Estimation
• Eq 1 (Fragmentation): two-limit tobit.
• Eq 2 (ACSC Hospitalization): probit.
• Standard errors estimated by bootstrapping.
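Bootstrapping is a natural fit for two-step estimators because the second-stage regressor is itself estimated. The sketch below resamples observations and repeats both steps on each resample. It substitutes linear regressions for the tobit and probit models named above, and all data and function names are hypothetical.

```python
import numpy as np

def ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

def two_step_estimate(y, treat, x, z):
    """Residual-inclusion estimate of the treatment effect (linear stand-in)."""
    ones = np.ones(len(y))
    X2 = np.column_stack([ones, x, z])
    u2hat = treat - X2 @ ols(X2, treat)             # first-stage residual
    X1 = np.column_stack([ones, treat, u2hat, x])
    return ols(X1, y)[1]                            # coefficient on treat

def bootstrap_se(y, treat, x, z, n_boot=200, seed=0):
    """Standard error from resampling rows and re-running both stages."""
    rng = np.random.default_rng(seed)
    n = len(y)
    draws = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)            # sample rows with replacement
        draws[b] = two_step_estimate(y[idx], treat[idx], x[idx], z[idx])
    return draws.std(ddof=1)

rng = np.random.default_rng(5)
n = 2_000
x, z = rng.normal(size=(2, n))
unobs = rng.normal(size=n)
u1 = unobs + rng.normal(size=n)
u2 = unobs + rng.normal(size=n)
treat = 0.5 * x + z + u2
y = treat + 0.5 * x + u1

point = two_step_estimate(y, treat, x, z)
se = bootstrap_se(y, treat, x, z)
print(f"estimate = {point:.3f}, bootstrap SE = {se:.3f}")
```

Re-estimating the first stage inside every bootstrap replication is the important design choice: reusing a single set of first-stage residuals would ignore their sampling error and tend to understate the standard error.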
Derivation of Sample
Merged sample for 1999b and 2000a (1999a used as lag for 1999b)
Step                                            Person-periods
Initial sample                                  1,053,294
Exclude NH, invalid data, died, Medicare HMO    902,742
Exclude if Utilization = 0                      637,991
Selected Means
Variable                   Mean
Age in 2000                61
Female                     5%
% nonwhite in ZIP code     26%
Priority 1-3               28%
Priority 5                 57%
ACSC hosp in 6 months      3.4%
MD visits                  6
Fragmentation              0.07
% MD visits VA             72% (1999)
% VA-only                  50% (1999)
Fragmentation Model
Variable                   Coefficient    t-statistic
Distance to VA             0.001          39.47
M’caid Restrictiveness     0.006          4.87
Age in 2000                0.009          191.54
Female                     0.026          9
% nonwhite in ZIP          -0.125         -51.86
Trend (d2000a)             0.03           28.01
ACSC Hospitalization Model
Variable          Coefficient    Marginal Effect
Fragmentation     1.1            5.4%
% nonwhite        0.24           1.2%
Female            -0.12          -0.5%
Renal failure     0.23           1.5%
Diabetes          0.25           1.5%
Simplified linear overID test: P = 0.39
Conclusion
• A standard deviation decrease in fragmentation
would reduce 6-mo prob of ACSC hosp from
3.4% to 2.7%.
• Could reduce hospital spending by about 2.3%.
• Suggests substantial coordination failures
associated with decentralized financing.
• Need to confirm result by checking other
outcomes (e.g., preventive care, medication
adherence) and other populations.
Discussion