Introduction to Selection Bias in Observational Studies
Steven D. Pizer
June 29, 2009

Outline
• Overview of selection bias: how it arises in study designs and the problems it causes.
• Methods
– Propensity score methods (Paul).
– Propensity score application (Matt).
– Instrumental variables (Steve).
• Discussion.

Overview
• Selection bias is well known.
– Randomized controlled trials eliminate it.
• Why conduct observational studies?
– Cost of data collection.
– Ethical considerations.
– Faster results.
– “Real world” settings.
• But selection into treatment is often correlated with outcome.
– For example . . .

Study Suggests TV-watching Lowers Physical Activity
27 Aug 2006
A study of low-income housing residents has documented that the more television people say they watched, the less active they were, researchers from Dana-Farber Cancer Institute and colleagues report. The findings of television's effects on physical activity are the first to be based on objective measurements using pedometers, rather than the study subjects' memories of their physical activity, say the researchers. The study will be published online by the American Journal of Public Health on July 27 and later in the journal's September 2006 issue.

Overview: Source of Bias in RCTs
[Diagram: Flip of a coin → Sorting → Treatment group / Comparison group → Outcome]
• In RCTs, randomization ensures that
– Observed (and unobserved) covariates are balanced, on average, between treatment and control groups.
– The only difference is treatment assignment.
– Thus, the only cause of outcome difference is treatment.
• No bias b/c the coin flip is the only driver of sorting and the coin flip has no impact on outcomes.

Overview: Source of Bias in Observational Studies
Patient characteristics
– Observed: health, income, ed, dist.
– Unobserved: health, skills, attitudes
Provider characteristics
– Observed: staff, costs, congestion
– Unobserved: culture, attitudes, leadership
Institutional factors
– Laws, programs
[Diagram: the factors above → Sorting → Treatment group / Comparison group → Outcome]
• In non-randomized studies, things get messy b/c there are many drivers of sorting that also affect outcomes.

Observational Study Scenarios
• Scenario A: New clinical intervention. Self-care training for CHF. Self-reported health & satisfaction.
• Scenario B: Network-level study of guideline adherence. Annual eye and foot exams for diabetics. Amputations and retinopathy.
• Scenario C: National study of Medicaid HCBS. NH admission, mortality.

Scenario A: CHF Self-Care Study
Patient characteristics
– Observed: health, income, ed, dist., skills, attitudes
Provider characteristics
– Observed: staff, costs, congestion, culture, attitudes, leadership
[Diagram: the factors above → Sorting → Training group / Comparison group → Outcome (health, satisfaction)]
Small study with primary data collection. All important factors are observed.

Scenario B: Guidelines Study
Patient characteristics
– Observed: health, income, ed, dist.
– Unobserved: health, skills, attitudes
Provider characteristics
– Observed: staff, costs, congestion
– Unobserved: culture, attitudes, leadership
Institutional factors
– Standards
[Diagram: the factors above → Sorting → Adherent group / Non-adherent group → Outcome (amputations, retinopathy)]
Unobserved factors are important. Institutional factors are related to outcome.

Scenario C: HCBS Study
Patient characteristics
– Observed: health, income, ed, dist.
– Unobserved: health, skills, attitudes
Provider characteristics
– Observed: staff, costs, congestion
– Unobserved: culture, attitudes, leadership
Institutional factors
– Program location
[Diagram: the factors above → Sorting → HCBS group / Comparison group → Outcome (NH admits, mortality)]
Unobserved factors are important. Institutional factors drive sorting w/o affecting outcome.

Scenarios: Lessons
• A: Small study w/o unobservables. Propensity scores.
• B: Larger study w/ unobservables & w/o a sorting variable that is uncorrelated w/ outcome. Fatally flawed.
• C: Large study w/ unobservables & a sorting variable uncorrelated w/ outcome. Instrumental variables.

Translating Diagram Into Equations
[Diagram: Patient characteristics, Provider characteristics, Institutional factors → Sorting → Treatment / Comparison → Outcome]
Eq 1: Outcome = Treatment + X_patient + X_provider + u1
Eq 2: Treatment = X_patient + X_provider + X_institutions + u2
Selection bias occurs in Eq 1 when u1 is correlated with u2, and therefore with Treatment.

Intermission
This presentation adapted from: Pizer, SD, “An Intuitive Review of Methods for Observational Studies Of Comparative Effectiveness,” Health Services and Outcomes Research Methodology, 9(1) (March 2009): 54-68.

Shameless Plug

Instrumental Variables
• Overview
• An application

Instrumental Variables: Overview
• Selection bias means the naively estimated effect of Treatment on Outcome includes the influence of unobservables correlated with the Treatment variable.
• So we have to either remove the effect of the correlated unobservable or control for it.

IV Overview (2)
• Instrumental variables (IV) estimation uses variables that affect sorting but are not related to patient or provider unobservables.
• These are often institutional factors.
– Example: Residence in county with HCBS waiver program. HCBS recipients vs. others => bias. Waiver county residents vs. others => no bias.

IV: A General Approach
Eq 1: Outcome = O(Treat, u2hat, X_patient, X_provider) + u1
Eq 2: Treatment = T(X_patient, X_provider, X_institutions) + u2
• Applicable to linear or nonlinear models with additive errors.
• Estimate Eq 2 (like propensity score estimation).
• Construct predicted value of u2 (u2hat).
• Add u2hat to outcome equation to control for correlated unobservables.

IV: Issues
Eq 1: Outcome = O(Treat, u2hat, X_patient, X_provider) + u1
Eq 2: Treatment = T(X_patient, X_provider, X_institutions) + u2
• Must have identifying instrument(s): X_institutions in this case.
• Identifying instrument(s) must be strongly correlated with Treatment and excludable from the Outcome equation.
• Estimate only applies to those affected by the instrument(s).

IV Application: Is Fragmented Financing Bad For Your Health?
• Joint work with John A. Gardner.
• Funded by VA HSR&D.

Introduction
• Health care financing in America is decentralized.
• People change health plans due to changes in job, residence, marital status, retirement, income and wealth, health status, etc.
• Good side: Lots of choices, flexible coverage.
• Bad side: Continuity of care disrupted.

Introduction (2)
• Empirical goal: How bad is the bad side?
• Outcome: Hospitalization for ambulatory care sensitive conditions (ACSC).
• Population: Low-income or disabled veterans in 1999-2000 (minimizes private insurance).
• Fragmentation: VA & M’care/M’caid networks don’t overlap, so mixed financing → potential discontinuity of care.

Conceptual Background
• Continuity of care (CoC): A relationship b/w a patient and physician built over repeated visits.
• CoC → lower risk of hospitalization and ER use, better immunizations.
• Why? Better access to information, investment in relationship, communication w/ specialists.

Modeling Challenge
• Fragmentation and outcomes are simultaneously determined.
• (Unobserved) health change leads to:
– Enrollment in new programs (fragmentation)
– Higher risk of ACSC hospitalization
• Address this w/ instrumental variables.

IV Technique
• Two-Stage Residual Inclusion (Terza et al., 2008) accounts for simultaneity.
(1) f_{it} = F(\alpha_0 + \alpha_1 Dist_{it} + \alpha_2 Restrict_{st} + \alpha_3 Z_{it} + \alpha_6 DX_{it-1}) + \varepsilon^F_{it}
(2) h_{it} = H(\gamma_0 + \gamma_1 f_{it} + \gamma_2 \hat{\varepsilon}^F_{it} + \gamma_3 Z_{it} + \gamma_4 DX_{it-1}) + \varepsilon^H_{it}
where Z_{it} = (Demog_{it}, Priority_{it}, t)

Data & Sample
• 30% sample of veterans w/ any VHA care in FY98-01.
• Exclude Priority 7 or 8, nursing home residents, deaths, invalid data.
• Exclude those with any Medicare HMO enrollment.
• Divide into 6-month periods for analysis: 1999a (lag for 1999b), 1999b, 2000a.
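
The residual-inclusion steps described for the general IV approach (estimate the treatment equation, construct u2hat, add it to the outcome equation) can be sketched on simulated data. This is a minimal illustration, not the study's estimator: it assumes linear models fit by least squares in both stages, whereas the application in these slides uses a two-limit tobit and a probit. All variable names and simulated coefficients are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Simulated data; every number here is a hypothetical choice.
z = rng.normal(size=n)   # instrument: drives sorting, excluded from outcome
x = rng.normal(size=n)   # observed covariate
c = rng.normal(size=n)   # unobserved confounder (enters both equations)
treat = 0.8 * z + 0.5 * x + c + rng.normal(size=n)
outcome = 1.0 * treat + 0.5 * x + 2.0 * c + rng.normal(size=n)  # true effect = 1.0


def ols(y, *cols):
    """Least-squares fit with an intercept; returns (coefficients, residuals)."""
    X = np.column_stack([np.ones(len(y)), *cols])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, y - X @ beta


# Naive outcome regression: ignores the correlation between u1 and u2.
naive_beta, _ = ols(outcome, treat, x)

# Stage 1 (Eq 2): treatment on instrument and covariates; keep the residual.
_, u2hat = ols(treat, z, x)

# Stage 2 (Eq 1): outcome on treatment, the stage-1 residual, and covariates.
tsri_beta, _ = ols(outcome, treat, u2hat, x)

naive_effect = naive_beta[1]
tsri_effect = tsri_beta[1]
print(f"naive estimate: {naive_effect:.2f}")
print(f"residual-inclusion estimate: {tsri_effect:.2f}")
```

In this simulated setup the naive estimate is pushed well above the true effect of 1.0 by the confounder, while the residual-inclusion estimate should land close to it. Second-stage standard errors from a plain regression ignore that u2hat is itself estimated, which is one reason to bootstrap them.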

Key Variables
• Fragmentation: 1 - Max{% MD visits VA, % MD visits non-VA}; range = [0, 0.5]; higher values imply more fragmentation.
• ACSC hospitalization: “Potentially preventable” hosp for chronic conditions (AHRQ, 2001).
• Utilization: Count of MD visits in VA & M’care/M’caid.
• Risk adjustment: 29 comorbidity indicators (Elixhauser, 1998).

Key Variables (2)
• M’caid restrictiveness:
– Follow Cutler and Gruber (QJE 1996).
– National micro data from MEPS 1998.
– Apply State-year eligibility rules from TRIM3 (Urban Institute) to simulate proportion eligible (SimProp).
– 1 - SimProp: higher = more restrictive.

Identification
• System identified by 2 instruments.
• 2 instruments for 1 endogenous variable permit overidentifying restrictions tests.
(1) f_{it} = F(\alpha_0 + \alpha_1 Dist_{it} + \alpha_2 Restrict_{st} + \alpha_3 Z_{it} + \alpha_6 DX_{it-1}) + \varepsilon^F_{it}
(2) h_{it} = H(\gamma_0 + \gamma_1 f_{it} + \gamma_2 \hat{\varepsilon}^F_{it} + \gamma_3 Z_{it} + \gamma_4 DX_{it-1}) + \varepsilon^H_{it}

Estimation
• Eq 1 (Fragmentation): two-limit tobit.
• Eq 2 (ACSC Hospitalization): probit.
• Standard errors estimated by bootstrapping.

Derivation of Sample
Merged sample for 1999b and 2000a (1999a used as lag for 1999b)
Step                                            Person-periods
Initial sample                                       1,053,294
Exclude NH, invalid data, died, Medicare HMO           902,742
Exclude if Utilization = 0                             637,991

Selected Means
Age in 2000                   61
Female                        5%
% nonwhite in ZIP code        26%
Priority 1-3                  28%
Priority 5                    57%
ACSC hosp in 6 months         3.4%
MD visits                     6
Fragmentation                 0.07
% MD visits VA (1999)         72%
% VA-only (1999)              50%

Fragmentation Model
Variable                      Coefficient   t-statistic
Distance to VA                      0.001         39.47
M’caid Restrictiveness              0.006          4.87
Age in 2000                         0.009        191.54
Female                              0.026             9
% nonwhite in ZIP                  -0.125        -51.86
Trend (d2000a)                       0.03         28.01

ACSC Hospitalization Model
Variable                      Coefficient   Marginal Effect
Fragmentation                         1.1              5.4%
% nonwhite                           0.24              1.2%
Female                              -0.12             -0.5%
Renal failure                        0.23              1.5%
Diabetes                             0.25              1.5%
Simplified linear overID test: P = 0.39

Conclusion
• A one-standard-deviation decrease in fragmentation would reduce the 6-mo prob of ACSC hosp from 3.4% to 2.7%.
• Could reduce hospital spending by about 2.3%.
• Suggests substantial coordination failures associated with decentralized financing.
• Need to confirm result by checking other outcomes (e.g., preventive care, medication adherence) and other populations.

Discussion
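
As a starting point for discussion, the headline figures in the Conclusion can be cross-checked with simple arithmetic. The sketch below uses only numbers quoted in the slides (3.4%, 2.7%, and the 5.4% marginal effect of fragmentation). The implied standard deviation is not reported in the slides; it is a back-of-envelope value that assumes a linear approximation to the probit model, so treat it as a rough consistency check only.

```python
# Figures quoted from the slides.
baseline_prob = 0.034     # 6-month probability of ACSC hospitalization
reduced_prob = 0.027      # probability after a one-SD decrease in fragmentation
marginal_effect = 0.054   # reported marginal effect of fragmentation

# Absolute and relative reduction in hospitalization probability.
absolute_drop = baseline_prob - reduced_prob
relative_drop = absolute_drop / baseline_prob

# ASSUMPTION: linear approximation, i.e.
# change in probability = marginal effect * change in fragmentation.
# The result is an implied value, not a statistic reported in the slides.
implied_sd = absolute_drop / marginal_effect

print(f"absolute drop: {absolute_drop * 100:.1f} percentage points")
print(f"relative drop: {relative_drop * 100:.0f}%")
print(f"implied SD of fragmentation (linear approx.): {implied_sd:.2f}")
```

Under that approximation, the implied standard deviation is roughly 0.13, which at least falls inside the fragmentation measure's stated range of [0, 0.5].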