Analyzing Observational Data: Focus on Propensity Scores
Arlene Ash
QMC - Third Tuesday
September 21, 2010 (as amended, Sept 23)

The Problem
• Those with the intervention and those without have markedly different values for important measured risk factors &
• Outcome is related to the risk factors that are imbalanced between the groups &
• It is not clear how the risk factors and outcome are related
• Why may standard analyses be misleading?

True and Modeled Relationship Between Risk and Outcome
[Figure: Outcome (0 to 1.0) plotted against Risk (0 to 2.0)]

Is Imbalance in Risk a Problem?
• If we correctly model the relationship between risk factors and outcome, we correctly estimate effect of the intervention
• With many risk factors, hard to know if the relationship between risk factors and outcome is correctly modeled
• Propensity score - a way to reduce the effect of imbalance in measured risk when models may be inadequate

Propensity Score Method (Key Idea)
• The propensity score (PS) for an observation is the probability that the observation is “exposed” or “got the intervention”
• Use the PS model in pre-processing the data
  – To draw a sub-sample where the exposed and nonexposed groups are fairly balanced on risk factors.
  Then
  – Use standard techniques to analyze the sub-sample

Simple Propensity Score Approach
• Estimate a model to predict the “probability of intervention/exposure”
  – This is “the propensity score”
• Divide the population into PS quintiles
• Create a subsample by taking equal numbers of exposed and unexposed observations from each quintile
• Use a subsequent regression model to estimate the effect of the intervention in the subsample

Propensity Score Sampling Example
PS Quintile   # Cases   # Controls   # Sampled
Lowest           12         81           24
2nd              30         67           60
Middle           44         38           76
4th              53         15           30
Highest          78          8           16
Total           217        209          206

Propensity Score Sampling Example: Treatments for Drug Abusers
• Patients seeking substance abuse detoxification in Boston receive either
  – Residential detoxification: lasts ~ one week + encouragement for post-detox treatment, or
  – Acupuncture: acute (daily) detox + 3-6 months of maintenance with acupuncture and motivational counseling

Data
• From Boston’s publicly-funded substance abuse treatment system
• All cases discharged from residential detox or acupuncture between 1/93 and 9/94
• Client classified (only once) as residential or acupuncture based on the modality of first discharge

Outcome
• Is client re-admitted to detox within 6 months? (Y/N)
• Study question: Are acupuncture clients more likely to be re-admitted than residential detox clients?
  – Exposure = assigned to acupuncture

Client Characteristics Available At Time Of Admission
• Gender
• Race/ethnicity
• Age
• Education
• Employment status
• Income
• Health insurance status
• Living situation
• Prior mental health treatment
• Primary drug
• Substance abuse treatment history

Residential Detox & Acupuncture Cases: % with Various Characteristics
Characteristic            Residential    Acupuncture
                          (n = 6,907)    (n = 1,104)
Gender: female                29             33
Race/ethnicity: black         46             46
  Hispanic                    12             10
  White                       41             43
Education: HS grad            56             59
  College graduate             4             13

Characteristics of Residential Detox & Acupuncture Clients (2)
Characteristic            Residential    Acupuncture
                          (n = 6,907)    (n = 1,104)
Employment: unemployed       86.8           43.2
Insurance: uninsured         65.4           52.3
  Medicaid                   28.2           21.2
  Private insurance           3.0           15.4
Lives: with child             9.5           19.3
  In shelter                 30.3            2.9

Characteristics of Residential Detox & Acupuncture Clients (3)
Characteristic                   Residential    Acupuncture
                                 (n = 6,907)    (n = 1,104)
Prior mental health treatment       12.3           27.8
Primary drug: alcohol               42.3           32.4
  Cocaine                           16.2           16.6
  Crack                             15.9           20.2
  Heroin                            24.6           19.0

Characteristics of Residential Detox & Acupuncture Clients (4)
Characteristic                             Residential    Acupuncture
                                           (n = 6,907)    (n = 1,104)
Substance abuse admits in the last year
  Residential detox: 0                        56.7           81.0
    1                                         20.2           12.1
    2+                                        23.1            7.0
  Short-term residential: 0                   76.2           94.8
  Long-term residential: 0                    80.5           93.5
  Outpatient: None                            80.6           54.3
  Acupuncture: None                           95.9           90.1

Results Of Standard Analysis
Percentage of clients re-admitted to detox within 6 months
• Among 1,104 acupuncture cases, 18% re-admitted
• Among 6,907 residential detox cases, 36% re-admitted
• Raw odds ratio = 0.40
From a multivariable stepwise logistic regression model:
• Odds ratio for acupuncture: 0.71 (CI = 0.53-0.95)

What’s the Worry? How Do We Address It?
• Given how different the two groups are, can we trust a model to correctly estimate the effect of acupuncture?
• PS methods generalize (long-standing) matching-within-strata methods that work well with 1 or 2 predictors
• PS can address imbalances in many important predictors simultaneously
• Both traditional and PS matching allow for
  – A pooled estimate (across all strata) or
  – When N is large enough, stratum-specific estimates

Propensity Score Application
• Use stepwise logistic regression to build a model to predict whether a client “is exposed” (i.e., receives acupuncture)
• Select sub-samples of exposed and non-exposed with similar distributions of the “propensity score” (predicted probability of being exposed)
• Model (as before) on the sub-sample

Sampling Results
• Able to match 740 who received acupuncture (out of 1,104) with 740 people who did not (out of 6,907)
• The risk factors in this subsample of 1,480 are much more balanced between the two groups

Characteristics of Clients in Subsample (vs. Full Sample)
Characteristic               Residential    Acupuncture
College graduate              7%  (4%)       7% (13%)
Employed                     41% (13%)      42% (57%)
Private insurance             9%  (3%)       6% (15%)
Lives with child or adult    72% (55%)      77% (76%)
Lives in shelter              5% (30%)       4%  (3%)
Prior mental health Rx       21% (12%)      21% (28%)

Comparing Standard and Propensity Score Findings
From the multivariable model fit to all cases:
  Odds ratio for acupuncture: 0.71
  95% confidence interval: 0.53-0.95
From the multivariable model fit to the more comparable subsample:
  Odds ratio for acupuncture: 0.61
  95% confidence interval: 0.39-0.94

Summary
• In this case, results were similar. Why? Original model was very good (C-statistic = 0.96)
• What we learned from the PS analysis:
  – Could find a subset of (about 10% of) patients who got residential detox who look very similar to those who got acupuncture
  – Skeptics were more receptive to findings from the PS analysis

Which X’s Belong in the PS Model?
The goal is to estimate the effect of exposure E on outcome Y
• Confounders (Brookhart’s X1 variables)?
  – Directly affect both E and Y
• Simple predictors (X2s)?
  – Affect Y but not E
• Simple selectors (X3s)?
  – Affect E but not Y

Example
The goal is to estimate the effect of E = CABG surgery on Y = 30-day mortality following admission for a heart attack
  – Confounder (e.g., disease severity)
  – Simple predictors (e.g., home support)
  – Simple selectors, aka “instrumental variables” (e.g., random assignment)

Variable type      Directly affects               Belongs in which model
                   Outcome (Y)    Exposure (E)    PS     Subsequent Regression
X1 Confounder          1              1           Yes    Yes
X2 Predictor           1              0           ?      Yes
X3 Selector            0              1           No     ?
? = inclusion should neither harm nor help

Discussion
• The “pre-processing” that occurs when subsampling to create “PS-balanced” comparison groups protects against bias from confounding variables
• Putting selector variables in the PS model will hurt accuracy (by reducing the numbers of good matches) without making the groups more comparable
• Subsequent regression improves accuracy
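
Sketch: Quintile Sampling in Code
The sampling step from the “Simple Propensity Score Approach” slide can be sketched as below. This is a minimal illustration, not the code used in the study. It assumes the propensity scores have already been estimated, that quintiles are assigned by rank, and that each quintile contributes min(# cases, # controls) observations from each group (a rule consistent with the “# Sampled” column of the sampling example, but not stated on the slides). The function names quintile_labels, balanced_subsample and sampled_counts are hypothetical.

```python
import random


def quintile_labels(scores):
    """Assign each score to a quintile 0..4 by rank (0 = lowest fifth)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    labels = [0] * len(scores)
    for rank, i in enumerate(order):
        labels[i] = rank * 5 // len(scores)
    return labels


def balanced_subsample(exposed, scores, seed=0):
    """Return sorted indices of a subsample with equal numbers of exposed
    and unexposed observations inside every propensity-score quintile."""
    rng = random.Random(seed)
    labels = quintile_labels(scores)
    chosen = []
    for q in range(5):
        cases = [i for i, lab in enumerate(labels) if lab == q and exposed[i]]
        controls = [i for i, lab in enumerate(labels) if lab == q and not exposed[i]]
        k = min(len(cases), len(controls))  # assumed rule: the smaller group sets the size
        chosen += rng.sample(cases, k) + rng.sample(controls, k)
    return sorted(chosen)


def sampled_counts(cases, controls):
    """Per-quintile subsample size under the assumed rule: 2 * min(cases, controls)."""
    return [2 * min(a, b) for a, b in zip(cases, controls)]


# Counts taken from the "Propensity Score Sampling Example" slide.
cases = [12, 30, 44, 53, 78]
controls = [81, 67, 38, 15, 8]
counts = sampled_counts(cases, controls)
print(counts, sum(counts))  # [24, 60, 76, 30, 16] 206
```

The subsample returned by balanced_subsample would then go into the subsequent regression model; estimating the scores themselves (for example by logistic regression) is left out of this sketch.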