Combining propensity scores with Instrumental Variable analysis
Paul L. Hebert, PhD
AcademyHealth Annual Research Meeting, June 29, 2010

Motivation
• Propensity score matching is good for addressing limited overlap of the distributions of observed covariates.
• Instrumental variables (IV) are good for addressing unequal distribution of unobserved covariates.
• What if you have both?
  - You know you have a bad distribution of observed covariates across treatments.
  - You think you also have a bad distribution of unobserved covariates.
• Should you make an attempt to balance the observed covariates before applying an IV?

Motivation, continued
• We often do some ad hoc "trimming" or "screening" of the sample when there is limited overlap in covariates.
  - E.g., in a comparison between VA and Medicare beneficiaries, get rid of all the women in the sample because the VA population is 95% male.
• Pre-screening using the propensity score (Crump, 2009) is more systematic and can improve precision and accuracy.
Crump, RK (2009) Biometrika 96: 187-199

E.g., is it beneficial for Emergency Medical Technicians (EMTs) to place an intravenous catheter out-of-hospital?
• Data on patients who did/did not get intravenous access placed by an EMT (Seymour et al., working paper)
  - Detailed clinical information*
  - Large differences in observed covariates
• Good instrumental variable
  - Certain EMTs place out-of-hospital catheters more often than others
  - Strongly correlated with observed treatment
  - Uncorrelated with outcomes among patients who categorically require in-field catheter placement

                        No out-of-hospital catheter    Out-of-hospital catheter
N                       28,254                         28,078
Age                     63                             65
Male                    42%                            49%
Cause of disease
  CVD                   20%                            56%
  Respiratory           18%                            13%
EMS severity code
  Life threatening      1%                             8%
  Urgent                31%                            78%
  Non-urgent            67%                            21%
In-hospital deaths      732                            805
* Other variables: BP, heart rate, respiratory rate, oxygen saturation, Glasgow score, interventions, response time, etc.
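The propensity score pre-screening idea (Crump, 2009) can be sketched as a simple trimming rule: keep only patients whose estimated propensity score lies in $[\alpha, 1-\alpha]$. This is a minimal Python sketch that assumes the scores have already been estimated; the function name `trim_by_propensity` is hypothetical, and the default $\alpha = 0.1$ is the rule-of-thumb cutoff commonly associated with Crump et al., not a value taken from these slides.

```python
from typing import List, Sequence


def trim_by_propensity(scores: Sequence[float], alpha: float = 0.1) -> List[int]:
    """Return the indices of patients whose propensity score lies in [alpha, 1 - alpha].

    Patients outside this range sit in the tails, where the treatment groups barely
    overlap; they are dropped before estimating the treatment effect.
    alpha = 0.1 is an assumed rule-of-thumb default, not a value from the slides.
    """
    if not 0.0 <= alpha < 0.5:
        raise ValueError("alpha must be in [0, 0.5)")
    lower, upper = alpha, 1.0 - alpha
    return [i for i, p in enumerate(scores) if lower <= p <= upper]


if __name__ == "__main__":
    # Hypothetical estimated propensity scores for five patients.
    scores = [0.02, 0.15, 0.50, 0.88, 0.97]
    print(trim_by_propensity(scores))  # [1, 2, 3]
```

Whether trimming helps an IV analysis depends, as the next slides argue, on whether the trimmed patients were "marginal" patients whose treatment the instrument actually shifts.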
Why propensity score screening + regression analysis works
[Figure: Distribution of patients across deciles of the propensity score, by treatment status. Panels: Treatment A, Treatment B; x-axis: Pr(tx); y-axis: Percent.]
Folks in the tails of these distributions probably weren't giving you reliable estimates of the treatment effect anyway.

Why it might work for IV analysis
It depends on where the "marginal patient" is along the propensity score continuum.
• If few people at the extremes of the propensity score are affected by the instrument, then trimming might improve precision.
• E.g., if severely ill patients almost always get out-of-hospital access regardless of the EMT's practice patterns, then the instrument does not induce severely ill patients to change their treatment status, so they are not "marginal" patients. Trimming these patients from the sample might improve the precision of the IV estimate.

Why propensity score screening + regression analysis works (repeated)
[Figure: same distribution of patients across deciles of the propensity score, by treatment status]
If there are no "marginal patients" in the tails, then maybe the IV analysis is better off not having them in the model.

Simulations
• Observed patient characteristics: $X \sim N(0,1)$
• Instrumental variable (IV): EMT proclivity to place a catheter in the field, $Z \in \{0,1\}$ (1 = high use of in-field catheter, 0 = low use of in-field catheter)
• Treatment: out-of-hospital catheter placement, $Tr \in \{0,1\}$ (1 = out-of-hospital catheter, 0 = no out-of-hospital catheter)
• Scenario 1: the IV affects the probability of treatment equally in all patients
  $\Pr(Tr=1) = \text{logit}^{-1}(\beta X + Z + u_1)$, where $\beta > 0$ creates separation between treatments on observed patient characteristics
• Scenario 2: the IV affects the probability of treatment only for patients in the middle of the distribution of X
  $\Pr(Tr=1 \mid |X| \le 1) = \text{logit}^{-1}(\beta X + Z + u_1)$
  $\Pr(Tr=1 \mid |X| > 1) = \text{logit}^{-1}(\beta X + u_1)$
• Outcome: mortality
  $\Pr(Y=1) = \text{logit}^{-1}(X + \delta_{True} Tr + u_2)$, with $\delta_{True} = 0$
  $u_1$ and $u_2$ are normally distributed and correlated, which creates selection bias.

Simulation, continued
• Perform 1,000 simulations.
• Vary the separation of observed covariates (X) between treatments from 0 to 1.75 standard deviations.
• Estimate the treatment effect $\delta$ (recall $\delta_{True} = 0$):
  - $\delta_{IV}$ = 2SRI (two-stage residual inclusion) IV estimate
  - $\delta_{PS}$ = propensity score matched estimate
  - $\delta_{IV+PS}$ = 2SRI IV performed on the propensity score matched sample
• For each estimator $\hat\delta$, report bias $= \hat\delta - \delta_{True}$ and mean squared deviation $= (\hat\delta - \delta_{True})^2$, averaged over the simulations.

Simulation results: Scenario 1
Columns: standardized difference in observed covariates (X) between treatment groups
Bias (sd)
Estimator   0              0.5            1              1.5            1.75
P-Score     0.69 (0.05)    0.68 (0.05)    0.69 (0.05)    0.69 (0.06)    0.70 (0.07)
IV         -0.01 (0.23)   -0.04 (0.24)   -0.03 (0.25)   -0.02 (0.21)   -0.02 (0.18)
IV+PS       0.01 (0.25)   -0.00 (0.25)    0.04 (0.27)    0.08 (0.31)    0.09 (0.32)
Mean squared deviation
P-Score     0.48           0.47           0.47           0.48           0.49
IV          0.05           0.06           0.06           0.05           0.03
IV+PS       0.06           0.06           0.07           0.10           0.11
• The estimate from the propensity score is badly biased because of unobserved selection bias: a bias of about 0.69 on the log-odds scale is an odds ratio on treatment of about 2 ($e^{0.69} \approx 2$) when it should be 1.0.
• Bias of the IV estimate is small and does not systematically increase with greater separation.
• Bias of the IV conducted on the P-score matched sample is similar to the bias of the IV conducted on the full sample.
• MSDs of the IV estimates are similar, regardless of matching on the P-score.

Simulation results: Scenario 2
Columns: standardized difference in observed covariates (X) between treatment groups
Bias (sd)
Estimator   0              0.5            1              1.5            1.75
P-Score     0.69 (0.05)    0.69 (0.05)    0.69 (0.05)    0.69 (0.06)    0.69 (0.07)
IV         -0.09 (0.33)   -0.11 (0.33)   -0.15 (0.29)   -0.06 (0.23)   -0.03 (0.20)
IV+PS      -0.07 (0.36)   -0.02 (0.33)   -0.00 (0.32)    0.04 (0.34)    0.08 (0.35)
Mean squared deviation
P-Score     0.48           0.48           0.47           0.48           0.48
IV          0.12           0.12           0.11           0.05           0.04
IV+PS       0.13           0.11           0.10           0.12           0.13
• Bias of the IV conducted on the P-score matched sample is smaller than the bias of the IV conducted on the full sample, but not by much. MSDs are similar.

Empirical Example: Out-of-hospital placement of catheter by EMT
Out-of-hospital placement of an intravenous catheter by EMTs is associated with lower in-hospital mortality regardless of the estimation method used.
Seymour, C (2010) U Wash working paper.
Odds ratio on out-of-hospital use of intravenous catheter
Unadjusted                                               1.11 (p=0.054)
Logistic regression                                      0.67 (p<0.001)
Propensity score matched                                 0.69 (p=0.002)
2SRI instrumental variable on full sample                0.78 (p=0.112)
2SRI instrumental variable on P-score matched sample     0.68 (p=0.01)

Conclusion
• Preliminary simulations suggest limited benefit (but no apparent harm) to using IV in combination with propensity score matching.
• IV estimates were impressively close to the true estimate, even with large separation between the treatment groups on observed covariates.
• Further study needs to address the sensitivity of these results.

A Practical Guide to Propensity Score Models
Paul L. Hebert, PhD
Investigator, VA HSR&D Puget Sound, and Research Associate Professor, Department of Health Services, University of Washington School of Public Health
Paul.Hebert2@va.gov
Heberp@u.washington.edu
June 29, 2009
Funding provided by NIH/NIDDK

Motivation
• Researcher is using observational data to compare outcomes among two or more treatments.
• Observed covariates differ substantially between treatment groups.
• Propensity score models attempt to effect a balance in observed covariates between treatment groups:
  - Create a single variable, a propensity score, that captures how differences in these covariates contribute to a patient's probability of receiving treatment A vs. treatment B.
  - Use this propensity score to create groups of treatment A and treatment B patients that look similar to each other.
  - Compare outcomes between these groups of well-matched patients.

Motivation, continued
Especially useful in three situations:
• Substantial non-overlap of treatment groups on important covariates
  - Some people in the Treatment A group don't look like anybody in the Treatment B group.
• Sparse data with binary (0,1) outcomes
  - Rare outcomes with common treatments.
    Multivariate models would have too few events per right-hand side variable.
  - Rare co-morbidities that are perfectly correlated with the binary outcome.
• Applications where timing of treatment is critical
  - Match an "early dialysis" patient to a "late dialysis" patient.

Using propensity score models requires five steps
1. Estimate the propensity score.
2. Use the propensity score to create a balance in observed covariates across treatment groups.
3. Evaluate the quality of the balance.
4. Estimate differences in outcomes between balanced (matched) treatment groups.
5. Perform sensitivity analyses.
This talk focuses on the first two steps.

What is a propensity score?
• Usually a logit (probit if you're an econometrician) model of treatment (Tr) as a function of the X's, where $Tr_i = 1$ if patient i takes treatment A and 0 if treatment B.
• For each patient, calculate the linear propensity score $\hat\beta X_i$ from the estimated $\hat\beta$'s and the patient's X's, then convert it to a probability:
  $\hat P_i = \frac{\exp(\hat\beta X_i)}{1 + \exp(\hat\beta X_i)}$
• Note that K variables for each patient just got reduced to 1 propensity score.

Step 1: Estimate the propensity score equation
• What X's do you use?
• How do you build the model? E.g., transformations, interactions.

What X's do you use?
[Figure: diagram of Treatment, Outcome, and the types of variables related to them: Confounders, Risk Factor, Mediator, Instrument]

What X's do you use?
Rubin (2007) suggests an expansive definition of X. To estimate the effect of smoking on costs, he modeled smoking as a function of:
• Age, education, occupation, etc.
• Seatbelt use, arthritis, number of friends, frequency of having friends over for dinner, membership in clubs, etc.
Variables that should not be included are those that “are effectively known to have no possible connection to the outcomes, such as random numbers…, or the weather half-way around the world” (Rubin 2007; p 29).
Rubin, DB, Statist Med 2007; 26:20-36

What X's do you use?
Monte Carlo simulations by Austin (2006, 2008) and Brookhart (2006):
• DO include all variables related to the outcome.
  - You could get biased results (Brookhart) or imbalance in covariates in matched samples (Austin, 2006) if you include only known confounders.
• DO NOT include variables related to the treatment but not the outcome (i.e., instrumental variables).
  - Including these variables increases the variance of the estimated treatment effect but doesn't decrease the bias (Brookhart).
  - Including these variables reduces the ability to match (Austin, 2006).
Austin PC, et al. Statist Med 2006; 26(4): 734-753
Brookhart, et al. Am J Epi 2006; 163:1149-56
Austin PC, J Clinical Epi 2008; 26: 537-545

How do you build the propensity score model?
• Rubin (2007) suggests an expansive use of transformations and interaction terms. The smoking equation includes:
  - log(weight)*log(height), log(weight)², etc.
  - "Other, unspecified non-linear terms"
  - It should not include "five-way interactions."
• Austin (2006) modeled statin use using 257 variables: 24 main variables and 233 transformations and two-way interactions of those variables.
• Schneeweiss (2009): the High-dimensional Propensity Score (hd-PS) Adjustment SAS macro does this to your data.
• Don't worry about p-values; don't use step-wise model building.
Austin PC, et al. Statist Med 2006; 25:2084-2106
Rubin D, Statist Med 2007; 26:20-36
Schneeweiss S, et al. Epidemiology 2009; 20:512-22
http://www.drugepi.org/downloads/index.php

Using propensity score models requires five steps (recap)
The rest of this talk covers step 2: use the propensity score to create a balance in observed covariates across treatment groups.

Kernel density plot of the logit of the propensity score for ramipril users versus captopril users
[Figure: Before PScore matching. Density plots of the linear propensity score; x-axis: linear propensity score; y-axis: density; groups: Ramipril, Captopril.]

Four basic options for using a propensity score
1. Conditioning on the propensity score
2. Stratification on the propensity score
3. Matching on the propensity score
4. Weighting by the inverse propensity score
Combinations of the above

1. Conditioning on the propensity score
  $g(Y_i) = b_0 + b_1 Tr_i + b_2 Pscore_i + e_i$
• Can yield unbiased estimates of rate ratios (Austin, 2007), risk ratios (Austin, 2008), or differences in means or proportions (Rosenbaum and Rubin, 1985).
• Problems:
  - Conditioning on the propensity score results in biased (toward the null) estimates of odds ratios and hazard ratios.
  - Rubin (2004) warns that the propensity score must be carefully specified if it is used directly in the estimation of the treatment effect.
Rosenbaum and Rubin (1983) Biometrika; 70:41-45
Austin PC, et al. Stat Med 2006; 26(4): 734-753
Austin PC, et al. Stat Med 2007; 26(4): 754-768
Austin PC, et al. J Clinical Epi 2008; 26: 537-545
Rubin DB (2004) Pharmacoepidemiology and Drug Safety 2004; 14:885-887

2. Stratification on the propensity score
• Divide the sample into (usually) quintiles based on the propensity score.
• Calculate differences in outcomes within each quintile (j) and average them to get the overall treatment effect.
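The propensity score formula and the stratification estimator described above can be sketched together in Python. This is a minimal sketch under stated assumptions: the logit coefficients are taken as already estimated, strata are equal-sized groups formed by ranking patients on the propensity score (quintiles when `n_strata=5`), and the overall effect is the simple average of the within-stratum differences in mean outcome. The function names are hypothetical.

```python
import math
from typing import Sequence


def propensity_score(beta0: float, beta: Sequence[float], x: Sequence[float]) -> float:
    """P_i = exp(b0 + b'x_i) / (1 + exp(b0 + b'x_i)) for one patient.

    Computed as 1 / (1 + exp(-z)), which is algebraically identical.
    The K covariates in x are reduced to a single number.
    """
    z = beta0 + sum(b * xk for b, xk in zip(beta, x))
    return 1.0 / (1.0 + math.exp(-z))


def stratified_effect(
    scores: Sequence[float],
    treated: Sequence[bool],
    outcomes: Sequence[float],
    n_strata: int = 5,
) -> float:
    """Average of within-stratum differences in mean outcome (Treatment A minus B).

    Patients are ranked on the propensity score and split into n_strata groups
    of (nearly) equal size.
    """
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    stratum = [0] * n
    for rank, i in enumerate(order):
        stratum[i] = rank * n_strata // n  # stratum index 0 .. n_strata - 1

    diffs = []
    for s in range(n_strata):
        y_a = [outcomes[i] for i in range(n) if stratum[i] == s and treated[i]]
        y_b = [outcomes[i] for i in range(n) if stratum[i] == s and not treated[i]]
        if not y_a or not y_b:
            # Limited overlap: nobody in this stratum to compare against.
            raise ValueError(f"stratum {s} lacks Treatment A or Treatment B patients")
        diffs.append(sum(y_a) / len(y_a) - sum(y_b) / len(y_b))
    return sum(diffs) / len(diffs)


if __name__ == "__main__":
    print(propensity_score(-1.0, [0.5, 1.0], [2.0, 0.0]))  # 0.5
    # Hypothetical data: treatment raises the outcome by 1 in every stratum.
    scores = [0.10, 0.15, 0.30, 0.35, 0.50, 0.55, 0.70, 0.75, 0.90, 0.95]
    treated = [True, False] * 5
    outcomes = [1, 0, 2, 1, 3, 2, 4, 3, 5, 4]
    print(stratified_effect(scores, treated, outcomes))  # 1.0
```

The `ValueError` branch is where the limited-overlap problem in the first and last strata shows up in practice: an extreme stratum may contain only one treatment group.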
Stratification on the propensity score: problems
• Stratifying on the quintiles of any propensity score model resulted in residual imbalance between treated and untreated subjects in the upper and lower quintiles (Austin, 2006).
• When estimating relative risks, stratifying resulted in greater bias than matching (Austin, 2008).
Austin PC, et al. Stat Med 2006; 26(4): 734-753
Austin PC, et al. J Clinical Epi 2008; 26: 537-545

Example: density plot of the linear propensity score for users of two antihypertensives, ramipril and captopril
[Figure: Before PScore matching. Density plots of the linear propensity score; groups: Ramipril, Captopril.]
You can get a lousy estimate of the treatment effect in the tails because of limited overlap of covariates in the first and last strata.

3. Matching on the propensity score
• For each patient in the Treatment A group, find a Treatment B patient with a propensity score similar to the Treatment A patient's propensity score.
• Advantages:
  - Increasingly popular technique with good properties in simulations.
  - Balance on observables is obvious (i.e., it makes a beautiful Table 1).

Covariate balance before and after propensity score matching
                         Before matching                    After matching
                         Ramipril   Captopril   P-value     Ramipril   Captopril   P-value
N                        1,269      1,521                   859        859
Demographics
  Mean age               70.3       71.4        0.000       70.6       70.6        0.944
  Female                 65%        70%         0.002       69%        69%         0.557
Race
  White                  53%        59%                     56%        57%         0.605
  African American       11%        14%                     13%        12%         0.949
  Hispanic               11%        8%                      9%         8%          0.736
  Asian                  7%         5%                      5%         6%          0.748
  Other/Unknown race     19%        15%                     17%        17%         0.103
Long term care           4%         14%                     5%         5%          0.662
Disabled                 26%        32%                     29%        28%         0.678
Hospitalized             13%        20%                     17%        17%         0.481
Charlson                 1.6        1.7                     1.6        1.6         0.658
# Meds                   7.1        12.1                    10.6       10.4        0.923
Before-matching p-values, race through # meds: 0.024, 0.002, 0.000, 0.016, 0.000, 0.000, 0.000, 0.000, 0.000
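The basic matching step (for each Treatment A patient, find a Treatment B patient with a similar propensity score) can be sketched as greedy 1-1 nearest-neighbor matching without replacement, with an optional caliper. This is a minimal Python sketch, not the procedure used to build the ramipril/captopril table; the function name is hypothetical, and processing Treatment A patients in input order is an assumption (greedy results depend on that order).

```python
from typing import Dict, Optional, Sequence


def nearest_neighbor_match(
    scores_a: Sequence[float],
    scores_b: Sequence[float],
    caliper: Optional[float] = None,
) -> Dict[int, int]:
    """Greedy 1-1 nearest-neighbor matching on the propensity score, without replacement.

    Returns {index of Treatment A patient: index of matched Treatment B patient}.
    If a caliper is given, a match is discarded when the two scores differ by more
    than the caliper, and that Treatment A patient stays unmatched.
    """
    available = set(range(len(scores_b)))  # Treatment B patients not yet used
    matches: Dict[int, int] = {}
    for i, p in enumerate(scores_a):
        if not available:
            break  # donor pool exhausted
        # Closest remaining B patient; ties broken by the lower index.
        j = min(available, key=lambda k: (abs(scores_b[k] - p), k))
        if caliper is None or abs(scores_b[j] - p) <= caliper:
            matches[i] = j
            available.remove(j)  # without replacement
    return matches


if __name__ == "__main__":
    # Hypothetical propensity scores.
    a = [0.30, 0.52, 0.90]
    b = [0.10, 0.31, 0.50, 0.55]
    print(nearest_neighbor_match(a, b))                # {0: 1, 1: 2, 2: 3}
    print(nearest_neighbor_match(a, b, caliper=0.05))  # {0: 1, 1: 2}
```

With the caliper, the third Treatment A patient (score 0.90) is left unmatched because its nearest remaining neighbor (0.55) is too far away. Matching with replacement would amount to not removing `j` from `available`.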
Matching on the propensity score, continued
Disadvantages:
• Many options for matching and no standardization.

If you choose to match, there are several techniques
• Nearest neighbor: match a subject in the Treatment A group to the subject in the Treatment B group with the closest propensity score.
• 5→1 digit match: first match patients whose propensity scores agree to 5 digits. Of the resulting unmatched sample, match patients whose scores agree to 4 digits. Repeat until a 1-digit match.
• Mahalanobis matching: match on the basis of the Mahalanobis distance between subjects, i.e., the distance between vectors of characteristics, including the propensity score (more later).

Options for use with each type of matching procedure
• 1-1 matching versus 1-many: you gain efficiency if you have many Treatment B subjects that match one Treatment A subject. The efficiency gain of 1-many is surprisingly small (Rosenbaum, 1985).
• Matching with replacement: allow a subject in Treatment group B to serve as a match for multiple subjects in Treatment group A.
• Matching with calipers: match Treatment B subjects within a specified caliper, or range, of a Treatment A subject's score (e.g., ±0.01 of the propensity score). E.g., discard a nearest-neighbor match if the propensity score of the matched Treatment B patient differs from the Treatment A patient's score by more than the caliper.
Rosenbaum and Rubin (1985) Biometrics 41, 103-116

What should you do? Bias versus efficiency tradeoffs in matching techniques
[Diagram: each pair of options trades bias against inefficiency: 1-1 vs. 1-N matching; with vs. without replacement; with vs. without calipers. In each pair, the first option generally reduces bias at the cost of efficiency.]

Do like Don: Rubin's suggested method for using propensity scores
1. Use an anti-parsimonious logit for the propensity score equation.
2. For each Treatment A patient (i), define a "donor pool" of Treatment B patients (j) whose linear propensity score is within 0.2 sd of patient i's.
3. Calculate the Mahalanobis distance (M) to each patient in the donor pool:
   $M_{ij} = (x_i - x_j)' \, \Omega^{-1} (x_i - x_j)$
   where x = vector of the linear propensity score and a few other important covariates, and Ω = covariance matrix of the X's.
4. Use 1-1 matching.
5. Match without replacement.
6. Use a greedy matching algorithm: match the hardest-to-match patient to the best match in the donor pool first, based on Mahalanobis distance. Repeat with the next hardest-to-match patient until all Treatment A patients are matched or there are no Treatment B patients left in the donor pool.

4. Weighting by the inverse of the propensity score
• Assign each patient a weight equal to the inverse of the probability of receiving the treatment the subject actually received (i.e., 1/p-score for patients who got Treatment A; 1/(1 - p-score) for patients who got Treatment B).
• Calculate the weighted average difference in outcomes between groups.
• Advantages: good properties in simulations (Austin 2010; Lunceford & Davidian, 2004).
• Disadvantages: not used very often. The same warning from Rubin (2004) applies as regards using the propensity score to directly estimate the treatment effect.
Lunceford, J. K. and Davidian, M. (2004). Statistics in Medicine 23, 2937-2960
Austin PC (2010). Statistics in Medicine, forthcoming.

Summary
• Consider propensity scores for sparse data, limited overlap in covariates, and when timing of treatment is important.
• The goal of propensity score modeling is to create treatment groups that are balanced on observed covariates.
• To estimate the propensity score:
  - Include all variables believed to be related to the outcome, not just the confounders.
  - Be anti-parsimonious.
  - Do not include variables that are only related to the treatment.
  - Use interactions and transformations liberally to effect a balance.
• Condition, stratify, match, or weight using the propensity score.
• Be careful about odds ratios and hazard ratios from logistic or Cox regressions that condition on the propensity score.

Thank you
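Option 4, weighting by the inverse of the propensity score, can be sketched in Python as below. This is a minimal sketch assuming the propensity scores are already estimated. It normalizes the weights within each treatment group (a weighted mean outcome per group), which is one common variant and an assumption here rather than the slides' specification; the function name is hypothetical.

```python
from typing import Sequence


def ipw_effect(
    scores: Sequence[float],
    treated: Sequence[bool],
    outcomes: Sequence[float],
) -> float:
    """Inverse-probability-weighted difference in mean outcome (Treatment A minus B).

    Treatment A patients get weight 1/p; Treatment B patients get weight 1/(1 - p),
    where p is the estimated probability of receiving Treatment A.
    """
    sum_w_a = sum_wy_a = 0.0
    sum_w_b = sum_wy_b = 0.0
    for p, t, y in zip(scores, treated, outcomes):
        if not 0.0 < p < 1.0:
            raise ValueError("propensity scores must lie strictly between 0 and 1")
        if t:
            w = 1.0 / p
            sum_w_a += w
            sum_wy_a += w * y
        else:
            w = 1.0 / (1.0 - p)
            sum_w_b += w
            sum_wy_b += w * y
    if sum_w_a == 0.0 or sum_w_b == 0.0:
        raise ValueError("need at least one patient in each treatment group")
    # Weighted mean outcome in each group, then the difference.
    return sum_wy_a / sum_w_a - sum_wy_b / sum_w_b


if __name__ == "__main__":
    # Hypothetical data: four patients with known propensity scores.
    scores = [0.25, 0.75, 0.25, 0.75]
    treated = [True, True, False, False]
    outcomes = [1.0, 0.0, 1.0, 0.0]
    print(round(ipw_effect(scores, treated, outcomes), 6))  # 0.5
```

In this toy example the unweighted difference in means is 0, while the weighted difference is 0.5, because the weights up-weight patients whose treatment was unlikely given their score. Scores close to 0 or 1 produce very large weights, which is one reason the overlap problems discussed earlier matter for this option too.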