Matching Methods

Why match?

The Rubin causal model (1973, 1979) reminds us that we are always trying to get to the counterfactual: how would the world have turned out differently if one factor changed but everything else remained the same? This is the same as asking what Y values we would have seen under the same social system X, with one treatment applied. It points our attention to making causal inferences not from widely disparate cases that vary on many factors, but from comparable cases that are alike in every important regard except that some receive the treatment and some the control. If we have otherwise comparable treatment and control groups, then our observational studies begin to look a bit more like randomized experiments.

What is matching?

Matching algorithms (such as nearest neighbor, genetic, etc.) find control group cases that look like the treatment group cases (except, of course, that they didn’t get treated). You can then either downweight the “incomparable” control cases (for instance, weighting by their propensity score) or throw them out of your analysis entirely (“pre-processing” your dataset). Either way, you will “improve the balance” between your treatment and control groups, making them look more similar on other variables and thus approaching what your dataset would look like in a randomized experiment.

Advantages of Matching:

Removes outlying control cases that fall outside the range of variation within your treatment group, and are thus not relevant counterfactual cases. This is the problem of a “lack of common support.”

Reduces model dependence, because your results are now much less likely to vary by the assumptions that you make in order to put control variables into your multivariate model.
Even if you enter these variables into a later model, matching has broken the link between these variables and your treatment, which means that your estimated effect of the treatment is much more robust to (less likely to be biased by) the parametric choices that you make regarding these other variables.

Easy to interpret. Because researchers often simply match and then compare mean values of the DV in the treatment and control groups, this looks like an experiment and can be presented to a lay audience quite simply.

Limits of Matching:

A match is only as good as the covariates you match upon. It doesn’t solve the threat of omitted variables.

How to Match in Practice:

The easiest way now is to use R, call the Zelig package, and then use the MatchIt package (Gary King’s Shop, gking.harvard.edu/matchit) or genetic matching (Jas Sekhon’s program, http://sekhon.berkeley.edu/matching/).

    > library(foreign)
    > setwd("C:/Users/Thad/Documents/San Diego Files/Absentee Voters/San Diego 2008 Matching")
    > Matchtrd800ls <- read.dta("m300ls_trd800ls.dta")
    > dim(Matchtrd800ls)
    [1] 937  18
    > install.packages("MatchIt")   # one-time install; must run before library(MatchIt)
    > library(MatchIt)
    > names(Matchtrd800ls)
     [1] "fid.precin"    "consnum"       "consname"      "rv.totals"
     [5] "pvbm"          "pop2000"       "pcturban"      "pctblack"
     [9] "pcthisp"       "pctasian"      "pct18under"    "pct65pl"
    [13] "pctownerocchh" "mail"          "adjac"         "near.fid"
    [17] "near.dist"     "stratum"
    > m.out <- matchit(mail ~ rv.totals + pcturban + pctblack + pcthisp + pctasian + pct18under + pct65pl + pctownerocchh, data = Matchtrd800ls, method = "genetic", replace = FALSE)
    > summary(m.out)

    Call:
    matchit(formula = mail ~ rv.totals + pcturban + pctblack + pcthisp +
        pctasian + pct18under + pct65pl + pctownerocchh, data = Matchtrd800ls,
        method = "genetic", replace = FALSE)

    Summary of balance for all data:
                   Means Treated  Means Control  SD Control  Mean Diff  eQQ Med  eQQ Mean  eQQ Max
    distance               0.889          0.014       0.081      0.875    0.988     0.871    0.999
    rv.totals            159.214        603.254     139.474   -444.041  481.000   442.592  526.000
    pcturban              43.424         89.523      27.552    -46.099   48.225    45.726  100.000
    pctblack               3.765          4.032       6.329     -0.267    0.847     1.222   23.008
    pcthisp               20.561         19.760      18.800      0.801    3.281     3.315   14.859
    pctasian               4.909          7.581       8.779     -2.672    1.686     2.837   16.393
    pct18under            25.029         23.868       9.389      1.161    1.760     2.583   42.637
    pct65pl               13.744         13.534      11.160      0.211    1.635     2.116   34.746
    pctownerocchh         74.809         65.261      27.087      9.547    5.722    10.003   28.874

    Summary of balance for matched data:
                   Means Treated  Means Control  SD Control  Mean Diff  eQQ Med  eQQ Mean  eQQ Max
    distance               0.889          0.101       0.205      0.788    0.903     0.788    0.993
    rv.totals            159.214        453.505     163.975   -294.291  284.000   294.291  497.000
    pcturban              43.424         44.833      45.507     -1.409    0.000     1.631   18.395
    pctblack               3.765          3.777       6.885     -0.012    0.195     0.441    7.002
    pcthisp               20.561         18.288      17.393      2.273    2.610     3.036   20.665
    pctasian               4.909          4.381       6.333      0.528    0.325     0.770    8.840
    pct18under            25.029         25.662       7.271     -0.633    0.409     0.799    4.096
    pct65pl               13.744         13.656       9.376      0.089    0.859     1.033    3.796
    pctownerocchh         74.809         75.196      19.982     -0.388    0.878     1.546   13.915

    Percent Balance Improvement:
                   Mean Diff.  eQQ Med  eQQ Mean  eQQ Max
    distance            10.03    8.608     9.555    0.654
    rv.totals           33.72   40.956    33.507    5.513
    pcturban            96.94  100.000    96.434   81.605
    pctblack            95.58   76.973    63.901   69.566
    pcthisp           -183.61   20.476     8.405  -39.076
    pctasian            80.22   80.741    72.860   46.076
    pct18under          45.43   76.738    69.070   90.394
    pct65pl             57.99   47.451    51.168   89.074
    pctownerocchh       95.94   84.652    84.547   51.808

    Sample sizes:
               Control  Treated
    All            834      103
    Matched        103      103
    Unmatched      731        0
    Discarded        0        0

Here is how we save the matched data, and then turn it into a Stata file (with unmatched cases thrown out):

    > output_trd800ls <- match.data(m.out, subclass="subclass")
    > write.dta(output_trd800ls, file="C:/Users/Thad/Documents/San Diego Files/Absentee Voters/San Diego 2008 Matching/output_trd800ls.dta")

[Figure, two bar-chart panels with a vertical axis running from 0.0 to 100.0. Panel "Before Matching": VBM Precincts (Mean for 103 Precincts) vs. Traditional Precincts (Mean for 495 Precincts). Panel "After Matching": VBM Precincts (Mean for 101 Precincts) vs. Traditional Precincts (Mean for 101 Precincts).]

    Call:
    matchit(formula = mailblt ~ vap.whi.pct + vap.blk.pct + vap.asi.pct +
        vap.his.pct + vap.muloth.pct + reg.dem.pct + reg.rep.pct + reg.oth.pct +
        pct1824 + pct2534 + pct3544 + pct4554 + pct5564 + pct65pl + reg.male.pct +
        cdmargin + cdspend + admargin + adspend, data = data2000,
        method = "nearest", ratio = 3)

    matchit(formula = mailblt ~ vap.whi.pct + vap.blk.pct + vap.asi.pct +
        vap.his.pct + vap.muloth.pct + reg.dem.pct + reg.rep.pct + reg.oth.pct +
        pct1824 + pct2534 + pct3544 + pct4554 + pct5564 + pct65pl + reg.male.pct +
        cdmargin + cdspend + admargin + adspend, data = data2000,
        method = "nearest", ratio = 1)

What to Do After You Match:

Option #1: In the most commonly used approach, you compare levels of the DV in your treatment and control group either through an Average Treatment Effect (ATE) or
an Average Treatment Effect on the Treated (ATT). See works by Imbens, the definition in Ho et al., and the application in Kousser and Mullin. Guido Imbens has the “match” package for Stata to calculate these (http://www.economics.harvard.edu/faculty/imbens/software_imbens).

Option #2: In the approach recommended by Ho et al., you take your pre-processed dataset (the one that has thrown out all of the incomparable, unmatched cases) and then run multivariate models like you always would have done. If you don’t have perfect balance between the treatment and control groups (because you didn’t want to throw out too many cases, or because the world isn’t perfect), then estimating a model with controls is helpful. It is also less risky than it would have been without matching, because the specific parametric assumptions you make about the control variables will not dramatically affect your estimated treatment effect.

    regress turnpct mailblt vp_blk_p vp_s_pct vp_hs_pc vp_mlt_p rg_dm_pc rg_rp_pc rg_th_pc rg_ml_pc pct2534 pct3544 pct4554 pct5564 pct65pl cdmargin cdspend admargin adspend [weight=totreg]
    (analytic weights assumed)
    (sum of wgt is 2.9035e+06)

          Source |       SS       df       MS              Number of obs =    4588
    -------------+------------------------------           F( 18,  4569) =  554.94
           Model |  315209.548    18  17511.6415           Prob > F      =  0.0000
        Residual |  144179.132  4569  31.5559491           R-squared     =  0.6862
    -------------+------------------------------           Adj R-squared =  0.6849
           Total |  459388.679  4587  100.150137           Root MSE      =  5.6175

    ------------------------------------------------------------------------------
         turnpct |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
         mailblt |  -2.725629   .3794405    -7.18   0.000    -3.469515   -1.981742
        vp_blk_p |  -.2819971   .0122151   -23.09   0.000    -.3059447   -.2580496
        vp_s_pct |  -.0406101   .0104169    -3.90   0.000    -.0610323   -.0201879
        vp_hs_pc |  -.2364223   .0059486   -39.74   0.000    -.2480844   -.2247602
        vp_mlt_p |  -.9394277   .2440667    -3.85   0.000    -1.417916    -.460939
        rg_dm_pc |   .5404006   .0293499    18.41   0.000     .4828606    .5979406
        rg_rp_pc |   .4728693   .0289845    16.31   0.000     .4160456     .529693
        rg_th_pc |  -.5682599   .0582184    -9.76   0.000     -.682396   -.4541237
        rg_ml_pc |  -.1080187   .0163326    -6.61   0.000    -.1400384    -.075999
         pct2534 |  -.1708281    .022612    -7.55   0.000    -.2151585   -.1264977
         pct3544 |   .1431211   .0218624     6.55   0.000     .1002601     .185982
         pct4554 |   .3058852   .0235917    12.97   0.000      .259634    .3521364
         pct5564 |   .1913343   .0269946     7.09   0.000     .1384118    .2442568
         pct65pl |   .0522553   .0139891     3.74   0.000     .0248298    .0796808
        cdmargin |   .0293757    .006545     4.49   0.000     .0165443    .0422071
         cdspend |  -.0407063   .0463789    -0.88   0.380    -.1316314    .0502189
        admargin |   .0240662   .0073524     3.27   0.001      .009652    .0384803
         adspend |  -.0852189   .1133447    -0.75   0.452    -.3074293    .1369915
           _cons |   35.49562   2.913364    12.18   0.000     29.78402    41.20722
    ------------------------------------------------------------------------------

Tips:

Ho et al. usefully note that “matching” should really be called “pruning,” because you aren’t basing any of your analyses on how one observation in the treatment group is paired with one or more specific observations in the control group.

Jonathan Wand has a good reading list with links: http://www.stanford.edu/class/polisci353/2004winter/reading.html

Elizabeth Stuart has a great website with links to various matching programs in R and Stata: http://www.biostat.jhsph.edu/~estuart/propensityscoresoftware.html

“Synthetic controls” are matching meets case studies. Suppose you’ve got only one treated case, and no perfectly comparable control case.
You can use this approach to apply different weights to a number of control cases and then synthesize a control case that looks just like the treatment case on covariates.
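
The weighting idea can be written as a small optimization problem. What follows is a sketch of one common formulation, not something taken from this handout, and all of the notation is assumed for illustration: X_1 is the vector of covariates for the single treated case, X_0 is the matrix whose J columns hold the same covariates for the J control cases, w is the vector of weights, and Y_1 and Y_j are the outcomes of the treated case and of control case j.

```latex
% Choose nonnegative weights that sum to one so that the weighted
% combination of control cases is as close as possible to the treated case.
\[
  w^{*} \;=\; \arg\min_{w}\; \bigl\lVert X_1 - X_0\, w \bigr\rVert
  \quad \text{subject to} \quad
  w_j \ge 0 \;\; (j = 1, \dots, J), \qquad \sum_{j=1}^{J} w_j = 1 .
\]

% The synthetic control's outcome is the weighted average of the control
% outcomes; the estimated effect is the treated outcome minus that average.
\[
  \hat{\tau} \;=\; Y_1 \;-\; \sum_{j=1}^{J} w_j^{*}\, Y_j .
\]
```

Because the weights are nonnegative and sum to one, the synthetic case is a weighted average of real control cases and never extrapolates beyond them. Control cases that receive a weight of zero are effectively pruned, which is the same logic as matching.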