Blending Propensity Score Matching and Logistic Regression in

BLENDING PROPENSITY
SCORE MATCHING AND
LOGISTIC REGRESSION IN
SUPPORT SERVICE
EVALUATIONS
TERRENCE WILLETT, CRAIG HAYWARD, AND NATHAN PELLEGRIN
CAIR CONFERENCE
SAN DIEGO, NOVEMBER 2014
OUTCOMES
• Describe purpose of regression and propensity score
matching (PSM)
• Explain data requirements and procedures for regression
and PSM
• Compare and contrast regression and PSM
• Identify additional resources for further exploration
WHY CAUSAL INFERENCE
• If you need to use statistics, then you should design a
better experiment. –attributed to Rutherford
• Most education research is observational/correlational,
not experimental.
COMMON SCENARIO
Research question: Did participation in "intervention X" result
in better outcomes for students than would have happened
had they not participated?
• Students self-selected to participate and/or were recruited to
participate.
• Differences in outcomes between participants and non-participants can
reasonably be attributed to differences in background
variables or motivation.
• Can we determine if the participation caused a change in
outcomes?
• No, but…
REGRESSION
𝑦 = π‘Ž0 + π‘Ž1 π‘₯1 + π‘Ž2 π‘₯2 …+π‘Žπ‘› π‘₯𝑛 + 𝑒
• Classic correlational technique
• Covariates used in model to attempt to control for differences in
background variables or motivation
• Background variables can include measures of or proxies for skill
level, social capital, or socio-economic status
• Measures of self-motivation often unavailable
• Models are imperfect and generally must be combined with other
evidence to more completely describe the possible influence of an
intervention, program or strategy
PROPENSITY SCORE MATCHING
• One of several ways to create a matched comparison
group of non-participants intended to be similar to
participant group for a valid comparison
• Logistic regression or other techniques used to create a
score indicating the likelihood that a particular non-participant
would have been a participant based on
similarity to one or more participants
• Resolves issue of matching on many dimensions
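A minimal sketch of the scoring step, assuming a plain logistic regression fitted by gradient ascent. The covariates (prior GPA, low-income flag), the data values, and the function names are hypothetical and chosen only for illustration; in practice a statistical package would do the fitting.

```python
import math

def fit_logistic(X, t, lr=0.1, epochs=2000):
    """Fit P(T=1 | x) = sigmoid(w0 + w.x) by batch gradient ascent."""
    n, k = len(X), len(X[0])
    w = [0.0] * (k + 1)  # w[0] is the intercept
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for xi, ti in zip(X, t):
            z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
            p = 1.0 / (1.0 + math.exp(-z))
            err = ti - p  # gradient of the log-likelihood w.r.t. z
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj + lr * gj / n for wj, gj in zip(w, grad)]
    return w

def propensity(w, xi):
    """Estimated probability of participation for one student."""
    z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical covariates: [prior GPA, low-income flag]; t = 1 for participants
X = [[3.6, 0], [3.2, 1], [2.9, 1], [3.8, 0], [2.4, 1], [2.7, 0], [3.1, 0], [2.2, 1]]
t = [1, 1, 0, 1, 0, 1, 0, 0]

w = fit_logistic(X, t)
scores = [propensity(w, xi) for xi in X]
for xi, ti, s in zip(X, t, scores):
    print(xi, ti, round(s, 3))
```

Each student, participant or not, ends up with a single number between 0 and 1, so matching can be done on that one score instead of on every covariate at once.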
THE COUNTERFACTUAL
(POTENTIAL OUTCOMES) FRAMEWORK
FOR PSM
• Based on counterfactual theories of causation, which are
a set of conceptual tools for analyzing causal claims
• Originated in the early 1970s
• Introduced by philosopher David Lewis
• Subsequently taken up by scientists and extended
(statistician Donald Rubin at Harvard in the 1980s, and many others
since)
In the following slides I will:
1) Introduce some of the notation developed in this area
2) Explain the logic of PSM
3) Offer alternative criteria for deciding between PSM and regression
TO START…
What does it mean for a program to make a difference in someone's
outcome?
Easy:
$\Delta = Y^t - Y^c$
$Y^t$ is the potential outcome under treatment.
(Treatment = participation = choosing to do x = …)
$Y^c$ is the potential outcome under control (non-treatment).
However, there is a problem here…
For a participant, $Y^c$ is counterfactual. It is not observed in our universe*.
*Recently published evidence for inflationary theories of the cosmos suggests
we may be living in a multiverse (Alan Guth, 2014). Counterfactual conditions
may obtain in alternate universes. I am not going to address the possibility of
information transfer between alternate realities, but that possibility is being
taken seriously by some.
THE POTENTIAL OUTCOMES MATRIX
Actual Treatment    Potential Outcome
Status (T)          $Y^t$             $Y^c$
1                   observable        not observable
0                   not observable    observable
CAUSAL EFFECTS AT THE PROGRAM
AND POPULATION LEVELS
• Our focus is usually not on individuals, but just about
always on the aggregate effect – the average effect of
a program on the outcomes of groups of individuals or
populations.
AVERAGE TREATMENT EFFECT (ATE)
BRUTE FACTS:
• Different people respond differently to treatment (differential
response)
• Participants and non-participants differ systematically (w.r.t.
demographics, trajectories, risk profiles, self-selection, etc.)
• These facts must be taken into account when modeling/
computing treatment effects.
• This means all four cells of the matrix must be estimated in
order to obtain an average treatment effect!
• How?
• Here come the assumptions…
CONDITIONAL INDEPENDENCE
(PERFECT STRATIFICATION)
(SELECTION ON OBSERVABLES)
• If our observations include information on every one of
the variables influencing likelihood of participation or
differential responses*, then it is possible to avoid
omitted variable bias and so achieve CI (PS, SOO). And
if we have CI, this is like randomized assignment…
$E[Y^t \mid T = 1, \mathbf{X}] = E[Y^t \mid T = 0, \mathbf{X}]$
$E[Y^c \mid T = 1, \mathbf{X}] = E[Y^c \mid T = 0, \mathbf{X}]$
* How often do we encounter such datasets in institutional research?
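One way to see what conditional independence buys us is to write out how it fills the unobservable cell for participants. The short derivation below is a sketch. It also assumes common support (for every value of X found among participants there are non-participants with the same X), which the slide does not state explicitly.

```latex
\begin{align*}
E[Y^c \mid T = 1, \mathbf{X}]
  &= E[Y^c \mid T = 0, \mathbf{X}] && \text{(conditional independence)} \\
  &= E[Y \mid T = 0, \mathbf{X}]   && \text{(observed for non-participants)} \\
\text{ATT}
  &= E_{\mathbf{X} \mid T = 1}\bigl[\, E[Y \mid T = 1, \mathbf{X}] - E[Y \mid T = 0, \mathbf{X}] \,\bigr]
\end{align*}
```

Every term on the last line is a mean of observed outcomes, so under these assumptions the ATT can be estimated from the data at hand.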
SYMBOLIC DERIVATION OF
AVERAGE EFFECTS
ATE
(1) $\Delta = Y_i^t - Y_i^c$
(2) $Y_i = T_i Y_i^t + (1 - T_i) Y_i^c$
(3) $E[\Delta] = E[Y_i^t] - E[Y_i^c]$
(4) $E_N[Y_i \mid T_i = 1] - E_N[Y_i \mid T_i = 0]$
(5) $E[\Delta] = \rho E[Y_i^t \mid T_i = 1] + (1 - \rho) E[Y_i^t \mid T_i = 0] - \{\rho E[Y_i^c \mid T_i = 1] + (1 - \rho) E[Y_i^c \mid T_i = 0]\}$
($\rho$ is the proportion of the population that receives treatment.)
ATT
(6) $E[\Delta \mid T_i = 1] = E[Y_i^t \mid T_i = 1] - E[Y_i^c \mid T_i = 1]$
ATU
(7) $E[\Delta \mid T_i = 0] = E[Y_i^t \mid T_i = 0] - E[Y_i^c \mid T_i = 0]$
AND NOW WE PLAY THE MATCHING GAME.
ALSO KNOWN AS…
May I please borrow your outcome?
Actual Treatment    Potential Outcome
Status (T)          $Y^t$                $Y^c$
1                   Y(t), observable     not observable
0                   not observable       Y(c), observable
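"Borrowing an outcome" can be sketched as nearest-neighbor matching on the propensity score: each participant takes the observed outcome of the non-participant with the closest score as a stand-in for the missing Y(c). The version below matches one-to-one with replacement and no caliper, which is only one of several possible matching rules. The scores and outcomes are hypothetical.

```python
def match_att(treated, controls):
    """1-to-1 nearest-neighbor matching on the propensity score, with replacement.

    treated, controls: lists of (propensity_score, observed_outcome).
    Returns the ATT estimate and the list of matched pairs.
    """
    pairs = []
    for score_t, y_t in treated:
        # Borrow the outcome of the non-participant whose score is closest.
        score_c, y_c = min(controls, key=lambda c: abs(c[0] - score_t))
        pairs.append(((score_t, y_t), (score_c, y_c)))
    att = sum(t[1] - c[1] for t, c in pairs) / len(pairs)
    return att, pairs

# Hypothetical scores and outcomes (1 = succeeded, 0 = did not)
treated = [(0.80, 1), (0.65, 1), (0.40, 0)]
controls = [(0.78, 1), (0.60, 0), (0.45, 0), (0.20, 0), (0.10, 1)]

att, pairs = match_att(treated, controls)
for t_unit, c_unit in pairs:
    print("participant", t_unit, "borrows from", c_unit)
print("ATT estimate:", round(att, 3))
```

Non-participants with scores of 0.20 and 0.10 are never used here, which illustrates why PSM may analyze only a subset of the non-participant pool.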
ESTIMATING TREATMENT EFFECTS
Actual Treatment    Potential Outcome
Status (T)          $Y^t$    $Y^c$
1                   A        B
0                   C        D
$\text{ATE} = [\rho A + (1 - \rho) C] - [\rho B + (1 - \rho) D]$, where $\rho$ is the proportion treated, as in (5)
ATT = (A - B)
ATU = (C - D)
AVERAGE TREATMENT EFFECT (ATE)
• PSM offers a way to plug values into (5), (6) and (7) to
obtain unbiased estimates.
• However, if we have CI, then why not just use something
like…
$Y_i = \alpha + \gamma T_i + \boldsymbol{\beta}' \mathbf{X}_i + \varepsilon_i$ ?
(and many other regression techniques)
There is no hard and fast answer to this question. Decisions
are based on pragmatic considerations.
However, we may revisit this question later…
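The regression alternative can be sketched with ordinary least squares, where the coefficient on the treatment indicator plays the role of the treatment effect. The example below solves the normal equations directly so it needs no external library. The data are hypothetical and built without noise so that the known effect (0.5) is recovered; real analyses would use a statistics package and, for binary outcomes, often a logistic model instead.

```python
def ols(rows, y):
    """Least-squares coefficients via the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * p for v, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]

# Hypothetical data: outcome constructed as 1.0 + 0.5*T + 2.0*x, no noise
T = [1, 1, 1, 0, 0, 0]
x = [3.0, 2.5, 3.5, 2.0, 3.0, 1.5]
y = [1.0 + 0.5 * t + 2.0 * xi for t, xi in zip(T, x)]

rows = [[1.0, t, xi] for t, xi in zip(T, x)]  # columns: intercept, T, x
alpha, gamma, beta = ols(rows, y)
print("alpha, gamma, beta =", round(alpha, 6), round(gamma, 6), round(beta, 6))
```

Note that this model assumes one common effect (gamma) for everyone and a linear relationship with x. Matching makes neither assumption, which is one of the pragmatic considerations in choosing between the two.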
EXAMPLE
PROS AND CONS
• Regressions can be "easier" to run but harder to explain to a general audience
• PSM can be more time consuming to conduct but easier to explain to a general
audience
• Regressions tend to perform better with large data sets while PSM tends to
perform better with few observations provided the non-participant group has
sufficient numbers of individuals with the key confounding variables
• Regressions have been used for many years and are well described
mathematically with broad consensus on proper error terms
• PSM is newer and there is not consensus on optimal matching procedures or
proper error terms
• Regression will use all cases with non-missing data while PSM may use only a subset
of cases from the pool of non-participants
• All analytic methods suffer if key variables are not available
• Conclusions can often be the same with either method
HOW TO RUN PSM
• Create data file (95% of effort)
• Match participants and non-participants on a set of control variables to create a
comparison group with similar proportions on all characteristics (i.e. comparison
group would have a similar percent female, Hispanic, low income, etc. as
compared to the participant group)
• This step is referred to as "balancing" and generally must be repeated several times to
obtain balance on all variables of interest either by adjusting matching criteria or removing
variables
• Run comparative analyses, which can include simple t-tests, post-PSM regressions,
or other techniques
• Major packages that conduct PSM include STATA, R, and SAS
• STATA version 12 and older have psmatch2, v13 has teffects psmatch
• http://www.ssc.wisc.edu/sscc/pubs/stata_psmatch.htm compares the two packages
• Note SPSS/PASW does not do PSM directly but there is an R plugin for SPSS
• http://arxiv.org/ftp/arxiv/papers/1201/1201.6385.pdf
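The balancing step is often checked with a standardized mean difference (SMD) for each control variable, computed before and after matching. The sketch below uses a pooled standard deviation. The data are hypothetical, and the 0.1 threshold mentioned afterwards is a commonly cited rule of thumb rather than anything specified in these slides.

```python
import math

def std_mean_diff(treated_vals, comparison_vals):
    """Standardized mean difference using the pooled standard deviation."""
    def mean(v):
        return sum(v) / len(v)

    def var(v):
        m = mean(v)
        return sum((x - m) ** 2 for x in v) / (len(v) - 1)

    diff = mean(treated_vals) - mean(comparison_vals)
    pooled_sd = math.sqrt((var(treated_vals) + var(comparison_vals)) / 2)
    if pooled_sd == 0:
        # No variation in either group: balanced only if the means agree.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / pooled_sd

# Hypothetical low-income flags (1 = low income)
participants = [1, 1, 0, 1, 0, 1]
all_nonparticipants = [0, 0, 1, 0, 0, 0, 1, 0, 0, 0]
matched_comparison = [1, 0, 1, 1, 0, 1]

before = std_mean_diff(participants, all_nonparticipants)
after = std_mean_diff(participants, matched_comparison)
print("SMD before matching:", round(before, 2))
print("SMD after matching:", round(after, 2))
```

If the absolute SMD stays above roughly 0.1 for a variable after matching, that is usually taken as a sign to adjust the matching criteria and try again.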
ALTERNATIVE PERSPECTIVES
• A criterion that can be applied to regression and PSM:
how do they perform at predicting new observations?
(false positives, false negatives, etc.)
• Regression and PSM methods can both be used as tools
of discovery
• SO: CHOOSE THE METHOD/MODEL WHICH YIELDS THE
SMALLEST PREDICTION ERROR.
• That may decide a battle in a particular setting (or
occasion), but the war between methods will go on…
FURTHER READING
• Angrist, J. D., & Pischke, J. (2008). Mostly Harmless Econometrics: An Empiricist's
Companion.
• Morgan, S., & Harding, D. (2006). Matching Estimators of Causal Effects: From
Stratification and Weighting to Practical Data Analysis Routines.
• Caliendo, M., & Kopeinig, S. (2005). Practical Guide for PSM.
www.caliendo.de/Papers/practical_revised_DP.pdf
• Rosenbaum, P. R., & Rubin, D. B. (1983). The central role of the propensity score in
observational studies for causal effects. Biometrika, 70, 41-55.
• Padgett, R.; Salisbury, M.; An, B.; & Pascarella, E. (2010). Required, Practical, or
Unnecessary? An Examination and Demonstration of Propensity Score Matching Using
Longitudinal Secondary Data. New Directions for Institutional Research - Assessment
Supplement (pp. 29-42). San Francisco, CA: Jossey-Bass.
• Soledad Cepeda, M.; Boston, R.; Farrar, J.; & Strom, B. (2003). Comparison of Logistic
Regression versus Propensity Score When the Number of Events Is Low and There Are
Multiple Confounders. American Journal of Epidemiology, 158, 280-287.
• http://www.biostat.jhsph.edu/~estuart/propensityscoresoftware.html
THANK YOU
Terrence Willett
Director of Planning, Research,
and Knowledge Systems
Cabrillo College
terrence@cabrillo.edu
Craig Hayward
Director of Planning, Research,
and Accreditation
Irvine Valley College
chayward@ivc.edu
Nathan Pellegrin
Director of Institutional Research
Peralta District
npellegrin@peralta.edu