Responses re: causal inference

Responses to Concerns about Causal Inference for
Thomas Dietz, Kenneth A. Frank, Cameron Whitley, Jennifer Kelly, Rachel Kelly. Political
Influences on Greenhouse Gas Emissions from U.S. States. Proceedings of the National
Academy of Sciences. June 15, 2015.
Some reporters raised concerns about the causal inferences we made from Table 1.
Here are some responses:
By analogy: do you trust your car to accelerate?
We can make the case for causal inference by the following analogy.
You are driving and come to an intersection with a traffic light where you want to make a left
turn. You wait for traffic to clear, step on the accelerator, and then go through. Could you have
proved that the car would react and accelerate uniformly quickly when you stepped on the
accelerator? Lots of things could have happened: it could have backfired (I had an old car that
did this to me), the accelerator pedal may have stuck, you could have run out of gas. And yet the
car’s response is consistent enough to warrant your inference that you will get through the
intersection – the evidence, while not proof, was actionable. And how many times would it have
to fail for you to change your mind and never rely on the accelerator again?
Just so, we believe our evidence is strong enough to be actionable, to be the basis of the
inference that environmentalism reduces CO2 emissions: 44% of the estimate would have to be
due to bias to invalidate the inference. We might be wrong, but there would have to be a lot of
misfires to reduce the evidence below our threshold for inference.
Cross-sectional analysis.
The first basis of our inference was a state-by-state comparison for 1990. This seems to be the
one to which your reader is responding below. We acknowledge that there could be an alternative
explanation, such as fossil fuel production in the state. But we note that, to invalidate our
inference, the alternative explanation would have to account for 44% of the estimated effect of
environmentalism on CO2 emissions, and that is net of the other factors we controlled for
(population, employment, and gross state product). The strongest alternative explanation we
found was % women in the legislature, which accounted for about a 15% change in our
estimated effect. So the fossil fuel explanation would have to be about 2.5 times stronger
than the strongest alternative explanation we could find and measure to invalidate our inference.
So we can acknowledge that some of our estimate may be due to alternative factors, but still make
the inference that environmentalism reduces CO2 emissions.
spreadsheet for calculating indices [KonFound-it!]
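The percentage quoted above can be illustrated with a short sketch. It assumes the usual formulation of this kind of sensitivity index: the threshold for inference is the critical t value times the standard error, and the share of the estimate that must be due to bias is 1 minus threshold/|estimate|. The function names, the estimate, the standard error, and the critical value are all hypothetical, chosen only so the result lands near 44%. They are not the values from Table 1.

```python
def percent_bias_to_invalidate(estimate, std_error, t_critical=1.96):
    """Share of an estimate that must be due to bias to fall to the inference threshold."""
    # Smallest absolute estimate that would still count as statistically significant.
    threshold = t_critical * std_error
    if abs(estimate) <= threshold:
        raise ValueError("estimate does not exceed the threshold for inference")
    return 1.0 - threshold / abs(estimate)


def relative_strength_needed(percent_bias, strongest_observed_change):
    """How many times stronger than the strongest measured covariate a rival must be."""
    return percent_bias / strongest_observed_change


# Hypothetical inputs (NOT from the paper), picked to reproduce a share near 44%.
example_share = percent_bias_to_invalidate(estimate=-1.0, std_error=0.28, t_critical=2.0)

# Rounded figures quoted in the text: 44% required, 15% from % women in the legislature.
example_ratio = relative_strength_needed(0.44, 0.15)

print(f"{example_share:.0%} of the estimate would have to be due to bias")
print(f"required strength relative to strongest measured covariate: {example_ratio:.1f}")
```

The second function simply divides the required share by the share attributable to the strongest measured covariate. With the rounded figures quoted in the text it returns a value between 2.5 and 3.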
The more sophisticated techniques referred to by your reader that one might consider are
propensity score matching and instrumental variables. Regarding propensity score matching (in
which you first model assignment to a treatment, such as environmentalism, and then account for
that assignment in a model): it all boils down to what you controlled for (Heckman, 2005; Morgan &
Harding, 2006, page 40; Rosenbaum, 2002, page 297; Shadish et al., 2002, page
164). Regarding instrumental variables (in which you first model assignment to treatment and
then use the predicted value of assignment in a model, so that the predicted value serves as an
alternative measure of the predictor): they only work well when there is a strong instrument;
otherwise estimates can be severely inconsistent and standard errors very large (see Wooldridge,
2002, page 102). We could not identify any such instrument in our data.
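The point about weak instruments can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not an analysis of our data: the data-generating process, the first-stage coefficients (1.0 for a "strong" instrument, 0.05 for a "weak" one), the sample size, and all function names are hypothetical. It uses the simple one-instrument IV estimator, cov(z, y) / cov(z, x).

```python
import random
import statistics


def iv_estimate(z, x, y):
    """Simple IV estimator with a single instrument: cov(z, y) / cov(z, x)."""
    mz, mx, my = statistics.fmean(z), statistics.fmean(x), statistics.fmean(y)
    cov_zy = sum((a - mz) * (b - my) for a, b in zip(z, y))
    cov_zx = sum((a - mz) * (b - mx) for a, b in zip(z, x))
    return cov_zy / cov_zx


def simulate_iv_errors(first_stage, true_effect=1.0, n=200, reps=200, seed=0):
    """Absolute errors of the IV estimate across repeated simulated samples."""
    rng = random.Random(seed)
    errors = []
    for _ in range(reps):
        z = [rng.gauss(0, 1) for _ in range(n)]  # instrument
        u = [rng.gauss(0, 1) for _ in range(n)]  # unobserved confounder
        # The predictor depends on the instrument (first stage) and on the confounder.
        x = [first_stage * zi + ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]
        # The outcome depends on the predictor and on the same confounder.
        y = [true_effect * xi + ui + rng.gauss(0, 1) for xi, ui in zip(x, u)]
        errors.append(abs(iv_estimate(z, x, y) - true_effect))
    return errors


strong = statistics.median(simulate_iv_errors(first_stage=1.0))
weak = statistics.median(simulate_iv_errors(first_stage=0.05))
print(f"median absolute error, strong instrument: {strong:.2f}")
print(f"median absolute error, weak instrument:   {weak:.2f}")
```

In this setup the weak-instrument estimates are typically far from the true effect, while the strong-instrument estimates cluster close to it.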
Moreover, there is very recent work in education and the social sciences that compares the results
of randomized experiments with those of observational (correlational) studies by randomly
assigning subjects to one or the other type of study. These studies conclude that the choice
of covariates is critical (propensity scores work about the same as the regression-based
techniques that we used, but instrumental variables perform very poorly). See references below.
Longitudinal analysis.
The literature I have cited below all points to the fact that covariates are
important, and the best covariates are measures of the outcome measured at previous points in
time which allow you to model change in an outcome over time. True, our analysis of the effect
of environmentalism at 1990 is cross-sectional, but the causal inference in the paper is also based
on a longitudinal analysis in which we are modeling change over time in CO2 emissions for each
state. This is the level 1 analysis in Table 1. And in this analysis environmentalism again
has a strong effect, reducing the change in CO2 output over time. Now the alternative
explanation of fossil fuel production is more difficult to make. In this context we would accept
that having fossil fuel plants may quite likely create increases in CO2 over time – but that a
strong environmentalism orientation reduces this tendency, or any other tendency the state has in
its trajectory that was initiated in 1990. Moreover, this inference is even more robust to
alternative explanations than the cross-sectional analysis. 60% of the estimate would have to be
due to bias to invalidate the inference of an effect of environmentalism on change in CO2 over
time (we did not report this sensitivity analysis in the paper, choosing the more conservative but
still persuasive sensitivity analysis from the cross-sectional data in 1990, which we thought
would also be more intuitive). This 60% is also healthy enough to cover some of the concerns
about the measurement of our variables.
spreadsheet for calculating indices [KonFound-it!]
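The claim that earlier measures of the outcome are the most valuable covariates can be sketched with simulated data. Everything here is a hypothetical illustration, not our model or our data: units with a higher prior outcome are more likely to be "treated", the true effect is set to -0.5, and the adjustment is done by partialling the prior outcome out of both treatment and outcome (the Frisch-Waugh-Lovell approach to a two-predictor regression).

```python
import random
import statistics


def slope(x, y):
    """OLS slope of y on x, with an intercept."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def residuals(x, y):
    """Residuals from regressing y on x, with an intercept."""
    b = slope(x, y)
    mx, my = statistics.fmean(x), statistics.fmean(y)
    return [(yi - my) - b * (xi - mx) for xi, yi in zip(x, y)]


rng = random.Random(1)
n, true_effect = 5000, -0.5  # hypothetical sample size and effect

pretest = [rng.gauss(0, 1) for _ in range(n)]  # outcome measured at an earlier time
# Selection: a higher prior outcome makes treatment more likely.
treated = [1.0 if p + rng.gauss(0, 1) > 0 else 0.0 for p in pretest]
outcome = [0.8 * p + true_effect * t + rng.gauss(0, 1)
           for p, t in zip(pretest, treated)]

# Naive estimate: difference in mean outcomes, ignoring the prior outcome.
naive = slope(treated, outcome)
# Adjusted estimate: coefficient on treatment after controlling for the prior outcome.
adjusted = slope(residuals(pretest, treated), residuals(pretest, outcome))

print(f"true {true_effect:.2f}, naive {naive:.2f}, pretest-adjusted {adjusted:.2f}")
```

In this sketch the naive comparison is badly biased by selection, while controlling for the prior outcome recovers an estimate close to the true effect.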
So the upshot is that we acknowledge there could be alternative explanations to our findings, but
those explanations would have to be extremely powerful, far more so than anything we could
find in our data, to invalidate our inference.
On the importance of covariates
The quote below is from one of the preeminent social scientists working on causal inference,
Thomas Cook:
Results are similar across the samples of studies reviewed with their wide range of nonexperimental designs and topic areas. Covariate choice counts most, unreliability next most, and
the mode of data analysis hardly matters at all. Unreliability has larger effects the more important
a covariate is for bias reduction, but even so the very best covariates measured with a reliability
of only .60 still do better than substantively poor covariates that are measured perfectly. Why
regression methods do as well as propensity score methods used in several different ways is a
mystery still because, in theory, propensity scores would seem to have a distinct advantage in
many practical applications, especially those where functional forms are in doubt.
Abstract from: Cook, T. D., P. Steiner, and S. Pohl. 2010. Assessing how bias reduction is
influenced by covariate choice, unreliability and data analytic mode: An analysis of different
kinds of within-study comparisons in different substantive domains. Multivariate Behavioral
Research 44(6): 828-47.
Other supporting evidence.
Berk R (2005) Randomized experiments as the bronze standard. J Exp Criminol 1:417–433
Berk R, Barnes G, Ahlman L, Kurtz E (2010) When second best is good enough: a comparison
between a true experiment and a regression discontinuity quasi-experiment. J Exp Criminol
6:191–208. “the results from the two approaches are effectively identical” (page 191).
Pohl, S., Steiner, P. M., Eisermann, J., Soellner, R., & Cook, T. D. (2009). Unbiased causal
inference from an observational study: Results of a within-study comparison. Educational
Evaluation and Policy Analysis, 31(4), 463-479.
Concato, J., Shah, N., & Horwitz, R. I. (2000). Randomized, controlled trials, observational
studies, and the hierarchy of research designs. New England Journal of Medicine, 342(25),
1887-1889.
Cook, T. D., Shadish, W. R., & Wong, V. A. (2008). Three conditions under which experiments and
observational studies produce comparable causal estimates: New findings from within-study
comparisons. Journal of Policy Analysis and Management, 27(4), 724–750.
Shadish, W. R., Clark, M. H., & Steiner, P. M. (2008). Can nonrandomized experiments yield
accurate answers? A randomized experiment comparing random to nonrandom assignment.
Journal of the American Statistical Association, 103(484), 1334-1344.
Steiner, Peter M., Thomas D. Cook & William R. Shadish (in press). On the importance of
reliable covariate measurement in selection bias adjustments using propensity scores. Journal of
Educational and Behavioral Statistics.
Steiner, Peter M., Thomas D. Cook, William R. Shadish & M.H. Clark (2010). The importance
of covariate selection in controlling for selection bias in observational studies. Psychological
Methods, 15(3), 250-267. (Note: more than just pretest.)
Kane, T., & Staiger, D. (2008). Estimating Teacher Impacts on Student Achievement: An
Experimental Evaluation. NBER working paper 14607.
Bifulco, Robert. "Can Nonexperimental Estimates Replicate Estimates Based on Random
Assignment in Evaluations of School Choice? A Within‐Study Comparison." Journal of Policy
Analysis and Management 31, no. 3 (2012): 729-751.
Reports 64% to 96% reduced with pre-test
Other references:
Heckman, J. (2005). The Scientific Model of Causality. Sociological Methodology, 35, 1-99.
Morgan, S. L. & Harding, D. J. (2006). Matching estimators of causal effects: Prospects and
pitfalls in theory and practice. Sociological Methods and Research 35, 3-60.
Rosenbaum, P. 2002. Observational Studies. New York: Springer.
Shadish, W. R., Cook, T. D., & Campbell, D. T. (2002). Experimental and Quasi-Experimental
Designs for Generalized Causal Inference. Boston, MA: Houghton Mifflin.
Wooldridge, J. M. (2002). Econometric Analysis of Cross Section and Panel Data. Cambridge,
MA: MIT Press.