Chapter 3: Empirical Tools of Public Finance

Introduction

Empirical public finance is the use of data and statistical methodologies to measure the impact of government policy on individuals and markets. For example, it helps us figure out the magnitude of the labor supply response to a TANF benefit cut.

A key issue in empirical public finance is separating causation from correlation. Correlated means that two economic variables move together. Causal means that one of the variables is causing the movement in the other. This lesson overviews the kinds of methods economists rely on to learn about the causal effects of government policy.

DISTINCTION BETWEEN CORRELATION AND CAUSATION

There are many examples where causation and correlation get confused. For government policy it is critical to understand the difference; otherwise policy may not have the intended impact.

One interesting example involves Russian peasants during a cholera epidemic. The government sent doctors to the worst-affected areas to help. The peasants observed that areas with many doctors also had a lot of cholera, concluded that the doctors were making things worse, and murdered the doctors.

Another example concerns SAT preparation courses. In 1988, Harvard surveyed its freshmen and found that those who took SAT "coaching" courses scored 63 points lower than those who did not. One dean concluded that the SAT courses were unhelpful and that "the coaching industry is playing on parental anxiety."

The Problem

In both examples, there is a common problem: a correlation is interpreted as a causal relationship without sufficient thought about the underlying data-generating process. For any correlation between two variables A and B, there are three possible explanations:
1. A is causing B.
2. B is causing A.
3. Some other factor (or set of factors) is causing both A and B.

Ex. 1: In the case of the Russian peasants, the possibilities are:
1. Doctors cause peasants to die from cholera through incompetent treatment (more doctors -> more peasant deaths).
2. A higher incidence of death causes more doctors to be present (more peasant deaths -> more doctors).
The peasants thought possibility (1) was correct.

Ex. 2: In the Harvard SAT case, the possibilities are:
1. SAT prep courses worsen preparation for the SAT.
2. Students with weaker test-taking ability take prep courses to try to catch up (poor performers -> take SAT prep).
3. Students who are generally nervous both tend to take prep courses and do worse on standardized exams (nervousness -> SAT prep; nervousness -> poor performance).
The Harvard dean thought possibility (1) was correct.

Although the peasants or the Harvard dean could actually be correct, the odds are that they were misinterpreting the underlying process at work. For policy purposes, what we care about is causation: knowing that two factors are correlated tells us nothing, by itself, about what would happen if the government intervened to change one of them.
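To make the third explanation concrete in the SAT example, here is a short simulation with entirely invented numbers: an unobserved factor ("anxiety") drives both prep-course enrollment and SAT scores, the prep course has zero causal effect, and yet prep-takers score lower on average.

```python
# Illustrative simulation (all numbers invented): a third factor, "anxiety",
# drives both SAT-prep enrollment and SAT scores. Prep has ZERO causal effect
# here, yet prep-takers score lower on average: correlation without causation.
import numpy as np

rng = np.random.default_rng(42)
n = 10_000
anxiety = rng.normal(0, 1, n)                        # unobserved confounder
took_prep = (anxiety + rng.normal(0, 1, n)) > 0.5    # anxious students sign up more
score = 1200 - 40 * anxiety + rng.normal(0, 60, n)   # anxiety hurts scores; prep does not appear

gap = score[took_prep].mean() - score[~took_prep].mean()
print(f"average score gap (prep minus no-prep): {gap:.0f} points")
# The gap is negative even though taking the course changes nothing in this model.
```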
MEASURING CAUSATION WITH THE DATA WE'D LIKE TO HAVE: RANDOMIZED TRIALS

The "gold standard" of causality is a randomized trial. The trial proceeds by taking a group of volunteers and randomly assigning them either to a "treatment" group that receives the intervention or to a "control" (comparison) group that is denied the intervention. With random assignment, who gets the intervention is not determined by anything about the subjects. As a result, the treatment group is identical to the control group in every facet but one: the treatment group receives the intervention.

In the SAT example:
o The "treatment" group members are those who took the coaching course.
o The "control" group members are those who did not.
In the Russian peasant example:
o The "treatment" group consists of communities where doctors were assigned.
o The "control" group consists of communities where doctors were not assigned.

The Problem of Bias

In both cases, the assignment of the intervention was not random, so the treatment and control groups are not identical. Non-random assignment, in turn, can cause bias. Bias represents any source of difference between treatment and control groups that is correlated with the treatment but is not due to the treatment.
o In the SAT example, the estimated impact of SAT courses is biased by the fact that those who take the prep course are likely to do worse on the SAT for other reasons.
o In the Russian peasant example, the estimates are biased by the fact that the government assigned doctors to the worst-off communities.
By definition, such differences do not exist in a randomized trial, since the groups do not differ in any systematic fashion. As a result, randomized trials have no bias, and it is for this reason that they are the "gold standard" for empirically estimating causal effects.

Randomized Trials in the TANF Context

In the last lesson, we learned that economic theory predicts an increase in labor supply when TANF benefits are cut, but the magnitude of the effect is unclear. One could design a randomized trial to learn about the elasticity of employment with respect to TANF benefits. Imagine that a large group (say, 2,000) of single mothers is randomly assigned to one of two groups by a coin flip:
o The "control" group continues to receive a guarantee of $5,000.
o The "treatment" group has its TANF benefit cut to $3,000.
We then follow both groups for a period of time and measure their work effort. In an experiment like this in California in 1992:
o The elasticity of employment with respect to welfare benefits was estimated to be -0.67.
o Thus, a 10% decrease in benefits resulted in a 6.7% increase in employment.
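The logic of this design can be sketched in a few lines of code. The simulation below uses made-up numbers (a true effect of 300 hours and an unobserved "taste for work"); the point is only that with coin-flip assignment, the simple difference in mean hours recovers the causal effect.

```python
# Minimal sketch of a randomized trial, with invented numbers:
# 2,000 single mothers are assigned by coin flip to a benefit cut,
# and the treatment effect is estimated by a difference in mean hours.
import numpy as np

rng = np.random.default_rng(1)
n = 2000
taste_for_work = rng.normal(500, 150, n)           # unobserved heterogeneity
treated = rng.integers(0, 2, n).astype(bool)       # coin flip: guarantee cut to $3,000

# Hypothetical response: the benefit cut raises hours by 300 on average
hours = taste_for_work + np.where(treated, 300, 0) + rng.normal(0, 100, n)

effect = hours[treated].mean() - hours[~treated].mean()
print(f"estimated treatment effect: {effect:.0f} hours")   # close to 300

# Because assignment is random, taste_for_work is balanced across the two
# groups, so the difference in means is an unbiased estimate of the effect.
```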
Why We Need to Go Beyond Randomized Trials

Randomized trials present some problems:
1. They can be expensive.
2. They can take a long time to complete.
3. They may raise ethical issues (especially in the context of medical treatments).
4. The inferences from them may not generalize to the population as a whole (they are valid only for the people who volunteered).
5. Subjects may drop out of the experiment for non-random reasons, a problem known as attrition.
For these reasons (especially the expense), economists often take different approaches to assess causal relationships in empirical research.

ESTIMATING CAUSATION WITH THE DATA WE ACTUALLY GET: OBSERVATIONAL DATA

Often researchers are faced with observational data: data generated from individual behavior observed in the real world. For example, the data from the SAT example would consist of information on which Harvard freshmen took the coaching course, along with their SAT scores. The goal of this section is to review the kinds of approaches researchers use to estimate causal effects with data like this. There are five main approaches:
a. Time series analysis
b. Cross-sectional regression analysis
c. Quasi-experiments
d. Structural modeling
e. Panel data

Time Series Analysis

Time series analysis documents the correlation between the variables of interest over time. For example, we could gather data over time on the TANF income guarantee and compare it to the labor supply of single mothers over the same period. Figure 1 illustrates these trends. Figure 1 reveals that real benefits have declined dramatically over time while average hours of work have risen substantially. This apparently supports the theory that TANF benefit cuts should increase labor supply.

Problems:
1. Two sub-periods (1968-1976 and 1978-1983) show zero or even negative effects on labor supply.
o This highlights the difficulty that when a variable follows a slow-moving trend (the benefit decline), it is very hard to infer its causal effect on another variable.
2. There are many other potential explanations for the changes, such as:
o Greater acceptance of women in the workplace.
o Better child care options.
o Changes in social norms about working.
o Other government programs, such as the Earned Income Tax Credit (EITC), a federal wage subsidy for low-income workers.
o Economic growth.
Thus, in many cases time series analysis is not all that useful. But when there are sharp changes in a policy variable over time, there may be some room for valid inference, for example:
o The cigarette price war of April 1993.
o The tobacco settlement of 1998.
Figure 2 illustrates these trends.

Cross-Sectional Regression Analysis

Cross-sectional regression analysis is a statistical method for assessing the relationship between two variables while holding other factors constant. "Cross-sectional" means comparing many individuals at one point in time.

Bivariate regression is a means of quantifying the extent to which two series covary. Figure 3 shows this with TANF benefits and labor supply for two individuals.
[Figure 3: labor supply (hours per year) plotted against TANF benefits received.]
One person works zero hours and receives $5,000 in benefits. The other works 90 hours and, with a wage of $10 per hour and a 50% benefit reduction rate, receives $5,000 - 90 x $10 x 0.5 = $4,550. The downward-sloping line through the two points shows the negative correlation between benefits and labor supply. The slope of the line is ∆L/∆G = 90/(-450) = -0.2, meaning that each $1 reduction in TANF benefits is associated with an additional 0.2 hours of work.

Regression analysis takes the correlation further by finding the line that best fits the data. It describes the relationship between the dependent variable (in this case, labor supply) and the independent variable (in this case, TANF benefits). With two data points the line fits perfectly; in real data there are many more individuals.

For example, the Current Population Survey (CPS) collects information on sources of income, hours of work, and health insurance. Figure 4 graphs hours of work per year (y-axis) against TANF benefits (x-axis) for all single mothers in the CPS data set. The "best-fitting" line has a slope of -110. We can convert this slope into an elasticity:
o With the specification L = a + b ln(G), the slope is b = dL/d ln G = ∆L/%∆G.
o Dividing by average hours gives b/L = %∆L/%∆G, which is the elasticity.
o Average work effort is L = 748 hours per year and b = -110, so the elasticity is -110/748 = -0.147.
o Thus a 100% rise in TANF benefits is associated with roughly a 15% reduction in hours worked: the elasticity of work with respect to benefits is about -0.15, a fairly inelastic response.

Technical
This line corresponds to the regression
HOURS_i = α + β TANF_i + ε_i,
where there is one observation for each mother i. In the regression, α is the constant term, β is the slope coefficient, and ε is the error term. ε represents, for each observation, the difference between its actual value and the value predicted by the regression line.
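As an illustration of this regression and of the elasticity conversion b/L, here is a short sketch on synthetic data. The coefficients are chosen only to mimic the numbers quoted above (a slope of about -110 and mean hours near 748); they are not estimates from the CPS.

```python
# Sketch of the bivariate regression and elasticity conversion, on synthetic
# data built to resemble the numbers in the text (slope about -110).
import numpy as np

rng = np.random.default_rng(2)
n = 5000
benefits = rng.uniform(2000, 8000, n)                  # TANF guarantee, dollars/year
hours = 1680 - 110 * np.log(benefits) + rng.normal(0, 150, n)

# OLS of hours on ln(benefits): L_i = a + b*ln(G_i) + e_i
b, a = np.polyfit(np.log(benefits), hours, 1)

elasticity = b / hours.mean()        # b = dL/dlnG, so b / mean(L) = %dL / %dG
print(f"slope b = {b:.0f}, elasticity = {elasticity:.2f}")   # roughly -110 and -0.15
```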
Interpreting the results of this regression is potentially problematic:
1. One interpretation is that higher TANF benefits "cause" lower labor supply.
2. Another interpretation is that single mothers with a greater "taste" for leisure end up with higher TANF benefits because of the benefit reduction rate:
- These mothers would not work much even if TANF were not available.
- Mothers who work automatically receive lower benefits.
- In this case, a higher taste for leisure drives both lower hours and higher TANF benefits, and not the other way around.
Figure 3, for example, showed that the relationship may not be causal; instead, preferences may simply differ across individuals. This is less obvious in Figure 4, since we do not know the underlying utility functions of the CPS respondents.

One advantage of regression analysis is the ability to include control variables: other independent variables (in addition to TANF benefits) that may affect the dependent variable. Control variables are included to account for differences between treatment and control groups that can lead to bias.
- For example, we might be able to include a control variable for "taste for leisure."
- Including such control variables reduces the systematic differences between the groups being compared.
In reality, however, the kinds of control variables available in typical data sets like the CPS are crude at best: "taste for leisure" might be proxied with education, number of children, age, and so on.

Technical
Adding control variables changes the regression to
HOURS_i = α + β TANF_i + δ CONTROL_i + ε_i,
where δ captures the effect of the control variables, which account for factors such as race, education, age, and location. As the Appendix shows, many of these control variables are "yes/no" indicator variables.

In most applications, including this one, it is unlikely that control variables will ever completely solve the problem, so it is difficult to remove the bias entirely.
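To show what adding controls can (and cannot) do, here is a sketch on synthetic data; the variable names (educ, kids) and every coefficient are invented for illustration. A confounder (number of children) raises benefits and lowers hours, so the naive regression overstates the benefit effect; adding the control moves the estimate back toward the value built into the simulation.

```python
# Sketch of a regression with control variables, on invented data. "educ" and
# "kids" stand in for the crude proxies for taste for leisure mentioned above.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n = 5000
df = pd.DataFrame({
    "educ": rng.integers(9, 17, n),     # years of schooling
    "kids": rng.integers(1, 4, n),      # number of children
})
# Benefits depend on family size; hours depend on benefits AND the controls
df["log_benefits"] = np.log(3000 + 800 * df["kids"] + rng.normal(0, 300, n))
df["hours"] = (1680 - 110 * df["log_benefits"] + 30 * df["educ"]
               - 50 * df["kids"] + rng.normal(0, 200, n))

naive = smf.ols("hours ~ log_benefits", data=df).fit()
controlled = smf.ols("hours ~ log_benefits + educ + kids", data=df).fit()
print(naive.params["log_benefits"], controlled.params["log_benefits"])
# The controlled coefficient is close to the "true" -110 built into the
# simulation; the naive one is more negative because "kids" raises benefits
# and lowers hours at the same time. A control we cannot measure, of course,
# cannot be included, which is why bias is rarely eliminated completely.
```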
Economists typically cannot set up randomized trials for many public policy questions, yet the time-series and cross-sectional approaches are often unsatisfactory.

Quasi (or Natural) Experiments

A quasi-experiment treats a policy reform itself as an experiment and tries to find a naturally occurring comparison group that can mimic the properties of the control group in a properly designed experiment. Quasi-experiments are changes in the economic environment that create roughly identical treatment and control groups for studying the effect of that environmental change.
o This allows researchers to take advantage of randomization created by external forces.
o The researcher finds a naturally occurring comparison group that can mimic the properties of the control group: we let outside forces do the randomization for us.

In some cases, the situation arises naturally.
o Suppose, for example, that Arkansas cut its TANF benefit by 20% in 1997, and that we have a large sample of single mothers in Arkansas in 1996 and 1998.
o At the same time, imagine that Louisiana's benefits remained unchanged.
In principle, the change in the states' policies has essentially performed our randomization for us.
o The women in Arkansas, who experienced the decrease in benefits, are the treatment group.
o The women in Louisiana, whose benefits were unchanged, are the control group.
o By computing the change in labor supply within each group, and then examining the difference between the treatment (Arkansas) and control (Louisiana) changes, we can obtain an estimate of the impact of benefits on labor supply that is free from bias.

Imagine instead that we simply studied single mothers in Arkansas alone. In that "experiment," single mothers in 1996 are the control group and those in 1998 are the treatment group:
HOURS_AR,1998 - HOURS_AR,1996 = Treatment effect + Bias from economic boom.
In practice, this comparison runs into the same criticisms that confront time series analysis: the national economy was growing exceptionally fast during this period, so the difference contains both the treatment effect and the bias from the economic boom.

To fix this problem, we compare the treatment group, for whom the policy changed, to a control group, for whom it did not. Single mothers in Louisiana did not experience the TANF cut, yet they benefited from the growth in the economy.
The treatment group, for whom the policy changed (Arkansas):
HOURS_AR,1998 - HOURS_AR,1996 = Treatment effect + Bias from economic boom.
The control group, for whom the policy did not change (Louisiana):
HOURS_LA,1998 - HOURS_LA,1996 = Bias from economic boom.
By subtracting the change in hours of work in Louisiana from the change in Arkansas, we net out the bias caused by the economic boom:
(HOURS_AR,1998 - HOURS_AR,1996) - (HOURS_LA,1998 - HOURS_LA,1996) = Treatment effect.
This assumes that, in the absence of the reform, hours of work of single mothers in Arkansas would have changed by the same amount as in Louisiana. The method relies on two important assumptions:
o Common time effects across groups.
o No systematic composition changes within each group.

Table 1 reports the data for this quasi-experiment. The top panel shows Arkansas; the bottom panel shows single mothers in the neighboring state of Louisiana.

Table 1: Using Quasi-Experimental Variation

Arkansas                   1996      1998      Difference
Benefit guarantee         $5,000    $4,000      -$1,000
Hours of work per year     1,000     1,200          200

Louisiana (benefit levels did not fall)
                           1996      1998      Difference
Benefit guarantee         $5,000    $5,000           $0
Hours of work per year     1,050     1,100           50

In Arkansas, while benefits fell by 20%, hours of work rose by 20%:
HOURS_AR,1998 - HOURS_AR,1996 = 1,200 - 1,000 = 200 (a 20% rise).
o This implies an elasticity of labor supply with respect to benefit levels of -1.
o That is larger in absolute value than the -0.67 elasticity found in the California randomized trial.
There is likely to be bias in this "first difference," because there was major economic growth during this period: single mothers in Arkansas may have increased their work effort even if TANF benefits had not fallen.

In Louisiana, benefit levels did not fall, but labor supply still increased, by 50 hours, perhaps because of the growing economy. It appears, then, that 50 hours of the 200-hour increase in Arkansas was due to economic conditions, not to the TANF cut.

This comparison yields the difference-in-difference (DID) estimator: the difference between the change in outcomes for the treatment group, which experiences the intervention, and the change for a control group, which does not. The difference-in-difference estimate is
(HOURS_AR,1998 - HOURS_AR,1996) - (HOURS_LA,1998 - HOURS_LA,1996) = (1,200 - 1,000) - (1,100 - 1,050) = 150 hours.
Differencing nets out the bias from the growing economy.
o Thus, the estimated causal effect of the TANF benefit cut is a 150-hour increase in labor supply.
o The implied elasticity of hours with respect to welfare benefits is %∆L/%∆G = (150/1,000)/(-1,000/5,000) = 15%/(-20%) = -0.75, similar in magnitude to the California randomized-trial estimate.
Note that the cross-sectional analysis would suggest that the reduction in welfare benefits leads to only a 100-hour increase in work, a somewhat different answer.

Technical
The analogous difference-in-difference regression is
HOURS_ijt = α + β1 B_ijt + β2 Y_ijt + β3 (B_ijt x Y_ijt) + ε_ijt,
where HOURS is hours of work, B is a dummy variable for the state that cut TANF benefits, and Y is a dummy variable for the post-1997 period. The subscripts i, j, and t index individual i in state j in time period t. The coefficient β3 on the interaction term represents the causal effect of the policy change.
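The sketch below ties the Table 1 arithmetic to this regression, using simulated individual-level observations built around the Table 1 cell means (the individual observations themselves are invented). The coefficient on the interaction term equals the by-hand difference-in-difference of the cell means.

```python
# Difference-in-difference: by-hand arithmetic vs. the interaction regression,
# on simulated individual data centered on the Table 1 cell means.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)
cells = {("AR", 0): 1000, ("AR", 1): 1200, ("LA", 0): 1050, ("LA", 1): 1100}
rows = []
for (state, post), mean_hours in cells.items():
    for _ in range(200):   # 200 single mothers per state-year cell (invented)
        rows.append({"state": state, "post": post,
                     "hours": mean_hours + rng.normal(0, 100)})
df = pd.DataFrame(rows)
df["cut_state"] = (df["state"] == "AR").astype(int)   # B: state with the benefit cut

# By-hand DID of the cell means
m = df.groupby(["cut_state", "post"])["hours"].mean()
did = (m.loc[(1, 1)] - m.loc[(1, 0)]) - (m.loc[(0, 1)] - m.loc[(0, 0)])

# Regression version: hours = a + b1*B + b2*Y + b3*(B x Y) + e ; b3 is the DID
fit = smf.ols("hours ~ cut_state + post + cut_state:post", data=df).fit()
print(round(did, 1), round(fit.params["cut_state:post"], 1))   # identical, about 150
```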
Problems with Quasi-Experimental Analysis

It is possible that the two states experienced different time effects:
- For example, different economic growth rates: the economic boom may have affected the two regions differently. (This problem can be addressed with a difference-in-difference-in-differences, or DDD, estimator; see the homework problem "A question in Quasi Experiment.")
More generally, single mothers may differ across states. We can never be completely certain that we have purged the treatment-control comparison of bias.

Structural Modeling

Both randomized trials and quasi-experiments suffer from two drawbacks:
1. They only provide an estimate of the causal impact of the particular treatment studied; it is difficult to extrapolate to other changes in policy.
2. They often do not tell us why the outcomes changed. For example, these approaches do not separate the income and substitution effects in the TANF example.
These approaches provide reduced-form estimates only, which leave much of the economic theory in a "black box." Structural estimation attempts to estimate the underlying parameters of the utility function, which allows a more thorough exploration of economic responses. Structural models are more difficult to estimate, however, and they tend to rely on the same limited information as reduced-form models. They also assume an explicit form for the utility function; if that form is incorrect, the policy conclusions could be wrong. The approach of the text is to rely on reduced-form estimates, because they are easier to think about and explain.
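For a concrete, deliberately oversimplified sense of what structural estimation involves, here is a sketch under strong assumptions that are not taken from the text: utility is Cobb-Douglas in consumption and leisure with a single parameter a, the wage is $10 per hour, the time endowment is 2,000 hours, and the TANF benefit reduction rate is 50%. The "observed" data are simulated; the exercise recovers the preference parameter by matching predicted to observed hours, after which one could simulate policies never observed in the data.

```python
# Toy structural estimation sketch under assumed (not textbook) functional forms:
#   U = a*ln(consumption) + (1 - a)*ln(leisure)
#   consumption = w*h + max(0, G - 0.5*w*h),  leisure = T - h
# The goal is to recover the structural parameter "a" from hours data.
import numpy as np

rng = np.random.default_rng(5)
T, w, t = 2000.0, 10.0, 0.5                     # time endowment, wage, reduction rate (assumed)
H_GRID = np.linspace(0.0, T - 1.0, 400)         # candidate hours choices

def chosen_hours(a, G):
    """Utility-maximizing hours on the kinked TANF budget set (grid search)."""
    c = w * H_GRID + np.maximum(0.0, G - t * w * H_GRID)
    u = a * np.log(c) + (1 - a) * np.log(T - H_GRID)
    return H_GRID[np.argmax(u)]

# Fake "observed" data: a guarantee of $5,000 vs. $3,000, true a = 0.5, noisy hours
G_obs = rng.choice([5000.0, 3000.0], size=300)
h_obs = np.array([chosen_hours(0.5, g) for g in G_obs]) + rng.normal(0, 50, 300)

# Estimate a by minimizing squared prediction errors over a coarse grid
a_grid = np.linspace(0.2, 0.8, 61)
sse = [np.sum((h_obs - np.array([chosen_hours(a, g) for g in G_obs])) ** 2)
       for a in a_grid]
a_hat = a_grid[np.argmin(sse)]
print(f"estimated preference parameter a = {a_hat:.2f}")

# With a_hat in hand, chosen_hours(a_hat, 4000) predicts behavior under a
# $4,000 guarantee -- a policy never observed in these data. The prediction
# is only as good as the assumed utility function, which is the usual caveat.
```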