Conference: “New Methods For Cohesion Policy Evaluation: Promoting Accountability And Learning”
Warsaw, 30 November-1 December 2009

Five Directions to Improve the Applicability of Counterfactual Impact Evaluations to Cohesion Policies

Discussion notes presented at the workshop “Rigorous impact evaluation using counterfactuals”

Daniele Bondonio
University Piemonte Orientale (Italy)
Research Center for Evaluation Studies in the Public Sector
(Ph.D Carnegie Mellon University)

I) The data for Counterfactual Impact Evaluations (CIEs) of cohesion policies

For enterprise support and R&D support policies, credible CIEs ideally require firm-level data with unique firm identifiers that can be merged with data on program activity (the number and value of the incentives paid to each assisted firm); see for example Bondonio (2007, 2009). When available, establishment-level data are an optimal choice, because they avoid the measurement errors that can arise with multi-establishment firms in which only some establishments were eligible for program assistance.

Besides the availability of firm-level or establishment-level data from national or regional statistical services, it is crucial that the data on program activity cover not only the enterprise support or R&D support policies co-funded by the Structural Funds but also the entire spectrum of alternative enterprise support policies offered under national and/or regional legislation. This is because, in many regions where enterprise support is available through programs co-funded by the Structural Funds, a large number of alternative enterprise support policies are also available under national or regional legislation (in Objective 2 areas such as the Piemonte Region of Italy, more than twenty enterprise support programs are available under national and regional legislation in addition to those co-funded by the Structural Funds).
As a result, without data on the incentive payments made by the coexisting national or regional enterprise support programs, any counterfactual impact evaluation of enterprise support policies co-funded by the Structural Funds would be at high risk of being unreliable, because non-treated firms in the comparison group may have been assisted by other enterprise support programs that are not observable in the analysis.

To evaluate human capital investment policies, program activity data should ideally be linked with individual-level data maintained by national social security or revenue service agencies. Such data would make it possible to build credible control groups of individuals who are not affected by the policies under evaluation but whose characteristics are similar to those of the treated individuals.

To evaluate geographically targeted programs of any type (for example urban renewal policies), the smallest geographical units for which data are at present easily available in the large majority of member states are NUTS III regions, which do not match the boundaries of cities or of important assisted areas such as the Objective 2 areas. Applying CIE to urban renewal policies would require reliable data, stable over time, for smaller geographic units. While in the US such data are easily available through the Census Tracts, in the EU data at smaller geographical levels (such as NUTS V) are at present very difficult to obtain and very unreliable for comparisons over time, as they are based on city administration boundaries that change too often.
For these reasons, applying CIEs to geographically targeted policies requires planning the evaluation design carefully at the same time as the policy intervention is designed, so that the boundaries of the target areas can be covered by the existing statistical sources in the different EU regions in which the programs are implemented.

II) The importance of estimating different impact evaluation parameters when undertaking CIEs

When both the characteristics of the incentives and the pre-intervention observable characteristics of the treated units are fairly homogeneous, estimating the Average Treatment Effect on the Treated (ATT) parameter, which yields the average program impact of a single homogeneous binary treatment variable, is sufficient to obtain policy-relevant empirical evidence. When, instead, the policy treatment has quite different economic values across the population of treated units, or when the treatment impact is expected to differ according to the pre-intervention observable characteristics of the treated units, a single average program impact estimate such as the ATT is often of less policy relevance. In such cases, the recent methodological developments in the CIE literature (note 1) make it possible to retrieve policy-relevant empirical evidence by estimating different ATTs for different subpopulations of the treated units and/or for different treatment categories.

Estimating such categorical impacts for important subgroups of the treated units is more easily achieved when the sample of treated units is sufficiently large (e.g. in the case of enterprise support and R&D support policies). In such cases, as also argued by other authors (e.g. Bartik 2004), statistical CIE can offer some insights into why and how a program works, provided that sufficient variation in program designs is observed and measured.
Applied to the case of enterprise support policies, such categorical impacts can be estimated for different ranges of the economic value of the incentives and for the different types of assistance offered to firms (capital grants, soft loans, technical assistance or infrastructure improvements). See Bondonio 2007 for an empirical estimation with firm-level data of the Piemonte region of Italy.

Note 1. For example: categorical propensity score matching (PSM), or the use of PSM as a first-stage processing step to reduce model dependence. See Bondonio 2009.

For human capital investment policies, it is often of great interest to estimate the distribution of counterfactual treatment effects, represented for example by the proportion of treated units (considering either the entire population of the treated or each policy-relevant category of treated units) for whom the outcome with the treatment is greater than or equal to the counterfactual outcome. A recent stream of the methodological literature has been devoted to impact identification strategies for such distributional effects (often expressed as quantile treatment effects) and has underlined the policy relevance of estimating impact results beyond mean effects on the treated (see for example: Abadie et al. 2002, Chesher 2003, Carneiro et al. 2003, Chernozhukov and Hansen 2005, 2006). When feasible, estimating such distributions of treatment effects over different categories of individuals, defined by their relevant pre-intervention characteristics, would considerably enhance the quality of the results that CIEs can offer.
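As a minimal sketch of the categorical impacts discussed in this section, the Python fragment below estimates a separate ATT for each type of assistance by comparing every assisted firm with the non-assisted firms in the same size class. The records, the field layout and the function name categorical_att are hypothetical. Exact matching on a single covariate stands in here for the categorical propensity score matching cited in the text, so this is an illustration under those assumptions and not the estimator used in Bondonio (2007).

```python
from collections import defaultdict
from statistics import mean

# Hypothetical firm records: (type of assistance, size class, employment change).
# "none" marks non-assisted firms. All figures are invented for illustration.
FIRMS = [
    ("grant", "small", 4.0), ("grant", "small", 6.0), ("grant", "large", 4.0),
    ("loan", "small", 2.0), ("loan", "large", 1.0), ("loan", "large", 4.0),
    ("none", "small", 1.0), ("none", "small", 3.0),
    ("none", "large", 0.0), ("none", "large", 2.0),
]


def categorical_att(firms, control_label="none"):
    """Return one ATT estimate per treatment category.

    Each assisted firm is compared with the mean outcome of the
    non-assisted firms in the same size class (exact matching on a
    single covariate, a simplifying assumption of this sketch).
    """
    control_outcomes = defaultdict(list)
    for category, size, outcome in firms:
        if category == control_label:
            control_outcomes[size].append(outcome)
    control_mean = {size: mean(vals) for size, vals in control_outcomes.items()}

    gaps = defaultdict(list)
    for category, size, outcome in firms:
        # Assisted firms without a comparable control stratum are dropped.
        if category != control_label and size in control_mean:
            gaps[category].append(outcome - control_mean[size])
    return {category: mean(vals) for category, vals in gaps.items()}


att_by_category = categorical_att(FIRMS)
print(att_by_category)
```

With these invented figures the estimate is 3.0 for capital grants and 1.0 for soft loans, whereas a single pooled ATT of 2.0 would hide the difference between the two forms of assistance.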
III) Tailoring CIEs to the specific features of the programs and of the available data

CIE works at its best when the empirical models used to retrieve treatment effects are tailored to the specific features of the policies to be evaluated, to the data available for the analysis and to the plausible causal links between the program intervention and the desired social outcomes (the most convincing CIE models are often developed by selecting the control variables on the basis of a credible theory outlining the causal links between the intervention and changes in the outcome variable of the evaluation). No single predetermined empirical model for CIE is to be advocated for each of the programs within the different sectors of cohesion policies.

In evaluating enterprise support policies, for example, the appropriate empirical models for CIE could vary quite sharply depending on whether or not eligible firms sell their goods and/or services predominantly within the local markets in which they are located (e.g. craft enterprises). When firms serve predominantly local markets, the major threats to the validity of the analysis come from changes (exogenous to the program incentives) that may occur in the different local economies in which the assisted and non-assisted firms are located. As a result, searching for geographical natural experiment conditions, in which homogeneous communities are crossed by administrative borders that determine the treatment status of firms, would provide a great advantage over other established quasi-experimental techniques (note 2), which would be more appropriate when the assisted firms sell their goods and/or services on national or international markets.
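One possible way to exploit such border conditions is a difference-in-differences computed separately within each pair of neighbouring communities split by an administrative border, so that shocks common to the local economy cancel out. The sketch below assumes hypothetical pair identifiers, a hypothetical record layout and invented employment figures. The function names border_pair_did and pooled_estimate are illustrative and do not come from the cited papers.

```python
from statistics import mean

# Hypothetical records: (border pair, assisted?, employment before, employment after).
# Each border pair is one homogeneous local economy crossed by an
# administrative border. All figures are invented for illustration.
FIRMS = [
    ("pair_A", True, 10.0, 14.0), ("pair_A", True, 20.0, 23.0),
    ("pair_A", False, 12.0, 13.0), ("pair_A", False, 18.0, 20.0),
    ("pair_B", True, 8.0, 10.0), ("pair_B", True, 6.0, 8.0),
    ("pair_B", False, 9.0, 8.0), ("pair_B", False, 11.0, 10.0),
]


def border_pair_did(firms):
    """Difference-in-differences within each border pair.

    Returns {pair: (estimate, number of assisted firms)}.
    """
    estimates = {}
    for pair in sorted({record[0] for record in firms}):
        treated = [after - before for p, assisted, before, after in firms
                   if p == pair and assisted]
        control = [after - before for p, assisted, before, after in firms
                   if p == pair and not assisted]
        if treated and control:  # usable only if both sides of the border are observed
            estimates[pair] = (mean(treated) - mean(control), len(treated))
    return estimates


def pooled_estimate(estimates):
    """Average the pair-level estimates, weighting by assisted firms."""
    total_treated = sum(n for _, n in estimates.values())
    return sum(did * n for did, n in estimates.values()) / total_treated


pair_estimates = border_pair_did(FIRMS)
overall = pooled_estimate(pair_estimates)
print(pair_estimates, overall)
```

In this invented example the local economy of pair_B is shrinking (non-assisted firms lose one job each), yet the within-pair comparison still recovers a positive estimate, because assisted firms are compared only with neighbours exposed to the same local shock.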
Note 2. Such as propensity score matching, which would have to rely either on the conditional independence assumption (CIA, i.e. selection into treatment is based on observable characteristics of firms’ local markets/communities), or on the hypothesis that all (or part of) the unobserved heterogeneity of firms’ local markets/communities consists of fixed effects (in the case of difference-in-differences schemes applied to comparisons of firms’ outcomes) or at least of fixed linear growth trends (in the case of triple-differencing schemes; see Bondonio 2009 for further details).

IV) A word of caution on applying CIE to macroeconomic and long-term-effects evaluations of enterprise support policies

In many cases, policy makers are interested in program impacts on regional macroeconomic outcomes. In principle, a program of any sort is somehow capable of affecting distant outcomes, such as macroeconomic or long-run indicators of the well-being of residents measured at the level of the entire provinces or regions in which eligible firms are located. However, when the economic importance of the group of assisted firms is very small compared with the size of the province/region/state economy in which they are located, CIEs should be applied with caution. This is because, in such cases, any actual program impact (in the form of a positive impulse given to the province/region/state economy) would become virtually undetectable among the changes in the outcome variable caused by many confounding factors (including, in many cases, the presence of other business incentive programs) of much greater importance than the possible program-induced improvements in the economic activity of the assisted firms.
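A back-of-the-envelope calculation illustrates the dilution problem. The sketch below assumes, purely for illustration, that assisted firms account for 1% of regional employment, that the program raises their employment by 10%, and that regional employment fluctuates by about 2% a year for unrelated reasons. None of these figures comes from the paper, and the function names are hypothetical.

```python
def diluted_regional_effect(assisted_share, effect_on_assisted):
    """Proportional change in the regional aggregate induced by the program.

    assisted_share: share of the regional aggregate held by assisted firms.
    effect_on_assisted: proportional program impact on the assisted firms.
    """
    return assisted_share * effect_on_assisted


def signal_to_noise(assisted_share, effect_on_assisted, regional_noise):
    """Ratio of the diluted program effect to unrelated regional variation."""
    return diluted_regional_effect(assisted_share, effect_on_assisted) / regional_noise


# Assumed, illustrative values only.
ASSISTED_SHARE = 0.01      # assisted firms hold 1% of regional employment
EFFECT_ON_ASSISTED = 0.10  # program raises their employment by 10%
REGIONAL_NOISE = 0.02      # typical yearly variation from confounding factors

regional_effect = diluted_regional_effect(ASSISTED_SHARE, EFFECT_ON_ASSISTED)
ratio = signal_to_noise(ASSISTED_SHARE, EFFECT_ON_ASSISTED, REGIONAL_NOISE)
print(f"regional effect: {regional_effect:.4f}, signal-to-noise: {ratio:.2f}")
```

Under these assumptions a sizeable 10% impact on the assisted firms moves regional employment by only 0.1%, one twentieth of the assumed background variation, which is why the impact is better measured on the assisted firms themselves than on regional aggregates.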
Using rigorous CIE designs to assess whether or not business incentives had long-lasting impacts on the employment or economic activity of assisted firms also calls for great caution. Assisted firms are economic units embedded in many ways in a network of economic transactions linking firms to one another. In the medium/long run, a positive program impulse on the employment or economic activity of assisted firms is likely to have enough time to generate subsequent impacts on non-assisted firms as well. In this case, outcome data from non-assisted firms can no longer be considered unaffected by the program incentives and used to retrieve counterfactual estimates.

V) Can randomized experiments have a role in cohesion policies?

Pilot programs with random assignment are difficult to implement for cohesion policies because of the ethical and political difficulties of excluding some eligible firms or target areas from the incentives. For some types of policies (e.g. enterprise support policies and R&D support policies), however, such difficulties could be eased if the experimentation takes the form of a random selection of firms for targeted marketing of the program. If such randomly assigned marketing efforts are strong enough, the result should be a sharp difference in the usage of the program incentives between the treated firms (those receiving the marketing efforts) and the control-group firms (those not receiving them). This would provide a source of variation in program usage that does not directly affect the outcome variable of the evaluation. A second possibility for easing the ethical and political difficulties associated with experiments is offered by delay-of-treatment randomization schemes, in which the program intervention in the comparison group is delayed rather than denied.

References

Abadie A., Angrist J., Imbens G.
(2002), Instrumental Variables Estimates of the Effect of Subsidized Training on the Quantiles of Trainee Earnings, Econometrica 70, 91-117.

Bartik T.J. (2004), Evaluating the impacts of local economic development policies on local economic outcomes: What has been done and what is doable, in: Evaluating local economic and employment development: How to assess what works among programmes and policies, OECD, Paris, 113-141.

Bondonio D. (2009), Impact identification strategies for evaluating business incentive programs, POLIS Working Papers n. 145/2009, University Piemonte Orientale, Italy (http://polis.unipmn.it/pubbl/index.php?paper=2440).

Bondonio D. (2007), The Employment Impact of Business Incentive Policies: a Comparative Evaluation of Different Forms of Assistance, POLIS Working Papers n. 101/2007, University Piemonte Orientale, Italy (http://polis.unipmn.it/pubbl/index.php?paper=2052).

Carneiro P., Hansen K.T., Heckman J.J. (2003), Estimating Distributions of Treatment Effects with an Application to the Returns to Schooling and Measurement of the Effects of Uncertainty on College Choice, IZA Working Papers n. 767.

Chernozhukov V., Hansen C. (2005), An IV Model of Quantile Treatment Effects, Econometrica 73, 245-261.

Chernozhukov V., Hansen C. (2006), Instrumental Quantile Regression Inference for Structural and Treatment Effect Models, Journal of Econometrics 132, 491-525.

Chesher A. (2003), Identification in Nonseparable Models, Econometrica 71, 1405-1441.