A New Approach to Evaluating Public Policy Advocacy: Creating Evidence of Cause and Effect

Matthew Carr, Ph.D.
Marc Holley, Ph.D.
Walton Family Foundation

March 2013

Paper prepared for the 38th annual meeting of the Association for Education Finance and Policy

Corresponding author: Matthew Carr (mcarr@wffmail.com)

DRAFT: PLEASE DO NOT CITE WITHOUT PERMISSION

Abstract

There currently exists a significant disconnect between researchers and practitioners around whether, when, and how to appropriately measure and evaluate the performance and influence of non-profit policy advocacy organizations. Current approaches do not provide a feasible solution for evaluators and practitioners who need to determine the relative effectiveness of multiple advocacy strategies or organizations. In this paper we argue that traditional social scientific evaluation techniques (logic models, output and outcome measurement) grounded in post-positivist theory can improve upon existing advocacy evaluation models and allow for rigorous and objective assessment of the influence of advocacy organizations. Along with a more theoretical discussion of the trade-offs and implications of current approaches to advocacy evaluation, we also include specific direction about how to conduct evaluations using a new model that combines the strengths of existing approaches. Examples of the new model are provided, with a focus on state-level K-12 education reform advocacy efforts. Ultimately, presentation of this new approach aims to advance the conversation among researchers and inform the practice of evaluators and policy advocates.

Introduction

A growing number of philanthropic organizations are shifting the focus of their giving from more traditional service delivery projects (such as a tutoring program for struggling readers) to funding groups that engage in public policy advocacy (for example, to amend the federal No Child Left Behind Act) (Teles & Schmitt, 2011; Beer, 2012). In this paper, "advocacy" refers to the work of non-profit 501(c)(3) organizations, such as grassroots organizing, public education campaigns, conducting and disseminating policy research, or media relations; it does not include direct lobbying activities. As Coffman (2008) writes: "Foundations trying to better leverage their influence and improve their impact increasingly are being urged to embrace advocacy and public policy grantmaking as a way to substantially enhance their results and advance their missions." There are two primary rationales for funding advocacy projects over direct services: 1) the scale of impact expected from a government program is greater than what one could expect from a discrete program (for example, funding a preschool program for low-income children in a local neighborhood versus advocating for federal funding for universal preschool nationwide) (Greene, 2005); and 2) there is a hope that government policy can mitigate or ameliorate the underlying conditions that create the need for discrete programs in the first place (government anti-poverty programs might obviate the need for the preschool program in the local neighborhood). In short, achieving large-scale social change is now seen as beyond the ability of private funders alone, requiring engagement in the public policymaking arena.

At the same time that some foundations are shifting the focus of their resource allocations toward advocacy, there is also a cultural change occurring around how they use data and evaluation to shape organizational decisionmaking more generally.
Wales (2012) writes: "In recent years, the philanthropic sector has neared consensus on the need to improve measurement and evaluation of its work." There are a number of reasons for this shift, including the desire on the part of many foundations to be more strategic in their giving so as to maximize its effectiveness, to show impact through rigorously collected and analyzed evidence, or to hold grantee organizations more accountable for achieving stated goals (Brest, 2012). While there are a few vocal opponents of these efforts to use more data in decisionmaking and accountability (e.g., Shaywitz, 2011), many philanthropies are moving quickly in the direction of using more rigorous measurement strategies to increase accountability concerning the performance of grantees and themselves in creating social impact.

Toward this end, some foundations have become increasingly sophisticated in their use of evaluation to determine the effectiveness of traditional service delivery programs they fund (see, for example, the Gates, Broad, Kellogg, Annie E. Casey, or Robin Hood foundations). Assisting in this development is the fact that most of the basic methods of social science apply readily to questions of whether a particular group of service recipients benefitted from a particular program (from basic participant surveys all the way to randomized controlled trials). As Reisman et al. (2007) note: "The general field of evaluation offers an extensive literature that provides both theoretical and practical examples of how social scientific inquiry can be applied to outcome measurement for an array of programs, interventions, and initiatives." One can readily find textbooks on program evaluation (Rossi et al., 2004), guidebooks for evaluators (Gertler et al., 2011), guides specifically designed around philanthropic evaluation (Gates Foundation, 2010; Kellogg Foundation, 2005), and practitioner toolkits (Council of Nonprofits, 2012). In addition, there are numerous consulting firms that specialize in the evaluation of service delivery programs (RAND, Mathematica, Westat, etc.). In many ways, it has never been easier for a foundation or nonprofit to rigorously evaluate the impact of a traditional service delivery program on the participants it serves.

Unfortunately, the same level of sophistication and availability of tools does not yet exist for foundations and other practitioners seeking to evaluate public policy advocacy projects. In 2005, Guthrie et al. noted: "There is no particular methodology, set of metrics or tools to measure the efficacy of advocacy grant making in widespread use. In fact, there is not yet a real 'field' or 'community of practice' in evaluation of policy advocacy." Two years later, Reisman et al. (2007) summarized the state of advocacy evaluation: "The field of evaluation in the area of advocacy and policy change is nascent, offering very few formally written documents that relate to approaches and methods.
To date, when evaluation of advocacy and policy work has even occurred at all, efforts can be best characterized as attempts, or even missteps." This lack of established norms for advocacy evaluation has led to confusion among foundations and practitioners around when and how to appropriately measure and evaluate the performance and influence of non-profit policy advocacy organizations. Exacerbating this confusion is significant disagreement among a number of scholars, funders, and practitioners about whether such evaluations even can, or should, be conducted in the first place.

Given the state of the advocacy evaluation field, it is unsurprising that a 2008 survey produced for the Annie E. Casey Foundation and The Atlantic Philanthropies found that only 24.6% of 211 respondent nonprofit organizations that conduct advocacy reported that their work had been formally evaluated. When respondents were asked what challenges they faced in communicating advocacy success, two of the top three responses were a lack of knowledge about how to measure success (29.6%) and a lack of internal capacity to conduct evaluation (14.2%) (Innovation Network, 2008). These results suggest two clear requirements for getting more nonprofits to engage in advocacy evaluation: 1) greater clarity around when and how to conduct such evaluations; and 2) resources that reduce the burden on advocacy practitioners and funders of carrying out evaluation. Our goal is to provide guidance on both of these issues.

In this paper we identify and review the various theoretical arguments around whether and how best to evaluate public policy advocacy, identifying the strengths and weaknesses of each. This is the first such attempt to create a typology that systematically categorizes the existing theoretical perspectives on this subject. Based on this review, we then present a new model of advocacy evaluation that builds on the strengths of previous work to provide specific direction to funders and practitioners about how to conduct prospective advocacy evaluations that are objective, reliable, rigorous, and cost-effective. In particular, our model establishes a series of specific, measurable performance goals, connected to an explicit theory of policy change, that can be used by any foundation or nonprofit engaged in advocacy work, even if it has only limited evaluation capacity. This new model is focused on state-level K-12 education reform advocacy efforts. As such, the specifics of the model are designed for evaluating policy advocacy in that area, but the principles of the approach are more broadly applicable to other policy areas. The policy preferences described in our model represent one example; others could be pursued. We provide specific examples of how the model can be used at the end of the paper.

Advocacy Evaluation – Four Dominant Schools of Thought

Based on our review of the extant literature on the evaluation of advocacy efforts, we have identified four general belief systems about whether and how best to conduct evaluations of public policy advocacy work: Nihilists, Anthropologists, Constructivists, and Post-positivists. We review the basic tenets of each perspective, along with its strengths and weaknesses, in turn. Table 1 below provides a quick overview of the key differences between the perspectives.
In particular, the categorization of each approach depends on its perspective regarding the need to apply formal structures to the planning of an advocacy program on the front end and to the implementation of evaluation methods on the back end. Additional details on the distinguishing features of the perspectives are provided in each section below.

Table 1: Overview of Differences between the Four Perspectives

                    Evaluation Phase
Paradigm            Planning         Conducting
Nihilist            N/A              N/A
Anthropologist      No Structure     No Structure
Constructivist      Structure        No Structure
Post-Positivist     Structure        Structure

Nihilists – Do not create plans, do not conduct evaluations

The nihilist view of advocacy evaluation is generally represented by advocacy practitioners (though it is held by others as well; see, for example, Cutler, 2012, or Shambra, 2011) and comprises two central arguments. Reisman et al. (2007) summarized both when they reported that one of the factors making advocacy evaluation difficult "is the belief among some advocacy organizations that their work cannot be measured and that any attempt to do so diminishes the power of their efforts."

The first part of the nihilist view is that advocacy work is rooted in subtle human interactions that cannot be captured by traditional evaluation tools. For example, nihilists argue that there are no tools available to social scientists to measure the planting of an idea in the mind of a policy influencer, or the reshaping of the definition of a public problem to make it more amenable to a particular, favored policy solution. In short, neither quantitative nor qualitative methods can provide the evaluator with the data needed to judge whether an advocate was successful or not.

The crux of the nihilists' second position is that the act of planning and conducting evaluation inherently reduces the effectiveness of advocates. They argue that the nature of advocacy requires a high level of flexibility, and that evaluation (at least using traditional methods) interferes with their ability to be responsive to ever-changing political and policy contexts. According to the nihilists, blindly sticking to a plan of action may satisfy the funder's desire for measurement and accountability, but the cost is operational ineffectiveness and the inability to achieve the broader goals the work was intended to achieve. As a result, the central tenet of this perspective is that neither planning nor conducting evaluations is appropriate in the context of policy advocacy. Rather, it is best to provide unrestricted support to such organizations and leave them free to pursue whatever goals are most feasible at a particular point in time. At the end of the project, the advocate will report back on successes and failures.

The strength of the nihilist argument is its insight about the need for advocates to shift strategies based on changing contexts, which, as we discuss below, highlights the need for evaluators to build flexibility explicitly into any evaluation model. Advocacy does take place in a context distinct from that of traditional service delivery programs, and so a valid evaluation approach requires additional allowances for changes in strategies and plans during the course of a project. As we explain, how to provide flexibility while maintaining rigor becomes a key challenge for advocacy evaluators.

The limitations of the nihilist position are many.
Most critically, not being able to rigorously judge the effectiveness of advocacy projects or approaches is untenable for both foundations and practitioners. With limited resources that have alternative uses, foundations cannot continue to operate under decisionmaking protocols centered on relationships, intuition, and "common knowledge" if they are to maximize their social impact. Rather, decisions are more likely to lead to successful outcomes when they are based on empirical evidence and rational analysis of advocate performance. In addition, the practitioner community cannot progress as a field without rigorous knowledge creation about best practices, basic standards of effectiveness, or the creation of feedback loops to inform and improve performance over time. Lastly, advocates can and should create logic models and plans for how they will accomplish their goals at the beginning of any project, both because it makes prospective evaluation possible and because it allows foundations and strategic partners to conduct due diligence and offer suggestions. Having a plan is a critical signal to others that there is a strategy for achieving results. But, as noted above, there must be a formal process in place for amending these plans during the course of the project to account for the unique contexts in which advocacy occurs.

Anthropologists – Do not structure the project, do not structure the evaluation

This group comprises mainly academicians. They argue that traditional models of program evaluation are not well suited to the inherently political nature of advocacy and the fluid contexts in which policy is created and enacted, necessitating an approach based largely on the informed judgment of participants. Among the academicians, the argument is generally couched in theories of the policymaking process. Teles and Schmitt (2011) are representative when they write:

Unfortunately, these sophisticated tools (for evaluating service programs) are almost wholly unhelpful in evaluating advocacy efforts. That's because advocacy, even when carefully nonpartisan and based in research, is inherently political... Because of these peculiar features of politics, few if any best practices can be identified through the sophisticated methods that have been developed to evaluate the delivery of services. Advocacy evaluation should be seen, therefore, as a form of trained judgment—a craft requiring judgment and tacit knowledge—rather than as a scientific method.

Similarly, Guthrie et al. (2005) identify several key challenges to conducting advocacy evaluation using traditional methods: the complex nature of policy change, the role of external forces beyond the advocate's control, the often long time frame involved, the need for advocates to be flexible and shift strategies based on rapidly changing contexts, the inherent difficulty of making claims of causal attribution, and limitations on how directly non-profits can influence policy (i.e., they cannot lobby policymakers). The anthropologists hold that these aspects mean that traditional models of program evaluation cannot be used. Indeed, they go so far as to argue that the application of the scientific method itself to the task is inappropriate. In place of traditional scientific methods, the anthropologists suggest that advocacy evaluation should be based on the professional judgment of an evaluator gathering information from the narratives of participants and other contextual sources.
Teles and Schmitt (2011) explain:

If scientific method is an inappropriate model, where can grantmakers look for an analogy that sheds light on the intensely judgmental quality of advocacy evaluation? One possibility is the skilled foreign intelligence analyst. She consumes official government reports and statistics, which she knows provide a picture of the world with significant gaps. She talks to insiders, some of whom she trusts, and others whose information she has learned to take with a grain of salt…It is the web of all of these imperfect sources of information, instead of a single measure, that helps the analyst figure out what is actually happening. And it is the quality and experience of the analyst—her tacit knowledge—that allows her to create an authoritative picture.

In short, advocacy evaluation is all art and no science. Similarly, some anthropologists have viewed advocacy programs as "adaptive initiatives" (projects defined by continual change and adaptation, rather than adherence to a specified plan) whereby evaluations should be conducted using models that involve the evaluator as a participant or partner. Britt and Coffman (2012) write that "since adaptive initiatives do not seek predetermined objectives through the application of best practices, these (traditional formative and summative) evaluation approaches are a poor fit." Instead, Britt and Coffman recommend two new methods: 1) developmental evaluation, in which the evaluator is embedded in the project as a "critical friend" offering instantaneous feedback on data as it emerges; and 2) goal-free evaluation, in which no goals are set at the outset of a project and the evaluation focuses on measuring "outcomes influenced, rather than limiting the inquiry to planned outcomes" (Ibid.).

The anthropologist position has a number of strong contributions to make to the development of a rigorous evaluation model for advocacy projects. For one, the anthropologists are correct about the need to capture qualitative data about the local context in which the advocacy is occurring (including the effects of external forces beyond the control of advocates). This necessitates the addition of multiple approaches (triangulation) or narrative reporting to ensure that the evaluation fully captures the more intangible aspects of an advocacy project. A second contribution is the need to be transparent about the limitations of attributing causation to a single advocate or advocacy group. Evaluations of advocacy projects should be upfront about the limits of producing cause-and-effect statements about particular advocates and specific policy changes. Nonetheless, when designed properly, we believe that advocacy evaluations can move understanding of a policy advocacy organization's role from contribution closer to attribution. In addition, the anthropologist point about the potential non-linearity of policy change is also valid. In response, evaluators and advocates must avoid thinking too rigidly about the precise sequence of activities and results that are expected from a project. Indeed, planning for such non-linearity should be part of the prospective evaluation model.

However, there are a number of serious limitations to the anthropologist position. Perhaps most importantly, the anthropologists overstate the magnitude of the obstacles to conducting advocacy evaluation.
While it is true that the traditional service delivery program evaluation model does not directly translate to the advocacy context, it is also not so different that the scientific method is completely unusable. Here, the anthropologists confuse the specific methods of the traditional model (regressions, randomized trials, treatment and control groups) with the underlying epistemological principles that it represents. Advocacy evaluations are unlikely to be conducted with regression models, but they can be conducted in ways designed to generate objective, reliable information about individual contributions to an observable outcome. The key to overcoming the challenges of advocacy evaluation is not to abandon the scientific method and succumb to the false promise of subjective professional judgment, but rather to return to the first principles of the scientific method. This involves seeking the best aspects of traditional, quantitative evaluation methods and using them to inform non-statistical methods of advocacy evaluation. As King et al. (1994) state: "non-statistical research will produce more reliable results if researchers pay attention to the rules of scientific inference – rules that are sometimes more clearly stated in the style of quantitative research." The model that we present later in this paper aims to build on existing models using these principles of scientific inference.

There are a number of other limitations of the anthropologist approach as well. First, evaluation results based on this approach cannot be independently verified or replicated. Because the data and methods are neither public nor established a priori, the reliability of the results generated is called into question. As King et al. (1994) note: "If the method and logic of a researcher's observations and inferences are left implicit, the scholarly community has no way of judging the validity of what was done…Such research is not a public act. Whether or not it makes good reading, it is not a contribution to social science." The use of structure in designing and conducting evaluation plays a key role because it allows others to judge the quality of the data and methods used, as well as to replicate the results.

A related issue is that there is a high likelihood of bias being introduced into the evaluation when it relies predominantly on information provided by participants. On the one hand, participants have a strong incentive and natural inclination to overstate their contribution to an advocacy campaign; an evaluation based on participants' assessments of their own work will likely significantly overstate the influence of respondents. On the other hand, the incentive structure among grantees in an advocacy coalition could be one of mutual support, whereby each member may feel pressure to speak positively of other coalition members in the hope that they will, in turn, speak positively of them. In either case, the results of the evaluation will be biased.

Lastly, an important shortcoming is that the anthropologist approach ignores the reality faced by foundations, which have to measure the return on investment of every grant, not just the collective impact of numerous grants (see Kania & Kramer, 2011, for a detailed discussion of the concept of collective impact). Opportunity costs in grantmaking are high; foundations are not in a position to provide funding for every group in a coalition and hope that at least some members will accomplish the ultimate goal of the project.
Rather, foundations have a responsibility to seek out the most effective practitioners to receive support from the limited resources available.

Constructivists – Structure the project, do not structure the evaluation

Like the anthropologists, the constructivists also argue that the traditional methods of program evaluation are not applicable to public policy advocacy work. But a key difference is that they eschew the completely unstructured approach of the anthropologists and seek to build more rigorous, and better planned, program models at the beginning of projects that can be used by foundations and practitioners to set expectations. These approaches generally include creating explicit logic models (theories of change) and developing plans for achieving goals that are established a priori (though it should be noted that these goals are generally broadly stated and lack targets or measurement strategies) (e.g., Innovation Network, n.d.; Beer, 2012). But, like the anthropologists, in the end the evaluator still seeks to gauge performance largely through the perceptions of key actors. Coffman and Beer (2011) exemplify this approach, writing:

Evaluation is constructivist. Constructivism is a view of the world and of knowledge that assumes that "truth" is not an objectively knowable thing but is individually and collectively constructed by the people who experience it. Constructivist evaluators assume that the practice of making meaning out of data, perceptions, and experience depends on and is always filtered through the knowers' own experiences and understanding…. There are no hard and fast "truths" based on data; rather, evaluation for strategic learning generates knowledge and meaning from a combination of evaluation data, past experiences, and other forms of information.

Even though the constructivists do not believe that there is objective truth in the data, they do believe that advocacy projects require some structure. For example, the Innovation Network (n.d.) offers a practical guide for advocacy evaluators that begins by suggesting that evaluators be actively engaged in helping programs create detailed theories of action and strategic plans. The guide states: "Using your evaluation expertise and experience, focus those involved in the process to document a robust, strapping theory of how to move from Point A to Point B. Embed in that conversation a discussion of strategies and interim outcomes." But, despite all of the rigor suggested at the beginning of the process, Innovation Network later recommends developmental evaluation as the best approach for measuring the effectiveness of advocacy projects. As noted above in the anthropologist section, developmental evaluation is a methodology defined by the placement of the evaluator as a project stakeholder, gathering real-time feedback and data by talking to other participants.
The implications of the constructivist approach have been summed up by Beer (2012) in seven recommendations for foundations: 1) provide unrestricted funding to advocates to give them the necessary flexibility to react to local contexts; 2) provide multi-year grants to advocates, recognizing that the policy process is a long-term game; 3) offer higher grant amounts to build advocates' capacity; 4) focus on intermediate-term outcomes, rather than longer-term policy "wins," when setting goals; 5) provide advocates with flexibility in both when they need to report and how they report; 6) "Do not apply traditional program models to advocacy work"; and 7) provide additional non-financial supports to help build advocates' capacity. Across the seven recommendations, a consistent theme emerges in which advocacy is viewed as fundamentally different from traditional service delivery models. As a consequence, constructivists believe that advocates should be granted tremendous flexibility in a number of key respects, including evaluation.

The most important contribution of the constructivists is the application of structure to advocacy planning. Most constructivist evaluators agree that advocacy projects need to start with an explicit theory of change or logic model. And some go further in suggesting that benchmarks and goals be identified that are connected to those models. By acknowledging and emphasizing the need for advocacy programs to be grounded in some explicit model of change and with prospective goals about what success will look like (even if only vaguely defined), the constructivists insert important aspects of the scientific model into advocacy evaluation. Unfortunately, they do not carry this rigorous start through to its conclusion, choosing instead to fall back on subjective methods and participant narratives as primary data sources when conducting the actual evaluation.

Another benefit is that constructivists correctly note the complexity of the policymaking process and the need for any evaluation model to be explicitly based on a theory of policy change. By having an explicit model, the evaluator and the advocate can clarify at least some of the complexity and thereby create goals based on discrete and definable aspects of the policy change process. In addition, the constructivists offer sound advice to focus on shorter-term outcomes when evaluating advocacy projects, even if they choose not to measure them rigorously. The timeframe of many policy campaigns can be long, necessitating evaluation of accomplishments that fall short of official policy change. As such, it is important to establish other measures that can capture important effects that may signal an increased likelihood of policy change occurring in the future. Similarly, the focus on advocacy organization capacity as a short-term outcome is also well considered and should be part of a rigorous model of advocacy evaluation. Increasing capacity to conduct advocacy should be strongly related to the ability of the organization to accomplish key goals.

Despite these positive contributions, the drawbacks of the constructivist approach to conducting an evaluation are largely the same as those found among the anthropologists. Primarily, because the formal evaluation is based on the interpretations of participant observations, results cannot be reproduced, nor can they be independently verified.
The data and methods are not established based on the principles of scientific inference, and as such the results cannot support transparent and defensible claims about the effects of any particular advocacy program.

Post-positivists – Structure the project, structure the evaluation

The post-positivists start at the same point as the constructivists, holding that rigorous advocacy evaluation begins by establishing prospective plans using logic models and setting key benchmarks a priori for what will be accomplished within a particular timeframe. But, unlike the constructivists and the anthropologists, the post-positivists continue to work within structured models to conduct the actual evaluation of advocacy programs (Guthrie et al., 2005). Specifically, evaluators use "applied" social science methods to create and then carry out data collection and analysis to inform evaluations. Reisman et al. (2007) explain the distinction between applied methods and conventional social science methods:

Social science research techniques are guided by rigorous academic and scientific standards that qualify for building disciplinary knowledge. In contrast, evaluation research is guided by practical and applied interests that support the ability to make program and policy decisions. The methodological techniques are the same; however, the standards for evidence are often more stringent in the academic and scientific arenas.

As such, the traditional post-positivist position has been that while the evaluator seeks to use the best social science research methods available, the evidentiary standard for making statements about performance is lowered. In our Omnibus model discussed below, we agree that the evidentiary standard for advocacy evaluation is lower than that used in the best service delivery program evaluations (e.g., randomized trials or regression discontinuities), but we argue that the evidentiary standard can, and should, be raised by including objective sources of evidence and quantitative methods whenever possible.

In practice, the post-positivists generally rely on traditional qualitative methods for conducting advocacy evaluations. In their guide, Reisman et al. (2007) suggest using focus groups, interviews, observation instruments, document content analysis, and surveys. The guide created by Coffman for the Harvard Family Research Project (2009) states: "Like all evaluations, advocacy evaluations can draw on a familiar list of traditional data collection methods, such as surveys, interviews, document review, observation, polling, focus groups, or case studies." Even newer methodologies created for the specific purpose of evaluating advocates (such as the bellwether methodology, policymaker ratings, intense period debriefs, and system mapping (Coffman and Reed, 2009) or policy champion ratings (Devlin-Foltz and Molinaro, 2010)) are mainly based on qualitative approaches and participant self-assessments. It should be noted that current post-positivist models do occasionally include more objective measures such as policy tracking, but this tends to be the exception rather than the rule. Still, while the post-positivists bring added structure and rigor to advocacy evaluation models, they typically restrict their available tools to a set that limits the ability of the evaluator to draw inferences about program impacts.
The post-positivists have also built a number of resources detailing short- and intermediate-term outcomes that foundations and practitioners can use to evaluate the performance of programs where formal policy change is on a longer timeframe. Reisman et al. (2007) have created a comprehensive list of possible outcomes, including ideas for metrics that track organizational capacity, strength of advocacy alliances, public support, improved policy, and broad social change. Coffman (2009) similarly provides a list of potential intermediate-term outcomes, including awareness, salience, visibility, political will, donors, and policy champions. However, while helpful in identifying what one could potentially measure, neither of these guides presents advice on how these outcomes might specifically be measured.

The most important contribution of the post-positivist model is that it creates structure not only around the design of advocacy programs (like the constructivists), but also around the evaluation conducted during or after the project. This approach is based on the premise that there are social scientific methods that can be applied to the evaluation of policy advocacy, even if the evidentiary standard is lower than that used in the study of more traditional service delivery programs. Additionally, the post-positivist model assumes the existence of objective facts and truths that can be uncovered in systematic ways. In this way, the post-positivists add far more rigor to their evaluations than either the anthropologists or the constructivists, and results are externally verifiable.

Another strength of the post-positivist model is that it creates a realistic balance between rigor and feasibility in evaluating advocacy projects. Unlike the other models, this approach is scalable; it can be used to evaluate the performance of a large number of individuals or organizations at the same time (which is a particular benefit to foundations with large portfolios of advocacy grantees). In particular, because the approach is structured, it can be distilled into practical tools for participants so that they can conduct their own data collection, and even analysis, while respecting differences in organizations and approaches. There are a number of such tools currently available to foundations and practitioners, including an online logic model builder from the Aspen Institute's Advocacy Planning and Evaluation Program, a logic model guide from the Kellogg Foundation, and advocacy evaluation guides from the Annie E. Casey Foundation, the California Endowment, and the Harvard Family Research Project.

The most significant weakness of current post-positivist approaches is that they rely too heavily on self-assessment instruments and other qualitative methods to collect evidence. These methods are a significant improvement over past practice and the completely unstructured nature of other approaches, but there remain significant opportunities to introduce more quantitative measurement strategies into advocacy evaluation models and to rely on more objective sources of evidence (which we include in our model below). Lastly, even with the best models, strong claims of causal attribution are limited. However, advocate contribution to shorter-term outcomes can be assessed rigorously, and the post-positivist model provides insights into how that can be accomplished.

An Omnibus Model for Advocacy Evaluation

All four of the dominant advocacy evaluation paradigms have strengths and weaknesses.
These perspectives have contributions to make to a more rigorous and standardized approach to advocacy evaluation, but also pitfalls and limitations to be avoided or mitigated to the extent possible. In developing an Omnibus advocacy evaluation model, we combine the best aspects of each approach and then build in additional components based on the principles of scientific inference developed by King et al. (1994). (Our model was developed with the assistance of consultants at BTW Informing Change.) Table 2 below summarizes the contribution of each paradigm to our new Omnibus model, along with the key ways in which our model then builds on them to create additional rigor.

Table 2: Summary of the Contributions Made by Each Paradigm

Nihilist: Add flexibility in amending plans and goals during a project.

Anthropologist: Capture context through supplemental narratives; be explicit about limits of causal claims; expect non-linearity.

Constructivist: Create structure around advocacy planning (logic models, strategic plans, goal-setting); have a theory of how policy change occurs; focus on shorter-term outcomes, including advocate capacity.

Post-Positivist: Create structure around evaluation of projects (using traditional social science methods); hold that advocacy can be measured in systematic ways to uncover objective truths; provide practical tools so that participants can conduct their own evaluation activities.

Omnibus Approach (our model): Seek objective data in measurement strategies whenever possible; build quantitative performance measures (outputs and outcomes); connect metrics and measurement strategies to a theory of policy change.

As mentioned above, in our model we apply the key principles of scientific inference identified by King and his colleagues (1994). Specifically, we argue that the following characteristics are foundational for the development of any evaluation of advocacy projects:

1) Create inferences about the world: It is not enough to merely collect facts or tell a story about a particular phenomenon. Rather, researchers must collect and analyze empirical information in ways that allow them to infer beyond what was observed to make statements about parts of the world that were not observed.

2) Structure matters: Theories of action, strategic plans, and goals should always be established before an advocacy project begins. These should be as specific and measurable as possible. This is the only way an evaluator can establish rigorous and reliable evidence of performance. A coherent theory of change "is a concrete statement of plausible, testable pathways of change that can both guide actions and explain their impact. In this way, a theory of change provides a roadmap for action and a framework to chart and monitor progress over time" (Kubisch et al., 2002).

3) Methods matter: According to King et al.: "Scientific research uses explicit, codified, and public methods to generate and analyze data whose reliability can therefore be assessed." In short, the methods used in an evaluation must adhere to the basic rules of scientific inquiry. For example, it is important to apply proper sampling procedures to promote representativeness and minimize bias.

4) Accept uncertainty: Evaluations of advocacy generate evidence about the relationships between the activities of advocates and changes in the policy environment, and this evidence should be collected and analyzed in the most objective and rigorous way possible.
But it must be acknowledged that even the best evidence will fall short of the "gold standard" for making statements about cause and effect and will not be able to create certainty.

5) Favor objective sources of data: Whenever possible, advocacy evaluations should seek to collect and analyze objective data to produce evidence. This will not always be possible, as some outcomes require more subjective data sources and assessments (e.g., interviews, surveys). But the researcher should always prioritize objective sources over subjective ones. Objective sources include independently verifiable data points for both outputs (services provided or activities conducted) and outcomes.

Our Omnibus model starts with the constructivist requirement that advocacy projects and evaluations begin with an explicit theory of change or logic model and then develop a set of a priori targets or benchmarks that are directly and explicitly connected to that model. Every project, whether a traditional service delivery program or an advocacy campaign, has to start with a plan. These plans may change, and often do for various reasons (service programs learn from mistakes and formative assessments to improve; advocates react to changes in the policy environment), but every project needs to have some notion about what it will take to create change and what that change will look like. Tools like the Aspen Institute's Advocacy Planning and Evaluation Program online logic model builder are a good place for advocacy practitioners to start.

We also follow the advice of the constructivists that evaluation models need to be built on an explicit theory of policy change. Figure 1 below provides a graphic representation of that theory, which is based heavily on the work of Coffman (2009) and Reisman et al. (2007). In addition to the description of the policy process, we have also added a number of potential metrics that could be used in a measurement strategy (informed by a number of the tools cited in the previous sections). In this way, practitioners can simply select the outputs they expect to complete and the outcomes they seek to impact from a predetermined list. (As noted above, this model and its associated tools have been specifically designed to measure state-level K-12 education reform advocacy toward a particular set of policy preferences, but the model can be applied to other issue areas or other policy preferences.) We have also indicated, at the bottom of the figure, which parts of the model are more amenable to causal inferences than others.

Figure 1 also contains our heuristic as to which areas of the policy change process are more amenable to measurement using objective data sources (see principle 5 above). This is an important aspect of the Omnibus model, as it graphically represents where evaluators and practitioners are more likely to have access to objective sources of data, and where a heavier reliance on subjective information is more likely. At the very beginning of the process (the outputs) and at the very end (academic performance) there should be a heavy focus on objective sources and quantitative measurement approaches. But in the middle of the process (increasing capacity, changing attitudes and behaviors, affecting policy) measurement precision decreases, and more subjective sources and qualitative measurement approaches will play a larger role in the evaluation of performance.
Again, our position is not that every measure has to be quantitative and based on objective data sources, but that such measures should always be favored when they are available.

Figure 1: Policy Change Model for Advocacy Evaluation in K-12 Education (Mapping Evaluation to the Advocacy Organization Logic Model). [The figure maps WFF grant inputs (project-based and capacity-building grants, movement-building support and guidance, and other supports such as research and convenings) and grantee outputs to short-term (0–5 years), medium-term (6–10 years), and long-term (11+ years) outcomes across parents and policy influencers, policymakers, policy, schools and school systems, and children, with example metrics at each stage (e.g., rallies held, reports published, public statements of support by policymakers, bills introduced or blocked, number of charter schools, student achievement growth). A scale at the bottom indicates the level of confidence that a given advocacy organization caused the result at each stage. Graphic developed in partnership with BTW Informing Change.]

Once a theory of action has been established, along with a set of key goals, we then draw from the post-positivist model the requirement that measurement plans be developed for how progress against each of the goals will be tracked. However, we diverge slightly from the post-positivists by seeking to base evaluations primarily, though not exclusively, on quantitative measurement approaches and objective data sources rather than the qualitative methods and self-assessments that they recommend. One particular exception to this rule is the addition of organizational capacity assessment as the first short-term outcome in our model (which we base on the work of Alliance for Justice, 2005).

Because this is a new direction, we have created a tool to assist in the development of quantitative performance metrics that can be tied to any advocacy logic model. The basis of the performance metric builder tool is the idea that every performance measure should contain five pieces of information to be valid and evaluable (King et al., 2011):

WHAT is going to change or be accomplished through the program?
HOW MUCH change will occur? What will the level of accomplishment be?
WHO will achieve the change or accomplish the task?
WHEN will the change or accomplishment occur?
HOW DO WE KNOW the change occurred?

The tool enables practitioners to select the key outputs and outcomes that they expect to accomplish during the term of a grant from our policy change model. Then, it provides a structured template whereby the practitioner only has to complete each of the five key questions set forth above, with advice on best practices for using objective data, where possible, to create quantifiable goals.
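To make the structure of such a measure concrete, the following is a minimal sketch in Python of the five-element template. The class name, field names, and the statement() helper are illustrative only and are not part of the measure builder tool itself; the example values are adapted from the "Recruit New Contacts" sample output in Appendix A.

# Minimal sketch of the five-element performance measure structure described
# above. Names and the statement() helper are illustrative assumptions, not
# part of the Foundation's actual measure builder tool.
from dataclasses import dataclass

@dataclass
class PerformanceMeasure:
    what: str          # WHAT is going to change or be accomplished?
    how_much: str      # HOW MUCH change will occur?
    who: str           # WHO will achieve the change or accomplish the task?
    when: str          # WHEN will the change or accomplishment occur?
    how_we_know: str   # HOW DO WE KNOW the change occurred (data source)?

    def statement(self) -> str:
        """Assemble the five elements into a single evaluable goal statement."""
        return (f"By {self.when}, {self.who} will {self.what} ({self.how_much}), "
                f"as measured by {self.how_we_know}.")

# Example adapted from the "Recruit New Contacts" sample output in Appendix A.
recruit = PerformanceMeasure(
    what="increase the e-advocacy mailing list",
    how_much="from 4,200 to 6,000 contacts",
    who="Grantee",
    when="December 2013",
    how_we_know="program management files",
)
print(recruit.statement())

Writing measures in this fixed form is what makes them comparable across grantees and checkable against independent records, which is the point of requiring all five elements.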
Some examples of what this tool looks like are provided in Figure 2 below.

Figure 2: Examples of the WFF Advocacy Performance Measure Builder (panels: Outputs; Outcomes).

Examples of objective output measures generated by the tool that demonstrate how an advocacy organization is executing on its theory of action, and that allow for a credible case to be made that the organization in fact contributed to a policy outcome, include:

Number of earned media editorial board meetings held, outreach attempts to reporters, and press releases developed/distributed;
Number of media partnerships developed and distribution outlets accessed through media partnerships;
Number of members of your organization, constituencies represented in the coalition, and coalition meetings held;
Number of communities where organizing efforts take place and community events/trainings held;
Number of rallies/marches held;
Number of education materials developed/distributed and distribution outlets for education materials;
Number of briefings or presentations held;
Number of research/policy analysis products developed/distributed and distribution outlets for products;
Number of educational meetings/briefings held with policymakers/opinion leaders; and
Number of relationship-building meetings held with key policymakers/opinion leaders.

Examples of objective outcome measures of advocacy organization impacts are:

Invitation of an advocacy group to testify in a legislative hearing;
Publication of an op-ed describing a policy position in a major news outlet;
Public citation of an advocacy organization by a policymaker;
Number of policymakers inviting the advocacy organization to talk about a particular policy proposal;
Number of parent advocates who chose to attend a rally at the statehouse;
Invitation of the advocacy organization to serve on a policymaking task force; or
Invitation of the advocacy organization to participate in the rulemaking process.

Appendix A provides a list of sample performance measures that could be developed using the tool.

In addition to these discrete performance measures, we also draw from the insights of the anthropologists and the constructivists by building in flexibility for changing goals mid-course through a formal metric amendment process, as well as a narrative component to the reporting requirements so that qualitative information about contextual factors and other pertinent information about performance is captured. But, we follow the direction of King et al. (1994) and prioritize the information that is collected through structured, public methods that adhere to the principles of scientific inference. The narrative context is important as supplemental information that informs the interpretation of findings, but it does not form the basis of findings themselves.

In sum, the process of conducting advocacy evaluation using this model is as follows (a brief illustrative sketch of the reporting step follows the list):

1) Start by creating a plan of action or logic model based on our theory of policy change (Figure 1 above).

2) Select key outputs and outcomes related to that plan. Our tool provides a predetermined set of possible options for each.

3) Build performance measures for each output and outcome around five key elements, using our tool to ensure rigor and objectivity to the extent possible.

4) During the term of the project, revise metrics if necessary as conditions change.

5) Generate and submit regular reports, which are used as formative assessments to revise plans during the term of the project and as summative assessments to rate performance at the end of the project term.
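As a simple illustration of the regular reporting in steps 4 and 5, the sketch below shows a formative progress check of quantitative measures against their targets. The data structure, flagging threshold, and example figures are assumptions made for illustration; they are not part of the model's official reporting tools.

# Illustrative sketch of a formative progress check for quantitative measures.
# The structure, threshold, and numbers are assumptions for illustration only.
from dataclasses import dataclass

@dataclass
class TrackedMeasure:
    name: str      # e.g., "Educational meetings held with policymakers"
    target: float  # the a priori benchmark set with the measure builder
    actual: float  # progress to date, taken from program records

def progress_report(measures, flag_below=0.5):
    """List percent-of-target for each measure and flag those far behind,
    which may warrant the formal metric amendment process described above."""
    lines = []
    for m in measures:
        pct = (m.actual / m.target) if m.target else 0.0
        flag = "  <-- review; consider metric amendment" if pct < flag_below else ""
        lines.append(f"{m.name}: {m.actual:.0f} of {m.target:.0f} ({pct:.0%}){flag}")
    return "\n".join(lines)

# Hypothetical mid-grant snapshot.
print(progress_report([
    TrackedMeasure("Educational meetings with policymakers", target=12, actual=9),
    TrackedMeasure("Rallies held", target=2, actual=0),
]))

A report of this kind is formative: measures that are lagging can prompt either a change in tactics or a formal metric amendment during the grant term, rather than a surprise at its end.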
Ultimately, advocacy practitioner reports are used as the basis of a formal evaluation conducted by a funder, but the advocacy practitioner's process of marshaling evidence to demonstrate success in implementing a theory of change is useful in itself as formative assessment. In this way, evaluation according to the structure of the Omnibus model becomes evaluation not merely for accountability, but also evaluation that is valuable for organizational learning. These evaluations then inform grantmaking decisions and strategic discussions about how to increase the return on advocacy projects in the funder's portfolio.

Discussion

This new model aims to make a number of important contributions to the literature and potentially to advocacy evaluation practice as well. The most important is that it makes it possible to determine the extent to which individual advocates contributed to key outcomes, based as much as possible on objective data. When used to evaluate short-term outcomes with strong measurement approaches, it can even move an evaluation from contribution closer to attribution, allowing for statements that are more causal rather than merely correlational. Another contribution is the application of quantitative methods and more objective data sources for tracking and measuring the performance of advocates toward meeting predetermined project goals. By enhancing the rigor of measurement, evaluations can make stronger, more reliable, and more valid inferences about the performance of various advocates within a particular policy context. Lastly, by contributing standardized tools for each step of the process, the model can be a cost-effective approach to measuring large numbers of advocacy projects.

There are, of course, limitations to any model or methodology, and this Omnibus model is no different. As King et al. (1994) caution, there will always be uncertainty when we try to understand cause and effect in the real world, and this holds particularly true in the context of advocacy evaluation. This uncertainty requires humility on the part of evaluators about the extent to which observations, collected data, and narratives tell the whole story. In addition, most advocacy projects play out over a much longer period than the grant term. As such, findings will generally be limited to short- and intermediate-term outcomes (though a strength of this model is its ability to rigorously capture impacts at these more proximate stages). This is the trade-off: the shorter timeframe greatly limits the ability of evaluators to make causal claims about what ultimately led to particular policy changes in the long term, but the ability to make causal claims is enhanced when the focus is on short- and intermediate-term outcomes.

Conclusion

Reisman et al. (2007) note in their report that the history of evaluation in philanthropy is in many ways currently repeating itself:

In the early 1990s, program evaluation was a new concept for many organizations involved in the delivery of social and human services. As program evaluation began to be widely implemented, skepticism and worry were common among many social and human service program providers. Complaints included providers' perceptions that the results of their programs' work could not be adequately named or measured; that program evaluation was far too academic and complex to implement in a program setting, and that evaluation would surely take away resources from direct service delivery.
Over the past decade and a half, evaluation has become much more commonplace for social service programs, particularly in the non-profit and public sectors…The situation is reminiscent of attitudes toward the measurement of advocacy and policy work.

Evaluation in this field has been viewed as a new and intimidating prospect, though it does not have to be. Just as the early skeptics of evaluating traditional service delivery programs eventually lost out to those seeking to bring more rigor to studying the effects of such programs, so too, we believe, will today's skeptics of measuring the effectiveness of advocacy projects. In each case we see a natural progression from beliefs that a phenomenon cannot be measured, to arguments that measurement can only happen using atypical and unscientific methods, and then finally to the general acceptance of using standard social science models and techniques to determine discrete performance with a relatively high degree of precision. Ultimately, we hope that the new approach presented in this paper will advance the conversation among researchers, and inform the practice of evaluators and policy advocates, as we move toward that final stage of acceptance.

Appendix A: Performance Measure Examples

(These examples are drawn from King et al., 2011.)

Sample Outputs:

Hold Education Sessions: By November 2013, Grantee will hold one-on-one meetings to discuss education policy generally with 3 school board members, 10 state legislators, and the district superintendent, as recorded in program management files.

Hold Education Sessions: Grantee will organize and execute at least 10 educational meetings on charter school policy improvements for community leaders and policymakers by June 30, 2012, as recorded in program management files.

Conduct Rally: Grantee will participate in organizing and hosting (or co-hosting) a rally to support charter schools at the Statehouse by April 30, 2012. The event will be attended by at least 100 people, as recorded in program management files.

Recruit New Contacts: By December 2013, the e-advocacy mailing list will be increased from 4,200 to 6,000 contacts, as recorded in program management files.

Contact with Legislators: Program staff will conduct at least 12 meetings with targeted legislators and/or their staff to educate them on general issues related to school choice programs during each of years 1 and 2 of the program, as measured by program records.

Sample Outcomes:

Increased Awareness: By June 2014, at least 25% of key policymakers and opinion leaders will report being aware of [policy of interest], as measured by a grantee-conducted or third-party survey of a select group of key policymakers and opinion leaders. Currently, 10% of key policymakers and opinion leaders are aware of [policy of interest].

Increased Support: By June 2015, there will be an increase of 25% in the number of key policymakers and opinion leaders who publicly support [policy of interest], as measured by media hits and records of public remarks by policymakers. Currently, 10% of key policymakers and opinion leaders support [policy of interest].

Improved Policy: By May 2014, school district policy will change so that parents who want their children to attend a school anywhere in the city will receive free transportation, as recorded in official district policy. Currently, free transportation is provided only up to 10 miles from the residence.

Improved Policy: By October 2012, the cap on charter schools in the state will be increased by at least 50 schools, as recorded in official state policy. The current cap is 100 schools.
Bibliography

Alliance for Justice. (2005). Build Your Advocacy Grantmaking: Advocacy Evaluation Tool. Washington, DC.

Beer, T. (2012). Best Practices and Emerging Trends in Advocacy Grantmaking. Center for Evaluation Innovation.

Bill and Melinda Gates Foundation. (2010). A Guide to Actionable Measurement. Accessed May 2012 at: http://docs.gatesfoundation.org/learning/documents/guide-to-actionable-measurement.pdf.

Brest, P. (2012). A Decade of Outcome-Oriented Philanthropy. Stanford Social Innovation Review, Spring 2012.

Britt, H. & Coffman, J. (2012). Evaluation for Models and Adaptive Initiatives. The Foundation Review, 4(2).

Coffman, J. (2009). A User's Guide to Advocacy Evaluation Planning. Harvard Family Research Project.

Coffman, J. (2008). Foundations and Public Policy Grantmaking. James Irvine Foundation.

Coffman, J. & Beer, T. (2011). Evaluation to Support Strategic Learning: Principles and Practices. Center for Evaluation Innovation.

Coffman, J. & Reed, E. (2009). Unique Methods in Advocacy Evaluation. Innovation Network. Accessed May 2012 at: http://www.innonet.org/resources/files/Unique_Methods_Brief.pdf.

Council of Nonprofits. (2012). Self-Assessment and Evaluation of Outcomes. Accessed May 2012 at: http://www.councilofnonprofits.org/resources/resources-topic/evaluation-and-measurement.

Cutler. (2012). Generosity Without Measurement: It Can't Hurt. Family Care Foundation website. Accessed May 2012 at: http://www.familycare.org/opinions/generosity-without-measurement-it-cant-hurt.

Devlin-Foltz, D. & Molinaro, L. (2010). Champions and "Champion-ness": Measuring Efforts to Create Champions for Policy Change. Center for Evaluation Innovation.

Gertler, P., Martinez, S., Premand, P., Rawlings, L., & Vermeersch, C. (2011). Impact Evaluation in Practice. The World Bank.

Greene, J. (2005). "Buckets Into the Sea: Why Philanthropy Isn't Changing Schools, and How it Could," in With the Best of Intentions, edited by Frederick M. Hess. Harvard Education Press: Cambridge, MA.

Guthrie, K., Louie, J., David, T., & Foster, C. (2005). The Challenge of Assessing Policy and Advocacy Activities: Strategies for a Prospective Evaluation Approach. The California Endowment.

Innovation Network. (2008). Speaking for Themselves: Advocates' Perspectives on Evaluation. Accessed May 2012 at: http://www.innonet.org/client_docs/File/advocacy/speaking_for_themselves_web_basic.pdf.

Innovation Network. (n.d.). A Practical Guide to Advocacy Evaluation. Accessed May 2012 at: http://www.innonet.org/client_docs/File/advocacy/pathfinder_advocate_web.pdf.

Kania, J. & Kramer, M. (2011). Collective Impact. Stanford Social Innovation Review, Winter 2011.

King, G., Keohane, R., & Verba, S. (1994). Designing Social Inquiry: Scientific Inference in Qualitative Research. Princeton University Press: Princeton, NJ.

King, M., Holley, M., & Carr, M. (2011). How to Construct Performance Measures: A Brief Guide for Education Reform Grant Applicants to the Walton Family Foundation. Available online at: http://www.waltonfamilyfoundation.org/about/evaluation-unit.

Kubisch, A.C., Auspos, P., Brown, P., Chaskin, R., Fulbright-Anderson, K., & Hamilton, R. (2002). Voices From the Field II: Reflections on Comprehensive Community Change. The Aspen Institute: Washington, DC.

Reisman, J., Gienapp, A., & Stachowiak, S. (2007). A Guide to Measuring Advocacy and Policy. Annie E. Casey Foundation.

Rossi, P., Lipsey, M., & Freeman, H. (2004). Evaluation: A Systematic Approach. SAGE Publications: Thousand Oaks, CA.

Shambra, W. (2011). Measurement Is a Futile Way to Approach Grant Making. Chronicle of Philanthropy, February 6, 2011.

Shaywitz, D. (2011). Our Metrics Fetish – And What to Do About It. Forbes Magazine, June 23, 2011. Accessed May 2012 at: http://www.forbes.com/sites/davidshaywitz/2011/06/23/our-metrics-fetish-and-what-to-do-about-it.

Teles, S. & Schmitt, M. (2011). The Elusive Craft of Evaluating Advocacy. Stanford Social Innovation Review, Summer 2011.

W.K. Kellogg Foundation. (2005). W.K. Kellogg Foundation Evaluation Handbook. Accessed May 2012 at: http://www.wkkf.org/knowledge-center/resources/2010/W-K-Kellogg-Foundation-Evaluation-Handbook.aspx.

Wales, J. (2012). Advancing Evaluation Practices in Philanthropy. The Aspen Institute. Accessed May 2012 at: http://www.aspeninstitute.org/publications/advancing-evaluation-practices-philanthropy.

Whelan, J. (2008). Advocacy Evaluation: Review and Opportunities. Innovation Network. Accessed May 2012 at: http://www.innonet.org/resources/files/Whelan_Advocacy_Evaluation.pdf.