A New Approach to Evaluating Public Policy Advocacy:
Creating Evidence of Cause and Effect
Matthew Carr, Ph.D.
Marc Holley, Ph.D.
Walton Family Foundation
March 2013
Paper Prepared for the 38th annual meeting of the Association for Education Finance and Policy
**Corresponding author: Matthew Carr (mcarr@wffmail.com)
DRAFT: PLEASE DO NOT CITE WITHOUT PERMISSION
Abstract
There currently exists a significant disconnect between researchers and practitioners around
whether, when, and how to measure and evaluate appropriately the performance and influence of
non-profit policy advocacy organizations. Current approaches do not provide a feasible solution
for evaluators and practitioners who need to determine the relative effectiveness of multiple
advocacy strategies or organizations. In this paper we argue that traditional social scientific
evaluation techniques (logic models, output and outcome measurement) grounded in post-positivist theory can improve upon existing advocacy evaluation models and allow for rigorous
and objective assessment of the influence of advocacy organizations. Along with a more
theoretical discussion of the trade-offs and implications of current approaches to advocacy
evaluation, we also include specific direction about how to conduct evaluations using a new
model that combines the strengths of existing approaches. Examples of the new model are
provided, with a focus on state-level K-12 education reform advocacy efforts. Ultimately,
presentation of this new approach aims to advance the conversation among researchers and
inform the practice of evaluators and policy advocates.
Introduction
A growing number of philanthropic organizations are shifting the focus of their giving
from more traditional service delivery projects (such as a tutoring program for struggling
readers) to funding groups that engage in public policy advocacy1 (for example, to amend the
federal No Child Left Behind Act) (Teles & Schmitt, 2011; Beer, 2012). As Coffman (2008)
writes: “Foundations trying to better leverage their influence and improve their impact
increasingly are being urged to embrace advocacy and public policy grantmaking as a way to
substantially enhance their results and advance their missions.”
There are two primary rationales for funding advocacy projects over direct services: 1)
The scale of impact from a government program is expected to be greater than what one
could expect from a discrete program (for example, funding a preschool program for low-income
children in a local neighborhood versus advocacy to approve federal funding for universal
preschool nationwide) (Greene, 2005); and 2) There is a hope that government policy can
mitigate or ameliorate the underlying conditions that lead to the necessity for discrete programs
in the first place (government anti-poverty programs might obviate the need for the preschool
program in the local neighborhood). In short, achieving large-scale social change is now seen as
beyond the ability of private funders alone, requiring engagement in the public policymaking
arena.

1 It is important to note that in this paper "advocacy" refers to the work of non-profit 501(c)(3)
organizations, such as grassroots organizing, public education campaigns, conducting and
disseminating policy research, or media relations. It does not include organizations that conduct
direct lobbying activities.
At the same time that some foundations are shifting the focus of their resource allocations
toward advocacy, there is also a cultural change occurring around how they use data and
evaluation to shape organizational decisionmaking more generally. Wales (2012) writes: “In
recent years, the philanthropic sector has neared consensus on the need to improve measurement
and evaluation of its work." There are a number of reasons for this shift, including the desire on
the part of many foundations to be more strategic in their giving so as to maximize its
effectiveness, to show impact through rigorously collected and analyzed evidence,
or to hold grantee organizations more accountable for achieving stated goals (Brest, 2012).
While there are a few vocal opponents of these efforts to use more data in decisionmaking and
accountability (e.g. Shaywitz, 2011), many philanthropies are moving quickly in the direction of
using more rigorous measurement strategies to increase accountability concerning the
performance of grantees and themselves in creating social impact.
Toward this end, some foundations have become increasingly sophisticated in their use of
evaluation to determine the effectiveness of traditional service delivery programs they fund (see
for example the Gates, Broad, Kellogg, Annie E. Casey, or Robin Hood foundations). Assisting
in this development is the fact that most of the basic methods of social science apply readily to
questions of whether a particular group of service recipients benefitted from a particular program
(from basic participant surveys all the way to randomized controlled trials). As Reisman et al.
(2007) note: “The general field of evaluation offers an extensive literature that provides both
theoretical and practical examples of how social scientific inquiry can be applied to outcome
measurement for an array of programs, interventions, and initiatives.” One can readily find
textbooks on program evaluation (Rossi et al., 2004), guidebooks for evaluators (Gertler et al.,
2011), guides specifically designed around philanthropic evaluation (Gates Foundation, 2010,
Kellogg Foundation, 2005), and practitioner toolkits (Council of Nonprofits, 2012). In addition,
there are numerous consulting firms that specialize in the evaluation of service delivery
programs (RAND, Mathematica, Westat, etc.). In many ways, it has never been easier for a
foundation or nonprofit to evaluate rigorously the impact of a traditional service delivery
program on the participants it serves.
Unfortunately, the same level of sophistication and availability of tools for foundations
and other practitioners do not yet exist to evaluate public policy advocacy projects. In 2005
Guthrie et al. noted: “There is no particular methodology, set of metrics or tools to measure the
efficacy of advocacy grant making in widespread use. In fact, there is not yet a real ‘field’ or
‘community of practice’ in evaluation of policy advocacy.” Two years later Reisman et al.
(2007) summarized the state of advocacy evaluation: "The field of evaluation in the area of
advocacy and policy change is nascent, offering very few formally written documents that relate
to approaches and methods. To date, when evaluation of advocacy and policy work has even
occurred at all, efforts can be best characterized as attempts, or even missteps.” This lack of
established norms for advocacy evaluation has led to confusion among foundations and
practitioners around when and how to measure and evaluate appropriately the performance and
influence of non-profit policy advocacy organizations. Exacerbating this confusion is significant
disagreement among a number of scholars, funders, and practitioners about whether such
evaluations even can, or should, be conducted in the first place.
Given the state of the advocacy evaluation field, it is unsurprising that a 2008 survey
produced for the Annie E. Casey Foundation and The Atlantic Philanthropies found that only
24.6% of 211 respondent nonprofit organizations that conduct advocacy reported that their work
had been formally evaluated. When respondents were asked what challenges they faced in
communicating advocacy success, two of the top three responses were a lack of knowledge about
how to measure success (29.6%) and lack of internal capacity to conduct evaluation (14.2%)
(Innovation Network, 2008). These results suggest two clear needs if more nonprofits are to
engage in advocacy evaluation: 1) greater clarity around when and how to conduct such
evaluations; and 2) resources that reduce the burden on advocacy practitioners and funders of
carrying out evaluation. Our goal is to provide guidance on both of these issues.
In this paper we identify and review the various theoretical arguments around whether
and how best to evaluate public policy advocacy, identifying the strengths and weaknesses of
each. This is the first such attempt to create a typology that categorizes systematically the
existing theoretical perspectives on this subject. Based on this review, we then present a new
model of advocacy evaluation that builds on the strengths of previous work to provide specific
direction to funders and practitioners about how to conduct prospective advocacy evaluations
that are objective, reliable, rigorous, and cost-effective.
In particular, our model establishes a series of specific, measurable performance goals,
connected to an explicit theory of policy change, that can be used by any foundation or nonprofit
engaged in advocacy work, even if it only has limited evaluation capacity. This new model is
focused on state-level K-12 education reform advocacy efforts. As such, the specifics of the
model are designed for evaluating policy advocacy in that area. But, the principles of the
approach are more broadly applicable to other policy areas. The policy preferences described in
our model represent one example, but others could be pursued. We provide specific examples of
how the model can be used at the end of the paper.
Advocacy Evaluation – Four Dominant Schools of Thought
Based on our review of the extant literature on the evaluation of advocacy efforts, we
have identified four general belief systems about whether and how to best conduct evaluations of
public policy advocacy work: Nihilists, Anthropologists, Constructivists, and Post-positivists.
We review the basic tenets of each perspective, along with the strengths and weaknesses of each
in turn. Table 1 below provides a quick overview of the key differences between the various
perspectives. In particular, the categorization of each approach depends on its perspective
regarding the need to apply formal structures to the planning of an advocacy program on the
front end and to the implementation of evaluation methods on the back end. Additional details on
the distinguishing features of the perspectives are provided in each section below.
Table 1: Overview of Differences between the Four Perspectives

                          Evaluation Phase
Paradigm            Planning            Conducting
Nihilist            N/A                 N/A
Anthropologist      No Structure        No Structure
Constructivist      Structure           No Structure
Post-Positivist     Structure           Structure
Nihilists – Do not create plans, do not conduct evaluations
The nihilist view of advocacy evaluation is generally represented by advocacy
practitioners (though it is held by others as well, for example see Cutler, 2012 or Shambra, 2011)
and comprises two central arguments. Reisman et al. (2007) summarized both when they
reported that one of the factors making advocacy evaluation difficult “is the belief among some
advocacy organizations that their work cannot be measured and that any attempt to do so
diminishes the power of their efforts.” The first part of the nihilist view is that advocacy work is
rooted in subtle human interactions that cannot be captured by traditional evaluation tools. For
example, they argue that there are no tools available to social scientists to measure the planting
of an idea in the mind of a policy influencer, or the reshaping of the definition of a public
problem to make it more amenable to a particular, favored policy solution. In short, neither
quantitative nor qualitative methods can provide the evaluator with the data needed to judge
whether an advocate was successful or not.
The crux of the nihilist’s second position is that the act of planning and conducting
evaluation inherently reduces the effectiveness of advocates. They argue that the nature of
advocacy requires a high level of flexibility, and that evaluation (at least using traditional
methods) interferes with their ability to be responsive to ever-changing political and policy
contexts. According to the nihilists, to blindly stick to a plan of action may satisfy the funder’s
desire for measurement and accountability, but the cost is operational ineffectiveness and the
inability to achieve the broader goals the work was intended to serve. As a result, the central
tenet of this perspective is that neither planning nor conducting evaluations is appropriate in the
context of policy advocacy. Rather, it is best to provide unrestricted support to such
organizations and leave them free to pursue whatever goals are most feasible at a particular point
in time. At the end of the project, the advocate will report back on successes and failures.
The strength of the nihilist argument is its insight about the need for advocates to shift
strategies based on changing contexts, which, as we discuss below, highlights the need for
evaluators to build flexibility explicitly into any evaluation model.
Advocacy does take place in a context distinct from that of traditional service delivery programs, and so
a valid evaluation approach requires additional allowances for changes in strategies and plans
during the course of a project. As we explain, how to provide flexibility while maintaining rigor
becomes a key challenge for advocacy evaluators.
The limitations of the nihilist position are many. Most critically, not being able to
rigorously judge the effectiveness of advocacy projects or approaches is untenable for both
foundations and practitioners. With limited resources that have alternative uses, foundations
cannot continue to operate under decisionmaking protocols centered on relationships, intuition,
and “common knowledge” if they are to maximize their social impact. Rather, decisions are
more likely to lead to successful outcomes when made based on empirical evidence and rational
analysis of advocate performance. In addition, the practitioner community cannot progress as a
field without rigorous knowledge creation about best practices, basic standards of effectiveness,
or the creation of feedback loops to inform and improve performance over time. Lastly,
advocates can and should create logic models and plans for how they will accomplish their goals
at the beginning of any project, both because it makes prospective evaluation possible, and
because it allows foundations and strategic partners to conduct due diligence and offer
suggestions. Having a plan is a critical signal to others that there is a strategy for achieving
results. But, as noted above, it is clear that there must be a formal process in place for amending
these plans during the course of the project to account for the unique contexts in which advocacy
occurs.
Anthropologists – Do not structure the project, do not structure the evaluation
This group comprises mainly academicians. They argue that traditional models of
program evaluation are not well suited to the inherently political nature of advocacy and the fluid
contexts in which policy is created and enacted, necessitating an approach based largely on the
informed judgment of participants. Among the academicians, the argument is generally couched
in theories of the policymaking process. Teles and Schmitt (2011) are representative when they
write:
Unfortunately, these sophisticated tools (for evaluating service programs) are almost
wholly unhelpful in evaluating advocacy efforts. That’s because advocacy, even when
carefully nonpartisan and based in research, is inherently political... Because of these
peculiar features of politics, few if any best practices can be identified through the
sophisticated methods that have been developed to evaluate the delivery of services.
Advocacy evaluation should be seen, therefore, as a form of trained judgment—a craft
requiring judgment and tacit knowledge—rather than as a scientific method.
Similarly, Guthrie et al. (2005) identify seven key challenges to conducting advocacy evaluation
using traditional methods: the complex nature of policy change, the role of external forces
beyond the advocate’s control, the often long time frame involved, the need for advocates to be
flexible and shift strategies based on rapidly changing contexts, the inherent difficulty of making
claims of causal attribution, and limitations on how directly non-profits can influence policy (i.e.
they cannot lobby policymakers). The anthropologists hold that these aspects mean that
traditional models of program evaluation cannot be used. Indeed, they go so far as to argue that
the application of the scientific method itself to the task is inappropriate.
In the place of traditional scientific methods, the anthropologists suggest that advocacy
evaluation should be based on the professional judgment of an evaluator gathering information
from the narratives of participants and other contextual sources. Teles and Schmitt (2011)
explain:
If scientific method is an inappropriate model, where can grantmakers look for an
analogy that sheds light on the intensely judgmental quality of advocacy evaluation? One
possibility is the skilled foreign intelligence analyst. She consumes official government
reports and statistics, which she knows provide a picture of the world with significant
gaps. She talks to insiders, some of whom she trusts, and others whose information she
has learned to take with a grain of salt…It is the web of all of these imperfect sources of
information, instead of a single measure, that helps the analyst figure out what is actually
happening. And it is the quality and experience of the analyst—her tacit knowledge—that
allows her to create an authoritative picture.
In short, advocacy evaluation is all art and no science.
Similarly, some anthropologists have viewed advocacy programs as “adaptive
initiatives”2 whereby evaluations should be conducted using models that involve the evaluator as
a participant or partner. Britt and Coffman (2012) write that “since adaptive initiatives do not
seek predetermined objectives through the application of best practices, these (traditional
formative and summative) evaluation approaches are a poor fit.” Instead, Britt and Coffman
recommend two new methods: 1) Developmental evaluation – in which the evaluator is embedded in
the project as a "critical friend" offering instantaneous feedback on data as it emerges; and 2)
Goal-free evaluation – in which there are no goals set at the outset of a project, and the
evaluation focuses on measuring "outcomes influenced, rather than limiting the inquiry to planned
outcomes" (Ibid).

2 Adaptive initiatives are described as projects that are defined by continual change and
adaptation, rather than following a specified plan.
The anthropologist position has a number of strong contributions to make to the
development of a rigorous evaluation model for advocacy projects. For one, they are correct
about the need to capture qualitative data about the local context in which the advocacy is
occurring (including the effects of external forces beyond the control of advocates). This
necessitates the addition of multiple approaches (triangulation) or narrative reporting to ensure
that the evaluation is fully capturing the more intangible aspects of an advocacy project. A
second is the need to be transparent about the limitations of attributing causation to a single
advocate or advocacy group. Evaluations of advocacy projects should be upfront about the limits
of producing cause and effect statements about particular advocates and specific policy changes.
Nonetheless, when designed properly, we believe that advocacy evaluations can move
understanding of a policy advocacy organization’s role from contribution closer to attribution.
In addition, the anthropologist point about the potential non-linearity of policy change is
also valid. In response, evaluators and advocates must avoid thinking too rigidly about the
precise sequence of activities and results that are expected from a project. Indeed, planning for
such non-linearity should be part of the prospective evaluation model.
However, there are a number of serious limitations to the anthropologist position. Perhaps
most importantly, the anthropologists overstate the magnitude of the obstacles to conducting
advocacy evaluation. While it is true that the traditional service delivery program evaluation
model does not directly translate to the advocacy context, it is also not so different that the
scientific method is completely unusable. Here, the anthropologists confuse the specific methods
of the traditional model (regressions, randomized trials, treatment and control groups) with the
underlying epistemological principles that it represents. Advocacy evaluations are unlikely to be
conducted with regression models, but they can be conducted in ways designed to generate
objective, reliable information about individual contributions to an observable outcome. The key
to overcoming the challenges of advocacy evaluation is not to abandon the scientific method and
succumb to the false promise of subjective professional judgment, but rather to return to the first
principles of the scientific method. This involves seeking the best aspects of traditional,
quantitative evaluation methods and using them to inform non-statistical methods of advocacy
evaluation. As King et al. (1994) state: “non-statistical research will produce more reliable
results if researchers pay attention to the rules of scientific inference – rules that are sometimes
more clearly stated in the style of quantitative research.” The model that we present later in this
paper aims to build on existing models using these principles of scientific inference.
There are a number of other limitations of the anthropologist approach as well. First,
evaluation results based on this approach cannot be independently verified or replicated. Because
the data and methods are neither public nor established a priori, the reliability of results
generated is called into question. As King et al. (1994) note: "If the method and logic of a
researcher’s observations and inferences are left implicit, the scholarly community has no way of
judging the validity of what was done…Such research is not a public act. Whether or not it
makes good reading, it is not a contribution to social science.” The use of structure in designing
and conducting evaluation plays a key role because it allows others to judge the quality of the
data and methods used, as well as to replicate the results.
A related issue is that there is a high likelihood of bias being introduced into the
evaluation when it relies predominantly on information provided by participants. On the one
hand, participants have a strong incentive and natural inclination to over-state their contribution
to an advocacy campaign. An evaluation based on participant assessments of their own work will
likely significantly overstate the influence of respondents. On the other hand, the incentive structure
among grantees in an advocacy coalition could be one of mutual support, whereby each member
may feel pressure to speak positively of other coalition members in the hopes that they will, in
turn, speak positively of them. In either case, the results of the evaluation will be biased.
Lastly, an important shortcoming is that the anthropologist approach ignores the reality
faced by foundations which have to measure the return on investment of every grant, not just the
collective impact of numerous grants.3 Opportunity costs in grantmaking are high; foundations
are not in a position to provide funding for every group in a coalition and hope that at least some
members will accomplish the ultimate goal of the project. Rather, foundations have a
responsibility to seek out the most effective practitioners to receive support from the limited
resources available.

3 See Kania & Kramer (2011) for a detailed discussion of the concept of collective impact.
Constructivists – Structure the project, do not structure the evaluation
Like the anthropologists, the constructivists also argue that the traditional methods of
program evaluation are not applicable to public policy advocacy work. But, a key difference is
that they eschew the completely unstructured approach of the anthropologists and seek to build
more rigorous, and better planned, program models at the beginning of projects that can be used
by foundations and practitioners to set expectations. These approaches generally include creating
explicit logic models (theories of change) and developing plans for achieving goals that are
established a priori (though it should be noted that these goals are generally broadly stated and
lack targets or measurement strategies) (e.g. Innovation Network, n.d.; Beer, 2012). But, like the
anthropologists, in the end the evaluator still seeks to gauge performance largely through the
perceptions of key actors. Coffman and Beer (2011) exemplify this approach, writing:
Evaluation is constructivist. Constructivism is a view of the world and of knowledge that
assumes that “truth” is not an objectively knowable thing but is individually and
collectively constructed by the people who experience it. Constructivist evaluators
assume that the practice of making meaning out of data, perceptions, and experience
depends on and is always filtered through the knowers’ own experiences and
understanding…. There are no hard and fast “truths” based on data; rather, evaluation for
strategic learning generates knowledge and meaning from a combination of evaluation
data, past experiences, and other forms of information.
Even though the constructivists do not believe that there is objective truth in the data,
they do believe that advocacy projects require some structure. For example, the Innovation
Network (n.d.) offers a practical guide for advocacy evaluators that begins by suggesting that
evaluators be actively engaged in helping programs create detailed theories of action and
strategic plans. The guide states: “Using your evaluation expertise and experience, focus those
involved in the process to document a robust, strapping theory of how to move from Point A to
Point B. Embed in that conversation a discussion of strategies and interim outcomes.” But,
despite all of the rigor suggested at the beginning of the process, Innovation Network later
recommends using developmental evaluation as the best approach for measuring the
effectiveness of advocacy projects. As noted above in the anthropologist section, developmental
evaluation is a methodology defined by the placement of the evaluator as a project stakeholder,
gathering real-time feedback and data by talking to other participants.
The implications of the constructivist approach have been summed up by Beer (2012) in
seven recommendations for foundations: 1) Provide unrestricted funding to advocates to give
them the necessary flexibility to react to local contexts; 2) Provide multi-year grants to
advocates, recognizing that the policy process is a long-term game; 3) Offer higher grant
amounts to build advocates' capacity; 4) Focus on intermediate-term outcomes, rather than
longer-term policy "wins" (when setting goals); 5) Provide advocates with flexibility in both
when they need to report, and how they report; 6) "Do not apply traditional program models to
advocacy work"; and 7) Provide additional non-financial supports to help build advocates'
capacity. Across the seven recommendations, a consistent theme emerges in which advocacy is
viewed as fundamentally different from traditional service delivery models. As a consequence,
constructivists believe that advocates should be granted tremendous flexibility in a number of
key respects, including evaluation.
The most important contribution of the constructivists is the application of structure to
advocacy planning. Most constructivist evaluators agree that advocacy projects need to start with
an explicit theory of change or logic model. And some go further in suggesting that benchmarks
and goals be identified that are connected to those models. By acknowledging and emphasizing
the need for advocacy programs to be grounded in some explicit model of change and with
prospective goals about what success will look like (even if only vaguely defined), the
constructivists insert important aspects of the scientific model into advocacy evaluation.
Unfortunately, they do not carry this rigorous start through to its conclusion, choosing instead to
fall back on subjective methods and participant narratives as primary data sources when
conducting the actual evaluation.
Another benefit is that constructivists correctly note the complexity of the policymaking
process and the need for any evaluation model to be explicitly based on a theory of policy
change. By having an explicit model, the evaluator and the advocate can clarify at least some of
the complexity and thereby create goals based on discrete and definable aspects of the policy
change process. In addition, the constructivists offer sound advice to focus on shorter-term
outcomes when evaluating advocacy projects, even if they choose not to measure them
rigorously. The timeframe on many policy campaigns can be long, necessitating evaluation of
accomplishments that fall short of official policy change. As such, it is important to establish
other measures that can capture important effects that may signal an increased likelihood of
policy change occurring in the future. Similarly, the focus on advocacy organization capacity as
a short-term outcome is also well considered and should be part of a rigorous model of advocacy
evaluation. Increasing capacity to conduct advocacy should be strongly related to the ability of
the organization to accomplish key goals.
Despite these positive contributions, the drawbacks of the constructivist approach to
conducting an evaluation are largely the same as those found among the anthropologists.
Primarily, because the formal evaluation is based on the interpretations of participant
observations, results cannot be reproduced nor can they be independently verified. The data and
methods are not established based on the principles of scientific inference, and as such the results
do not further transparent and defensible claims of the effects of any particular advocacy
program.
Post-positivists – Structure the project, structure the evaluation
The post-positivists start at the same point as the constructivists – holding that rigorous
advocacy evaluation begins by establishing prospective plans using logic models and setting key
benchmarks a priori for what will be accomplished within a particular timeframe. But, unlike the
constructivists and the anthropologists, the post-positivists continue to work within structured
models to conduct the actual evaluation of advocacy programs (Guthrie et al., 2005).
Specifically, evaluators use “applied” social science methods to create and then carry out data
collection and analysis to inform evaluations. Reisman et al. (2007) explain the distinction of
using applied methods rather than conventional social science methods:
Social science research techniques are guided by rigorous academic and scientific
standards that qualify for building disciplinary knowledge. In contrast, evaluation
research is guided by practical and applied interests that support the ability to make
program and policy decisions. The methodological techniques are the same; however, the
standards for evidence are often more stringent in the academic and scientific arenas.
As such, the traditional post-positivist position has been that while the evaluator seeks to use the
best social science research methods available, the evidentiary standard for making statements
about performance is lowered. In our Omnibus model discussed below, we agree that the
evidentiary standard for advocacy evaluation is lower than that used in the best service delivery
program evaluations (e.g. randomized trials or regression discontinuities), but we argue that the
evidentiary standard can, and should, be raised by including objective sources of evidence and
quantitative methods whenever possible.
In practice, the post-positivists generally rely on traditional qualitative methods for
conducting advocacy evaluations. In their guide, Reisman et al. (2007) suggest using focus
groups, interviews, observation instruments, document content analysis, and surveys. The guide
created by Coffman for the Harvard Family Research Project (2009) states: “Like all evaluations,
advocacy evaluations can draw on a familiar list of traditional data collection methods, such as
surveys, interviews, document review, observation, polling, focus groups, or case studies.” Even
newer methodologies created for the specific purpose of evaluating advocates (such as the
bellwether methodology, policymaker ratings, intense period debriefs, and system mapping
(Coffman and Reed, 2009) or policy champion ratings (Devlin-Foltz and Molinaro, 2010)) are
mainly based on qualitative approaches and participant self-assessments. It should be noted that
current post-positivist models do occasionally include more objective measures such as policy
tracking, but this tends to be the exception rather than the rule. Still, while the post-positivists
bring added structure and rigor to advocacy evaluation models, they are in effect typically
restricting their available tools to a set that limits the ability of the evaluator to create inferences
about program impacts.
The post-positivists have also built a number of resources detailing short- and
intermediate-term outcomes that foundations and practitioners can use to evaluate the
performance of programs where formal policy change is on a longer timeframe. Reisman et al.
(2007) have created a comprehensive list of possible outcomes, including ideas for metrics that
track organizational capacity, strength of advocacy alliances, public support, improved policy,
and broad social change. Coffman (2009) similarly provides a list of potential intermediate-term
outcomes as well, including awareness, salience, visibility, political will, donors, and policy
champions. However, while helpful in identifying what one could potentially measure, neither of
these guides presents advice on how these might specifically be measured.
The most important contribution of the post-positivist model is that it creates structure
not only around the design of advocacy programs (like the constructivists), but also around the
evaluation conducted during or after the project has concluded. This approach is based on the
premise that there are social scientific methods that can be applied to the evaluation of policy
advocacy, even if the evidentiary standard is lower than that used in the study of more traditional
service delivery programs. Additionally, the post-positivist model assumes the existence of
objective facts and truth that can be uncovered in systematic ways. In this way, the post-positivists add far more rigor to their evaluations than either the anthropologists or the
constructivists, and results are externally verifiable.
Another strength of the post-positivist model is that it creates a realistic balance between
rigor and feasibility in evaluating advocacy projects. Unlike the other models, this approach is
scalable; it can be used to evaluate the performance of a large number of individuals or
organizations at the same time (which is a particular benefit to foundations with large portfolios
of advocacy grantees). In particular, because the approach is structured, it can be distilled into
practical tools for participants so that they can conduct their own data collection, and even
analysis, while respecting differences in organizations and approaches. There are a number of
such tools currently available to foundations and practitioners, including an online logic model
builder from Aspen Institute's Advocacy Planning and Evaluation Program, a logic model guide
from the Kellogg Foundation, and advocacy evaluation guides from the Annie E. Casey
Foundation, the California Endowment, and the Harvard Family Research Project.
The most significant weakness of current post-positivist approaches is that they rely too
much on self-assessment instruments and other qualitative methods to collect evidence. These
methods are a significant improvement over past practice and the completely unstructured nature
of other approaches, but there remain significant opportunities to introduce more quantitative
measurement strategies into advocacy evaluation models and to rely on more objective sources
of evidence (which we include in our model below). Lastly, even with the best models, strong
claims of causal attribution are limited. However, advocate contribution to shorter-term
outcomes can be assessed rigorously and the post-positivist model provides insights into how
that can be accomplished.
An Omnibus Model for Advocacy Evaluation
All four of the dominant advocacy evaluation paradigms have strengths and weaknesses.
These perspectives have contributions to make to a more rigorous and standardized approach to
advocacy evaluation, but also pitfalls and limitations to be avoided or mitigated to the extent
possible. In developing an Omnibus advocacy evaluation model, we combine the best aspects of
each approach and then build in additional components based on the principles of scientific
inference developed by King et al. (1994).4 Table 2 below summarizes the contribution of each
paradigm to our new omnibus model, along with the key ways in which our model then builds on
them to create additional rigor.

4 Our model was developed with the assistance of consultants at BTW Informing Change.
Table 2: Summary of the Contributions Made By Each Paradigm

Nihilist: Add flexibility in amending plans and goals during a project.

Anthropologist: Capture context through supplemental narratives; be explicit about limits of
causal claims; expect non-linearity.

Constructivist: Create structure around advocacy planning (logic models, strategic plans,
goal-setting); need to have a theory of how policy change occurs; focus on shorter-term outcomes,
including advocate capacity.

Post-Positivist: Create structure around evaluation of projects (using traditional social science
methods); belief that advocacy can be measured in systematic ways to uncover objective truths;
practical tools for participants so that they can conduct their own evaluation activities.

Omnibus Approach (our model): Seek objective data in measurement strategies whenever possible;
build quantitative performance measures (outputs and outcomes); connect metrics and measurement
strategies to a theory of policy change.
As mentioned above, in our model we apply the key principles of scientific inference
identified by King and his colleagues (1994). Specifically, we argue that the characteristics that
are foundational for the development of any evaluation of advocacy projects are:
1) Creating inferences about the world: It is not enough to merely collect facts or tell
a story about a particular phenomenon. Rather, researchers must collect and analyze
empirical information in ways that allow them to infer beyond what was observed to
make statements about parts of the world that were not observed.
2) Structure matters: Theories of action, strategic plans, and goals should always be
established before an advocacy project begins. These should be as specific and
measurable as possible. This is the only way an evaluator can establish rigorous and
reliable evidence of performance. A coherent theory of change “is a concrete
statement of plausible, testable pathways of change that can both guide actions and
explain their impact. In this way, a theory of change provides a roadmap for action
and a framework to chart and monitor progress over time” (Kubisch et al., 2002).
3) Methods matter: According to King et al.: “Scientific research uses explicit,
codified, and public methods to generate and analyze data whose reliability can
therefore be assessed.” In short, the methods used in an evaluation must adhere to the
basic rules of scientific inquiry. For example, it is important to apply proper sampling
procedures to promote representativeness and minimize bias (a brief sketch of this appears after this list).
4) Accept uncertainty: Evaluations of advocacy generate evidence about the
relationships between the activities of advocates and changes in the policy
environment, and this evidence should be collected and analyzed in the most
objective and rigorous way possible. But, it must be acknowledged that even the best
evidence will fall short of the “gold standard” for making statements about cause and
effect and will not be able to create certainty.
5) Favor objective sources of data: Whenever possible, advocacy evaluations should
seek to collect and analyze objective data to produce evidence. This will not always
be possible as some outcomes require more subjective data sources and assessments
(e.g. interviews, surveys). But, the researcher should always prioritize objective
sources over subjective. These include independently verifiable data points of both
outputs (services provided or activities conducted) and outcomes.
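Principle 3's point about sampling can be illustrated with a short sketch. When an outcome measure rests on a survey of key policymakers and opinion leaders (as in the awareness measures in Appendix A), drawing a documented random sample from an explicit sampling frame keeps the data collection public and replicable. The frame, seed, and sample size below are illustrative assumptions only, not part of any existing tool.

```python
import random

# Hypothetical sampling frame: the full list of key policymakers and opinion
# leaders the evaluator wants to make inferences about (names are placeholders).
sampling_frame = [f"policymaker_{i}" for i in range(1, 151)]

# Draw a simple random sample for the awareness survey so that results can be
# generalized to the whole frame rather than to a hand-picked, friendly subset.
random.seed(42)  # fixed seed so the draw can be documented and replicated
survey_sample = random.sample(sampling_frame, k=30)

print(f"Surveying {len(survey_sample)} of {len(sampling_frame)} policymakers")
print(survey_sample[:5])
```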
Our omnibus model starts with the constructivist requirement that advocacy projects and
evaluations begin with an explicit theory of change or logic model and then develop a set of a
priori targets or benchmarks that are directly and explicitly connected to that model. Every
project, whether a traditional service delivery program or an advocacy campaign, has to start
with a plan. These plans may change, and often do for various reasons (service programs learn
from mistakes and formative assessments to improve; advocates react to changes in the policy
environment), but every project needs to have some notion about what it will take to create
change and what that change will look like. Tools like the Aspen Institute's Advocacy Planning
and Evaluation Program online logic model builder are a good place for advocacy practitioners
to start.
We also follow the advice of the constructivists that evaluation models need to be built
on an explicit theory of policy change. Figure 1 below provides a graphic representation of that
theory, which is based heavily on the work of Coffman (2009) and Reisman et al. (2007). In
addition to the description of the policy process, we have also added a number of potential
metrics that could be used in a measurement strategy (informed by a number of the tools cited in
the previous sections). In this way practitioners can simply select the outputs they expect to
complete and the outcomes they seek to impact from a predetermined list.5 We have also
indicated which parts of the model are more amenable to causal inferences than others at the
bottom of the figure.
5 As noted above, this model and its associated tools have been specifically designed to measure
state-level K-12 education reform advocacy toward a particular set of policy preferences. But, the
model can be applied to other issue areas or other policy preferences.
Figure 1 also contains our heuristic as to which areas of the policy change process are
more amenable to measurement using objective data sources (see principle #5 above). This is an
important aspect of the Omnibus model, as it graphically represents where evaluators and
practitioners are more likely to have access to objective sources of data, and where a heavier
reliance on subjective information is more likely. At the very beginning of the process (the
outputs) and at the very end (academic performance) there should be a heavy focus on objective
sources and quantitative measurement approaches. But, in the middle of the process (increasing
capacity, changing attitudes and behaviors, affecting policy) measurement precision decreases
and more subjective sources and qualitative measurement approaches will play a larger role in
the evaluation of performance. Again, our position is not that every measure has to be
quantitative and based on objective data sources, but that such measures should always be
favored when they are available.
Figure 1: Policy Change Model for Advocacy Evaluation in K-12 Education

[Figure: a graphic, developed in partnership with BTW Informing Change, mapping evaluation to an
advocacy organization's logic model. It traces WFF grant inputs (project-based and capacity-building
grants; movement-building support and guidance; other supports such as research and convenings)
through grantee outputs (operational and programmatic outputs such as rallies held, reports
published, trainings conducted, and members recruited) to short-term grant outcomes (0–5 years),
medium-term grant/strategy outcomes (6–10 years), and long-term strategy/initiative outcomes (11+
years). Short-term outcomes include the organizational capacity of the advocacy organization (staff
advocacy skill development, media relationships, legislative process knowledge) and movement of
parents, policy influencers, and policymakers (elected officials and civil servants) through stages
of knowledge/awareness, support, involvement, commitment, and action (e.g., policy tracking, polling
results, invitations to educate policymakers, public statements of support by policymakers, parent
participation in advocacy, citation of policy points, earned media). Medium-term outcomes cover
policy stages (development, placement on the policy agenda, adoption, blocking, implementation,
monitoring and evaluation, maintenance; e.g., bills introduced, bills blocked, development of rules
and regulations) and changes in schools and school systems around choice, autonomy, competition, and
quality (e.g., number of charter schools, voucher programs, funding equity, choice marketshare).
Long-term outcomes concern children's academic performance (student achievement growth and
attainment, high school graduation, college matriculation). A scale along the bottom of the figure
indicates the confidence that a given advocacy organization caused the result at each stage.]
Once a theory of action has been established, along with a set of key goals, we then draw
from the post-positivist model the requirement that measurement plans have to be developed for
how progress against each of the goals will be tracked. However, we diverge slightly from the
post-positivists by seeking to base evaluations primarily, though not exclusively, on quantitative
measurement approaches and objective data sources rather than the qualitative methods and self-assessments that they recommend. One particular exception to this rule is the addition of
organization capacity assessment as the first short-term outcome in our model (which we base on
the work of Alliance for Justice, 2005). Because this is a new direction, we have created a tool to
assist in the development of quantitative performance metrics that can be tied to any advocacy
logic model.
The basis of the performance metric builder tool is the idea that every performance
measure should contain five pieces of information to be valid and evaluable (King et al., 2011):

• WHAT is going to change or be accomplished through the program?
• HOW MUCH change will occur? What will the level of accomplishment be?
• WHO will achieve the change or accomplish the task?
• WHEN will the change or accomplishment occur?
• HOW DO WE KNOW the change occurred?
The tool enables practitioners to select the key outputs and outcomes that they expect to
accomplish during the term of a grant from our policy change model. Then, it provides a
structured template whereby the practitioner only has to complete each of the five key questions
set forth above, with advice on best practices for using objective data, where possible, to create
quantifiable goals. Some examples of what this tool looks like are provided in Figure 2 below.
Figure 2: Examples of the WFF Advocacy Performance Measure Builder

[Figure: examples from the performance measure builder, one panel for outputs and one for outcomes.]
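To make the structure behind the builder concrete, the sketch below encodes the five elements as a simple data structure and fills it in with one of the sample measures from Appendix A. The class and field names are illustrative assumptions for this paper, not the Foundation's actual tool.

```python
from dataclasses import dataclass

@dataclass
class PerformanceMeasure:
    """One output or outcome measure built from the five key elements."""
    what: str          # WHAT is going to change or be accomplished?
    how_much: str      # HOW MUCH change will occur? What level of accomplishment?
    who: str           # WHO will achieve the change or accomplish the task?
    when: str          # WHEN will the change or accomplishment occur?
    how_we_know: str   # HOW DO WE KNOW the change occurred? (objective data source)

# Illustrative example modeled on the "Recruit New Contacts" output in Appendix A.
recruit_contacts = PerformanceMeasure(
    what="Increase the e-advocacy mailing list",
    how_much="from 4,200 to 6,000 contacts",
    who="Grantee",
    when="by December 2013",
    how_we_know="as recorded in program management files",
)
print(recruit_contacts)
```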
Examples of objective output measures generated by the tool that demonstrate how an
advocacy organization is executing on its theory of action and that allow for a credible case to be
made that an advocacy organization in fact contributed to a policy outcome include:

• Number of earned media editorial board meetings held, outreach attempts to reporters, press
  releases developed/distributed;
• Number of media partnerships developed, distribution outlets accessed through media partnerships;
• Number of members of your organization, constituencies represented in the coalition, coalition
  meetings held;
• Number of communities where organizing efforts take place; community events/trainings held;
• Number of rallies/marches held;
• Number of education materials developed/distributed, distribution outlets for education materials;
• Number of briefings or presentations held;
• Number of research/policy analysis products developed/distributed, distribution outlets for
  products;
• Number of educational meetings/briefings held with policymakers/opinion leaders;
• Number of relationship-building meetings held with key policymakers/opinion leaders.
Examples of objective outcome measures of advocacy organization impacts are:

• Invitation of an advocacy group to testify in a legislative hearing;
• Publication of an op-ed describing a policy position in a major news outlet;
• Public citation of an advocacy organization by a policymaker;
• Number of policymakers inviting advocacy organization to talk about a particular policy proposal;
• Number of parent advocates who chose to attend a rally at the statehouse;
• Invitation of advocacy organization to serve on a policymaking task force; or
• Invitation of advocacy organization to participate in the rulemaking process.
Appendix A provides a list of sample performance measures that could be developed using the
tool.
In addition to these discrete performance measures, we also draw from the insights of the
anthropologists and the constructivists by building in flexibility for changing goals mid-course
through a formal metric amendment process, as well as a narrative component to the reporting
requirements so that qualitative information about contextual factors and other pertinent
information about performance is captured. But, we follow the direction of King et al. (1994) and
prioritize the information that is collected through structured, public methods that adhere to the
principles of scientific inference. The narrative context is important as supplemental information
that informs the interpretation of findings, but it does not form the basis of findings themselves.
In sum, the process of conducting advocacy evaluation using this model is as follows:
1) Start by creating a plan of action or logic model based on our theory of policy change
(Figure 1 above).
2) Select key outputs and outcomes related to that plan. Our tool provides a predetermined
set of possible options for each.
3) Build performance measures for each output and outcome around five key elements,
using our tool to ensure rigor and objectivity to the extent possible.
4) During the term of the project, revise metrics if necessary as conditions change.
5) Generate and submit regular reports, which are used as formative assessments to revise
plans during the term of the project and summative assessments to rate performance at
the end of the project term.
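As one way to picture step 5, the sketch below tallies reported results against the targets set in step 3 and prints a simple percent-of-target summary. The class, function, and figures are hypothetical illustrations of the model's general logic, not part of the actual reporting templates; a roll-up like this could serve as a formative check mid-grant and feed the summative rating at the end of the term.

```python
from dataclasses import dataclass

@dataclass
class MetricReport:
    """Reported progress on one quantitative performance measure."""
    name: str      # e.g., "New e-advocacy contacts added"
    target: float  # level of accomplishment set a priori (the HOW MUCH element)
    actual: float  # level reported for the period, drawn from objective records

def summarize(reports: list[MetricReport]) -> None:
    """Print percent-of-target for each measure and a simple portfolio average."""
    attainment = []
    for r in reports:
        pct = 100.0 * r.actual / r.target if r.target else 0.0
        attainment.append(pct)
        print(f"{r.name}: {r.actual:g} of {r.target:g} ({pct:.0f}% of target)")
    if attainment:
        print(f"Average attainment across measures: {sum(attainment) / len(attainment):.0f}%")

# Illustrative mid-grant (formative) report using measures like those in Appendix A.
summarize([
    MetricReport("New e-advocacy contacts added", target=1800, actual=950),
    MetricReport("Educational meetings with policymakers", target=12, actual=7),
    MetricReport("Rally attendance", target=100, actual=130),
])
```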
Ultimately, advocacy practitioner reports are used as the basis of a formal evaluation conducted
by a funder, but the advocacy practitioner’s process of marshaling evidence to demonstrate
success in implementing a theory of change is useful in itself as formative assessment. In this
way, evaluation according to the structure of the Omnibus model becomes evaluation not merely
for accountability, but also evaluation that is valuable for organizational learning. These
evaluations then inform grantmaking decisions and strategic discussions about how to increase
the return on advocacy projects in the funder’s portfolio.
Discussion
This new model aims to make a number of important contributions to the literature and
potentially to advocacy evaluation practice as well. The most important is that it makes it
possible to determine the extent to which individual advocates contributed to key outcomes
based as much as possible on objective data. When used to evaluate short-term outcomes with
strong measurement approaches, it can even move an evaluation from contribution closer to
attribution, allowing for statements that are more nearly causal than merely correlational. Another
contribution is the application of quantitative methods and more objective data sources for
tracking and measuring the performance of advocates toward meeting predetermined project
goals. By enhancing the rigor of measurement, evaluations can make stronger, more reliable, and
more valid inferences about the performance of various advocates within a particular policy
context. Lastly, by contributing standardized tools for each step of the process, the model can be
a cost-effective approach to measuring large numbers of advocacy projects.
There are, of course, limitations to any model or methodology, and this Omnibus model
is no different. As King et al. (1994) caution, there will always be uncertainty when we try to
understand cause and effect in the real world, and this holds particularly true in the context of
advocacy evaluation. This uncertainty requires humility on the part of evaluators about the extent
to which observations, collected data, and narratives tell the whole story. In addition, most
advocacy projects play out over a much longer period than the grant term. As such, findings will
generally be limited to short- and intermediate-term outcomes (though a strength of this model is
its ability to rigorously capture impacts at these more proximate stages). This is the trade-off; the
shorter timeframe greatly limits the ability of evaluators to make causal claims about what
ultimately led to particular policy changes in the long term, but the ability to make causal claims
is enhanced when focused on short- and intermediate-term outcomes.
Conclusion
Reisman et al. (2007) note in their report that the history of evaluation in philanthropy is
in many ways currently repeating itself:
In the early 1990s, program evaluation was a new concept for many organizations
involved in the delivery of social and human services. As program evaluation began to be
widely implemented, skepticism and worry were common among many social and human
service program providers. Complaints included providers’ perceptions that the results of
their programs’ work could not be adequately named or measured; that program
evaluation was far too academic and complex to implement in a program setting, and that
evaluation would surely take away resources from direct service delivery. Over the past
decade and a half, evaluation has become much more commonplace for social service
programs, particularly in the non-profit and public sectors…The situation is reminiscent
of attitudes toward the measurement of advocacy and policy work. Evaluation in this
field has been viewed as a new and intimidating prospect, though it does not have to be.
Just as the early skeptics of evaluating traditional service delivery programs eventually lost out to
those seeking to bring more rigor to studying the effects of such programs, so too, we believe, will
today's skeptics of measuring the effectiveness of advocacy projects. In each case we see a natural
progression from beliefs that a phenomenon cannot be measured, to arguments that measurement
can only happen using atypical and unscientific methods, and then finally to the general
acceptance of using standard social science models and techniques to determine discrete
performance with a relatively high degree of precision. Ultimately, we hope that the new
approach presented in this paper will advance the conversation among researchers, and inform
the practice of evaluators and policy advocates, as we move toward that final stage of
acceptance.
Appendix A: Performance Measure Examples6
Sample Outputs:

• Hold Education Sessions: By November 2013, Grantee will hold one-on-one meetings to discuss
  education policy generally with 3 school board members, 10 state legislators, and the district
  superintendent, as recorded in program management files.

• Hold Education Sessions: Grantee will organize and execute at least 10 educational meetings on
  charter school policy improvements for community leaders and policymakers by June 30, 2012, as
  recorded in program management files.

• Conduct Rally: Grantee will participate in organizing and hosting (or co-hosting) a rally to
  support charter schools at the Statehouse by April 30, 2012. The event will be attended by at
  least 100 people, as recorded in program management files.

• Recruit New Contacts: By December 2013, the E-advocacy mailing list will be increased from 4,200
  to 6,000, as recorded in program management files.

• Contact with Legislators: Program staff will conduct at least 12 meetings with targeted
  legislators and/or their staff to educate them on general issues related to school choice
  programs during each of years 1 and 2 of the program, as measured by program records.

Sample Outcomes:

• Increased Awareness: By June 2014, at least 25% of key policymakers and opinion leaders will
  report being aware of [policy of interest], as measured by a grantee-conducted/third-party survey
  of a select group of key policymakers and opinion leaders. Currently, 10% of key policymakers and
  opinion leaders are aware of [policy of interest].

• Increased Support: By June 2015, there will be an increase of 25% in the number of key
  policymakers and opinion leaders that publicly support [policy of interest], as measured by media
  hits and records of public remarks of policymakers. Currently, 10% of key policymakers and
  opinion leaders support [policy of interest].

• Improved Policy: By May 2014, school district policy will change so that parents who want to
  attend a school anywhere in the city will get free transportation, as recorded in official
  district policy. Currently, free transportation is only provided for up to 10 miles from the
  residence.

• Improved Policy: By October 2012, the cap on charter schools in the state will be increased by at
  least 50 schools, as recorded in official state policy. The current cap is 100 schools.

6 These examples are drawn from King et al. (2011).
Bibliography
Alliance for Justice. (2005). Build Your Advocacy Grantmaking: Advocacy Evaluation Tool.
Washington, DC.
Beer, T. (2012). Best Practices and Emerging Trends in Advocacy Grantmaking. Center for
Evaluation Innovation.
Bill and Melinda Gates Foundation. (2010). A Guide to Actionable Measurement. Accessed May
2012 at: http://docs.gatesfoundation.org/learning/documents/guide-to-actionablemeasurement.pdf.
Brest, P. (2012). A Decade of Outcome-Oriented Philanthropy. Stanford Social Innovation
Review, Spring 2012.
Britt, H. & Coffman, J. (2012). Evaluation for Models and Adaptive Initiatives. The Foundation
Review, 4(2).
Coffman, J. (2009). A User’s Guide to Advocacy Evaluation Planning. Harvard Family Research
Project.
Coffman, J. (2008). Foundations and Public Policy Grantmaking. James Irvine Foundation.
Coffman, J. & Beer, H. (2011). Evaluation to Support Strategic Learning: Principles and
Practices. Center for Evaluation Innovation.
Coffman, J. & Reed, E. (2009). Unique Methods in Advocacy Evaluation. Innovation Network.
Accessed May 2012 at:
http://www.innonet.org/resources/files/Unique_Methods_Brief.pdf
Council of Nonprofits. (2012). Self-Assessment and Evaluation of Outcomes. Accessed May
2012 at: http://www.councilofnonprofits.org/resources/resources-topic/evaluation-andmeasurement .
Cutler. (2012). Generosity Without Measurement: It Can't Hurt. Family Care Foundation website.
Accessed May 2012 at: http://www.familycare.org/opinions/generosity-withoutmeasurement-it-cant-hurt.
Devlin-Foltz, D. & Molinaro, L. (2010). Champions and “Champion-ness”: Measuring Efforts to
Create Champions for Policy Change. Center for Evaluation Innovation.
Gertler, P., Martinez, S., Premand, P., Rawlings, L., & Vermeersch, C. (2011). Impact
Evaluation in Practice. The World Bank.
Greene, J. (2005). “Buckets Into the Sea: Why Philanthropy Isn't Changing Schools, and How it
Could," in With the Best of Intentions, edited by Frederick M. Hess, Harvard Education
Press: Cambridge, MA.
Guthrie, K., Louie, J., David, T., & Foster, C. (2005). The Challenge of Assessing Policy and
Advocacy Activities: Strategies for a Prospective Evaluation Approach. The California
Endowment.
Innovation Network. (2008). Speaking for Themselves: Advocates’ Perspectives on Evaluation.
Accessed May 2012 at:
http://www.innonet.org/client_docs/File/advocacy/speaking_for_themselves_web_basic.
pdf
Innovation Network. (n.d.) A Practical Guide to Advocacy Evaluation. Accessed May 2012 at:
http://www.innonet.org/client_docs/File/advocacy/pathfinder_advocate_web.pdf
Kania, J. & Kramer, M. (2011). Collective Impact. Stanford Social Innovation Review, Winter
2011.
King, G., Keohane, R., & Verba, S. (1994). Designing Social Inquiry: Scientific Inference in
Qualitative Research. Princeton University Press: Princeton, NJ.
King, M., Holley, M. & Carr, M. (2011). How to Construct Performance Measures: A Brief
Guide for Education Reform Grant Applicants to the Walton Family Foundation.
Available online at: http://www.waltonfamilyfoundation.org/about/evaluation-unit.
Kubisch, A.C., Auspos, P., Brown, P., Chaskin, R., Fulbright-Anderson, K., & Hamilton, R.
(2002). Voices From the Field II: Reflections on Comprehensive Community Change.
The Aspen Institute: Washington, DC.
Reisman, J., Gienapp, A., & Stachowiak, S. (2007). A Guide to Measuring Advocacy and Policy.
Annie E. Casey Foundation.
Rossi, P., Lipsey, M., & Freeman, H. (2004). Evaluation: A Systematic Approach. SAGE
Publications: Thousand Oaks, CA.
Shambra, W. (2011). Measurement Is a Futile Way to Approach Grant Making. Chronicle of
Philanthropy, February 6, 2011.
Shaywitz, D. (2011). Our Metrics Fetish – And What to Do About It. Forbes Magazine, June 23,
2011. Accessed May 2012 at:
http://www.forbes.com/sites/davidshaywitz/2011/06/23/our-metrics-fetish-and-what-todo-about-it.
Teles, S. & Schmitt, M. (2011). The Elusive Craft of Evaluating Advocacy. Stanford Social
Innovation Review, Summer 2011.
W.K. Kellogg Foundation. (2005). W.K. Kellogg Foundation Evaluation Handbook. Accessed
May 2012 at: http://www.wkkf.org/knowledge-center/resources/2010/W-K-KelloggFoundation-Evaluation-Handbook.aspx.
Wales, J. (2012). Advancing Evaluation Practices in Philanthropy. The Aspen Institute. Accessed
May 2012 at: http://www.aspeninstitute.org/publications/advancing-evaluation-practicesphilanthropy.
Whelan, J. (2008). Advocacy Evaluation: Review and Opportunities. Innovation Network.
Accessed May 2012 at:
http://www.innonet.org/resources/files/Whelan_Advocacy_Evaluation.pdf.