How Viable is Evaluation Capacity Building
in Schools that Give Standardized Tests?
Jean A. King
University of Minnesota
November, 2004
Keynote address prepared for the Virginia Association of Test Directors, Richmond, VA.
Before I launch into my content, let me briefly explain that I participated in a major
science experiment this summer. After a marvelous week camping in northern Minnesota, I
developed acute Lyme disease and Lyme-induced Bell’s palsy while en route to New York in a
U-Haul truck. I tell you this only because, while after three months my face is getting close to
what it once looked like, it has been a long three months. I am not one to complain. There
are many far worse diseases, ones that kill people. I am going to fully recover, and I now
routinely experience close kinship with people in the Fellowship of Lyme or others who have
had Bell’s palsy or whose relatives have had Bell’s palsy or their friends. Plus it’s a family
tradition that when life gives you Lyme, you make lime-ade. My husband tells me to keep a stiff
upper lip. When we were watching the last Presidential debate, I turned to him in frustration at
one point and said, “I just can’t look at that man with a straight face.” To which he replied,
“Jean, you can’t look at anyone with a straight face.” Right. We always go for the joke in our
family, knowing that you can apologize later, but that straight line may never come again. Trust
me: I will not speak out of both sides of my mouth this morning. Perhaps in a month.
I wanted to begin by explaining my funny eye, but as I was pulling my thoughts together,
I realized that my experience with Bell’s palsy actually is a useful metaphor for discussing
evaluation capacity building. (This is what happens when English majors age—everything
becomes a metaphor for something else. Garrison Keillor has captured it exactly on Prairie
Home Companion with the English Majors Society; once an English major, always an English
major.) In any event, Lyme has taught me about what it means to look different from everyone
else. People notice you. People stare at you. People ask you what’s going on. And so it is in
my experience when you are a testing office trying to build evaluation capacity in a district.
Administrators and faculty notice you. They ask what’s going on. It is definitely not business as
usual, and the title of this speech becomes a key question: “How Viable is Evaluation Capacity
Building in Schools that Give Standardized Tests?”
Since the burgeoning of the accountability movement in the 1970s, American school
districts have been responsible for increasing amounts of standardized testing. This will not
surprise you. In the past thirty years, however, the parallel development of the program
evaluation function in most districts has been largely overwhelmed. Research, testing,
assessment, and evaluation departments--by whatever name--exist in many districts, especially
large ones, and routinely complete mandated evaluations for state and federally funded grants.
Although there are exceptions, it is fair to say that, despite the purported benefits of doing so and
despite counterexamples, the processes of developmental, formative, or summative program
evaluation for decision-making and program improvement have rarely been fully
institutionalized in districts (Sanders, 2000).
In these same thirty years, program evaluation practice has evolved to include a range of
acceptable alternatives, from traditional evaluation practice to increasingly collaborative and
participatory forms where the evaluator's role becomes one of training and facilitation.
Simultaneously, the notion of organizational learning, the process through which organizations
over time use data to improve themselves, has gained credibility in the evaluation community
(Preskill & Torres, 1999). These ideas have led to the emergent discussion and practice of
evaluation capacity building, my topic this morning.
A brief comment on the professional experience from which I am speaking today. First, I
served as Research and Evaluation Coordinator for the third largest district in Minnesota from
1999-2002 and still consult there as often as I am able to. It’s a fascinating context. Garrison
Keillor (of Prairie Home Companion fame, the creator of Lake Wobegon) graduated from a
high school in our district, and the district is in Jesse Ventura country to the northwest of the
Twin Cities. (Former Governor Ventura lives nearby and actually helped coach football at one
of our high schools.) Unlike the two largest districts in Minnesota (Minneapolis and St. Paul), the
district I worked in is not urban and is not, therefore, similarly challenged. Most of our students
do fine both on state-mandated tests and on Board-mandated nationally-normed tests. (We really
are above average.) Our challenges stem instead from a low tax base and hence a low per pupil
funding allowance; from continuing growth in the district as families continue to move into our
13 communities; and from our commitment to help every single District #11 student reach the
maximum of his or her potential.
As may well be the case in the settings you work in, program evaluation is a relatively
recent addition to district practice. Since the mid-1980s, the Student Assessment Department
has engaged in "cutting edge" work related to (not surprisingly) student assessment, using, for
example, criterion-referenced tests linked directly to district curricula and a high school
graduation test (the Assurance of Basic Learning) years before the state of Minnesota mandated a
similar test. We were an early adopter of performance based assessment, piloting classroom
assessment packages K-12, again, before they were mandated state-wide. But as important as
student assessment is, the current superintendent recognized an equally important role for
systematic program evaluation, the need to create and/or compile information to inform
decision-making. Imagine that. And that's where I came in.
My role as Coordinator of Research and Evaluation for the district was distinct from the
role of the Student Assessment Facilitator, and I worked for three years to develop within the
district the capacity to create, store, and use evaluation information. This is a school district
where administrators were keenly aware of the importance of continuous improvement and
wanted to purposefully build evaluation capacity. Given the resource strain and a Board context
that made new administrative positions unlikely, the option of expanding the assessment
department with additional evaluation staff was not viable. They sought instead to involve
district staff (district and building administrators and teachers alike) and parents in participatory
evaluation activities related to programs in which they were involved, hoping, over time, to make
evaluation an integral part of every person's job. Hence capacity building. (Note: I left when it
became clear that my position would never be funded, at least in my lifetime. So the quick
answer to the question of how viable is ECB in schools that give standardized tests may be: Not
very. Maybe even not at all.)
So the content of my presentation this morning is grounded firmly in my experiences in a
large central office with a small, but enthusiastic staff as we set out to institutionalize evaluation.
(Let me note that there was no yellow brick road.) I am also presenting what I learned in a
research project that began last year in which a colleague and I studied the process of capacity
building in three non-profit organizations: my school district (after I left), a major museum in the
Twin Cities, and a large social service agency in West St. Paul. Each of these institutions was
serious about increasing its capacity to make evaluation routine, a way of life for staff. We were
interested in comparing and contrasting the process in the three organizations, but found far more
commonalities than differences.
Let’s be clear what we are talking about. Here is a definition of evaluation capacity
building, taken from an evaluation journal early on in the development of the concept. This is
the definition given by Stockdill, Baizerman, and Compton. You’ll notice I’ve written it like a
poem, which encourages you to make sense of every word or phrase:
Intentional work                 [You do this on purpose]
to constantly                    [You can’t stop]
co-create and                    [Co => together, create => make it anew]
co-sustain                       [Again, co => together, but sustain => keep it going]
an overall process that makes    [Big picture, activity]
quality evaluation               [Really good evaluation (both process & judgment)]
and its uses                     [Do something with the process and results]
routine                          [Boring, every day]
in organizations and systems     [Where you build capacity (could do it in part)]
So that’s what I’m talking about, an in-house, organic evaluation function. I also refer to this as
free range evaluation, a process that lives on its own and is stronger as a result. This is my life’s
work. I’m now going to tell you what I would do if I were in charge of a district effort to build
evaluation capacity. My experience and research have suggested five key activities that can help
to build a culture of evaluation over time. I would propose working on these, staged over several
years’ time, knowing full well that every single one will take much longer than I imagine and
will evolve as people bring it to life in their own context. I invite you to consider how viable
these activities might be in your own district context. I will note with some caution that ECB
continues at the district I worked in, but it is a slender and fragile being.
Activity 1. Create a District Program Evaluation Advisory Committee
I would establish a small, but really nice Program Evaluation Advisory Committee
(PEAC), initially consisting of central office staff and, depending on the size of the district, five
to six positive-minded school-based opinion leaders. This group would not discuss testing issues
except as they related to program evaluation (as I’ll discuss in a moment). This small group
would be the primary intended users of our evaluation process, what I sometimes call--though
hating the negative image--the evaluation “virus” that, over time, will “infect” the district culture
with positive evaluation thinking. The Committee will charge itself with several ongoing
activities; members will collaboratively help design studies, monitor evaluation activities, get
first-hand feedback from data users, and so on. Through regular meetings, they are the heart of
an ongoing reflection process. Committee membership will be flexible—people will come and
go—and is likely to evolve as people’s lives and schedules change.
In my experience, this Committee needs four different types of members, oftentimes
embodied in just a few individuals:
1. Staff who are highly respected and truly know the district culture and inhabitants well. These
are individuals who have excellent interpersonal skills (including the intuition to pick up on
people’s affect), may have worked in the district a few years, and can readily learn what their
colleagues are really thinking because people freely talk to them.
2. People who “get” evaluation and enjoy data. In my experience there are individuals in every
school who enjoy the evaluation process, either because they understand it intuitively and are
eager to learn more or because they have had formal training, typically in a degree program.
They often admit this attribute sheepishly, knowing that many will label it odd.
3. Those positive, “can do” individuals who can get things done efficiently and thoughtfully.
4. At least one person with a good sense of humor who will remind the group that this work
should be agreeable, even at its most challenging, and that an occasional smile or chuckle is a
necessary thing.
Should you include naysayers on this initial Committee? Some think that including negative
individuals in the initial steps of a change process will give diverse perspectives to Committee
discussions, encourage them to get with the program, and support the notion of representative
democracy in the district. In my experience, these folks are rarely helpful and often can
dismantle or demoralize an otherwise enthusiastic group. My advice is simple: do not include
negative people in this initial group. This does not mean, however, that you ignore these folks;
the Advisory Committee must attend to their interests and concerns individually and extremely
purposefully, or they may, in opposition, shut the process down.
Since conflict is inherent in the evaluation process, the conceptual framing for the
Committee’s work is that of the dual concerns model for understanding conflict (Deutsch, 1949).
Deutsch’s model reminds us that people in conflict have two important concerns: reaching a goal
(in this case, conducting a meaningful evaluation) and simultaneously maintaining relationships.
Collaborative problem-solving/negotiation is the process that facilitates both goal attainment and
positive relationships (there are four other processes that are less effective—forcing,
withdrawing, smoothing, compromising), so my job as evaluation leader would be to help the
members of the PEAC engage in collaborative problem-solving as we move through the capacity
building process.
Activity 2. Begin to Build an Evaluation Infrastructure
The Program Evaluation Advisory Committee marks the first step in creating an
evaluation infrastructure in the district. You may already have such a committee, although it may
be focused on testing and assessment issues, rather than evaluation. Once formed, the PEAC
will be charged with responsibility for two types of activities: a) studying the district’s context to
determine the availability of certain infrastructure requirements; and b) directly taking other
actions themselves to build the infrastructure.
a) Assessing the Context. There are three areas of the district context that need to be
examined. First, we would be well advised to make sense of the accountability context in which
the district finds itself. State accountability requirements, typically driven by federal No Child
Left Behind and state mandates, may require that the schools produce certain types of data
routinely, and our infrastructure must allow for this, in addition to any other targeted evaluation
efforts we might propose, either because they are required by funders or would provide helpful
information. It would also be important to assess the district accountability environment to
determine possible interest in--or opposition to—the Program Evaluation Advisory Committee’s
activities. If the external environment for whatever reason is not likely to support the capacity
building effort, I’d want to know that sooner rather than later. Simply put, there are settings in
which it will not be viable.
A second area of context requiring immediate examination is that of decision making.
First, what is senior administration’s interest in and demand for program evaluation information?
To what extent does the superintendent want to be involved in the process or in tracking its
progress? Second, is there a feedback mechanism in place that will effectively position the
results of evaluations into decision-making processes at both the district and the school-level?
Absent such a mechanism, the group may have to create one. Third, will school teachers and
staff have sufficient autonomy in their decision making, i.e., will people truly be able to act on
data, or will some structure external to the school limit or determine actions in certain areas? The
Committee needs to understand the bounds within which teachers and staff must operate. An
obvious example here might be any cross-district curriculum. A principal in my district recently
reported finding the mandatory new first grade math curriculum still in shrink wrap in a closet in
one of her teacher’s rooms. Hmm. Does the teacher have the right to develop his own
curriculum? Not if the decision has been made at the district level. Evaluation capacity building
must be conducted within decision-making constraints.
A third and final topic for study in the context is likely access to resources. If resources
are available, the capacity building effort is enhanced; if not, then the time-line may be
significantly slowed or even made impossible. These resources are of two types:
a) Access to evaluation/research knowledge and training- If staff want to build evaluation
capacity over time, then access to evaluation knowledge (e.g., data analysis, interpreting test
scores, research bases) is essential. The staff already have access to you and whoever else is
in your department, and you may be eager to teach people—if your schedule allows time for
that. Access to evaluation/research knowledge typically includes access to people—district
personnel, other external consultants, or even volunteers (e.g., faculty or students from a
local university engaged in a service learning effort) who can explain evaluation. It may also
include access to information on evaluation resources (e.g., Websites, books, evaluation
reports, tools) or access either to formal training or informal coaching on evaluation
processes.
b) Access to resources that support the evaluation process- Beyond basic resources like
copying, computers, data print-outs, etc., this includes fiscal support from the central
administration to provide, for example, time within the work day to collaborate on evaluation
activities (e.g., by providing substitute teachers to free people from classroom
responsibilities), funds to buy pizza if a group works into the evening, or honoraria for
faculty or staff who commit to participating extensively in the evaluation process. My
experience over the course of the last decade suggests that although teachers truly value time
to collaborate during the school day, they perceive being gone from class (i.e., preparing for
a substitute teacher and then dealing with the effects of being gone) as far more costly than
working after school, in the evening, or on weekends. Needless to say, this creates a difficult
situation for building evaluation capacity! (Review framework up to that point.)
b) Building the Infrastructure Directly. In addition to assessing the context, the Program
Evaluation Advisory Committee should begin work on activities to directly increase the
evaluation infrastructure in the internal organizational context and thus create a centralized
conception of evaluation for the district. You need visible and supportive leadership at both the
building and the district office levels. This is a necessary component in my experience.
The Committee should also seek to directly improve the district climate to make it
supportive of evaluation. Members would do this in part by their positive attitudes toward
evaluation, their open mindedness when challenged, their respect for colleagues’ opinions, their
enthusiasm for risk taking and creativity, and a continuing sense of good humor. Committee
members would become evaluation champions, serving as visible supporters of the process,
mentioning it in favorable terms, identifying issues for possible study, and taking on naysayers
pleasantly but firmly throughout the school day and across the school year. I understand this
sounds a bit saccharine. Glinda the Good Witch. Miss America. Mrs. Doubtfire. Right. And
while this behavior is important, it may not be sufficient to change things.
The Committee can establish three structures to teach the evaluation process over time.
First, they should collaboratively develop an explicit plan and realistic timeline for building
evaluation capacity in the district. Such a plan would include the following content:

• Re-writing school policies and procedures to include core evaluation capacity building
principles (e.g., the expectation that routine activities related to school improvement
will be evaluated annually, routine compilation and discussion of data related to core
activities, explicit evaluative roles for committee chairs);
• Creating opportunities for faculty and staff (and, over time, students and parents) to
collaborate and participate in various ways in ongoing evaluation activities and then nicely
mandating them over time. One way to facilitate this, borrowed from social psychology, is to
create interdependent roles whereby people necessarily support each other in completing
evaluation tasks;
• Developing formal mechanisms for reflection on data. It would be helpful to create peer
learning structures through which teachers and other staff could come together to reflect on
evaluation data routinely.
Second, based on resources available for the task, the PEAC would decide how to build a
within-school infrastructure to support the technical components of the evaluation process. This
is necessary to ensure the accuracy of data collected and the efficiency of the process. The
infrastructure would include a variety of activities, e.g., an occasional process to measure needs,
a mechanism to frame questions and generate studies, a way to design evaluations, collect,
analyze, and interpret data, and report results both internally in the school community and to the
wider public.
Third, the Committee would create a structure to socialize faculty and staff purposefully
into the organization’s evaluation process, both initially and over time. There would be clear
expectations that everyone is expected (dare we say required?) to “do” evaluation (the stick) and
equally clear incentives for participation (carrots). The PEAC would also structure ways for
those interested to receive training in evaluation, either through informal workshops at the school
or formal courses offered in the community. To the extent that staff teach and office near one
another and regularly socialize during the workday (e.g., sharing meals and snacks), socializing
will be easier. The Committee might also need to consider trust-building activities in the short
term. Program evaluation should be non-threatening and even fun.
Establishing the Program Evaluation Advisory Committee and beginning to work on the
evaluation infrastructure would mark an important beginning. In that critical first year or two, I
would also propose three key activities to engage people and model the evaluation process:
making sense of district test scores, conducting at least one highly visible participatory study,
and laying the groundwork for eventual action research efforts by groups of faculty and staff.
Ideally, a different member of the Program Evaluation Advisory Committee might lead each of
these efforts with other members of the school community—faculty, staff, parents, district office
folk, etc.—participating and learning alongside. It is likely that the resources (especially time) to
do all three activities might well be lacking, and one might stage these over several years. But I
would want people to understand how these could help make the evaluation process meaningful
in the lives of faculty and staff and, potentially, parents and students. Absent meaningful
examples, administrators, teachers, staff, and community folks alike may never move from
intuitive evaluation to systematic efforts.
Key Activity 1: Making Sense of the Test Scores.
The importance of this activity cannot be overstated. American education currently lives
in an environment laden with accountability measures. Some call this “accountabilism”:
The belief that the appearance of measurement and audit is an essential feature of public
accountability within the public sector, and that responsible action in systems,
organizations or initiatives, and individuals who are members of such systems,
organizations or initiatives, can be augmented through the frantic, rabid, face-valid-only
measurement of anything and everything; the belief that threat via audit is both necessary
and sufficient to produce improvement in any systems; the over-reliance on
decontextualized measurement and measures; noun- a religious cult amongst public
administrators and bureaucrats in the late 20th and early 21st century, believed to be of
political origins.
The point is simple: the failure to make sense of district test scores may quickly lead to
additional internal dismay and public humiliation. I would therefore propose that one or two
members of the PEAC agree to lead a separate committee that would be charged with studying
the district’s and schools’ test scores for the past several years and interpreting them with a view
to action. We would access someone (from the district office, a local university, or a research
shop) with a good understanding of score interpretation and, ideally, the ability to work with the
data to answer targeted questions the group might raise. How helpful this activity will be in
relation to specific actions teachers can take in their classrooms depends greatly on the content of
the tests and the quality of the existing data. Regardless, it is a key activity and could lead to the
development of a functional database teachers could access for information on their current
students. This would be an important development for the use of student data over time and
hence for evaluation capacity building. You may already do this.
This committee might also develop a program theory that would plan backward from the
necessary achievement outcomes to identify explicit strategies to increase learning in specific
areas. Again, the assistance of an outside expert in student learning could be extremely helpful.
We might choose to conduct meetings with small groups (e.g., teachers at given grade levels,
language arts teachers across grades, and so on) to process the data.
Key Activity 2: Conducting One Highly Visible Participatory Inquiry.
Modeling the evaluation process for the district is one way to demonstrate how you frame
an evaluation question, develop instruments, collect and analyze data, and then make
recommendations. I used this process when I served as Research and Evaluation Coordinator,
systematically teaching participants in the course of three highly visible studies on topics that
mattered greatly to people (i.e., high school graduation standards, special education, and changes
in the middle schools). People paid close attention because they truly cared about the outcomes.
I would propose a participatory evaluation process involving a team of 20-25 people--representatives of teachers, staff, perhaps parents, perhaps students, district representatives, and
university professors. The team would meet monthly throughout the course of the year, and the
PEAC representative with support from the evaluation consultants would meet in between to
prepare materials for the next month’s meeting. In my experience, the participants in such a
study become close friends while gaining a sense of how program evaluation works.
Key Activity 3: Instituting Action Research Activities—or At Least Planting the Seeds.
I have never succeeded in bringing this final activity fully to life anywhere I have worked, but it
remains one of my intentions when I collaborate with a district. The ideal would be for groups
of collaborating teachers and staff to institute action research efforts on specific interventions
with specific students, e.g., first grade teachers working with students struggling with letter
recognition or tenth grade staff whose students have low scores on their standardized test on a
certain topic. The action research cycle—plan, act, observe, and reflect—is intuitive; good
teachers will note that they informally engage in it every day. What I would propose is making
the process more explicit and public. In some sense, school improvement plans that most
districts require CAN be a form of action research if they are data-based in design. Since school
improvement planning is commonplace, it strikes me as an appropriate vector for introducing
this practice.
Teachers and administrators would meet and identify extremely specific instructional
activities that they know or believe are effective (“promising practices”) for teaching the skill in question. They
would agree to try out a strategy, measure the results, and then meet in a month to discuss what
happened. This is the process Michael Schmoker presents in his Results books. In contrast to
Key Activity Two, the highly visible study that models participatory evaluation, these would be
fairly private studies that model how individual teachers can facilitate the specific learning of
specific individuals. As I mentioned before, action research could also be a method for an
annual data-based school improvement process. Although difficult to institute and sustain,
action research can provide visible and transparent results, a way to share and reflect on them,
and hence bring the evaluation process to life within classrooms. Based on my experience, the
PEAC would need to develop a process that includes training in collaborative teamwork, an
opportunity for people to identify their own burning issues, incentives for participation, and a
structure that will enable projects to be completed in a reasonable timeframe.
To summarize, I have presented a list of activities that I believe would help build a
culture of evaluation over time--creating a program evaluation advisory group, beginning to
build a formal evaluation infrastructure, making sense of test scores, conducting a highly visible
participatory inquiry, and instituting action research activities. Given the life of a school district,
it is inconceivable that you could tackle every one of these tasks, even if you had considerable
resources to devote, which you most likely don’t. My district increased spending each year on
standardized testing because we had to; discretionary dollars for evaluation work were hard to
find, except to the extent that we re-directed the existing school improvement process to address
the idea of continuous improvement. To return, then, to the question that is the title of this
presentation, exactly how viable is evaluation capacity building in schools that give standardized
tests? My heart would love to say, yes, this can really happen; free range evaluation can thrive in
the context of public school systems. My head tells me something different: it is a continuous
challenge. I have provided an outline of what I might do if faced with the ECB challenge in a
district. Let me leave you with some deep thoughts grounded in Minnesota connections that I
hope will make them memorable.
Deep Thoughts:
1. There are specific actions you can take and structures to develop that may create a
professional community in which evaluation can thrive.
2. Look for opportunities where program evaluation information can add to district discussions.
3. There are rarely guarantees in program evaluation.
4. If you work hard enough, the impossible can sometimes be achieved.

The Related Minnesotans, and How Connected:
• Judy Garland (born in Grand Rapids, MN): “Somewhere Over the Rainbow” vs. “Follow the
Yellow Brick Road”--never assume use will happen, but move purposefully in that direction.
• Garrison Keillor: Tell stories in a compelling way.
• Laura Ingalls Wilder: Build on what you and others near you experience.
• Prince (AKA the Artist Formerly Known as the Artist Formerly Known as Prince): Keep it
simple! What?
• Rocky and Bullwinkle (Frostbite Falls, MN): Do your best to laugh as you go along.
• Charles Lindbergh: Don’t be afraid to take calculated risks; stay committed to the vision.
• Herb Brooks, coach of the 1980 US Olympic hockey team: Remember who beat the Russians
and went on to win the gold medal? Sometimes the impossible IS possible.
References
Deutsch, M. (1949). An experimental study of the effects of cooperation and competition upon group
process. Human Relations, 2, 199-231.
Preskill, H., & Torres, R. T. (1999). Evaluative inquiry for learning in organizations. Thousand Oaks,
CA: Sage Publications.
Sanders, J. R. (2000, April). Strengthening evaluation use in public education. Paper presented at the
annual meeting of the American Educational Research Association, New Orleans, LA.