How Viable is Evaluation Capacity Building in Schools that Give Standardized Tests?

Jean A. King
University of Minnesota
November 2004

Keynote address prepared for the Virginia Association of Test Directors, Richmond, VA.

Before I launch into my content, let me briefly explain that I participated in a major science experiment this summer. After a marvelous week camping in northern Minnesota, I developed acute Lyme disease and Lyme-induced Bell's palsy while en route to New York in a U-Haul truck. I tell you this only because, while after three months my face is getting close to what it once looked like, what a long three months it's been. I am not one to complain. There are many far worse diseases, ones that kill people. I am going to fully recover, and I now routinely experience close kinship with people in the Fellowship of Lyme, or others who have had Bell's palsy, or whose relatives or friends have had it. Plus it's a family tradition that when life gives you Lyme, you make lime-ade. My husband tells me to keep a stiff upper lip. When we were watching the last Presidential debate, I turned to him in frustration at one point and said, "I just can't look at that man with a straight face." To which he replied, "Jean, you can't look at anyone with a straight face." Right. We always go for the joke in our family, knowing that you can apologize later, but that straight line may never come again. Trust me: I will not speak out of both sides of my mouth this morning. Perhaps in a month.

I wanted to begin by explaining my funny eye, but as I was pulling my thoughts together, I realized that my experience with Bell's palsy actually is a useful metaphor for discussing evaluation capacity building. (This is what happens when English majors age--everything becomes a metaphor for something else. Garrison Keillor has captured it exactly on Prairie Home Companion with the English Majors Society; once an English major, always an English major.) In any event, Lyme has taught me what it means to look different from everyone else. People notice you. People stare at you. People ask you what's going on. And so it is in my experience when you are a testing office trying to build evaluation capacity in a district. Administrators and faculty notice you. They ask what's going on. It is definitely not business as usual, and the title of this speech becomes a key question: "How Viable is Evaluation Capacity Building in Schools that Give Standardized Tests?"

Since the burgeoning of the accountability movement in the 1970s, American school districts have been responsible for increasing amounts of standardized testing. This will not surprise you. In the past thirty years, however, the parallel development of a program evaluation function in most districts has been largely overwhelmed by those testing demands. Research, testing, assessment, and evaluation departments--by whatever name--exist in many districts, especially large ones, and routinely complete mandated evaluations for state and federally funded grants. Although there are exceptions, it is fair to say that, despite the purported benefits of doing so and despite counterexamples, the processes of developmental, formative, or summative program evaluation for decision making and program improvement have rarely been fully institutionalized in districts (Sanders, 2000).
In these same thirty years, program evaluation practice has evolved to include a range of acceptable alternatives, from traditional evaluation practice to increasingly collaborative and participatory forms in which the evaluator's role becomes one of training and facilitation. Simultaneously, the notion of organizational learning, the process through which organizations use data over time to improve themselves, has gained credibility in the evaluation community (Preskill & Torres, 1999). These ideas have led to the emergent discussion and practice of evaluation capacity building, my topic this morning.

A brief comment on the professional experience from which I am speaking today. First, I served as Research and Evaluation Coordinator for the third largest district in Minnesota from 1999-2002 and still consult there as often as I am able. It's a fascinating context. Garrison Keillor (of Prairie Home Companion fame, the creator of Lake Wobegon) graduated from a high school in our district, and the district is in Jesse Ventura country to the northwest of the Twin Cities. (Former Governor Ventura lives nearby and actually helped coach football at one of our high schools.) Unlike the two largest districts in Minnesota (Minneapolis and St. Paul), the district I worked in is not urban and is not, therefore, similarly challenged. Most of our students do fine both on state-mandated tests and on Board-mandated nationally normed tests. (We really are above average.) Our challenges stem instead from a low tax base and hence a low per pupil funding allowance; from continuing growth in the district as families continue to expand into our 13 communities; and from our commitment to help every single District #11 student reach the maximum of his or her potential.

As may well be the case in the settings you work in, program evaluation is a relatively recent addition to district practice. Since the mid-1980s, the Student Assessment Department has engaged in "cutting edge" work related to (not surprisingly) student assessment, using, for example, criterion-referenced tests linked directly to district curricula and a high school graduation test (the Assurance of Basic Learning) years before the state of Minnesota mandated a similar test. We were an early adopter of performance-based assessment, piloting classroom assessment packages K-12, again, before they were mandated statewide. But as important as student assessment is, the current superintendent recognized an equally important role for systematic program evaluation: the need to create and/or compile information to inform decision making. Imagine that. And that's where I came in.

My role as Coordinator of Research and Evaluation for the district was distinct from the role of the Student Assessment Facilitator, and I worked for three years to develop within the district the capacity to create, store, and use evaluation information. This is a school district where administrators were keenly aware of the importance of continuous improvement and wanted to purposefully build evaluation capacity. Given the resource strain and a Board context that made new administrative positions unlikely, the option of expanding the assessment department with additional evaluation staff was not viable.
They sought instead to involve district staff (district and building administrators and teachers alike) and parents in participatory evaluation activities related to programs in which they were involved, hoping, over time, to make evaluation an integral part of every person's job. Hence capacity building. (Note: I left when it became clear that my position would never be funded, at least in my lifetime. So the quick answer to the question of how viable evaluation capacity building (ECB) is in schools that give standardized tests may be: Not very. Maybe even not at all.)

So the content of my presentation this morning is grounded firmly in my experiences in a large central office with a small but enthusiastic staff as we set out to institutionalize evaluation. (Let me note that there was no yellow brick road.) I am also presenting what I learned in a research project that began last year in which a colleague and I studied the process of capacity building in three non-profit organizations: my school district (after I left), a major museum in the Twin Cities, and a large social service agency in West St. Paul. Each of these institutions was serious about increasing its capacity to make evaluation routine, a way of life for staff. We were interested in comparing and contrasting the process in the three organizations, but found far more commonalities than differences.

Let's be clear about what we are talking about. Here is a definition of evaluation capacity building, taken from an evaluation journal early in the development of the concept. This is the definition given by Stockdill, Baizerman, and Compton. You'll notice I've written it like a poem, which encourages you to make sense of every word or phrase:

Intentional work [You do this on purpose]
to constantly [You can't stop]
co-create [Co => together; create => make it anew]
and co-sustain [Again, co => together, but sustain => keep it going]
an overall process [Big picture, activity]
that makes quality evaluation [Really good evaluation (both process & judgment)]
and its uses [Do something with the process and results]
routine [Boring, every day]
in organizations and systems [Where you build capacity (could do it in part)]

So that's what I'm talking about: an in-house, organic evaluation function. I also refer to this as free range evaluation, a process that lives on its own and is stronger as a result. This is my life's work.

I'm now going to tell you what I would do if I were in charge of a district effort to build evaluation capacity. My experience and research have suggested five key activities that can help to build a culture of evaluation over time. I would propose working on these, staged over several years' time, knowing full well that every single one will take much longer than I imagine and will evolve as people bring it to life in their own context. I invite you to consider how viable these activities might be in your own district context. I will note with some caution that ECB continues at the district I worked in, but it is a slender and fragile being.

Activity 1. Create a District Program Evaluation Advisory Committee

I would establish a small but really nice Program Evaluation Advisory Committee (PEAC), initially consisting of central office staff and, depending on the size of the district, five to six positive-minded school-based opinion leaders. This group would not discuss testing issues except as they related to program evaluation (as I'll discuss in a moment).
This small group would be the primary intended users of our evaluation process, what I sometimes call (though hating the negative image) the evaluation "virus" that, over time, will "infect" the district culture with positive evaluation thinking. The Committee would charge itself with several ongoing activities; members would collaboratively help design studies, monitor evaluation activities, get first-hand feedback from data users, and so on. Through regular meetings, they are the heart of an ongoing reflection process. Committee membership would be flexible--people will come and go--and is likely to evolve as people's lives and schedules change.

In my experience, this Committee needs four different types of members, oftentimes embodied in just a few individuals:

1. Staff who are highly respected and truly know the district culture and inhabitants well. These are individuals who have excellent interpersonal skills (including the intuition to pick up on people's affect), may have worked in the district a few years, and can readily learn what their colleagues are really thinking because people freely talk to them.

2. People who "get" evaluation and enjoy data. In my experience there are individuals in every school who enjoy the evaluation process, either because they understand it intuitively and are eager to learn more or because they have had formal training, typically in a degree program. They often admit this attribute sheepishly, knowing that many will label it odd.

3. Those positive, "can do" individuals who can get things done efficiently and thoughtfully.

4. At least one person with a good sense of humor who will remind the group that this work should be agreeable, even at its most challenging, and that an occasional smile or chuckle is a necessary thing.

Should you include naysayers on this initial Committee? Some think that including negative individuals in the initial steps of a change process will bring diverse perspectives to Committee discussions, encourage them to get with the program, and support the notion of representative democracy in the district. In my experience, these folks are rarely helpful and often can dismantle or demoralize an otherwise enthusiastic group. Good news: I say do not include negative people in this initial group. This does not mean, however, that you ignore these folks; the Advisory Committee must attend to their interests and concerns individually and extremely purposefully, or their opposition may shut the process down.

Since conflict is inherent in the evaluation process, the conceptual framing for the Committee's work is that of the dual concerns model for understanding conflict (Deutsch, 1949). Deutsch's model reminds us that people in conflict have two important concerns: reaching a goal (in this case, conducting a meaningful evaluation) and simultaneously maintaining relationships. Collaborative problem-solving/negotiation is the process that facilitates both goal attainment and positive relationships (there are four other processes that are less effective: forcing, withdrawing, smoothing, and compromising), so my job as evaluation leader would be to help the members of the PEAC engage in collaborative problem-solving as we move through the capacity building process.

Activity 2. Begin to Build an Evaluation Infrastructure

The Program Evaluation Advisory Committee marks the first step in creating an evaluation infrastructure in the district.
You may already have such a committee, although it may be focused on testing and assessment issues rather than evaluation. Once formed, the PEAC will be charged with responsibility for two types of activities: a) studying the district's context to determine the availability of certain infrastructure requirements; and b) directly taking other actions themselves to build the infrastructure.

a) Assessing the Context. There are three areas of the district context that need to be examined. First, we would be well advised to make sense of the accountability context in which the district finds itself. State accountability requirements, typically driven by the federal No Child Left Behind Act and state mandates, may require that the schools produce certain types of data routinely, and our infrastructure must allow for this, in addition to any other targeted evaluation efforts we might propose, either because they are required by funders or would provide helpful information. It would also be important to assess the district accountability environment to determine possible interest in, or opposition to, the Program Evaluation Advisory Committee's activities. If the external environment for whatever reason is not likely to support the capacity building effort, I'd want to know that sooner rather than later. Simply put, there are settings in which it will not be viable.

A second area of context requiring immediate examination is that of decision making. First, what is senior administration's interest in and demand for program evaluation information? To what extent does the superintendent want to be involved in the process or in tracking its progress? Second, is there a feedback mechanism in place that will effectively position the results of evaluations within decision-making processes at both the district and the school level? Absent such a mechanism, the group may have to create one. Third, will teachers and staff have sufficient autonomy in their decision making, i.e., will people truly be able to act on data, or will some structure external to the school limit or determine actions in certain areas? The Committee needs to understand the bounds within which teachers and staff must operate. An obvious example here might be any districtwide curriculum. A principal in my district recently reported finding the mandatory new first grade math curriculum still in shrink wrap in a closet in one of her teacher's rooms. Hmm. Does the teacher have the right to develop his own curriculum? Not if the decision has been made at the district level. Evaluation capacity building must be conducted within these decision-making constraints.

A third and final topic for study in the context is likely access to resources. If resources are available, the capacity building effort is enhanced; if not, the timeline may be significantly slowed, or the effort even made impossible. These resources are of two types:

a) Access to evaluation/research knowledge and training. If staff want to build evaluation capacity over time, then access to evaluation knowledge (e.g., data analysis, interpreting test scores, research bases) is essential. The staff already have access to you and whoever else is in your department, and you may be eager to teach people, if your schedule allows time for that. Access to evaluation/research knowledge typically includes access to people--district personnel, other external consultants, or even volunteers (e.g., faculty or students from a local university engaged in a service learning effort)--who can explain evaluation.
It may also include access to information on evaluation resources (e.g., Websites, books, evaluation reports, tools) or access either to formal training or informal coaching on evaluation processes.

b) Access to resources that support the evaluation process. Beyond basic resources like copying, computers, data print-outs, etc., this includes fiscal support from the central administration to provide, for example, time within the work day to collaborate on evaluation activities (e.g., by providing substitute teachers to free people from classroom responsibilities), funds to buy pizza if a group works into the evening, or honoraria for faculty or staff who commit to participating extensively in the evaluation process. My experience over the course of the last decade suggests that although teachers truly value time to collaborate during the school day, they perceive being gone from class (i.e., preparing for a substitute teacher and then dealing with the effects of being gone) as far more costly than working after school, in the evening, or on weekends. Needless to say, this creates a difficult situation for building evaluation capacity! (Review of the framework up to this point.)

b) Building the Infrastructure Directly. In addition to assessing the context, the Program Evaluation Advisory Committee should begin work on activities to directly increase the evaluation infrastructure in the internal organizational context and thus create a centralized conception of evaluation for the district. You need visible and supportive leadership at both the building level and the district office. This is a necessary component in my experience.

The Committee should also seek to directly improve the district climate to make it supportive of evaluation. Members would do this in part by their positive attitudes toward evaluation, their open-mindedness when challenged, their respect for colleagues' opinions, their enthusiasm for risk taking and creativity, and a continuing sense of good humor. Committee members would become evaluation champions, serving as visible supporters of the process, mentioning it in favorable terms, identifying issues for possible study, and taking on naysayers pleasantly but firmly throughout the school day and across the school year. I understand this sounds a bit saccharine. Glinda, the Good Witch. Miss America. Mrs. Doubtfire. Right. And while this behavior is important, it may not be sufficient to change things.

The Committee can establish three structures to teach the evaluation process over time. First, members should collaboratively develop an explicit plan and realistic timeline for building evaluation capacity in the district.
Such a plan would include the following content:

- Re-writing school policies and procedures to include core evaluation capacity building principles (e.g., the expectation that routine activities related to school improvement will be evaluated annually, routine compilation and discussion of data related to core activities, and explicit evaluative roles for committee chairs);

- Creating opportunities for faculty and staff (and, over time, students and parents) to collaborate and participate in various ways in ongoing evaluation activities, and then nicely mandating them over time. One way to facilitate this, borrowed from social psychology, is to create interdependent roles whereby people necessarily support each other in completing evaluation tasks;

- Developing formal mechanisms for reflection on data. It would be helpful to create peer learning structures through which teachers and other staff could come together to reflect on evaluation data routinely.

Second, based on resources available for the task, the PEAC would decide how to build a within-school infrastructure to support the technical components of the evaluation process. This is necessary to ensure the accuracy of data collected and the efficiency of the process. The infrastructure would include a variety of activities, e.g., an occasional process to measure needs, a mechanism to frame questions and generate studies, and a way to design evaluations; collect, analyze, and interpret data; and report results both internally to the school community and to the wider public.

Third, the Committee would create a structure to socialize faculty and staff purposefully into the organization's evaluation process, both initially and over time. There would be a clear expectation (dare we say requirement?) that everyone "do" evaluation (the stick) and equally clear incentives for participation (carrots). The PEAC would also structure ways for those interested to receive training in evaluation, either through informal workshops at the school or formal courses offered in the community. To the extent that staff teach and have offices near one another and regularly socialize during the workday (e.g., sharing meals and snacks), this socialization will be easier. The Committee might also need to consider trust-building activities in the short term. Program evaluation should be non-threatening and even fun.

Establishing the Program Evaluation Advisory Committee and beginning to work on the evaluation infrastructure would mark an important beginning. In that critical first year or two, I would also propose three key activities to engage people and model the evaluation process: making sense of district test scores, conducting at least one highly visible participatory study, and laying the groundwork for eventual action research efforts by groups of faculty and staff. Ideally, a different member of the Program Evaluation Advisory Committee might lead each of these efforts, with other members of the school community--faculty, staff, parents, district office folk, etc.--participating and learning alongside. The resources (especially time) to do all three activities may well be lacking, and one might stage these over several years. But I would want people to understand how these could help make the evaluation process meaningful in the lives of faculty and staff and, potentially, parents and students.
Absent meaningful examples, administrators, teachers, staff, and community folks alike may never move from intuitive evaluation to systematic efforts.

Key Activity 1: Making Sense of the Test Scores. The importance of this activity cannot be overstated. American education currently lives in an environment laden with accountability measures. Some call this "accountabilism":

The belief that the appearance of measurement and audit is an essential feature of public accountability within the public sector, and that responsible action in systems, organizations or initiatives, and individuals who are members of such systems, organizations or initiatives, can be augmented through the frantic, rabid, face-valid-only measurement of anything and everything; the belief that threat via audit is both necessary and sufficient to produce improvement in any system; the over-reliance on decontextualized measurement and measures; noun: a religious cult amongst public administrators and bureaucrats in the late 20th and early 21st centuries, believed to be of political origins.

The point is simple: the failure to make sense of district test scores may quickly lead to additional internal dismay and public humiliation. I would therefore propose that one or two members of the PEAC agree to lead a separate committee charged with studying the district's and schools' test scores for the past several years and interpreting them with a view to action. We would access someone (from the district office, a local university, or a research shop) with a good understanding of score interpretation and, ideally, the ability to work with the data to answer targeted questions the group might raise. How helpful this activity will be in relation to specific actions teachers can take in their classrooms depends greatly on the content of the tests and the quality of the existing data. Regardless, it is a key activity and could lead to the development of a functional database teachers could access for information on their current students. This would be an important development for the use of student data over time and hence for evaluation capacity building. You may already do this.

This committee might also develop program theory that would plan backward from the necessary achievement outcomes to identify explicit strategies to increase learning in specific areas. Again, the assistance of an outside expert in student learning could be extremely helpful. We might choose to conduct meetings with small groups (e.g., teachers at given grade levels, language arts teachers across grades, and so on) to process the data.
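If district scores already live in an electronic file, the first pass at this kind of sense-making can be technically modest. Here is a minimal sketch in Python, offered purely as an illustration and not as anything we actually ran in my district; the file name, column names, and proficiency cut score are all hypothetical.

    # A purely illustrative sketch: summarize district test scores by
    # school and grade so a committee can interpret them "with a view
    # to action." File name, columns, and cut score are hypothetical.
    import pandas as pd

    scores = pd.read_csv("district_scores.csv")  # hypothetical extract

    PROFICIENT = 1400  # invented cut score, for illustration only

    summary = (
        scores.assign(proficient=scores["scale_score"] >= PROFICIENT)
        .groupby(["school", "grade"])
        .agg(
            n_students=("student_id", "count"),
            mean_score=("scale_score", "mean"),
            pct_proficient=("proficient", "mean"),
        )
        .round(2)
    )

    # A table like this gives grade-level teams a concrete starting
    # point for their "What do we do next semester?" conversations.
    print(summary.to_string())

The point of so simple a summary is not statistical sophistication; it is giving the committee something concrete to argue about at its first meeting.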
Key Activity 2: Conducting One Highly Visible Participatory Inquiry. Modeling the evaluation process for the district is one way to demonstrate how you frame an evaluation question, develop instruments, collect and analyze data, and then make recommendations. I used this process when I served as Research and Evaluation Coordinator, systematically teaching participants in the course of three highly visible studies on topics that mattered greatly to people (i.e., high school graduation standards, special education, and changes in the middle schools). People paid close attention because they truly cared about the outcomes. I would propose a participatory evaluation process involving a team of 20-25 people: representatives of teachers, staff, perhaps parents, perhaps students, district representatives, and university professors. The team would meet monthly throughout the course of the year, and the PEAC representative, with support from the evaluation consultants, would meet in between to prepare materials for the next month's meeting. In my experience, the participants in such a study become close friends while gaining a sense of how program evaluation works.

Key Activity 3: Instituting Action Research Activities--or At Least Planting the Seeds. I have never succeeded in bringing this final activity fully to life anywhere I have worked, but it remains one of my intentions when I collaborate with a district. The ideal would be for groups of collaborating teachers and staff to institute action research efforts on specific interventions with specific students, e.g., first grade teachers working with students struggling with letter recognition or tenth grade staff whose students have low scores on a certain topic on their standardized test. The action research cycle--plan, act, observe, and reflect--is intuitive; good teachers will note that they informally engage in it every day. What I would propose is making the process more explicit and public. In some sense, the school improvement plans that most districts require CAN be a form of action research if they are data-based in design. Since school improvement planning is commonplace, it strikes me as an appropriate vector for introducing this practice.

Teachers and administrators would meet and identify extremely specific instructional activities that they know or believe are effective ("promising practices") for teaching a given skill. They would agree to try out a strategy, measure the results, and then meet in a month to discuss what happened. This is the process Michael Schmoker presents in his Results books. In contrast to Key Activity 2, the highly visible study that models participatory evaluation, these would be fairly private studies that model how individual teachers can facilitate the specific learning of specific individuals. As I mentioned before, action research could also be a method for an annual data-based school improvement process. Although difficult to institute and sustain, action research can provide visible and transparent results and a way to share and reflect on them, and hence bring the evaluation process to life within classrooms. Based on my experience, the PEAC would need to develop a process that includes training in collaborative teamwork, an opportunity for people to identify their own burning issues, incentives for participation, and a structure that will enable projects to be completed in a reasonable timeframe.
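The "measure the results" step of such a mini-study can be equally modest. As a second purely hypothetical sketch, imagine a first grade team comparing letter-recognition scores before and after a month of trying a promising practice; the students and numbers below are invented.

    # An illustrative sketch of the "measure the results" step in an
    # action research cycle: compare scores before and after a month
    # of trying a promising practice. Students and scores are invented.
    import pandas as pd

    scores = pd.DataFrame({
        "student": ["A", "B", "C", "D", "E"],
        "pre":  [12, 15, 9, 20, 14],   # letters recognized before
        "post": [18, 16, 15, 22, 19],  # letters recognized after
    })

    scores["gain"] = scores["post"] - scores["pre"]

    # Simple summaries a team could bring to its monthly meeting.
    print(f"Mean gain: {scores['gain'].mean():.1f} letters")
    print(f"Students who improved: {(scores['gain'] > 0).sum()} of {len(scores)}")
    print(scores.sort_values("gain", ascending=False).to_string(index=False))

Nothing here requires an evaluator on staff; that is precisely the point of building capacity so the process can live on its own.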
To summarize, I have presented a list of activities that I believe would help build a culture of evaluation over time: creating a program evaluation advisory group, beginning to build a formal evaluation infrastructure, making sense of test scores, conducting a highly visible participatory inquiry, and instituting action research activities. Given the life of a school district, it is inconceivable that you could tackle every one of these tasks, even if you had considerable resources to devote, which you most likely don't. My district increased spending each year on standardized testing because we had to; discretionary dollars for evaluation work were hard to find, except to the extent that we re-directed the existing school improvement process to address the idea of continuous improvement.

To return, then, to the question that is the title of this presentation: exactly how viable is evaluation capacity building in schools that give standardized tests? My heart would love to say yes, this can really happen; free range evaluation can thrive in the context of public school systems. My head tells me something different: it is a continuous challenge. I have provided an outline of what I might do if faced with the ECB challenge in a district. Let me leave you with some deep thoughts grounded in Minnesota connections that I hope will make them memorable.

Deep Thoughts:
1. There are specific actions you can take and structures to develop that may create a professional community in which evaluation can thrive.
2. Look for opportunities where program evaluation information can add to district discussions.
3. There are rarely guarantees in program evaluation.
4. If you work hard enough, the impossible can sometimes be achieved.

The Related Minnesotans, and How They Connect:

Judy Garland (born in Grand Rapids, MN): "Somewhere Over the Rainbow" vs. "Follow the Yellow Brick Road". Never assume use will happen, but move purposefully in that direction.

Garrison Keillor: Tell stories in a compelling way.

Laura Ingalls Wilder: Build on what you and others near you experience. Keep it simple!

Prince (AKA the Artist Formerly Known as the Artist Formerly Known as Prince): What?

Rocky and Bullwinkle (Frostbite Falls, MN): Do your best to laugh as you go along.

Charles Lindbergh: Don't be afraid to take calculated risks; stay committed to the vision.

Herb Brooks, coach of the 1980 US Olympic hockey team: Remember who beat the Russians and went on to win the gold medal? Sometimes the impossible IS possible.

References

Deutsch, M. (1949). An experimental study of the effects of cooperation and competition upon group process. Human Relations, 2, 199-231.

Preskill, H., & Torres, R. T. (1999). Evaluative inquiry for learning in organizations. Thousand Oaks, CA: Sage Publications.

Sanders, J. R. (2000, April). Strengthening evaluation use in public education. Paper presented at the annual meeting of the American Educational Research Association, New Orleans, LA.