Evaluators should do what they can¹
(in tribute to Barry MacDonald)

Saville Kushner, University of Auckland

¹ Paper presented at the ANZ Evaluation Association annual conference, Auckland, July 2013.

Barry MacDonald was a pragmatist – which is to say that, for him, if an idea or a proposition has no implications for action it is meaningless: meaningless in the sense that it does not inform intent or motivation to act. This is why he was drawn to educational evaluation. In fact, he distinguished evaluation from research with two basic observations: one, that evaluation in essence has consequences in action (mostly for resource distribution), whereas research may have only theoretical implications; the other, that evaluation works on the practical dilemmas of others, not on the priorities of the researcher. Evaluation is service; research is discovery. Both generate original knowledge – but with different intent.

He was also drawn to titles. Titles for him were mini-summations of the orientation of the paper he was writing. Most often, he was preoccupied with the dilemmas we all face in confronting the challenge and the paradox of action. So his titles were often expressed as paradoxes, double entendres or juxtapositions – but always as challenges: 'Hard Times' (evaluation and accountability), 'Mandarins and Lemons' (evaluation and the bureaucracy), 'Bread and Dreams' (civil rights policy and bilingual education). One of his minor offerings was the title of this introduction – a double entendre, to be sure. His preoccupation was with taking a realistic view of what evaluation can achieve, and in this he saw things evaluators were asked to do that were unreasonable, and other things evaluators could usefully do that they were not asked to do. This is the theme of the discussion I suggest we have.

Let us start with competing statements of the possible and the impossible. First, the impossible. Here is Michael Scriven's (1996) view of evaluator competencies:

1. Basic qualitative and quantitative methodologies (including survey and observation skills, bias control procedures, practical testing and measurement procedures, judgment and narrative assessment, standard-setting models, presupposition identification, definition theory and practice, case study techniques).
2. Validity theory, generalizability theory, meta-analysis (a.k.a. external synthesis), and their implications (especially as they apply to instruments and indicators, to dissemination practice, and to the synthesis of evaluation studies, respectively).
3. Legal constraints on data control and access, funds use, and personnel treatment (including the rights of human subjects). This includes an understanding of the professional program evaluation standards and the testing standards.
4. Personnel evaluation (since a program can hardly be said to be good if its evaluation of personnel is incompetent or improper), and the personnel evaluation standards.
5. Ethical analysis (e.g., of the extent to which services to clients require a judgment about the legitimacy of program goals; and with respect to confidentiality, discrimination, abuse, triage).
6. Needs assessment, including the distinctions between needs and wants, between performance needs and treatment needs, between needs and ideals, between met and unmet needs, and so on.
7. Cost analysis, including standard procedures such as time discounting of the value of money, but also including the ability to determine opportunity costs and non-money costs (since they are often the most important ones).
8. Internal synthesis models and skills (i.e., models for pulling together sub-evaluations into an over-all evaluation, sub-scores into sub-evaluations, and the evaluations of multiple judges into an over-all rating or standard).
9. Conceptual geography, as in understanding (1) the difference between the four fundamental logical tasks for evaluation [of either (a) merit, (b) worth, or (c) significance], namely grading, ranking, scoring, and apportioning, and their impact on evaluation design; (2) the technical vocabulary of evaluation (including an understanding of commonly discussed methodologies, such as performance measurement and TQM); (3) various models of evaluation as a basis for justifying various evaluation designs; and (4) the validity and utility of evaluation itself (i.e., meta-evaluation), since that issue often comes up with clients and program staff (it includes the psychological impact of evaluation).
10. Evaluation-specific report design, construction, and presentation.

He goes on to say – hardly comforting:

"This is a formidable set of competencies, although it is meant to be minimal, and one that (if valid) raises a serious question about how many people who describe themselves as evaluators deserve the name."

Well – that rules me out. Let's take a more comforting view. Here is Lee Cronbach (1980):

"No one is…fully equipped…The political scientist and the economist find familiar the questions about social processes that evaluation confronts, but they lack preparation for field observation. Sociologists, psychologists and anthropologists know ways to develop part of the data needed for evaluations, but they are not accustomed to drawing conclusions in fast-changing political contexts. The scale and uncontrollability of contemporary field studies tax the skills of statisticians. Even at their best, statistical summaries fall short because evaluative enquiry is historical investigation more than it is a search for general laws and explanations….The intellectual and practical difficulties of completing a credible evaluation are painfully evident…" (Cronbach et al, 1980: 13)

Now I feel a little better! But these are old writings and contexts have changed substantially. Both Scriven and Cronbach were writing in what were still expansive times, with robust public sectors and a political and fiscal focus in advanced industrial countries on program development as a means of advancing social policy. We live in different times, with the widespread intent to dismantle welfare states and to shrink public sectors and systems of social protection. In the intervening years we have also seen a widespread transfer of evaluation responsibility into the administrative system and the close alignment of evaluation with the preferences of the government of the day. Government administrators increasingly see evaluation as a means of solving their immediate problems – essentially as an instrument of their own accountabilities and as management information. As the size of the bureaucracy itself shrinks, so the range of individual responsibilities broadens and program management becomes more complex. So, as a generalisation, evaluation contracts are drying up, shrinking in scale but widening in scope. This means fewer resources to hold public policy accountable and a burgeoning expectation on evaluators. Evaluators are increasingly asked to do more than is practicable, and to shorter deadlines. At the same time, evaluators are an increasingly embattled community whose economy is in decline.
Whether in a university or in private practice, evaluators live within tight commercial realities and are as good and as viable as their last contract. Contracts represent acts of faith, and conforming to minimum contract specifications is the best many can do to guarantee a foothold on the funding process. There is little room left to theorise across evaluation tasks, to address the bigger picture – what House (REF) called 'big policy' as opposed to 'little policy'. Neither of these is a healthy state of affairs and, in the end, neither serves the interests of the administrative system, civil society or, obviously, evaluators. I have signalled before (Kushner, 2011) that treating evaluation as an internal management information and accountability service has left us bereft of narratives to defend what is publicly cherished in public service, and has left us vulnerable to damaging narratives of inefficiency, waste and irrelevance.

So here is the challenge. We need to enter into negotiations with the sponsoring community over what is worthwhile and reasonable to expect of evaluation. At the same time, we need to be more confident about addressing the big picture as we see it reflected in our evaluation data. To start the conversation, here are some measures we need to engage with:

- Build in a post-reporting phase to the evaluation, to dedicate resource to deliberation.
- Grow our confidence at working with small samples to address the bigger picture. This means loosening the ties of validity. Evaluation reports are most useful when they serve to focus attention on program and policy development and on building consensus. The pressure to report on 'what works' and on 'impact' will not go away – but nor does the need to report constructively on the following:
  - qualities to be found in programs
  - key issues – especially differences of views and experiences across program constituencies
  - constraints and enablers to change
  - views of change (prospective rather than retrospective evaluation)
  - options and accompanying consequences
- Focus on problems and issues before method. Evaluation is generally more vulnerable to getting the question wrong than to not validating the answer.
- Distinguish more closely between evaluation and research. Measuring effects, documenting practices and outputs, and identifying causal mechanisms are not, in themselves, evaluation. Evaluation adds the requirement of placing these things in a judgement framework and subjecting them to the test of political feasibility.
- Include data on program interactions in our accounts. Evaluation reports should be inviting of challenge and reinterpretation.

References

Cronbach, L. J., et al. (1980) Toward Reform of Program Evaluation, San Francisco: Jossey-Bass.
Kushner, S. (2011) 'A Return to Quality', Presidential Address to the UK Evaluation Society Annual Conference, November; published in Evaluation, 17(3), pp. 309–312.
Scriven, M. (1996) 'Types of evaluation and types of evaluator', Evaluation Practice, 17(2), pp. 151–161.