Iowa Support System for Schools and Districts in Need of Assistance

Evaluation Questions: Examples and Commentary

National Staff Development Council: Assessing Impact: Evaluating Staff Development, pp. 72-73

Four examples of program goals and evaluation questions appear below, with commentary on how well each meets the recommended criteria: reasonable, appropriate, answerable, and specific regarding the standard and the measure.

EXAMPLE ONE

Program Goal: Increase teachers’ repertoire of instruction and assessment options to increase differentiation in instruction and assessment.
Evaluation Question: Did teachers benefit from the multiple intelligences training program?

The question is probably reasonable and even answerable. The evaluator could ask teacher participants whether they found any benefits from the training. It is not, however, appropriate as a question. The program’s goal was to increase teachers’ repertoire of instruction and assessment options. The question does not specify what kind of “benefit” is meant, and it does not suggest a way to measure whether anything has “increased” in relation to the “benefit.”

EXAMPLE TWO

Program Goal: Increase students’ performance on the science assessment in scientific inquiry and knowledge.
Evaluation Question: Did students’ performance on the 4th and 8th grade science assessments indicate improvement when compared to last year’s results?

This question meets the criteria. It is reasonable because the assessment was obviously given the previous year and will be given again; it is appropriate because it aligns with the program’s intended goals; it is answerable because an increase can be measured easily; and it specifies the measure, the state’s 4th and 8th grade science assessment.

EXAMPLE THREE

Program Goal: Increase teachers’ implementation of guided reading in the daily literacy block.
Evaluation Question: Did student achievement in reading as measured by the Informal Reading Inventory (IRI) increase by one grade level during the school year?

The question focuses on student achievement, yet the program is focused on teachers’ implementation of reading strategies. The question may be reasonable, but it is not appropriate because the program and the question are not aligned. It is answerable; however, answering it may produce invalid conclusions about the impact of the staff development initiative. The question does specify the standard, one grade level. It also specifies the measure, the Informal Reading Inventory (IRI).

EXAMPLE FOUR

Program Goal: Fully implement new social studies curriculum in grades 5-8.
Evaluation Question: What is the degree to which the new middle school social studies curriculum is fully implemented as measured on the implementation rubric?

The question may or may not be reasonable. The implementation rubric may require evaluators to visit each school and collect observations and/or conduct interviews to determine the level of implementation. There may be unanticipated costs, such as paying staff or consultants to make visits and/or to train for inter-rater reliability. If these costs have been anticipated and planned for, then the question is reasonable. Alternative data collection methods, such as a survey, could be used to reduce visitation costs.

Another consideration is the rubric. Whether the rubric has already been developed and field-tested, or still needs to be, will affect the cost of obtaining an answer to this question. If costs for rubric design and data collection have been anticipated and planned for, the question is reasonable.

The question is appropriate because it reflects the program’s purpose.
It is answerable in that a level or stage of implementation can be assessed, especially since an implementation rubric is available or can be developed. The question specifies the standard: full implementation, or the highest level on the rubric. The question does not imply that all schools will be at the highest level of implementation. Knowing where they are along a continuum of implementation stages is feasible because “full implementation” and the other benchmark stages are defined by the rubric. The rubric will be the measure of implementation.

Design Phase: Evaluation Questions ©2009