Student Evaluation of Teaching Task Force
Final Report and Proposal
Presented to OSU Faculty Senate
February 9, 2012
1

Why Conduct SET?
• Improve both teaching and learning
• Provide students a voice in the assessment of instruction/faculty
• Meet state OAR 580-021-0135(3) requirements, which do not specify the current format: “Specific provision shall be made for appropriate student input into the data accumulated as the basis for reappointment, promotion, and tenure decisions, and for post-tenure review. Sources of such input shall include, but need not be limited to, solicitation of student comments, student evaluations of instructors and opportunities for participation by students in personnel committee deliberations.”
2

Previous situation: a paper SET form was used.
• Questions 1 and 2: intended to be summative
• Questions 3 onward: intended to be formative
• Written comments: seen by the instructor only

Current situation: the electronic SET (eSET) is used for
• Formative evaluation
• Summative evaluation
• Program assessment
Our task force was not involved with the transition from SET to eSET.

Charge of the committee:
• Identify the university values in teaching expectations.
• Compare and contrast the advantages and disadvantages of student assessments of teaching (SAT) and student evaluations of teaching (SET) as a means of acquiring student input.
• If SET forms are deemed most appropriate, assess, using informed psychometrics, the validity and reliability of the current SET form and recommend changes as needed.
• If SAT forms are deemed most appropriate, consider new forms and provide recommendations.
• Assess the role of student input forms on teaching effectiveness and make recommendations for consistent use of the form in teaching evaluations across academic units.

There is an extensive literature on this topic.
• Is there valid information in SET data for the evaluation of an instructor? Opinions are split on this.
• Should SET data be used for personnel decisions? A strong no.
From the literature:

“Our findings show that student evaluations are strongly related to grades and that learning, as measured by future grades, is unrelated to student evaluations once current grades have been controlled. We also provide evidence that evaluations vary with instructor characteristics, the type of section, and composition of the class. We find, for example, that students sometimes give lower evaluations to women and to foreign-born instructors. We do not believe that our results are specific to our institutional setting, and expect our results to be qualitatively similar for higher education generally.”
(Bruce A. Weinberg, Belton M. Fleisher, and Masanori Hashimoto, “Evaluating Methods for Evaluating Instruction: The Case of Higher Education”)

“For administrators, the attractiveness of student evaluations of faculty is that they provide an easy, seemingly objective assessment of teaching that does not require justification. The ease of student evaluations comes in reducing the complexities of teaching performance to a series of numbers, particularly when commercial forms are used. The most common type of commercial student evaluation form utilizes a Likert-type scale for students to rate faculty related to a series of statements about the course and instruction. Each point on the scale is assigned a numerical value which allows the computation of composite scores for individual items, groups of items, or all of the items. Finally, the student ratings are often normed nationally and locally in spite of the near universal recommendations in the literature against norming of student ratings.”
(William Arthur Wines and Terence J. Lau, “Observations on the Folly of Using Student Evaluations of College Teaching for Faculty Evaluation, Pay, and Retention Decisions and Its Implications for Academic Freedom,” William & Mary Journal of Women and the Law, Vol. 13, Issue 1, Article 4)

The literature can be grouped into the following areas:
a. Evaluations used for improper purposes
b. Student evaluations reveal bias against certain groups
   1. Double standard
   2. Beauty bias
   3. Asian bias
   4. “Miss Congeniality” bias
   5. Thirty-second snapshot
   6. Classroom environment
   7. Correlation with anticipated grade
   8. Smaller classes score higher

What the Task Force Learned
• From students:
  – Expect anonymity
  – Like the idea of formative feedback
  – Don’t know why student evaluations of teaching are conducted or how the information is used
• From administrators:
  – Express a need for summative information
11

What the Task Force Learned
• From faculty:
  o Worry about inconsistent use of scores in the current system
  o Have concerns about variability in value constructs
  o Doubt the validity of a single instrument for such a wide range of course types
  o Appreciate the customization of the proposed feedback
  o Find written comments more useful
  o Note that numerical data can show trends over time
12

Problems with the Current SET Form
• Feedback comes too late
• Requires value constructs (“excellent,” etc.), which tend to vary between students
• Global/overall ratings (#1 and #2) ignore the complexity of teaching
• May be influenced by situational factors
• Inconsistent use in faculty evaluation
  – Discourages innovation
  – Creates perverse incentives
13

Formative, summative, and program-assessment goals are contradictory!
• Formative: look for what is or is not working well in my class.
• Summative: show that I deserve a pay raise.
• Program: show that my department deserves more resources.
These three functions need to be decoupled! Summative data go to the personnel file.

Program assessment: How does a course fit program criteria? This is a curricular issue and should be addressed at the departmental level. The department is responsible for ensuring that instructors address learning outcomes. Our task force is not responsible for using the eSET for Bacc Core purposes.

Summative assessment.
Examples:
• Student focus groups
• Exit interviews
• Peer review
• Supervisor review
If numerical data are used, proper statistical analysis needs to be performed. If we adopt the SAT, use it as the starting point and context for a discussion between the supervisor and the instructor. The current use of SET scores raises legal questions.

Formative assessment: we do it all the time. Clickers are great! But they are NOT anonymous! There is a need for documented assessment.

Time spent on each homework set:
A) Less than 1 hour
B) Between 1 and 2 hours
C) Between 2 and 3 hours
D) Between 3 and 4 hours
E) Between 4 and 5 hours
F) More than 5 hours

Time spent on each homework set (results):
A) Less than 1 hour: 2%
B) Between 1 and 2 hours: 13%
C) Between 2 and 3 hours: 27%
D) Between 3 and 4 hours: 31%
E) Between 4 and 5 hours: 19%
F) More than 5 hours: 9%

Task Force’s Goals for an Assessment Tool
• Focus on improving teaching
• Focus on elements that affect student learning
• Employ a formative approach
• Allow for evaluation of diverse teaching methods and philosophies
• Provide a flexible system that faculty can adapt to their course
20

An Assessment Tool Should . . .
• Permit feedback during the term, when it’s helpful to the class
• Allow instructors to choose items
• Limit access to the data to discourage misleading and invidious comparisons
• Address factors that affect learning (e.g., course design, classroom environment, materials)
21

Proposed Formative Categories
• Instructional design
  – Objectives
  – Exams and assignments
  – Materials and resources
• Engaging learning
  – Learning activities
  – Classroom environment
  – Extended engagement
• Instructional assessment
  – Fairness
  – Helpfulness
  – Opportunity to demonstrate knowledge
22

Proposed Formative Categories
• Self-reported course impact on the student
  – Motivation
  – Cognitive expansion
  – Skill development
• Alternative and supplemental teaching/learning environments
  – Laboratory and discussion
  – Clinical
  – Seminars
  – Team teaching
  – Field trips
  – Studio
23

Proposal
• Change to a formative assessment tool
• Create a fully customizable instrument
• Rename it “Student Assessment of Teaching” (SAT)
• Deploy online
• Give instructors control of the items used, the timing/frequency, and access to the data
• Report to administrators which items were used and when, but not the results
• Have instructors share with their supervisor the steps taken to improve teaching (Periodic Review of Faculty: PROF)
24

We propose to run a pilot test in Fall 2012 and Winter 2013 to:
• Find implementation problems.
• Compare results from the new and old formats.
• Gather feedback on summative use.
Four units are willing to participate.
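As an aside on the earlier point that any numerical feedback data deserve proper statistical treatment: the clicker poll on homework time shown above can be summarized beyond raw percentages. A minimal sketch, assuming bin midpoints of 0.5–5.5 hours and (as a simplification) truncating the open-ended “More than 5 hours” bin at 5.5 hours:

```python
# Summarizing the clicker-poll distribution from the slides.
# Bin midpoints are an assumption for illustration; the top bin
# ("More than 5 hours") is truncated at 5.5 h, which biases the
# estimate slightly low.

pcts = [2, 13, 27, 31, 19, 9]               # % of responses, options A-F
midpoints = [0.5, 1.5, 2.5, 3.5, 4.5, 5.5]  # assumed hours per bin

total = sum(pcts)  # 101 here, since the reported percentages are rounded
mean_hours = sum(p * m for p, m in zip(pcts, midpoints)) / total

print(f"Estimated mean time per homework set: {mean_hours:.2f} h")
```

Even this simple summary shows why analysis choices matter: the result depends on assumed midpoints and on how the open-ended bin is handled, which is exactly the kind of judgment that raw score averages hide.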