Student Ratings of Instruction at Metropolitan State College

Recommendations of the Subcommittee to the Faculty Evaluation Task Force
August 2010

Subcommittee Members:
Ellen Boswell
Juan Dempere
Erick Erickson
Clark Germann
Jeffrey Lewis
Ruth Ann Nyhus
Mark Potter (chair)
David Sullivan
Sheila Thompson
Jacob Welch

Metropolitan State College of Denver

Contents
1. Executive Summary
2. Teaching and Instruction in the Higher Educational Setting
3. Terminology and Intent
4. Review, Adaptation, and Development Process
5. Proposed Metro State Instrument
6. Administration, Reporting, and Evaluation Recommendations
7. Proposal for a Pilot
8. Appendix
9. References

1. Executive Summary

Members of the Student Ratings of Instruction (SRI) subcommittee respectfully submit the following recommendations to the Faculty Evaluation Task Force. We recommend that:

1. The Faculty Evaluation Task Force adopt an understanding of instructional responsibilities, as they pertain to the overall evaluation of teaching, that includes instructional content, instructional design, instructional delivery, and instructional assessment.
2. The College consistently adopt the terminology of "student ratings of instruction" (SRIs) in place of what has been termed "student evaluation of teaching."
3. Through every level of the evaluation process, reviewers draw conclusions from a variety of available sources, including, as determined by the FETF, faculty self-assessment, peer observations, department chair observations, and peer review of instructional materials.
4. The College retain items 1 and 3 from the current SRI instrument to use as the two "global items" for summative evaluation.
5. The College adopt, after careful piloting, the SRI instrument described in Section 5 of this report.
6. The College establish a two-pronged approach to moving toward online administration of SRIs by a) establishing a targeted long-term goal of moving fully to online administration of SRIs, and b) offering individual instructors in face-to-face and hybrid classes, beginning in Fall 2011, the option of administering SRIs either online or on paper.
7. Administrators and faculty alike pursue efforts, enumerated herein, to increase student participation in online-administered SRIs.
8. The Faculty Evaluation Task Force engage faculty and administrators campus-wide on the key question of how much SRI data is necessary for informed and meaningful summative evaluation. This discussion should inform decisions on the desired frequency of SRIs for faculty at different ranks.
9. For 16-week courses, SRIs be conducted during the final three weeks of instruction; for other types of courses, a proportional timing for SRI administration be followed.
10. The original SRIs with student responses be returned to the faculty member, via Deans' Offices, along with the Office of Institutional Research (OIR)-generated report.
11. Statistical reports include a histogram presentation of scores for the two global items in Section I of the instrument, along with the mean value and standard deviation, as are currently reported by the OIR.
12. The Faculty Evaluation Task Force consider very carefully, in discussion with faculty and administrators, which norms, if any, are essential for comparative purposes when conducting summative evaluations from SRI data.
13. The Faculty Evaluation Task Force conduct scaled pilots of the proposed instrument (a "mini-pilot" during fall semester and a large-scale pilot during spring semester) in order to launch a new instrument by Fall 2011.

2. Teaching and Instruction in the Higher Educational Setting

Members of the subcommittee have decided neither to try to define the activity of teaching nor to specify in detail the role of the teacher. This is because of the oft-perceived "multidimensionality" both of the activity as a whole and of certain aspects of the role (Marsh & Roche, 1997; Theall & Arreola, 2006). We aspire to create, as best we can, an evaluation system that reflects this multidimensionality.

Teaching is a complex and reflective human activity that, in the higher education context, is offered in a forum that is advanced ("higher"), semi-public, and essentially critical in nature. In the role of instructor, a teacher's most important responsibilities to his/her students are the following, which we have adapted from Theall and Arreola (2006):

1. To possess recognized knowledge and/or relevant experience (content expertise);
2. To re-order and re-organize this knowledge/experience for student learning (instructional design);
3. To communicate and 'translate' this knowledge/experience into a format accessible to students (instructional delivery); and
4. To evaluate the mastery and other accomplishments of students (instructional assessment).

We distinguish, thus, between teaching and instruction. Although instruction is a large part of the teaching activity, it in no way comprises all of the activities, goals, or concerns of the college professor. Professors typically aspire to a number of other purposes in the classroom, which may include encouraging their students to long for the truth, to aspire to achievement, to emulate heroes, to become just, or to do good, for example (Weimer, 2010). In establishing a roadmap for evaluation, we are being careful not to reduce teaching to that which is measurable. Our focus is thus on the four instructional responsibilities defined above. We encourage the Faculty Evaluation Task Force to adopt a similar understanding of instructional responsibilities as they pertain to the overall evaluation of teaching (Recommendation #1).

As complex and multidimensional as the nature of teaching is, we proceed according to the following principles, which are supported by research and scholarship:

1. Evaluation of teaching, including that based on Student Ratings of Instruction, must be sufficiently comprehensive and flexible to recognize that different teaching modalities require different sets of skills and strengths.
2. Evaluation of teaching, including that based on Student Ratings of Instruction, should lend itself to both summative and formative decisions.
3. Student Ratings of Instruction are not uniformly appropriate as a means of gathering data on each of the four sets of instructional responsibilities defined above. Additional methods of compiling data for evaluation include:
   a. Peer observation,
   b. Peer review of instructional materials,
   c. Department chair observation, and
   d. Self-evaluation.

The following table provides further description of the four instructional responsibilities, weighs the appropriateness of using SRIs to measure them, and indicates alternative appropriate data-gathering methods (Arreola, 2007; Bain, 2004; Berk, 2006; Theall & Arreola, 2006).
Content expertise
- Description: Includes the formally recognized knowledge, skills, and abilities a faculty member possesses in a chosen field by virtue of advanced training, education, and/or experience.
- Appropriateness of student ratings: Instructors bring their expertise to bear on selecting the content they judge best for maximizing student learning outcomes. Students are not in an appropriate position to evaluate these choices. Bottom line: SRIs are not appropriate for evaluating this instructional responsibility.
- Alternate methods of data gathering: Peer review of instructional materials.

Instructional design
- Description: Determines how students interact with content, and includes designing, sequencing, and presenting experiences intended to induce learning.
- Appropriateness of student ratings: SRI items may ask about learning opportunities, assignments, exams, clarity of objectives, course materials, etc.
- Alternate methods of data gathering: Self-evaluation; peer review of instructional materials; peer observation; department chair observation.

Instructional delivery
- Description: Includes those human interactions that promote or facilitate learning, as well as various forms of instructional delivery mechanisms. In face-to-face (traditional classroom) instruction, instructional delivery involves, for example, giving organized presentations, motivating students, generating enthusiasm, and communicating effectively. Online instruction may also require using various technologies and applications.
- Appropriateness of student ratings: SRI items may ask about instructor clarity, enthusiasm, openness to student input, class management, use of technology, etc.
- Alternate methods of data gathering: Department chair observation; peer observation.

Instructional assessment
- Description: Includes developing and using tools and procedures for assessing student learning both to provide feedback and to assign grades.
- Appropriateness of student ratings: SRI items may ask about timeliness, frequency, and usefulness of feedback. SRIs can also ask students to report their perceptions of the fairness and accuracy of assessments and the alignment of assessments with course learning objectives.
- Alternate methods of data gathering: Peer review of instructional materials; self-evaluation.

3. Terminology and Intent

Presently, the Handbook for Professional Personnel uses the terminology of "student evaluation of teaching" to refer to student ratings. This terminology has in the past been used more or less interchangeably with "student ratings of teaching"; the lists of references in Marsh and Roche (1997) and Cashin (1995) illustrate the past currency of both sets of terms. Today, however, the terminology "student evaluation of teaching" has fallen out of use. Students provide feedback in the form of comments and ratings, and reviewers then use this information in a broader evaluative system to arrive at informed summative decisions (Cashin, 1995). Bain explains: "Any good process should rely on appropriate sources of data, which are then compiled and interpreted by an evaluator or evaluative committee. Student remarks and ratings, in other words, are not evaluations; they are one set of data that an evaluator can take into consideration" (2004, pp. 167-168). We want to insist on this distinction, whereby students rate and faculty and administrators evaluate. This distinction underscores the responsibility of evaluators at every level to consider all appropriate evidence before arriving at summative decisions.
Furthermore, Bain (2004) and Cashin (1995) agree that efforts must be made to educate reviewers at all levels that SRI results are only one data point in an overall comprehensive faculty evaluation system. For these reasons, we recommend that Metro State consistently adopt the terminology of student ratings of instruction (SRIs) in place of what has been termed student evaluation of teaching (Recommendation #2).

The SRI subcommittee has followed the lead of the Faculty Evaluation Task Force, which identified two reasons for evaluation (February 26, 2010): we evaluate at Metro State to make sure that tenure, promotion, and other summative decisions are based on meaningful, valid data, and we evaluate to support professional growth for faculty who are interested. This dual purpose of faculty evaluation, in support of both summative and formative decision-making, is the standard in colleges and universities today (Arreola, 2007; Seldin, 2006; Seldin & Miller, 2009). Accordingly, we have sought to ensure that SRIs lend themselves to both summative and formative purposes.

SRIs are only one measure of instructional effectiveness. For summative decision-making it is especially important to avoid overreliance on student ratings. This warning is echoed repeatedly in the literature. In arguing for a comprehensive, portfolio-style approach to faculty evaluation, Seldin and Miller, for example, state that "student rating numbers… do not describe one's professional priorities and strengths" (2009, p. 1). Berk recognizes thirteen distinct methods of measuring teaching effectiveness and argues that "student ratings are a necessary source of evidence of teaching effectiveness for formative, summative, and program decisions, but not a sufficient source" (2006, p. 19, emphasis added). We recommend that, through every level of the evaluation process, reviewers draw summative conclusions from a variety of available sources, including, as determined by the FETF, faculty self-assessment, peer observations, department chair observations, and peer review of instructional materials (Recommendation #3).

4. Review, Adaptation, and Development Process

The SRI subcommittee followed a thorough and deliberate approach to reviewing, adapting, and developing a set of recommendations for an SRI instrument, for how that instrument will be administered, and for how the student ratings and comments will be used for summative decision-making. Our process has included a literature review, an examination of the SRI procedures of Metro State's peer institutions, and a review of commercial options. This process has allowed us to identify both strengths and weaknesses of Metro State's current approach to SRIs.

Our literature review yielded a list of questions that we identified to guide our process (see appendix). The lessons we took from the literature pertain to the construction of an instrument, the administration of SRIs, and the use of student ratings for summative evaluation. We have sought to balance these lessons with local considerations pertinent to Metro State. In our examination of practices at peer institutions, we took care to avoid what Arreola calls the "trap of best practices," where "what works well at one institution may not work at all at another" because of unique values, priorities, tradition, culture, and institutional mission (2007, p. xvi).
We thus approached our peer institutions' practices through the lens of what we determined at the outset we want from our SRIs and what we considered would "fit" with Metro State. From among Metro State's thirteen peer institutions approved by the Board of Trustees, we were able to locate and identify SRI instruments, administrative procedures, and/or evaluative guidelines at eight: Appalachian State University, College of Charleston, CSU Chico, CSU Fresno, CSU San Bernardino, James Madison University, Saint Cloud State University, and University of Northern Iowa. We also reviewed the SRI process at CSU San Marcos.

There are several high-profile commercial options available as well. We took time to examine and consider the Course/Instructor Evaluation Questionnaire (CIEQ), the IDEA Center Student Ratings of Instruction (both the short and long forms), the Student Instructional Report (SIR II), and the Purdue Instructor Course Evaluation System (PICES). Arreola (2007) reviews the CIEQ, IDEA Center long form, and SIR II options. Glenn (2010) reports on faculty responses to several commercial options, including the IDEA Center forms. CSU Fresno, as we found in our review of peer institutions, uses the IDEA Center long form.

Through this review of peer institutional practices and commercial options, we discerned features that were common across several instruments, and we identified features that made certain instruments stand out as unique, for better or worse. This process enabled us to identify features and components that we desire as part of a Metro State instrument. We found ourselves drawn to:

- Relatively short instruments. Most peer-institution instruments contain 16 or fewer items, including scaled items, open-ended items, and student background information items. Commercial options, with the exception of the IDEA Center short (summative-only) form, tend to be longer.

- Instruments with two global questions constructed for summative evaluation. We have remained mindful that "poorly worded or inappropriate items will not provide useful information, whereas scores averaged across an ill-defined assortment of items offer no basis for knowing what is to be measured" (Marsh & Roche, 1997, p. 1187). Whereas we judged certain of the peer-institution instruments to have fallen short of this standard, we did find that a widely followed approach among them, as well as among the commercial options, is to include two global items that read like the following from the PICES form: "Overall I would rate this course as: Excellent-Good-Fair-Poor-Very Poor" and "Overall I would rate this instructor as: Excellent-Good-Fair-Poor-Very Poor." We found these two global items, with only slight variation in wording, on instruments at James Madison University, College of Charleston, and Appalachian State. CSU San Marcos uses these two questions, plus a third ("I learned a great deal in this course," with a scaled response from Strongly Agree to Strongly Disagree). The IDEA Center forms, both the long form and the short form, include these same two global items, while SIR II features only one global item, "Rate the quality of this course as it contributed to your learning." Cashin reports broad acceptance among scholars of the sufficiency of "one or a few" global items for summative decisions (1995, p. 2).
- Items designed for formative feedback from students. The inclusion of specifically "formative" items is common to all instruments that we examined, with the exception of the IDEA Center short form. Several instruments (James Madison University, CSU San Bernardino, Appalachian State University, PICES, and SIR II, for example) include items relevant to all three instructional responsibilities that we have identified as appropriate for student feedback (instructional design, delivery, and assessment). The PICES and CSU San Bernardino instruments are designed to allow individual instructors to follow the "cafeteria approach" of selecting from a bank of formative items those that are most relevant for their courses.

- Comments. We seek to balance SRI numerical scores with written student feedback. Fuller qualitative data can provide both context for summative decision-making and feedback for formative decision-making. Furthermore, there is a tendency to overuse quantitative SRI data because, rightly or wrongly, those are often perceived as the only data points in faculty evaluations that have undergone testing for reliability and validity (Marsh & Roche, 1997). As a corrective against such potential overuse, we desire an intentional approach to asking for student written responses. For example, Saint Cloud State University asks students to explain "What are the strong points of the course and the instruction that you believe should be continued?" and "What are the weak points of the course and the instruction that you believe should be modified?" The CSU San Marcos instrument asks students to "List one or two specific aspects of this course that were particularly effective in stimulating your interest in the materials presented or in fostering your learning;" "If relevant, describe one or two specific aspects of this course that lessened your interest in the materials presented or interfered with your learning;" and "What suggestions, if any, do you have for improving this course?" We find these directed questions superior to the practice of designating a box at the end of an instrument for "Comments," as is done on the IDEA Center forms and on the current Metro State form. The CSU San Bernardino form stood out to us for its intentional approach to eliciting written student feedback: it asks students to provide, in written comments, explanations for their ratings of each of the two global items, and its formative "teaching improvement questions" are phrased as open-ended questions rather than as scaled items.

Our review process enabled us to make the following observations with regard to the current Metro State instrument:

- All current Metro State forms (A-H) begin with the same four global items, indicated as items "to provide a general evaluation."
  - One of these global items (item 2) asks students to rate course content, and we find that question inappropriate on an SRI instrument. Arreola states that "rarely does a well-designed student rating form ask students to evaluate the content expertise of the teacher" (2007, p. 20), and, as mentioned in Section 2 of this report, there are alternate appropriate methods for evaluating course content.
  - Two items (item 1 and item 3) ask "The course as a whole was: Very Poor-Poor-Fair-Good-Very Good-Excellent" and "The instructor's contribution to the course was: Very Poor-Poor-Fair-Good-Very Good-Excellent." These items align well with the two standard global items found on most instruments.
  We recommend retaining these two items as standard "global items" intended to elicit ratings that contribute to summative evaluation (Recommendation #4). By retaining these two items, we preserve longitudinal consistency across different Metro State instruments.
  - Our reading of the fourth item (item 4), "The instructor's effectiveness in teaching the subject matter was: Very Poor-Poor-Fair-Good-Very Good-Excellent," leads us to conclude that it is insufficiently differentiated from item 3 (instructor's contribution).

- The Metro State instrument relies too heavily on scaled items that elicit numerical scores. There are 27 such items, divided among "general evaluation items" (section 1), "diagnostic feedback items" (section 2), "information items about the course to other students" (section 3), and "information items relative to other courses [students] have taken" (section 4). A fifth section asks students to provide general information about themselves and about the course they are rating. We question the need for this number of items, we are concerned that the instrument wrongly conveys to students how the data are used, and we fear that this large number of numerical scores might lead to overreliance on quantitative versus qualitative data. Specifically:
  - Scores in section 3 ("To provide information about the course to other students") are not made available for use by students; instead, students have access through MetroConnect to scores from the four "global" items in section 1. Furthermore, these items in section 3 are insufficiently differentiated from items in sections 1 and 2 of the instrument.
  - It is unclear how student responses to items in sections 4 ("To provide information relative to other courses you have taken") and 5 ("To provide general information about yourself and this course") are used in either summative or formative decision-making.
  - While there is a space for "comments," there are no open-ended items that ask students to comment on specific aspects of the course. We are aware that certain academic programs distribute additional pages with open-ended items to elicit student comments. We take this as acknowledgment that the Metro State instrument insufficiently elicits student written feedback.

5. Proposed Metro State Instrument

Informed by our review of the literature and of the peer-institution and commercial options, and having evaluated the current Metro State instrument in light of our findings, we recommend the piloting and adoption of the following instrument (Recommendation #5).

Student Ratings of Instruction
Fall 2010 Pilot

Section I: Please circle a rating number and provide comments in the boxes.

1. The course as a whole was
   (6) Excellent  (5) Very good  (4) Good  (3) Fair  (2) Poor  (1) Very poor
   Please provide reasons why you gave the above rating

2. The instructor's contribution to the course was
   (6) Excellent  (5) Very good  (4) Good  (3) Fair  (2) Poor  (1) Very poor
   Please provide reasons why you gave the above rating

Section II: Teaching improvement questions
1.
2.
3.
4.
5.

Section III: Student Ratings of Instruction Supplemental Faculty Comment Form

Faculty Name:
Course:

Completing this form is optional. Use this form only in the event of an unusual circumstance or circumstances that you believe may influence the Student Ratings of Instruction for your course.
In order to be made part of the record, this form must be received in the relevant Dean's office no later than the end of business on the last day of instruction in the semester that the course is taught.

Directions: Using the space below, please describe the unusual circumstance(s) that you believe may influence the Student Ratings of Instruction for this class.

Notes:
1) Student ratings and responses from Section I are intended for both summative and formative decision-making. While we would prefer, in the abstract, that comments be included along with the OIR (Office of Institutional Research)-generated report in RTP/PTR dossiers, we realize that this may prove impractical. Instead, individual faculty can quote from and make reference to student comments in order to provide context for the global item scores, and comments from this section should be made available to department chairs and reviewers upon request.
2) Student responses in Section II are for formative decision-making and "belong" to the individual instructor. If the faculty member desires, he/she may choose to include the questions and some or all of the student responses from this section in his/her RTP/PTR dossier.
3) If a faculty member completes the Supplemental Faculty Comment Form (Section III), a copy of the completed form should accompany the OIR-generated report from Section I of the instrument.

We propose the following bank of teaching improvement questions. Ultimately, once Metro State moves administration of SRIs entirely online, we envision individual instructors being able to choose from online drop-down menus up to five teaching improvement questions that are most appropriate for their courses. Until then, while SRIs are still administered on paper, it is most practical for faculty from each curricular program to choose the teaching improvement questions (up to five) to be used for all courses across that program. Per note #2 above, however, even though these questions will be selected by programs, it is our intention that they be about and for teaching improvement and not program assessment. As such, the process for selecting questions in Section II should be faculty-driven, and responses to these questions "belong" to individual faculty for formative purposes.

Teaching Improvement Questions

Category 1: Instructional Design
- Describe how the syllabus helped/hindered your learning in this course.
- Describe what you liked best/least about "hands-on" learning activities, such as research, experiments, case studies, or problem-solving activities.
- Describe what you liked best/least about the sequencing of the course content.
- Describe what you liked best/least about the scheduling of course work, such as reading, assignments, and exams.
- Describe how the overall workload helped/hindered your learning in this course.

Category 2: Instructional Delivery
- Comment on the strengths/weaknesses of group activities in this course.
- Describe how class discussions helped/hindered your learning in this course.
- Describe how online discussions helped/hindered your learning in this course.
- Describe what you liked best/least about the instructor's interaction with the class.
- Comment on the strengths/weaknesses of the instructor's explanations of course material.
- Describe how the classroom climate helped/hindered your learning in this course.
- Describe how the online climate helped/hindered your learning in this course.
- Comment on the strengths/weaknesses of the instructor's use of technology.

Category 3: Instructional Assessment
- Describe how exams and assignments helped/hindered your learning in this course.
- Describe what you liked best/least about the instructor's overall approach to grading in this course.
- Comment on the strengths/weaknesses of instructor feedback in this course.
- Describe how exams and assignments in this course challenged you intellectually.

Category 4: Student Engagement and Motivation
- Describe how this class helped/hindered your motivation to learn the subject material.
- Describe how actively you have participated in all aspects of the learning process (for example, completing required readings and assignments, participating in class activities, and studying for exams).

6. Administration, Reporting, and Evaluation Recommendations

Developing an instrument is only one small piece of the total SRI process. For the system to be valued by faculty and administrators alike as one that supports meaningful, fair, and valid summative decision-making while also providing robust support for professional growth, it must adhere to research-based best practices, as referenced in the responses below to our guiding questions.

1. Do we want to use online administration or pencil/paper administration? There are numerous advantages to online administration of SRIs, and we recognize that the full potential of our proposed instrument, with its emphasis on open-ended items and with the bank of teaching improvement questions intended eventually to be made available to individual instructors, can best be realized through online administration. Challenges with response rates, experienced nationally as well as here at Metro State with online course evaluations, give us pause, however. As reported in the Chronicle of Higher Education (Miller, 2010), the IDEA Center found, in a study examining responses at nearly 300 institutions between 2002 and 2008, that while there was no change in how students rated their instructors, response rates dropped from 78% for paper surveys to 53% for online surveys. Thus, we recommend establishing a long-term goal of moving fully to online administration of SRIs, but for the present we wish for individual instructors in face-to-face and hybrid classes to have the choice of administering SRIs for their courses either online or on paper (Recommendation #6).

2. If we pursue an online option, what criteria do we want to prioritize in selecting a vendor or platform, if applicable? We did not directly address this question, though we do note that Metro State already has contracts with Blackboard and Digital Measures, both of which have the capability of administering online SRIs. We also support directing students to their course-specific SRIs through MetroConnect rather than through email.

3. Do we want to establish a minimum response rate to ensure representativeness of results? While there is no agreed-upon standard for a minimum acceptable response rate, scholars widely agree that summative decisions should not be made using SRI data when response rates fall below a certain minimum, because those results cannot be assumed to represent the opinions of the class as a whole. For formative decision-making, on the other hand, even the potentially biased responses from an unrepresentative sample of students can yield useful insights.
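Purely as an illustration of how such a screening step could work once a minimum is adopted, and not as a proposal for any particular threshold, the sketch below flags sections whose response rates fall short; the 60% figure, the function name, and the course sections are placeholder assumptions.

    # Minimal sketch, not a policy: flag course sections whose SRI response rates
    # fall below a minimum before their scores are used for summative review.
    # The 0.60 threshold and the course sections are placeholder assumptions.
    MIN_RESPONSE_RATE = 0.60

    def usable_for_summative_review(responses_received, students_enrolled):
        """Return True if the section's response rate meets the assumed minimum."""
        if students_enrolled == 0:
            return False
        return responses_received / students_enrolled >= MIN_RESPONSE_RATE

    # Hypothetical sections: (course, responses received, students enrolled)
    for course, received, enrolled in [("ENG 1010-003", 18, 25), ("HIS 3210-001", 9, 30)]:
        status = "formative use only"
        if usable_for_summative_review(received, enrolled):
            status = "summative and formative use"
        print(f"{course}: {received}/{enrolled} responses -> {status}")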
We hesitate to make a specific recommendation on this matter before more fully engaging with the campus community. Furthermore, any policy decision in response to this question must be formulated in conjunction with responses to question 5 regarding frequency of administration. Both questions (minimum necessary response rate and minimum frequency of administration) raise the broader question of how much SRI data is needed to make informed and meaningful summative decisions.

4. What steps can we take to ensure acceptable response rates? As Metro State moves gradually toward online administration of SRIs (see Recommendation #6), there are several steps that can be taken to increase response rates. We do not recommend punitive measures against students, for example withholding grades, as a means of increasing response rates. Our efforts toward increased participation in online administration of SRIs should come primarily through the joint effort of administrators and faculty. We recommend that Metro State administrators and faculty alike pursue the following efforts, adapted from Berk (2006), intended to increase student participation in online-administered SRIs (Recommendation #7):
   a. Make computer labs available for completion of online SRIs during class time.
   b. Communicate to students assurances of anonymity.
   c. Provide frequent reminders to students.
   d. Communicate to students the importance of their input and how their results will be used.
   e. Ensure that the online SRI system is convenient and user-friendly.
Part of the responsibility for response rates rests upon individual instructors as well, and there are certain contributions that they can make. As a general rule of thumb, the time saved by not conducting SRIs in class on paper should be used by the instructor to remind students, over the course of several class meetings, why it is important that they complete their SRIs. Furthermore, students will be more likely to complete SRIs if they perceive throughout the semester that their instructors want and care about their input (Weimer, 2010).

5. How often should we be gathering SRI data? Once again, there is no universal standard, and the answer also depends on whether the use of SRI responses is for summative or formative decision-making. Cashin argues that summative decision-making in cases of full-time instructors should consider "ratings from a variety of courses, for two or more courses from every term for at least two years, totaling at least five courses" (1995, p. 2). We acknowledge that there may be a desire to administer SRIs more frequently for untenured assistant professors. In addition, faculty of any rank should have the option of going beyond whatever minimum is required and administering SRIs for their own formative purposes. We recommend that the Faculty Evaluation Task Force determine policy regarding the minimum frequency of SRI administration only after engaging faculty and administrators on the key question of how much SRI data is necessary for informed and meaningful summative decision-making (Recommendation #8). Cashin's rule of thumb may provide a starting point, but this decision also needs to be made in concert with several other considerations:
   a. What additional sources of instructional data will be used to inform summative decisions?
   b. What will be the frequency and content of annual evaluations and RTP/PTR dossiers?
   c. What expectations will be established regarding minimum SRI response rates?
6. What is the best timing for administering SRIs? Providing for relatively standardized conditions in the administration of SRIs is important to the reliability of results. In fact, all else being equal, paper administration of SRIs tends to produce greater standardization of conditions than does online administration (Berk, 2006). Whichever the mode of administration, a relatively narrow time frame for completion of SRIs is preferred for the purpose of standardization. Further, we prefer that SRIs be completed toward the end of the 15-week instructional period of the term, so that students will have the whole of the semester to reflect on while responding. We recommend that, for 16-week courses, SRIs be conducted during the final three weeks of instruction; for differently scheduled courses, a proportionate timing be followed (Recommendation #9).

7. In what format will reporting take place? Because initially a significant portion of SRIs will be completed in pencil, and because open-ended narrative responses in Section I of SRIs will be integral to the evaluation process, we recommend that the original SRIs with student responses be returned to the faculty member via each School's Dean's Office (Recommendation #10). Student written comments in response to the global items in Section I should be made available upon request from any level of summative review, and responses to items in Section II should stay with the faculty member. Some may argue that returning the original SRIs with student comments included on them can create conditions for retaliation against students. However, we find such a risk to be more theoretical than practical, and we feel that the value of providing rich, contextualized data, both quantitative and qualitative, for evaluation outweighs the need to guard against the remote and theoretical risk of retaliation. For purposes of both summative and formative decision-making, there is value in keeping individual students' comments paired with their ratings to better understand the context and reasons for global item scores. Over time, as Metro State makes a concerted institutional move to online administration of SRIs, even the theoretical risk of retaliation will disappear. In the meantime, if departments or programs choose to type comments to preserve respondent anonymity, they retain the option to do so, as long as individual students' comments remain paired with their global item scores, but we urge against any institution-wide mandate to do so.

Additionally, we recommend that statistical reports include a histogram presentation of scores for the two global items in Section I, along with the mean value and standard deviation, as are currently reported by the Office of Institutional Research (Recommendation #11).
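As an illustration of what such a report could contain for a single global item, the sketch below tallies ratings on the proposed six-point scale and prints a simple text histogram together with the mean and standard deviation. The sample ratings are hypothetical, and the use of the sample standard deviation is an assumption rather than a description of the OIR's current procedure.

    # Illustrative sketch only: summarize ratings for one Section I global item
    # on the proposed 6-point scale (6 = Excellent ... 1 = Very poor).
    # The ratings passed in below are hypothetical sample data.
    from collections import Counter
    from statistics import mean, stdev

    SCALE = {6: "Excellent", 5: "Very good", 4: "Good", 3: "Fair", 2: "Poor", 1: "Very poor"}

    def global_item_report(item_label, ratings):
        """Return a text histogram plus mean and standard deviation for one item."""
        counts = Counter(ratings)
        lines = [item_label]
        for value in sorted(SCALE, reverse=True):
            n = counts.get(value, 0)
            lines.append(f"  ({value}) {SCALE[value]:<10} {'#' * n} {n}")
        lines.append(f"  n = {len(ratings)}, mean = {mean(ratings):.2f}, "
                     f"standard deviation = {stdev(ratings):.2f}")
        return "\n".join(lines)

    print(global_item_report("The course as a whole was",
                             [6, 5, 5, 4, 6, 3, 5, 4, 6, 5]))

Keeping the full distribution visible alongside the mean and standard deviation in this way is consistent with the concern, noted below, about over-reliance on any single number.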
Currently, OIR reports student rating results for each course alongside results from the following norm groups: course prefix (upper or lower), department, school, college, and faculty (all courses taught by the individual instructor). We do not believe that the use of five different norms is necessary, and including that many norm groups in reports creates a risk of over-reliance upon and misuse of quantitative data. Weimer (2010) warns that SRI results tell only the story of "what happened in one class with one group of students at one point in a career." These highly particular contexts become lost when evaluators turn right away to comparisons with norm groups. The inclusion of a "faculty mean" is especially troubling to us, since it purports to capture an instructor's performance broadly and thus invites evaluators to make summative decisions based on a single number.

An alternative to norm-referenced evaluation (using norm groups for comparison) is criterion-referenced evaluation, wherein programs, departments, or levels of review determine a minimum standard for performance on the scale (Berk, 2006). We believe that reviewers often intuitively look for scores to be at a certain level or above; in such cases, they are following a criterion-referenced approach to evaluation. We are not advocating for either a norm-referenced or a criterion-referenced approach to evaluation. We recommend, however, that the Faculty Evaluation Task Force consider very carefully, in discussion with faculty and administrators, which norms, if any, are essential for conducting fair and meaningful summative evaluations (Recommendation #12). As a starting point for this conversation, we note that CSU San Bernardino reports the following norm groups: lower-division courses within the same college (school), upper-division courses within the same college (school), and graduate courses within the same college (school).

7. Proposal for a Pilot

We recommend that the Faculty Evaluation Task Force conduct scaled pilots of the proposed instrument: a "mini-pilot" during fall semester and a large-scale pilot during spring semester (Recommendation #13). Since, initially, the questions in Section II of the proposed instrument will be determined by program, it will make most sense to select entire departments to participate in the pilots. For the fall semester mini-pilot, we suggest four departments: one each from the Schools of Business and Professional Studies, and two from the School of Letters, Arts, and Sciences. We believe that for this first pilot, these should be departments with few or no untenured faculty. The purpose of the pilots, both in fall and spring, will be to gather user feedback on:
1. The process in general for administering SRIs, as proposed in this report, and
2. The quality and usefulness of the SRI data, qualitative as well as quantitative, for making both summative and formative decisions.

8. Appendix

The SRI subcommittee used the following questions as a "roadmap" for its work:

General questions
- What is the purpose of SRIs? What do we want to use them for?
- What is the teaching role at Metro? How do we describe teaching so that we know what we're evaluating, so that we are evaluating using appropriate methods, and so that all appropriate modalities are included?
- How can we ensure a system that encourages discussion and sharing about teaching and learning?
- What are our peer institutions doing?
- What is the history of the current MSCD instrument?

Instrument-related questions
- What types of questions do we want to include on the instrument(s)?
  - Global questions?
  - Questions that rate students' perception of how well the learning environment helped them learn?
  - Low-inference questions about instructor behaviors that are related to teaching effectiveness?
  - Open-ended questions?
- Will faculty have the opportunity to pick from a menu of optional questions (e.g., the Purdue Cafeteria [PICES] approach)?
- What steps can we take in the design of the instrument to minimize bias?

Administration of SRIs
- Do we want to use online administration or pencil/paper administration?
- If we pursue an online option, what criteria do we want to prioritize in selecting a vendor or platform, if applicable?
- Do we want to establish a minimum response rate (perhaps tied to the size of the course) to ensure representativeness of results? (The answer here may differ depending on formative or summative purposes.)
- What steps can we take to ensure acceptable response rates?
- How often should we be evaluating faculty using SRIs? Every class every semester? Something less than that? At frequencies determined by rank/tenure status? (The answer here may differ depending on formative or summative purposes.)
- How much data is needed, and over what spread of time, in order to make summative decisions? (It is widely agreed that summative decisions should not be based only on SRIs from one semester, let alone one course.)
- What is the best timing for administering SRIs? How can we balance the need for flexibility with the desire for standardization of conditions?

Post-administration of SRIs
- Who will have access to results? Will we differentiate between results for summative purposes and results for formative purposes?
- What additional measures and what additional contextual information will be needed in dossiers for summative evaluation of teaching?
- What steps can we take to avoid over-reliance on or misuse of SRIs for summative decisions?
- In what format will the reporting take place?

9. References

Arreola, R. A. (2007). Developing a comprehensive faculty evaluation system: A guide to designing, building, and operating large-scale faculty evaluation systems (3rd ed.). San Francisco, CA: Anker Publishing Company, Inc.

Bain, K. (2004). What the best college teachers do. Cambridge, MA: Harvard University Press.

Berk, R. A. (2006). Thirteen strategies to measure college teaching: A consumer's guide to rating scale construction, assessment, and decision making for faculty, administrators, and clinicians. Sterling, VA: Stylus Publishing.

Cashin, W. E. (1995). Student ratings of teaching: The research revisited. IDEA Paper No. 32. Retrieved from http://www.theideacenter.org/sites/default/files/Idea_Paper_32.pdf

Glenn, D. (2010, April 25). Rating your professors: Scholars test improved course evaluations. The Chronicle of Higher Education. Retrieved from http://chronicle.com/article/Evaluations-That-Make-the/65226/

Marsh, H. W., & Roche, L. A. (1997). Making students' evaluations of teaching effectiveness effective: The critical issues of validity, bias, and utility. American Psychologist, 52(11), 1187-1197.

Miller, M. H. (2010, May 6). Online evaluations show same results, lower response rate. The Chronicle of Higher Education. Retrieved from http://chronicle.com/blogPost/OnlineEvaluations-Show-Same/23772/?sid=wc&utm_source=wc&utm_medium=en

Seldin, P. (2006). Building a successful evaluation program. In P. Seldin (Ed.), Evaluating faculty performance: A practical guide to assessing teaching, research, and service (pp. 1-19). San Francisco, CA: Anker Publishing.

Seldin, P., & Miller, J. E. (2009). The academic portfolio: A practical guide to documenting teaching, research, and service. San Francisco, CA: Jossey-Bass.

Theall, M., & Arreola, R. (2006, June). The meta-profession of teaching. Thriving in Academe, 22(5), 5-8.

Weimer, M. (2010).
Inspired college teaching: A career-long resource for professional growth. San Francisco, CA: Jossey-Bass.