Grading Principles and Practices
James Wollack
Department of Educational Psychology
Office of Testing & Evaluation Services
UW-Madison Teaching Academy
September 28, 2012

Walvoord & Anderson (1995)
• Which assessment method…
  • has nearly universal faculty participation
  • enjoys superb student participation
  • is never accused of violating academic freedom
  • provides detailed diagnostic assessment of student learning
  • is tightly linked to teaching
  • has a tight feedback loop into classroom learning and into teacher planning
  • is cheap to implement?
• Grading, when well done

Value of Grading to Instructors
• To what extent are the students achieving the stated course goals?
• Are students understanding the concepts?
• How should class time be allocated for the current topic?
• Can this topic be introduced more effectively?
• What parts of this course are students finding most valuable or most difficult?
• How should this course be altered next time it is taught?

Value of Grading to Students
• Do I know what my instructor thinks is most important?
• Am I mastering the course content?
• How can I improve the way I study in this course?
• What grade am I earning in this course?

Value of Grading
• Assessment and evaluation (i.e., grading) are essential components of effective instruction
• Research suggests that faculty spend 30% of their planning/instructional time on assessment activities
• Clearly, we are assessment driven and recognize the value of proper evaluation
• Why, then, do we not evaluate our own grading practices?

Evaluating One’s Grading
• Should assessments be learning experiences for students?
• What evidence of student learning should figure into grades?
• How should these data be combined to form final grades?
• Ought grades reflect developmental nature of learning?
• How can my grading criteria remain reasonable and comparable when assessments keep changing?

Components of Meaningful Grades
• Although grades serve multiple purposes, their primary purpose should be to reflect the extent to which a student has mastered the essential learning outcomes (ELOs).
• Grades should be reliable
  • Students’ demonstrated mastery should depend little on assessment modality or specific tasks
  • Students’ work should be scored similarly (using the same criteria) regardless of who is evaluating or when it is evaluated
• Grades should be valid
  • Based on assessments closely linked to course objectives and desired learning outcomes
  • Speak to students’ mastery of ELOs

#1 Grading Rule
• Focus on Fairness and Consistency
  • Fairness = points/grades accurately reflect the weighting of course objectives.
  • Consistency = similar scoring/grading for comparable work/depth of understanding.

Making Grades Fair and Consistent
• Consider what you want your students to learn
• Select assignments/assessments that measure what you value most
• Course Assessment Map: description of how various learning goals will be assessed

Test Blueprints
• Develop test blueprints for exams
• A blueprint identifies the objectives/skills to be measured
  • Objectives should be specific, but general enough to allow for multiple questions in that category.
• Blueprint should specify the relative weight of each category

Sample Biology Blueprint
• General Biology: 45%
  • Cellular and Molecular Biology
  • Diversity of Life Forms
  • Health
• Microbiology: 35%
  • Microorganisms
  • Infectious Diseases & Prevention
  • Microbial Ecology
  • Medical Microbiology
  • Immunity
• Human Anatomy and Physiology: 20%
  • Structure
  • Systems

Sample U.S. History Blueprint
• Domestic Affairs: 25%
  • American Political System
  • Major Social Problems
• Global Affairs: 20%
• Civil Rights/Human Rights: 20%
• Economics: 25%
  • Economic Transformation of US
  • Government involvement in economy
• Culture: 10%

More on Blueprints
• Develop blueprints before making tests
• Can build into blueprint a measure of cognitive load
  • Bloom’s taxonomy
  • Overlap with textbook/lecture

Grading Subjective Assessments
• Embrace the grading challenge
• There is no one right way to score artifacts of student learning.
• Grading is socially constructed and context-dependent
• Expert graders will likely differ, sometimes considerably, in their evaluations.
  • Overall harshness
  • Importance of material
  • Magnitude of student errors

Develop a Scoring Rubric
• A scoring rubric identifies the basis for awarding and subtracting points at each phase of each item.
• Nature of scoring rubric will change depending on whether item is computational or a narrative response.

Class Presentation Scoring Rubric (Sum-to-total)
• Knowledge/Understanding: 30%, 6 points
• Thinking/Inquiry: 30%, 6 points
• Communication: 20%, 4 points
• Use of Visual Aids: 20%, 4 points
• 20-point assignment
• For each domain, construct a scale, describing characteristics of each scale point (or range of points)

Knowledge/Understanding (30%)
Needs Work (0-2)
• The presentation uses little relevant or accurate information, not even that which was presented in class or in the assigned texts.
• Little or no research is apparent.
Competent (3-4)
• The presentation uses knowledge which is generally accurate with only minor inaccuracies, and which is generally relevant to the student’s thesis.
• Research is adequate but does not go much beyond what was presented in class or in the assigned texts.
Excellent (5-6)
• The presentation demonstrates a depth of historical understanding by using relevant and accurate detail to support the student’s thesis.
• Research is thorough and goes beyond what was presented in class or the assigned texts.

Thinking/Inquiry (30%)
Needs Work (0-2)
• The presentation shows no analytical structure and no central thesis.
Competent (3-4)
• The presentation shows an analytical structure and a central thesis, but the analysis is not always fully developed and/or linked to the thesis.
Excellent (5-6)
• The presentation is centered around a thesis which shows a highly developed awareness of historiographic or social issues and a high level of conceptual ability.

Communication (20%)
Needs Work (0-1)
• The presentation fails to capture the interest of the audience and/or is confusing in what is communicated.
Competent (2-3)
• Presentation techniques used are effective in conveying main ideas, but a bit unimaginative.
• Some questions from the audience remain unanswered.
Excellent (4)
• The presentation is imaginative and effective in conveying ideas to the audience.
• The presenter responds effectively to audience reactions and questions.

Use of Visual Aids (20%)
Needs Work (0-1)
• The presentation includes no visual aids or visual aids that are inappropriate and/or too small or messy to be understood.
• The presenter makes no mention of visual aids in the presentation.
Competent (2-3)
• The presentation includes appropriate visual aids, but they are too few in number and/or in a format that makes them difficult to use or understand.
Excellent (4)
• The presentation includes appropriate and easily understood visual aids which the presenter refers to and explains at appropriate moments in the presentation.

Art History News Report (Holistic)
14-15 points
• Describes work concisely
• Relates message to artist’s choice and use of various devices
• Develops how message affects beholder
• Considers audience in writing
• Clearly organized and presented
• Well-imagined
• Legible
• No problems with mechanics, grammar, spelling, or punctuation
11-13 points
• Good description
• Relates message to artist’s choices and use of various devices
• Some consideration of effect on beholder
• Perhaps could be better organized or presented
• Adequately imagined
• Legible
• Few problems with mechanics, grammar, spelling, or punctuation
8-10 points
• Adequate description
• Less thorough analysis of how artist conveys message and devices
• Audience not necessarily kept in mind
• Needs significant improvement in organization or presentation
• Needs better imagination
• Problems with legibility, mechanics
6-7 points
• Lacking substantially in either description or analysis
• Problems with audience, organization, presentation, or mechanics interfere with understanding
0-5 points
• Substandard on more than two of these: description, analysis of choices and devices, effects on beholder
• Major problems with audience, organization, presentation, or mechanics

Statistics Problem Scoring Rubric
Students are given scores for 12 students on a Physics quiz and asked to compute the mean, median, and standard deviation of quiz scores, showing all work.
Mean
• Correct formula (1 point)
• $\sum X_i$ correct (1 point)
• Formula applied correctly (2 points)
• Correct answer* (1 point)
Median
• Apparent understanding that median refers to middle score (1 point)
• Scores ranked correctly (2 points)
• Correct answer (2 points)
Standard Deviation
• Correct formula (1 point)
• $\sum X_i^2$ or $\sum (X_i - \bar{X})^2$ correct (1 point)
• Formula applied correctly (2 points). Correct except no square root (1 point)
• Correct answer* (1 point)
* Given (possibly incorrect) numbers from previous steps
Deductions
• Confusing mean for median (& vice versa) (−3 points)
• Nonsensical answer without annotation (−2 points)

Strategies for Maintaining Consistency
• Grading should be done anonymously
• Grade all students’ responses one question at a time.
• Assign each rater a unique set of problems
• Multiple raters per item
• Maintain an error log and corresponding deductions.

Assigning Final Grades
• To curve or not to curve?
  • Should only be considered for large classes (N ≥ 100)
  • Not appropriate if course involves scored group work
  • Runs counter to philosophy that teaching affects learning.
• Easiest if all assignments are scored numerically
  • Hard to average (or weight) letter grades
• Assignments’ points may not reflect their relative weighting
  • Problem for norm-referenced grading schemes
  • Unless assessment standard deviations are comparable, weighting will not be as intended

Assigning Final Grades
Student    Test A (15 points)   Test B (30 points)   Comp. (45 points)   Rank
A          15                   13                   28                  1
B          13                   14                   27                  2
C          11                   15                   26                  3
D          9                    16                   25                  4
E          7                    17                   24                  5
F          5                    18                   23                  6
G          3                    19                   22                  7
H          1                    20                   21                  8
Mean       8                    16.5
St. Dev.   4.9                  2.45

Assigning Final Grades
Student    Test A (15)   Test B (30)   Z(A)    Z(B)    15Z(A)   30Z(B)   Comp.    Rank
A          15            13            1.43    −1.43   21.43    −42.87   −21.43   8
B          13            14            1.02    −1.02   15.31    −30.62   −15.31   7
C          11            15            0.61    −0.61   9.19     −18.37   −9.19    6
D          9             16            0.20    −0.20   3.06     −6.12    −3.06    5
E          7             17            −0.20   0.20    −3.06    6.12     3.06     4
F          5             18            −0.61   0.61    −9.19    18.37    9.19     3
G          3             19            −1.02   1.02    −15.31   30.62    15.31    2
H          1             20            −1.43   1.43    −21.43   42.87    21.43    1
Mean       8             16.5          0       0       0        0        0
St. Dev.   4.9           2.45          1       1       15       30

Assigning Final Grades
• What should contribute?
[Diagram relating Student Engagement to Students’ Mastery of ELOs]
• Should homework, participation, attendance, etc. count?
• Perhaps we could adopt a grading strategy that was reflective of this relationship

Noncompensatory Grading
• Non-assessment activities likely to improve student learning can serve as gatekeepers for grades
• Composite score based entirely on assessments
• Can have separate engagement targets that must be met in order to qualify for grades
  • A: 90-100% + passing score for at least 85% of HW
  • AB: 85-89% + passing score for at least 80% of HW
  • B: 80-84% + passing score for at least 70% of HW

What else should contribute?
• Learning is often cumulative and unscheduled
• Adopting a developmental grading scheme
  • Exams count more later in semester than earlier
  • Final counts more and gives students chance to demonstrate mastery of concepts assessed earlier in semester
• Dropping lowest midterm score?

How Many Points is an A?
• No ubiquitous definition
• Clearly some conventions
  • Like it or not, a B is an average grade
• Dramatic variations from conventions communicate misinformation to stakeholders (students, industry, grad schools, etc.)
• Test development is very difficult
  • What if I write an exam that’s too hard?
• Consider adopting a class-referenced approach

Class-Referenced Grading
• Write exams so that the very best students should be able to attain a perfect or nearly perfect score.
• Rescale exams (percentage correct scores) so that the top score is treated as the maximum achievable score
  • Highest earned score will always be 100%
  • Students not punished because of an overly (and inadvertently) hard exam
  • Standards in syllabus are conservative
• Different from norm-referenced in that percentage of students at each grade is not predetermined.

Grading Challenges
• Grading is really tricky
  • Appropriate assignments that cover domain and allow students to demonstrate their knowledge
  • Consistent, fair, and appropriate scores for each
  • Determining what counts and in what combination
  • Obscure task of assigning composite score ranges to ill-defined grade labels.
• Doing it right is paramount!

Thank You, Teaching Academy
• Questions?
James Wollack
Department of Educational Psychology
Office of Testing & Evaluation Services
jwollack@wisc.edu
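
Worked example (illustrative sketch)
The two Assigning Final Grades tables and the class-referenced rescaling rule can be reproduced with a few lines of Python. This is only a sketch under stated assumptions: the function names (weighted_z_composite, rank_descending, class_referenced) are hypothetical and not part of the original talk, the sample standard deviation (n − 1) is assumed because it matches the 4.9 and 2.45 shown in the tables, and the percentage scores passed to class_referenced are made-up illustration values.

```python
from statistics import mean, stdev

def weighted_z_composite(scores_by_test, weights):
    """Sum of weight * z-score across tests, one value per student.

    scores_by_test: dict mapping test name -> list of scores (same student order).
    weights: dict mapping test name -> intended weight.
    """
    n = len(next(iter(scores_by_test.values())))
    composite = [0.0] * n
    for test, scores in scores_by_test.items():
        m = mean(scores)
        s = stdev(scores)  # assumption: sample SD (n - 1), as in the tables
        for i, x in enumerate(scores):
            composite[i] += weights[test] * (x - m) / s
    return composite

def rank_descending(values):
    """Rank 1 = highest value (no tie handling; hypothetical helper)."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0] * len(values)
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    return ranks

def class_referenced(percent_scores):
    """Rescale so the top earned score is treated as 100% (assumes top > 0)."""
    top = max(percent_scores)
    return [100.0 * p / top for p in percent_scores]

# Students A..H from the tables
test_a = [15, 13, 11, 9, 7, 5, 3, 1]         # 15-point test, SD about 4.9
test_b = [13, 14, 15, 16, 17, 18, 19, 20]    # 30-point test, SD about 2.45

raw = [a + b for a, b in zip(test_a, test_b)]
z = weighted_z_composite({"A": test_a, "B": test_b}, {"A": 15, "B": 30})

print(rank_descending(raw))  # raw points: Test A's larger spread dominates the ranking
print(rank_descending(z))    # standardized: Test B carries its intended double weight
print(class_referenced([45, 72, 90]))  # made-up percentages; top score becomes 100
```

With raw point totals, student A ranks first even though Test B is nominally worth twice as much, because Test A has twice the spread. Standardizing each test before weighting reverses the ranking, which is the point of the second table.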