Scoring Sessions and Sample Papers
Grades 6-12
2004-2005

Writing is learning
The Neglected ‘R’: The Need for a Writing Revolution, a report from the National Commission on Writing in America’s Schools and Colleges, April 2003.

Tucson Unified School District
Professional Development and Academics
September 2004

Scoring Sessions for Writing Assessment

Preface

We owe some of the terms in this document to Edward White’s book Teaching and Assessing Writing: Recent Advances in Understanding, Evaluating, and Improving Student Performance, 2nd edition. TUSD is one of the first districts in the nation to require quarterly writing assessments (Los Angeles Unified will implement them this coming school year), and we are still learning how best to administer scoring sessions that will produce reliable and useful student data. TUSD site administrators requested help regarding anchor papers. This document was created to fulfill that request.

Glossary of terms (non-alphabetized), to be read in the order listed

Anchor paper: A student writing sample used as a benchmark or indicator of rubric traits for a particular prompt. For instance, a site may have 36 anchor papers, one for each trait at each score. A separate set of anchor papers is needed for each prompt. A paper is an anchor paper only if it is a student sample responding to the same prompt that you are scoring. Anchor papers expire as soon as a scoring session concludes.

Sample paper: A student writing sample that has been scored and that carries scorers’ comments directed at specific trait bullets. Sample papers are used to discuss, determine, and refine readers’ understanding of the rubric. Sample papers differ from anchor papers because anchor papers are valid only during a specific scoring session. In other words, sample papers can be used to discuss nuances of the traits at any time, whereas anchor papers expire as soon as a scoring session concludes.
An anchor paper becomes a sample paper after it expires.

Inter-rater reliability: The most important factor in making scores valid. A school or group of educators can achieve inter-rater reliability by norming at grade level and for the kind of writing assignment, using anchor papers from that specific assignment. If any of these factors remains unmonitored, the scores are not valid.

Norming: Coming to consensus regarding what a trait means in connection with a specific writing assignment at a specific grade level. One must use anchor papers from that specific writing assignment to norm successfully.

Norming a group: Using anchor papers from a specific writing assignment at a particular grade level, members of the group reach consensus regarding how a trait manifests itself at each score. For the group to be normed successfully, individual members must accept majority decisions and must score using the group’s norms whenever they score papers for this group.

Norming at grade level: A particularly thorny issue in teaching writing is that each grade level presents different struggles and successes as students work to master the writing and thinking skills appropriate to that grade. To chart student growth, we must norm for grade-level mastery in each kind of writing assignment.

Norming for kinds of writing: Each of the six traits works differently for each kind of writing assignment. For example, a group must norm differently for ideas and content when scoring narrative assignments than when scoring academic writing assignments. For guidance regarding the kinds of writing that would be most fruitful for schools to consider, examine the state standards. Each state standard requires using the six traits differently. This means that if a group has normed for narrative writing, its members are not prepared to score academic writing with any inter-rater reliability.
That would require a separate norming session using academic writing anchor papers.

The goals of a scoring session

One of the biggest causes for concern regarding scoring sessions is the purpose. Why are we doing this? A well-run scoring session can serve a number of purposes, and professional development time can be well spent discussing which purpose best serves a particular site. The following goals may or may not be the reasons your site has chosen to run a scoring session according to the guidelines in this document, but we offer them as examples:

Create a professional learning community
Engage in conversation about student achievement in the kind of writing that the particular scoring session has as its object
Create inter-rater reliability through achieving consensus
Accrue data regarding current student mastery of the skills assessed

Any conversation regarding the scoring session will prove fruitful if everyone involved refers to student essays, the rubrics, and the overall student data gathered as evidence for assertions.

Holistic scoring vs. 6-Trait Rubric scoring

Since the 1980s, most teachers have been trained in holistic scoring, in which one uses the scoring guide and gives a single score based on impressions from a single reading of the student work. 6-Trait rubric scoring is not holistic scoring. The goal in using the rubrics is to define and identify benchmarks in student mastery of the skills inherent in each trait. An added bonus is that 6-trait scoring allows us to be objective and specific in our conversations about student work.

Organizing a Scoring Session

Step 1: Identify a Chief Reader.
Step 2: Identify Table Leaders.
Step 3: Chief Reader and Table Leaders select sample papers that will be used as anchor papers to norm the group.
Step 4: At the first committee meeting, define terms and procedures, and outline the itinerary.
Step 5: Norming the group: all readers read, score, and discuss the anchor papers.
Chief Reader, Table Leaders, and all readers invest considerable time and effort in achieving consensus so that subsequent scoring is reliable and consistent.
Step 6: Chief Reader distributes sets of papers to each table. Each paper has a scoring page that allows the students and their teachers to remain anonymous.
Step 7: Readers score papers for all six traits. Each paper needs two scores for each trait, and the second set of scores must be determined with no knowledge of the first set of scores.
Step 8: Chief Reader and Table Leaders check sets of papers for inter-rater reliability. If the scores for a trait differ by more than one point, the paper needs a third reading. If an individual reader’s scores are not consistent with the group, this reader might benefit from a re-examination of the anchor papers and a discussion with the Table Leader and/or Chief Reader.

*As common questions arise at all of the tables, the Chief Reader will pull the problematic papers so that the group may discuss the issues as they emerge.

Leadership Roles

Start the scoring training by choosing a committee and identifying a Chief Reader and Table Leaders: people who want to move student achievement forward, who know how to do that through skill development, and who are willing to take a leadership role in this endeavor. “The chief reader runs the reading, choosing sample papers, moderating and ending debate over standards, arbitrating differences of opinion, and accepting final responsibility for a reliable scoring session” (White 202). Be sure to identify alternates in the event that some individuals turn you down. Create a calendar for meeting with the committee.

With the scoring committee, define terms and procedures. For example, what is norming and why do we need to do it? “The readers can work together to develop that information reliably if they form a temporary discourse community using common criteria” (211).
The committee might discuss how to create blind scoring conditions (hiding student identity and hiding previous scores on the paper). “Every test paper must receive at least two independent readings. . . . those scores must be reached independently. The score given by the first reader must allow for ready comparison of the two scores after the second score is registered, so that a reconciliation score may be added if necessary” (211).

Anchor papers

Once the student writing has been completed and collected, the Chief Reader meets with the entire committee to find anchor papers. “For an essay test, only one writing prompt can be scored in a room at any one time” (201).

“Before readers come together to begin scoring, the chief reader and the table leaders will have met to read through many student papers. These experienced readers will have spent a day or more reading a relatively large number of essays, choosing those which can be used as range [finder/anchor papers for the 6-trait rubrics], and articulating how [the 6-trait rubrics] allow reliable scoring of this particular writing prompt. By the time the readers arrive, those responsible for the reliability of the reading will have already reached substantial agreement on standards. Readers must be given an opportunity to score sample papers, to argue out differences, to come to an understanding of the ranking system. The training of teachers, or ‘calibration,’ as it is sometimes called, is not indoctrination into standards determined by those who know best (as it is too often imagined to be) but, rather, the formation of an assenting community that feels a sense of ownership of the standards and the process” (214-215).

This mini-scoring session should provide meaningful conversation regarding writing instruction and its relation to critical thinking if the participants cite evidence from the student sample papers and the 6-trait rubrics to support their case regarding scores.
It might be helpful to have someone record the conversation as the group reaches consensus so that the chief reader and the table leaders can refresh their memories as they guide table discussion during the time set aside for norming the scorers during the scoring session. “The chief reader and the table leaders form a team that must work together comfortably. If they cannot reach agreement on the issues that emerge, the readers will never come to agree on consistent standards” (204).

Norming (for a scoring session)

“The leadership team must take as its first responsibility the development of collegial consensus, not the rapid production of scores” (204). Ideally, norming scorers will require a large investment of time. The chief reader and table leaders guide the group to consensus by allowing them to discuss the evidence the readers find in the papers to support the scores they have tentatively arrived at using the 6-trait rubrics. The hope is that the discussion at the tables will mirror the discussion among the committee members when they chose anchor papers.

“The chief reader/table leaders might distribute one set of papers representing [the 36 benchmark] scores, already scored by the committee. This has the advantage of allowing the readers to critique the committee’s assessment as well as allowing the readers to see how the committee understood the rubrics on this specific writing prompt” (209). “In a well-run scoring session, there is ample time for discussion of the sample &/or anchor papers, and some time is allowed for the readers to consider the implications of the assessment and the scoring procedure for teaching” (199).

Scoring

Once the chief reader and table leaders feel comfortable with the general consensus among the scorers, the group is ready to grade.

“The chief reader needs to allow enough discussion of the rubrics so that readers internalize it to some extent, but discussion has to be cut off when it becomes unproductive.
During the reading, the chief reader is constantly scoring papers culled from the tables and discussing those scores with the table leaders; the essential task is to make sure that all table leaders have the same standards and that, therefore, the readers at the various tables do not drift up or down in their scoring” (203).

Scorers should feel free to confer with table leaders regarding papers that fail to meet the requirements of the assignment or papers which represent unusual problems that were not addressed during the norming process. “Some sacrifices are necessary to reach high reliabilities. Individual readers need to put aside their idiosyncrasies and agree with group judgments; a reader whose scores are different from everyone else’s is scoring incorrectly. This reader must be persuaded to change, at least for the sake and time of the reading” (211). “The table leaders[’] function is to maintain a consistent grading standard at their tables” (200).

“From time to time during the scoring, generally three or four times a day, additional brief training sessions will deal with particular problems papers that have emerged” (203). “Retraining on new samples must occur from time to time, to prevent readers from drifting up or down in their scoring. Readers must see this process as a group endeavor to work together, not as a check by management on the accuracy of the workers” (215).

After Scoring

“A scoring session is the most effective in-service training device yet discovered for the teaching of writing” (217). “Use assessment to improve as well as judge student work” (xv). The conversations that emerged from the process can, and should, inform further professional development in writing instruction. The process can serve as a diagnostic of student achievement and can serve as a source of data for discussion regarding writing instruction.
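As an illustration of the reliability check in Step 8 of Organizing a Scoring Session, a site that records its scores electronically might flag papers for a third reading along the lines of the Python sketch below. This is only a sketch, not a district procedure. The function name, the dictionary layout, and the trait labels other than ideas and content are assumptions made for illustration. The one rule taken from this document is that two independent scores for a trait that differ by more than one point call for a third reading.

```python
from typing import Dict, List

# Rule from Step 8: scores that differ by MORE than one point need a third
# reading, so a gap of exactly one point is still acceptable.
MAX_ALLOWED_GAP = 1


def traits_needing_third_reading(
    first_reading: Dict[str, int],
    second_reading: Dict[str, int],
) -> List[str]:
    """Return the traits whose two independent scores differ by more than one point."""
    # Both readers are assumed to have scored the same set of traits.
    if first_reading.keys() != second_reading.keys():
        raise ValueError("Both readings must score the same set of traits.")
    return [
        trait
        for trait in first_reading
        if abs(first_reading[trait] - second_reading[trait]) > MAX_ALLOWED_GAP
    ]


# Hypothetical scores for one anonymous paper (trait labels are illustrative).
first = {"ideas and content": 4, "organization": 3, "voice": 5}
second = {"ideas and content": 4, "organization": 5, "voice": 4}

flagged = traits_needing_third_reading(first, second)
print(flagged)  # ['organization']
```

An empty list means the paper needs no third reading. How a third score is then reconciled with the first two is not modeled here and remains a decision for the Chief Reader and Table Leaders.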