Scoring Sessions and Sample Papers

Grades 6 - 12
2004-2005
"Writing is learning."
The Neglected ‘R’: The Need for a Writing Revolution, a report from the
National Commission on Writing in America’s Schools and Colleges, April 2003.
Tucson Unified School District
Professional Development and Academics
September 2004
Scoring Sessions for Writing Assessment
Preface
We owe some of the terms in this document to Edward White’s book Teaching and
Assessing Writing: Recent Advances in Understanding, Evaluating, and Improving
Student Performance, 2nd edition.
TUSD is one of the first districts in the nation to require quarterly writing assessments
(Los Angeles Unified will implement them this coming school year), and we are still
learning how best to administer scoring sessions that will produce reliable and useful
student data.
TUSD site administrators requested help with anchor papers. This document was
created to fulfill that request.
Glossary of terms (not alphabetized; read in the order listed)
Anchor paper: A student writing sample used as a benchmark for a rubric trait on a
particular prompt. For instance, a site may have 36 anchor papers, one for each score
point of each of the six traits. A separate set of anchor papers is needed for each prompt.
A paper is an anchor paper only if it is a student sample responding to the same prompt
you are scoring. Anchor papers expire as soon as a scoring session concludes.
Sample paper: A student writing sample that has been scored and annotated with
scorers’ comments on specific trait bullets. Sample papers are used to discuss, determine,
and refine readers’ understanding of the rubric. They differ from anchor papers in that
sample papers can be used at any time to discuss nuances of the traits, whereas anchor
papers are valid only during the scoring session for their prompt and expire as soon as
that session concludes. An anchor paper becomes a sample paper once it expires.
Inter-rater reliability: The degree to which different readers give the same paper the
same scores; it is the most important factor in making scores valid. A school or group of
educators can achieve inter-rater reliability by norming at grade level and for the kind of
writing assignment, using anchor papers from that specific assignment. If any of these
factors goes unmonitored, the scores are not valid.
Norming: Coming to consensus regarding what a trait means in connection to a specific
writing assignment at a specific grade level. One must use anchor papers from that
specific writing assignment to norm successfully.
Norming a group: Using anchor papers from a specific writing assignment at a
particular grade level, members of the group reach consensus regarding how a trait
manifests itself at each score. For the group to be normed successfully, individual
members must accept majority decisions and score according to the group’s norms
whenever they score papers for this group.
Norming at grade level: A particularly thorny issue in teaching writing is that each grade
level presents its own struggles and successes as students work to master grade-
appropriate writing and thinking skills. To chart student growth, we must norm for
grade-level mastery in each kind of writing assignment.
Norming for kinds of writing: Each of the six traits works differently for each kind of
writing assignment. For example, a group must norm differently for ideas and content
when scoring narrative assignments than when scoring academic writing assignments.
For guidance on the kinds of writing that would be most fruitful for schools to consider,
examine the state standards; each standard calls for applying the six traits differently.
This means that a group that has normed for narrative writing is not prepared to score
academic writing with any inter-rater reliability. That would require a separate norming
session using academic writing anchor papers.
The goals of a scoring session
One of the biggest concerns about scoring sessions is their purpose: why are we doing
this? A well-run scoring session can serve a number of purposes, and
professional development time can be well spent discussing which purpose best serves a
particular site. The following list of goals may or may not be reasons your site has chosen
to run a scoring session according to the guidelines in this document, but we offer them
as examples:
- Create a professional learning community
- Engage in conversation about student achievement in the kind of writing that is the
  object of the particular scoring session
- Create inter-rater reliability through achieving consensus
- Accrue data regarding current student mastery of the skills assessed
Any conversation about the scoring session will prove fruitful if everyone involved
refers to the student essays, the rubrics, and the overall student data gathered as evidence
for their assertions.
Holistic scoring vs. 6-Trait Rubric scoring
Since the 1980s, most teachers have been trained in holistic scoring, in which a reader
uses the scoring guide to give a single score based on impressions from a single reading
of the student work.
6-trait rubric scoring is not holistic scoring: each paper receives a separate score for each
trait. The goal in using the rubrics is to define and identify benchmarks in student mastery
of the skills inherent in each trait. An added benefit is that 6-trait scoring allows us to be
objective and specific in our conversations about student work.
Organizing a Scoring Session
Step 1: Identify a Chief Reader
Step 2: Identify Table Leaders
Step 3: Chief Reader and Table Leaders select sample papers that will be used as anchor
papers to norm the group.
Step 4: At the first committee meeting, define terms and procedures, and outline the
itinerary.
Step 5: Norming the group: all readers read, score, and discuss the anchor papers. The
Chief Reader, Table Leaders, and all readers invest considerable time and effort in
reaching consensus so that subsequent scoring is reliable and consistent.
Step 6: The Chief Reader distributes sets of papers to each table.
Each paper has a scoring page that allows the students and their teachers to
remain anonymous.
Step 7: Readers score papers for all six traits. Each paper needs two scores for each trait,
and the second set of scores must be determined with no knowledge of the first.
Step 8: The Chief Reader and Table Leaders check sets of papers for inter-rater reliability.
If the two scores for a trait differ by more than one point, the paper needs a third reading.
If an individual reader’s scores are not consistent with the group’s, this reader might
benefit from re-examining the anchor papers and discussing them with the Table Leader
and/or Chief Reader.
*As common questions arise across the tables, the Chief Reader will pull the
problematic papers so that the group can discuss the issues as they emerge.
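For sites that record scores electronically, the check in Steps 7 and 8 can be sketched in a few lines of Python. This is a minimal illustration, not part of the district procedure: it assumes the common six-trait names, a 1-6 score scale (consistent with the 36 anchor papers mentioned in the glossary), and hypothetical names such as Reading, traits_needing_third_reading, and agreement_rates. It flags any trait where two independent readings differ by more than one point and reports exact and adjacent (within one point) agreement, one simple rough indicator of inter-rater reliability.

```python
from dataclasses import dataclass

# Assumption: the common six-trait names; adjust to match your site's rubric.
TRAITS = ("ideas", "organization", "voice", "word_choice",
          "sentence_fluency", "conventions")

# Assumption: each trait is scored on a 1-6 scale.
MIN_SCORE, MAX_SCORE = 1, 6


@dataclass
class Reading:
    """One reader's independent scores for one paper (hypothetical structure)."""
    reader: str
    scores: dict  # trait name -> integer score


def traits_needing_third_reading(first: Reading, second: Reading) -> list:
    """Return the traits whose two independent scores differ by more than one point."""
    flagged = []
    for trait in TRAITS:
        a, b = first.scores[trait], second.scores[trait]
        for s in (a, b):
            if not MIN_SCORE <= s <= MAX_SCORE:
                raise ValueError(f"{trait} score {s} is outside {MIN_SCORE}-{MAX_SCORE}")
        if abs(a - b) > 1:
            flagged.append(trait)
    return flagged


def agreement_rates(pairs) -> tuple:
    """Fraction of trait score pairs that match exactly, and that are within one point."""
    exact = adjacent = total = 0
    for first, second in pairs:
        for trait in TRAITS:
            diff = abs(first.scores[trait] - second.scores[trait])
            total += 1
            if diff == 0:
                exact += 1
            if diff <= 1:
                adjacent += 1
    if total == 0:
        raise ValueError("no score pairs supplied")
    return exact / total, adjacent / total


# Usage: two blind readings of the same paper.
r1 = Reading("A", dict(zip(TRAITS, (4, 3, 5, 4, 4, 2))))
r2 = Reading("B", dict(zip(TRAITS, (4, 5, 4, 4, 3, 2))))
print(traits_needing_third_reading(r1, r2))  # ['organization']
print(agreement_rates([(r1, r2)]))  # exact 0.5, adjacent about 0.83
```

In this example only organization (3 vs. 5) needs a third reading. Running agreement_rates over each reader's pairs separately is one way a Table Leader might spot a reader who is drifting from the group.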
Leadership Roles
Start the scoring training by choosing a committee: identify a Chief Reader and Table
Leaders who want to move student achievement forward, who know how to do that
through skill development, and who are willing to take a leadership role in this
endeavor.
“The chief reader runs the reading, choosing sample papers, moderating and ending
debate over standards, arbitrating differences of opinion, and accepting final
responsibility for a reliable scoring session” (White 202).
Be sure to identify alternates in the event that some individuals turn you down.
Create a calendar for meeting with the committee.
With the scoring committee, define terms and procedures. For example, what is norming
and why do we need to do it?
“The readers can work together to develop that information reliably if they form a
temporary discourse community using common criteria” (211).
The committee might discuss how to create blind scoring conditions (hiding student
identity and hiding previous scores on the paper).
“Every test paper must receive at least two independent readings. . . .those scores must be
reached independently. The score given by the first reader must allow for ready
comparison of the two scores after the second score is registered, so that a reconciliation
score may be added if necessary” (211).
Anchor papers
Once the student writing has been completed and collected, the Chief Reader meets with
the entire committee to find anchor papers. “For an essay test, only one writing prompt
can be scored in a room at any one time” (201).
“Before readers come together to begin scoring, the chief reader and the table leaders will
have met to read through many student papers. These experienced readers will have spent
a day or more reading a relatively large number of essays, choosing those which can be
used as range [finder/anchor papers for the 6-trait rubrics], and articulating how [the 6-trait rubrics] allow reliable scoring of this particular writing prompt. By the time the
readers arrive, those responsible for the reliability of the reading will have already
reached substantial agreement on standards. Readers must be given an opportunity to
score sample papers, to argue out differences, to come to an understanding of the ranking
system. The training of teachers, or ‘calibration,’ as it is sometimes called, is not
indoctrination into standards determined by those who know best (as it is too often
imagined to be) but, rather, the formation of an assenting community that feels a sense of
ownership of the standards and the process” (214-215).
This mini scoring session should produce meaningful conversation about writing
instruction and its relation to critical thinking if the participants cite evidence from the
student sample papers and the 6-trait rubrics to make their case for scores. It might be
helpful to have someone record the conversation as the group reaches consensus so that
the chief reader and the table leaders can refresh their memories when they guide table
discussion during the norming portion of the scoring session.
“The chief reader and the table leaders form a team that must work together comfortably.
If they cannot reach agreement on the issues that emerge, the readers will never come to
agree on consistent standards” (204).
Norming (for a scoring session)
“The leadership team must take as its first responsibility the development of collegial
consensus, not the rapid production of scores” (204).
Norming scorers properly requires a large investment of time. The chief reader and
table leaders guide the group to consensus by letting readers discuss the evidence they
find in the papers to support the scores they have tentatively arrived at using the
6-trait rubrics. The hope is that the discussion at the tables will mirror the discussion
among the committee members when they chose anchor papers.
“The chief reader/table leaders might distribute one set of papers representing [the 36
benchmark] scores, already scored by the committee. This has the advantage of allowing
the readers to critique the committee’s assessment as well as allowing the readers to see
how the committee understood the rubrics on this specific writing prompt” (209).
“In a well-run scoring session, there is ample time for discussion of the sample &/or
anchor papers, and some time is allowed for the readers to consider the implications of
the assessment and the scoring procedure for teaching” (199).
Scoring
Once the chief reader and table leaders feel comfortable with the general consensus
among the scorers, the group is ready to grade.
“The chief reader needs to allow enough discussion of the rubrics so that readers
internalize it to some extent, but discussion has to be cut off when it becomes
unproductive. During the reading, the chief reader is constantly scoring papers culled
from the tables and discussing those scores with the table leaders; the essential task is to
make sure that all table leaders have the same standards and that, therefore, the readers at
the various tables do not drift up or down in their scoring” (203).
Scorers should feel free to confer with table leaders regarding papers that fail to meet the
requirements of the assignment or papers which represent unusual problems that were not
addressed during the norming process.
“Some sacrifices are necessary to reach high reliabilities. Individual readers need to put
aside their idiosyncrasies and agree with group judgments; a reader whose scores are
different from everyone else’s is scoring incorrectly. This reader must be persuaded to
change, at least for the sake and time of the reading” (211).
“The table leaders[’] function is to maintain a consistent grading standard at their tables”
(200).
“From time to time during the scoring, generally three or four times a day, additional
brief training sessions will deal with particular problems papers that have emerged”
(203).
“Retraining on new samples must occur from time to time, to prevent readers from
drifting up or down in their scoring. Readers must see this process as a group endeavor to
work together, not as a check by management on the accuracy of the workers” (215).
After Scoring
“A scoring session is the most effective in-service training device yet discovered for the
teaching of writing” (217).
“Use assessment to improve as well as judge student work” (xv).
The conversations that emerge from the process can, and should, inform further
professional development in writing instruction. The process can also serve as a
diagnostic of student achievement and as a source of data for discussions about writing
instruction.