Standard setting and maintenance for Reformed GCSEs
Robert Coe
Defining ‘standards’
Don’t
– Think about criteria or intended meanings of grades
– Think about subject-specific knowledge
Do
– Focus on when (and why) grades are treated interchangeably
– Focus on the actual interpretations given to grades
How are grades interpreted?

The claim by teachers in the recent GCSE English dispute that students who met the criteria deserve a C
– The grade indicates specific competences within the subject domain that have been demonstrated on the assessment occasion.

The use of a B in GCSE maths as a filter for A level study in maths
– The grade indicates specific competences within the subject domain that the candidate is likely to be able to reproduce in the future.

Employers requiring a C in maths and English (or five Cs)
– The grade indicates competences, transferable to employment contexts, that the candidate is likely to be able to reproduce in the future.

Use of GCSE results in league tables to judge schools
– Average grades for a class or school (especially if referenced against prior attainment) indicate the impact (and hence quality) of the teaching experienced.
Standard setting and maintenance in high-performing jurisdictions
Typology of methodologies for setting and maintaining standards
Judgement-based methods
– Criterion-based judgement
– Item-based judgement
– Comparative judgement
– Judgement of demand
Statistics-based methods
– Classical equating models
– IRT equating
– Equating designs
– Reference/anchor test
– Common candidate methods
– Pre-testing designs
– Norm/cohort referencing
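As an illustration of the "classical equating models" entry, here is a minimal sketch of linear (mean-sigma) equating. This is one of the simplest classical models. A score on a new form is mapped to the reference-form score that sits the same number of standard deviations from the mean. It assumes the two forms were taken by equivalent groups of candidates. The function name `linear_equate` and the sample scores are hypothetical.

```python
from statistics import mean, pstdev


def linear_equate(x, new_form_scores, ref_form_scores):
    """Map score x on the new form onto the reference-form scale.

    Linear (mean-sigma) equating: a score z standard deviations from the
    new-form mean is treated as equivalent to the reference-form score
    z standard deviations from the reference-form mean.
    Assumes the two groups of candidates are equivalent.
    """
    mu_new, sd_new = mean(new_form_scores), pstdev(new_form_scores)
    mu_ref, sd_ref = mean(ref_form_scores), pstdev(ref_form_scores)
    return mu_ref + sd_ref * (x - mu_new) / sd_new


# Hypothetical scores from two equivalent groups
new_form = [40, 45, 50, 55, 60]
ref_form = [45, 55, 65, 75, 85]

for score in (50, 55):
    print(score, "->", round(linear_equate(score, new_form, ref_form), 2))
```

Because the reference form here is twice as spread out, a 5-mark gap on the new form becomes a 10-mark gap on the reference scale. IRT equating replaces this single linear map with a model of item difficulty.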
Jurisdictions
– England (GCSE & GCE)
– China (Gaokao)
– Finland (Matriculation Exam)
– Australia (NSW)
– USA (Delaware, Texas)
– South Korea
– Hong Kong
– PISA
Options to consider
Judgement against criteria (p49)
Pros
– Familiarity and (perceived) continuity with the current system.
– Grades are readily interpretable in terms of performance (skills, knowledge).
– Provides a sense check on statistical methods.
Cons
– Hard to develop criteria that are neither vague nor constrainingly narrow.
– Very difficult to maintain a standard: systemic grade inflation and annual fluctuations are likely.
– Undermined by ‘compensation’: overall grades do not guarantee specific competences.
– Criterion-referenced interpretations are problematic if the approach is blended with other (statistical) approaches.
– Awarding using the collective judgement of experts is expensive (if done properly) and adds time to the process.
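The ‘compensation’ problem can be made concrete with a little arithmetic. When a grade is awarded on a total mark, strength in one area can offset weakness in another. This is a minimal sketch. The section marks, the section names and the grade C boundary of 60 out of 100 are all hypothetical.

```python
# Hypothetical marks out of 50 in two sections of the same paper
candidates = {
    "Candidate 1": {"algebra": 45, "geometry": 15},
    "Candidate 2": {"algebra": 30, "geometry": 30},
}
C_BOUNDARY = 60  # hypothetical grade C boundary on the total out of 100


def total_grade(sections, boundary=C_BOUNDARY):
    """Award the grade on the total mark only, ignoring the section profile."""
    return "C" if sum(sections.values()) >= boundary else "below C"


for name, sections in candidates.items():
    weakest = min(sections, key=sections.get)
    print(f"{name}: grade {total_grade(sections)}, "
          f"weakest section {weakest} ({sections[weakest]}/50)")
```

Both candidates receive a C. However, Candidate 1 scored only 15/50 in geometry, so the grade alone cannot certify competence in that area.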
Norm referencing (p54)
Pros
– Simple to understand.
– Quick and easy to apply.
– Prevents spurious rises in grades (‘grade inflation’).
Cons
– Does not allow interchangeability of grades across years or subjects.
– Grades have no extrinsic meaning.
– Cannot measure change in performance over time.
– Big discontinuity with previous standards if all subjects have the same norm.
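Norm referencing can be sketched as setting each grade boundary by rank within the cohort. A fixed percentage of candidates then reaches each grade or better, whatever their absolute performance. This is a minimal sketch. The grade labels and cumulative percentages are hypothetical, not proposed values.

```python
import math


def norm_reference(scores, cumulative_pct):
    """Assign grades so that a fixed share of the cohort gets each grade or better.

    cumulative_pct maps grade -> % of candidates at that grade or above,
    ordered from the highest grade to the lowest. Candidates below the
    last boundary get "U". Ties at a boundary all receive the higher grade,
    so the achieved share can slightly exceed the target.
    """
    ranked = sorted(scores, reverse=True)
    n = len(ranked)
    boundaries = {}
    for grade, pct in cumulative_pct.items():
        k = max(1, math.ceil(n * pct / 100))  # candidates at this grade or above
        boundaries[grade] = ranked[k - 1]     # score of the k-th best candidate

    def grade_of(score):
        for grade, boundary in boundaries.items():
            if score >= boundary:
                return grade
        return "U"

    return [grade_of(s) for s in scores], boundaries


cohort = [72, 65, 58, 51, 47, 40, 33, 29, 20, 12]
grades, boundaries = norm_reference(cohort, {"A": 20, "B": 50, "C": 80})
print(boundaries)
print(grades)
```

If every candidate scored 10 marks higher next year, every boundary would also rise by 10 marks and the grade distribution would be unchanged. This is why the approach cannot measure change in performance over time.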
Scaling against a reference test (p62)
Pros
– Clear reference points for standard setting.
– Definite reference points for maintaining standards.
– Rigorous and academically supported (research papers, etc.).
– Stops ‘grade inflation’.
– Allows international comparison.
– Allows grades to be interpreted against the construct.
– Can measure change in performance over time.
– Allows interchangeability of grades across years and subjects.
Cons
– New to the UK, unfamiliar and potentially controversial (Rasch, etc.).
– Complex: may be hard to explain to the public and other stakeholders.
– Annual cost (financial and time) of additional testing.
– Initial development cost of the reference test.
– Security of the reference test may be hard to maintain.
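One way a reference test could anchor exam grade boundaries is sketched below, under stated assumptions. The reference test is treated as a fixed scale, with a grade boundary score set in a base year (the value used here is hypothetical). Each year, the share of the cohort reaching that reference-test boundary determines how many candidates should reach the grade on this year's exam. The exam boundary is then placed at the score that gives that share. This is a simple common-candidate, equipercentile design chosen for illustration. It assumes the same candidates sit both tests, and it is not a description of any specific operational procedure. The function name is hypothetical.

```python
def exam_boundary_from_reference(ref_scores, exam_scores, ref_boundary):
    """Set this year's exam boundary from a fixed reference-test boundary.

    ref_scores and exam_scores are paired: the same candidates sat both.
    ref_boundary is a (hypothetical) reference-test score fixed in a base year.
    Returns the exam score at or above which the same number of candidates
    fall as reach ref_boundary on the reference test.
    """
    if len(ref_scores) != len(exam_scores):
        raise ValueError("scores must be paired by candidate")
    # How many in this cohort reach the standard on the fixed reference scale
    k = sum(1 for s in ref_scores if s >= ref_boundary)
    if k == 0:
        return max(exam_scores) + 1  # nobody reaches the standard this year
    # Equipercentile match: boundary is the k-th best exam score
    return sorted(exam_scores, reverse=True)[k - 1]


reference = [30, 42, 55, 61, 48, 37, 70, 25]
exam = [38, 51, 62, 66, 57, 44, 75, 30]
print(exam_boundary_from_reference(reference, exam, ref_boundary=45))
```

If a later cohort performs better on the reference test, more candidates clear the fixed reference boundary and more can be awarded the grade. This is how the approach can register genuine change over time while resisting grade inflation.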
Recommendations
Recommendations (1-4)
1. The standard setting and maintenance (SS&M) approach should combine elements of criterion-referencing, norm-referencing and scaling against a reference test.
2. There should be a clear statement of the interpretations of outcomes (grades or scores) that are intended or supported, and of any expected but unintended interpretations that are not supported.
3. Outcomes should be reported both as broad grades and as fine scores.
4. Development of a high-quality reference test must include piloting, psychometric analysis and validation against intended interpretations.
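Recommendation 3 can be illustrated by reporting each candidate's fine (scaled) score alongside the broad grade it falls into. This is a minimal sketch. The boundary values and grade labels are hypothetical.

```python
from bisect import bisect_right

# Hypothetical grade boundaries on a fine-score scale, lowest to highest
BOUNDARIES = [40, 50, 60, 70]
GRADES = ["U", "D", "C", "B", "A"]  # one more label than there are boundaries


def report(fine_score):
    """Return (broad grade, fine score) for one candidate."""
    # bisect_right counts how many boundaries the score meets or exceeds
    grade = GRADES[bisect_right(BOUNDARIES, fine_score)]
    return grade, fine_score


for s in (38, 50, 59.5, 71):
    print(report(s))
```

Two candidates graded C on 50 and 59.5 remain distinguishable by the fine score. That extra information is lost when only the broad grade is reported.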
Recommendations (5-7)
5. The reference test should be taken during the normal examination window.
6. Initial standard setting should draw on a range of approaches, including expert judgement against grade descriptors and specific competences, analysis of demand, and the Angoff and bookmark methods, using population norms as a guide and scaling against items from international benchmarks in a reference test.
7. Standards across subjects that may be treated interchangeably must be aligned.
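Recommendation 6 names the Angoff and bookmark methods. Their core arithmetic can be sketched as follows.

In a modified Angoff exercise, each judge estimates the probability that a borderline candidate answers each item correctly. Each judge's cut score is the sum of those estimates, and the panel cut is their mean.

In a bookmark exercise, items are ordered by difficulty and judges place a bookmark in the ordered booklet. Under a Rasch model, the cut can be taken as the ability at which a candidate has a chosen response probability (RP, often 0.67) of succeeding on the bookmarked item. Conventions differ on which item the cut is tied to.

The ratings, item difficulty and RP value below are hypothetical, and the function names are illustrative.

```python
import math
from statistics import mean


def angoff_cut(ratings):
    """Modified Angoff cut score on the raw-mark scale.

    ratings[j][i] is judge j's estimated probability that a borderline
    candidate answers item i correctly (one mark per item assumed).
    Each judge's cut is the sum of their estimates; the panel cut is the mean.
    """
    return mean(sum(judge) for judge in ratings)


def rasch_p(theta, b):
    """Rasch probability of success for ability theta on an item of difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def bookmark_cut(bookmarked_difficulty, rp=0.67):
    """Bookmark cut on the Rasch ability scale.

    Solves rasch_p(theta, b) = rp for theta, giving
    theta = b + ln(rp / (1 - rp)).
    """
    return bookmarked_difficulty + math.log(rp / (1.0 - rp))


panel = [[0.9, 0.6, 0.4], [0.8, 0.5, 0.5]]  # two judges, three items (hypothetical)
print(f"Angoff cut: {angoff_cut(panel):.2f} of {len(panel[0])} marks")
print(f"Bookmark cut (RP = 0.67, item difficulty 0.5): theta = {bookmark_cut(0.5):.3f}")
```

Because both methods rest on expert judgement, recommendation 6 pairs them with population norms and reference-test scaling rather than using either alone.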
Recommendations (8-10)
8. Maintaining standards in subsequent years should depend largely on the reference test, supported by judgement methods and checked against changes in population norms.
9. A strategy for updating and releasing items from the reference test needs to be developed.
10. As far as possible, the principle of full transparency and disclosure of the SS&M procedures and results should be observed, supported by a strong communications strategy.
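The check against population norms in recommendation 8 could take the form of a simple year-on-year comparison. It would flag any grade whose cumulative share of candidates has moved by more than a tolerance. This is a minimal sketch. The function name, the shares and the 2-percentage-point tolerance are hypothetical. A flag would prompt scrutiny rather than automatic adjustment, since the reference test is meant to drive the standard.

```python
def norm_check(prev_shares, curr_shares, tolerance_pp=2.0):
    """Flag grades whose cumulative share moved by more than tolerance_pp points.

    prev_shares and curr_shares map grade -> % of candidates at that grade
    or above, for last year and this year. tolerance_pp is a hypothetical
    threshold in percentage points, not a recommended value.
    Returns {grade: change in percentage points} for flagged grades only.
    """
    flags = {}
    for grade, prev in prev_shares.items():
        change = curr_shares[grade] - prev
        if abs(change) > tolerance_pp:
            flags[grade] = round(change, 1)
    return flags


last_year = {"A": 20.0, "C": 65.0}  # hypothetical cumulative shares (%)
this_year = {"A": 23.5, "C": 66.0}
print(norm_check(last_year, this_year))
```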