Performance Assessment, Rubrics, & Rating Scales
Deborah Moore, Office of Planning & Institutional Effectiveness
Spring 2002
• Trends
• Definitions
• Advantages & Disadvantages
• Elements for Planning
• Technical Concerns
Types of Performance Assessments
• Portfolios
• Exhibitions
• Experiments
• Essays or Writing Samples
Performance Assessment
• Who is currently using performance assessments in their courses or programs?
• What are some examples of these assessment tasks?
Primary Characteristics
• Constructed response
• Reviewed against criteria/continuum (individual or program)
• Design is driven by assessment question/decision
Why on the Rise?
• Accountability issues are increasing
• Educational reform has been underway
• Growing dissatisfaction with traditional multiple-choice (MC) tests
Exercise 1
• Locate the sample rubrics in your packet.
• Working with a partner, review the different rubrics.
• Describe what you like and what you find difficult about each (BE KIND).
Advantages As Reported By Faculty
• Clarification of goals & objectives
• Narrows gap between instruction & assessment
• May enrich insights about students’ skills & abilities
• Useful for assessing complex learning
Advantages for Students
• Opportunity for detailed feedback
• Motivation for learning enhanced
• Process information differently
Disadvantages
• Requires coordination
  – Goals
  – Administration
  – Scoring
  – Summary & reports
Disadvantages
• Archival/retrieval
  – Keeping work accessible
  – Maintaining the archive
Disadvantages
• Costs
  – Designing
  – Scoring (train/monitor raters)
  – Archiving
Steps in Developing Performance Assessments
1. Clarify purpose/reason for assessment
2. Clarify performance
3. Design tasks
4. Design rating plan
5. Pilot/revise
Steps in Developing Rubrics
1. Identify purpose/reason for rating scale
2. Define clearly what is to be rated
3. Decide which you will use
a. Holistic or Analytic
b. Generic or Task-Specific
4. Draft the rating scale and have it reviewed
Recommendations

                 Holistic                                          Analytic
Generic          Cost-effective but lacking in diagnostic value    Desirable
Task-specific    Not recommended                                   Very desirable but expensive
Steps in Developing Rubrics (continued)
5. Pilot your assessment tasks and review
6. Apply your rating scales
7. Determine the reliability of the ratings
8. Evaluate results and revise as needed.
Table 4. Item Means
Rubric items (each rated on a scale of 1 to 5):

General Statement of Problem:
1. Engages reader.
2. Establishes formal tone (professional audience).
3. Describes problem clearly.
4. Defines concepts and terms effectively.

Literature Review:
5. Describes study/studies clearly.
6. Paraphrases/interprets concisely.
7. Organizes meaningfully.

Power/Appeal:
8. Expresses voice.

Synthesis:
9. e.g., Makes connections between sources; identifies missing pieces; predicts reasonable future directions.

Format and Structure:
10. Mechanics (spelling, punctuation) and grammar.
11. A.P.A. style (in-text citation).
12. Develops beginning, middle, and end elements logically.
13. Provides sound transition.
14. Chooses words well.
Descriptive Rating Scales
• Each rating scale point has a phrase, sentence, or even paragraph describing what is being rated.
• Generally recommended over graded-category rating scales.
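A descriptive rating scale maps each scale point to language a rater can apply directly. The sketch below is a minimal Python illustration of that idea; the trait name and descriptors are hypothetical and are not drawn from the workshop's sample rubrics.

```python
# Minimal sketch: a descriptive rating scale for one hypothetical trait.
# The descriptors below are invented examples, not the workshop's rubric language.
ORGANIZATION_SCALE = {
    5: "Ideas are sequenced logically; transitions make the argument easy to follow.",
    4: "Sequence is mostly logical; a few transitions are abrupt.",
    3: "Some sections are out of order; the reader must occasionally backtrack.",
    2: "Organization is hard to follow; few useful transitions.",
    1: "No discernible organization.",
}

def descriptor(score: int) -> str:
    """Return the descriptive anchor a rater sees for a given scale point."""
    return f"{score}: {ORGANIZATION_SCALE[score]}"

if __name__ == "__main__":
    for point in sorted(ORGANIZATION_SCALE, reverse=True):
        print(descriptor(point))
```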
Portfolio Scoring Workshop
Subject Matter Expertise
Experts like Dr. Edward White join faculty in their work to refine scoring rubrics and monitor the process.
Exercise 2
• Locate the University of South Florida example.
• Identify the various rating strategies that are involved in use of this form.
• Identify strengths and weaknesses of this form.
Common Strategy Used
• Instructor assigns an individual grade for an assignment within a course.
• Assignments are forwarded to a program-level assessment team.
• The team randomly selects a set of assignments and applies a different rating scheme (a sampling sketch follows below).
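As a concrete illustration of the sampling step above, the sketch below draws a random subset of instructor-graded assignments for program-level rescoring. The file names, sample size, and seed are hypothetical; the slide does not prescribe any particular tool for drawing the sample.

```python
import random

# Hypothetical pool of assignments already graded by course instructors.
assignments = [f"student_{i:03d}_essay.pdf" for i in range(1, 121)]

# Draw a random subset for the program-level team to rescore with the
# program rubric. The sample size (20) and seed are illustrative choices.
random.seed(2002)
program_sample = random.sample(assignments, k=20)

for name in sorted(program_sample):
    print(name)
```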
Exercise 3
• Locate the Rose-Hulman criteria.
• Select one of the criteria.
• In 1-2 sentences, describe an assessment task/scenario for that criterion.
• Develop rating scales for the criterion.
  – List traits
  – Describe distinctions along a continuum of ratings
Example of Consistent & Inconsistent Ratings

             CONSISTENT RATING (7-Point Scale)     INCONSISTENT RATING (7-Point Scale)
Students     Judge A   Judge B   Judge C           Judge A   Judge B   Judge C
Larry           7         6         7                 7         3         1
Moe             4         4         3                 4         6         2
Curley          1         2         2                 2         4         7
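One simple way to quantify the contrast in the example above (not part of the original slides) is to measure how far each judge strays from the three-judge median for every student; a short sketch using the scores from the table follows.

```python
from statistics import median

# Ratings from the example above (7-point scale): student -> (Judge A, Judge B, Judge C).
CONSISTENT = {"Larry": (7, 6, 7), "Moe": (4, 4, 3), "Curley": (1, 2, 2)}
INCONSISTENT = {"Larry": (7, 3, 1), "Moe": (4, 6, 2), "Curley": (2, 4, 7)}

def max_spread(ratings: dict) -> dict:
    """Largest absolute distance of any judge from the median rating, per student."""
    return {
        student: max(abs(score - median(scores)) for score in scores)
        for student, scores in ratings.items()
    }

print("Consistent set:  ", max_spread(CONSISTENT))    # every judge within 1 point of the median
print("Inconsistent set:", max_spread(INCONSISTENT))  # judges stray 2-4 points from the median
```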
Calculating Rater Agreement (3 Raters for 2 Papers)

Is the rater in agreement with the criterion score?
Judges    Paper 1    Paper 2    Rater's Agreement
Larry     Yes        No         50%
Moe       No         No         0%
Curley    Yes        Yes        100%
Total     67% Yes    33% Yes    50%

Is the rater in agreement with the criterion score, plus or minus 1 point?
Judges    Paper 1    Paper 2    Rater's Agreement
Larry     Yes        No         50%
Moe       Yes        Yes        100%
Curley    Yes        Yes        100%
Total     100% Yes   67% Yes    83%
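The sketch below reproduces the agreement arithmetic from the table. The slide reports only the yes/no decisions, so the raw scores and criterion scores here are hypothetical values chosen to yield the same pattern; the exact-match and plus-or-minus-one-point rules are the ones illustrated above.

```python
# Hypothetical 7-point ratings and criterion scores chosen to reproduce the
# yes/no pattern in the table above; the slide does not give the raw scores.
CRITERION = {"Paper 1": 5, "Paper 2": 3}
RATINGS = {
    "Larry":  {"Paper 1": 5, "Paper 2": 6},
    "Moe":    {"Paper 1": 4, "Paper 2": 4},
    "Curley": {"Paper 1": 5, "Paper 2": 3},
}

def agreement_report(tolerance: int) -> None:
    """Print per-rater and per-paper agreement with the criterion score,
    counting a rating as agreeing when it is within `tolerance` points."""
    papers = list(CRITERION)
    hits_per_paper = {p: 0 for p in papers}
    print(f"-- agreement within {tolerance} point(s) of the criterion score --")
    for rater, scores in RATINGS.items():
        hits = [abs(scores[p] - CRITERION[p]) <= tolerance for p in papers]
        for p, hit in zip(papers, hits):
            hits_per_paper[p] += hit
        print(f"{rater:7s}: {100 * sum(hits) / len(papers):.0f}%")
    for p in papers:
        print(f"{p} across raters: {100 * hits_per_paper[p] / len(RATINGS):.0f}%")

agreement_report(tolerance=0)  # exact agreement (first panel)
agreement_report(tolerance=1)  # plus or minus 1 point (second panel)
```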
Rater Selection and Training
• Identify raters carefully.
• Train raters about the purpose of the assessment and to use rubrics appropriately.
• Study rating patterns and do not keep raters who are inconsistent.
Some Rating Problems
• Leniency/severity
• Response set
• Central tendency
• Idiosyncrasy
• Lack of interest
Exercise 4
• Locate the Generalizability Study tables (1-4).
• In reviewing Table 1, describe the plan for rating the performance.
• What kinds of rating problems do you see?
• In Table 2, what seems to be the biggest rating problem?
• In Table 3, what seems to have more impact, additional items or raters?
Generalizability Study (GENOVA)
• G study: identifies the sources of error (facets) in the overall design and estimates the error variance for each facet of the measurement design.
• D study: uses those estimates to gauge the reliability of ratings under the current design and to project the outcomes of alternative designs.
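For reference, Table 2's sources of error (persons, raters, items, and their interactions) correspond to a fully crossed persons x raters x items design; in that design the dependability (phi) coefficient that GENOVA reports is conventionally defined as shown below. This is the standard G-theory expression, given here as background rather than quoted from the workshop materials.

$$
\Phi = \frac{\sigma^2_p}
{\sigma^2_p + \frac{\sigma^2_r}{n_r} + \frac{\sigma^2_i}{n_i} + \frac{\sigma^2_{pr}}{n_r} + \frac{\sigma^2_{pi}}{n_i} + \frac{\sigma^2_{ri}}{n_r n_i} + \frac{\sigma^2_{pri}}{n_r n_i}}
$$

where \(\sigma^2_p\) is the variance component for persons, the remaining \(\sigma^2\) terms are the error components for raters, items, and their interactions, and \(n_r\), \(n_i\) are the numbers of raters and items in the design being evaluated or projected.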
Table 1. Rater and Item Means (Scale 1-5)

Rubric Item Number              Team 1   Team 2   Team 3   Team 4   Team 5
1                                 3.0      3.0      3.3      3.9      3.5
2                                 2.7      3.3      3.2      3.6      3.6
3                                 2.9      3.3      3.5      3.0      3.6
4                                 3.1      3.2      3.1      3.8      3.7
5                                 3.3      3.8      3.4      3.3      3.7
6                                 3.1      3.2      3.4      3.7      3.7
7                                 2.9      3.4      3.4      4.0      3.9
8                                 3.0      3.1      3.4      3.7      3.7
9                                 2.5      2.9      3.1      3.2      3.3
10                                3.8      3.8      3.8      4.2      3.6
11                                2.0      2.0      2.7      2.6      3.5
12                                2.9      3.3      3.6      4.1      3.9
13                                2.9      3.4      3.3      4.1      3.7
14                                3.5      3.2      3.2      3.9      3.7
Rater 1 Mean                      3.2      2.7      3.4      3.7      3.7
Rater 2 Mean                      2.7      3.6      3.2      3.6      3.6
Grand Mean                        2.9      3.2      3.3      3.6      3.6
Dependability of Design (Phi)     .21      .51      .45      .67      .50
Table 2. Percent of Error Variance Associated with Facets of Measurement

                                          Percent Variance (%)
Source of Error Variance              Team 1   Team 2   Team 3   Team 4   Team 5
Persons                                 18       31       19       35       47
Raters                                  23       40        3        2        3
Items                                   13        8       12       15        3
Persons X Raters                        15        5        6       10       29
Persons X Items                         12        6       23       13        7
Items X Raters                           7        5       18       14        5
Error (Persons X Raters X Items)        11        5       17       11        8
Total*                                  99      101       98      100      102

* rounding error
Table 3. Phi Coefficient Estimates Based on Alternative Designs for Each Team

                  Raters   Items   Team 1   Team 2   Team 3   Team 4   Team 5
Current design      2       14      .21      .51      .45      .67      .50
Alternatives        2       18      .34      .68      .61      .81      .67
                    3       14      .41      .75      .62      .84      .75
                    3       18      .42      .75      .66      .85      .75
                    4       14      .46      .79      .66      .86      .80
                    4       18      .48      .80      .70      .88      .80
                    5       14      .50      .82      .68      .87      .83
                    5       18      .52      .83      .72      .89      .83
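The alternative-design estimates in Table 3 are D-study projections. The sketch below shows the general shape of such a projection using the phi expression given earlier; the variance components are made-up numbers, so the output is illustrative and will not reproduce Table 3 (the published values come from GENOVA's unrounded component estimates).

```python
def project_phi(vc: dict, n_raters: int, n_items: int) -> float:
    """Dependability (phi) for a fully crossed persons x raters x items design,
    given variance components `vc` and a projected number of raters and items."""
    absolute_error = (
        vc["r"] / n_raters
        + vc["i"] / n_items
        + vc["pr"] / n_raters
        + vc["pi"] / n_items
        + (vc["ri"] + vc["pri"]) / (n_raters * n_items)
    )
    return vc["p"] / (vc["p"] + absolute_error)

# Made-up variance components for illustration (not any team's estimates).
components = {"p": 0.30, "r": 0.20, "i": 0.10, "pr": 0.15, "pi": 0.10, "ri": 0.05, "pri": 0.10}

for n_r in (2, 3, 4, 5):
    for n_i in (14, 18):
        print(f"{n_r} raters, {n_i} items: phi = {project_phi(components, n_r, n_i):.2f}")
```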
Table 4. Item Means

Rubric Item                                                         Mean Score
General Statement of Problem:
 1. Engages reader.                                                   3.31
 2. Establishes formal tone (professional audience).                  3.25
 3. Describes problem clearly.                                        3.25
 4. Defines concepts and terms effectively.                           3.36
Literature Review:
 5. Describes study/studies clearly.                                  3.48
 6. Paraphrases/interprets concisely.                                 3.39
 7. Organizes meaningfully.                                           3.49
Power/Appeal:
 8. Expresses voice.                                                  3.36
Synthesis:
 9. e.g., Makes connections between sources; identifies missing
    pieces; predicts reasonable future directions.                    3.00
Format and Structure:
10. Mechanics (spelling, punctuation) and grammar.                    3.82
11. A.P.A. style (in-text citation).                                  2.53
12. Develops beginning, middle, and end elements logically.           3.52
13. Provides sound transition.                                        3.45
14. Chooses words well.                                               3.47
Summary
• Interpretation: raters are using the rubric in nonsystematic ways.
• Reliability (phi) values range from .21 to .67 for the teams, well below the desired .75 level.
What Research Says About Current Practice
• Instructors have a limited sense of the differentiated purposes of assessment.
• Articulating their goals for student outcomes is difficult.
• Understanding of what constitutes thinking & problem solving is uneven.
• Discipline content gets short shrift.
• Articulating criteria for judging is difficult.
• Instructors worry over fairness; criteria are unevenly applied.
• The design plan is often weak or flawed, with limited thought given to how the information will be used.
• Instructors are not prepared to consider reliability & validity issues.
Summary
• Use on the rise
• Costly
• Psychometrically challenging
Thank you for your attention.

Deborah Moore, Assessment Specialist
Office of Planning & Institutional Effectiveness
101B Alumni Gym
dlmoor2@email.uky.edu
859/257-7086
http://www.uky.edu/LexCampus/
http://www.uky.edu/OPIE/