Chapter 5 (again): Developing Constructed

Lecture 8: Subjective Items (Constructed Response) and Essays
Subjective Items
PROS:
Require RECALL of info (less guessing)
Helps identify unanticipated misconceptions and problems
May be more convenient for you to write
Able to measure more complex outcomes than most selected response items
Able to measure outcomes not measured with paper and pencil
Gives the student the freedom to express self, be creative
CONS:
Items must be CLEARLY written and unambiguous; your writing skills are crucial
Subjectively scored, so less reliable and, therefore, less valid
May need to develop a rubric of some kind to score, so more time consuming
Cover less material because they are more time consuming to take
Difficult to use with large groups
Some students try to bluff on written items
Student writing ability may improperly influence your content-based grades
While the constructed response category contains Performance Assessments, Portfolio
Assessments, and Product Assessments, tonight’s discussion focuses on short-answer items
(including fill-in-the-blank and keyed response) and essay items.
**As always, the behavior required by the item or assessment must match the behavior
described in the instructional objective being measured.
**You also must always remember to consider the previous experience and developmental status
of your students.
I. Short-Answer Items
These items require the student to provide a written response.
The response required of them may be a word or two – as in fill-in-the-blank, a phrase, a
sentence, or several sentences.
A. There are Two Types of Short-Answer Items
1. Direct Questions
Example:
What is the technical name for shortsightedness? ________________
or
In the space below, briefly define myopia.
2. Incomplete Statements
Example:
The technical name for shortsightedness is __________________.
VIP things to remember about short answer items:
1. A major strength of short answer items is that they provide a vehicle for the
development of written expression that will eventually lead to better essay writing.
2. However, remember that these are not suitable for measuring complex learning outcomes.
Other types of constructed response (subjective) items are more appropriate for that
purpose.
Short-Answer items are suitable for assessing relatively lower level learning outcomes, such as
those that focus on the acquisition of knowledge, basic comprehension, and simple applications
of information.
B. Short-Answer Item Writing Guidelines
1. Remove only one or two key words from incomplete statements.
**Avoid “Swiss Cheese” items.
Poor Example:
__________ is __________ but not sufficient for __________.
Better Examples
Reliability is necessary but not sufficient for __________.
In the space below, describe the relationship between reliability and validity.
2. Place blanks near the end of the statement for incomplete statements and at the
right margin for direct questions. Putting the blank at the end makes the item clearer
(less ambiguous). It also saves your time (and vision) during grading.
POOR Example:
Warm-blooded animals, called _______________, are born alive and suckle their young.
Better Examples:
Warm-blooded animals that are born alive and suckle their young are called ___________.
What are warm-blooded animals that are born alive and suckle their young called?
_______________________
or, use a more advanced version of this item:
In the space below, name three characteristics of mammals.
1.
2.
3.
3. Write items that direct the student to the one correct and concise answer. If your
item is poorly worded, vague or ambiguous, students will project meaning into the
question and possibly misinterpret it.
POOR Examples:
An animal that eats the flesh of other animals is ______________. (hungry?)
John Glenn first orbited the Earth in _______________. (a spaceship?)
Better Examples:
Name the term from the text that describes animals that eat the flesh of other animals.
_______________
In your own words, in the space below, define the term ‘carnivore’.
In what year did John Glenn first orbit the Earth?
_______________
4. Blanks should be kept at a uniform length so that the student is not given
unintentional clues. Avoid using shorter blanks for short answers and longer blanks for
longer answers.
5. Avoid giving grammatical clues to the correct answer.
Watch your articles (a/an) so you don’t give away whether the response begins with a
consonant or a vowel.
6. If the answer is numerical, indicate the type of measurement units and accuracy
desired.
Examples:
Jill ate one-half of the apple pie. Jack ate one-quarter of the peach pie. How much pie did
they eat all together? Give your answer in decimal notation.
or
Calculate the mean of the following scores. Round your final answer to tenths.
6, 7, 4, 8, 2
7. Employ direct questions rather than incomplete statements for younger students and
students with learning disabilities.
8. Never take statements directly from the textbook to use for fill-in-the-blank items.
Statements taken out of context are more likely to be ambiguous.
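Guideline 6 asks you to state the units and accuracy you want. One way to see why is to work out the answer key for the two numerical examples at exactly the precision requested. The sketch below is only an illustration: the helper name is_correct is hypothetical, and rejecting answers that are not written as decimals is an assumption about how strictly you want to apply your own directions.

```python
from decimal import Decimal, InvalidOperation
from fractions import Fraction

# Pie item: one-half of one pie plus one-quarter of another, in decimal notation.
pie_fraction = Fraction(1, 2) + Fraction(1, 4)
pie_key = Decimal(pie_fraction.numerator) / Decimal(pie_fraction.denominator)

# Mean item: the five scores from the example, rounded to tenths.
scores = [6, 7, 4, 8, 2]
mean_key = (Decimal(sum(scores)) / Decimal(len(scores))).quantize(Decimal("0.1"))


def is_correct(response: str, key: Decimal) -> bool:
    """Hypothetical helper: True if the written response is a decimal equal to the key."""
    try:
        return Decimal(response.strip()) == key
    except InvalidOperation:
        # Not written as a decimal number (for example "3/4"), so it does not
        # follow the directions given in the item.
        return False


print(pie_key, mean_key)  # 0.75 5.4
```

Because the item states the form of the answer, a response such as "3/4" or "5.38" can be marked consistently for every student instead of being decided case by case.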
Caveat for using any constructed response item:
The more complex the nature of the student constructed response, the more attention you will
have to give to scoring the responses. Otherwise the reliability of scores, and hence, the
validity of your decisions, will be seriously compromised.
C. Short-Answer Item Scoring Guidelines
1. The score value for each response in a fill-in-the-blank item should be a single point
or two, and awarded consistently across all items.
2. If you accept synonyms for a correct answer, be sure you are consistent across the
class.
Example:
Name the type of item that is scored objectively.
____________________
(I wanted “selected response”, but also accepted particular types of selected response items
that students named, since they were scored objectively.)
3. For keyed response items, you may want to award credit for student-invented
responses (not taken from the word bank) that are also correct answers.
4. The score value for more extensive short-answer items should be based on how well
the student’s response meets the criteria specified in the item.
a. For keyed response items, award credit for the correct answer and credit for
correct spelling. (Be sure that students know you score this way by putting that
information in your directions!)
b. For lists, give credit for each component of the list. If it is an ordered list, you
must take order into account when scoring. Specify the criteria to do this.
For example:
Name the first three presidents of the United States, in correct order. Be sure to spell their
names correctly!
Are the correct presidents named? 1 point for each correctly named (3 max)
Are they in correct order? Yes = 2, No = 0 (2 max)
Are they spelled correctly? 1 point for each correctly spelled (3 max)
c. Short answer items may be scored analytically, as described above, or holistically
(all or nothing, categorically), but holistically scored items should never be
worth more than 5 points. Do not use this method for scoring complex
behaviors. Also, remember that holistic scoring is NOT THE SAME as objective
scoring!!
d. Longer written responses (such as essay items) should always be scored
analytically, according to content only, written form only, or both. (Remember
our discussion about this when we learned about reliability?) You will need to
create rubrics that specify what you are scoring (Content? Form? Both?) and
describe how points will be awarded. (We will focus on rubrics when we get to
Lecture 9.)
Essay Items
A limitation of objective items and Short-Answer items is that they do not test a student’s
ability to construct responses that reflect complex cognitive processes or behaviors. We need
to know that students know how to approach a given problem, can plan and organize their ideas,
and can present their responses. Often these complex goals can be measured with essay questions.
They may be used to call on the student to “Discuss, analyze, compare & contrast, synthesize,
evaluate, etc.”
They measure the students’ ability to write, synthesize, and create.
I. Writing Questions and Instructions
A. Examples of complex cognitive behaviors
BEHAVIOR | TERMS THAT CALL OUT THE BEHAVIOR
Analyzing | Break down, diagram, differentiate, explain
Comparing | Compare, contrast, classify, distinguish between
Creating | Compose, devise, propose, design
Evaluating | Critique, choose and defend, evaluate, judge
Inferring | Extend, extrapolate, predict, conclude, project
Interpreting | Illustrate, translate, interpret, convert
Synthesizing | Combine, rearrange, infer, deduce
B. Two Types of Essay Questions
Restricted-Response Essay Questions: Limit both the content and the form of the response
(usually 1 paragraph).
Extended-Response Essay Questions: Provide the examinee with more latitude to produce a
longer response and to vary the context in which content is presented. Typically, the form of
the written response is a component of the scoring criteria.
C. Essay Item Writing Guidelines
1. Restrict the use of essay items to those learning outcomes that cannot be
satisfactorily measured by other (objective or short answer) items.
2. Phrase each question (or set of directions) so that the pupil’s task is clearly indicated.
The more complex the task, the more guidance/direction is required.
Poor Example:
What does Newton’s third law have to say about the bounce of a rubber ball?
Better Example:
Using Newton’s third law, explain why a ball bounces higher when dropped from 10 feet than
when dropped from 5 feet.
Progressively Improved Examples:
(1) Compare objective and essay tests.
(2) Compare and contrast objective and essay tests, citing the respective strengths
and weaknesses of each.
(3) Compare and contrast objective and essay tests, citing the respective strengths and
weaknesses of each. Make sure to include the following:
a. Ease of item construction
b. Sampling of subject matter
c. Type of objectives measured
d. Preparation by student
e. Nature of student responses
f. Guessing
g. Time needed for testing
Another Improved Example:
Poor: What were the causes of the Civil War?
(This could be a dissertation topic, and it does not even specify the country.)
Better: Discuss the role of agriculture in the North and South as a factor in the outbreak of
the United States’ Civil War.
More Poor Essay Items:
*Actual questions from unidentified campuses across the country.
*HISTORY: Describe the history of the papacy from its origins to the present day,
concentrating especially, but not exclusively, on its social, political, economic, religious, and
philosophical aspects and impact on Europe, Asia, America, and Africa. Be brief, concise, and
specific.
*EDUCATION: Develop a foolproof and inexpensive system of education that will meet the
needs of all segments of society. Convince both the faculty and the rioting students outside to
accept it.
*EPISTEMOLOGY: Take a position for or against truth. Prove the validity of your position.
*GENERAL KNOWLEDGE: Describe in detail your general knowledge. Be objective and specific.
4. Indicate an appropriate time limit for each question.
a. Specify a length for the desired response
b. Tell the point value of the item
c. Suggest how long it should take.
5. Don’t give optional questions
Students answering different questions are taking different tests (validity)
6. Judge an item’s quality by composing a model response.
II. Developing Scoring Procedures
Writing the essay item is relatively simple compared to scoring the item.
A. Two methods for scoring constructed responses
1. Holistic Scoring: sort students’ responses into categories by quality.
The categories may be point based or simply pass/fail (all or no points)
The best use of this method is pass/fail and for no more than 5 points.
You only want to know, in general, has the student achieved this objective.
Not focused on detailed or complex information.
Appropriate for short-answer items and as a component of more complex rubrics.
a. Establish the scoring categories you will use
Pass/Fail, Good/Average/Poor, all points or no points
b. Characterize a response that fits each category
What characteristics should a response in each category have?
c. Read each response and form an overall general impression
Don’t belabor the issue; look for the overall gist of the response.
d. Sort the responses into the designated categories
e. Reread the papers and re-categorize as needed
f. Assign the same score to all papers within a category
2. Analytical Scoring: systematic scoring using specific procedures, such as checklists
or rating scales (or both), to more accurately assign partial credit and indicate where students
lost points. The scoring plan or procedure is called a rubric.
Advantages: More specific feedback to students
More reliable scores
Disadvantages: Time consuming to construct scoring instrument
A. Checklists – a.k.a. Item Based Rubrics - provides two categories for evaluation
(present/absent, acceptable/unacceptable)
Like the one used for scoring your 10 objectives.
Example:
Compare and contrast maple and pine trees. Describe the maple and the pine, and tell me what
kinds of tree they are. Then tell me how they are alike and how they are different in terms of
the shape of the leaves, when they have leaves, and what kinds of products we get from them.
Essay Checklist
I. Content (2 pts each, 16 pts possible)
Kind of Tree: _____ Pine _____ Maple
Shape of Leaves: _____ Pine _____ Maple
Time for Leaves: _____ Pine _____ Maple
Products: _____ Pine _____ Maple
Comments______________________________________________
II. Structure (1 pt each, 2 pts total)
_____ Topic Sentence (present or absent)
_____ Conclusion (present or absent)
Comments______________________________________________
III. Mechanics (1 pt each, 2 pts total)
_____ Grammar (acceptable/unacceptable)
_____ Spelling (minimal or no errors/many errors)
Comments______________________________________________
_____/20 Total Points
B. Rating Scales – a.k.a. Descriptive Rubrics – an extension of the checklist that also
allows for a judgment of quality, not simply whether the criterion is present or
absent.
Include only as many categories as you can consistently distinguish between
Essay Rating Scale
I. Content | Absent | Poor | Average | Excellent
Maple: (4 pts possible)
Description | 0 | 1 | 2 | 3
Kind of tree | 0 (absent) | 1 (present)
Pine: (4 pts possible)
Description | 0 | 1 | 2 | 3
Kind of tree | 0 (absent) | 1 (present)
Similarities and Differences: (9 pts possible)
Discuss shape | 0 | 1 | 2 | 3
Discuss timing | 0 | 1 | 2 | 3
Discuss products | 0 | 1 | 2 | 3
Comments______________________________________________________________
II. Structure (4 pts possible) | Absent | Poor | Good
Topic Sentence | 0 | 1 | 2
Conclusion | 0 | 1 | 2
Comments______________________________________________________________
III. Mechanics (2 pts possible) | Poor | Good
Grammar | 0 | 1
Spelling | 0 | 1
Comments_______________________________________________________________
_____/23 Total Points
*Be sure to evaluate and pilot test your Checklists and Rating Scales before using them to
grade all the papers.
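One quick way to evaluate a rating scale before you use it is to check that the point values add up to the stated total and that no rating falls outside its allowed range. The sketch below encodes the Essay Rating Scale above as a plain dictionary. The function names and the sample ratings are hypothetical, made up only to illustrate the tally.

```python
# Maximum points per criterion, copied from the Essay Rating Scale above.
RATING_SCALE = {
    "Content": {
        "Maple description": 3,
        "Maple kind of tree": 1,
        "Pine description": 3,
        "Pine kind of tree": 1,
        "Discuss shape": 3,
        "Discuss timing": 3,
        "Discuss products": 3,
    },
    "Structure": {"Topic sentence": 2, "Conclusion": 2},
    "Mechanics": {"Grammar": 1, "Spelling": 1},
}


def max_total(scale):
    """Sum the maximum points over every criterion in every section."""
    return sum(sum(criteria.values()) for criteria in scale.values())


def score_essay(ratings, scale=RATING_SCALE):
    """Return (section subtotals, total) for one essay.

    `ratings` maps each criterion name to the points awarded. A missing
    criterion or an out-of-range rating raises an error instead of being
    silently scored.
    """
    subtotals = {}
    for section, criteria in scale.items():
        subtotal = 0
        for criterion, maximum in criteria.items():
            if criterion not in ratings:
                raise KeyError(f"No rating recorded for: {criterion}")
            points = ratings[criterion]
            if not 0 <= points <= maximum:
                raise ValueError(f"{criterion}: {points} is outside 0..{maximum}")
            subtotal += points
        subtotals[section] = subtotal
    return subtotals, sum(subtotals.values())


# Hypothetical ratings for one student's essay (not real data).
sample_ratings = {
    "Maple description": 2, "Maple kind of tree": 1,
    "Pine description": 3, "Pine kind of tree": 1,
    "Discuss shape": 2, "Discuss timing": 3, "Discuss products": 1,
    "Topic sentence": 2, "Conclusion": 1,
    "Grammar": 1, "Spelling": 0,
}
subtotals, total = score_essay(sample_ratings)
print(max_total(RATING_SCALE), subtotals, total)
```

Reporting the section subtotals along with the total is what gives students the more specific feedback listed as an advantage of analytical scoring.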
B. General Suggestions for Writing Rubrics and Scoring Essays
Protect the reliability of your scores!
1. Prepare an outline or an example of the expected answer in advance.
You must have clearly defined scoring criteria.
2. Choose the scoring method that is most appropriate.
3. Decide how to handle factors that are irrelevant to the learning outcomes
measured (spelling, grammar, etc.)
Aspects of the performance that do not apply to your scoring plan.
4. Evaluate all of your students’ responses to one question before going on to the
next item. This helps you score more consistently (intra-rater reliability).
5. When possible, score student responses anonymously.
6. Do not look at the student’s scores on previous items
Avoid bias (positive and negative)
7. If big decisions rest on the results, have 2+ independent ratings.
The raters must use the scoring plan in the same way, inter-rater reliability.
8. Give serious consideration to your point breakdown; is the focus on writing
mechanics or knowledge of content?
Stay on target with validity!! What do you intend to measure?
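If you do collect two independent ratings (suggestion 7), a simple first check on how consistently the scoring plan is being used is the proportion of papers on which the two raters agree. This is only one possible check and is not prescribed by these notes. The function name, the tolerance parameter, and the scores below are all hypothetical.

```python
def percent_agreement(rater_a, rater_b, tolerance=0):
    """Proportion of papers where two sets of scores differ by at most `tolerance` points."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must score the same, non-empty set of papers.")
    agreements = sum(1 for a, b in zip(rater_a, rater_b) if abs(a - b) <= tolerance)
    return agreements / len(rater_a)


# Hypothetical total scores (out of 23) for five essays, not real data.
rater_1 = [17, 20, 12, 23, 15]
rater_2 = [17, 19, 12, 21, 15]

print(percent_agreement(rater_1, rater_2))               # exact agreement: 0.6
print(percent_agreement(rater_1, rater_2, tolerance=1))  # within one point: 0.8
```

The same function can be used for intra-rater consistency by comparing your first scoring of a set of papers with your own later rescoring of the same papers.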
D.
E.
Holistic or Analytical
Avoid the horns and halo effects
All too often, teachers encourage students to write journals, letters, poems, and stories, and
give almost exclusively positive feedback to encourage students in their writing. However, many
of the same teachers rake the students over the coals when grading written work on exams.
You might consider offering both positive and negative feedback on written assignments that
students have had time to develop. On exams, good test writing skills are still important, but
you might place the emphasis on content.
III. Additional Information on Scoring Subjective Items
A. Definition Items
Example:
What is a norm-referenced test?
Sample Responses:
Jasmine: “A standardized test” (1)
Hyde: “A test where the scores are reported in standard scores such as percentiles, not
percent of information learned” (2)
Homer: “A test that is designed to rank-order students” (2)
Fred: “A test administered under standard conditions” (0)
Rating Scale:
2 pts - indicates the idea of comparison or rank ordering
1 pt - student gives an example
0 pts - wrong answer or missing information
Remember grading all responses to this item before moving on will help!
B. Lists
Preliminary decisions:
(1) Do we want them to know the entire list?
(2) Does the order matter?
Example:
List, in order, the categories in Bloom’s taxonomy. (8 pts)
Knowledge, Comprehension, Application, Analysis, Synthesis, Evaluation
Simple Rubric:
_____1 pt. for every category correctly listed (6 points possible) plus
_____+2 pts if all in right order, or
_____+1 pt if two are out of order, or
_____0 pts if 3 or more are out of order (0 to 2 points possible for order)
_____/8 Total points
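The simple rubric above can be sketched as a small scoring function, which also forces you to pin down what "out of order" means before you start grading. The sketch assumes the six-category key implied by the 6-point maximum. It further assumes that order is judged only among the categories the student named correctly, and that a student who names fewer than two categories earns no order credit. These assumptions and the function name are not part of the rubric itself.

```python
BLOOM_KEY = ["Knowledge", "Comprehension", "Application",
             "Analysis", "Synthesis", "Evaluation"]


def score_ordered_list(response, key=BLOOM_KEY):
    """Return (content points, order points, total) for a student's list."""
    key_lower = [k.lower() for k in key]

    # Content: 1 point per distinct category that appears in the key.
    named = []
    for item in response:
        cleaned = item.strip().lower()
        if cleaned in key_lower and cleaned not in named:
            named.append(cleaned)
    content_points = len(named)

    # Order: compare the student's sequence with the key's sequence,
    # restricted to the categories the student actually named.
    expected = [k for k in key_lower if k in named]
    out_of_order = sum(1 for got, want in zip(named, expected) if got != want)

    if len(named) < 2:
        order_points = 0      # assumption: too few items to judge order
    elif out_of_order == 0:
        order_points = 2      # all in right order
    elif out_of_order == 2:
        order_points = 1      # two categories swapped
    else:
        order_points = 0      # three or more out of order
    return content_points, order_points, content_points + order_points


perfect = score_ordered_list(BLOOM_KEY)
swapped = score_ordered_list(["Knowledge", "Comprehension", "Application",
                              "Synthesis", "Analysis", "Evaluation"])
print(perfect, swapped)  # (6, 2, 8) (6, 1, 7)
```

Whatever definition you choose, writing it down in advance lets you apply the order points the same way to every paper.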
C. Single-Sentence Responses
Example:
Why did Columbus sail west and not east?
Sample Responses
Jasmine: “Columbus sailed west and not east because he knew the world was round.”
Hyde: “Trying to get to China by sailing around Africa was expensive and dangerous;
knowing the world was round, Columbus sailed west.”
Homer: “west - world round
east - too hard”
Scoring: Jasmine gives only one piece of the answer, so 1 point. Hyde gives both pieces, so
2 points. Homer gives both pieces but not in a complete sentence (use your best judgment and
consider the age and developmental level of the student).
IV. Avoiding Common Errors in Test Development, Scoring, and Grading
A. Development Errors
1. Inappropriate difficulty level
2. Inadequate directions
B. Scoring Errors
1. Inconsistency when scoring
a. Inter-rater Reliability
b. Intra-rater Reliability
2. Bias
a. Generosity error: This is described as being an “easy grader”. This type
of bias is applied to the whole class. You give an overly favorable
evaluation of student responses.
b. Severity error: The opposite of generosity error. Also applies to the
whole class. You give an overly critical evaluation of student responses.
c. Central-tendency error: Also applies to the entire class. Possibly due to
fear of being too easy or too hard, you score everyone as average.
d. Halo effect: This applies to specific individuals. You like the student and
let that influence your evaluation of his or her work.
e. Horns effect: This also applies to specific individuals. You don’t like this
student and let that influence your evaluation of his or her work.
V. Helping Students Write Better Essay Tests
1. Emphasize vocabulary and logic unique to the discipline
2. Tell them to read all questions before responding to any of them
3. Require and reinforce legible penmanship
4. Communicate the relevance of grammar, punctuation, and spelling
5. Provide practice in essay writing before the test
6. Promote study habits appropriate for essay testing
VI. Comparison of Subjective and Objective Items
CHARACTERISTIC | SUBJECTIVE ITEMS | OBJECTIVE ITEMS
Writing test items | Relatively easy to construct | Relatively difficult to construct
Sampling of subject | Limited | Extensive
Measurement of Knowledge & Complex Achievement | Can measure both; but complex reserved for essay, product, and performance | Can measure either; depending on item used
Preparation by student | Emphasis on larger units of material | Emphasis is often on details
Nature of student response | Organizes original responses | Student selects response
Guessing Answers | Very difficult to guess | Possible to guess
Grading | Difficult, time-consuming, and somewhat unreliable | Simple, rapid, and highly reliable
Time needed for testing | Very time consuming | Very quick