Paper and Pencil Test Items

Developing Selected-Response
Items
Two General Types of
Paper and Pencil Tests
Selected Response Items:
– Binary response items (e. g., true-false).
– Multiple-choice items.
– Matching items.
Constructed Response Items:
– Fill-in-the-blank items.
– Completion and short answer items.
– Essay items.
General Item Writing Rules
(These rules are available on the website for this course)
• Provide clear and understandable
directions to students about how to
respond.
• Be sure the items themselves are clear
(unambiguous) to students.
• Do not provide unintentional cues regarding
the correct response.
• Use grammar and vocabulary consistent
with the source of instruction.
General Item Writing Rules
Keep reading level below students’
ability.
Format the item for efficient scoring.
Be sure content experts would agree on
the correct answer.
WRITE THE ITEM SO THAT IT
MEASURES THE SPECIFIED
LEARNING TARGET.
Additional rules for Binary-Choice Items
Binary-choice (or alternate-choice) items
present a proposition for which one of
two opposing options represents the
correct answer.
Several variants exist:
True-false.
Fact-opinion.
Right-wrong.
Yes-no.
Binary-Choice Items
(Variations: Embedded true-false items)
Indicate whether each underlined word is
used as a verb (V) or as something other
(O) than a verb.
Sailing has many advantages as a recreational
sport. You can sail by yourself or with
others. While basic techniques can be
learned quickly, you can spend a life-time
developing your sailing skills.
Answers
V O has
V O with
V O spend
V O as
V O While
V O sailing
V O can
V O techniques
V O sail
V O learned
Binary-Choice Items
(Variations: Multiple true-false items)
Read each option and indicate which
are correct.
In comparison with multiple-choice items, an
advantage of true-false items is …
1. more items can be administered within a given
time.
2. higher reliability is obtained from a given
number of items.
3. each test item can be developed in less time.
4. students will select the correct answer only
when they have achieved the skill being
assessed.
Binary-Choice Items
Advantages and Limitations
Advantages:
• Allows adequate sampling of (usually, knowledge-level) content.
• Relatively easy to construct.
• Objectively and efficiently scored.
Limitations:
• Highly susceptible to guessing.
• Can be used only when dichotomous answers represent sufficient
response options.
• Usually, only indirectly assess intellectual skills.
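To put a number on "highly susceptible to guessing," here is a minimal sketch that compares blind guessing on a two-option item with blind guessing on a four-option item. The 10-item test length, the cutoff of 7 correct, and the helper name prob_at_least are illustrative assumptions, not part of the course materials.

```python
from math import comb

def prob_at_least(k, n, p):
    """Probability of at least k correct out of n independent guesses,
    where each guess is correct with probability p (binomial model)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Assumed scenario: a 10-item test where 7 or more correct counts as passing.
true_false = prob_at_least(7, 10, 1 / 2)   # two options per item
four_option = prob_at_least(7, 10, 1 / 4)  # four options per item

print(f"true-false:  {true_false:.4f}")   # 0.1719
print(f"four-option: {four_option:.4f}")  # 0.0035
```

Under these assumptions, a student who guesses on every item passes the true-false test about 17% of the time, but the four-option test well under 1% of the time.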
Binary-Choice Items
Qualities of Good Binary-choice Items
• Good binary-choice items should….
– Measure the specified skill (learning
target).
• This requires some serious thinking
– Require appropriate level of reading skill.
– Emphasize adjectives or adverbs when
they alter or reverse the meaning of the
item.
– Have one (of two) response options that is
unequivocally correct.
– Continued on next slide
Binary-Choice Items
Qualities of Good Binary-choice Items
– Exclude adjectives and adverbs that imply an
indefinite degree.
– Avoid adjectives and adverbs that imply
absolute meaning
– Be stated as simply as possible (e.g., should
exclude “window dressing”).
– Should be written so that the incorrect
response is plausible.
– Should present a single proposition (not a
double-barreled proposition)
Binary-Choice Items
Tip for improving the quality: Use contrasts
Without contrast: The reliability of
short-answer tests is unaffected by
guessing.
With contrast: The reliability of short-answer tests is less affected by
guessing than is the reliability of
multiple-choice tests.
Binary-Choice Items
Examples of “double-barreled” propositions
Although essay tests require less
time to construct than do
multiple-choice tests, they
require more time to score.
Classroom tests should be reliable
and yield consistent scores
across time.
Binary-Choice Items
Evaluation
Learning target: Information. Identify
qualities desired in multiple-choice items.
Poor Item
TRUE or FALSE: It is not important for a
multiple-choice item to contain five
options.
Improved Item
TRUE or FALSE: A multiple-choice item can
contain as few as two options.
Binary-Choice Items
Evaluation
Learning target: Information. Identify
qualities desired in multiple-choice items.
Poor Item
T or F: Sometimes multiple choice items are
superior to true-false items.
Improved Item
T or F: A 10-item multiple-choice test
typically will be more reliable than a 10-item true-false test.
Binary-Choice Items
Evaluation
Learning target: Information. Identify
qualities desired in multiple-choice items.
Poor Item
T F Good multiple-choice items measure
important skills.
Improved Item
T F If plausible distracters are easy to
develop, a table of specifications is of
little value when constructing multiple-choice items.
Multiple-Choice Items
Anatomy of
Multiple-Choice Items
• MC items consist of ….
– A stem,
• Either a direct question, or
• An incomplete statement to be
completed.
– A correct answer, and
– Two or more distracters or foils.
Multiple-Choice Items:
Advantages and Limitations
Advantages
• Provide for a wide
sampling of content.
• Effectively structure
the problem to be
addressed.
• Can be quickly and
objectively scored.
Limitations
• Somewhat
susceptible to
guessing.
• Indirectly measure
targeted behaviors.
• Time-consuming to
construct.
Multiple-Choice Items
Example: Due to lack of parallel content this
item may have more than one correct
answer:
Which of the following represents the
warmest temperature?
A. 100 degrees Celsius
B. 200 degrees Fahrenheit
C. 300 degrees Kelvin
D. an oven set at medium
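Options A through C can be compared only after converting them to a common scale. The sketch below does that with the standard conversion formulas; the function names are illustrative. Option D is left out because "an oven set at medium" does not name a single temperature, which is exactly the parallel-content problem the item illustrates.

```python
def celsius_to_kelvin(c):
    """Convert degrees Celsius to kelvins."""
    return c + 273.15

def fahrenheit_to_kelvin(f):
    """Convert degrees Fahrenheit to kelvins."""
    return (f - 32) * 5 / 9 + 273.15

# Options A-C of the sample item, all expressed in kelvins.
options = {
    "A": celsius_to_kelvin(100),     # 373.15 K
    "B": fahrenheit_to_kelvin(200),  # about 366.48 K
    "C": 300.0,                      # already in kelvins
}

warmest = max(options, key=options.get)
print(warmest)  # A
```

Among the three numeric options A is warmest, but a student can defensibly argue for D, so the item may have more than one correct answer.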
Multiple-Choice Items
Qualities Continued
Options avoid repetitive words.
Example:
Criterion-referenced…
A. refers to how a test is constructed.
B. refers to how a test is interpreted.
C. refers to how a test is scored.
D. refers to how a passing score is
established.
Multiple-Choice Items
Qualities Continued
• Extraneous content (“window
dressing”) is excluded (example on
next slide).
• Adjectives or adverbs are highlighted
when they reverse or alter the
meaning of a stem.
• Words like not and except should be
emphasized.
• These can be used, but only when it is
important to do so.
Multiple-Choice Items
Which item stem contains window
dressing?
A. What is the highest numerical
value of a reliability coefficient?
B. Although usually not obtainable,
the maximum value of a reliability
coefficient is 1.0.
Multiple-Choice Items
Examples of uses of not and except
1. Which of the following qualities least
affects the reliability of a test?
2. All of the following represent types of
validity EXCEPT…
3. The quality that is not an advantage of
multiple-choice items is…
4. ALL BUT WHICH ONE of the following
is...
Multiple-Choice Items
(Continued)
Sample item with equally plausible distracters:
Which item format requires students to
spend the greatest portion of examination
time actually solving problems presented by
the items?
A. Essay
B. Short-answer
C. True-false
D. Multiple-Choice
Multiple-Choice Items
(Continued)
• Qualities desired in M-C items,
continued
– Options contain grammar consistent with
the item stem.
– “All of the above” or “none of the above” is
used only when necessary.
– Options are arranged in “natural” or
logical order.
Evaluating M-C Items
Poor item:
Internal consistency is high…
A. when students who scored high on the
first half of the test score high on the
second half of the test.
B. when students who scored high on the
first half of the test score low on the
second half of the test.
C. when students who scored high on the
first half of the test score in an
unpredictable manner on the second half
of the test.
D. when all of the above are true.
Evaluating M-C Items
Improved item:
If the internal consistency of a test is
good, how will a group of students
score on the second half of the test
if they got the highest scores on the
first half of the test?
A. Highest scores.
B. Lowest scores.
C. Unpredictable scores.
Evaluating M-C Items
Learning Target: Identify characteristics of
formal and informal assessments.
Which of the following is an example of
informal assessment?
A. Allowing students a choice of which
questions they will answer.
B. Not allowing students a choice of which
questions they will answer.
C. Observing which students are paying
attention.
Evaluating M-C Items
Poor item:
Various item formats have specific advantages and
limitations. An advantage the essay format has
over the multiple-choice format is:
A. the essay item can assess more skills in a given
amount of time.
B. the essay item can assess students’ ability to
evaluate ideas.
C. the essay item can be reliably scored.
D. the essay item requires students to
communicate ideas in writing.
Evaluating M-C Items
Improved item:
Which is an advantage of essay over
multiple-choice items?
A. Assess more skills in a given amount of
time.
B. Assess students’ ability to evaluate
ideas.
C. Evaluate students’ ability to
communicate ideas.
D. Facilitate reliable scoring of answers.
Multiple-Choice Items:
Item-writing Guidelines
1. Does the stem present a clearly stated
problem or question?
2. Is extraneous content (“window dressing”)
excluded from the stem?
3. Are adjectives or adverbs emphasized
when they reverse or significantly alter
the meaning of a stem or option?
4. Are negatives avoided wherever possible
or highlighted where necessary?
5. Are the “correct” answers equally
distributed across all choice categories?
Multiple-Choice Items:
Item-writing Guidelines
6. Are options parallel in form and content?
7. Do the options avoid repetitive words?
8. Is each distracter plausible?
9. Is the grammar in each option consistent
with the stem?
10. Does the item exclude options equivalent
to “all of the above” and “none of the
above”?
11. Unless another order is more logical, are
options arranged alphabetically?
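Guideline 5 can be checked mechanically once a test is assembled. The following is a minimal sketch, assuming the answer key is a list of option letters; the function names, the sample keys, and the tolerance of one item are hypothetical choices for illustration.

```python
from collections import Counter

def key_distribution(answer_key, choices="ABCD"):
    """Count how often each option letter is the keyed answer."""
    counts = Counter(answer_key)
    return {choice: counts.get(choice, 0) for choice in choices}

def is_balanced(answer_key, choices="ABCD", tolerance=1):
    """True when the most- and least-used letters differ by at most `tolerance`."""
    counts = key_distribution(answer_key, choices).values()
    return max(counts) - min(counts) <= tolerance

balanced_key = list("ABDCADBCABCD")  # hypothetical 12-item key
skewed_key = list("CCBCCACCDCCC")    # hypothetical key that overuses C

print(key_distribution(balanced_key))  # {'A': 3, 'B': 3, 'C': 3, 'D': 3}
print(is_balanced(balanced_key))       # True
print(is_balanced(skewed_key))         # False
```

A skewed key is a cue: test-wise students who notice that one letter dominates can raise their scores without knowing the content.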
Matching Items
• Anatomy of a matching item:
– Consist of
• Premises (or stimuli) and
• Responses.
– Advantages
• Provides for wide sampling of
knowledge targets.
• Relatively easy to construct.
• Can be scored objectively and
efficiently.
Specific Item-Writing Guidelines for
Matching Items
1. Include homogeneous premises and
responses.
2. Use more responses than premises.
3. Make sure directions are clear to
students.
4. Keep responses short and logically
ordered.
5. Use four to ten premises (and restrict to
one page).
6. Avoid grammatical clues to correct
answers.
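Two of these guidelines (2 and 5) are countable, so a draft matching item can be screened before anyone reviews it by hand. This is only a sketch: check_matching_item and the sample premises and responses are hypothetical, and the remaining guidelines still need human judgment.

```python
def check_matching_item(premises, responses):
    """Return a list of guideline violations that can be detected by counting."""
    problems = []
    if len(responses) <= len(premises):
        problems.append("use more responses than premises")
    if not 4 <= len(premises) <= 10:
        problems.append("use four to ten premises")
    return problems

# Hypothetical draft: four premises, five responses.
premises = ["True-false", "Multiple-choice", "Matching", "Essay"]
responses = ["Binary response", "Stem and distracters",
             "Premises and responses", "Rubric", "Blank"]

print(check_matching_item(premises, responses))      # []
print(check_matching_item(premises, responses[:4]))  # ['use more responses than premises']
```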
Developing Constructed-Response Items
Paper & Pencil Constructed-response items, that is.
Developing Constructed-Response Items
• Major advantage of constructed-response items:
– They elicit responses that more closely
resemble real-life behavior.
• In general, however, if a selected-response item can provide the same
evaluative information as a
constructed-response item, use the
selected-response item.
Short-Answer Items:
Advantages and disadvantages
Advantages:
1. Easy to construct.
2. Require the
student to supply
an answer.
3. Many such items
can be included in
a test.
Disadvantages:
1. Generally limited
to knowledge-level
skills.
2. More likely scored
erroneously than
are selected-response items.
Short-Answer Items:
Item-writing rules
1. Use direct questions rather than
incomplete statements.
2. Write items so that the correct response
is concise (a few words or a short phrase).
3. Write items so that they can be scored
efficiently.
4. Be sure there is a highly limited set of
correct responses.
5. Think of the correct response, then
write the item.
Completion Items:
Item-writing rules
Same advantages/disadvantages as short-answer items.
Same rules applicable to completion items,
plus these additional four:
1. Be sure the blank represents a key word or
phrase.
2. Position blank at or near the end of the item.
3. Keep blanks the same length.
4. Use no more than two or three blanks.
Essay Items: Advantages
Unique advantage: Can assess ability to
communicate in writing (synthesize,
evaluate, compose).
Other advantages:
1. Provide more direct measures of
behaviors specified in performance
objectives.
2. Require the student to produce a
response.
Essay Items: Limitations
Scoring is less reliable (more subjective).
1. Inconsistent within teachers across multiple
scorings of the same responses.
2. Inconsistent within teachers across students.
3. Inconsistent among teachers on the same
responses.
Provides less adequate sampling of content
domain.
More time-consuming to score.
Essay Item-Writing Rules
1. Convey a clear idea of how extensive a response is
expected:
– Ten minutes or less (typical for restricted-response essays).
– Specify a range for the number of words or the amount of time
to be spent on the response.
– Make the distribution of points obvious.
2. Develop a suitable scoring plan (rubric):
– Would different readers assign the same score?
– Describe what constitutes a correct and complete response.
– The rubric should be obvious to knowledgeable students.
– You do not have an essay item unless you have a rubric!
Essay Item-Writing Rules
(Continued)
3. Do not allow a choice of which items to
answer.
4. Evaluate all responses one item at a time.
5. Vary the student order when reading
responses.
6. Decide on the weight grammar and
vocabulary will carry beforehand.
7. Conceal the identity of students, if
possible.
8. Use multiple scorers, when possible.
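Rules 4, 5, and 7 describe a scoring procedure that is easy to set up ahead of time. The sketch below is one possible way to do it, not a prescribed method: reading_schedule, the R001-style codes, and the sample student names are all hypothetical.

```python
import random

def reading_schedule(student_ids, item_numbers, seed=None):
    """Build an anonymized, item-by-item reading order for essay scoring."""
    rng = random.Random(seed)
    # Rule 7: replace each name with a code assigned in random order.
    shuffled = rng.sample(student_ids, len(student_ids))
    codes = {sid: f"R{n:03d}" for n, sid in enumerate(shuffled, start=1)}
    schedule = []
    for item in item_numbers:        # Rule 4: score one item at a time.
        order = list(codes.values())
        rng.shuffle(order)           # Rule 5: new student order for each item.
        schedule.append((item, order))
    return codes, schedule

codes, schedule = reading_schedule(["Ann", "Bo", "Cy", "Di"], [1, 2], seed=0)
for item, order in schedule:
    print(f"Item {item}: read responses in order {order}")
```

Keep the code-to-name mapping away from the scorer until every response has been scored.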
Multiple-Choice Item Flaws
Examples
M-C Item Flaws
Which best describes what happens when
work is done?
A. A force operates through a distance.
B. A force is exerted.
C. Energy is destroyed.
D. Potential energy is changed to kinetic energy.
[A] Flaw: Using stereotyped phrases. Item
can be answered correctly based on recall
of verbal information as well as through
understanding of the principle involved.
M-C Item Flaws
Which of the following has helped most to
increase the length of human life?
A. Fast driving.
B. Avoidance of overeating.
C. Wider use of vitamins.
D. Wider use of inoculation.
[D] Flaw: Highly implausible distracter.
Choice “A” is unreasonable, reducing the
item to a three-choice item.
M-C Item Flaws
Horace Greeley is known for his
A. advice to young men not to go west.
B. discovery of anesthetics.
C. editorship of the New York Tribune.
D. humorous anecdotes.
[C] Flaw: Verbal trick in distracter: choice “A”
inserts the word not into a phrase
otherwise attributable to Horace Greeley.
M-C Item Flaws
Slavery was first started
A. at Jamestown settlement.
B. at Plymouth settlement.
C. at a settlement in Massachusetts.
D. a decade before the Civil War.
[A] Flaw: Non-parallel distracters. Choices
“A” and “B” give specific places, “C”
designates a more general area, “D”
specifies a time. This ambiguity makes
more than one choice correct.
M-C Item Flaws
In purifying water for a city water supply, one
process is to have the impure water seep through
layers of sand and fine and coarse gravel. Here
many impurities are left behind. Below are four
terms, one of which will describe this process
better than the others. Select the correct one.
A. Sedimentation
B. Filtration
C. Chlorination
D. Aeration
[B] Flaw: Stem includes an “instructional
aside.”
M-C Item Flaws
While ironing her formal, Jane burned her
hand accidentally on the hot iron. This
was due to a transfer of heat by
A. conduction.
B. radiation.
C. conversion.
D. absorption.
[A] Flaw: Stem includes “window dressing.”
The introduction implies a practical
problem when the item only involves
knowledge of technical terms.
M-C Item Flaws
In the definition of a mineral, which of the
following is incorrect?
A. It was produced by geologic processes.
B. It has distinctive physical properties.
C. It contains one or more elements.
D. It has a variable chemical composition.
[D] Flaw: Uses a negative in the stem; tends
to be confusing. These types of items are
rarely found outside the classroom.
M-C Item Flaws
Which event is more important in
American history?
A. Braddock’s defeat.
B. Burr’s conspiracy.
C. Hayes-Tilden contest.
D. Webster-Hayne debate.
Flaw: No best answer. Who’s to say which
is more important? Even experts would
not agree.
M-C Item Flaws
The population of Denmark is about
A. 2 million.
B. 15 million.
C. 4 million.
D. 7 million.
Flaw: Unnatural sequence of responses. It
would be better to order from 2 million to
15 million.
M-C Item Flaws
The balance sheet report for the Ajax
Canning Company would reveal (A) the
company’s profit for the previous fiscal
year, (B) the amount of money owed to
its creditors, (C) the amount of income
tax paid, or (D) the amount of sales for
the previous fiscal period.
[B] Flaw: Placing distracters in tandem with
the item stem.
M-C Item Flaws
Which is the best definition of a vein?
A. A blood vessel carrying blood going to the
heart.
B. A blood vessel carrying blue blood.
C. A blood vessel carrying impure blood.
D. A blood vessel carrying blood away from the
heart.
[A] Flaw: Needless repetition in the
distracters.
End