Item Writing Workshop

advertisement
Chris Orem
Jerusha Gerstner
Christine DeMars
 Presentation


and Discussion
Item writing guidelines
Examples
 Develop,
Evaluate, and Revise Items
Test blueprint


A test blueprint is a table of specifications that weights
each objective according to how important it is or how
much time is spent covering that objective, links
objectives to test items and summarizes information
Essential for Competency Testing
BLUEPRINT TIPS
-List objectives in a table
-Identify length of test
-Designate number of items per objective
-Evaluate the importance of each objective and assign
items accordingly
-Often assign one point per item
Objective
Weighting
Items
Objective 1
25%
10
Objective 2
25%
10
Objective 3
25%
10
Objective 4
25%
10
Writing Items
Type of Item
Construction
Scoring
True/False
Difficult
Easy
Matching
Easy
Easy
Completion
Easy
Difficult
Multiple Choice
Difficult
Easy
Essay
Easy
Difficult
Some items are more appropriate when testing
different kinds of knowledge; or when tapping into
different kinds of cognitive processes.
 We’ll focus on multiple choice items because
recruiting faculty to score open-ended items would
be difficult.

http://testing.byu.edu/info/handbooks/betteritems.pdf
Content
Style/format
Writing the
stem
Writing the
distracters
Content
 Focus
on a single problem when writing an item
 Use new situations to assess application


Avoids memorization exercises
Allows for synthesis and evaluation
 Keep

content of items independent
students shouldn’t be able to use one item to
answer another, although a set of items may tap
into a shared scenario
 Avoid
opinion-based items
 Address
a mix of higher-order and lowerorder thinking skills.
Bloom
Haladyna
Evaluation
Problem Solving
Synthesis
Evaluating
Analysis
Higher
Order
Skills
Predicting
Application
Comprehension
Knowledge
Lower
Order
Skills
Defining
Recalling
 Try
item stems such as "If . . ., then what
happens?", "What is the consequence of . . .?", or
"What would happen if . . .?" (predicting)
 Ask
students to make a decision based on
predetermined criteria, or to choose criteria to
use in making a decision, or both (evaluating).
 Require
the student to use combinations of
recalling, summarizing, predicting, and
evaluating to solve problems.
Style/format






Avoid excess words – be succinct
Use specific, appropriate vocabulary
Avoid bias (age, ethnicity, gender, disabilities)
Write stems and options in third person
Underline or bold negative or other important
words
Have others review your items
Writing the stem




The stem should clearly state the problem
 Place the main idea of the question in the stem, not the item
options
Keep the stem as short as possible
Don’t provide clues to correct answer in stem (e.g., grammatical
clues)
 If the stem is a complete sentence, end with a period and begin
all response options with upper-case letters.
 If the stem is an incomplete sentence, begin all response options
with lower case letters.
Use negative stems rarely
Writing the distracters






Make sure there is only one correct answer for each item
Develop as many effective plausible options as possible,
but three are sufficient (Rodriguez, 2005)
 It is better to have fewer options than to write BAD
options to meet some quota!
Vary the location of the correct answer when feasible (Flip
a coin), or put options in logical order (e.g. chronological,
numerical)
Avoid excessive use of negatives or double negatives
Keep options independent
Keep options similar (in format)
 Length and wording
Writing the distracters


DO use as distracters:
 common student misconceptions (perhaps from open-ended
responses from previous work)
 words that “ring a bell” or “sound official”
 responses that fool the student who has not mastered the
objective
DO NOT use as distracters:
 responses that are just as correct as the right answer
 implausible or silly distracters

Use “all of the above” and “none of the above” sparingly

Don’t use “always” or “never”

Don’t give clues to the right answer
The best way to increase the reliability of a
test is to:
A. increase the test length
B. removing poor quality items
C. Tests should be readable for all test takers.
What’s wrong with this item?
Stem should state the problem
California:
A). Contains the tallest mountain in the United States
B). Has an eagle on its state flag.
C). Is the second largest state in terms of area.
*D). Was the location of the Gold Rush of 1849.
What is the main reason so many people moved to California in 1849?
A). California land was fertile, plentiful, and inexpensive.
*B). Gold was discovered in central California
C). The east was preparing for a civil war.
D). They wanted to establish religious settlements.
Bleeding of the gums is associated with
gingivitis, which can be cured by the sufferer
himself by brushing his teeth daily.
A. true
B. false
What’s wrong with this item?
More than One Possible Answer
The United States should adopt a foreign policy based on:
A). A strong army and control of the North American continent.
B). Achieving the best interest of all nations.
C). Isolation from international affairs.
*D). Naval supremacy and undisputed control of the world’s sea lanes.
According to Alfred T. Mahan, the United States should adopt a foreign
policy based on:
A). A strong army and control of the North American continent.
B). Achieving the best interest of all nations.
C). Isolation from international affairs.
*D). Naval supremacy and undisputed control of the world’s sea lanes.
Following these rules does not
guarantee that items will perform
well empirically. Testing companies,
using paid item-writers and detailed
writing guidelines, ultimately only use
1/3 (or fewer) of the items
operationally.
It is normal and expected to have to
revise items after pilot-testing .
ITEM ANALYSIS STEPS:
•Item difficulty
- proportion of people who answered the item correctly
- an item should not be too easy or too difficult
- problems arise from poor wording, trick questions, or
speediness
• Item Discrimination
- correlation of item and total test, should be higher than 0.2
- item as an indicator of the overall test score
- the higher the better
- can be (but shouldn’t be) negative
• Distractor Analysis
- frequency of response selections
- item is problematic if people with a high overall score are
selecting incorrect responses frequently
 Ultimately,
you want to know whether
students are achieving your objectives
 Tests are used to indirectly measure these
knowledge, skills, attitudes, etc.
 Items you write are a sample of all
possible items to measure that objective

the more you write (and the better your items
are!) the more reliably you can measure the
objective
 You
want to be sure that the items you
create are measuring achievement of the
objective, and NOT test-wiseness, reading
ability, or other factors




Haladyna, T. M. (1999). Developing and validating
multiple-choice test items. Mahwah, NJ: Lawrence
Erlbaum Associates.
Rodriguez, M. C. (2005). Three options are optimal for
multiple-choice items: A meta-analysis of 80 years of
research. Educational Measurement: Issues and Practice,
24, 3-13.
Downing & Haladyna (2006). Handbook of test
development. Mahwah, NJ: Lawrence Erlbaum Associates.
Burton S.J. et al (1991). How to prepare Multiple-Choice
Test Items: Guidelines for University Faculty. Brigham
Young University Testing Services.
Download