# Item Analysis - Schreyer Institute for Teaching Excellence

Item Analysis: Improving Multiple Choice Tests

Crystal Ramsay
September 27, 2011
Schreyer Institute for Teaching Excellence

things:
- To interpret statistical indices provided by the university’s Scanning Operations
- To differentiate between well-performing items and poor-performing items

We give tests for 4 primary reasons.
- To find out if students learned what we intended
- To separate those who learned from those who didn’t
- To increase learning and motivation
- To gather information for

Multiple choice items comprise 4 basic components.

Stem: The rounded filling of an internal angle between two surfaces of a plastic molding is known as the

Options:

- A. rib.
- B. fillet.
- C. chamfer.
- D. gusset plate.

Distracters: the incorrect options (here A, C, and D)

Key: the correct option (here B, fillet)

An item analysis focuses on 4 major pieces of information provided in the test score report.

- Test score reliability
- Item difficulty
- Item discrimination
- Distracter information

Test score reliability is an index of the likelihood that scores would remain consistent over time if the same test were administered repeatedly to the same learners.

- Reliability coefficients range from .00 to 1.00.
- Ideal score reliabilities are > .80.
- Higher reliabilities = less measurement error.

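The slides do not say which reliability coefficient the score report uses. As a rough sketch only, assuming items are scored right/wrong and that an internal-consistency estimate such as KR-20 is an acceptable stand-in, a coefficient on the same .00 to 1.00 scale could be computed by hand. The function name `kr20` and the response matrix are hypothetical.

```python
from statistics import pvariance


def kr20(responses):
    """KR-20 reliability for rows of 0/1 item scores (one row per student).

    Assumption: every row has the same number of items, scored 1 = correct,
    0 = incorrect. Population variance is used for the total scores so that
    it matches the p * (1 - p) item variances.
    """
    n_students = len(responses)
    n_items = len(responses[0])
    totals = [sum(row) for row in responses]
    total_variance = pvariance(totals)
    if n_items < 2 or total_variance == 0:
        raise ValueError("need at least 2 items and some spread in total scores")

    # Sum of item variances: p is the proportion answering the item correctly.
    sum_pq = 0.0
    for j in range(n_items):
        p = sum(row[j] for row in responses) / n_students
        sum_pq += p * (1 - p)

    return (n_items / (n_items - 1)) * (1 - sum_pq / total_variance)


# Hypothetical 5-student, 4-item quiz.
example = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(kr20(example), 2))
```

A value computed this way may not match the report exactly if Scanning Operations uses a different formula.
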
Now look at the test score reliability from your exam.

Item Difficulty is the percentage of students who answered the item correctly.

Represented in the Response Table as KEY-%

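RESPONSE TABLE - FORM A

| ITEM NO. | OMIT % | A % | B % | C % | D % | E % | KEY | KEY-% | ITEM EFFECT |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 0 | 0 | 18 | 82 | 0 | 0 | C | 82 | 0.22 |
| 2 | 0 | 79 | 0 | 0 | 21 | 0 | A | 79 | 0.23 |
| 3 | 0 | 4 | 7 | 89 | 0 | 0 | C | 89 | -0.12 |

Ranges from 0% to 100%
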
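A KEY-% value can be reproduced from raw answers. The sketch below assumes item difficulty is simply the percentage of students who chose the keyed option, with omitted answers counted as incorrect. The function name `item_difficulty` and the class of 50 students are hypothetical, chosen so the result matches the 82 shown for item 1.

```python
def item_difficulty(answers, key):
    """Percentage (0-100) of students whose answer equals the key.

    Assumption: `answers` holds one entry per student; an omitted answer
    can be recorded as None and is treated as incorrect.
    """
    if not answers:
        raise ValueError("no answers to analyze")
    correct = sum(1 for answer in answers if answer == key)
    return 100.0 * correct / len(answers)


# Hypothetical class: 41 students chose C (the key), 9 chose B.
answers = ["C"] * 41 + ["B"] * 9
print(item_difficulty(answers, "C"))
```
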
Easier items have higher item difficulty values.

More difficult items have lower item difficulty values.

RESPONSE TABLE - FORM A

| ITEM NO. | OMIT % | A % | B % | C % | D % | E % | KEY | KEY-% | ITEM EFFECT |
|---|---|---|---|---|---|---|---|---|---|
| 4 | 0 | 0 | 4 | 96 | 0 | 0 | C | 96 | 0.18 |
| 5 | 0 | 100 | 0 | 0 | 0 | 0 | A | 100 | 0.00 |
| 6 | 0 | 0 | 0 | 5+ | 0 | 95 | E | 95 | -0.11 |

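RESPONSE TABLE - FORM A

| ITEM NO. | OMIT % | A % | B % | C % | D % | E % | KEY | KEY-% | ITEM EFFECT |
|---|---|---|---|---|---|---|---|---|---|
| 8 | 0 | 0 | 43 | 0 | 57 | 0 | D | 57 | 0.46 |
| 9 | 0 | 7 | 4 | 0 | 75 | 14 | D | 75 | -0.19 |
| 10 | 0 | 5 | 12 | 27 | 31 | 25 | D | 31 | 0.10 |
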
What is an ‘ideal’ item difficulty statistic depends on 2 factors.

Number of alternatives for each item

question

Sometimes we include very easy or very difficult items on purpose.

- Did I deliberately pose difficult items to challenge my students’ thinking?
- Did I deliberately pose easy items to test basic information or to boost students’ confidence?

Now look at the item difficulties from your exam.

- Which items were easier for your students?
- Which items were more difficult?

Item Discrimination is the degree to which students with high overall exam scores also got a particular item correct.

Represented as Item Effect because it tells how well an item ‘performed’

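RESPONSE TABLE - FORM A

| ITEM NO. | OMIT % | A % | B % | C % | D % | E % | KEY | KEY-% | ITEM EFFECT |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 0 | 0 | 18 | 82 | 0 | 0 | C | 82 | 0.22 |
| 2 | 0 | 79 | 0 | 0 | 21 | 0 | A | 79 | 0.23 |
| 3 | 0 | 4 | 7 | 89 | 0 | 0 | C | 89 | -0.12 |

Ranges from -1.00 to 1.00 and should be > .2
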
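The slides do not state how the Item Effect column is calculated. One widely used discrimination statistic with the same -1.00 to 1.00 range is the point-biserial correlation between getting an item right and the total exam score. The sketch below assumes that statistic, and assumes the total score includes the item itself. The function name `point_biserial` and the sample scores are hypothetical.

```python
from statistics import mean, pstdev


def point_biserial(item_scores, total_scores):
    """Correlation between a 0/1 item score and the total exam score.

    Uses r = (M_correct - M_all) / SD_all * sqrt(p / (1 - p)), where p is
    the proportion of students answering the item correctly.
    Returns 0.0 when the statistic is undefined (everyone right, everyone
    wrong, or no spread in total scores); that fallback is a choice made
    for this sketch.
    """
    p = mean(item_scores)
    spread = pstdev(total_scores)
    if spread == 0 or p in (0, 1):
        return 0.0
    totals_of_correct = [t for i, t in zip(item_scores, total_scores) if i == 1]
    mean_correct = mean(totals_of_correct)
    mean_all = mean(total_scores)
    return (mean_correct - mean_all) / spread * (p / (1 - p)) ** 0.5


# Hypothetical data: the three highest scorers got the item right.
item = [1, 1, 1, 0, 0]
totals = [5, 4, 3, 2, 1]
print(round(point_biserial(item, totals), 2))
```

A positive value means stronger students tended to answer the item correctly; a negative value means weaker students did better on it than stronger ones.
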
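A well-performing item

RESPONSE TABLE - FORM A

| ITEM NO. | OMIT % | A % | B % | C % | D % | E % | KEY | KEY-% | ITEM EFFECT |
|---|---|---|---|---|---|---|---|---|---|
| 8 | 0 | 0 | 43 | 0 | 57 | 0 | D | 57 | 0.46 |

A poor-performing item

RESPONSE TABLE - FORM A

| ITEM NO. | OMIT % | A % | B % | C % | D % | E % | KEY | KEY-% | ITEM EFFECT |
|---|---|---|---|---|---|---|---|---|---|
| 6 | 0 | 0 | 0 | 5+ | 0 | 95 | E | 95 | -0.11 |
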
What is an ‘ideal’ item discrimination statistic depends on 3 factors.

- Item difficulty
- Test heterogeneity
- Item characteristics

Item difficulty

Very easy or very difficult items will have poor ability to discriminate among students.

Yet…

Very easy or very difficult items may still be necessary to sample content taught.

Test heterogeneity

A test that assesses many different topics will have a lower correlation with any one content-focused item.

Yet…

A heterogeneous item pool may still be necessary to sample content taught.

Item quality

A poorly written item will have little ability to discriminate among students.

and…

There is no substitute for a well-written item or for testing what you teach!

Now look at the item effects from your exam.

- Which items performed ‘well’?
- Did any items perform ‘poorly’?

Distracter information can be analyzed to determine which distracters were effective and which ones were not.

RESPONSE TABLE - FORM A

| ITEM NO. | OMIT % | A % | B % | C % | D % | E % | KEY | KEY-% | ITEM EFFECT |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 0 | 0 | 18 | 82 | 0 | 0 | C | 82 | 0.22 |
| 2 | 0 | 79 | 0 | 0 | 21 | 0 | A | 79 | 0.23 |
| 3 | 0 | 4 | 7 | 89 | 0 | 0 | C | 89 | -0.12 |

Now look at the distracter information for items from

Whether to retain, revise, or eliminate items depends on item difficulty, item discrimination, distracter

- Item Difficulty
- Item Discrimination
- Distracters
- Instruction

Ultimately, it’s a judgment call that you have to make.

What if I have a relatively short test or I give a test in a small class? I might not use the testing service for scoring. Is there a way I can understand how my items worked?

Yes.

| Item 1 | A | B* | C | D |
|---|---|---|---|---|
| Top 1/3 | | 10 | | |
| Bottom 1/3 | 2 | 3 | 1 | 4 |

| Item 2 | A* | B | C | D |
|---|---|---|---|---|
| Top 1/3 | 8 | | 2 | |
| Bottom 1/3 | 7 | | 3 | |

| Item 3 | A | B | C* | D |
|---|---|---|---|---|
| Top 1/3 | 5 | | 1 | 4 |
| Bottom 1/3 | 2 | | 4 | 4 |

| Item 4 | A* | B | C | D |
|---|---|---|---|---|
| Top 1/3 | 10 | | | |
| Bottom 1/3 | 9 | | | 1 |

1. Which item is the easiest?
2. Which item shows negative (very bad) discrimination?
3. Which item discriminates best between high and low scores?
4. In Item 2, which distracter is most effective?
5. In Item 3, which distracter must be changed?

From: Suskie, L. (2009). Assessing student learning: A common sense guide (2nd ed.). San Francisco: Jossey-Bass.

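For a small class, a top-third versus bottom-third tally like the one above can be built by hand or with a few lines of code. The sketch below is one possible approach, not a procedure taken from the source: it ranks students by total score, counts each option in the top and bottom thirds, and reports a simple discrimination index, (top correct - bottom correct) / group size. The function names and the nine-student data set are hypothetical.

```python
from collections import Counter


def thirds_tally(records, options="ABCD"):
    """Count chosen options in the top and bottom thirds of the class.

    `records` is a list of (total_score, chosen_option) pairs for ONE item.
    Returns (top_counts, bottom_counts, group_size).
    """
    ranked = sorted(records, key=lambda record: record[0], reverse=True)
    group_size = len(ranked) // 3
    if group_size == 0:
        raise ValueError("need at least 3 students")
    top = Counter(option for _, option in ranked[:group_size])
    bottom = Counter(option for _, option in ranked[-group_size:])
    top_counts = {o: top.get(o, 0) for o in options}
    bottom_counts = {o: bottom.get(o, 0) for o in options}
    return top_counts, bottom_counts, group_size


def discrimination_index(top_counts, bottom_counts, group_size, key):
    """(top-group correct - bottom-group correct) / group size."""
    return (top_counts[key] - bottom_counts[key]) / group_size


# Hypothetical item keyed B, nine students: (total score, option chosen).
records = [(9, "B"), (8, "B"), (7, "B"), (6, "A"), (5, "B"),
           (4, "C"), (3, "B"), (2, "D"), (1, "A")]
top, bottom, size = thirds_tally(records)
print(top, bottom, discrimination_index(top, bottom, size, "B"))
```

A distracter that nobody in either group chose is a candidate for rewriting, and a key chosen more often by the bottom group than the top group signals negative discrimination.
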
Even after you consider reliability, difficulty, discrimination, and distracters, there are still a few other things to think about…

- Multiple course sections
- Student feedback
- Other item types
