Elena Kardanova
University of Ostrava
Czech Republic
26–31 March 2012
The family of Rasch measurement models is a way to make sense of the world.
Benjamin D. Wright
The simplest models that provide parameter invariance
Include a minimal number of parameters
Parameters have a simple interpretation and can be easily estimated (on an interval scale, with an estimate of precision)
Can be applied to all item types used in educational and psychological tests
The theory of item and examinee analysis is well developed
Specific testing problems can be solved straightforwardly within this framework
Dichotomous Rasch Model
Partial Credit Model
Rating Scale Model
Binomial and Poisson Models
Many-Facet Rasch Model
Multidimensional Rasch Model
The number of response categories: two vs. more than two
The structure of response alternatives in polytomous items: common vs. individual
The number of attempts at an item: one attempt vs. more than one
The number of examinee parameters: one ability vs. more than one
The number of factors influencing examinee performance: item difficulty only vs. item difficulty plus additional factors
Winsteps (Dichotomous Model, PCM, RSM, Binomial and Poisson Models)
ConQuest (all models except the Binomial and Poisson Models)
Facets (Many-Facet Model)
Other IRT software (depends on the software)
$$P_{ni} = P(X_{ni}=1 \mid \theta_n, \delta_i) = \frac{\exp(\theta_n - \delta_i)}{1 + \exp(\theta_n - \delta_i)}$$

P(X_ni = 1 | θ_n, δ_i) is the probability that an examinee n (n = 1, …, N) with ability θ_n answers item i (i = 1, …, I) with difficulty δ_i correctly.
The model is called one-parameter because the probability P_ni is a function of the difference (θ_n − δ_i).
The model is also called logistic because the function is logistic.
δ_i – the point on the ability scale where the probability of a correct response is 0.5. The greater the value of this parameter, the greater the ability required for an examinee to have a 50% chance of getting the item correct; hence, the harder the item.
In theory the δ_i parameter can vary from −∞ to +∞, but typically values of δ_i range from about −3 to +3.
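As a quick numeric illustration (not part of the original slides), the following Python sketch evaluates the model formula; the ability and difficulty values are arbitrary.

```python
import math

def rasch_probability(theta, delta):
    """Probability of a correct response under the dichotomous Rasch model."""
    return math.exp(theta - delta) / (1.0 + math.exp(theta - delta))

# When ability equals difficulty, the probability is exactly 0.5.
print(rasch_probability(0.0, 0.0))   # 0.5
# A person 1 logit above the item difficulty has about a 73% chance.
print(rasch_probability(1.0, 0.0))   # ~0.731
# The same gap gives the same probability anywhere on the scale.
print(rasch_probability(2.5, 1.5))   # ~0.731
```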
[Figure: item characteristic curves (ICCs) for three items with difficulties δ_1, δ_2 and δ_3.]
ICCs differ only in their location along the ability scale; they don’t cross (they are parallel).
Item difficulty is the only item characteristic that influences examinee performance.
All items are equally discriminating.
The lower asymptote of the ICC is zero: examinees of very low ability have zero probability of correctly answering the item (no guessing).
The ability level of any examinee is defined as the log odds of that examinee answering correctly an item with difficulty 0:

$$\theta_n = \ln\frac{P_{n1}}{1 - P_{n1}}$$
The difficulty level of any item is defined as minus the log odds of an examinee with ability 0 answering the item correctly:

$$\delta_i = -\ln\frac{P_{1i}}{1 - P_{1i}}$$
$$\ln\frac{P_{ni}}{1 - P_{ni}} = \theta_n - \delta_i$$

The log odds that a person passes an item is simply the difference between the examinee’s ability level and the item’s difficulty.
Item and examinee parameters are completely separated, making it possible to estimate examinee ability independently of item difficulty, and to estimate item difficulty independently of examinee ability.
Item and examinee parameters lie on the same linear scale.
The unit of measurement on this scale is one logit (short for log-odds unit, the unit of log odds).
Comparisons between objects must be invariant over the specific conditions under which they were observed:
- comparisons between persons must be invariant over the specific items used to measure them,
- comparisons between items must be invariant over the specific persons used to calibrate them.
Only Rasch models guarantee this property.
Invariant-Person Comparisons: the same differences are observed regardless of the items
Consider the Rasch model predictions for the log odds for two persons with abilities θ_1 and θ_2 on an item with difficulty δ_i:

$$\ln\frac{P_{1i}}{1 - P_{1i}} = \theta_1 - \delta_i, \qquad \ln\frac{P_{2i}}{1 - P_{2i}} = \theta_2 - \delta_i$$

Subtracting the second equation from the first yields the following:

$$\ln\frac{P_{1i}}{1 - P_{1i}} - \ln\frac{P_{2i}}{1 - P_{2i}} = (\theta_1 - \delta_i) - (\theta_2 - \delta_i) = \theta_1 - \theta_2$$
Thus, the difference in log odds for any item is simply the difference between the two abilities: the item difficulty δ i dropped out of the equation.
So, the same difference in performance between the two persons is expected, regardless of item difficulty.
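A minimal Python check of this property (illustrative values only): the log-odds difference between the two persons is computed on three items of very different difficulty and comes out the same every time.

```python
import math

def log_odds(theta, delta):
    """Log odds of success under the Rasch model: equals theta - delta by construction."""
    p = math.exp(theta - delta) / (1.0 + math.exp(theta - delta))
    return math.log(p / (1.0 - p))

theta1, theta2 = 1.2, -0.4
for delta in (-2.0, 0.0, 1.5):            # three items of very different difficulty
    diff = log_odds(theta1, delta) - log_odds(theta2, delta)
    print(round(diff, 6))                 # always 1.6 = theta1 - theta2
```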
Invariant-Item Comparisons: differences between items don’t depend on the particular persons used to compare them
Consider two items with difficulties δ_1 and δ_2, and the following two equations for the log odds on the two items for any person n:

$$\ln\frac{P_{n1}}{1 - P_{n1}} = \theta_n - \delta_1, \qquad \ln\frac{P_{n2}}{1 - P_{n2}} = \theta_n - \delta_2$$

Subtracting the second equation from the first yields:

$$\ln\frac{P_{n1}}{1 - P_{n1}} - \ln\frac{P_{n2}}{1 - P_{n2}} = (\theta_n - \delta_1) - (\theta_n - \delta_2) = \delta_2 - \delta_1$$
The ability level dropped out of the equation. So, the expected difference in performance for any examinee is the difference between item difficulties.
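The mirror-image check for items, again a sketch with arbitrary values: the log-odds difference between the two items does not change across persons.

```python
import math

def log_odds(theta, delta):
    """Log odds of success under the Rasch model: equals theta - delta."""
    p = math.exp(theta - delta) / (1.0 + math.exp(theta - delta))
    return math.log(p / (1.0 - p))

delta1, delta2 = -0.5, 1.0
for theta in (-1.0, 0.3, 2.0):            # three persons of very different ability
    diff = log_odds(theta, delta1) - log_odds(theta, delta2)
    print(round(diff, 6))                 # always 1.5 = delta2 - delta1
```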
Other IRT Models (2PL and 3PL) fail to meet
“specific objectivity”:
For example, comparison of two persons in the framework of the 2PL model yields the following:

$$\ln\frac{P_{1i}}{1 - P_{1i}} = a_i(\theta_1 - \delta_i), \qquad \ln\frac{P_{2i}}{1 - P_{2i}} = a_i(\theta_2 - \delta_i)$$

and further:

$$\ln\frac{P_{1i}}{1 - P_{1i}} - \ln\frac{P_{2i}}{1 - P_{2i}} = a_i(\theta_1 - \delta_i) - a_i(\theta_2 - \delta_i) = a_i(\theta_1 - \theta_2)$$
The right-hand side of this equation contains the discrimination parameter a_i of the item. So, unlike in the Rasch model, the expected difference in performance does not depend only on the abilities; it is proportional to their difference, with the proportionality factor a_i depending on the particular item.
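For contrast, a small sketch under the 2PL model (hypothetical values for a_i and δ_i): the same pair of persons now yields a different log-odds difference on every item, because the discrimination a_i scales it.

```python
import math

def log_odds_2pl(theta, delta, a):
    """Log odds under the 2PL model: equals a * (theta - delta)."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - delta)))
    return math.log(p / (1.0 - p))

theta1, theta2 = 1.2, -0.4
for a, delta in ((0.5, 0.0), (1.0, 0.0), (2.0, 1.0)):
    diff = log_odds_2pl(theta1, delta, a) - log_odds_2pl(theta2, delta, a)
    print(round(diff, 6))   # 0.8, 1.6, 3.2 -- depends on the item's discrimination a
```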
The total number of parameters to be estimated in the dichotomous Rasch model is N + I, where N is the number of examinees and I is the number of items.
Methods of mathematical statistics are used for parameter point estimation. Most estimation methods employ some form of the method of maximum likelihood (without distributional assumptions or with distributional assumptions regarding the parameters).
Under the Rasch model, raw scores are sufficient statistics for both item and person measures. This means that all examinees with the same raw score get the same ability estimate, and similarly for items. Due to this property, all measures can be estimated simultaneously.
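The sketch below is only an illustration of maximum likelihood estimation for a single person when item difficulties are treated as known; it is not the joint or conditional estimation routine implemented in Winsteps or ConQuest, and the function name and values are hypothetical. The estimating equation matches the expected raw score to the observed raw score, which is where the sufficiency of the raw score shows up.

```python
import math

def estimate_ability(raw_score, deltas, tol=1e-8, max_iter=50):
    """Newton-Raphson ML estimate of ability, item difficulties treated as known.

    The estimating equation depends on the responses only through the raw score,
    illustrating that the raw score is a sufficient statistic for ability.
    (Extreme scores 0 and len(deltas) have no finite ML estimate.)
    """
    theta = 0.0
    for _ in range(max_iter):
        probs = [math.exp(theta - d) / (1.0 + math.exp(theta - d)) for d in deltas]
        expected = sum(probs)                      # expected raw score at current theta
        info = sum(p * (1.0 - p) for p in probs)   # test information at current theta
        step = (raw_score - expected) / info
        theta += step
        if abs(step) < tol:
            break
    return theta

# Hypothetical five-item test: any examinee with raw score 3 gets the same estimate.
print(round(estimate_ability(3, [-1.5, -0.5, 0.0, 0.5, 1.5]), 3))
```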
π_ni0 and π_ni1 – the probabilities that examinee n gets score 0 and score 1 on item i.
In the dichotomous case, π_ni1 = P_ni and π_ni0 = 1 − π_ni1 = 1 − P_ni.
A simple extension of the dichotomous Rasch model: one or more intermediate levels of performance are allowed.
Different levels of performance are labelled 0 (no steps taken), 1, 2 , …, m (the highest level of performance possible).
In order to reach the highest category m , an examinee must complete m steps consecutively, getting 1 point for each of them. Each step can be taken only if the previous step has been completed.
Difficulty of each step doesn’t depend on difficulties of other steps.
Example with m = 2. Performance levels: 0 (incorrect, poor quality), 1 (partially correct, good quality) and 2 (absolutely correct, superior quality).
The item has an intermediate scoring level which allows an additional point to be awarded for a partially completed item.
Such an item has three possible categories and two steps.
The probability of completing each step can be described by a Rasch model:
$$P_{ni1} = \frac{\exp(\theta_n - \delta_{i1})}{1 + \exp(\theta_n - \delta_{i1})}, \qquad P_{ni2} = \frac{\exp(\theta_n - \delta_{i2})}{1 + \exp(\theta_n - \delta_{i2})}$$
P_ni1 – the probability of person n scoring 1 rather than 0 on item i
P_ni2 – the probability of person n scoring 2 rather than 1 on item i
θ_n – the ability level of examinee n
δ_i1 and δ_i2 – the step difficulties in item i
$$\pi_{nik} = \frac{\exp\left(\sum_{j=0}^{k}(\theta_n - \delta_{ij})\right)}{\sum_{l=0}^{m_i}\exp\left(\sum_{j=0}^{l}(\theta_n - \delta_{ij})\right)}$$

π_nik is the probability of examinee n with ability θ_n getting score k on item i. k is the count of completed item steps, k = 0, 1, …, m_i, where m_i is the number of item steps (the term for j = 0 is defined to be 0).
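A short Python sketch of this formula (hypothetical step difficulties; the function name is illustrative): it computes the category probabilities for a two-step item and confirms the adjacent-category log odds discussed below.

```python
import math

def pcm_probabilities(theta, step_difficulties):
    """Category probabilities under the Partial Credit Model.

    step_difficulties = [delta_i1, ..., delta_im]; returns [pi_0, ..., pi_m].
    The sum for category 0 is defined to be zero, as in the slides' convention.
    """
    cumulative = [0.0]
    for delta in step_difficulties:
        cumulative.append(cumulative[-1] + (theta - delta))
    numerators = [math.exp(c) for c in cumulative]
    total = sum(numerators)
    return [n / total for n in numerators]

theta = 0.5
probs = pcm_probabilities(theta, [-0.3, 1.0])     # hypothetical two-step item
print([round(p, 3) for p in probs])               # pi_0, pi_1, pi_2 sum to 1
# Adjacent-category log odds recover theta minus the step difficulty:
print(round(math.log(probs[1] / probs[0]), 3))    # 0.8  = theta - delta_i1
print(round(math.log(probs[2] / probs[1]), 3))    # -0.5 = theta - delta_i2
```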
Category Probability Curves for a Two-Step Item for the case δ_i1 > δ_i2
When the second step is easier than the first, the probability curve for the middle response category doesn’t dominate on any part of the ability scale.
Even though the second step is easier than the first, the defined order of the response categories requires that this easier second step be undertaken only after the harder first step has been successfully completed.
$$\ln\frac{\pi_{nik}}{\pi_{ni(k-1)}} = \theta_n - \delta_{ik}$$

For any step k, the log odds for this examinee is defined only by the difficulty of that step, δ_ik.
These operating curves have the same slope (so don’t cross) and differ only in their location on the ability continuum.
Item Characteristic Curve for two-step item
The ICC for a polytomous item represents the expected score on the item as a function of the examinee’s ability level.
Unlike ICCs in the dichotomous Rasch model, ICCs of different polytomous items are not parallel; they can cross.
Can be considered as a particular case of the PCM in which all items share a common response format (for example, a Likert scale)
Is usually used to collect attitude data
Each item consists of a stem (a statement of attitude) and a few response alternatives, of which a respondent is required to choose one, indicating the extent to which the statement in the stem is endorsed
Thus, all items have the same set of response alternatives. Completing the k-th step can be considered as choosing the k-th alternative over the (k−1)-th in response to the item.
Has 4 or 5 categories: Strongly Disagree, Disagree, Undecided (or Neutral) – may be omitted, Agree, Strongly Agree:
SD D N A SA
Response alternatives are ordered to represent a respondent’s increasing inclination towards the concept questioned
A person who chooses to Agree with a statement on an attitude questionnaire can be considered to have chosen Disagree over
Strongly Disagree (1-st step taken), and also Neutral over Disagree
(2-nd step taken), and also Agree over Neutral (3-rd step taken), but to have failed to choose Strongly Agree over Agree (4-th step not taken).
All responses are coded as 1,2,3,4,5, where the higher number indicates a higher degree of agreement with the statement.
Consider two statements from the test of computer anxiety :
I am so afraid of computers I avoid using them SD D N A SA
I am afraid that I’ll make mistakes when I use my computer SD D N A SA
It is more than likely that the first stem indicates much higher levels of computer anxiety than does the second stem. Indeed, children who respond SA on the “mistakes” stem might endorse only N on the
“avoid using” stem.
The first item can therefore be considered more difficult than the second item. So each item can be accorded a difficulty estimate (the location of the item on the variable axis).
As the same set of rating points is used with every item, it is usually thought that the relative difficulties of the steps in each item should not vary from item to item.
The pattern of item steps around an item’s location is assumed to be determined by a fixed set of threshold parameters, that is, by the fixed set of rating points used with all items.
These threshold parameters are estimated once for the entire item set.
The difficulty of any step can be resolved into two components:

$$\delta_{ik} = \delta_i + \tau_k$$

δ_ik – the difficulty of completing the k-th step (choosing the k-th alternative) in response to item i
δ_i – the location of item i (item difficulty)
τ_k – the location of the k-th step in each item relative to that item’s location (the threshold parameter for the k-th step)

The only difference between items is the difference in their location on the variable (that is, in their difficulty). The pattern of item steps around this location is described by the threshold parameters τ_k, k = 1, …, m – the fixed set of rating points used with all items.
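A tiny sketch of this decomposition with made-up numbers: two items share the same thresholds τ_k, so their step difficulties show the same pattern, merely shifted by the item locations.

```python
# Hypothetical illustration of the Rating Scale decomposition: every item shares
# the same threshold pattern tau_k; only the item location delta_i changes.
thresholds = [-0.8, 0.8]                       # tau_1, tau_2 (common to all items)
item_locations = {"item_i": -0.5, "item_j": 1.0}

for name, delta in item_locations.items():
    step_difficulties = [delta + tau for tau in thresholds]   # delta_ik = delta_i + tau_k
    print(name, [round(d, 2) for d in step_difficulties])
# item_i [-1.3, 0.3]   item_j [0.2, 1.8] -- same spread, shifted by item difficulty
```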
Probabilities of passing each threshold can be described by a Rasch model (here for two two-step items i and j):

$$P_{ni1} = \frac{\exp(\theta_n - (\delta_i + \tau_1))}{1 + \exp(\theta_n - (\delta_i + \tau_1))}, \qquad P_{nj1} = \frac{\exp(\theta_n - (\delta_j + \tau_1))}{1 + \exp(\theta_n - (\delta_j + \tau_1))}$$

$$P_{ni2} = \frac{\exp(\theta_n - (\delta_i + \tau_2))}{1 + \exp(\theta_n - (\delta_i + \tau_2))}, \qquad P_{nj2} = \frac{\exp(\theta_n - (\delta_j + \tau_2))}{1 + \exp(\theta_n - (\delta_j + \tau_2))}$$
P_nik – the probability of person n scoring k rather than k−1 (choosing the k-th alternative over the (k−1)-th) in response to item i; k = 1, 2
θ_n – the ability level of examinee n
δ_i – the location of item i on the variable axis (item difficulty)
τ_1 and τ_2 – the threshold parameters (the same for all items)
Item Operating Curves (Step Characteristic Curves) for Two Rating Scale Items with Three Response Categories
$$\pi_{nik} = \frac{\exp\left(\sum_{j=0}^{k}(\theta_n - (\delta_i + \tau_j))\right)}{\sum_{l=0}^{m}\exp\left(\sum_{j=0}^{l}(\theta_n - (\delta_i + \tau_j))\right)}$$

π_nik is the probability of examinee n with ability θ_n getting score k on item i (choosing the k-th alternative); k = 0, 1, …, m, where m is the number of item steps in any item (as before, the term for j = 0 is defined to be 0).
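As with the PCM, this formula is easy to compute directly; the sketch below uses hypothetical parameter values for a five-category Likert item and is not taken from the slides.

```python
import math

def rsm_probabilities(theta, item_location, thresholds):
    """Category probabilities under the Rating Scale Model.

    Step difficulties are item_location + tau_k with thresholds shared by all items,
    i.e. the PCM formula with delta_ij = delta_i + tau_j.
    """
    cumulative = [0.0]
    for tau in thresholds:
        cumulative.append(cumulative[-1] + (theta - (item_location + tau)))
    numerators = [math.exp(c) for c in cumulative]
    total = sum(numerators)
    return [n / total for n in numerators]

# Hypothetical 5-category Likert item (4 thresholds), respondent with theta = 0.4.
probs = rsm_probabilities(0.4, item_location=-0.2, thresholds=[-1.5, -0.5, 0.5, 1.5])
print([round(p, 3) for p in probs])   # probabilities for SD, D, N, A, SA sum to 1
```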
$$\ln\frac{\pi_{nik}}{\pi_{ni(k-1)}} = \theta_n - (\delta_i + \tau_k)$$

For any step k (or the k-th response category), the log odds of choosing that category over the previous adjacent one is defined only by the difficulty of the item δ_i and the difficulty of the k-th step τ_k.