PSYCHOMETRIKA--VOL. 55, NO. 2, 195-215
JUNE 1990
MODELING ITEM RESPONSES WHEN DIFFERENT SUBJECTS EMPLOY
DIFFERENT SOLUTION STRATEGIES
ROBERT J. MISLEVY
EDUCATIONAL TESTING SERVICE
NORMAN VERHELST
CITO
(NATIONAL INSTITUTE FOR EDUCATIONAL MEASUREMENT)
ARNHEM, THE NETHERLANDS
A model is presented for item responses when different subjects employ different strategies, but only responses, not choice of strategy, can be observed. Using substantive theory to
differentiate the likelihoods of response vectors under a fixed set of strategies, we model
response probabilities in terms of item parameters for each strategy, proportions of subjects
employing each strategy, and distributions of subject proficiency within strategies. The probabilities that an individual subject employed the various strategies can then be obtained, along
with a conditional estimate of proficiency under each. A conceptual example discusses response
strategies for spatial rotation tasks, and a numerical example resolves a population of subjects
into subpopulations of valid responders and random guessers.
Key words: differential strategies, EM algorithm, item response theory, linear logistic test
model, mixture models.
Introduction
The standard models of item response theory (IRT), such as the 1-, 2-, and 3-parameter normal and logistic models, characterize subjects in terms of their propensities to make correct responses. Consequently, subject parameter estimates are
strongly related to simple percent-correct scores (adjusted for the average item difficulties, if different subjects are presented different items). Item parameters characterize
the regression of a correct response (x = 1 as opposed to 0) on this overall propensity
toward correctness.
These models lend themselves well to tests in which all subjects employ the same
strategy to solve the items. Comparisons among estimates of subjects' ability parameters are meaningful comparisons of their degrees of success in implementing the strategy. Item parameters reflect the number or complexity of the operations needed to
solve a given item (Fischer, 1973).
The same models can prove less satisfactory when different subjects employ different strategies. The validity of using scores that convey little more than percent-correct to compare subjects who have used different strategies must first be called into question. And item parameters keyed only to a generalized propensity toward correctness will not reveal how a particular kind of item might be easy for subjects who follow one line of attack, but difficult for those who follow another.

The first author's work was supported by Contract No. N00014-85-K-0683, project designation NR 150-539, from the Cognitive Science Program, Cognitive and Neural Sciences Division, Office of Naval Research. We are grateful to Murray Aitkin, Isaac Bejar, Neil Dorans, Norman Frederiksen, and Marilyn Wingersky for their comments and suggestions, and to Alison Gooding, Maxine Kingston, Donna Lembeck, Joling Liang, and Kentaro Yamamoto for their assistance with Example 2.

Requests for reprints should be sent to Robert J. Mislevy, Educational Testing Service, Princeton, NJ 08541.

0033-3123/90/0600-9534 $00.75/0
© 1990 The Psychometric Society
Extensions of IRT to multiple strategies have several potential uses. In psychology, such models would provide a rigorous framework for testing alternative theories
about cognitive processing (e.g., Carter, Pazak, & Kail, 1983). In education, estimates
of how students solve problems could be more valuable than how many they solve, for
the purposes of diagnosis, remediation, and curriculum revision (Messick, 1984). And
even when a standard IRT model would provide reasonable summaries and meaningful
comparisons for most subjects, an extended model allowing for departures along predetermined lines, such as random responding, would reduce estimation biases for the
parameters in the standard model.
In contrast to standard IRT models, and, for that matter, to the true score models
of classical test theory, a model that accommodates alternative strategies must begin
with explicit statements about the processes by which subjects arrive at their answers.
For example, items may be characterized in terms of the nature, number, and complexity of the operations required to solve them under each strategy that is posited. As
we shall see, the psychological theory that must underlie an attempt to model mixed
strategies will probably be lacking for most conventional tests. The most profitable uses
of the methods we propose will be in applications in which a relatively strong psychological or educational theory has been or can be developed, and the practical decision-making problem concerns strategy usage.
The recent psychometric literature contains a few implementations of these ideas.
Tatsuoka (1983) has studied performance on mathematics items in terms of the application of correct and incorrect rules, locating response vectors in a two-dimensional
space, where the first dimension is an estimated ability parameter from a standard IRT
model and the second is an index of lack of fit from that model. Paulson (1985),
analyzing similar data but with fewer rules, uses latent class models to relate the
probability of correct responses on an item to the features it exhibits and the rules that
subjects might be following to solve it. Yamamoto (1987) combines aspects of both of
these models, positing subpopulations of IRT respondents and of nonscalable respondents associated with particular expected response patterns. Samejima (1983) and Embretson (1985) offer models for alternative strategies in situations where subtask results
can be observed in addition to the overall correctness or incorrectness of an item.
The present paper describes a family of multiple-strategy IRT models that apply
when each subject belongs to one of a number of exhaustive and mutually-exclusive
classes that correspond to item-solving strategies, and the responses from all subjects
in a given class accord with a standard IRT model. It is further assumed that for each
item, its parameters under the IRT model for each strategy class can be related to
known features of the item through psychological or substantive theory.
The following section gives a general description of the model. A conceptual example illustrates some of the key ideas. A two-stage estimation procedure is then
presented. The first stage estimates structural parameters: basic parameters for test
items, proportions of subjects following each strategy, and proficiency distributions
within each. The second stage estimates posterior distributions for individual subjects:
the probability that they belong to each strategy class, and a conditional distribution of
their ability corresponding to each class. A numerical example resolves subjects into
classes of valid responders and random guessers. The final section discusses prospects
of the approach for educational and psychological testing.
The Response Model
This section lays out the basic structure of a mixture of constrained item response
models. Discussion will be limited to dichotomous items for convenience, but extensions to polytomous, continuous, and vector-valued observations are straightforward.
We begin by briefly reviewing the general form of an IRT model. The probability of response x_ij (1 if correct, 0 if not) from subject i to item j is given by an IRT model as

$$p(x_{ij} \mid \theta_i, \beta_j) = [f(\theta_i, \beta_j)]^{x_{ij}} [1 - f(\theta_i, \beta_j)]^{1 - x_{ij}},\qquad (1)$$

where θ_i and β_j are (possibly vector-valued) parameters associated with subject i and item j respectively, and f is a known, twice-differentiable function whose range is the unit interval. Under the usual IRT assumption of local independence, the conditional probability of the response pattern x_i = (x_{i1}, ..., x_{in}) of subject i to n items is the product of n expressions like (1):

$$p(x_i \mid \theta_i, \beta) = \prod_{j=1}^{n} p(x_{ij} \mid \theta_i, \beta_j).$$
It may be possible to express item parameters as functions of some smaller number of more basic parameters α = (α_1, ..., α_M) that reflect the effects of M salient characteristics of items; that is, β_j = β_j(α). An important example of this type is the linear logistic test model (LLTM; Fischer, 1973; Scheiblechner, 1972). Under the LLTM, the item response function is the one-parameter logistic (Rasch) model, or

$$p(x_{ij} \mid \theta_i, \beta_j(\alpha)) = \frac{\exp [x_{ij}(\theta_i - \beta_j)]}{1 + \exp (\theta_i - \beta_j)},$$

and the model for item parameters is linear:

$$\beta_j = \sum_{m=1}^{M} Q_{jm} \alpha_m = Q_j \alpha.$$

The elements of α are contributions to item difficulty associated with the M characteristics of items, presumably related to the number or nature of processes required to solve them. The elements of the known vector Q_j indicate the extent to which item j exhibits each characteristic. (To isolate the indeterminacy of origin in the LLTM, Fischer wrote β = Q'α + c1, where 1' = (1, ..., 1) and c is an arbitrary constant. This is subsumed in the form used in this paper by incorporating 1 into Q and c into α. The indeterminacy can be resolved by enforcing a constraint such as Σβ_j = 0 or E(θ) = 0.) Fischer (1973), for example, modeled the difficulty of the items in a calculus test in terms of the number of times an item requires the application of each of seven differentiation rules; Q_jm was the number of times that rule m must be employed to solve item j.
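To make the LLTM relation concrete, the following sketch computes item difficulties β = Qα for an invented Q matrix in the style of Fischer's differentiation-rule analysis. The operation counts and basic-parameter values here are illustrative assumptions, not Fischer's data.

```python
import numpy as np

# Hypothetical Q matrix for 4 items and M = 3 basic parameters.
# Column m counts how often item j requires operation m; the final
# column of ones is the constant absorbed into Q, as noted in the text.
Q = np.array([
    [1, 0, 1],
    [2, 1, 1],
    [0, 2, 1],
    [1, 1, 1],
])
alpha = np.array([0.8, 1.2, -1.5])  # illustrative basic parameters

# LLTM item difficulties: beta_j = sum_m Q_jm * alpha_m = Q_j alpha
beta = Q @ alpha
```

Estimating α from response data, rather than assuming it as here, is the structural estimation problem taken up later in the paper.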
Consider now a set of items that may be answered by means of K different strategies. It need not be the case that all are equally effective, nor even that all generally
lead to correct responses. Not all strategies need be available to all subjects. We make
the following assumptions:
1. Each subject is applying the same one of these strategies for all the items in the
set. (The final section discusses how to relax this assumption to handle strategy-switching.)
2. The responses of a subject are observed but the strategy the subject has employed is not.
3. The responses of subjects following strategy k conform to an item response
model of a known form.
4. Substantive theory associates the observable features of items with the probabilities of success for members of each strategy class. The relationships may be known
fully, or only partially as when the Q matrices in LLTM-type models are known but the
basic parameters are not.
Define a K-dimensional subject parameter φ to indicate strategy usage, letting the k-th element in φ_i take the value one if subject i follows strategy k, and zero if not. Extending the notation introduced above, we write the probability of response pattern x_i, conditional on the subject parameters φ_i and θ_i, as

$$p(x_i \mid \phi_i, \theta_i, \alpha) = \prod_k \left\{ \prod_j [f_k(\theta_{ik}, \beta_{jk})]^{x_{ij}} [1 - f_k(\theta_{ik}, \beta_{jk})]^{1 - x_{ij}} \right\}^{\phi_{ik}},\qquad (2)$$

where β_jk ≡ β_jk(α) gives the item parameter(s) for item j under strategy k and θ_i = (θ_{i1}, ..., θ_{iK}) gives the proficiencies of subject i under the K strategies. For brevity, let β_k ≡ (β_{1k}(α), ..., β_{nk}(α)) denote the vector of parameters of all n items with respect to strategy k.
It will be natural in certain applications to partition basic parameters for items in accordance with strategy classes; that is, α = (α_1, ..., α_K). When the strategies can be defined through K versions of the LLTM, as in Example 1 below, the differential expectations of success on item j under the various strategies are conveyed by the K different vectors Q_jk, k = 1, ..., K, that relate the item to each of the strategies:

$$\beta_{jk} = \sum_m Q_{jkm} \alpha_{km} = Q_{jk} \alpha_k.$$

Here the item difficulty parameter for item j under strategy k is a weighted sum of elements in α_k, the basic parameter vector associated with strategy k. The weights Q_jkm indicate the degree to which each of the features m, as relevant under strategy k, are present in item j.
Example 1: Alternative Strategies for Spatial Tasks

The items of some tests intended to measure spatial visualization ability admit to solution by nonspatial analytic strategies (French, 1965; Lohman, 1979; Pellegrino, Mumaw, & Shute, 1985). Consider items in which subjects are shown a drawing of a three-dimensional target object, and asked whether a stimulus drawing could be the same object after rotation in the plane of the picture. In addition to rotation, one or more key features of the stimulus may differ from those of the target. A subject can solve the item either by rotating the target mentally the required degree and recognizing the match (Strategy 1), or by employing analytic reasoning to detect feature matches without performing rotation (Strategy 2).

Consider further a hypothetical three-item test comprised of such items. Each item will be characterized by (a) rotational displacement of 60, 120, or 180 degrees, and by (b) the number of features that must be matched. Table 1 lists the characteristics of the items in the hypothetical test.
TABLE 1
Item Features

Item    Rotational displacement    Salient features
1       60 degrees                 3
2       120 degrees                2
3       180 degrees                1
Each subject i is characterized by two vectors. In the first, φ_i = (φ_{i1}, φ_{i2}), where φ_ik takes the value 1 if subject i employs strategy k and 0 if not. In the second, θ_i = (θ_{i1}, θ_{i2}), where θ_ik characterizes the proficiency of subject i if the subject employs strategy k. Only one of the elements of θ_i is involved in producing subject i's responses, but we do not know which one.

Suppose that for subjects employing a rotational strategy, probability of success is given by the Rasch model:

$$P(x_{ij} \mid \theta_{i1}, \beta_{j1}, \phi_1 = 1) = \frac{\exp [x_{ij}(\theta_{i1} - \beta_{j1})]}{1 + \exp (\theta_{i1} - \beta_{j1})}.$$

Here θ_{i1} is the proficiency of subject i at solving tasks by means of the rotational strategy, and β_{j1} is the difficulty of item j under the rotational strategy.

It is well established that the time required to solve mental rotation tasks is often linearly related to rotational displacement (e.g., Cooper & Shepard, 1973). To an approximation, so are log-odds of success (Tapley & Bryden, 1977). For the sake of the example, suppose that under the rotational strategy, item parameters take the following form:

$$\beta_{j1} = Q_{j11} \alpha_{11} + \alpha_{12},$$

where Q_{j11} encodes the rotational displacement of item j (1 for 60 degrees, 2 for 120 degrees, and 3 for 180 degrees), α_{11} is the incremental increase in difficulty for each increment in rotation, and α_{12} is a constant term, with Q_{j12} ≡ 1 implicit for all j. If α_{11} = 1 and α_{12} = -2, the item parameters β_{j1} that are in effect under Strategy 1 would be as shown in the second column of Table 2. (A subsequent section shows how to estimate these quantities.)
A Rasch model will also be assumed for subjects employing Strategy 2, the analytic strategy, but here the item parameters depend on the number of features that must be matched:

$$\beta_{j2} = Q_{j21} \alpha_{21} + \alpha_{22},$$

where Q_{j21} is the number of salient features, α_{21} is the incremental contribution to item difficulty of an additional feature, α_{22} is a constant term, and Q_{j22} ≡ 1 implicitly for all items.

TABLE 2
Item Difficulty Parameters

Item    Strategy 1    Strategy 2
1       -1.0           2.0
2        0.0           0.5
3        1.0          -1.0

If α_{21} = 1.5 and α_{22} = -2.5, the item parameters in effect under Strategy 2 are the values in the third column of Table 2.
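The Table 2 entries follow directly from the stated basic parameters and the item features of Table 1; a minimal check:

```python
import numpy as np

Q_rot = np.array([1, 2, 3])    # displacement codes: 60, 120, 180 degrees (Table 1)
Q_feat = np.array([3, 2, 1])   # number of salient features (Table 1)

a11, a12 = 1.0, -2.0           # rotational strategy: slope, constant
a21, a22 = 1.5, -2.5           # analytic strategy: slope, constant

beta1 = Q_rot * a11 + a12      # Strategy 1 difficulties: -1.0, 0.0, 1.0
beta2 = Q_feat * a21 + a22     # Strategy 2 difficulties: 2.0, 0.5, -1.0
```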
Note that the test has been constructed so that items that are relatively hard under one strategy are relatively easy under the other. Inferring strategy choice from observed response patterns is possible only if at least some patterns are more likely under some strategies than others.

The response pattern (011), for example, has a correct answer to an item that is easy under Strategy 2 but hard under Strategy 1, and an incorrect answer to an item that is hard under Strategy 2 but easy under Strategy 1. Figure 1 plots the likelihood functions for x = (011) under both strategies; that is, p[x = (011) | θ_k, φ_k = 1] for k = 1, 2 as a function of θ_1 and of θ_2, respectively. The maximum of the likelihood under Strategy 2 is about eight times higher than the maximum under Strategy 1.
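The likelihood comparison can be reproduced numerically. The sketch below evaluates p[x = (011) | θ, φ_k = 1] over a grid of θ values, using the Table 2 difficulties, and confirms that the maximum under Strategy 2 is several times that under Strategy 1.

```python
import numpy as np

x = np.array([0, 1, 1])                       # response pattern (011)
beta = {1: np.array([-1.0, 0.0, 1.0]),        # Table 2, Strategy 1
        2: np.array([2.0, 0.5, -1.0])}        # Table 2, Strategy 2

def pattern_likelihood(grid, b, x):
    """Rasch likelihood of pattern x at each ability in grid, difficulties b."""
    p = 1.0 / (1.0 + np.exp(-(grid[:, None] - b[None, :])))
    return np.prod(np.where(x == 1, p, 1.0 - p), axis=1)

grid = np.linspace(-5.0, 5.0, 2001)
max1 = pattern_likelihood(grid, beta[1], x).max()
max2 = pattern_likelihood(grid, beta[2], x).max()
ratio = max2 / max1    # several-fold advantage for Strategy 2
```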
We can draw inferences about individual subjects if we know the proportions of people who choose each strategy, or π_k = P(φ_k = 1), and the distributions of proficiency among those using each strategy class, or g_k(θ_k) = p(θ_k | φ_k = 1). We shall discuss in the sequel how to estimate these quantities, but for now, assume some illustrative values, and examine the implications that follow for measuring individuals. Suppose that (i) θ_1 and θ_2 both follow standard normal distributions among the subjects that employ Strategies 1 and 2, respectively, and (ii) three times as many subjects use Strategy 1 as use Strategy 2; that is, π_1 = 3/4 and π_2 = 1/4. This joint prior distribution, illustrated in Figure 2, can be written as

$$p(\theta_k = \theta, \phi_k = 1) = \pi_k g_k(\theta).$$

Combining this prior via Bayes theorem with the likelihood induced by an observed response pattern x produces a joint posterior density for φ and θ_k given φ_k = 1, for k = 1, 2:

$$p(\theta_k = \theta, \phi_k = 1 \mid x, \pi, \alpha) \propto p[x \mid \phi_k = 1, \theta_k = \theta, \beta_k(\alpha)] \pi_k g_k(\theta),$$

where

$$p[x \mid \phi_k = 1, \theta_k = \theta, \beta_k(\alpha)] = \prod_j \frac{\exp \{x_j[\theta - \beta_{jk}(\alpha)]\}}{1 + \exp [\theta - \beta_{jk}(\alpha)]}.\qquad (3)$$
FIGURE 1. Likelihood Function. [p(x = 011 | θ, strategy = 1) (diamonds) and p(x = 011 | θ, strategy = 2) (plusses), plotted against θ; figure not reproduced.]
The constant of proportionality required to normalize (3) is the reciprocal of the marginalization of the right side, namely

$$p(x \mid \pi, \alpha) = \sum_k \pi_k \int p[x \mid \phi_k = 1, \theta_k = \theta, \beta_k(\alpha)] g_k(\theta) \, d\theta.$$

The posterior distribution induced by x = (011) is shown in Figure 3. Marginalizing with respect to θ_k amounts to summing the area under the curve for strategy k, and gives the posterior probability that φ_k = 1, that is, that the subject has employed strategy k. The resulting values for this response pattern are P(φ_1 = 1 | x = (011)) = .28 and P(φ_2 = 1 | x = (011)) = .72. The prior probabilities that favored Strategy 1 have been revised substantially to favor Strategy 2. The conditional posterior for θ_1 given φ_1 = 1 has a mean and standard deviation of about .32 and .80. Corresponding values for the distribution of θ_2 given φ_2 = 1 are .50 and .81. We note in passing that these posteriors are not generally normal distributions, even though the priors were. The posteriors can in fact be quite skewed, as occurs with response patterns that have all correct or incorrect answers.
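These posterior quantities can be reproduced by numerical integration over θ; the sketch below uses a simple grid approximation to the standard normal priors with π = (3/4, 1/4).

```python
import numpy as np

x = np.array([0, 1, 1])
beta = {1: np.array([-1.0, 0.0, 1.0]), 2: np.array([2.0, 0.5, -1.0])}
pi = {1: 0.75, 2: 0.25}

theta = np.linspace(-6.0, 6.0, 1201)
dt = theta[1] - theta[0]
prior = np.exp(-0.5 * theta**2) / np.sqrt(2.0 * np.pi)   # standard normal g_k

post = {}
for k in (1, 2):
    p = 1.0 / (1.0 + np.exp(-(theta[:, None] - beta[k][None, :])))
    lik = np.prod(np.where(x == 1, p, 1.0 - p), axis=1)
    post[k] = pi[k] * lik * prior                 # unnormalized joint posterior

norm = sum(post[k].sum() * dt for k in (1, 2))
prob = {k: post[k].sum() * dt / norm for k in (1, 2)}     # P(phi_k = 1 | x)
mean = {k: (theta * post[k]).sum() / post[k].sum() for k in (1, 2)}
sd = {k: np.sqrt((theta**2 * post[k]).sum() / post[k].sum() - mean[k]**2)
      for k in (1, 2)}
```

On this grid the strategy probabilities come out near .28 and .72, with conditional means and standard deviations near the values quoted above.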
FIGURE 2. Prior Distribution. [p(θ | strategy = 1) (diamonds) and p(θ | strategy = 2) (plusses); figure not reproduced.]
Parameter Estimation

Example 1 showed how to draw inferences about the strategy usage and proficiencies of subjects if the basic parameters for items and the strategy population proportions and distributions were known. In practice these quantities will have to be estimated. Sometimes it will be possible to obtain calibrating samples of subjects whose strategy usage is known with certainty. In these cases, standard IRT methodology can be used to estimate the structural parameters of the problem: the basic parameters α for items, the proportions π_k of subjects employing each strategy, and the parameters of the distributions g_k(θ) of subjects employing each strategy. The item parameters need not even be described in terms of salient item features in this situation; that is, β's need not be modeled in terms of α's. In other cases, obtaining certain knowledge of subjects' strategy usage will not be convenient or even possible. It then becomes necessary to estimate α's, π's, and g_k's directly from the responses of subjects from an unknown mixture of strategy classes, as discussed in the following section. In either case, the resulting estimates of these structural parameters can be treated as known true values so that "empirical Bayes" inferences can be drawn about individual examinees; additional details on this second stage of estimation are presented subsequently.

FIGURE 3. Posterior Distribution. [p(θ, strategy = 1 | x = 011) (diamonds) and p(θ, strategy = 2 | x = 011) (plusses); figure not reproduced.]
Estimating Structural Parameters in the Mixture Case

Equation (2) gives the conditional probability of the response vector x given θ and φ, or p(x | θ, φ, α). Consider a population in which strategies are employed in proportions π_k, and within-strategy proficiencies have densities g_k(θ_k | η_k), characterized by possibly unknown parameters η_k, among the subjects using them. The marginal probability of x, or the probability of observing x from a subject selected at random from the mixed population, is

$$p(x \mid \xi) = \sum_k \pi_k \int p(x \mid \theta_k, \phi_k = 1, \alpha) g_k(\theta_k \mid \eta_k) \, d\theta_k.\qquad (4)$$
Let ξ denote the vector of all structural parameters, (α, π, η). The likelihood for ξ induced by observing the response vectors X ≡ (x_1, ..., x_N) of N subjects is the product over subjects of terms like (4). Maximum likelihood estimates for ξ are obtained by maximizing the likelihood function, or, equivalently, the logarithm of it. Bayes modal estimates can be obtained by similar numerical procedures after multiplying the likelihood by a prior distribution for ξ. The log likelihood is

$$\Lambda = \sum_{i=1}^{N} \log p(x_i \mid \xi) = \sum_i \log \sum_k \pi_k \int p(x_i \mid \theta_k, \phi_k = 1, \alpha) g_k(\theta_k \mid \eta_k) \, d\theta_k.\qquad (5)$$
Note that the proficiency distributions g_k appearing in (4) and (5) pertain only to the subjects who use the strategies. It may be the case that (θ_1, ..., θ_K) has a joint distribution in the population at large, but because we assume that only one component is involved in the response process of any individual, only the margins of this joint distribution are estimable, just as would be the case if each subject's strategy class were known.
Let S be the vector of first derivatives, and H the matrix of second derivatives, of Λ with respect to ξ. Under regularity conditions, the maximum likelihood estimates solve the likelihood equation S = 0, and a large-sample approximation of the matrix of estimation errors is given by the negative inverse of H evaluated at the maximum likelihood estimate of ξ.

A standard numerical approach to solving likelihood equations is to use some variation of Newton's method. Newton-Raphson iterations, for example, improve a provisional estimate ξ⁰ by adding the correction term -H⁻¹S, evaluated at ξ⁰. Fletcher-Powell iterations avoid computing and inverting H by using an approximation of H⁻¹ that is built up from changes in S from one cycle to the next.

These solutions have the advantage of rapid convergence if starting values are reasonable; often fewer than 10 iterations are necessary. S and H can be difficult to work with, however, and all parameters must usually be dealt with simultaneously because the off-diagonal elements in H needn't be zero. For these reasons, computationally simpler but slower-converging solutions based on Dempster, Laird, and Rubin's (1977) EM algorithm are more typically employed in mixture problems (Titterington, Smith, & Makov, 1985). The solution described below for the present problem uses discrete representations for the g_k's, so the relatively simple "finite mixtures" case of EM obtains (see section 4.3 of Dempster, Laird, & Rubin).
Suppose that for each k, subject proficiency under strategy k can take only the L_k values Θ_{k1}, ..., Θ_{kL_k}. The density g_k is thus characterized by these points of support and by weights associated with each, γ_k(Θ_{kl} | η_k). Define the subject variable ψ_i ≡ (ψ_{i11}, ..., ψ_{iKL_K}), a vector of length L_1 + ... + L_K of indicator variables: the element ψ_{ikl} is 1 if the proficiency of subject i under strategy k is Θ_{kl} and 0 if not. There are a total of K ones in ψ_i, one for each strategy, though again, only the one associated with the strategy used by subject i played a role in producing x_i.
The feature of the actual log likelihood (5) that makes it difficult to solve is that we do not observe subjects' φ and θ values (or equivalently, φ and ψ values). In Dempster, Laird, and Rubin's (1977) terminology, this is an "incomplete data" problem. Obtaining MLEs for ξ would be much simpler if these values were observed along with x's, in a corresponding "complete data" problem. Were this the case, the log-likelihood would be

$$\Lambda^* = \sum_i \sum_k \phi_{ik} \sum_l \psi_{ikl} \log p[x_i \mid \theta_k = \Theta_{kl}, \phi_k = 1, \beta_k(\alpha)] + \sum_i \sum_k \phi_{ik} \sum_l \psi_{ikl} \log \gamma_k(\Theta_{kl} \mid \eta_k) + \sum_i \sum_k \phi_{ik} \log \pi_k.\qquad (6)$$
The basic parameter for items, α, appears only in the first term on the right, so maximizing with respect to α must address that term only. This amounts to estimating item parameters when subject abilities are known. When α consists of distinct subvectors for each strategy, each subvector could be estimated separately using data from only the subjects in the corresponding strategy group. The subpopulation parameters η appear in only the second term, separating them from α and π in ML estimation. They too lead to smaller separate subproblems if η consists of distinct subvectors for each strategy. The population proportions π appear in only the last term. Unless they are further constrained, their ML estimates are simply observed proportions.

The values of Θ may be either specified a priori, as in Mislevy (1986), or estimated from the data, as in de Leeuw and Verhelst (1986). In the latter case, they are additional elements of ξ. Their likelihood equations have contributions from both the first and second terms of (6), but the equations for these points of support under strategy k involve data from only those subjects using strategy k. Their cross second derivatives with points corresponding to other strategies are zero, although their cross derivatives with elements of α and η that are involved with the same strategy are generally not.
The M-step of an EM solution requires solving a maximization problem of exactly the type of (6), with one exception: the unobserved values of each φ_i and ψ_i are replaced by their conditional expectations given x_i and provisional estimates of ξ, say ξ⁰. The E-step calculates these conditional expectations as follows. Denote by I_ikl the following term in the marginal likelihood associated with subject i, strategy k, and proficiency value Θ_{kl} within strategy k:

$$I_{ikl} = p[x_i \mid \theta_k = \Theta_{kl}, \phi_k = 1, \beta_k(\alpha)] \, \gamma_k(\Theta_{kl} \mid \eta_k) \, \pi_k,$$

and let I⁰_ikl be a provisional estimate obtained using ξ⁰ rather than ξ. Provisional conditional expectations of the parameters of individual subjects are then obtained as

$$\psi^0_{ikl} = E(\psi_{ikl} \mid x_i, \xi = \xi^0) = \frac{I^0_{ikl}}{\sum_{l'} I^0_{ikl'}},\qquad (7)$$

and

$$\phi^0_{ik} = E(\phi_{ik} \mid x_i, \xi = \xi^0) = \frac{\sum_{l'} I^0_{ikl'}}{\sum_{k'} \sum_{l'} I^0_{ik'l'}}.\qquad (8)$$
The EM formulation makes it clear how each subject contributes to the estimation
of the parameters in all strategy classes, even though only one of them was relevant to
the production of any one subject's responses. Each subject's data contribute to the
estimation for each strategy class in proportion to the probability that that strategy was
the one the subject employed, given the observed response pattern.
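Equations (7) and (8) amount to normalizing the provisional terms I⁰_ikl. A small sketch of this E-step for a generic finite mixture follows; the entries of I0 below are fabricated purely for illustration.

```python
import numpy as np

def e_step(I0):
    """E-step for the finite mixture: I0[i, k, l] is the provisional term
    p[x_i | theta_k = Theta_kl, phi_k = 1] * gamma_kl * pi_k.
    Returns psi (eq. 7, within-strategy posterior weights) and
    phi (eq. 8, posterior strategy probabilities)."""
    psi = I0 / I0.sum(axis=2, keepdims=True)                 # normalize over l within k
    phi = I0.sum(axis=2) / I0.sum(axis=(1, 2))[:, None]      # normalize over k and l
    return psi, phi

# Two subjects, two strategies, three support points (fabricated values).
I0 = np.array([[[0.20, 0.10, 0.10], [0.05, 0.05, 0.10]],
               [[0.01, 0.02, 0.02], [0.20, 0.30, 0.05]]])
psi, phi = e_step(I0)   # phi[0] = [2/3, 1/3]: subject 1 leans toward strategy 1
```

Note that the products phi[i, k] * psi[i, k, l] are exactly the expected values of the indicator products that enter the complete-data log likelihood (6).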
Although the EM solution has the advantages of conceptual and computational
simplicity, it may converge slowly. Its rate of convergence depends on how well x
determines subjects' 0 and • values. This in turn is determined by how greatly the
relative likelihoods of response patterns differ from one strategy class to another.
Accelerating procedures such as those described by Ramsay (1975) and Louis (1982)
can be used to hasten convergence of the EM solution. The closer the starting values
are to the MLEs, the better--not just for quicker convergence, but also for better
chances of convergence to the global maximum. There is no guarantee against local
maxima in such mixture problems. One general approach to obtaining initial values is to sort subjects according to their most plausible strategy choices, and estimate π's, α's, and g's as if these temporary assignments were correct.
Empirical Bayes Inference for Individual Subjects

If the structural parameters ξ are accurately estimated, the posterior density of the parameters of subject i is approximated by

$$p(\theta_{ik} = \theta, \phi_{ik} = 1 \mid x_i, \hat\xi) \propto p[x_i \mid \phi_k = 1, \theta, \beta_k(\hat\alpha)] \, \hat\pi_k \, g_k(\theta \mid \hat\eta_k),$$

where the reciprocal of the normalizing constant is obtained by integrating the expression on the right over θ within each k, then summing over k. The posterior probability that subject i used strategy k is approximated by

$$P(\phi_{ik} = 1 \mid x_i, \hat\xi) = \int p(\theta_{ik} = \theta, \phi_{ik} = 1 \mid x_i, \hat\xi) \, d\theta.$$

A subject's posterior conditional mean and variance for a given strategy class are approximated by

$$\bar\theta_{ik} = \frac{\int \theta \, p(\theta_{ik} = \theta, \phi_{ik} = 1 \mid x_i, \hat\xi) \, d\theta}{P(\phi_{ik} = 1 \mid x_i, \hat\xi)}$$

and

$$s^2_{ik} = \frac{\int (\theta - \bar\theta_{ik})^2 \, p(\theta_{ik} = \theta, \phi_{ik} = 1 \mid x_i, \hat\xi) \, d\theta}{P(\phi_{ik} = 1 \mid x_i, \hat\xi)}.$$

If discrete representations have been employed for the g_k's, approximations based on (7) and (8) can be used.
Empirical Bayes estimates more closely approximate true Bayes estimates as structural parameters are more precisely estimated, so they are most suitable with large samples of subjects. Tsutakawa and Soltys's (1988) study of IRT ability estimation suggests that when ξ is not accurately determined in the present problem, point estimates of θ and φ may not be far off, but their accuracy will be overstated. A full Bayesian solution, which takes uncertainty about ξ into account, may be preferred in those circumstances. The interested reader is referred to Tsutakawa and Soltys for an illustration of one suitable approximation and references to others.
Example 2: A Mixture of Valid Responders and Random Guessers
The Rasch model can sometimes provide a reasonably good summary of subjects'
responses on multiple-choice tests, especially if the items are fairly easy for the subjects, or if hard items have attractive distractors that usually lead low proficiency
subjects to incorrect responses. But if some unmotivated subjects simply respond at
random to all items, their responses will bias the estimation of the item parameters that
would pertain to the majority of the subjects.
In this example we consider a two-class model, under which a subject responds
either in accordance with the Rasch model or guesses totally at random. For subjects
in the latter class, probabilities of correct response are fixed at the reciprocals of the
number of response alternatives to the items. Using the procedures described above, it
is possible to free estimates of the item parameters that pertain to the valid responders
from biases due to random guessers--even though it is not known with certainty who
the guessers are--and to estimate the proportions of valid responders and random
guessers in the sample.
The marginal probability of response pattern x in this situation is the two-class mixture

$$p(x_i \mid \xi) = \sum_{k=1}^{2} P(x_i \mid \phi_k = 1, \xi) \, \pi_k,$$

where Strategy 1 corresponds to the Rasch model and Strategy 2 corresponds to random guessing. The composition of ξ can be described as follows. It includes first the strategy proportions π_1 and π_2. For the Rasch class, the basic parameters α_1 are item difficulty parameters b_j for j = 1, ..., n. Suppose the distribution g_1 of proficiencies of subjects following the Rasch model is discrete, with L points of support Θ = (Θ_1, ..., Θ_L) and associated weights ω = (ω_1, ..., ω_L). The marginal probability of response pattern x under Strategy 1 is thus

$$P(x \mid \phi_1 = 1, \alpha_1, \Theta, \omega) = \sum_l \omega_l \prod_j \frac{\exp [x_j(\Theta_l - b_j)]}{1 + \exp (\Theta_l - b_j)}.$$
Under the random guessing strategy, the basic parameters α2 are the probabilities c_j of responding correctly to each item j. If we assume these probabilities take the values of the reciprocals of the numbers of item choices, they are known constants. All subjects following this strategy are assumed to have the same probabilities of correct response on a given item, so no distribution g2 is required. For such subjects, the probability of response pattern x is simply

    P(x | φ2 = 1, α2) = Π_j c_j^{x_j} (1 − c_j)^{1 − x_j}.
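To make the two-class likelihood concrete, here is a minimal sketch (ours, not from the paper; the function names and the small parameter values are illustrative) that evaluates the marginal probability of a response pattern as the π-weighted sum of the Rasch-class likelihood, marginalized over a discrete proficiency distribution, and the guessing-class likelihood:

```python
import itertools
import math

def rasch_p(theta, b):
    # Rasch probability of a correct response at proficiency theta, difficulty b
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def pattern_prob(x, b, support, weights, pi1, c=0.25):
    # Marginal probability of pattern x under the two-class mixture:
    # pi1 * P(x | Rasch class) + (1 - pi1) * P(x | guessing class)
    p_rasch = sum(
        w * math.prod(rasch_p(t, bj) if xj else 1.0 - rasch_p(t, bj)
                      for xj, bj in zip(x, b))
        for t, w in zip(support, weights))
    p_guess = math.prod(c if xj else 1.0 - c for xj in x)
    return pi1 * p_rasch + (1.0 - pi1) * p_guess

# Illustrative values: three items, five support points, 95% valid responders
b = [-1.0, 0.0, 1.0]
support = [-2.0, -1.0, 0.0, 1.0, 2.0]
weights = [0.1, 0.2, 0.4, 0.2, 0.1]
total = sum(pattern_prob(x, b, support, weights, 0.95)
            for x in itertools.product([0, 1], repeat=len(b)))
```

Summing over all 2^n patterns returns 1, a quick check that the mixture is a proper probability model.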
The data for the following numerical example were gathered in a field trial of a test
designed to measure the reading proficiencies of college students. Since their performances had no bearing on their academic records, the subjects had no external reason
to try their best. And indeed, proctors reported that a few subjects appeared to be
marking the answer sheet without having opened their test booklets. From a total
sample of about 2180 subjects, 1906 provided responses to all the items. This example
concerns their right/wrong responses to twelve four-choice items. Our objective is
neither to provide a complete analysis of these particular data nor to promote the use
of the two-class model, but to offer a numerical illustration of a simple mixed-strategies
model. We focus our attention on a comparison of results from the Rasch IRT model
and the two-class model described above, adding a few comments concerning the
three-parameter logistic IRT model.
The two-class model is a special case of both our mixed-strategies model and
Yamamoto's (1987) hybrid model for IRT and latent class responders. The estimates
presented here were obtained with a computer program written by Yamamoto, using
the EM estimation discussed in a preceding section. For both the Rasch and the two-class solutions, a ten-point discrete characterization of g1 was employed, over equally spaced points running from Θ1 = −4 to Θ10 = +4. The Rasch solution was comprised of item
parameters and of estimated population weights at each point of support. The two-class
solution also had these parameters, which now pertained just to subjects following the
Rasch model, plus an additional parameter for the proportion of subjects in the Rasch
class, as opposed to the hypothesized random guessing class. Probabilities of correct
response were fixed at .25 for the random guessing class.
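The EM updates for the mixing proportion can be sketched as follows; this is a simplified illustration (ours, not Yamamoto's program) that holds the per-pattern component likelihoods fixed, whereas the full algorithm also re-estimates the item parameters and the proficiency weights:

```python
import math

def em_mixing_proportion(lik1, lik2, pi1=0.5, n_iter=50):
    # lik1[i] = P(x_i | class 1), lik2[i] = P(x_i | class 2), held fixed here
    trace = []
    for _ in range(n_iter):
        # E-step: posterior probability that each subject belongs to class 1
        post = [pi1 * l1 / (pi1 * l1 + (1.0 - pi1) * l2)
                for l1, l2 in zip(lik1, lik2)]
        trace.append(sum(math.log(pi1 * l1 + (1.0 - pi1) * l2)
                         for l1, l2 in zip(lik1, lik2)))
        # M-step: the new pi1 is the mean posterior class-1 membership
        pi1 = sum(post) / len(post)
    return pi1, trace

# Illustrative pattern likelihoods for four subjects; the third looks like a guesser
lik1 = [0.020, 0.015, 0.00001, 0.030]
lik2 = [0.0001, 0.0001, 0.0002, 0.0001]
pi1_hat, trace = em_mixing_proportion(lik1, lik2)
```

The marginal log likelihood is guaranteed not to decrease across EM iterations, which is a useful check on any implementation.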
Resulting values of −2 log λ for the Rasch model and the two-class model are 2752 and 2606 respectively. The chi-square approximation for −2 log λ is not trustworthy because most realized response patterns were observed only once or twice, but the degrees of freedom would be 942 and 941. The difference in chi-squares is also not to be taken seriously as a chi-square, since the Rasch model is obtained as a boundary solution of the two-class model, with π1 = 1. Nevertheless, improving the chi-square by 146 with a single additional parameter must be considered worthwhile.
In some respects the solutions do not differ radically. The two-class solution gives an MLE of .9548 for π1, so it is estimated that fewer than five percent of the subjects
are guessing at random. The item parameter estimates are related monotonically, as
seen in Table 3, but the differences are meaningful: distances between items are quite
similar under the two models at the hard end of the scale, but become increasingly
spread out as the items become easier. This is because differences between Rasch item
parameter estimates are nearly equivalent to differences between logits of the items'
proportions of correct response, and if the two-class model is correct, differences in
proportions-correct are attenuated by the constant noise of the random responders. A
simple example makes this clear. The observed proportions of correct response to
Items 1 and 2 are .863 and .809; corresponding logits are -1.844 and -1.443, for a
difference of .401. Under the two-class model, however, the observed proportion correct is the weighted average of the Rasch proportion correct and the guessing rate, the
expected value of which is .25. The weights are the proportions of subjects in each
class. Thus, p_obs = p_Rasch π1 + .25 π2. Substituting in the estimated values .9548 and .0452 for π1 and π2 and solving for p_Rasch, we can approximate Rasch proportions correct as .934 and .829. The corresponding logits are −2.650 and −1.579, for a difference of 1.071.
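The back-calculation just described is a one-line inversion of the mixing identity; a small sketch (ours, using the estimated proportions quoted in the text; exact corrected values depend on each item's observed proportion and on rounding):

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

def rasch_proportion(p_obs, pi1, guess=0.25):
    # Invert p_obs = p_rasch * pi1 + guess * pi2 for the Rasch-class proportion
    return (p_obs - guess * (1.0 - pi1)) / pi1

pi1 = 0.9548
p1 = rasch_proportion(0.863, pi1)  # Item 1 observed proportion correct
p2 = rasch_proportion(0.809, pi1)  # Item 2 observed proportion correct
```

Correcting for the guessing class pushes both proportions up and spreads their logits farther apart than the observed logit difference, which is the attenuation effect described above.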
Perhaps more interesting are implications for measuring individuals. Total score is
a sufficient statistic for ability under the Rasch model, so everyone with the same score
receives the same ability estimate whenever the Rasch model is assumed to hold. In the
Rasch model solution this is always the case. In the two-class solution, everyone with
the same score receives the same ability estimate conditional on membership in the
TABLE 3

Item Parameter Estimates from the Rasch and Two-Class Models

                    Item Difficulties
Item     Rasch Model*    Two-Class**    Difference
  1         -2.473          -3.270          .797
  2         -1.907          -2.391          .484
  3         -1.794          -2.231          .437
  4         -1.102          -1.283          .181
  5          -.541           -.673          .132
  6          -.241           -.326          .085
  7           .021           -.046          .067
  8           .503            .486          .017
  9           .824            .787          .037
 10           .912            .880          .032
 11           .978            .943          .035
 12          1.276           1.218          .058
Slope         .431            .411

*Scale set by standardizing the subject distribution.
**Scale set by standardizing the estimated Rasch-class distribution.
Rasch class--but the posterior probabilities of belonging to the Rasch class can vary
considerably among subjects with the same score. This variation depends on just which
items they answered correctly, a feature of their data that has no relevance for ability
estimation under the Rasch model. (As we shall discuss below, however, these considerations do play a role in thoughtful applications of the Rasch model.) Table 4
illustrates this phenomenon. Subjects A, B, and C all have scores of three, the expected
score of a random guesser, and their posterior means for 0 are the same; but their
posterior probabilities of being a guesser differ substantially--about .1, .3, and .6.
Using ability estimates to compare these subjects with subjects who have high scores
and high probabilities of being Rasch responders would seem warranted for Subject A,
questionable for Subject B, and clearly inappropriate for Subject C--a consideration
that should be taken into account in, say, assigning educational treatments.
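The posterior probability of being a guesser follows from Bayes' rule over the two latent classes; here is a sketch with made-up item difficulties and proficiency distribution (the paper's actual estimates are not reproduced):

```python
import math

def rasch_p(theta, b):
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def posterior_guesser(x, b, support, weights, pi1, c=0.25):
    # P(guesser | x) = pi2 * P(x | guess) / [pi1 * P(x | Rasch) + pi2 * P(x | guess)]
    lik_rasch = sum(
        w * math.prod(rasch_p(t, bj) if xj else 1.0 - rasch_p(t, bj)
                      for xj, bj in zip(x, b))
        for t, w in zip(support, weights))
    lik_guess = math.prod(c if xj else 1.0 - c for xj in x)
    pi2 = 1.0 - pi1
    return pi2 * lik_guess / (pi1 * lik_rasch + pi2 * lik_guess)

# Six items ordered easy to hard; two patterns with the same score of three
b = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
support = [-2.0, -1.0, 0.0, 1.0, 2.0]
weights = [0.2] * 5
easy_pattern = (1, 1, 1, 0, 0, 0)   # correct on the three easiest items
hard_pattern = (0, 0, 0, 1, 1, 1)   # correct only on the three hardest items
pg_easy = posterior_guesser(easy_pattern, b, support, weights, 0.95)
pg_hard = posterior_guesser(hard_pattern, b, support, weights, 0.95)
```

As in Table 4, subjects with the same score can have very different posterior probabilities of guessing, driven entirely by which items they answered correctly.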
An alternative treatment of multiple-choice data uses the three-parameter logistic (3PL) IRT model, or

    P(x_j = 1 | θ, a_j, b_j, c_j) = c_j + (1 − c_j) exp[a_j(θ − b_j)] / {1 + exp[a_j(θ − b_j)]}.
TABLE 4

Posterior Probabilities and Ability Estimates for Selected Response Patterns

                                       Two-Class Model             3-Parameter Model
      Response Pattern      Rasch      Rasch       Prob.
      (Easy → Hard)       Estimate    Estimate    Guesser      Estimate      PSD*
A     110000000001         -1.028      -1.174       .091        -1.164       .575
B     010010001000         -1.028      -1.174       .314        -1.263       .611
C     010000001010         -1.028      -1.174       .587        -1.451       .610

*PSD = Posterior Standard Deviation
The 3PL can be obtained from the Rasch model by adding a slope parameter a_j and a lower asymptote parameter c_j for each item. The lower asymptote is related to the possibility of chance success, as it allows even low-ability subjects a chance of responding correctly. We fit the Rasch model and the 3PL to the data described above with Mislevy and Bock's (1983) BILOG program. The Rasch solution was essentially the same as the one obtained with Yamamoto's program. The 3PL solution employed BILOG's mild default Bayesian prior distributions for a and c parameters, and obtained the item parameter estimates shown in Table 5. The difference between the Rasch model and the 3PL in terms of −2 log λ was 130, similar to that obtained when going from the Rasch model to the two-class solution, but at the cost of 23 additional parameters rather than just one!
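The 3PL response function is easy to verify numerically; a sketch (ours) using the Item 1 estimates reported in Table 5 (a = .745, b = −1.522, c = .209):

```python
import math

def p_3pl(theta, a, b, c):
    # 3PL: lower asymptote c plus (1 - c) times a logistic in a * (theta - b)
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# Item 1 estimates from Table 5
a1, b1, c1 = 0.745, -1.522, 0.209
```

At very low θ the curve approaches the lower asymptote c, at very high θ it approaches 1, and it increases monotonically in between, which is what gives the 3PL its capacity to soften the penalty for lucky guesses.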
Unlike the Rasch model, the 3PL can assign different proficiency estimates to
response patterns with the same score. The posterior mean 0's for Subjects A, B, and
C appear in Table 4. For these subjects, higher 3PL ability estimates are associated with
better conformance to the model; the 3PL effectively discounted their unexpected
correct responses to hard items. The difference among the estimates, however, is trivial
in comparison to their precision. Unlike the two-class model's differential posterior
probabilities of being a guesser, these slightly different 3PL estimates indicate merely
a slight difference in degree of proficiency rather than a profound difference in kind.
These results call to mind current work in IRT on subject fit statistics or "caution
indices" (e.g., see Chapter 4 of Hulin, Drasgow, & Parsons, 1983). The idea is to detect
aberrant response patterns that can signal atypical patterns of knowledge, misunderstanding of directions, or idiosyncratic effects of item content with particular subjects.
A high value on a caution index suggests that the subject's score may not be suitable for
comparing one subject's performance with other subjects who gave more typical response patterns. But even if a subject's score is not used subsequently, the subject's
aberrant response pattern may have distorted the estimation of IRT item parameters
and thereby contaminated comparisons among subjects whose responses did accord
TABLE 5

Item Parameter Estimates from the Three-Parameter Model

Item       a           b          c
  1       .745      -1.522      .209
  2       .723      -1.201      .176
  3      1.002       -.933      .189
  4      1.087       -.409      .207
  5       .688       -.026      .171
  6       .907        .332      .203
  7       .979        .625      .259
  8       .529       1.361      .256
  9       .965       1.216      .199
 10       .721       1.992      .282
 11       .800       1.604      .233
 12      1.079       1.642      .258
with the model. This concern leads to a sometimes controversial practice often carried
out in practical applications of the Rasch model with multiple-choice tests: deleting
poorly fitting subjects from item parameter estimation. A similarity between this procedure and our two-class model is that both procedures partition the data into a block
where the simple Rasch model fits well and a block in which it does not. A difference
is that the partitioning effected by subject trimming is explicit and based on observed
variables; with the two-class model, it is implicit and based on a latent variable. Each
subject's data are effectively trimmed in proportion to the weight of the evidence that it
does not fit the Rasch model.
Discussion
Theories about the processes by which subjects attempt to solve test items play no
formal role in standard test theory, including conventional item response theory. Only
a data matrix of correct and incorrect responses is addressed, and items and subjects
are parameterized strictly on the basis of propensities toward correct response. When
all that is desired is a simple comparison of subjects in terms of a general propensity of
this nature, IRT models suffice and in fact offer many advantages over classical true-score test theory.
Situations for which standard IRT models prove less satisfactory involve a need
either to better understand the cognitive processes that underlie item response, or to
employ theories about such processes to provide more precise or more valid measurement. Extensions of item response theory in this direction are exemplified by the linear
logistic test model (Fischer, 1973; Scheiblechner, 1972), Embretson's (1985) multicomponent models, Samejima's (1983) model for multiple strategies, and Tatsuoka's (1983)
"rule space" analyses.
The approach offered in this paper concerns situations in which different persons
may choose different strategies from a number of known alternatives, but overall proficiencies provide meaningful comparisons among persons employing the same strategy. We suppose that strategy choice is not directly observed but can be inferred,
though without certainty, from response patterns on theoretical bases. Assuming that
substantive theory allows us to differentiate our expectations about response patterns
under different strategies, and that a subject applies the same strategy on all items, it is
possible to estimate the parameters of IRT models for each strategy. It is further
possible to calculate the probabilities that a given subject has employed each of the
alternative strategies, and estimate the subject's proficiency under each possible strategy.
Assuming that a subject uses the same strategy on all items may not suffice for all
applications, since switching strategies from one item to another is sometimes an option
(Kyllonen, Lohman, & Snow, 1984). In a technical sense, our approach can allow for
strategy-switching by incorporating additional strategy classes that are combinations of
different strategies for different items. Based on Just and Carpenter's (1985) finding that
subjects sometimes apply whichever strategy is easier for a given problem, we might
define three strategy classes for items like those in Example 1:
1. Always apply the rotational strategy;
2. Always apply the analytic strategy;
3. Apply whichever strategy is better suited to an item.
If items were constructed to run from easy to hard under the rotational strategy and hard to easy under the analytic, subjects using the third "mixed" strategy would find
them easy, then harder, then easier again.
There are limitations to how far these ideas can be pressed in applications with
binary data. A simulation study reported in Mislevy and Verhelst (1987) showed that a
Rasch model fit a four-item test acceptably well with a sample of 1200 subjects when the
true model was the Rasch/guessing mixture. In one way or another, more information
would be needed to attain a sharper distinction between strategy classes and, correspondingly, more power to differentiate among competing models for the data. One
source of information is more binary items. Fifty items rather than four, including some
that are very hard under the Rasch strategy, would do. A different source of information available in other settings would be to draw from richer observational possibilities,
such as levels of correctness, response latencies, eye-fixation patterns, or choices of
response alternatives that are differentially attractive under different strategies.
Differentiating the likelihood of response patterns under different strategies is the
key to successful applications of the approach. The items in the test must be constructed to maximize strategy differences, as by including items that are hard under one
strategy but easy under another. Most tests in current use with standard test theory are
not constructed with this purpose in mind; indeed, they are constructed so as to minimize differentiation among strategies, since such differentiation lowers the reliability of overall-propensity scores. When strategy class decisions are of interest, useful information is less
likely to be obtained with an existing conventional test than with a newly constructed
test that highlights differential patterns related to strategy choice.
In addition to the applications used in the preceding examples, a number of other
current topics in educational and psychological research are amenable to expression in
terms of mixtures of IRT models. We conclude by mentioning three.
Hierarchical development. Wilson's (1984, 1989) "saltus" model (Latin for
"leap") extends the Rasch model to developmental patterns in which capabilities increase in discrete stages, by including stage parameters as well as abilities for persons,
and stage parameters as well as difficulties for items. Examples would include Piaget's
[Figure 4: three parallel item difficulty scales (Items 1-6, running from easy to hard) for the highest, middle, and lowest stages.]
FIGURE 4.
Saltus Example: Three Stages, Common Offset.
(1960) innate developmental stages and Gagné's (1962) learned acquisition of rules.
Suppose that K stages are ordered in terms of increasing and cumulative competence.
In our notation, φ would indicate the stage membership of a subject. In the highest stage, item responses follow a Rasch model with parameters β_j. Rasch models fit lower stages as well, but the item parameters are offset by amounts that depend on the stage at which the item can first be solved. Our basic parameters α would correspond to the item parameters for the highest stage and the offset parameters for particular item types at
particular lower stages. Figure 4 gives a simple illustration in which items associated
with higher stages have an additional increment of difficulty for subjects at lower stages.
In applications such as Siegler's (1981) balance beam tasks, subjects at selected lower
stages tend to answer certain types of higher-stage items correctly, albeit through
flawed reasoning. In these cases, the offset serves to give easier item difficulty parameters to those items in those stages.
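The stage-offset idea can be sketched as a class-specific Rasch model; this illustration (ours, with a hypothetical offset τ) adds a difficulty increment for subjects below an item's first-solvable stage, though as noted above, in applications like the balance beam the fitted offset can instead make such items easier:

```python
import math

def saltus_p(theta, b_j, item_stage, subject_stage, tau):
    # Rasch probability with a stage offset: subjects below the stage at which
    # the item can first be solved face difficulty b_j + tau (tau is hypothetical)
    offset = tau if subject_stage < item_stage else 0.0
    return 1.0 / (1.0 + math.exp(-(theta - (b_j + offset))))

# A stage-2 item: harder for a stage-1 subject, plain Rasch for a stage-2 subject
p_low = saltus_p(0.0, 0.0, item_stage=2, subject_stage=1, tau=2.0)
p_high = saltus_p(0.0, 0.0, item_stage=2, subject_stage=2, tau=2.0)
```

For subjects at or above the item's stage the offset vanishes and the model reduces to the ordinary Rasch form.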
Mental models for problem solving. In introducing their experimental study of
mental models for electricity, Gentner and Gentner (1983) state
Analogical comparisons with simple or familiar systems often occur in people's
descriptions of complex systems, sometimes as explicit analogical models, and
sometimes as implicit analogies, in which the person seems to borrow structure
from the base domain without knowing it. Phrases like "current being routed along
a conductor" and "stopping the flow" of electricity are examples (p. 99).
Mental models are important as a pedagogical device and as a guide to problem solving. Inferring which models a person is using, based on a knowledge of how conceivable analogues help or hinder the solution of certain types of problems, provides a
guide to subsequent training. In Gentner and Gentner's experiment, the problems concerned simple electrical circuits with series and parallel combinations of resistors and
batteries. Popular analogies for electricity are flowing waters (Strategy 1) and "teeming
crowds" of people entering a stadium through a few narrow turnstiles (Strategy 2). The
water flow analogy facilitates battery problems, but does not help With resistor problems; indeed, it suggests an incorrect solution for the current in circuits with parallel
resistors. The teeming crowd analogy facilitates problems on the combination of resistors, but is not informative about combinations of batteries. If a Rasch model holds for
items within strategies, Gentner and Gentner's hypotheses correspond to constraints
on the order of item difficulties with the two strategies. If each item type were replicated
a sufficient number of times, it would be possible to make strong inferences about which
model a particular subject was using, to plan subsequent instruction.
Changes in intelligence over age. An important topic in the field of human development is whether, and how, intelligence changes as people age (Birren, Cunningham,
& Yamamoto, 1983). Macrae (no date) identifies a weakness of most studies that employ psychometric tests to measure aging effects: total scores fail to reflect important
differences in the strategies different subjects bring to bear on the items they are
presented. Total score differences among age and educational-background groups on
Raven's matrices test were not significant in the study she reports. But analyses of
subjects' introspective reports on how they solved items revealed that those with
academically oriented background were much more likely to have used the preferred
"algorithmic" strategy over a "holistic" strategy than those with vocationally oriented
backgrounds. Since the use of algorithmic strategies was found to increase probabilities
of success differentially on distinct item types, this study would be amenable to IRT
mixture modeling. Inferences could then be drawn about problem-solving approaches
without resorting to more expensive and possibly unreliable introspective evidence.
The key here, as in any other application in which a mixture of strategies is to be
resolved, is to develop tasks that offer different amounts of resistance to subjects using
different strategies.
References
Birren, J. E., Cunningham, W. R., & Yamamoto, K. (1983). Psychology of adult development and aging.
Annual Review of Psychology, 34, 543-575.
Carter, P., Pazak, B., & Kail, R. (1983). Algorithms for processing spatial information. Journal of Experimental Child Psychology, 36, 284-304.
Cooper, L. A., & Shepard, R. N. (1973). Chronometric studies of the rotation of mental images. In W. G.
Chase (Ed.), Visual information processing (pp. 76-176). Orlando, FL: Academic Press.
de Leeuw, J., & Verhelst, N. (1986). Maximum likelihood estimation in generalized Rasch models. Journal
of Educational Statistics, 11, 183-196.
Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM
algorithm. Journal of the Royal Statistical Society, Series B, 39, 1-38.
Embretson, S. E. (1985). Multicomponent latent trait models for test design. In S. E. Embretson (Ed.), Test
design: Developments in psychology and psychometrics (pp. 195-218). Orlando, FL: Academic Press.
Fischer, G. H. (1973). The linear logistic test model as an instrument in educational research. Acta Psychologica, 36, 359-374.
ROBERT J. MISLEVY AND NORMAN VERHELST
215
French, J. W. (1965). The relationship of problem-solving styles to the factor composition of tests. Educational and Psychological Measurement, 25, 9-28.
Gagné, R. M. (1962). The acquisition of knowledge. Psychological Review, 69, 355-365.
Gentner, D., & Gentner, D. R. (1983). Flowing waters or teeming crowds: Mental models of electricity. In
D. Gentner & A. L. Stevens (Eds.), Mental models (pp. 99-129). Hillsdale, NJ: Erlbaum.
Hulin, C. L., Drasgow, F., & Parsons, C. K. (1983). Item response theory: Application to psychological
measurement. Homewood, IL: Dow Jones-Irwin.
Just, M. A., & Carpenter, P. A. (1985). Cognitive coordinate systems: Accounts of mental rotation and
individual differences in spatial ability. Psychological Review, 92, 137-172.
Kyllonen, P. C., Lohman, D. F., & Snow, R. E. (1984). Effects of aptitudes, strategy training, and task facets
on spatial task performance. Journal of Educational Psychology, 76, 130-145.
Lohman, D. F. (1979). Spatial ability: A review and reanalysis of the correlational literature (Technical
Report No. 8). Stanford, CA: Stanford University, Department of Education, Aptitude Research Project.
Louis, T. A. (1982). Finding the observed information matrix when using the EM algorithm. Journal of the
Royal Statistical Society, Series B, 44, 226-233.
Macrae, K. S. (n.d.). Strategies underlying psychometric test responses in young and middle-aged adults of
varying educational background. Unpublished manuscript, La Trobe University, Australia.
Messick, S. (1984). The psychology of educational measurement. Journal of Educational Measurement, 21,
215-237.
Mislevy, R. J. (1986). Bayes modal estimation in item response models. Psychometrika, 51, 177-195.
Mislevy, R. J., & Bock, R. D. (1983). BILOG: Item analysis and test scoring with binary logistic models.
Mooresville, IN: Scientific Software.
Mislevy, R. J., & Verhelst, N. (1987). Modeling item responses when different subjects employ different
solution strategies (Research Report RR-87-47-ONR). Princeton, NJ: Educational Testing Service.
Paulson, J. (1985). Latent class representation of systematic patterns in test responses (ONR Technical
Report). Portland, OR: Portland State University.
Pellegrino, J. W., Mumaw, R. J., & Shute, V. J. (1985). Analysis of spatial aptitude and expertise. In S. E.
Embretson (Ed.), Test design: Developments in psychology and psychometrics (pp. 45-76). Orlando,
FL: Academic Press.
Piaget, J. (1960). The general problems of the psychological development of the child. In J. M. Tanner & B.
Inhelder (Eds.), Discussions on child development: Vol. 4 (pp. 3-27). The fourth meeting of the World
Health Organization Study Group on the Psychobiological Development of the Child, Geneva, 1956.
Ramsay, J. O. (1975). Solving implicit equations in psychometric data analysis. Psychometrika, 40, 361-372.
Samejima, F. (1983). A latent trait model for differential strategies in cognitive processes (Technical Report
ONR/RR-83-1). Knoxville, TN: University of Tennessee.
Scheiblechner, H. (1972). Das Lernen und Lösen komplexer Denkaufgaben [The studying and solving of complex conceptual problems]. Zeitschrift für Experimentelle und Angewandte Psychologie, 19, 476-506.
Siegler, R. S. (1981). Developmental sequences within and between concepts. Monograph of the Society for
Research in Child Development, 46 (2, Serial No. 189).
Tapley, S. M., & Bryden, M. P. (1977). An investigation of sex differences in spatial ability: Mental rotation
of three-dimensional objects. Canadian Journal of Psychology, 31, 122-130.
Tatsuoka, K. K. (1983). Rule space: An approach for dealing with misconceptions based on item response
theory. Journal of Educational Measurement, 20, 345-354.
Titterington, D. M., Smith, A. F. M., & Makov, U. E. (1985). Statistical analysis of finite mixture distributions. Chichester: Wiley & Sons.
Tsutakawa, R. K., & Soltys, M. J. (1988). Approximation for Bayesian ability estimation. Journal of Educational Statistics, 13, 117-130.
Wilson, M. R. (1984). A psychometric model of hierarchical development. Unpublished doctoral dissertation,
University of Chicago.
Wilson, M. R. (1989). Saltus: A psychometric model of discontinuity in cognitive development. Psychological Bulletin, 105, 276-289.
Yamamoto, K. (1987). A hybrid model for item responses. Unpublished dissertation, University of Illinois.
Manuscript received 10/26/87
Final version received 5/15/89