LSAC RESEARCH REPORT SERIES
Item Response Theory Parameter Estimation with
Response Times as Collateral Information
Wim J. van der Linden
Rinke H. Klein Entink
Jean-Paul Fox
University of Twente, Enschede, The Netherlands
Law School Admission Council
Research Report 06-04
October 2006
A publication of the Law School Admission Council
The Law School Admission Council (LSAC) is a nonprofit corporation whose members are more than 200 law schools in the
United States, Canada, and Australia. Headquartered in Newtown, PA, USA, the Council was founded in 1947 to facilitate
the law school admission process. The Council has grown to provide numerous products and services to law schools and to
more than 85,000 law school applicants each year.
All law schools approved by the American Bar Association (ABA) are LSAC members. Canadian law schools recognized by
a provincial or territorial law society or government agency are also members. Accredited law schools outside of the United
States and Canada are eligible for membership at the discretion of the LSAC Board of Trustees.
© 2009 by Law School Admission Council, Inc.
All rights reserved. No part of this work, including information, data, or other portions of the work published in electronic
form, may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying and
recording, or by any information storage and retrieval system, without permission of the publisher. For information, write:
Communications, Law School Admission Council, 662 Penn Street, Box 40, Newtown, PA 18940-0040.
This study is published and distributed by LSAC. The opinions and conclusions contained in this report are those of the
author(s) and do not necessarily reflect the position or policy of LSAC.
Table of Contents
Executive Summary
Abstract
Introduction
Hierarchical Model
  IRT and RT Models
  Population and Domain Models
  Identifiability
  Bayesian Estimation
Different Sources of Information
Empirical Example
Discussion
References
Executive Summary
The testing industry is always keen to give test takers more precise scores and to reduce the cost of item calibration.
This research investigates a method for achieving this goal by making use of response time (RT) information in the item
calibration process. Item calibration is the process whereby statistics used to describe the properties of the test questions
(items) are calculated. Such calculations can become costly in that large samples of test takers are generally required to
assure that the statistics are calculated with adequate precision. Reducing the number of test takers required for
calibration amounts to reducing calibration cost. One of the ways of doing so is to make use of test-taker RTs in addition
to their item responses (i.e., correct versus incorrect). In computer-based testing, RTs are automatically recorded, and
this extra information is thus free.
A statistically proper way of making use of RTs is through a hierarchical model that involves both traditional item
response theory (IRT) and RT parameters. (Note that IRT is a mathematical model that is used to analyze test data.)
When estimating test takers’ abilities or calibrating the test items, it then becomes possible to “borrow” from the
information in the RTs.
The model used in this project was the hierarchical framework for speed and accuracy developed in an earlier
project for the Law School Admission Council. It is shown that when estimating the IRT parameters, the information in
the RTs allows us to infer an empirical prior distribution for each IRT parameter from the RTs. Unlike the typical
common prior for all person or item parameters in traditional IRT estimation, this prior is individual and better reflects
the true parameter values as well as our prior uncertainty about them. A simulation study showed that when using RTs as
collateral information in this way, under realistic conditions, a reduction in the estimation error for the ability parameters
of about 25% is possible.
Abstract
The notion of collateral information about a parameter of interest from observations collected for other parameters is
analyzed and applied to the problem of estimating the parameters in a hierarchical item response theory (IRT) model.
The analysis is then extended to the case in which the response times on the items can be used as collateral information.
The improvement of parameter estimation afforded by this extra information is shown in an empirical example.
Introduction
One of the main advantages of hierarchical modeling is the possibility of borrowing information on a parameter of
interest from data collected for other parameters. This borrowing is realized through the presence of a common
distribution of the parameters for the units of analysis in the statistical model for the data. The posterior distribution of
the parameter then typically compromises between the distribution and the likelihood associated with the data. When a
point estimate of the parameter is required, and the distribution is a member of a parametric family, the estimate
compromises between a statistic and some of the hyperparameters that characterize the common distribution. In doing so,
it tends to strike a profitable balance between ignoring the data on the other parameters (separate estimates) and the more
reckless assumption that all parameters are identical (pooled estimates). The profit typically takes the form of a
favorable tradeoff: a gain in the efficiency of the inference that outweighs the accompanying increase in bias. It is
reflected in a lower mean-squared error (MSE) of the estimates.
One of the first examples in test theory demonstrating this principle is the classical true-score estimate based on
Kelley’s regression function,
$$E(T \mid X = x) = \rho_{XX'}\, x + (1 - \rho_{XX'})\, \mu_T, \tag{1}$$
where $X$ is the observed score of the test taker, $T$ the true score, $\mu_T$ the mean true score in the population of
test takers, and $\rho_{XX'}$ the reliability of the test (Lord & Novick, 1968, sect. 3.7). An estimate of the test taker's
true score, $\hat{\tau}$, is obtained by substituting estimates of $\mu_T$ and $\rho_{XX'}$ derived from the marginal
distribution of the observed scores into (1). The estimate compromises between $X = x$ as a direct estimate of $\tau$ and
the estimate of the population parameter $\mu_T$. In the representation in (1), the weights are $\rho_{XX'}$ and
$1 - \rho_{XX'}$. But, using a well-known variance partition in classical test theory, the estimate can also be shown to
be equivalent to the precision-weighted average of $x$ and $\hat{\mu}_T$ (Novick & Jackson, 1974, Eq. 9.5.11).
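The equivalence between the reliability-weighted form in (1) and the precision-weighted average can be checked numerically. The sketch below assumes the classical partition of observed-score variance into true-score and error variance; the function names and all numerical values are hypothetical and chosen only for illustration.

```python
def kelley_estimate(x, reliability, mean_true):
    """Kelley's regression estimate of the true score, as in (1)."""
    return reliability * x + (1.0 - reliability) * mean_true


def precision_weighted(x, mean_true, var_true, var_error):
    """Average of x and the population mean, each weighted by its precision."""
    w_x = 1.0 / var_error   # precision of the observed score as an estimate of tau
    w_mu = 1.0 / var_true   # precision of the population mean as an estimate of tau
    return (w_x * x + w_mu * mean_true) / (w_x + w_mu)


# Hypothetical values, not taken from the report.
var_true, var_error = 80.0, 20.0
reliability = var_true / (var_true + var_error)   # true-score share of observed variance
x, mean_true = 70.0, 50.0

print(kelley_estimate(x, reliability, mean_true))
print(precision_weighted(x, mean_true, var_true, var_error))
```

Both calls should agree up to rounding error, because the reliability is the share of the true-score variance in the observed-score variance.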
As discussed extensively in Novick and Jackson (1974, sect. 9.5), the Kelley estimate illustrates the more formal
problem of estimating many means simultaneously. Later examples of the same principle are the estimation of multiple
regressions in m groups from normal data in Novick, Jackson, Thayer, and Cole (1972) and the estimation of
proportions in m groups from binomial data in Novick, Lewis, and Jackson (1973). An instructive empirical application
of the estimation of multiple regressions can be found in the often-cited study by Rubin (1981) of the effects of coaching
schools on the scores on the Scholastic Aptitude Test (SAT).
From a Bayesian perspective, assuming exchangeability, it seems obvious to use a population distribution of the
parameter as a prior distribution for an individual observation. In fact, the difference between empirical two-level
hierarchical modeling and this empirical Bayes approach in statistical inference (Carlin & Louis, 2000, chap. 3) is only a
matter of motivation and interpretation; the formal structures of both approaches are entirely identical. For example, in
the Bayesian tradition, the empirical prior is motivated by the assumption of exchangeability of the units of analysis in
the sample rather than actual sampling from a predefined population. However, from the Bayesian perspective it would
be more natural to repeat the idea of prior information and benefit from the introduction of prior distributions for the
hyperparameters.
To emphasize that in the two-level hierarchical modeling and empirical Bayes tradition the data for the other
parameters from which the information is borrowed can be collected simultaneously, Novick and Jackson (1974, sect.
9.5) introduced the term collateral information. This term avoids the more temporal connotation in the Bayesian use of
the term prior information, which seems to suggest that the information should always be present before any data on the
parameter of interest is collected. In this paper, we combine the modeling of an empirical two-level structure of data for a
population of test takers and a domain of items, but the entire approach is Bayesian in that a third level with prior
distributions for the hyperparameters is adopted. To emphasize the empirical nature of the first two levels, we also refer
to the extra information as collateral. Our use of the term information differs from the way the term is used elsewhere in
scientific endeavors, where it is usually taken to mean that a variable can be predicted from other variables. This other
use requires knowledge of a joint distribution of the variables. But collateral information in the hierarchical sense is
already available if the units of analysis can be assumed to be sampled from a marginal distribution. If the assumption
holds, then as soon as we collect data on any of the parameters we also get information on all other parameters (e.g., on
their range of possible values).
The following is an illustration of the use of collateral information for a more modern example from test theory.
Suppose the interest is in estimating the ability parameter $\theta$ in a response model for which the item parameters
are already known. Also assume that $\theta$ is from a population with a normal distribution
$N(\mu_\theta, \sigma_\theta^2)$, where the mean and variance have already been estimated. Estimates of $\theta_j$ for a
single test taker $j$ that capitalize on this information are based on the distribution
$$f(\theta_j \mid u_j, \hat{\mu}_\theta, \hat{\sigma}_\theta^2) \propto f(\theta_j; u_j)\, f(\theta_j \mid \hat{\mu}_\theta, \hat{\sigma}_\theta^2), \tag{2}$$
where $u_j = (u_{1j}, \ldots, u_{nj})$ are the responses by $j$ on the $n$ items in the test, $f(\theta_j; u_j)$ is the
likelihood associated with the responses, and $\hat{\mu}_\theta$ and $\hat{\sigma}_\theta^2$ are the available estimates
of the population parameters. For example, the mean of this distribution is generally known to have a smaller MSE than an
estimate based on the likelihood $f(\theta_j; u_j)$ only. The decrease is due to the information in
$f(\theta_j \mid \hat{\mu}_\theta, \hat{\sigma}_\theta^2)$, which tells us where the parameters are concentrated and how
much they are dispersed. The same principle can be shown to hold for the estimation of the item parameters, but in this
case the distribution of the parameters over the domain of possible items has to be used. The price generally paid for
the smaller estimation error is a bias of the ability and item-parameter estimates toward the means of the
population and domain distributions.
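The pull toward the population mean can be made concrete with a small grid computation. This is a minimal sketch, assuming a normal-ogive response function (the type of model used later in the report) with known item parameters; the item parameters, the response pattern, and the function names are hypothetical.

```python
import math


def phi(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def likelihood(theta, responses, a, b):
    """f(theta; u) for known item parameters under a normal-ogive response function."""
    value = 1.0
    for u, a_i, b_i in zip(responses, a, b):
        p = phi(a_i * (theta - b_i))
        value *= p if u == 1 else 1.0 - p
    return value


def normal_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def grid_mean(responses, a, b, prior=None, lo=-6.0, hi=6.0, steps=1201):
    """Mean of the normalized product in (2) on a grid; prior=None uses the likelihood only."""
    num = den = 0.0
    for k in range(steps):
        theta = lo + (hi - lo) * k / (steps - 1)
        weight = likelihood(theta, responses, a, b)
        if prior is not None:
            weight *= normal_pdf(theta, prior[0], prior[1])
        num += theta * weight
        den += weight
    return num / den


# Hypothetical five-item test and response pattern.
a = [1.0, 1.0, 1.0, 1.0, 1.0]
b = [-1.0, -0.5, 0.0, 0.5, 1.0]
responses = [1, 1, 1, 1, 0]

likelihood_only = grid_mean(responses, a, b)
with_population = grid_mean(responses, a, b, prior=(0.0, 1.0))
print(likelihood_only, with_population)
```

For this pattern the estimate that uses the population distribution lies between the population mean of 0 and the likelihood-only value, which is the bias toward the mean described above.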
In (2), we assumed that the population parameters $\mu_\theta$ and $\sigma_\theta^2$ were already estimated. This was only because we
wanted to emphasize that the estimate of θ is actually based on empirical information about the population distribution.
Also, we assumed that the item parameters were already known. Both assumptions are not necessary, though. The same
borrowing of information is obtained if we fit a hierarchical model with parameters for the items, persons, population,
and domain simultaneously. In fact, the information then becomes truly collateral. (In this simultaneous case, typically
some of the parameters in the model are not identifiable and the notion of collateral information becomes more subtle.
For example, a common choice in IRT is to use $\mu_\theta = 0$ and $\sigma_\theta^2 = 1$ as part of the set of identifiability constraints. In
(2), we would then have to replace the population density by $f(\theta_j \mid 0, 1)$. But, even though we no longer have to estimate
any of its parameters, the population distribution is still empirical, and its presence allows us to derive information on
one parameter from information on others!)
The question addressed in this research is how to extend the use of collateral information in parameter estimation in
item response theory. Further improvement of parameter estimation is always a concern within the testing industry. It enables
testing organizations to give test takers better scores and reduce the costs of item calibration. More specifically, in this
research we explored the use of the response times (RTs) on the items as an additional source of information on the
person or item parameters of interest. The extra information in the RTs on the IRT parameters is collateral within
persons: As a computer records the test taker’s responses, it automatically records the time it takes to produce the
response. Now that computer-based testing has become a more dominant mode of testing, it would be imprudent to
ignore RTs as a free but potentially powerful source of information on the IRT parameters.
Hierarchical Model
To profit fully from the information on the IRT parameters in the RTs, we adopt a model for the RTs as well as
specify common distributions for all person and item parameters. The result is a hierarchical framework with the IRT
and RT models as first-level components and separate population and domain models for the IRT and RT parameters as
second-level components.
IRT and RT Models
In principle, the first-level model for the responses of test takers j = 1, …, N on the items i = 1, …, n could be any
regular IRT model. In the empirical example below, we use the two-parameter normal-ogive (2PNO) model, which gives
the probability of a correct response on item i by person j as
$$P(U_{ij} = 1; \theta_j, a_i, b_i) = \Phi(a_i(\theta_j - b_i)), \tag{3}$$
where Φ(⋅) denotes the normal distribution function and ai and bi are the discrimination and difficulty parameters of
item i, respectively. RT distributions are often approximated well by lognormal distributions (for a review of
alternatives, see Schnipke & Scrams, 1997). Therefore, analogous to the IRT model in (3), the RTs are modeled with a
speed parameter τ j for test taker j and time intensity and discrimination parameters, βi and αi , respectively, for item
i. Let Tij denote the RT of test taker j on item i. The model posits
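$$f(t_{ij}; \tau_j, \alpha_i, \beta_i) = \frac{\alpha_i}{t_{ij}\sqrt{2\pi}} \exp\left\{ -\frac{1}{2} \left[ \alpha_i \left( \ln t_{ij} - (\beta_i - \tau_j) \right) \right]^2 \right\} \tag{4}$$
(van der Linden, 2006).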
Notice that, except for the difference in sign, which is due to the negative relationship between time and speed, the
two parameter structures are parallel. This parallelness supports the interpretation of the joint distributions of the
parameters modeled at the second level.
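The two first-level models can be written down directly. The sketch below is an illustration only: the function names and all parameter values are hypothetical, and the density check uses a simple Riemann sum on the log-time scale.

```python
import math


def p_correct(theta, a, b):
    """2PNO probability of a correct response, as in (3)."""
    return 0.5 * (1.0 + math.erf(a * (theta - b) / math.sqrt(2.0)))


def rt_density(t, tau, alpha, beta):
    """Lognormal RT density, as in (4)."""
    z = alpha * (math.log(t) - (beta - tau))
    return alpha / (t * math.sqrt(2.0 * math.pi)) * math.exp(-0.5 * z * z)


def total_mass(tau, alpha, beta, lo=-10.0, hi=10.0, step=0.01):
    """Integrate the density over t by substituting t = exp(x)."""
    total = 0.0
    x = lo
    while x <= hi:
        t = math.exp(x)
        total += rt_density(t, tau, alpha, beta) * t * step
        x += step
    return total


# Hypothetical person and item parameters.
print(p_correct(theta=0.3, a=1.2, b=0.3))        # ability equal to difficulty
print(total_mass(tau=0.2, alpha=1.5, beta=0.5))  # should be close to 1
```

Under (4), ln t has mean β − τ, so a higher speed τ shifts the whole RT distribution toward shorter times, while a higher ability θ raises the success probability in (3); this is the sign difference noted above.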
Population and Domain Models
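The population model specifies the joint distribution of the person parameters $\theta$ and $\tau$. We assume that the
distribution is bivariate normal,
$$(\theta, \tau) \sim \mathrm{MVN}(\mu_P, \Sigma_P), \tag{5}$$
with mean vector
$$\mu_P = (\mu_\theta, \mu_\tau) \tag{6}$$
and covariance matrix
$$\Sigma_P = \begin{pmatrix} \sigma_\theta^2 & \sigma_{\theta\tau} \\ \sigma_{\theta\tau} & \sigma_\tau^2 \end{pmatrix}. \tag{7}$$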
Likewise, the item parameters $a_i$, $b_i$, $\alpha_i$, and $\beta_i$ that represent the effects of the items in the
domain on the responses and RTs are assumed to have a multivariate normal distribution,
$$(a, b, \alpha, \beta) \sim \mathrm{MVN}(\mu_I, \Sigma_I), \tag{8}$$
with mean vector
$$\mu_I = (\mu_a, \mu_b, \mu_\alpha, \mu_\beta), \tag{9}$$
and a covariance matrix $\Sigma_I$ that has all variances and covariances of the item parameters as elements.
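To see what the population model implies, person parameters can be drawn from it. The sketch below is an illustration under assumed values (zero means, unit variances, and a covariance of .5, the values used in the simulation study later in the report); the function name, seed, and sample size are hypothetical.

```python
import math
import random


def draw_person(rng, mu_theta, mu_tau, var_theta, var_tau, cov):
    """Draw (theta, tau) from a bivariate normal population model."""
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    theta = mu_theta + math.sqrt(var_theta) * z1
    # tau given theta: linear regression on theta plus independent residual noise.
    slope = cov / var_theta
    resid_sd = math.sqrt(var_tau - cov * cov / var_theta)
    tau = mu_tau + slope * (theta - mu_theta) + resid_sd * z2
    return theta, tau


def correlation(pairs):
    n = len(pairs)
    mx = sum(p[0] for p in pairs) / n
    my = sum(p[1] for p in pairs) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in pairs)
    sxx = sum((p[0] - mx) ** 2 for p in pairs)
    syy = sum((p[1] - my) ** 2 for p in pairs)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(1)   # arbitrary seed, for reproducibility only
sample = [draw_person(rng, 0.0, 0.0, 1.0, 1.0, 0.5) for _ in range(20000)]
print(correlation(sample))   # should be near the population value of .5
```

The nonzero covariance is what later allows the RTs, which carry information on τ, to say something about θ.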
Identifiability
This hierarchical model is not yet fully identifiable. But identifiability is obtained if we set $\mu_P = 0$ and
$\sigma_\theta^2 = 1$. If the dataset identifies some of the parameters only weakly, the additional restriction
$\sigma_\tau^2 = 1$ can be used to obtain more stable estimates.
Bayesian Estimation
We estimated the model parameters using Bayesian estimation with data augmentation and a Gibbs sampler. For the
data augmentation, see Albert (1992) or Johnson and Albert (1999); for a discussion of Gibbs sampling, see Gelman,
Carlin, Stern, and Rubin (2004, chap. 11) or Gelfand and Smith (1990). The method is based on normal inverse-Wishart
priors for the mean vectors and covariance matrices for the multivariate models in (5) and (8), which have the convenient
property of conjugacy (Gelman et al., 2004, sect. 3.6). For a derivation of all full conditional posterior distributions in the
Gibbs sampler, see van der Linden (2007).
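To give a feel for the data-augmentation idea for normal-ogive models, the sketch below shows one step only: drawing a latent normal variable for a single response, truncated to be positive for a correct response and nonpositive otherwise. It is not the sampler used in this report; the function name and parameter values are hypothetical, and the inverse-CDF draw is adequate only for moderate values of a(θ − b).

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def draw_latent(rng, u, theta, a, b):
    """Draw z ~ N(a(theta - b), 1), truncated according to the observed response u."""
    eta = a * (theta - b)
    p_zero = STD_NORMAL.cdf(-eta)          # P(z <= 0) at the current parameter values
    if u == 1:
        v = rng.uniform(p_zero, 1.0)       # correct response: z must be positive
    else:
        v = rng.uniform(0.0, p_zero)       # incorrect response: z must be nonpositive
    v = min(max(v, 1e-12), 1.0 - 1e-12)    # keep inv_cdf away from 0 and 1
    return eta + STD_NORMAL.inv_cdf(v)


rng = random.Random(0)   # arbitrary seed
z_correct = [draw_latent(rng, 1, 0.2, 1.0, 0.0) for _ in range(1000)]
z_incorrect = [draw_latent(rng, 0, 0.2, 1.0, 0.0) for _ in range(1000)]
print(min(z_correct), max(z_incorrect))
```

Given the latent variables, the response model becomes a normal linear model in the parameters, which is what makes conjugate normal updates available in a Gibbs sampler.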
A version of the hierarchical framework with a structural model for the population (i.e., a group structure and
covariates for both types of person parameters) and the same type of Bayesian parameter estimation is given in Klein
Entink, Fox, and van der Linden (2009).
Different Sources of Information
We demonstrate the same principle as in (2) but this time for a test taker j with response vector u j = (u1 j , …, unj )
and a vector of RTs t j = (t1 j, …, tnj ). To maintain the analogy, we assume that the population mean μP and covariance
matrix Σ P have already been estimated. The distribution of θ j given all known quantities can now be written as
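$$\begin{aligned}
f(\theta_j \mid u_j, t_j, \hat{\mu}_P, \hat{\Sigma}_P) &= \int f(\theta_j, \tau_j \mid u_j, t_j, \hat{\mu}_P, \hat{\Sigma}_P)\, d\tau_j \\
&\propto \int f(u_j, t_j; \theta_j, \tau_j)\, f(\theta_j, \tau_j \mid \hat{\mu}_P, \hat{\Sigma}_P)\, d\tau_j \\
&= \int f(u_j; \theta_j)\, f(t_j; \tau_j)\, f(\theta_j, \tau_j \mid \hat{\mu}_P, \hat{\Sigma}_P)\, d\tau_j,
\end{aligned} \tag{10}$$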
where the last step follows from conditional independence. Factorizing f (θ j , τ j | μˆ P , Σˆ P ) and using the fact that
f (t j | τ j ) f (τ j | μˆ P , Σˆ P ) is proportional to the posterior density of τ j given t j , we obtain
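$$\begin{aligned}
f(\theta_j \mid u_j, t_j, \hat{\mu}_P, \hat{\Sigma}_P) &\propto \int f(u_j; \theta_j)\, f(\theta_j \mid \tau_j, \hat{\mu}_P, \hat{\Sigma}_P)\, f(t_j; \tau_j)\, f(\tau_j \mid \hat{\mu}_P, \hat{\Sigma}_P)\, d\tau_j \\
&\propto f(u_j; \theta_j) \int f(\theta_j \mid \tau_j, \hat{\mu}_P, \hat{\Sigma}_P)\, f(\tau_j \mid t_j)\, d\tau_j.
\end{aligned} \tag{11}$$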
The integral is proportional to the posterior predictive density of $\theta_j$ given $t_j$. Hence,
$$f(\theta_j \mid u_j, t_j, \hat{\mu}_P, \hat{\Sigma}_P) \propto f(u_j; \theta_j)\, f(\theta_j \mid t_j, \hat{\mu}_P, \hat{\Sigma}_P). \tag{12}$$
This result has a simple form and is entirely analogous to (2). It shows that, when the RTs are used, θ j is estimated
from the same likelihood f (u j ; θ j ) associated with the response vector u j but that the population distribution of θ in
(2) is replaced by the posterior predictive distribution of θ j given t j .
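Under additional simplifying assumptions, the second factor of (12) has a closed form. The sketch below assumes that the item parameters of the RT model are known and that the population parameters are fixed at given values. Because ln t_ij is normal with mean β_i − τ_j and variance 1/α_i² under (4), the posterior of τ_j given t_j is then normal, and so is the resulting distribution of θ_j given t_j. The function names and all numerical values are hypothetical.

```python
def speed_posterior(log_times, alpha, beta, mu_tau, var_tau):
    """Normal posterior (mean, variance) of tau_j given t_j, item parameters known."""
    precision = 1.0 / var_tau + sum(a * a for a in alpha)
    weighted = mu_tau / var_tau + sum(
        a * a * (b - lt) for a, b, lt in zip(alpha, beta, log_times)
    )
    return weighted / precision, 1.0 / precision


def individual_prior(log_times, alpha, beta, mu_theta, mu_tau, var_theta, var_tau, cov):
    """Mean and variance of theta_j given t_j under the bivariate normal population model."""
    m, v = speed_posterior(log_times, alpha, beta, mu_tau, var_tau)
    slope = cov / var_tau                                        # regression of theta on tau
    mean = mu_theta + slope * (m - mu_tau)
    var = var_theta - cov * cov / var_tau + slope * slope * v   # residual + propagated part
    return mean, var


# Hypothetical test taker who is faster than average on ten items.
alpha = [1.5] * 10
beta = [0.0] * 10
log_times = [-0.5] * 10

print(individual_prior(log_times, alpha, beta, 0.0, 0.0, 1.0, 1.0, 0.5))
print(individual_prior(log_times, alpha, beta, 0.0, 0.0, 1.0, 1.0, 0.0))
```

With a zero covariance the function returns the population mean and variance unchanged; with a positive covariance the mean moves in the direction indicated by the RTs and the variance drops below the population variance.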
In fact, the result shows the role of three different types of information on θ j :
1. The information directly available in $u_j$ in the first factor, $f(u_j; \theta_j)$, of (12); that is, the regular
   likelihood associated with the response vector.
2. The information summarized in the estimates $\hat{\mu}_P$ and $\hat{\Sigma}_P$ in the second factor. This information
   is derived from the vectors of responses and RTs of the entire sample of test takers. These estimates generalize the
   role of $\hat{\mu}_\theta$ and $\hat{\sigma}_\theta^2$ in (2).
3. The information due to the conditioning on the vector of RTs $t_j$. Observe that this information turns the
   common prior distribution for all test takers in (2) into an individual distribution for $j$. The role of the RTs
   is therefore expected to be important. They not only lead to a further decrease of the posterior uncertainty about
   $\theta_j$ but can also be expected to locate the distribution closer to the true ability level.
Analogous effects of the RTs can be shown to hold for the estimation of the item parameters. To demonstrate their
presence, the same argument can be followed but with $\hat{\mu}_P$, $\hat{\Sigma}_P$, $\theta_j$, and $t_j$ replaced by
the estimates $\hat{\mu}_I$ and $\hat{\Sigma}_I$, item-parameter vector $\xi_i = (a_i, b_i)$, and RT vector $t_i$,
respectively. Again, it is not necessary to estimate some of the
parameters prior to θ j and ξ i . The extra sources of information are retained if the full hierarchical model is fitted to all
response data and RTs simultaneously. In fact, some of the hyperparameters vanish because of the identifiability
constraints, and the others are just integrated out of the posterior distributions of θ j and ξ i in the Bayesian approach
outlined above.
Empirical Example
We conducted a simulation study to illustrate empirically the effects of the use of the collateral information in the
RTs on the estimation of θ. The responses and RTs were generated under the hierarchical model in (3)–(9) with a
bivariate normal population model, with μθ = μτ = 0, σθ = στ = 1, and σθτ = .5. The correlation between θ and τ is
substantial but not unusual; in a study with an empirical dataset, van der Linden (1999) found a correlation between RT
and θ equal to −.59. The item parameters were distributed as follows: a ∼ U (.9,1.1) , b ∼ N (0,1), α ∼ U (1, 2),
β ∼ N (0,1), σaα = 0, and σbβ = .3. Noninformative priors for the hyperparameters were used. We generated 50
datasets for 500 test takers and 30 items each.
The parameters of the response model in (3) were the parameters of interest. They were estimated under two
different conditions:
1. From the generated responses only, with the marginal distributions of the person parameters $\theta_j$ and the item
   parameters $a_i$ and $b_i$ in (5) and (8) as empirical priors
2. From the responses and the RTs, using the full hierarchical model in (3)–(9)
Each time, the Gibbs sampler was run for 10,000 iterations. A burn-in of 500 iterations was sufficient to reach
convergence. Autocorrelation after more than 10 iterations appeared to be negligible.
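A check of this kind can be sketched as follows. The chain below is a synthetic first-order autoregressive series that merely stands in for the draws of one parameter; it is not output from the sampler used in this study, and the function name, seed, and autoregression coefficient are hypothetical. Only the run length of 10,000 and the burn-in of 500 are taken from the text.

```python
import random


def autocorrelation(chain, lag):
    """Sample autocorrelation of a sequence of draws at the given lag."""
    n = len(chain)
    mean = sum(chain) / n
    var = sum((x - mean) ** 2 for x in chain) / n
    cov = sum((chain[k] - mean) * (chain[k + lag] - mean) for k in range(n - lag)) / n
    return cov / var


# Synthetic stand-in for one parameter's chain: an AR(1) series, NOT sampler output.
rng = random.Random(2006)
chain, value = [], 0.0
for _ in range(10000):
    value = 0.5 * value + rng.gauss(0.0, 1.0)
    chain.append(value)

retained = chain[500:]   # discard the burn-in iterations
for lag in (1, 5, 10):
    print(lag, round(autocorrelation(retained, lag), 3))
```

For a chain that mixes well, the printed values should fall toward zero as the lag increases.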
We evaluated the results using the MSE for the mean of the posterior distributions (expected a posteriori [EAP]
estimates) of θ as criterion. The MSEs for the two conditions, MSE1 and MSE2, were defined as
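$$\mathrm{MSE}_1(\theta_j \mid \theta_j) = E[(\theta_j - \hat{\theta}_j)^2 \mid u_j] = \int (\theta_j - \hat{\theta}_j)^2 f(\theta \mid u_j)\, d\theta \tag{13}$$
and
$$\mathrm{MSE}_2(\theta_j \mid \theta_j) = E[(\theta_j - \hat{\theta}_j)^2 \mid u_j, t_j] = \int (\theta_j - \hat{\theta}_j)^2 f(\theta \mid u_j, t_j)\, d\theta. \tag{14}$$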
The posterior distributions of θ j in these two expressions are those in (2) and (12) without the hyperparameters. For
a Gibbs sampler, the expressions in (13) and (14) can easily be calculated as
$$\frac{1}{M} \sum_{m} \left( \theta_j - \theta_j^{(m)} \right)^2, \tag{15}$$
where m = 1, …, M are the iterations of the sampler after burn-in.
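The average in (15) is straightforward to compute once the true value and the retained draws are available. The sketch below uses hypothetical names and a toy set of draws; it also shows that the same quantity splits into the spread of the draws plus the squared distance of their mean from the true value.

```python
def mse_from_draws(true_theta, draws):
    """Average squared distance between the true theta_j and the retained draws, as in (15)."""
    return sum((true_theta - d) ** 2 for d in draws) / len(draws)


def variance_plus_squared_bias(true_theta, draws):
    """Same quantity, split into the variance of the draws and the squared bias of their mean."""
    mean = sum(draws) / len(draws)
    variance = sum((d - mean) ** 2 for d in draws) / len(draws)
    return variance + (mean - true_theta) ** 2


# Toy draws, for illustration only.
draws = [0.0, 1.0, 2.0]
print(mse_from_draws(0.5, draws))
print(variance_plus_squared_bias(0.5, draws))
```

The decomposition makes explicit that collateral information can lower this criterion in two ways: by narrowing the posterior and by moving its mean closer to the true value.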
The MSEs for the conditions with the responses only were generally higher than those for the conditions with the
responses and RTs (Figure 1). As a rough estimate, the increase due to ignoring the RTs was approximately 30–50%.
FIGURE 1. MSE of estimates of θ without (IRT model) and with RTs (IRTRT model)
An increased efficiency of the estimators also implies a more precise fit of the estimated model to the response data.
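We evaluated the fit using the MSE for the predicted data, $\tilde{u}$. That is, we compared
$$\mathrm{MSE}_1(\tilde{u}) = E[(u - \tilde{u})^2 \mid u] = \sum_{\tilde{u}} (u - \tilde{u})^2 f(\tilde{u} \mid u) \tag{16}$$
and
$$\mathrm{MSE}_2(\tilde{u}) = E[(u - \tilde{u})^2 \mid u, t] = \sum_{\tilde{u}} (u - \tilde{u})^2 f(\tilde{u} \mid u, t), \tag{17}$$
where $u = (u_j)$, $t = (t_j)$, and $f(\tilde{u} \mid u)$ and $f(\tilde{u} \mid u, t)$ are the predictive probabilities of
$\tilde{u}$ given $u$ and $(u, t)$, respectively. Analogous to (15), we calculated these expressions from the output of
the Gibbs sampler as
$$\frac{1}{MNn} \sum_{m} \sum_{j} \sum_{i} \left( u_{ji} - u_{ji}^{(m)} \right)^2. \tag{18}$$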
The results are shown in Figure 2. Again, the use of the RTs had a favorable effect. This time the differences
between the MSEs for the two conditions were relatively smaller than they were in Figure 1, but they tended to be more
distinct.
FIGURE 2. MSE of predicted responses for estimation without
(IRT model) and with RTs (IRTRT model)
Discussion
Since its inception, test theory has been hierarchical; the randomness in an observed score of an individual test taker
has always been distinguished from the randomness in his or her true score due to sampling from a population. In
addition, for its statistical inference, test theory has been an early adopter of Bayesian methodology. It therefore seems
only natural to broaden its vertical type of modeling by a horizontal extension that allows it to profit from the collateral
information on the response parameters present in the times it takes to produce the responses.
Nevertheless, we admit that it may sound somewhat gratuitous to say that in order to better estimate the θ s of test
takers, we should also estimate their τ s. But this is basically what the research in this paper suggests. In fact, the reverse
also holds and may sound even more surprising: To better estimate how fast test takers are operating, we should not
forget to estimate their abilities as well.
References
Albert, J. H. (1992). Bayesian estimation of normal ogive item response curves using Gibbs sampling. Journal of
Educational Statistics, 17, 251–269.
Carlin, B. P., & Louis, T. A. (2000). Bayes and empirical Bayes methods for data analysis. Boca Raton, FL: Chapman &
Hall.
Gelfand, A. E., & Smith, A. F. M. (1990). Sampling-based approaches to calculating marginal densities. Journal of the
American Statistical Association, 85, 398–409.
Gelman, A., Carlin, J. B., Stern, H., & Rubin, D. B. (2004). Bayesian data analysis. London: Chapman & Hall.
Johnson, V. E., & Albert, J. H. (1999). Ordinal data modeling. New York: Springer.
Klein Entink, R. H., Fox, J.-P., & van der Linden, W. J. (2009). Modeling of responses and response times with person
covariates. Psychometrika, 74, 21–48.
Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores. Reading, MA: Addison-Wesley.
Novick, M. R., & Jackson, P. H. (1974). Statistical methods for educational and psychological research. New York:
McGraw-Hill.
Novick, M. R., Jackson, P. H., Thayer, D. T., & Cole, N. S. (1972). Estimating multiple regressions in m groups: A
cross validation study. British Journal of Mathematical and Statistical Psychology, 25, 33–50.
Novick, M. R., Lewis, C., & Jackson, P. H. (1973). The estimation of proportions in m groups. Psychometrika, 38, 19–
46.
Rubin, D. B. (1981). Estimation in parallel randomized experiments. Journal of Educational Statistics, 6, 377–401.
Schnipke, D. L., & Scrams, D. J. (1997). Modeling item response times with a two-state mixture model: A new method
of measuring speededness. Journal of Educational Measurement, 34, 213–232.
Schnipke, D. L., & Scrams, D. J. (2002). Exploring issues of examinee behavior: Insights gained from response-time
analyses. In C. N. Mills, M. Potenza, J. J. Fremer, & W. Ward (Eds.), Computer-based testing: Building the
foundation for future assessments (pp. 237–266). Hillsdale, NJ: Lawrence Erlbaum Associates.
van der Linden, W. J. (1999). Empirical initialization of the trait estimator in adaptive testing. Applied Psychological
Measurement, 23, 21–29.
van der Linden, W. J. (2006). A lognormal model for response times on test items. Journal of Educational and
Behavioral Statistics, 31, 181–204.
van der Linden, W. J. (2007). A hierarchical framework for modeling speed and accuracy on test items. Psychometrika,
72, 287–308.