Multidimensional Adaptive Testing
with Optimal Design Criteria for Item
Selection
Joris Mulder & Wim J. Van Der Linden
• The choice of criterion (D-optimality, A-optimality, …) should consider the goal of
testing.
• Depending on that goal, a different optimal design criterion for item
selection in MAT seems more appropriate.
Motivation
• To match the different cases of MCAT with the
optimal design criteria that perform best for them.
• To investigate the preference of the optimality criteria
for items in the pool with specific patterns of
parameter values:
  - Will the criterion for selection in an MCAT program with nuisance
    abilities select only items that are informative about the
    intentional abilities?
  - Are there any circumstances in which it also selects items that
    are mainly sensitive to a nuisance ability?
• To report some features of the Fisher information matrix
and its use in adaptive testing that have hardly been
noticed.
Response Model
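The model equation on this slide did not survive extraction. As a hedged reconstruction, assuming the multidimensional three-parameter logistic model (which is consistent with the item parameters a_i, b_i, and c_i used on the later slides), the probability of a correct response to item i would read:

```latex
P_i(\theta) = c_i + (1 - c_i)\,
  \frac{\exp(a_i'\theta - b_i)}{1 + \exp(a_i'\theta - b_i)}
```

Here \theta is the vector of abilities, a_i the vector of discrimination parameters, b_i the difficulty, and c_i the guessing parameter of item i.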
Fisher Information
• Each element in the matrix has a common
factor:
• When selecting the kth item, the information
is:
to which the different optimality criteria are applied.
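The two equations on this slide were lost in extraction. The sketch below shows one way to compute the quantities they describe, assuming a three-parameter logistic response model and an item information matrix of the form g(theta) a a', with common factor g = Q (P - c)^2 / (P (1 - c)^2). The function names and the numbers in the usage lines are illustrative and not taken from the slides.

```python
import math

def prob_correct(theta, a, b, c):
    """3PL probability of a correct response (assumed model)."""
    z = sum(a_j * t_j for a_j, t_j in zip(a, theta)) - b
    return c + (1.0 - c) / (1.0 + math.exp(-z))

def g_factor(theta, a, b, c):
    """Scalar factor shared by every element of the item information matrix."""
    p = prob_correct(theta, a, b, c)
    return (1.0 - p) * (p - c) ** 2 / (p * (1.0 - c) ** 2)

def item_information(theta, a, b, c):
    """Item information matrix g(theta) * a a' as a nested list."""
    g = g_factor(theta, a, b, c)
    return [[g * a_j * a_k for a_k in a] for a_j in a]

def add_information(info, item_info):
    """Information of the k-1 administered items plus a candidate kth item."""
    return [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(info, item_info)]

# Usage with hypothetical values: information at the current ability estimate
theta_hat = (0.0, 0.0)
info_prev = [[2.0, 0.5], [0.5, 1.5]]  # hypothetical information from k-1 items
candidate = item_information(theta_hat, (1.0, 2.0), 0.0, 0.0)
info_new = add_information(info_prev, candidate)
```

Each optimality criterion discussed next is a scalar function of a matrix such as `info_new`, evaluated once per candidate item.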
Item Information Matrix in MIRT
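• The item information matrix consists of two factors: (1) the scalar function $g$,
and (2) the matrix $a_i a_i'$.
• Reparameterize $g(\theta; a_i, b_i, c_i)$ into a one-dimensional function
$\tilde{g}(\tilde{\theta}; \|a_i\|, b_i, c_i)$ by substituting
$\|a_i\|\,\tilde{\theta} = a_i'\theta$,
• where $\|a_i\|$ is the Euclidean norm of $a_i$.
• The ability value at which $\tilde{g}$ attains its maximum is determined by solving:
$$\frac{\partial}{\partial \tilde{\theta}}\, \tilde{g}(\tilde{\theta}; \|a_i\|, b_i, c_i) = 0$$
• The results: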
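The closed-form result on this slide was lost in extraction. Assuming g is the common factor of the three-parameter logistic model, g = Q (P - c)^2 / (P (1 - c)^2), setting the derivative to zero gives exp(||a_i|| theta~ - b_i) = (1 + sqrt(1 + 8 c_i)) / 2. The sketch below checks this value against a simple grid search; the function names and the parameter values are illustrative assumptions.

```python
import math

def g_tilde(theta_t, a_norm, b, c):
    """One-dimensional version of g, as a function of the scalar theta~."""
    z = a_norm * theta_t - b
    p = c + (1.0 - c) / (1.0 + math.exp(-z))
    return (1.0 - p) * (p - c) ** 2 / (p * (1.0 - c) ** 2)

def theta_max_closed_form(a_norm, b, c):
    """Maximizer of g_tilde implied by the first-order condition."""
    return (b + math.log((1.0 + math.sqrt(1.0 + 8.0 * c)) / 2.0)) / a_norm

def theta_max_grid(a_norm, b, c, lo=-6.0, hi=6.0, steps=12001):
    """Grid-search maximizer of g_tilde on [lo, hi] (step 0.001 by default)."""
    width = (hi - lo) / (steps - 1)
    best = max(range(steps), key=lambda k: g_tilde(lo + k * width, a_norm, b, c))
    return lo + best * width

# Hypothetical item: ||a|| = 1.25, b = 0.5, c = 0.2
closed = theta_max_closed_form(1.25, 0.5, 0.2)
grid = theta_max_grid(1.25, 0.5, 0.2)
```

For c = 0 the logarithm vanishes and the maximizer reduces to b / ||a||.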
Item selection criteria for MAT
• Three cases of multidimensional testing:
• (1) All abilities are intentional
• (2) Some abilities are intentional and others are
a nuisance
• (3) All abilities are intentional, but the interest
is only in a specific linear combination of them
All abilities Intentional
• D-optimality (Segall, 1996): maximize the determinant of the information matrix,
which can be expressed as
• The criterion tends to select items with a large
discrimination parameter for the ability with a
relatively large (asymptotic) variance for its
current estimator (minimax mechanism)
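The equations on this slide were lost in extraction. A plausible reading, assuming a rank-one item information matrix g a a', uses the matrix determinant lemma: det(I + g a a') = det(I) (1 + g a' I^{-1} a). Maximizing the determinant is then the same as maximizing g a' I^{-1} a, which weights each discrimination parameter by the current asymptotic (co)variances. The two-dimensional sketch below uses hypothetical numbers and helper names.

```python
def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def inv2(m):
    """Inverse of a 2x2 matrix."""
    d = det2(m)
    return [[m[1][1] / d, -m[0][1] / d], [-m[1][0] / d, m[0][0] / d]]

def quad_form(m, a):
    """a' M a for a 2-vector a."""
    return sum(a[j] * m[j][k] * a[k] for j in range(2) for k in range(2))

def updated(info, g, a):
    """info + g * a a'."""
    return [[info[j][k] + g * a[j] * a[k] for k in range(2)] for j in range(2)]

def select_d_optimal(info, candidates):
    """Index of the candidate (g, a) that maximizes det(info + g a a')."""
    inv = inv2(info)
    gains = [g * quad_form(inv, a) for g, a in candidates]
    return max(range(len(candidates)), key=gains.__getitem__)

# Hypothetical state: ability 2 currently has the larger asymptotic variance
info = [[4.0, 1.0], [1.0, 2.0]]
candidates = [(0.2, (1.0, 0.0)), (0.2, (0.0, 1.0))]
best = select_d_optimal(info, candidates)  # picks the item measuring ability 2
```

With equal g, the item that discriminates on the less precisely estimated ability wins, which is the minimax behavior described above.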
• Items with large discrimination parameters for
more than one ability are generally not
informative. Consequently, the criterion of
D-optimality tends to prefer items that are
sensitive to a single ability over items sensitive
to multiple abilities (trade-off effect).
• Segall (1996) proposed a Bayesian version of
D-optimality for MCAT.
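A minimal sketch of what a Bayesian version could look like, assuming a multivariate normal prior on the abilities with covariance matrix Phi whose inverse is added to the information matrix before the determinant is taken. The prior, the numbers, and the function names are assumptions for illustration, not details recovered from the slide.

```python
def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def inv2(m):
    """Inverse of a 2x2 matrix."""
    d = det2(m)
    return [[m[1][1] / d, -m[0][1] / d], [-m[1][0] / d, m[0][0] / d]]

def bayesian_d_criterion(info, prior_cov, g, a):
    """det(info + prior_cov^{-1} + g * a a') for one candidate item."""
    prec = inv2(prior_cov)
    post = [[info[j][k] + prec[j][k] + g * a[j] * a[k] for k in range(2)]
            for j in range(2)]
    return det2(post)

def select_bayesian_d_optimal(info, prior_cov, candidates):
    """Index of the candidate (g, a) with the largest Bayesian D-criterion."""
    values = [bayesian_d_criterion(info, prior_cov, g, a) for g, a in candidates]
    return max(range(len(candidates)), key=values.__getitem__)

# At the start of a test the information matrix is zero and its determinant
# cannot rank single items; the prior precision keeps the criterion usable.
empty_info = [[0.0, 0.0], [0.0, 0.0]]
prior_cov = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical prior covariance
candidates = [(0.20, (1.0, 0.0)), (0.25, (0.0, 1.0))]
first_item = select_bayesian_d_optimal(empty_info, prior_cov, candidates)
```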
• A-optimality: minimize the trace of the
inverse of the information matrix.
• The resulting criterion contains the determinant of the
information matrix as an important factor, so its
behavior is similar to that of D-optimality.
• It can easily be extended to a Bayesian version.
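As a small illustration of the criterion, the sketch below selects the item that minimizes the trace of the inverse of the updated information matrix. For a 2x2 matrix M, trace(M^{-1}) = trace(M) / det(M), which makes the determinant factor visible. The numbers and function names are hypothetical.

```python
def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def trace_inverse(m):
    """trace(M^{-1}) of a 2x2 matrix, computed as trace(M) / det(M)."""
    return (m[0][0] + m[1][1]) / det2(m)

def updated(info, g, a):
    """info + g * a a'."""
    return [[info[j][k] + g * a[j] * a[k] for k in range(2)] for j in range(2)]

def select_a_optimal(info, candidates):
    """Index of the candidate (g, a) minimizing trace((info + g a a')^{-1})."""
    values = [trace_inverse(updated(info, g, a)) for g, a in candidates]
    return min(range(len(candidates)), key=values.__getitem__)

info = [[4.0, 1.0], [1.0, 2.0]]  # hypothetical current information
candidates = [(0.2, (1.0, 0.0)), (0.2, (0.0, 1.0))]
best = select_a_optimal(info, candidates)
```

In this example both candidates give the same trace of the updated matrix, so the choice is decided entirely by the determinant, as under D-optimality.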
• E-optimality: maximize the smallest eigenvalue of the
information matrix.
• It may behave unfavorably because the contribution of
an item with equal discrimination parameters to the
test information vanishes when the sampling variances
of the ability estimators have become equal to each
other. This contradicts the fundamental rule that
the average sampling variance of the ability estimators should always
decrease after a new observation. Using E-optimality
for item selection in MCAT may occasionally result in
bad item selection, and its use is not recommended.
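The effect can be seen in a small two-dimensional sketch. The information matrix below is hypothetical, with equal diagonal elements so that both estimators have the same asymptotic variance; the candidate item has equal discrimination parameters. The smallest eigenvalue does not move after adding the item, although the average sampling variance does decrease.

```python
import math

def min_eigenvalue(m):
    """Smallest eigenvalue of a symmetric 2x2 matrix."""
    half_trace = (m[0][0] + m[1][1]) / 2.0
    radius = math.sqrt((m[0][0] - m[1][1]) ** 2 + 4.0 * m[0][1] ** 2) / 2.0
    return half_trace - radius

def trace_inverse(m):
    """Sum of the asymptotic variances, trace(M^{-1}) = trace(M) / det(M)."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return (m[0][0] + m[1][1]) / det

def updated(info, g, a):
    """info + g * a a'."""
    return [[info[j][k] + g * a[j] * a[k] for k in range(2)] for j in range(2)]

info = [[5.0, 2.0], [2.0, 5.0]]              # equal variances for both abilities
new_info = updated(info, 0.2, (0.8, 0.8))    # item with equal discriminations

e_before, e_after = min_eigenvalue(info), min_eigenvalue(new_info)
v_before, v_after = trace_inverse(info), trace_inverse(new_info)
```

Here `e_before` and `e_after` coincide, so the E-criterion sees no value in the item, while `v_after` is smaller than `v_before`.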
Graphical Example
[Figure: surface plot with vertical axis D over theta1 and theta2, each ranging from -4 to 4. Item 1: a = (0.5, 0), shown in color; Item 2: a = (0.64, 0.64), shown in black and white.]
Nuisance Abilities
• Both Ds-optimality and As-optimality
generally select items that discriminate highly
with respect to the intentional
ability. However, when the amount of
information about the nuisance abilities is
relatively low (that is, when the determinant of the
information matrix for the nuisance abilities is small), an item that
discriminates highly with respect to the nuisance
abilities is often preferred.
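A two-dimensional sketch of this behavior, assuming theta1 is the intentional ability and theta2 the nuisance ability, and taking the criterion to be the asymptotic variance of the theta1 estimator, that is, the (1,1) element of the inverse information matrix. All numbers and function names are hypothetical.

```python
def var_theta1(m):
    """(1,1) element of the inverse of a 2x2 information matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return m[1][1] / det

def updated(info, g, a):
    """info + g * a a'."""
    return [[info[j][k] + g * a[j] * a[k] for k in range(2)] for j in range(2)]

def select_for_theta1(info, candidates):
    """Index of the candidate (g, a) minimizing the variance of theta1."""
    values = [var_theta1(updated(info, g, a)) for g, a in candidates]
    return min(range(len(candidates)), key=values.__getitem__)

# Candidate 0 measures only the intentional ability, candidate 1 only the nuisance
candidates = [(0.25, (1.0, 0.0)), (0.25, (0.0, 1.0))]

info_high = [[4.0, 1.0], [1.0, 10.0]]  # nuisance ability already well measured
info_low = [[4.0, 1.9], [1.9, 1.0]]    # little information about the nuisance

pick_high = select_for_theta1(info_high, candidates)
pick_low = select_for_theta1(info_low, candidates)
```

With `info_high` the item on the intentional ability is chosen; with `info_low` the nuisance item reduces the variance of theta1 more, because the two estimators are strongly correlated and theta2 is poorly determined.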
Composite ability
• c-optimality prefers items with discrimination
parameters that reflect the weights of
importance in the composite ability. Thus, an item
with a larger $\lambda' a_i$
is generally more informative.
$\lambda = (1, 1)'$, $a_1 = (0.5, 1)'$, $a_2 = (0.8, 0)'$: $\lambda' a_1 = 1.5 > \lambda' a_2 = 0.8$.
$\lambda = (1, 1)'$, $a_1 = (0.5, 1)'$, $a_2 = (0.8, 0.8)'$: $\lambda' a_1 = 1.5 < \lambda' a_2 = 1.6$.
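The comparison can be checked numerically. The sketch below assumes the usual form of the criterion, minimizing the asymptotic variance of the composite, lambda' I^{-1} lambda, and uses a hypothetical current information matrix and a hypothetical common value of g for both items.

```python
def composite_variance(m, lam):
    """lambda' M^{-1} lambda for a 2x2 information matrix M."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    inv = [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]
    return sum(lam[j] * inv[j][k] * lam[k] for j in range(2) for k in range(2))

def updated(info, g, a):
    """info + g * a a'."""
    return [[info[j][k] + g * a[j] * a[k] for k in range(2)] for j in range(2)]

def select_c_optimal(info, lam, candidates):
    """Index of the candidate (g, a) minimizing the composite variance."""
    values = [composite_variance(updated(info, g, a), lam) for g, a in candidates]
    return min(range(len(candidates)), key=values.__getitem__)

lam = (1.0, 1.0)
info = [[4.0, 0.0], [0.0, 4.0]]  # hypothetical current information
# a1 = (0.5, 1) with lambda'a1 = 1.5; a2 = (0.8, 0.8) with lambda'a2 = 1.6
candidates = [(0.2, (0.5, 1.0)), (0.2, (0.8, 0.8))]
best = select_c_optimal(info, lam, candidates)
```

Under these assumptions the second item is selected, in line with its larger weighted sum of discrimination parameters.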
Simulation Study
• Two-dimensional MCAT
• Item pool: 200 items generated from a1~N(1,0.3),
a2~N(1,0.3), b~N(0,3) and 10c~Bin(3,0.5).
• Stopping rule: 30 items
• For each combination of theta1 = -1, 0, 1 and theta2 = -1, 0, 1, a total of 100 adaptive test administrations
were simulated.
• Bias and MSE were compared across the different
optimality criteria.
• Random selection served as the baseline.
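A minimal sketch of how such an item pool and design grid might be generated. The slide does not say whether the second parameter of N(., .) is a variance or a standard deviation; the code assumes a variance. The seed, the function name, and the dictionary layout are illustrative choices, and the grid reads theta2 as -1, 0, 1.

```python
import math
import random

def generate_item_pool(n_items=200, seed=0):
    """Draw item parameters following the distributions listed on the slide."""
    rng = random.Random(seed)
    sd_a = math.sqrt(0.3)  # assumption: 0.3 is a variance, not an SD
    sd_b = math.sqrt(3.0)  # assumption: 3 is a variance, not an SD
    pool = []
    for _ in range(n_items):
        a1 = rng.gauss(1.0, sd_a)
        a2 = rng.gauss(1.0, sd_a)
        b = rng.gauss(0.0, sd_b)
        # 10c ~ Bin(3, 0.5): count successes in three fair trials, divide by 10
        c = sum(rng.random() < 0.5 for _ in range(3)) / 10.0
        pool.append({"a": (a1, a2), "b": b, "c": c})
    return pool

pool = generate_item_pool()

# The 3 x 3 grid of true ability combinations, each replicated 100 times
true_thetas = [(t1, t2) for t1 in (-1, 0, 1) for t2 in (-1, 0, 1)]
n_replications = 100
test_length = 30
```

The draws are left untruncated, so an occasional negative discrimination parameter is possible; whether the original study constrained them is not stated on the slide.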
Theta1 and theta2 intentional
A-optimality and D-optimality resulted in more accurate ability
estimation than E-optimality (which performed even worse than random selection, R).
Theta1 intentional and theta2 a nuisance
• Ds-optimality selects items that minimize the
asymptotic variance of the intentional theta1
(the MSE of theta1 is smaller than when
theta1 and theta2 are both
intentional). However, the MSE of theta2
is much larger.
Composite ability
With equal weights, c-optimality with weights (1/2, 1/2)
yielded the highest accuracy for the composite ability but a
larger MSE for the separate abilities.
With unequal weights (3/4, 1/4), Ds-optimality, which corresponds to c(1,0), performed
similarly to c(3/4, 1/4).
Average Values of Optimality Criteria
Except for E-optimality, each criterion produced the
smallest average value of the specific quantity that it
optimizes.
Conclusions
• When all abilities are intentional, both A-optimality
and D-optimality result in the most accurate estimation
of the separate abilities. The most informative items
measure mainly one ability. Both criteria tend to
behave in a “minimax” fashion.
• When one of the abilities is intentional and the others
are a nuisance, item selection based on Ds-optimality (or
As-optimality) results in the most accurate estimates of
the intentional ability. Items that measure only the
intentional ability are generally most informative.
When the current inaccuracy of the estimator of a
nuisance ability becomes too large relative to that of
the intentional abilities, an item that is sensitive to the
nuisance ability will occasionally be preferred.
• For composite abilities, c-optimality with
weights lambda proportional to the coefficients
in the composite ability results in the most
accurate estimation of the composite. The criterion
prefers items whose discrimination parameters
are in proportion to the weights in
the combination.
• Content control and exposure rate should be
considered.
• CAT for ipsative tests