The DIF-Free-Then-DIF Strategy for
the Assessment of Differential Item
Functioning
1
Establish common metric for DIF
assessment
• Equal-mean-difficulty (EMD)
• All-other-item (AOI)
• Constant-item (CI)
2
Equal-mean-difficulty (EMD)
• the mean item difficulties of the test for the two groups are
constrained to be equal.
• Software: ConQuest, BILOG-MG, PARSCALE
• EMD holds when (a) the test contains no DIF item, or (b) the test
contains multiple DIF items but some favor the reference group and
others favor the focal group by identical amounts, i.e., ASA ≈ 0.
When ASA >> 0, EMD is severely violated.

• When the EMD method is applied, the overall DIF amount in the
test is forced to balance between the groups. Therefore, half of the
items will be classified as favoring the reference group and the
other half as favoring the focal group, i.e., DTF = 0.
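A minimal numeric sketch of this forced balancing (the difficulty values and the DIF size are illustrative, on a Rasch metric):

```python
import numpy as np

# Hypothetical difficulty estimates for a 10-item test (Rasch metric).
# The first four items carry constant DIF of +0.6 against the focal group.
b_ref = np.array([-1.0, -0.5, 0.0, 0.5, 1.0, -0.8, -0.3, 0.2, 0.7, 1.2])
b_foc = b_ref.copy()
b_foc[:4] += 0.6  # true DIF on the first four items only

# EMD linking: force the mean difficulty to be equal in the two groups.
b_foc_emd = b_foc - (b_foc.mean() - b_ref.mean())

# Apparent signed DIF after EMD linking (b_iF - b_iR).
apparent_dif = b_foc_emd - b_ref
print(np.round(apparent_dif, 2))
# The DIF-free items now show a negative shift (appear to favor the
# focal group) while the true DIF items show a shrunken positive shift:
# the apparent DIF sums to zero, i.e., it is forced to balance.
```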
3
All-other-item (AOI)
• all items except the studied one are treated as DIF-free and
serve as the matching variable

• Software: Winsteps, IRTLRDIF, CFA-based DIF detection

• AOI holds when (a) the test is perfectly DIF-free, i.e., contains
no DIF item, or (b) the studied item is the only DIF item in the
test
4
Constant-item (CI)
• select a subset of items to serve as anchors to establish
a common metric for DIF assessment

• The CI method yields appropriate DIF assessment even
when the test contains 40% DIF items. In addition, the
longer the anchor set, the higher the power of detecting
DIF. Moreover, four anchors are generally enough to
yield a high power rate.

• The superiority of the CI method has been verified in both
CFA-based DIF assessment (the MIMIC method) and IRT-based
DIF detection techniques.
5
Scale purification procedure
• Three major steps:
• (a) calibrate item parameters separately for each
group, compute the linking coefficients (a slope
and an intercept) to place the reference and focal
groups on a common metric, and assess each item
for DIF;
• (b) re-link the group metrics using only those items not
identified as having DIF in the previous step, and
assess each item for DIF;
• (c) repeat step (b) until the same items are identified
as having DIF in consecutive iterations.
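The iterative loop above can be sketched as follows; `assess_dif(item, anchors)` is a hypothetical stand-in for any per-item DIF test (e.g., an IRT likelihood-ratio test) that returns `True` when the studied item is flagged, given the current anchor set:

```python
# Minimal sketch of the scale purification loop: re-assess every item
# using only the currently unflagged items as anchors, and stop when the
# flagged set is identical in two consecutive iterations.
def purify(items, assess_dif, max_iter=20):
    flagged = set()
    for _ in range(max_iter):
        anchors = [i for i in items if i not in flagged]
        new_flagged = {i for i in items if assess_dif(i, anchors)}
        if new_flagged == flagged:  # same items flagged twice in a row
            return flagged
        flagged = new_flagged
    return flagged
```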
6
• DIF assessment methods with the scale purification
procedure provide a substantial improvement over the
EMD and AOI methods, but they perform worse when
there is a high percentage of DIF items (>20%,
constant DIF pattern)
7
Implementing the DFTD Strategy
• Woods’ Rank-based (RB) method to select DIF-free items
• (a) assess item 1 for DIF using all other items as
anchors, and repeat this step until the last item has
been assessed;
• (b) compute the ratio of the likelihood ratio statistic
(LR) to the number of free parameters (df) for each
item, i.e., LR/df;
• (c) rank order the items by the ratio LR/df;
• (d) designate the desired number of items
(generally 4) that have the smallest ratios as anchors.
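Steps (b) through (d) reduce to a sort. A small sketch, assuming the first-pass likelihood-ratio statistics and their degrees of freedom are already in hand (the values below are illustrative):

```python
import numpy as np

# Hypothetical LR statistics and degrees of freedom from a first pass
# in which every item was tested with all other items as anchors.
lr_stats = np.array([0.8, 12.3, 1.1, 0.5, 9.7, 0.9, 1.4, 0.6])
dfs      = np.array([1,   1,    1,   1,   1,   1,   1,   1  ])

ratios = lr_stats / dfs      # LR/df per item
order = np.argsort(ratios)   # rank items from smallest to largest ratio
anchors = order[:4]          # designate the 4 items with smallest ratios
print(sorted(anchors.tolist()))
```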
8
• Findings:
• the RB method has a very high rate of accuracy in selecting
clean anchors under large DIF and large sample sizes;
 its performance with small sample sizes is unknown
• the rate of accuracy decreases as the percentage of
DIF items increases  a purification procedure is needed
• the designated anchors yielded more accurate DIF
assessment than the AOI method in terms of Type I
error and power rates  supports the DFTD strategy
9
• Rank-based scale purification method (RB-S): adds a
scale purification procedure to the RB method.
• (a) assess item 1 for DIF using all other items as anchors,
and repeat this step until the last item has been assessed;
• (b) remove the items identified as having DIF
from the anchors and re-assess all items for DIF using
the new anchors;
• (c) repeat step (b) until the same set of items is identified
as having DIF in two consecutive iterations;
• (d) designate the desired number of items that have the
smallest ratios (LR/df) as anchors.
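The RB-S steps can be sketched in one function. `lr_test(item, anchors)` is a hypothetical stand-in returning `(LR, df)` for the studied item; the cutoff 3.84 is the chi-square(1) critical value at the .05 level and is only an assumption for this sketch:

```python
# Sketch of RB-S anchor selection: iterate DIF assessment with flagged
# items removed from the anchor set until the flagged set stabilizes,
# then designate the items with the smallest LR/df ratios as anchors.
def rb_s_anchors(items, lr_test, cutoff=3.84, n_anchors=4, max_iter=20):
    flagged = set()
    for _ in range(max_iter):
        anchors = [i for i in items if i not in flagged]
        results = {i: lr_test(i, anchors) for i in items}
        new_flagged = {i for i, (lr, df) in results.items() if lr > cutoff}
        if new_flagged == flagged:  # same set in two consecutive iterations
            break
        flagged = new_flagged
    ratios = {i: lr / df for i, (lr, df) in results.items()}
    return sorted(items, key=lambda i: ratios[i])[:n_anchors]
```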
10
Simulation Study one
• Design:
(1) method: RB vs RB-S;
(2) impact: r ~ N(0,1); f ~ N(0,1) or N(-1,1)
(3) sample size: 250/250, 500/500, 1000/1000
(4) test length: 20, 40
(5) DIF item percentage: 10%, 20%, 30%, 40%
(6) DIF patterns: constant and balanced
11
• 100 replications
• Data generation model: 2PL logistic model
• DIF magnitude: ~ N(0.6, 0.01)
• Software: IRTLRDIF
• Dependent variables: rate of accuracy, Type I
error rate, power
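The data-generation step in the design above can be sketched as follows, assuming uniform DIF applied to the focal group's difficulties; all parameter values are illustrative, not the study's actual generating values:

```python
import numpy as np

rng = np.random.default_rng(1)
n_items, n_ref, n_foc = 20, 500, 500

a = rng.lognormal(0.0, 0.2, n_items)   # discriminations
b = rng.normal(0.0, 1.0, n_items)      # reference-group difficulties
dif = np.zeros(n_items)
dif[:2] = 0.6                          # 10% DIF items, constant pattern
b_foc = b + dif                        # focal-group difficulties

def gen_2pl(theta, a, b):
    """Bernoulli responses from P(x=1) = 1 / (1 + exp(-a(theta - b)))."""
    p = 1.0 / (1.0 + np.exp(-a * (theta[:, None] - b[None, :])))
    return (rng.random(p.shape) < p).astype(int)

x_ref = gen_2pl(rng.normal(0, 1, n_ref), a, b)      # reference ~ N(0,1)
x_foc = gen_2pl(rng.normal(0, 1, n_foc), a, b_foc)  # focal ~ N(0,1)
```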
12
Results
• Rate of accuracy:
• (a) both the RB and RB-S methods yield a rate of accuracy
higher than the chance level;
• (b) the larger the sample size and the longer the test,
the higher the rate of accuracy;
• (c) the RB and RB-S methods yield similar rates of accuracy
when the ASA is very close to zero (e.g., under the
balanced DIF pattern);
• (d) the RB-S method yields a rate of accuracy higher than
that of the RB method when the ASA is far from zero
(e.g., a high percentage of DIF items under the constant
pattern)  supports the DFTD strategy.
13
Relationship between the ASA and the rate of accuracy for the
RB and RB-S methods under the large-sample condition (R1000/F1000; 1c) in Study 1
14
Results
• Type I error and power rates
• (a) when the ASA is close to zero, the RB and RB-S
methods yield well-controlled Type I error rates as
well as high power rates;
• (b) when the ASA is far from zero, the RB method
yields an inflated Type I error rate and a deflated
power rate, whereas the RB-S method maintains
good control over the Type I error rate and the
power rate.
15
Simulation Study two
• Purpose: compare the RB-S method with the AOI, AOI-S, and PA methods
• RB-S method: use the RB-S method to select four anchors and
then assess the other items for DIF with the designated
anchors (purification procedure, DFTD)
• AOI method: all other items are treated as anchors when
assessing the studied item for DIF (no purification, no
DFTD)
• AOI-S method: the AOI method with the scale purification
procedure (purification procedure, no DFTD)
• PA: pure-anchor method, in which four clean anchors are
selected by design and the other items are then assessed
for DIF
16
• Design:
(1) method: RB-S, AOI, AOI-S, PA;
(2) impact: r ~ N(0,1); f ~ N(0,1) or N(-1,1)
(3) DIF item percentage: 10%, 20%, 30%, 40%
(4) DIF pattern: constant, balanced
(5) sample size: 500/500
(6) test length: 20 items
• Dependent variables: Type I error rate, power
17
Results
• (a) When the ASA was very close to zero, all methods
yielded well-controlled Type I error rates, and the
AOI and AOI-S methods yielded power rates higher
than those of the RB-S and PA methods.
• (b) When the ASA was large, only the RB-S and PA
methods maintained control over the Type I error rate
and the power rate, whereas the AOI method suffered
seriously from the large ASA, followed by the AOI-S
method.
18
Relationship between the ASA and the mean Type I error rate
for the AOI, AOI-S, PA, and RB-S methods in study 2
19
Conclusions and Discussions
• Limitation of IRTLRDIF in implementing the CI
method
• Other methods for selecting DIF-free items,
e.g., the iterative constant-item method (Shih
& Wang, 2009)
• Nonuniform DIF
• Rasch model, 3PL, GPCM, …
• Implementing the DFTD strategy in IRT-based and
non-IRT-based DIF assessment methods (MH, LR)
20
ASA
• Average Signed Area
• ASA ≡ (Σᵢ₌₁ᴸ SAᵢ) / L
• SAᵢ is the signed area between the characteristic
curves of item i for the two groups, and L is the
number of items in the test.
• SAᵢ = b_iF − b_iR (under the Rasch and 2PL models),
where b_iF and b_iR are the difficulties of item i
for the focal and reference groups, respectively.
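A small sketch of this computation with illustrative difficulty values:

```python
import numpy as np

# SA_i = b_iF - b_iR per item; ASA is the mean of SA_i over the L items.
b_ref = np.array([-0.8, -0.2, 0.4, 1.0])
b_foc = np.array([-0.8, -0.2, 1.0, 1.6])  # last two items shifted by +0.6

sa = b_foc - b_ref   # signed area for each item
asa = sa.mean()      # average signed area
print(asa)           # positive ASA: the test as a whole favors the reference group
```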
21
• ASA ≈ 0: the test contains no DIF item, or it
contains multiple DIF items whose effects cancel
out between groups (balanced pattern)
• ASA > 0: the test as a whole favors the
reference group
• ASA < 0: the test as a whole favors the focal
group
22