DIFferent Approaches to
Cross-Cultural Validation
Ana Ćosić Pilepić, Tamara Mohorić, Vladimir Takšić
Faculty of Social Sciences and Humanities
University of Rijeka
TERMINOLOGY
• measurement bias – one of the threats to the validity of interpretations and
inferences based on psychological testing
• low construct validity of a test – the test contains items measuring other
constructs besides the one it was intended to measure → a potential source of
bias against a specific group of participants
• item bias – the probability of success on an item differs for examinees
with the same level of ability
TERMINOLOGY
• IMPACT – true group differences in the probability of answering an item
correctly (a valid test is the basis for analyzing impact)
• differential item functioning (DIF) – somewhat neutral term to refer to
differences in the statistical properties of an item between groups of
examinees of equal ability
• DIF ≠ item bias
• DIF is a necessary but not sufficient condition for item bias
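• formally: an item shows DIF when P(correct | θ, focal) ≠ P(correct | θ, reference)
for some ability level θ, i.e. group membership still predicts the item response
after conditioning on ability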
CHARACTERISTICS COMMON TO ALL
DIF METHODS
• focal and reference group
• matching variable – examinees in different groups are matched on
ability/latent trait/proficiency (the basis for detecting DIF rather than impact);
the matching criterion can be internal or external
• predictive validity approach vs. internal criteria approach (DIF indices)
• internal criteria: total test score or other items
• purification step – used to remove DIF items that might contaminate the matching
criterion; the remaining DIF-free items (anchor items) can then be used for ability
matching (see the sketch below)
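A minimal sketch of the purification loop; dif_test is a hypothetical placeholder
standing in for any concrete detection procedure covered below:

```python
# A minimal purification sketch. `dif_test(item, anchor, data) -> bool` is a
# hypothetical placeholder for any concrete DIF test (M-H, logistic regression, ...).
def purify(items, data, dif_test, max_iter=10):
    """Iteratively remove flagged items from the matching criterion."""
    anchor = set(items)
    for _ in range(max_iter):
        # Re-test every item against a matching score built from current anchors
        flagged = {i for i in items if dif_test(i, anchor - {i}, data)}
        new_anchor = set(items) - flagged
        if new_anchor == anchor:   # anchor set stabilized -> stop
            break
        anchor = new_anchor
    return anchor                  # DIF-free anchor items for final matching
```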
DIFFERENT CLASSIFICATIONS OF DIF
METHODS
• null DIF hypothesis: observed-score and latent-variable null DIF
• the studied item score: methods for dichotomous vs. polytomous items
• comparison of two or more groups
• parametric vs. nonparametric methods
(Mapuranga, 2008)
CTT
• STAND
• smoothed STAND
• DIF dissection
• M-H procedure
• CMH
• Cox's β
• hierarchical logistic regression
• logistic mixed model
• mixture model
• HGLM

IRT
• DFIT
• TestGraf
• Scrams & McLeod
• MIMIC model
• Lagrangian multiplier tests
• RCML
• McDonald's

CTT & IRT
• SIBTEST
• kernel smoothed SIBTEST
• MULTISIB

No test theory
• Liu-Agresti estimator
• logistic regression
CLASSICAL TEST THEORY
METHODS
Traditional methods
• the transformed item difficulty index (delta plot method; see the sketch after this list)
  • adjustment: residualized TID indices (Shepard et al., 1985)
• analysis of variance
• correlational methods:
  • rank-order correlation of p-values
  • item-test point-biserial correlations
• exploratory factor analysis
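As an illustration, a rough delta-plot sketch: classical p-values are mapped to the
ETS delta scale for each group, and items far from the major axis of the delta
scatter are flagged (the function names and the flagging threshold are our assumptions):

```python
import numpy as np
from scipy.stats import norm

def deltas(p):
    """ETS delta scale: delta = 13 - 4 * z(p); harder items get larger deltas."""
    return 13.0 - 4.0 * norm.ppf(np.asarray(p, dtype=float))

def delta_plot_flags(p_ref, p_foc, threshold=1.5):
    """Flag items whose (delta_ref, delta_focal) point lies far from the
    major (principal) axis of the scatter of delta pairs."""
    x, y = deltas(p_ref), deltas(p_foc)
    sx, sy = x.std(ddof=1), y.std(ddof=1)
    sxy = np.cov(x, y, ddof=1)[0, 1]
    b = (sy**2 - sx**2 + np.sqrt((sy**2 - sx**2) ** 2 + 4 * sxy**2)) / (2 * sxy)
    a = y.mean() - b * x.mean()
    dist = (b * x - y + a) / np.sqrt(b**2 + 1)   # perpendicular distance from axis
    return np.where(np.abs(dist) > threshold)[0]
```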
Traditional methods
• methods based on the concept of differential item difficulty
• classical indices are sample dependent
• classical item p-values confound item difficulty with group mean differences and
item discrimination (Angoff, 1982; Hunter, 1975; Lord, 1977)
• whenever two groups are not equal on the trait being measured, highly
discriminating items will appear to be biased because they do a better job of making
the distinction between low-scoring and high-scoring groups
• prone to frequent Type I and Type II errors
Standardization (STAND)
• Dorans & Kulick, 1983
• the idea is to compute, at each score level, the difference between the proportions
of focal- and reference-group examinees who answer the item correctly, attaching
more weight to score levels with more examinees (see the sketch below)
• unsigned proportion difference and signed proportion difference (the standardized p-difference)
• requires large sample sizes and offers no significance test
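A sketch of the signed standardized p-difference under this definition; weighting by
focal-group counts at each score level is the common choice, and the variable names
are ours:

```python
import numpy as np

def std_p_dif(n_focal, p_focal, p_ref):
    """STD P-DIF: weighted average of (P_focal - P_reference) across matched
    score levels, weighted by the number of focal-group examinees per level."""
    w = np.asarray(n_focal, dtype=float)
    diff = np.asarray(p_focal, dtype=float) - np.asarray(p_ref, dtype=float)
    return float(np.sum(w * diff) / np.sum(w))
```

Values near zero indicate little DIF; in practice, absolute values beyond roughly
0.10 are often flagged for review.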
Mantel – Haenszel procedure
• compares the odds of answering an item correctly for the focal group relative to
the reference group; large differences → DIF is present
• odds ratios of success at each ability level are estimated and then averaged
over all ability levels → the Mantel-Haenszel odds ratio index (α_MH)
• β_MH – logit transformation of α_MH
• Mantel-Haenszel delta difference (MH D-DIF) – classification of items
(negligible DIF, slight to moderate DIF, moderate to large DIF); see the sketch below
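A sketch of α_MH and the ETS delta metric from the stratified 2×2 counts; the array
layout is our assumption:

```python
import numpy as np

def mantel_haenszel(a, b, c, d):
    """alpha_MH from per-score-level 2x2 counts:
    a = reference correct, b = reference incorrect,
    c = focal correct,     d = focal incorrect."""
    a, b, c, d = (np.asarray(x, dtype=float) for x in (a, b, c, d))
    n = a + b + c + d                    # examinees at each score level
    alpha_mh = np.sum(a * d / n) / np.sum(b * c / n)
    mh_d_dif = -2.35 * np.log(alpha_mh)  # ETS delta metric (MH D-DIF)
    return alpha_mh, mh_d_dif
```

On the ETS scale, |MH D-DIF| below 1.0 is typically labeled negligible (category A),
1.0 to 1.5 slight to moderate (B), and 1.5 or more moderate to large (C), in
combination with significance tests.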
ITEM RESPONSE THEORY
METHODS
matching variable is an estimated ability level or latent trait, θ
BASIC CONCEPTS OF IRT
- item characteristic curve
- 3 parameters: item difficulty (b),
item discrimination (a) and
guessing factor (c)
- 1-parameter logistic model (1PL;
Rasch model), 2-parameter
logistic model (2PL), 3-parameter
logistic model (3PL) – see the sketch below
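For concreteness, the 3PL item characteristic curve (a standard formula; D = 1.7 is
the usual scaling constant):

```python
import numpy as np

def icc_3pl(theta, a, b, c, D=1.7):
    """3PL probability of a correct response at ability theta:
    P = c + (1 - c) / (1 + exp(-D * a * (theta - b)))."""
    return c + (1.0 - c) / (1.0 + np.exp(-D * a * (theta - b)))

# 2PL is the special case c = 0; 1PL (Rasch) additionally fixes a = 1.
```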
DIF detection
• basic idea: if DIF is present, reference and focal group will have different
ICCs
• a multitude of approaches exists for identifying parameter differences
between groups:
• comparison of item parameters across groups (Linacre & Wright, 1986; Lord, 1980);
• estimating the area between the ICCs for the two groups (Raju, 1988; see the sketch below);
• testing the improvement in model fit with and without separate group parameter
estimates (Thissen, Steinberg, & Wainer, 1993)
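A numerical sketch of the area approach, reusing icc_3pl from the sketch above:
integrate the signed and unsigned gap between the two groups' ICCs over a theta grid
(Raju 1988 gives exact closed-form areas; this rectangle-rule approximation is our
simplification):

```python
import numpy as np

def icc_area(params_ref, params_foc, lo=-4.0, hi=4.0, n=2001):
    """Approximate signed/unsigned area between reference- and focal-group ICCs."""
    theta = np.linspace(lo, hi, n)
    gap = icc_3pl(theta, *params_ref) - icc_3pl(theta, *params_foc)
    dx = theta[1] - theta[0]
    signed = np.sum(gap) * dx            # picks up uniform DIF
    unsigned = np.sum(np.abs(gap)) * dx  # also catches crossing (non-uniform) DIF
    return signed, unsigned
```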
Pros and cons
• sample-independent item parameters
• more precise than CTT methods
• conceptually clear, but computationally demanding
• graphical representation (the ICC) offers an easy visual-inspection tool
• large samples required – the 3PL model needs >1000 examinees per group
MIXED METHODS
SIBTEST, kernel smoothed SIBTEST, MULTISIB
Simultaneous item bias test (SIBTEST)
• nonparametric method to detect test bias or differential test functioning
(DTF)
• conceptually similar to standardization, but it offers a significance test AND
matching variable is a latent trait, not an observed score
• allows for evaluation of DIF amplification or cancellation effects across items
within a testlet or bundle
• allows for detection of both uniform and non-uniform DIF (see the sketch below)
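As a rough sketch, SIBTEST's core quantity without the regression correction: a
weighted difference in the studied item's mean score between groups matched on the
valid-subtest score (the real procedure adds the correction and a standard error for
a z-test; arrays are numpy, names are ours):

```python
import numpy as np

def beta_uni(match_ref, y_ref, match_foc, y_foc):
    """Uncorrected beta-uni: weighted mean difference in studied-item score
    between reference and focal examinees at each matching-score level."""
    num = den = 0.0
    for k in np.union1d(match_ref, match_foc):
        r, f = y_ref[match_ref == k], y_foc[match_foc == k]
        if len(r) == 0 or len(f) == 0:
            continue                    # level must be observed in both groups
        w = len(r) + len(f)             # weight by examinees at this level
        num += w * (r.mean() - f.mean())
        den += w
    return num / den
```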
NO TEST THEORY METHODS
Logistic regression, Liu-Agresti estimator
Logistic regression
• Swaminathan and Rogers (1990)
• a link between contingency table methods (odds ratio methods) and IRT
methods
• predicts success on a specific item from the total test score alone (no DIF);
from the total test score and group membership (uniform DIF); or from the total
test score, group membership, and their interaction term (non-uniform DIF) – see
the sketch below
• works even with small sample sizes
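A sketch of the three nested models and likelihood-ratio tests, using numpy and
statsmodels (variable names are ours: y is the 0/1 item response, total the matching
score, group the 0/1 membership, all numpy arrays):

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import chi2

def lr_dif(y, total, group):
    """Swaminathan-Rogers logistic regression DIF: compare nested models
    M0 (score only), M1 (+ group: uniform DIF), M2 (+ interaction: non-uniform DIF)."""
    X0 = sm.add_constant(np.column_stack([total]))
    X1 = sm.add_constant(np.column_stack([total, group]))
    X2 = sm.add_constant(np.column_stack([total, group, total * group]))
    m0, m1, m2 = (sm.Logit(y, X).fit(disp=0) for X in (X0, X1, X2))
    p_uniform = chi2.sf(2 * (m1.llf - m0.llf), df=1)     # group main effect
    p_nonuniform = chi2.sf(2 * (m2.llf - m1.llf), df=1)  # score x group interaction
    return p_uniform, p_nonuniform
```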
CONCLUSION
• The choice of a specific method must be based on several considerations:
- studied instrument
- studied items
- sample size
- resources (money, time)
- ...
PURPOSE OF INTENDED STUDY:
• To test the efficacy of different DIF detection methods
• To compare different test theory frameworks to determine whether there is any
difference in the information given by the different approaches
• To compare different DIF indices
• To examine whether there are problematic items with respect to DIF in different
conditions (sex, culture)
• Sample: > 5000 examinees from 10 different countries (cross-cultural study)
• Instrument: ESCQ (Emotional Skills and Competence Questionnaire; Likert-type scale)
THANK YOU FOR YOUR
ATTENTION!
Please share any comments or suggestions regarding the doctoral dissertation
research design...
