Fundamentals of IRT

EPSY 546: LECTURE 3
GENERALIZABILITY THEORY AND VALIDITY
George Karabatsos

GENERALIZABILITY THEORY

TRUE SCORE MODEL
• Recall the true score model: $X_n = T_n + e_n$
  – $X_n$: observed test score of person $n$
  – $T_n$: true test score (unknown)
  – $e_n$: random error (unknown)
• One may argue that the true score model defines error narrowly. It corresponds to a one-variable, simple ANOVA: between (true score) variance + within (random error) variance.

GENERALIZABILITY THEORY
• Generalizability Theory extends the true score model by acknowledging that multiple factors affect the measurement variance.
  – Multivariable ANOVA: The observed test response is a function of two or more variables, their interactions, and random measurement error.

G-THEORY MODEL (example)
$$
\begin{aligned}
X_{njt} ={} & \mu && \text{grand mean} \\
 & + (\mu_n - \mu) && \text{person $n$'s effect} \\
 & + (\mu_j - \mu) && \text{item $j$'s effect} \\
 & + (\mu_t - \mu) && \text{time $t$'s effect} \\
 & + (\mu_{nt} - \mu_n - \mu_t + \mu) && \text{person $\times$ time effect} \\
 & + (\mu_{nj} - \mu_n - \mu_j + \mu) && \text{person $\times$ item effect} \\
 & + (\mu_{tj} - \mu_t - \mu_j + \mu) && \text{time $\times$ item effect} \\
 & + \text{residual} && \text{three-way interaction, and error}
\end{aligned}
$$

G-THEORY VARIANCE PARTITION
• Systematic variance:
  – Persons: $\sigma^2_P$
• Measurement error (facet contributions):
  – Items: $\sigma^2_I$
  – Time: $\sigma^2_T$
  – Person $\times$ Time: $\sigma^2_{PT}$
  – Person $\times$ Item: $\sigma^2_{PI}$
  – Time $\times$ Item: $\sigma^2_{TI}$
  – Three-way interaction + error: $\sigma^2_{PIT,\text{error}}$

G-THEORY OF DECISIONS
• Relative decisions: Decisions based on the rank ordering of persons (e.g., college admission, pass-fail testing).
• Variance contributing to measurement error for relative decisions:
  $\sigma^2_{\text{Relat}} = \sigma^2_{PI} + \sigma^2_{PT} + \sigma^2_{PIT,\text{error}}$
  (all variance components associated with the interaction of persons)
• Absolute decisions: Decisions based on the level of the observed score, without regard to the performance of others (e.g., driver's license).
• Variance contributing to measurement error for absolute decisions:
  $\sigma^2_{\text{Abs}} = \sigma^2_T + \sigma^2_I + \sigma^2_{PI} + \sigma^2_{PT} + \sigma^2_{TI} + \sigma^2_{PIT,\text{error}}$
  (all variance components associated with the facets, which introduce "constant" effects to absolute decisions)

GENERALIZABILITY COEFFICIENT
$$
E\rho^2 = \frac{\sigma^2_P}{\sigma^2_P + \sigma^2_{\text{Decision}}},
\qquad \text{with } \sigma^2_{\text{Decision}} = \sigma^2_{\text{Relat}} \text{ or } \sigma^2_{\text{Abs}}
$$
• Indicates how accurately the observed test scores allow us to generalize about persons' behavior in a defined universe of situations (Cronbach, 1972).

STUDIES
• G-Study (Generalizability Study): Aims to estimate the variance components underlying a measurement process by defining the universe of admissible observations as broadly as possible.
• D-Study (Design Study): Uses G-study results to address "what if" questions about variation in measurement design (Thompson & Melancon, 1987). This helps pinpoint sources of error and specify the protocol modifications needed to obtain the desired level of generalizability.

EXAMPLES OF G-THEORY
• Nice illustrations are offered in Webb, Rowley, & Shavelson (1988) and Crowley, Thompson, & Worchel (1994).

VALIDITY

TEST VALIDITY
• VALIDITY: A test is valid if it measures what it claims to measure.
• Types: Face, Content, Concurrent, Predictive, Construct.
• Face validity: When the test items appear to measure what the test claims to measure.
• Content validity: When the content of the test items, according to domain experts, adequately represents the latent trait that the test intends to measure.
• Concurrent validity: When the test, which intends to measure a particular latent trait, correlates highly with another test that measures that trait.
• Predictive validity: When the scores of the test predict some meaningful criterion.
• Construct validity: A test has construct validity when the results of using the test fit hypotheses concerning the theoretical nature of the latent trait. The better the fit, the higher the construct validity.

MESSICK'S UNIFIED CONSTRUCT VALIDITY
– Content: Item content relevance, representativeness, and technical quality (includes face validity).
– Substantive: Theoretical rationales for the observed consistencies in the test responses.
– Structural: Fidelity of the scoring structure to the structure of the content domain.
– Generalizability: The extent to which the score properties and interpretations generalize over population groups, settings, and tasks.
– External: Concurrent/convergent, discriminant, and predictive evidence.
– Consequential: Refers to the (potential and actual) consequences of test use.
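
The G-theory formulas from the first half of the lecture (relative and absolute error variance, and the generalizability coefficient) can be sketched in Python. This is a minimal illustration, not part of the original slides. The class and function names are hypothetical, and the variance components in the usage section are made-up numbers, not estimates from any study. The optional `n_items` and `n_times` arguments are an added assumption: they apply the standard D-study scaling for a fully crossed persons x items x time design, where each error component is divided by the number of conditions it is averaged over. With the defaults of 1, the functions reduce to the sums shown on the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VarianceComponents:
    """Variance components for a crossed persons x items x time design."""
    p: float      # persons (systematic variance)
    i: float      # items
    t: float      # time
    pi: float     # person x item
    pt: float     # person x time
    ti: float     # time x item
    pit_e: float  # three-way interaction, confounded with error


def relative_error(vc: VarianceComponents, n_items: int = 1, n_times: int = 1) -> float:
    """Error variance for relative decisions: components that interact with persons."""
    return vc.pi / n_items + vc.pt / n_times + vc.pit_e / (n_items * n_times)


def absolute_error(vc: VarianceComponents, n_items: int = 1, n_times: int = 1) -> float:
    """Error variance for absolute decisions: every component except persons."""
    facet_part = vc.i / n_items + vc.t / n_times + vc.ti / (n_items * n_times)
    return facet_part + relative_error(vc, n_items, n_times)


def g_coefficient(vc: VarianceComponents, decision: str = "relative",
                  n_items: int = 1, n_times: int = 1) -> float:
    """Person variance divided by person variance plus decision-specific error."""
    if decision == "relative":
        error = relative_error(vc, n_items, n_times)
    elif decision == "absolute":
        error = absolute_error(vc, n_items, n_times)
    else:
        raise ValueError("decision must be 'relative' or 'absolute'")
    return vc.p / (vc.p + error)


if __name__ == "__main__":
    # Hypothetical G-study estimates, chosen only for illustration.
    vc = VarianceComponents(p=0.50, i=0.10, t=0.05, pi=0.20,
                            pt=0.10, ti=0.05, pit_e=0.50)
    # "What if" D-study question: one item on one occasion vs. ten items on two occasions.
    for n_items, n_times in [(1, 1), (10, 2)]:
        rel = g_coefficient(vc, "relative", n_items, n_times)
        abs_ = g_coefficient(vc, "absolute", n_items, n_times)
        print(f"items={n_items}, occasions={n_times}: "
              f"relative={rel:.3f}, absolute={abs_:.3f}")
```

Because the absolute error variance contains every term of the relative error variance plus the facet main effects and their interaction, the coefficient for absolute decisions can never exceed the one for relative decisions under the same design.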
