IRT basics: Theory and parameter estimation
Wayne C. Lee, David Chuah, Patrick Wadlington, Steve Stark, & Sasha Chernyshenko

Overview
- How do I begin a set of IRT analyses?
- What do I need?
  - Software
  - Data
- What do I do?
  - On-line!
  - Input/syntax files
  - Examination of output

“Eye-ARE-What?”
- Item response theory (IRT)
- Set of probabilistic models that…
  - Describes the relationship between a respondent’s magnitude on a construct (a.k.a. latent trait; e.g., extraversion, cognitive ability, affective commitment)…
  - To his or her probability of a particular response to an individual item

But what does that buy you?
- Provides more information than classical test theory (CTT)
  - Classical test statistics depend on the set of items and sample examined
  - IRT modeling not dependent on sample examined
- Can examine item bias/measurement equivalence and provide conditional standard errors of measurement

Before we begin…
- Data preparation
  - Raw data must be recoded if necessary (negatively worded items must be reverse coded such that all items in the scale indicate a positive direction)
- Dichotomization (optional)
  - Reducing multiple options into two separate values (0, 1; right, wrong)

Calibration and validation files
- Data is split into two separate files
  - Calibration sample for estimating IRT parameters
  - Validation sample for assessing the fit of the model to the data
- Data files for the programs that we will be discussing must be in ASCII/text format

Investigating dimensionality
- The models presented make a common assumption of unidimensionality
- Hattie (1985) reviewed 30 techniques
- Some propose the ratio of the 1st eigenvalue to the 2nd eigenvalue (Lord, 1980)
- On-line we describe how to examine the eigenvalues following Principal Axis Factoring (PAF)

PAF and scree plots
- If the data are dichotomous, factor analyze tetrachoric correlations
- Dominant first factor
- Assume continuum underlies item responses

Two models presented
- The Three Parameter Logistic model (3PL)
  - For dichotomous data
  - E.g.,
    cognitive ability tests
- Samejima's Graded Response model
  - For polytomous data where options are ordered along a continuum
  - E.g., Likert scales
- Common models among applied psychologists

The 3PL model
- Three parameters:
  - a = item discrimination
  - b = item extremity/difficulty
  - c = lower asymptote, “pseudo-guessing”
- Theta refers to the latent trait

Effect of the “a” parameter
- Small “a,” poor discrimination
- Larger “a,” better discrimination

Effect of the “b” parameter
- Low “b,” “easy item”
- Higher “b,” more difficult item
- “b” inversely related to CTT p

Effect of the “c” parameter
- c=0, asymptote at zero
- “Low ability” respondents may endorse correct response

Estimating 3PL parameters
- DOS version of BILOG (Scientific Software)
  - Easier to estimate parameters for a large number of scales or experimental groups
- Data file must be saved as ASCII text
  - ID number
  - Individual responses
- Multiple files in directory, but small size overall
- Input file (ASCII text)

BILOG input file (*.BLG)

    AGREEABLENESS CALIBRATION FOR IRT TUTORIAL.
    >COMMENT
    >GLOBAL DFN='AGR2_CAL.DAT', NIDW=4, NPARM=3, OFNAME='OMIT.KEY', SAVE;
    >SAVE SCO = 'AGR2_CAL.SCO', PARM = 'AGR2_CAL.PAR', COV = 'AGR2_CAL.COV';
    >LENGTH NITEMS=(10);
    >INPUT SAMPLE=99999;
    (4A1,10A1)
    >TEST TNAME=AGR;
    >CALIB NQPT=40, CYC=100, NEW=30, CRIT=.001, PLOT=0;
    >SCORE MET=2, IDIST=0, RSC=0, NOPRINT;

- First line: Title line
- >GLOBAL: Data file name (DFN), characters in ID field (NIDW), parameters (NPARM), file for missing (OFNAME)
- >SAVE: Requested files for scoring, parameters, covariances
- >LENGTH: Number of items
- >INPUT: Sample size
- (4A1,10A1): FORTRAN statement for reading data
- >TEST: Name of scale/measure
- >CALIB: Estimation specifications (not the default for BILOG)
- >SCORE: Scoring: maximum likelihood, no prior distribution of scale scores, no rescaling

Phase one output file (*.PH1)

    CLASSICAL ITEM STATISTICS FOR SUBTEST AGR

                      NUMBER    NUMBER                          ITEM*TEST CORRELATION
    ITEM   NAME       TRIED     RIGHT    PERCENT   LOGIT/1.7    PEARSON   BISERIAL
    ---------------------------------------------------------------------------------
       1   0001      1500.0    1158.0     0.772       0.72       0.535     0.742
       2   0002      1500.0     991.0     0.661       0.39       0.421     0.545
       3   0003      1500.0    1354.0     0.903       1.31       0.290     0.500
       4   0004      1500.0    1187.0     0.791       0.78       0.518     0.733
       5   0005      1500.0     970.0     0.647       0.36       0.566     0.728
       6   0006      1500.0    1203.0     0.802       0.82       0.362     0.519
       7   0007      1500.0     875.0     0.583       0.20       0.533     0.674
       8   0008      1500.0     810.0     0.540       0.09       0.473     0.594
       9   0009      1500.0    1022.0     0.681       0.45       0.415     0.542
      10   0010      1500.0     869.0     0.579       0.19       0.426     0.538
    ---------------------------------------------------------------------------------

- Can indicate problems in parameter estimation

Phase two output file (*.PH2)

    CYCLE 12: LARGEST CHANGE = 0.00116
    -2 LOG LIKELIHOOD = 15181.4541
    CYCLE 13: LARGEST CHANGE = 0.00071
    [FULL NEWTON STEP]
    -2 LOG LIKELIHOOD = 15181.2347
    CYCLE 14: LARGEST CHANGE = 0.00066

- Check for convergence

Phase three output file (*.PH3)
- Theta estimation
- Scoring of individual respondents
- Required for DTF analyses

Parameter file (specified, *.PAR)

    AGREEABLENESS CALIBRATION FOR IRT TUTORIAL.
    >COMMENT
    1 10 10
                                     “a”         “b”                     “c”
    0001AGR  111    1.130784    1.533393   -0.737439    0.652148    0.147203
                    0.101834    0.185726    0.135455    0.078989    0.053688
    0002AGR  211    0.360630    0.870309   -0.414371    1.149018    0.132796
                    0.087236    0.097709    0.098866    0.129000    0.054461
    0003AGR  311    1.474175    0.743095   -1.983831    1.345723    0.197127
                    0.108974    0.084487    0.250499    0.153003    0.087578
    0004AGR  411    1.196368    1.256263   -0.952323    0.796012    0.090901
                    0.087856    0.114710    0.123613    0.072684    0.042937
    0005AGR  511    0.544388    1.403904   -0.387767    0.712300    0.056774
                    0.071490    0.133486    0.080438    0.067727    0.026086
    0006AGR  611    0.892399    0.777440   -1.147869    1.286273    0.173882
                    0.093109    0.082096    0.152846    0.135828    0.075829
    0007AGR  711    0.174395    1.369223   -0.127368    0.730341    0.088135
                    0.083777    0.159712    0.085084    0.085190    0.032376
    (32X,2F12.6,12X,F12.6)

PARTO3PL output (*.3PL)

                                       a           b                       c
    0001AGR  111    1.130784    1.533393   -0.737439    0.652148    0.147203
    0002AGR  211    0.360630    0.870309   -0.414371    1.149018    0.132796
    0003AGR  311    1.474175    0.743095   -1.983831    1.345723    0.197127
    0004AGR  411    1.196368    1.256263   -0.952323    0.796012    0.090901
    0005AGR  511    0.544388    1.403904   -0.387767    0.712300    0.056774
    0006AGR  611    0.892399    0.777440   -1.147869    1.286273    0.173882
    0007AGR  711    0.174395    1.369223   -0.127368    0.730341    0.088135
    0008AGR  811    0.042231    0.979045   -0.043135    1.021403    0.056546
    0009AGR  911    0.441586    0.839144   -0.526234    1.191691    0.129646
    0010AGR 1011    0.104452    0.879683   -0.118738    1.136773    0.101087

Scoring and covariance files
- Like the *.PAR file, specifically requested
- *.COV - Provides parameters as well as the variances/covariances between the parameters
  - Necessary for DIF analyses
- *.SCO - Provides ability score information for each respondent

Samejima's Graded Response model
- Used when options are ordered along a continuum, as with Likert scales
- v = response to the polytomously scored item i
- k = particular option
- a = discrimination parameter
- b = extremity parameter

Sample SGR Plot
- “High option,” “Low option”
- Low discrimination (a=0.4)
- Better discrimination (a=2)

Running MULTILOG
- MULTILOG for DOS
- Example with DOS batch file
- INFORLOG with MULTILOG
  - INFORLOG is typically interactive
  - Process automated with batch file and an input file (described on-line)
  - *.IN1 (parameter estimation)
  - *.IN2 (scoring)

The first input file (*.IN1)

    CALIBRATION OF AGREEABLENESS GRADED RESPONSE MODEL
    >PRO IN RA NI=10 NE=1500 NCHAR=4 NG=1;
    >TEST ALL GR NC=(5,5,5,5,5,5,5,5,5,5);
    >EST NC=50;
    >SAVE;
    >END;
    5
    01234
    1111111111
    2222222222
    3333333333
    4444444444
    5555555555
    (4A1,10A1)

- First line: Title line
- >PRO: Number of items, examinees, characters in the ID field, single group
- >TEST: SGR model; number of options for each item
- >EST: Number of cycles for estimation
- >END: End of command syntax
- 5 and 01234: Five characters denoting five options
- 1111111111 through 5555555555: Recoding of options for MULTILOG

The second input file (*.IN2)

    SCORING AGREEABLENESS SCALE SGR MODEL
    >PRO SCORE IN RA NI=10 NE=1500 NCHAR=4 NG=1;
    >TEST ALL GR NC=(5,5,5,5,5,5,5,5,5,5);
    >START;
    Y
    >SAVE;
    >END;
    5
    12345
    1111111111
    2222222222
    3333333333
    4444444444
    5555555555
    (4A1,10A1)

- SCORE on the >PRO line: Scoring
- Y: Yes to INFORLOG (parameters in a separate file)

Running MULTILOG
- Run the batch file
- *.IN1 produces *.LS1 (*.lis file renamed as *.ls1)
  - Ensure that the data were read in and the model specified correctly
  - Also provides a report of the estimation procedure with the estimated item parameters
- Things of note…

    0ITEM 1: 5 GRADED CATEGORIES
              P(#)  ESTIMATE  (S.E.)
    A           1      1.99   (0.12)
    B( 1)       2     -3.03   (0.18)
    B( 2)       3     -2.35   (0.11)
    B( 3)       4     -0.98   (0.06)
    B( 4)       5      2.01   (0.10)
    0 @THETA:    -2.0  -1.5  -1.0  -0.5   0.0   0.5   1.0   1.5   2.0
    I(THETA):    1.08  1.04  1.05  0.81  0.49  0.35  0.47  0.79  0.99
    0 OBSERVED AND EXPECTED COUNTS/PROPORTIONS IN
    CATEGORY(K):     1      2      3      4      5
    OBS. FREQ.      21     44    277   1050    108
    OBS. PROP.    0.01   0.03   0.18   0.70   0.07
    EXP. PROP.    0.01   0.03   0.19   0.70   0.07

- “a” includes a 1.7 scaling factor
- Frequencies for each option
- Collapsing options

Scoring output
- *.IN2 produces *.LS2
- Last portion of the file contains the person parameters (estimated theta, standard error, the number of iterations used, and the respondent's ID number).

What now?
- Review
  - Data requirements for IRT
  - Two models: 3PL (dichotomous), SGR (polytomous), more on-line!
- MODFIT
  - Can plot IRFs, ORFs
  - Model-data fit: Input parameters, validation sample
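
As a rough illustration of the two models reviewed above, the sketch below computes a 3PL item response function and SGR option response functions in Python. It is not BILOG, MULTILOG, or MODFIT code. The function names `p_3pl` and `grm_probs` are hypothetical, and the response functions are the standard logistic forms, which are assumed here because the slides do not show them. The 3PL function assumes BILOG's "a" needs the D = 1.7 scaling constant (passed as a default argument), while the SGR function uses "a" as is, since the MULTILOG "a" already includes the 1.7 factor. The parameter values are item 1 from the PARTO3PL output and item 1 from the MULTILOG output.

```python
import math


def p_3pl(theta, a, b, c, D=1.7):
    """3PL probability of the keyed response at trait level theta.

    D is the scaling constant; D=1.7 is an assumption about the metric
    of "a" (use D=1.0 if "a" already includes the factor).
    """
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def grm_probs(theta, a, bs):
    """SGR (graded response) option probabilities at trait level theta.

    bs holds the ascending boundary parameters B(1)..B(m-1);
    the result has one probability per option (m values).
    """
    # Cumulative probability of choosing option k or higher;
    # 1.0 for the lowest option, 0.0 beyond the highest.
    cum = [1.0]
    cum += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in bs]
    cum += [0.0]
    # Each option's probability is the difference of adjacent cumulatives.
    return [cum[k] - cum[k + 1] for k in range(len(bs) + 1)]


if __name__ == "__main__":
    # Item 0001AGR from the PARTO3PL output (a, b, c columns).
    a1, b1, c1 = 1.533393, -0.737439, 0.147203
    for theta in (-2.0, -1.0, 0.0, 1.0, 2.0):
        print(f"theta={theta:+.1f}  P(3PL)={p_3pl(theta, a1, b1, c1):.3f}")

    # Item 1 from the MULTILOG output: A and B(1)..B(4).
    a_grm, b_grm = 1.99, [-3.03, -2.35, -0.98, 2.01]
    for theta in (-2.0, 0.0, 2.0):
        probs = grm_probs(theta, a_grm, b_grm)
        print(f"theta={theta:+.1f}  options=" + " ".join(f"{p:.3f}" for p in probs))
```

Evaluating either function over a grid of theta values gives the points needed to draw IRFs or ORFs like those in the plots described earlier.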