CDM
Cognitive Diagnosis Modeling with Mplus
User Guide

The CDM program is dedicated to F. Porch, who was instrumental in the development and parameterization of many cognitive diagnosis models.

CDM End User License Agreement
© 2006 by Jonathan Templin (jtemplin@ku.edu). All rights reserved.

By installing and using the CDM program you indicate your acceptance of the following end user license agreement.
1. The user of this program has purchased a licensed, registered copy of the Mplus software program by Muthén and Muthén (available currently at http://www.statmodel.com).
2. The end user of this program understands that this program writes script for Mplus to conduct all analyses provided herein and, therefore, does not function as a stand-alone program to estimate such models.
3. This software is provided by Jonathan Templin “as is” and there is no warranty for the software.
4. In no event shall Jonathan Templin be liable for any direct, indirect, incidental, or other damages (including, but not limited to, loss of use and/or data) arising from the use of this software.
5. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

Table of Contents
CDM – Cognitive Diagnosis Modeling Using Mplus
  Computer Requirements
  CDM Installation
  CDM Program Execution
CDM Input File
  *Qmatrix Section Syntax
  *DataFile Section Syntax
  *CDM Section Syntax
    The DINA Model
    The DINO Model
    The NIDA Model
    The NIDO Model
    The Conjunctive RUM
    The Compensatory RUM
  *Program Section Syntax
CDM Program Output
  CDM_output.csv
  CDMexam_out.csv
References

CDM – Cognitive Diagnosis Modeling Using Mplus
The CDM program is a freely available piece of software to assist users in estimating cognitive diagnosis models with Mplus.
The program creates Mplus input files (scripts of Mplus code), runs Mplus from the command prompt, and extracts the relevant information from the Mplus output files, reducing the time needed to estimate common cognitive diagnosis models with the Mplus package. Because the CDM program does not estimate any models directly but instead writes the script that Mplus uses to perform the estimation, a licensed, registered copy of Mplus is a prerequisite for using the CDM program.

Computer Requirements
To use the CDM program, the user must:
• Have a licensed and registered copy of Mplus on the same machine as the CDM program.
• Be using the Microsoft Windows XP (any version) operating system.

CDM Installation
No special installation of the CDM program is necessary; simply put a copy of the executable into the folder where you wish the analysis files to be placed.

CDM Program Execution
The CDM program is a stand-alone (binary) executable that can be run either from the DOS prompt or by double-clicking the program in a Microsoft Windows Explorer window. The program requires a single, external, text-based input file (as described in the CDM Input File section of this user guide).

To open a DOS prompt in Windows XP:
1. Go to the Run menu (commonly found under the Windows XP Start menu).
2. Type the word ‘cmd’ in the Run dialog box and click OK (as shown in Figure 1).

Figure 1: Windows XP Run Dialog Box

Once the DOS prompt appears, change the directory of the prompt to the folder that contains the CDM program (where the analysis files will also be written). To do this:
1. Type “cd path_to_CDM” at the DOS prompt (without quotes, where path_to_CDM is the path to the CDM program). In the example shown in Figure 2, the path is C:\Documents and Settings\Jonathan Templin\Desktop\CDM Example.

Figure 2: Changing Directories in the DOS Prompt

To run the CDM program from the DOS prompt:
1. Type “cdm0.0 input_file_name” at the DOS prompt (without quotes, where input_file_name is the name of the CDM input file).

CDM Input File
The CDM program uses a text-based input file to direct the subsequent script creation and analyses performed in Mplus. The full syntax of the input file is listed in Table 1. Optional input is listed in brackets, with the program defaults displayed within the brackets.

Table 1: CDM Input File Syntax
*Qmatrix
file = Qmatrix File Name
attributes = Number of Attributes
[format = *]
*DataFile
file = Data File Name
items = Number of Items
[format = *]
*CDM
model = Model name
[startmethod = random]
[randomseed = 0]
*Program
mpluspath = Path to Mplus
[analysis = all]
[inputfile = cdm_DATE_TIME.imp]
[itemfile = cdm_DATE_TIME_out.dat]
[examfile = cdm_DATE_TIME_exam.dat]
[covfile = cdm_DATE_TIME_cov.dat]

Four sections are required to run the CDM program: Qmatrix, DataFile, CDM, and Program. The beginning of each section is denoted by an asterisk before the section name (for example, *Qmatrix). The details of each section are provided below. A set of example files is provided for analyses with several of the cognitive diagnosis models. The general syntax for any option is the name of the option (case sensitive), followed by an equals sign, followed by the option setting.

*Qmatrix Section Syntax
The Qmatrix section of the CDM input file provides information on the Q-matrix used for the analysis. Two options are required in this section: the Q-matrix file name and the number of attributes. An optional input is the format of the Q-matrix file.

file = Qmatrix File Name
The file option provides the Q-matrix file name. The Q-matrix file name can include a path to the Q-matrix file (if it is not in the current folder); by default, the CDM program checks the current folder for the Q-matrix file. The file name itself must follow typical Windows conventions and cannot contain any of the following characters: \, /, :, *, ?, ", <, >, or |.
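To illustrate the section and option syntax of Table 1, the following Python sketch reads an input file into a dictionary of sections. It is not part of the CDM program: the function `parse_cdm_input`, the one-option-per-line layout, and the filling-in of the Table 1 defaults are assumptions made for illustration. Option names are kept case sensitive, as the syntax requires.

```python
# Hypothetical sketch, not the CDM program's own parser.
# Assumes a "*Section" line before each group of options and
# one "option = setting" pair per line.

REQUIRED = {
    "Qmatrix": ["file", "attributes"],
    "DataFile": ["file", "items"],
    "CDM": ["model"],
    "Program": ["mpluspath"],
}

# Optional settings and their Table 1 defaults. The DATE_TIME file names are
# left out because their exact date/time format is not given in this guide.
DEFAULTS = {
    "Qmatrix": {"format": "*"},
    "DataFile": {"format": "*"},
    "CDM": {"startmethod": "random", "randomseed": "0"},
    "Program": {"analysis": "all"},
}


def parse_cdm_input(text):
    """Return {section: {option: setting}} for a CDM-style input file."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith("*"):
            current = line[1:].strip()  # "*Qmatrix" -> "Qmatrix"
            sections.setdefault(current, {})
            continue
        if current is None or "=" not in line:
            raise ValueError(f"unexpected line: {raw!r}")
        # Split on the first "=" only; option names stay case sensitive.
        name, setting = line.split("=", 1)
        sections[current][name.strip()] = setting.strip()

    for section, options in REQUIRED.items():
        if section not in sections:
            raise ValueError(f"missing section *{section}")
        for option in options:
            if option not in sections[section]:
                raise ValueError(f"missing option '{option}' in *{section}")
        for option, default in DEFAULTS[section].items():
            sections[section].setdefault(option, default)
    return sections
```

Settings come back as strings, so numeric options such as attributes or items would still need converting with int().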
The Q-matrix file must be text-based, and each entry must be either a one or a zero. The attributes (skills) are listed in the columns of the Q-matrix file, and each row indicates which attributes are needed for one item.

attributes = Number of Attributes
The attributes option provides the number of attributes (or columns in the Q-matrix). This value must be numeric. Currently, the maximum number of attributes is bounded only by the system memory of the machine on which the CDM program is executed.

[format = *] [OPTIONAL]
The format option allows the user to specify the format of the Q-matrix file. If it is not specified, the CDM program expects the Q-matrix to be free-formatted. Free-formatted files have spaces, commas, or tabs separating the data in each column. The format statement takes the form of a general FORTRAN format statement; for more information on FORTRAN format statements, please refer to other sources. One common format has no spaces between the columns of the Q-matrix. In this case (with four attributes), the input format can be specified with the following format statement: [format = 4i1].

*DataFile Section Syntax
The DataFile section of the CDM input file provides information on the data file used for the analysis. Two options are required in this section: the data file name and the number of items. An optional input is the format of the data file.

file = Data File Name
The file option provides the data file name. The data file name can include a path to the data file (if it is not in the current folder); by default, the CDM program checks the current folder for the data file. The file name itself must follow typical Windows conventions and cannot contain any of the following characters: \, /, :, *, ?, ", <, >, or |. The data file must be text-based, and each entry must be either a one or a zero.
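The free-format and fixed-width conventions can be made concrete with a Python sketch that reads a 0/1 matrix such as the Q-matrix; the data file follows the same 0/1 rules. Only free format (`*`) and a single-digit descriptor of the form `Ni1`, as in the 4i1 example above, are handled. The function `read_binary_matrix` is hypothetical, and general FORTRAN format statements are outside its scope.

```python
import re


def _split_free(line, ncols):
    """Free format: entries separated by spaces, commas, or tabs."""
    return [tok for tok in re.split(r"[,\s]+", line.strip()) if tok]


def _split_fixed(line, ncols):
    """Fixed width: the first ncols characters, one digit per column."""
    return list(line[:ncols])


def read_binary_matrix(lines, ncols, fmt="*"):
    """Read rows of 0/1 entries (hypothetical helper, not part of CDM).

    fmt="*" selects free format; fmt="Ni1" (e.g. "4i1") selects fixed width,
    and this sketch assumes N equals ncols.
    """
    if fmt == "*":
        split_row = _split_free
    else:
        match = re.fullmatch(r"(\d+)[iI]1", fmt.strip())
        if match is None or int(match.group(1)) != ncols:
            raise ValueError(f"unsupported format statement: {fmt!r}")
        split_row = _split_fixed

    rows = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # skip blank lines
        entries = split_row(line, ncols)
        if len(entries) != ncols:
            raise ValueError(
                f"row {number}: expected {ncols} entries, found {len(entries)}"
            )
        if any(e not in ("0", "1") for e in entries):
            raise ValueError(f"row {number}: every entry must be 0 or 1")
        rows.append([int(e) for e in entries])
    return rows
```

For example, the free-format rows "1 0 0 1" and "0,1,1,0" and the fixed-width rows "1001" and "0110" (with format 4i1) produce the same two Q-matrix rows.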
The items are listed in the columns of the data file, and each row contains one examinee’s responses to the items.

items = Number of Items
The items option provides the number of items (or columns in the data file). This value must be numeric. Currently, the maximum number of items is bounded only by the system memory of the machine on which the CDM program is executed.

[format = *] [OPTIONAL]
The format option allows the user to specify the format of the data file. If it is not specified, the CDM program expects the data file to be free-formatted. Free-formatted files have spaces, commas, or tabs separating the data in each column. The format statement takes the form of a general FORTRAN format statement; for more information on FORTRAN format statements, please refer to other sources. One common format has no spaces between the columns (items) of the data file. In this case (with forty items), the input format can be specified with the following format statement: [format = 40i1].

*CDM Section Syntax
The CDM section of the CDM input file provides information on the type of cognitive diagnosis model to run, along with the method for finding starting values. One option is required: the name of the cognitive diagnosis model.

model = Model name
The model option specifies the name of the cognitive diagnosis model. Currently, the program can estimate six cognitive diagnosis models (described below), and the model name must be specified exactly as shown in this user guide.

Types of Model Name:

The DINA Model (Macready and Dayton, 1977; Haertel, 1989; Junker and Sijtsma, 2001)
model = DINA
The DINA model (“Deterministic Inputs, Noisy And” gate) is a conjunctive model for cognitive diagnosis.
The model specifies two possible response probabilities per item: one for examinees who have mastered all Q-matrix specified attributes for the item, $1 - s_j$ (where $s_j$ is called the “slip” parameter), and one for examinees who lack mastery of one or more of the Q-matrix specified attributes for the item, $g_j$ (called the “guess” parameter). To separate examinees into these two groups, the model uses the following dichotomous latent variable:

$$\xi_{ij} = \prod_{k=1}^{K} \alpha_{ik}^{q_{jk}} \tag{1}$$

Here, $q_{jk}$ is the Q-matrix entry for the $j$th item and the $k$th attribute, $\alpha_{ik}$ is the binary indicator that examinee $i$ has mastered attribute $k$, and $K$ is the number of attributes. The latent variable $\xi_{ij}$ can take two values: one (for an examinee who has mastered all “needed” attributes for the item) and zero (for an examinee who has failed to master at least one of the “needed” attributes for the item). Conditional on $\xi_{ij}$, the DINA model specifies the probability of a correct response to item $j$ as:

$$P(X_{ij} = 1 \mid \xi_{ij}) = (1 - s_j)^{\xi_{ij}} \, g_j^{(1 - \xi_{ij})} \tag{2}$$

The DINO Model (Templin and Henson, 2006)
model = DINO
The DINO model (“Deterministic Inputs, Noisy Or” gate) is the compensatory analog of the DINA model for cognitive diagnosis. The model specifies two possible response probabilities per item: one for examinees who have mastered one or more of the Q-matrix specified attributes for the item, $1 - s_j$ (where $s_j$ is called the “slip” parameter), and one for examinees who lack mastery of all of the Q-matrix specified attributes for the item, $g_j$ (called the “guess” parameter). To separate examinees into these two groups, the model uses the following dichotomous latent variable:

$$\omega_{ij} = 1 - \prod_{k=1}^{K} (1 - \alpha_{ik})^{q_{jk}} \tag{3}$$

Here, $q_{jk}$ is the Q-matrix entry for the $j$th item and the $k$th attribute, and $\alpha_{ik}$ is the binary indicator that examinee $i$ has mastered attribute $k$.
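Equations (1) through (3) can be checked numerically with the following Python sketch (Python 3.8 or later, for `math.prod`). The function names are hypothetical, attribute patterns and Q-matrix rows are plain 0/1 lists, and any slip and guess values passed in are illustrations rather than estimates.

```python
from math import prod

# In Python, 0 ** 0 == 1, so attributes the item does not require
# (q_jk = 0) contribute a factor of 1, as in Equations (1) and (3).


def xi(alpha, q):
    """Equation (1): 1 if every attribute the item requires is mastered, else 0."""
    return prod(a ** qk for a, qk in zip(alpha, q))


def omega(alpha, q):
    """Equation (3): 1 if at least one required attribute is mastered, else 0."""
    return 1 - prod((1 - a) ** qk for a, qk in zip(alpha, q))


def p_dina(alpha, q, slip, guess):
    """Equation (2): probability of a correct response under the DINA model."""
    x = xi(alpha, q)
    return (1 - slip) ** x * guess ** (1 - x)
```

For example, with a Q-matrix row of [1, 1, 0], an examinee with pattern [1, 0, 1] has $\xi_{ij} = 0$ (the second required attribute is missing) but $\omega_{ij} = 1$ (the first required attribute is mastered), so under the DINA model this examinee answers correctly with probability $g_j$.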
The latent variable $\omega_{ij}$ can take two values: one (for an examinee who has mastered at least one “needed” attribute for the item) and zero (for an examinee who has failed to master all of the “needed” attributes for the item). Conditional on $\omega_{ij}$, the DINO model specifies the probability of a correct response to item $j$ as:

$$P(X_{ij} = 1 \mid \omega_{ij}) = (1 - s_j)^{\omega_{ij}} \, g_j^{(1 - \omega_{ij})} \tag{4}$$

The NIDA Model (Maris, 1999)
model = NIDA
The NIDA model (“Noisy Inputs, Deterministic And” gate) is a conjunctive model for cognitive diagnosis. The model specifies two parameters per attribute: a “slip” parameter $s_k$, where $1 - s_k$ applies to examinees who have mastered the attribute, and a “guess” parameter $g_k$, which applies to examinees who lack mastery of the attribute. The set of attribute parameters for the NIDA model is the same for each item (item discrimination is equal for all items). With $\boldsymbol{\alpha}_i$ denoting examinee $i$’s attribute pattern, the item response function is:

$$P(X_{ij} = 1 \mid \boldsymbol{\alpha}_i) = \prod_{k=1}^{K} \left[ (1 - s_k)^{\alpha_{ik}} \, g_k^{(1 - \alpha_{ik})} \right]^{q_{jk}} \tag{5}$$

The NIDO Model (Introduced here)
model = NIDO
The NIDO model (“Noisy Inputs, Deterministic Or” gate) is the compensatory analog of the NIDA model for cognitive diagnosis. The model specifies two parameters per attribute: one associated with mastery of the attribute ($\beta_k$, the “beta” parameter) and one for examinees who lack mastery of the attribute ($\tau_k$, the “tau” parameter). The set of attribute parameters for the NIDO model is the same for each item (item discrimination is equal for all items). The item response function is:

$$P(X_{ij} = 1 \mid \boldsymbol{\alpha}_i) = \left[ 1 + \exp\left( -\sum_{k=1}^{K} (\tau_k + \beta_k \alpha_{ik}) \, q_{jk} \right) \right]^{-1} \tag{6}$$

The Conjunctive RUM (Hartz, 2002)
model = RUM
The RUM (Reparameterized Unified Model) is a conjunctive model for cognitive diagnosis (here without the continuous Rasch model term included in Hartz, 2002). For an item $j$, the model specifies the probability that an examinee who has mastered all Q-matrix specified attributes correctly responds to the item as $\pi_j^*$.
For each required attribute an examinee lacks, the item response probability is reduced; the parameter representing this reduction is called $r_{jk}^*$. The item response function is then:

$$P(X_{ij} = 1 \mid \boldsymbol{\alpha}_i) = \pi_j^* \prod_{k=1}^{K} \left( r_{jk}^* \right)^{(1 - \alpha_{ik}) \, q_{jk}} \tag{7}$$

The Compensatory RUM (Hartz, 2002; von Davier and Yamamoto, 2004)
model = COMRUM
The Compensatory RUM is the compensatory analog of the RUM for cognitive diagnosis. For an item $j$, the model specifies the probability that an examinee who has mastered none of the Q-matrix specified attributes correctly responds to the item as a function of an intercept parameter, $\tau_j$. For each required attribute an examinee has mastered, the item response probability is increased; the parameter representing this increase is called $\beta_{jk}$. The item response function is then:

$$P(X_{ij} = 1 \mid \boldsymbol{\alpha}_i) = \left[ 1 + \exp\left( -\left( \tau_j + \sum_{k=1}^{K} \beta_{jk} \alpha_{ik} q_{jk} \right) \right) \right]^{-1} \tag{8}$$

[startmethod = random] [OPTIONAL]
The startmethod option lets the user select between two methods for choosing the starting values of the analysis. Because of the nature of most cognitive diagnosis models, starting values are very important and must be acceptable values for the model being used (with parameter values ordered correctly). Because of these specific demands, the CDM program specifies the starting values for each analysis. The default startmethod is the random starting value method, which randomly generates starting values for each parameter. This method can lead to very long estimation times, but repeated runs of the program allow the user to detect locally optimal solutions. The alternative startmethod option is called “subscore” (without quotes). The subscore method first calculates the subscore for each examinee and each attribute (the number of correctly answered items that the Q-matrix indicates as measuring that attribute). For each attribute, if the examinee’s subscore is above the mean subscore for that attribute, the examinee is considered a master.
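The subscore classification rule can be sketched in Python as follows. This illustrates the rule as described here and is not the CDM program’s code. The mean is assumed to be taken over examinees for each attribute, the guide does not say how a subscore exactly equal to the mean is treated (the sketch assumes only subscores strictly above the mean indicate mastery), and the function name `subscore_masters` is hypothetical.

```python
def subscore_masters(responses, qmatrix):
    """Classify examinees as masters (1) or non-masters (0) of each attribute.

    responses: one list of 0/1 item scores per examinee.
    qmatrix:   one list of 0/1 attribute indicators per item.
    """
    n_attributes = len(qmatrix[0])

    # Subscore: number of correct answers among the items that measure attribute k.
    subscores = [
        [sum(x * q_row[k] for x, q_row in zip(scores, qmatrix)) for k in range(n_attributes)]
        for scores in responses
    ]

    # Mean subscore for each attribute, taken over examinees (an assumption).
    n_examinees = len(subscores)
    means = [sum(s[k] for s in subscores) / n_examinees for k in range(n_attributes)]

    # Master if strictly above the attribute mean (tie handling is an assumption).
    return [[int(s[k] > means[k]) for k in range(n_attributes)] for s in subscores]
```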
After each examinee’s subscore-based attribute pattern has been found, the program computes initial estimates of the cognitive diagnosis model item parameters. The subscore method generally allows faster estimation, but it leads to the same result for repeated runs (so local optima will not be detected).

[randomseed = 0] [OPTIONAL]
The randomseed option specifies the random seed to be used if the startmethod is random. The default value for the random seed is zero, meaning the seed is taken from the system clock at the time of the program’s execution. The input can be any integer value.

*Program Section Syntax
The Program section of the CDM input file provides information on the location of the Mplus program, along with the names of the Mplus input and output files used for the analysis. One option is required in this section: the path to the Mplus program.

mpluspath = Path to Mplus
The mpluspath option specifies the path to the Mplus executable. By default, the Mplus executable is installed at C:\Program Files\Mplus\Mplus.exe. If you have installed Mplus in a different location, provide that path here.

[analysis = all] [OPTIONAL]
The analysis option tells the CDM program which analysis sequence to use. The following options are possible:
• All (default): Creates an Mplus input file, runs Mplus to estimate the cognitive diagnosis model, and extracts the cognitive diagnosis model information from the Mplus output.
• File: Creates the Mplus input file only.
• Estimate: Creates the Mplus input file and estimates the cognitive diagnosis model using Mplus.
• Extract: Takes output from Mplus and extracts the cognitive diagnosis model estimates only.

[inputfile = cdm_DATE_TIME.imp] [OPTIONAL]
The inputfile option allows the user to specify the name of the input file used in the Mplus analysis.
The default name for this file is generated by the CDM program and contains the date and time of the CDM program execution.

[itemfile = cdm_DATE_TIME_out.dat] [OPTIONAL]
The itemfile option allows the user to specify the name of the file containing the item parameter estimates output by Mplus. The default name for this file is generated by the CDM program and contains the date and time of the CDM program execution.

[examfile = cdm_DATE_TIME_exam.dat] [OPTIONAL]
The examfile option allows the user to specify the name of the file containing the examinee attribute pattern estimates output by Mplus. The default name for this file is generated by the CDM program and contains the date and time of the CDM program execution.

[covfile = cdm_DATE_TIME_cov.dat] [OPTIONAL]
The covfile option allows the user to specify the name of the file containing the item parameter covariance matrix estimates output by Mplus. The default name for this file is generated by the CDM program and contains the date and time of the CDM program execution.

CDM Program Output
Upon successfully running the program, several output files are created. These output files contain the cognitive diagnosis model parameter estimates along with the estimates of the examinee parameters (the posterior probability of marginal attribute mastery and the posterior probability of having each attribute pattern).

CDM_output.csv
The CDM_output.csv file contains the cognitive diagnosis model parameter estimates and model fit statistics.

CDMexam_out.csv
The CDMexam_out.csv file contains the examinee estimates from the cognitive diagnosis model run.

References
Haertel, E. (1989). Using restricted latent class models to map the skill structure of achievement items. Journal of Educational Measurement, 26, 333-352.
Hartz, S. (2002). A Bayesian framework for the Unified Model for assessing cognitive abilities: Blending theory with practice. Unpublished doctoral dissertation, University of Illinois at Urbana-Champaign.
Junker, B. W., & Sijtsma, K. (2001). Cognitive assessment models with few assumptions and connections with nonparametric item response theory. Applied Psychological Measurement, 25, 258-272.
Macready, G. B., & Dayton, C. M. (1977). The use of probabilistic models in the assessment of mastery. Journal of Educational Statistics, 2, 99-120.
Maris, E. (1999). Estimating multiple classification latent class models. Psychometrika, 64, 197-212.
Templin, J., & Henson, R. (2006). Measurement of psychological disorders using cognitive diagnosis models. Manuscript under review.
von Davier, M., & Yamamoto, K. (2004). Partially observed mixtures of IRT models: An extension of the generalized partial-credit model. Applied Psychological Measurement, 28, 389-406.