CDM Cognitive Diagnosis Modeling with Mplus User Guide

The CDM program is dedicated to F. Porch, who was instrumental in the development
and parameterization of many cognitive diagnosis models.
CDM
End User License Agreement
© 2006 by Jonathan Templin (jtemplin@ku.edu). All rights reserved.
By installing and using the CDM program you indicate your acceptance of the following
end user license agreement.
1. The user of this program has purchased a licensed, registered copy of the Mplus
software program by Muthén and Muthén (available currently at
http://www.statmodel.com).
2. The end user of this program understands that this program writes script for
Mplus to conduct all analyses provided herein and therefore does not function as
a stand-alone program to estimate such models.
3. This software is provided by Jonathan Templin “as is” and there is no warranty
for the software.
4. In no event shall Jonathan Templin be liable for any direct, indirect, incidental, or
other damages (including, but not limited to, loss of use and/or data) arising from the
use of this software.
5. Redistributions in binary form must reproduce the above copyright notice, this
list of conditions and the following disclaimer in the documentation and/or other
materials provided with the distribution.
Table of Contents
CDM – Cognitive Diagnosis Modeling Using Mplus
Computer Requirements
CDM Installation
CDM Program Execution
CDM Input File
*Qmatrix Section Syntax
*DataFile Section Syntax
*CDM Section Syntax
The DINA Model
The DINO Model
The NIDA Model
The NIDO Model
The Conjunctive RUM
The Compensatory RUM
*Program Section Syntax
CDM Program Output
CDM_output.csv
CDMexam_out.csv
References
CDM – Cognitive Diagnosis Modeling Using Mplus
The CDM program is a freely available piece of software to assist users in
estimation of cognitive diagnosis models with Mplus. The program creates Mplus input
files (script of Mplus code), runs the Mplus program from the command prompt, and
culls the appropriate information from the Mplus output files, thereby reducing the
amount of time needed to estimate common models for cognitive diagnosis with the
Mplus package. Because the CDM program does not estimate any models directly, but
instead provides the script for Mplus to do so, a licensed, registered copy of Mplus is
a prerequisite for use of the CDM program.
Computer Requirements
To use the CDM program, the user must:
• Have a licensed and registered copy of Mplus on the same machine as the CDM
program.
• Be using the Microsoft Windows XP (any version) operating system.
CDM Installation
Special installation of the CDM program is not necessary. Simply put a copy of
the executable into the folder where you wish the analysis files to be placed.
CDM Program Execution
The CDM program is a stand-alone (binary) executable that can be run either
from the DOS prompt or by double clicking the program in a Microsoft Windows
explorer window. The program requires a single, external, text-based input file (as
described in the CDM Input File section of this user guide).
To open a DOS prompt in Windows XP:
1. Go to the Run menu (commonly found under the Windows XP Start Menu).
2. Type the word ‘cmd’ in the Run menu dialog box and click OK (as shown in Figure
1).
Figure 1: Windows XP Run Dialog Box
Once the DOS prompt appears, change the directory of the prompt to the
directory that contains the CDM program (and where the subsequent analysis files
should be written). To do this:
1. Type “cd path_to_CDM” at the DOS prompt (without quotes, where path_to_CDM is
the path to the CDM program). As shown in Figure 2 below, the path is
C:\Documents and Settings\Jonathan Templin\Desktop\CDM Example:
Figure 2: Changing Directories in the DOS Prompt
To run the CDM program from the DOS prompt:
1. Type “cdm0.0 input_file_name” at the DOS prompt (without quotes, where
input_file_name is the name of the CDM input file).
CDM Input File
The CDM program uses a text-based input file to direct the subsequent script
creation and analyses performed in Mplus. The full syntax of the input file is listed in
Table 1. The optional input is listed in brackets (with the program defaults being
displayed within the brackets).
Table 1: CDM Input File Syntax
*Qmatrix
file = Qmatrix File Name
attributes = Number of Attributes
[format = *]
*DataFile
file = Data File Name
items = Number of Items
[format = *]
*CDM
model = Model name
[startmethod = random]
[randomseed = 0]
*Program
mpluspath = Path to Mplus
[analysis = all]
[inputfile = cdm_DATE_TIME.imp]
[itemfile = cdm_DATE_TIME_out.dat]
[examfile = cdm_DATE_TIME_exam.dat]
[covfile = cdm_DATE_TIME_cov.dat]
A total of four sections are required to run the CDM program: Qmatrix, DataFile,
CDM, and Program. The beginning of each section is denoted by an asterisk prior to
each of these words. The details of each section are provided below. A set of example
files is provided for analyses with several of the cognitive diagnosis models.
The general syntax for any option is the name of the option (case sensitive),
followed by an equals sign, followed by the option setting.
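As a rough illustration of the layout in Table 1, the Python sketch below assembles a minimal input file as text. Only the section names and option names come from Table 1; the helper name build_cdm_input, the file names qmatrix.txt and responses.txt, and the attribute and item counts are hypothetical.

```python
def build_cdm_input(sections):
    """Render {section: {option: value}} as CDM input-file text (illustrative helper)."""
    lines = []
    for section, options in sections.items():
        lines.append(f"*{section}")            # each section starts with an asterisk
        for name, value in options.items():
            lines.append(f"{name} = {value}")  # option name, equals sign, setting
    return "\n".join(lines) + "\n"


# Hypothetical file names and sizes; option names follow Table 1.
example = build_cdm_input({
    "Qmatrix": {"file": "qmatrix.txt", "attributes": 4},
    "DataFile": {"file": "responses.txt", "items": 40},
    "CDM": {"model": "DINA", "startmethod": "subscore"},
    "Program": {"mpluspath": r"C:\Program Files\Mplus\Mplus.exe"},
})
print(example)
```

Only the required options plus startmethod are set here; every bracketed option in Table 1 that is left out falls back to its default.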
*Qmatrix Section Syntax
The Qmatrix section of the CDM input file provides information on the Q-matrix
used for the analysis. Two options are required in this section: the Q-matrix file name
and the number of attributes. An optional input is the format of the Q-matrix file.
file = Qmatrix File Name
The file option provides the Q-matrix file name. The Q-matrix file name can include a
path to the Q-matrix file (if not in the current folder), but as a default, the CDM program
will check the current folder for the Q-matrix file.
The file name must follow typical Windows conventions and must not contain any of
the following characters: \, /, :, *, ?, ", <, >, or |. The Q-matrix file must be text-based,
and each entry must be either a one or a zero.
The Q-matrix file must have the attributes (skills) listed in the columns, with each row
holding the indicators of the attributes needed for one item.
attributes = Number of Attributes
The attributes option provides the number of attributes (or columns in the Q-matrix). This
value must be numeric. Currently the maximum number of attributes is bounded only by
the system memory of the machine on which the CDM program is executed.
[format = *]
[OPTIONAL] The format option allows the user to specify the format of the Q-matrix file
input. If not specified, the CDM program will expect the Q-matrix to be free-formatted.
Free-formatted files have spaces, commas, or tabs separating the data in each column.
The format statement takes the form of general FORTRAN format statements. For more
information on FORTRAN format statements please refer to other sources.
One common type of format is to not have any spaces between the columns of the
Q-matrix. In this case (with four attributes), one could specify the input format by using
the following format statement: [format = 4i1].
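To make the two layouts concrete, the Python sketch below reads one Q-matrix row either as free format or as a fixed-width format such as 4i1. It only illustrates the file layouts and is not how the CDM program itself reads files. The function name parse_row is hypothetical, and the sketch handles only the simple "count i width" form of a FORTRAN format statement.

```python
import re


def parse_row(line, width_format=None):
    """Parse one Q-matrix row into a list of 0/1 integers.

    width_format=None  -> free format (spaces, commas, or tabs between entries)
    width_format="4i1" -> fixed width: 4 integer fields, each 1 character wide
    """
    if width_format is None:
        fields = [f for f in re.split(r"[ ,\t]+", line.strip()) if f]
    else:
        count, width = (int(n) for n in width_format.lower().split("i"))
        fields = [line[i * width:(i + 1) * width] for i in range(count)]
    row = [int(f) for f in fields]
    if any(v not in (0, 1) for v in row):
        raise ValueError(f"entries must be 0 or 1: {line!r}")
    return row


free_rows = [parse_row(r) for r in ["1 0 0 1", "0,1,1,0"]]
fixed_rows = [parse_row(r, "4i1") for r in ["1001", "0110"]]
print(free_rows == fixed_rows)
```

The same reading applies to the data file, with the number of items in place of the number of attributes.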
*DataFile Section Syntax
The DataFile section of the CDM input file provides information on the data file
used for the analysis. Two options are required in this section: the data file name and the
number of items. An optional input is the format of the data file.
file = Data File Name
The file option provides the data file name. The data file name can include a path to the
data file (if not in the current folder), but as a default, the CDM program will check the
current folder for the data file.
The file name must follow typical Windows conventions and must not contain any of
the following characters: \, /, :, *, ?, ", <, >, or |. The data file must be text-based, and
each entry must be either a one or a zero.
The data file must have the items listed in the columns, with each row holding one
examinee’s responses to the items.
items = Number of Items
The items option provides the number of items (or columns in the data file). This value
must be numeric. Currently the maximum number of items is bounded only by the
system memory of the machine on which the CDM program is executed.
[format = *]
[OPTIONAL] The format option allows the user to specify the format of the data file input.
If not specified, the CDM program will expect the data file to be free-formatted.
Free-formatted files have spaces, commas, or tabs separating the data in each column.
The format statement takes the form of general FORTRAN format statements. For more
information on FORTRAN format statements please refer to other sources.
One common type of format is to not have any spaces between the columns (items) of the
data file. In this case (with forty items), one could specify the input format by using the
following format statement: [format = 40i1].
*CDM Section Syntax
The CDM section of the CDM input file provides information on the type of
cognitive diagnosis model to run, along with the method for finding starting values. One
option is required – the name of the cognitive diagnosis model.
model = Model name
The model option specifies the name of the cognitive diagnosis model. Currently, six
models for cognitive diagnosis can be estimated by the program (as described
subsequently), and the model name must be specified exactly as shown in this user guide.
Available model names:
The DINA Model
(Macready and Dayton, 1977; Haertel, 1989; Junker and Sijtsma, 2001)
model = DINA
The DINA model (“Deterministic Inputs, Noisy And” gate) is a conjunctive model for
cognitive diagnosis. The model specifies two possible response probabilities per item:
one for examinees who have mastered all Q-matrix specified attributes for the item
(equal to one minus the “slip” parameter, s_j) and one for examinees who are lacking
mastery of one or more of the Q-matrix specified attributes for the item (the “guess”
parameter, g_j). To separate examinees into these two groups, the model uses the
following dichotomous latent variable:
\[ \xi_{ij} = \prod_{k=1}^{K} \alpha_{ik}^{q_{jk}}. \tag{1} \]
Here, q_jk is the Q-matrix entry for the jth item and the kth attribute, and α_ik is the
binary indicator that examinee i has mastered attribute k. The latent variable ξ_ij can
take two values: one (for an examinee who has mastered all “needed” attributes for the
item) and zero (for an examinee who has failed to master at least one of the “needed”
attributes for the item). Conditional on ξ_ij, the DINA model specifies the probability of
a correct response to item j as:
\[ P(X_{ij} = 1 \mid \xi_{ij}) = (1 - s_j)^{\xi_{ij}} \, g_j^{(1 - \xi_{ij})}. \tag{2} \]
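The two DINA equations can be checked numerically with a small Python sketch. The function name dina_prob and the slip and guess values are hypothetical and chosen only for illustration.

```python
def dina_prob(alpha, q_row, slip, guess):
    """P(X_ij = 1) under Equations (1) and (2)."""
    # xi = 1 only if every attribute with q_jk = 1 is mastered (alpha_ik = 1);
    # Python evaluates 0 ** 0 as 1, so attributes with q_jk = 0 drop out.
    xi = 1
    for a, q in zip(alpha, q_row):
        xi *= a ** q
    return (1 - slip) ** xi * guess ** (1 - xi)


q_row = [1, 1, 0]  # the item requires attributes 1 and 2
p_master = dina_prob([1, 1, 0], q_row, slip=0.1, guess=0.2)     # has both needed attributes
p_nonmaster = dina_prob([1, 0, 1], q_row, slip=0.1, guess=0.2)  # lacks attribute 2
print(p_master, p_nonmaster)
```

An examinee with every needed attribute answers correctly with probability one minus slip, and any other examinee with probability guess, regardless of how many needed attributes are missing.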
The DINO Model
(Templin and Henson, 2006)
model = DINO
The DINO model (“Deterministic Inputs, Noisy Or” gate) is the compensatory analog of the
DINA model for cognitive diagnosis. The model specifies two possible response
probabilities per item: one for examinees who have mastered one or more of the
Q-matrix specified attributes for the item (equal to one minus the “slip” parameter, s_j)
and one for examinees who are lacking mastery of all of the Q-matrix specified attributes
for the item (the “guess” parameter, g_j). To separate examinees into these two groups,
the model uses the following dichotomous latent variable:
\[ \omega_{ij} = 1 - \prod_{k=1}^{K} (1 - \alpha_{ik})^{q_{jk}}. \tag{3} \]
Here, q_jk is the Q-matrix entry for the jth item and the kth attribute, and α_ik is the
binary indicator that examinee i has mastered attribute k. The latent variable ω_ij can
take two values: one (for an examinee who has mastered at least one of the “needed”
attributes for the item) and zero (for an examinee who has mastered none of the
“needed” attributes for the item). Conditional on ω_ij, the DINO model specifies the
probability of a correct response to item j as:
\[ P(X_{ij} = 1 \mid \omega_{ij}) = (1 - s_j)^{\omega_{ij}} \, g_j^{(1 - \omega_{ij})}. \tag{4} \]
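A short Python sketch of the DINO equations shows the disjunctive behavior. The function name dino_prob and the parameter values are hypothetical.

```python
def dino_prob(alpha, q_row, slip, guess):
    """P(X_ij = 1) under Equations (3) and (4)."""
    # none_mastered stays 1 only if every needed attribute is unmastered.
    none_mastered = 1
    for a, q in zip(alpha, q_row):
        none_mastered *= (1 - a) ** q
    omega = 1 - none_mastered
    return (1 - slip) ** omega * guess ** (1 - omega)


q_row = [1, 1, 0]  # the item measures attributes 1 and 2
p_one_needed = dino_prob([1, 0, 1], q_row, slip=0.1, guess=0.2)   # has attribute 1 only
p_none_needed = dino_prob([0, 0, 1], q_row, slip=0.1, guess=0.2)  # has neither needed attribute
print(p_one_needed, p_none_needed)
```

The pattern [1, 0, 1] lacks one needed attribute, yet it receives the higher probability here because a single mastered attribute is enough to set ω_ij to one.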
The NIDA Model
(Maris, 1999)
model = NIDA
The NIDA model (“Noisy Inputs, Deterministic And” gate) is a conjunctive model for
cognitive diagnosis. The model specifies two parameters per attribute, one representing
examinees who have mastered the attribute (called the “slip” parameter) and one
representing examinees who are lacking mastery of the attribute (called the “guess”
parameter). The set of attribute parameters for the NIDA model is the same for each
item (item discrimination is equal for all items).
\[ P(X_{ij} = 1 \mid \boldsymbol{\alpha}_i) = \prod_{k=1}^{K} \left[ (1 - s_k)^{\alpha_{ik}} \, g_k^{(1 - \alpha_{ik})} \right]^{q_{jk}}. \tag{5} \]
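The NIDA item response function can be sketched in Python as follows. The sketch indexes the slip and guess parameters by attribute, in line with the description of two parameters per attribute; the function name nida_prob and all numeric values are hypothetical.

```python
def nida_prob(alpha, q_row, slips, guesses):
    """P(X_ij = 1) as a product over the attributes the item measures."""
    prob = 1.0
    for a, q, s, g in zip(alpha, q_row, slips, guesses):
        # A mastered attribute contributes (1 - s_k); an unmastered one contributes g_k.
        prob *= ((1 - s) ** a * g ** (1 - a)) ** q
    return prob


slips = [0.1, 0.2, 0.1]      # hypothetical attribute-level slip parameters
guesses = [0.3, 0.25, 0.2]   # hypothetical attribute-level guess parameters
q_row = [1, 1, 0]            # the item measures attributes 1 and 2
p = nida_prob([1, 0, 1], q_row, slips, guesses)
print(p)
```

Because the parameters belong to attributes rather than items, two items with the same Q-matrix row get the same response probabilities.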
The NIDO Model
(Introduced here)
model = NIDO
The NIDO model (“Noisy Inputs, Deterministic Or” gate) is the compensatory analog of
the NIDA model for cognitive diagnosis. The model specifies two parameters per
attribute, one representing examinees who have mastered the attribute (called the “beta”
parameter) and one representing examinees who are lacking mastery of the attribute
(called the “tau” parameter). The set of attribute parameters for the NIDO model is the
same for each item (item discrimination is equal for all items).
\[ P(X_{ij} = 1 \mid \boldsymbol{\alpha}_i) = \left[ 1 + \exp\left( \sum_{k=1}^{K} (\tau_k + \beta_k \alpha_{ik}) \, q_{jk} \right) \right]^{-1}. \tag{6} \]
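The sketch below evaluates Equation (6) in Python exactly as printed, with no minus sign inside the exponential. Under that printed form, a larger sum lowers the response probability, so the illustration uses negative beta values to make mastery raise the probability. That sign reading is an assumption, as are the function name nido_prob and all numeric values.

```python
import math


def nido_prob(alpha, q_row, taus, betas):
    """P(X_ij = 1) following the printed form of Equation (6)."""
    # Sum (tau_k + beta_k * alpha_ik) over the attributes the item measures.
    total = sum((t + b * a) * q for a, q, t, b in zip(alpha, q_row, taus, betas))
    return 1.0 / (1.0 + math.exp(total))


q_row = [1, 1, 0]            # the item measures attributes 1 and 2
taus = [0.5, 0.5, 0.5]       # hypothetical values
betas = [-1.5, -1.5, -1.5]   # hypothetical values; negative under the printed sign
p_none = nido_prob([0, 0, 0], q_row, taus, betas)
p_one = nido_prob([1, 0, 0], q_row, taus, betas)
p_both = nido_prob([1, 1, 0], q_row, taus, betas)
print(p_none, p_one, p_both)
```

Each additional mastered attribute shifts the probability, which is the compensatory behavior the model is meant to capture.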
The Conjunctive RUM
(Hartz, 2002)
model = RUM
The RUM (or Reparameterized Unified Model) is a conjunctive model for cognitive
diagnosis (without the continuous Rasch model term as included in Hartz, 2002). For an
item j, the model specifies the probability that an examinee who has mastered all
Q-matrix specified attributes correctly responds to an item as π*_j. For each attribute an
examinee lacks, the item response probability is reduced, and the parameter representing
this drop is called r*_jk. The item response function is then:
\[ P(X_{ij} = 1 \mid \boldsymbol{\alpha}_i) = \pi_j^{*} \prod_{k=1}^{K} \left( r_{jk}^{*} \right)^{(1 - \alpha_{ik}) \, q_{jk}}. \tag{7} \]
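As a numeric illustration of the conjunctive RUM, the Python sketch below applies one penalty factor per missing needed attribute. The function name rum_prob and the parameter values are hypothetical.

```python
def rum_prob(alpha, q_row, pi_star, r_stars):
    """P(X_ij = 1): pi*_j scaled by r*_jk for every needed attribute that is lacking."""
    prob = pi_star
    for a, q, r in zip(alpha, q_row, r_stars):
        prob *= r ** ((1 - a) * q)  # exponent is 1 only when needed (q=1) and unmastered (a=0)
    return prob


q_row = [1, 1, 0]
r_stars = [0.5, 0.4, 1.0]  # hypothetical; the third value is never used because q_j3 = 0
p_all = rum_prob([1, 1, 0], q_row, pi_star=0.9, r_stars=r_stars)
p_lack_one = rum_prob([1, 0, 0], q_row, pi_star=0.9, r_stars=r_stars)
p_lack_both = rum_prob([0, 0, 0], q_row, pi_star=0.9, r_stars=r_stars)
print(p_all, p_lack_one, p_lack_both)
```

Unlike the DINA model, the probability keeps dropping with each additional missing attribute.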
The Compensatory RUM
(Hartz, 2002; von Davier and Yamamoto, 2004)
model = COMRUM
The Compensatory RUM is the compensatory analog to the RUM for cognitive
diagnosis. For an item j, the model specifies the probability that an examinee who has
mastered none of the Q-matrix specified attributes correctly responds to an item as a
function of an intercept parameter, τ_j. For each attribute an examinee has mastered, the
item response probability is increased, and the parameter representing this increase is
called β_jk. The item response function is then:
\[ P(X_{ij} = 1 \mid \boldsymbol{\alpha}_i) = \left[ 1 + \exp\left( \tau_j + \sum_{k=1}^{K} \beta_{jk} \, \alpha_{ik} \, q_{jk} \right) \right]^{-1}. \tag{8} \]
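The sketch below tabulates Equation (8) over every attribute pattern for a two-attribute item. It follows the printed form of the equation, with no minus sign inside the exponential, so the illustration assumes negative β_jk values when mastery is to raise the probability. The function name comrum_prob and all numeric values are hypothetical.

```python
import math
from itertools import product


def comrum_prob(alpha, q_row, tau, betas):
    """P(X_ij = 1) following the printed form of Equation (8)."""
    total = tau + sum(b * a * q for a, q, b in zip(alpha, q_row, betas))
    return 1.0 / (1.0 + math.exp(total))


q_row = [1, 1]
# One probability per attribute pattern (alpha_i1, alpha_i2).
table = {
    alpha: comrum_prob(alpha, q_row, tau=1.0, betas=[-1.0, -2.0])
    for alpha in product([0, 1], repeat=2)
}
for alpha, p in table.items():
    print(alpha, round(p, 3))
```

The item-specific β_jk values let the two attributes contribute unequally to the same item, which is what separates this model from the NIDO model.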
[startmethod = random]
[OPTIONAL] The startmethod option provides the user the opportunity to select between
two types of methods for selecting the starting values of the analysis. Due to the nature
of most cognitive diagnosis models, the starting values are very important, and must be
acceptable values for the model being used (parameter values ordered correctly).
Because of these specific demands, the CDM program specifies the starting values for
each analysis.
The default startmethod is the random starting value method, which randomly
generates starting values for each parameter. This method can lead to very long
estimation times, but it also lets the user detect local optima by comparing the results
of repeated runs of the program.
The alternative startmethod option is called “subscore” (without quotes). The subscore
method first calculates the subscore for each examinee and each attribute (the number of
indicated items correctly answered for each attribute). For each attribute, if the examinee
is above the mean, the examinee is considered a master. After each examinee’s
subscore-based attribute pattern has been found, the program computes the initial
estimates of the cognitive diagnosis model item parameters. The subscore method will
generally allow for faster estimation times, but will lead to the same result for repeated
runs (so local optima will not be detected).
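The first two steps of the subscore method can be sketched in Python as shown below. This is an illustration of the description above and not the CDM program's own code. The function name subscore_patterns is hypothetical, and treating a subscore equal to the mean as non-mastery is an assumption. The final step, computing initial item parameter estimates, is not described in enough detail in this guide to sketch.

```python
def subscore_patterns(responses, q_matrix):
    """Return a 0/1 attribute pattern per examinee from attribute subscores."""
    n_attr = len(q_matrix[0])
    # subscores[i][k] = number of correct answers on items that measure attribute k
    subscores = [
        [sum(x * q_row[k] for x, q_row in zip(resp, q_matrix)) for k in range(n_attr)]
        for resp in responses
    ]
    means = [sum(s[k] for s in subscores) / len(subscores) for k in range(n_attr)]
    # Assumption: "above the mean" is read as strictly greater than the mean.
    return [[int(s[k] > means[k]) for k in range(n_attr)] for s in subscores]


q_matrix = [[1, 0], [1, 0], [0, 1], [0, 1]]   # 4 items, 2 attributes
responses = [
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]
patterns = subscore_patterns(responses, q_matrix)
print(patterns)
```

Since the patterns depend only on the data and the Q-matrix, repeated runs give identical starting values, which is why this method cannot reveal local optima.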
[randomseed = 0]
[OPTIONAL] The randomseed option specifies the random seed to be used if the
startmethod used is random. The default value for the random seed is zero – meaning the
seed is taken from the system clock at the time of the program’s execution. The input can
be any integer value.
*Program Section Syntax
The Program section of the CDM input file provides information on the location
of the Mplus program, along with the names for the Mplus input and output files used for
the analysis. One option is required in this section: the path to the Mplus program.
mpluspath = Path to Mplus
The mpluspath option gives the path to the Mplus executable. The
Mplus executable is installed (by default) in the path C:\Program Files\Mplus\Mplus.exe.
If you have modified the location of the program, provide the path here.
[analysis = all]
[OPTIONAL] The analysis option tells the CDM program the analysis sequence that is to
be used. The following options are possible:
• All (default) – Creates an Mplus input file, runs Mplus to estimate the cognitive
diagnosis model, and extracts the cognitive diagnosis model information from the
Mplus output.
• File – Creates the Mplus input file only.
• Estimate – Creates the Mplus input file and estimates the cognitive diagnosis
model using Mplus.
• Extract – Takes output from Mplus and extracts the cognitive diagnosis model
estimates only.
[inputfile = cdm_DATE_TIME.imp]
[OPTIONAL] The inputfile option allows the user to specify the name of the input file
used in the Mplus analysis. The default name for this file is generated by the CDM
program and contains the date and time of the CDM program execution.
[itemfile = cdm_DATE_TIME_out.dat]
[OPTIONAL] The itemfile option allows the user to specify the name of the file with
item parameter estimates output by Mplus. The default name for this file is generated by
the CDM program and contains the date and time of the CDM program execution.
[examfile = cdm_DATE_TIME_exam.dat]
[OPTIONAL] The examfile option allows the user to specify the name of the file with
examinee attribute pattern estimates output by Mplus. The default name for this file is
generated by the CDM program and contains the date and time of the CDM program
execution.
[covfile = cdm_DATE_TIME_cov.dat]
[OPTIONAL] The covfile option allows the user to specify the name of the file with item
parameter covariance matrix estimates output by Mplus. The default name for this file is
generated by the CDM program and contains the date and time of the CDM program
execution.
CDM Program Output
Upon successfully running the program, several output files are created. These
output files contain the cognitive diagnosis model parameter estimates along with the
estimates for the examinee parameters (posterior probability of marginal attribute mastery
and posterior probability of having each attribute pattern).
CDM_output.csv
The CDM_output.csv file contains the cognitive diagnosis model parameter
estimates and model fit statistics.
CDMexam_out.csv
The CDMexam_out.csv file contains the examinee estimates from the cognitive
diagnosis model run.
References
Haertel, E. (1989). Using restricted latent class models to map the skill structure
of achievement items. Journal of Educational Measurement, 26, 333-352.
Hartz, S. (2002). A Bayesian framework for the Unified Model for assessing
cognitive abilities: Blending theory with practice. Unpublished doctoral dissertation,
University of Illinois at Urbana-Champaign.
Junker, B. W., & Sijtsma, K. (2001). Cognitive assessment models with few
assumptions and connections with nonparametric item response theory. Applied
Psychological Measurement, 25, 258-272.
Macready, G. B., & Dayton, C. M. (1977). The use of probabilistic models in the
assessment of mastery. Journal of Educational Statistics, 2, 99-120.
Maris, E. (1999). Estimating multiple classification latent class models.
Psychometrika, 64, 197-212.
Templin, J., & Henson, R. (2006). Measurement of psychological disorders using
cognitive diagnosis models. Manuscript under review.
von Davier, M., & Yamamoto, K. (2004). Partially observed mixtures of IRT
models: An extension of the generalized partial-credit model. Applied Psychological
Measurement, 28, 389-406.