SAS Proc CatMod

1. Introduction
CATMOD provides a wide variety of categorical data analyses. Many of these are generalizations of
continuous data analysis methods. For example, analysis of variance, in the traditional sense, refers to the
analysis of means and the partitioning of variation among the means into various sources. Here, the term
"analysis of variance" is used in a generalized form to denote the analysis of response functions and the
partitioning of variation among those functions into various sources. The response functions might be
mean scores if the dependent variables are ordinally scaled. But they can also be marginal probabilities,
cumulative logits, or other functions that incorporate the essential information from the dependent
variables.
2. Syntax
PROC CATMOD DATA= SAS-data-set ORDER= DATA;
DIRECT variable-list;
MODEL response_effect= design_effects / options; /* required */
CONTRAST 'label' row_description, row_description,...;
BY variable-list;
FACTORS factor_description,... / options;
LOGLIN effects / option;
POPULATION variable-list;
REPEATED factor_description,... / options;
RESPONSE function / options;
RESTRICT parameter=value<...parameter=value>;
WEIGHT variable;
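As a minimal sketch of how these statements fit together, the following step fits the default generalized-logit model to a small table of counts. The data set work.trial, its variables, and the counts are hypothetical and serve only as illustration.

```sas
/* Hypothetical summary data: one row per cell of a 2 x 2 table */
data work.trial;
   input treatment $ outcome $ count;
   datalines;
drug    better 28
drug    same   12
placebo better 16
placebo same   24
;
run;

proc catmod data=work.trial order=data;
   weight count;               /* cell frequencies */
   model outcome = treatment;  /* response = design effects */
run;
quit;                          /* CATMOD is interactive; QUIT ends it */
```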
3. Details
a) ORDER= DATA
The ORDER= DATA option specifies that variable levels are to be ordered according to the sequence
in which they appear in the input stream. If ORDER= DATA is not specified, then the variable levels
are ordered according to their internal sorting sequence (for example, numeric order or alphabetical
order).
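The sketch below shows a case where the ordering matters. The data set work.severity and its values are hypothetical. Sorted alphabetically, the response levels would be marked, none, some; with ORDER= DATA they keep the order none, some, marked in which they first appear. Because the default generalized logits compare each response level with the last one, the ordering also determines the reference level.

```sas
/* Hypothetical counts; levels are listed in their natural order */
data work.severity;
   input dose $ severity $ count;
   datalines;
low  none   30
low  some   15
low  marked  5
high none   18
high some   20
high marked 12
;
run;

proc catmod data=work.severity order=data;
   weight count;
   model severity = dose;   /* levels ordered as they appear in the data */
run;
quit;
```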
b) DIRECT variable-list;
The DIRECT statement lists numeric variables to be treated in a quantitative, rather than qualitative,
way. If used, the DIRECT statement must precede the MODEL statement.
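A short sketch of the DIRECT statement follows, assuming a hypothetical data set work.dose with a numeric variable logdose. Without DIRECT, the three values of logdose would be treated as three qualitative levels; with it, logdose enters the model as a single quantitative regressor.

```sas
/* Hypothetical dose-response counts */
data work.dose;
   input logdose response $ count;
   datalines;
0 yes  3
0 no  27
1 yes 10
1 no  20
2 yes 19
2 no  11
;
run;

proc catmod data=work.dose;
   weight count;
   direct logdose;                 /* placed before MODEL, as required */
   model response = logdose / ml;  /* maximum likelihood logistic fit */
run;
quit;
```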
c) MODEL response_effect= design_effects / options;
The MODEL statement specifies dependent and independent variables and model effects.
response_effect indicates the dependent variables that determine the response categories (the columns
of the underlying contingency table). Response_effect is either a single variable or a crossed effect
having two or more variables joined by asterisks. design_effects specify potential sources of
variation (such as main effects and interactions) to be included in the model.
The following options can be specified in the MODEL statement after a slash (/):
CORRB COV COVB FREQ ML ONEWAY PREDICT|PRED= PROB TITLE= XPX
NODESIGN NOINT NOITER NOPARM NOPROFILE NORESPONSE
ADDCELL= AVERAGED EPSILON= MAXITER=
WLS|GLS
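The following sketch combines main effects, an interaction, and several of the options listed above. The data set work.trial2 and its counts are hypothetical.

```sas
/* Hypothetical counts for a 2 x 2 x 2 table */
data work.trial2;
   input treatment $ sex $ outcome $ count;
   datalines;
drug    f better 16
drug    f same    5
drug    m better 12
drug    m same    7
placebo f better  6
placebo f same   14
placebo m better 10
placebo m same   10
;
run;

proc catmod data=work.trial2;
   weight count;
   /* main effects plus their interaction; options follow the slash */
   model outcome = treatment sex treatment*sex
         / ml freq oneway covb pred=prob;
run;
quit;
```

Here ML requests maximum likelihood estimation, FREQ and ONEWAY print frequency tables, COVB prints the covariance matrix of the estimates, and PRED=PROB prints predicted probabilities.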
d) CONTRAST 'label' row_description, row_description,... ;
A CONTRAST statement must be preceded by the MODEL statement and, if one is used, by the LOGLIN
statement. The CONTRAST statement constructs and tests linear functions of the parameters in the
MODEL statement or effects listed in the LOGLIN statement. The 'label' term, which is required, specifies
up to twenty-four characters of identifying information. Each row_description specifies one row of the
matrix C that CATMOD uses to test the hypothesis CB=0, where B is the vector of parameters.
Row_descriptions are separated by commas.
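A sketch of one-row and two-row contrasts follows, using a hypothetical data set work.clinic with a three-level factor center. It assumes CATMOD's default full-rank parameterization, in which center contributes two parameters (for levels A and B) and the effect of the last level C is the negative of their sum.

```sas
/* Hypothetical counts for three centers */
data work.clinic;
   input center $ outcome $ count;
   datalines;
A good 30
A poor 10
B good 22
B poor 18
C good 15
C poor 25
;
run;

proc catmod data=work.clinic;
   weight count;
   model outcome = center;
   /* one row: effect of A minus effect of B */
   contrast 'A vs B' center 1 -1;
   /* two rows separated by a comma: A = B and A = C.               */
   /* A - C = A + (A + B) = 2A + B under the assumed parameterization */
   contrast 'A = B = C' center 1 -1,
                        center 2  1;
run;
quit;
```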
e) FACTORS factor_description,... / options;
The FACTORS statement identifies factors that distinguish response functions from others in the same
population. It also specifies how those factors are incorporated into the model.
It can be used whenever there is more than one response function per population and the keyword
_RESPONSE_ is used in the MODEL statement. The following three options can be specified in the
FACTORS statement after the slash (/):
PROFILE= (matrix)
_RESPONSE_= effects
TITLE= 'title'
f) LOGLIN effects / option;
The LOGLIN statement is used to define log-linear model effects. It can be used whenever the response
functions are the standard ones (generalized logits). The effects term specifies design effects that contain
dependent variables in the MODEL statement. When the LOGLIN statement is used, the keyword
_RESPONSE_ should be specified in the MODEL statement. The LOGLIN statement cannot be specified
for an analysis that also contains the REPEATED or FACTORS statement. The following option can be
specified in the LOGLIN statement after the slash (/): TITLE= 'title'
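As a sketch, a log-linear model with all main effects and two-way interactions among three dependent variables could be requested as follows. The data set work.survey, the variables a, b, and c, and the counts are hypothetical.

```sas
/* Hypothetical counts for a 2 x 2 x 2 table */
data work.survey;
   input a $ b $ c $ count;
   datalines;
n n n 25
n n y 12
n y n 18
n y y 20
y n n 14
y n y 22
y y n 10
y y y 30
;
run;

proc catmod data=work.survey;
   weight count;
   /* all three variables are responses; _RESPONSE_ stands in for
      the effects named in LOGLIN */
   model a*b*c = _response_ / noprofile noresponse;
   loglin a|b|c @ 2;   /* main effects and two-way interactions */
run;
quit;
```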
g) POPULATION variable-list;
The POPULATION statement specifies that populations are to be formed on the basis of cross-classifications
of the specified variables. If you do not specify the POPULATION statement, then populations are formed
on the basis of cross-classifications of the independent variables in the MODEL statement.
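The sketch below keeps populations defined by two variables while only one of them appears in the model. The data set work.multi and its counts are hypothetical. Without the POPULATION statement, the two clinics would be pooled into two populations, one per treatment.

```sas
/* Hypothetical counts: 2 clinics x 2 treatments x 2 outcomes */
data work.multi;
   input clinic $ treatment $ outcome $ count;
   datalines;
north drug    better 14
north drug    same    6
north placebo better  9
north placebo same   11
south drug    better 12
south drug    same    8
south placebo better  7
south placebo same   13
;
run;

proc catmod data=work.multi;
   weight count;
   population clinic treatment;  /* four populations */
   model outcome = treatment;    /* clinic is left out of the model */
run;
quit;
```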
h) REPEATED factor_description,... / options;
The REPEATED statement is used to incorporate repeated measurement factors into the model.
It can be used whenever there is more than one dependent variable and the keyword _RESPONSE_ is used
in the MODEL statement. The REPEATED statement cannot be specified for an analysis that also
contains the FACTORS or LOGLIN statement. The following three options can be specified in the
REPEATED statement after the slash (/):
PROFILE= (matrix)
_RESPONSE_= effects
TITLE= 'title'
i) RESPONSE function / options;
The RESPONSE statement specifies functions of the response probabilities. The procedure models these
response functions as linear combinations of the parameters. If no RESPONSE statement is specified,
CATMOD uses the default standard response functions (generalized logits). More than one RESPONSE
statement can be specified, in which case each RESPONSE statement produces a separate analysis.
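A common pairing is RESPONSE MARGINALS with the REPEATED statement described in h). The sketch below analyzes a hypothetical before/after table (work.paired, with invented counts): the response functions are the marginal probabilities at each occasion, and the repeated factor, here named time, tests whether they differ.

```sas
/* Hypothetical paired responses measured on two occasions */
data work.paired;
   input before $ after $ count;
   datalines;
yes yes 40
yes no  15
no  yes 25
no  no  20
;
run;

proc catmod data=work.paired;
   weight count;
   response marginals;               /* one marginal probability per occasion */
   model before*after = _response_;  /* two dependent variables */
   repeated time 2;                  /* factor with one level per occasion */
run;
quit;
```

Replacing MARGINALS with MEANS or CLOGITS would request mean scores or cumulative logits instead, which suits ordinally scaled responses.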
The following three options can be specified in the RESPONSE statement after a slash (/):
OUT= SAS-data-set
OUTEST= SAS-data-set
TITLE= 'title'
j) RESTRICT parameter=value<...parameter=value>;
The RESTRICT statement restricts the values of parameters to the values you specify, so that the
estimation of the remaining parameters is subject to these restrictions. The terms used in the RESTRICT
statement are as follows:
parameter is the letter B followed by a number that denotes the position of the parameter in the model
(for example, B1 is the first parameter).
value is the value to which the parameter is restricted.
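As a sketch, the step below fixes one parameter at zero. The data set work.centers is hypothetical, and the sketch assumes the parameters are numbered in the order in which they appear in the parameter estimates table: B1 for the intercept, B2 and B3 for the two center parameters.

```sas
/* Hypothetical counts for three centers */
data work.centers;
   input center $ outcome $ count;
   datalines;
A good 30
A poor 10
B good 22
B poor 18
C good 15
C poor 25
;
run;

proc catmod data=work.centers;
   weight count;
   model outcome = center;
   restrict b3=0;   /* assumed: B3 is the second center parameter */
run;
quit;
```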
k) WEIGHT variable;
A WEIGHT statement can be used to refer to a variable containing the cell frequencies, which need not be
integers. The WEIGHT statement lets you use summary data sets containing a count variable.
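When the data arrive as one row per subject, one way to obtain such a summary data set is PROC FREQ with the OUT= option, which writes a COUNT variable. The sketch below builds a hypothetical raw data set work.raw, collapses it to cell counts, and passes the counts to CATMOD.

```sas
/* Hypothetical raw data: one row per subject */
data work.raw;
   length treatment $7 outcome $6;
   do i = 1 to 40;
      treatment = 'drug';
      if i <= 28 then outcome = 'better';
      else outcome = 'same';
      output;
   end;
   do i = 1 to 40;
      treatment = 'placebo';
      if i <= 16 then outcome = 'better';
      else outcome = 'same';
      output;
   end;
   drop i;
run;

/* Collapse to one row per cell; OUT= adds COUNT and PERCENT */
proc freq data=work.raw noprint;
   tables treatment*outcome / out=work.cells;
run;

proc catmod data=work.cells;
   weight count;               /* cell frequencies from PROC FREQ */
   model outcome = treatment;
run;
quit;
```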