SAS Proc CatMod

1. Introduction

CATMOD provides a wide variety of categorical data analyses. Many of these are generalizations of continuous data analysis methods. For example, analysis of variance, in the traditional sense, refers to the analysis of means and the partitioning of variation among the means into various sources. Here, the term "analysis of variance" is used in a generalized form to denote the analysis of response functions and the partitioning of variation among those functions into various sources. The response functions might be mean scores if the dependent variables are ordinally scaled. But they can also be marginal probabilities, cumulative logits, or other functions that incorporate the essential information from the dependent variables.

2. Syntax

   PROC CATMOD DATA= SAS-data-set ORDER= DATA;
      DIRECT variable-list;
      MODEL response_effect= design_effects / options;   /* required */
      CONTRAST 'label' row_description, row_description,...;
      BY variable-list;
      FACTORS factor_description,... / options;
      LOGLIN effects / option;
      POPULATION variable-list;
      REPEATED factor_description,... / options;
      RESPONSE function / options;
      RESTRICT parameter=value<...parameter=value>;
      WEIGHT variable;

3. Details

a) ORDER= DATA

The ORDER= DATA option specifies that variable levels are to be ordered according to the sequence in which they appear in the input stream. If ORDER= DATA is not specified, then the variable levels are ordered according to their internal sorting sequence (for example, numeric order or alphabetical order).

b) DIRECT variable-list;

The DIRECT statement lists numeric variables to be treated in a quantitative, rather than qualitative, way. If used, the DIRECT statement must precede the MODEL statement.

c) MODEL response_effect= design_effects / options;

The MODEL statement specifies dependent and independent variables and model effects.
response_effect indicates the dependent variables that determine the response categories (the columns of the underlying contingency table). It is either a single variable or a crossed effect having two or more variables joined by asterisks.

design_effects specify potential sources of variation (such as main effects and interactions) to be included in the model.

The following options can be specified in the MODEL statement after a slash (/):

   CORRB      COV        COVB        FREQ        ML
   ONEWAY     PREDICT|PRED=          PROB        TITLE=
   XPX        NODESIGN   MAXITER=    NOINT       NOITER
   NOPARM     NOPROFILE  NORESPONSE  ADDCELL=    AVERAGED
   EPSILON=   WLS|GLS

d) CONTRAST 'label' row_description, row_description,... ;

CONTRAST statements must be preceded by the MODEL statement and, if one is used, by the LOGLIN statement. The CONTRAST statement constructs and tests linear functions of the parameters in the MODEL statement or of the effects listed in the LOGLIN statement.

The 'label' term, which is required, specifies up to twenty-four characters of identifying information. Each row_description specifies one row of the matrix C that CATMOD uses to test the hypothesis CB=0. Row_descriptions are separated by commas.

e) FACTORS factor_description,... / options;

The FACTORS statement identifies factors that distinguish response functions from others in the same population. It also specifies how those factors are incorporated into the model. It can be used whenever there is more than one response function per population and the keyword _RESPONSE_ is used in the MODEL statement.

The following three options can be specified in the FACTORS statement after the slash (/):

   PROFILE= (matrix)
   _RESPONSE_= effects
   TITLE= 'title'

f) LOGLIN effects / option;

The LOGLIN statement is used to define log-linear model effects. It can be used whenever the response functions are the standard ones (generalized logits). The effects term specifies design effects that contain dependent variables in the MODEL statement.
When the LOGLIN statement is used, the keyword _RESPONSE_ should be specified in the MODEL statement. The LOGLIN statement cannot be specified for an analysis that also contains the REPEATED or FACTORS statement.

The following option can be specified in the LOGLIN statement after the slash (/):

   TITLE= 'title'

g) POPULATION variable-list;

The POPULATION statement specifies that populations are to be formed on the basis of cross-classifications of the specified variables. If you do not specify the POPULATION statement, then populations are formed on the basis of cross-classifications of the independent variables in the MODEL statement.

h) REPEATED factor_description,... / options;

The REPEATED statement is used to incorporate repeated measurement factors into the model. It can be used whenever there is more than one dependent variable and the keyword _RESPONSE_ is used in the MODEL statement. The REPEATED statement cannot be specified for an analysis that also contains the FACTORS or LOGLIN statement.

The following three options can be specified in the REPEATED statement after the slash (/):

   PROFILE= (matrix)
   _RESPONSE_= effects
   TITLE= 'title'

i) RESPONSE function / options;

The RESPONSE statement specifies functions of the response probabilities. The procedure models these response functions as linear combinations of the parameters. If no RESPONSE statement is specified, CATMOD uses the default standard response functions (generalized logits). More than one RESPONSE statement can be specified, in which case each RESPONSE statement produces a separate analysis.

The following three options can be specified in the RESPONSE statement after a slash (/):

   OUT= SAS-data-set
   OUTEST= SAS-data-set
   TITLE= 'title'

j) RESTRICT parameter=value<...parameter=value>;

The RESTRICT statement restricts the values of parameters to the values you specify, so that the estimation of the remaining parameters is subject to these restrictions.
The terms used in the RESTRICT statement are as follows:

parameter is the letter B followed by a number that denotes the position of the parameter in the model (for example, B1 is the first parameter).

value is the value to which the parameter is restricted.

k) WEIGHT variable;

The WEIGHT statement names a variable that contains the cell frequencies, which need not be integers. It lets you analyze summary data sets in which each observation represents one cell and a count variable holds the cell frequency.
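
As an illustration of the WEIGHT statement with summary data, the following is a minimal sketch and is not taken from the original notes. The data set name trial, the variables treatment, outcome, and count, and all of the cell counts are hypothetical, made up only for the example.

```sas
/* Hypothetical summary data: one row per cell of the table. */
/* The counts are invented for illustration only.            */
data trial;
   input treatment $ outcome $ count;
   datalines;
active   improved  28
active   same      12
placebo  improved  16
placebo  same      24
;
run;

proc catmod data=trial order=data;
   weight count;                          /* count holds the cell frequencies */
   model outcome = treatment / freq prob; /* FREQ and PROB are MODEL options  */
run;
quit;
```

Because ORDER= DATA is specified, the levels of treatment and outcome are ordered as they first appear in the data lines rather than alphabetically. No RESPONSE statement is given, so CATMOD uses the default response functions (generalized logits).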