1. Introduction

The GENMOD procedure fits generalized linear models, as defined by Nelder and Wedderburn (1972). The class of generalized linear models extends traditional linear models in two ways: the mean of the data may depend on a linear predictor through a link function, and the response probability distribution may be any member of the exponential family of distributions.

2. Syntax

    PROC GENMOD options;
       Programming Statements;
       BY variable-list;
       CLASS variable-list;
       CONTRAST 'label' effect values / options;
       DEVIANCE variable= expression;
       FREQ variable;
       FWDLINK variable= expression;
       INVLINK variable= expression;
       MAKE 'table' OUT= SAS-data-set;
       MODEL response= effects / options;
       REPEATED SUBJECT=subject-effect / options;
       VARIANCE variable= expression;
       SCWGT variable;

The PROC GENMOD statement invokes the procedure. All statements other than the MODEL statement are optional. The CLASS statement, if present, must precede the MODEL statement, and CONTRAST statements must follow the MODEL statement.

3. Example

The following code demonstrates how to use programming statements to define the log link function for the Poisson distribution. Note that, in this case, the code explicitly defines the same link and inverse link functions that you get as the defaults for the Poisson distribution.

    proc genmod data=claims;
       a = _MEAN_;                     /* automatic variable: the mean */
       class distance car age;
       if a > 0 then lnk = log(a);     /* log link, guarded against a <= 0 */
       else lnk = 0;
       ilnk = exp(_XBETA_);            /* inverse link applied to the linear predictor */
       fwdlink link = lnk;
       invlink ilink = ilnk;
       model c = distance car / dist=poisson offset=ln type1 type3;
       contrast 'TEST' distance 1 -1  0  0,
                       distance 1  0 -1  0,
                       distance 1  0  0 -1;
    run;

4. Details

a) CLASS variable-list;

The CLASS statement names the classification variables to be used in the analysis. If the CLASS statement is used, it must appear before the MODEL statement. Classification variables can be either character or numeric. Character variables are truncated to 16 characters.
By default, class levels are determined from the formatted values of the CLASS variables. Different sort orders for CLASS variables may be requested with the ORDER= option in the PROC GENMOD statement.

b) CONTRAST 'label' effect values ... effect values / options;

The CONTRAST statement tests specified hypotheses concerning the model parameters by using likelihood ratio or Wald statistics. Likelihood ratio statistics are computed by default. There is no limit to the number of CONTRAST statements, but they must appear after a MODEL statement.

    label    is twenty characters or less and is used on the printout to identify the contrast.
    effect   is the name of an effect that appears in the MODEL statement.
    values   are constants that are elements of the L vector associated with the effect.

c) DEVIANCE variable= expression;

You can specify a probability distribution other than the built-in distributions by using the DEVIANCE and VARIANCE statements. The defined variable identifies the deviance function to the procedure. The expression may be any arithmetic expression supported by the DATA step language and is used to define the dependence on the mean and the response. It may include variables defined in programming statements. The VARIANCE statement must appear when the DEVIANCE statement is used to define the deviance function. You can use the automatic variables _MEAN_ and _RESP_ to represent the mean and the response in the expression.

d) FREQ variable;

The FREQ statement identifies a variable in the input data set that contains the frequency of occurrence of each observation. PROC GENMOD treats each observation as if it appears n times, where n is the value of the FREQ variable for the observation. If the frequency value is not an integer, it is truncated to an integer. If it is less than one or missing, the observation is not used.
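The FREQ rules above can be illustrated with a small sketch. The data set `claims_agg` and its variables `x`, `c`, and `n` are hypothetical and are not part of the original example; the values are made up only to show how each frequency value is treated.

```sas
/* Hypothetical aggregated data: n is the number of times each row occurred. */
data claims_agg;
   input x c n;
   datalines;
1  2  3
2  3  2.7
3  6  1
4  7  0
5 12  .
6 14  2
;
run;

proc genmod data=claims_agg;
   freq n;   /* n=3   : row is treated as 3 observations          */
             /* n=2.7 : truncated to 2                            */
             /* n=0 or missing : row is not used                  */
   model c = x / dist=poisson;
run;
```

Under these assumptions, the fit should match the one obtained from a data set in which each retained row is physically repeated according to its truncated frequency.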
e) FWDLINK variable= expression;

You can specify a link function other than the built-in link functions in the FWDLINK statement. The defined variable identifies the link function to the procedure. The expression may be any arithmetic expression supported by the DATA step language and is used to define the functional dependence on the mean. It may include variables defined in programming statements. Derivatives of the link function required for iterative fitting are automatically computed by the procedure. The inverse of the link function must be specified in the INVLINK statement when the FWDLINK statement is used to define the link function. You can use the automatic variable _MEAN_ to represent the mean in the expression.

f) INVLINK variable= expression;

If a link function is specified in the FWDLINK statement, then the corresponding inverse link function must be specified in the INVLINK statement. The defined variable identifies the inverse link function to the procedure. The expression may be any arithmetic expression supported by the DATA step language, and is used to define the functional dependence on the linear predictor. The expression may include variables defined in programming statements. You can use the automatic variable _XBETA_ to represent the linear predictor in the expression above.

g) MAKE 'table' OUT= SAS-data-set <options>;

The MAKE statement is used to convert any table produced in PROC GENMOD to a SAS data set. You can customize several of the tables by using the TEMPLATE procedure prior to running PROC GENMOD. You can manage all of the tables by using the OUTPUT procedure after running PROC GENMOD.

The following table names can be used; either upper or lower case is accepted.
    CLASSLEVELS    COEFFICIENTS#   CONTRAST       CORR
    COV            GEEEMPPEST      GEEMODPEST     GEEMODINFO
    GEENCOV        GEERCOV         GEEOBSTATS     GEEWCORR
    ITERCONT       ITERLRCI        ITERPARMS      ITERPARMSGEE
    ITERTYPE3      LAGRANGE        LASTGEEGRAD    LASTGRAD
    LASTHESS       LRCI            MODFIT         MODINFO
    OBSTATS        PARMEST         PARMINFO       TYPE1
    TYPE3LR        TYPE3W          WALD           WALDCI

h) MODEL response= effects / options;

The MODEL statement specifies the response, or dependent variable, and the explanatory, or independent, variables. The response may be specified in the form of a single variable or in the form of events/trials for a binomial response. The specification of effects is the same as for the GLM procedure. See the documentation for PROC GLM for more information.

The MODEL statement is required. An intercept term is included in the model by default. If no other effects are specified, only an intercept term is fitted. The intercept can be removed with the NOINT option.

The following options can be specified in the MODEL statement after a slash (/):

    ALPHA=       CONVERGE=    CORRB        COVB
    DIST=        DSCALE=      INITIAL=     INTERCEPT=
    ITPRINT      LINK=        LRCI         MAXIT=
    NOINT        NOSCALE      OFFSET=      OBSTATS
    RESIDUALS    SCALE=       SCORING=     SINGULAR=
    TYPE1        TYPE3        WALD         WALDCI

i) REPEATED SUBJECT=subject-effect < / options >;

The REPEATED statement specifies the covariance structure of multivariate responses for generalized estimating equation (GEE) model fitting in the GENMOD procedure. In addition, the REPEATED statement controls the iterative fitting algorithm used in GEEs and specifies optional output.

The subject-effect identifies subjects in the input data set. It can be a single variable, an interaction effect, a nested effect, or a combination of these. Each distinct level of the effect identifies a different subject. Responses from different subjects are assumed to be statistically independent, and responses within subjects are assumed to be correlated. A subject-effect must be specified, and variables used in defining the subject-effect must be listed in the CLASS statement.
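A minimal sketch of a GEE fit with the REPEATED statement follows. The data set `visits` and the variables `id`, `treat`, `time`, and `y` are hypothetical; the sketch assumes one row per subject and visit, a binary response `y`, and an exchangeable working correlation requested with TYPE=EXCH. It is an illustration under those assumptions, not a recommended analysis.

```sas
/* Assumes a hypothetical data set VISITS with one row per subject-visit. */
proc genmod data=visits;
   class id treat;                       /* the subject variable must be a CLASS variable */
   model y = treat time / dist=binomial link=logit;
   repeated subject=id / type=exch       /* exchangeable working correlation (assumed choice) */
                         corrw;          /* print the working correlation matrix */
run;
```

Each distinct value of `id` is treated as one subject, so rows sharing an `id` are modeled as correlated and rows with different `id` values as independent.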
The options control the fitting of the model and the output that is produced. The following options can be specified after a slash (/):

    CONVERGE=    CORRW        COVB         INTERCEPT=
    INITIAL=     MAXIT=       MODELSE      OBSTATS
    SORTED       TYPE=        WITHINSUBJECT=

j) VARIANCE variable= expression;

You can specify a probability distribution other than the built-in distributions by using the DEVIANCE and VARIANCE statements. The defined variable identifies the variance function to the procedure. The expression may be any arithmetic expression supported by the DATA step language and is used to define the functional dependence on the mean. You can use the automatic variable _MEAN_ to represent the mean in the expression. The DEVIANCE statement must appear when the VARIANCE statement is used to define the variance function.

k) SCWGT variable;

The SCWGT statement identifies a variable in the input data set to use as the exponential distribution scale parameter weight. The exponential distribution scale parameter is divided by the SCWGT variable value for each observation. This is done whether the scale parameter is estimated by the procedure or specified by the SCALE= option in the MODEL statement. For the Poisson and binomial distributions, which are not usually defined to have a scale parameter, the SCWGT variable weights the log likelihood. The SCWGT variable need not be an integer. If it is less than or equal to 0 or missing, the corresponding observation is not used.
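As an illustration of how the DEVIANCE and VARIANCE statements described in this section work together, the following sketch defines the Poisson distribution by hand. It reuses the `claims` data set and the variables `c`, `car`, and `age` from the example in section 3. The names `a`, `y`, `d`, `var`, and `dev` are arbitrary, and the guard for a zero response is an added assumption that avoids evaluating log(0).

```sas
proc genmod data=claims;
   class car age;
   a = _MEAN_;                                   /* mean */
   y = _RESP_;                                   /* response */
   /* Poisson deviance contribution: 2*( y*log(y/a) - (y - a) ) */
   if y > 0 then d = 2 * (y * log(y / a) - (y - a));
   else d = 2 * a;                               /* limiting value when y = 0 */
   variance var = a;                             /* Poisson variance function: V(mean) = mean */
   deviance dev = d;
   model c = car age / link=log;
run;
```

Because the variance and deviance functions are those of the Poisson distribution, the parameter estimates should agree with a fit that specifies DIST=POISSON directly, which makes this a convenient check before substituting a genuinely non-standard distribution.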