Brief summary of PROC GENMOD in SAS

1. Introduction
The GENMOD procedure fits generalized linear models, as defined by Nelder and Wedderburn (1972).
The class of generalized linear models is an extension of traditional linear models to allow the mean of the
data to depend on a linear predictor through a link function, and to allow the response probability
distribution to be any member of the exponential family of distributions.
2. Syntax
PROC GENMOD options;
Programming Statements;
BY variable-list;
CLASS variable-list;
CONTRAST 'label' effect values / options;
DEVIANCE variable= expression;
FREQ variable;
FWDLINK variable= expression;
INVLINK variable= expression;
MAKE 'table' OUT= SAS-data-set;
MODEL response= effects / options;
REPEATED SUBJECT=subject-effect / options;
VARIANCE variable= expression;
SCWGT variable;
PROC GENMOD invokes the procedure. All statements other than the MODEL statement are optional.
The CLASS statement, if present, must precede the MODEL statement, and CONTRAST statements must
follow the MODEL statement.
3. Example
The following code demonstrates how to use programming statements to define the log link function for
the Poisson distribution. Note that, in this case, the code explicitly defines the same link and inverse
functions you get as the defaults for the Poisson distribution.
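proc genmod data=claims;
   class distance car age;
   /* Forward link: log of the mean, guarded against a nonpositive mean */
   a = _MEAN_;
   if a > 0 then lnk = log(a); else lnk = 0;
   /* Inverse link: exponential of the linear predictor */
   ilnk = exp(_XBETA_);
   fwdlink link = lnk;
   invlink ilink = ilnk;
   model c = distance car / dist=poisson offset=ln type1 type3;
   /* Joint test of three contrasts among the distance levels */
   contrast 'TEST' distance 1 -1 0 0,
                   distance 1 0 -1 0,
                   distance 1 0 0 -1;
run;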
4. Details
a) CLASS variable-list;
The CLASS statement names the classification variables to be used in the analysis. If the CLASS
statement is used, it must appear before the MODEL statement. Classification variables can be either
character or numeric. If character variables are used, they will be truncated to 16 characters. By default,
class levels are determined from the formatted values of the CLASS variables. Different sort orders for
CLASS variables may be requested with the ORDER= option in the PROC GENMOD statement.
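As a minimal sketch of how CLASS and ORDER= fit together, the step below reuses the claims data set and the variables c, car, age, and ln from the example in section 3. ORDER=DATA is chosen purely for illustration; it asks for levels to be ordered by first appearance in the input data.

```sas
/* Sketch only: data set and variable names are borrowed from section 3. */
proc genmod data=claims order=data;
   class car age;                 /* CLASS must precede MODEL */
   model c = car age / dist=poisson link=log offset=ln;
run;
```

Because the last level in the sort order typically serves as the reference level, changing ORDER= can change which level the other estimates are compared against.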
b) CONTRAST 'label' effect values ... effect values / options ;
The CONTRAST statement tests specified hypotheses about the model parameters by computing either
likelihood ratio or Wald statistics. Likelihood ratio statistics are computed by default. There
is no limit to the number of CONTRAST statements, but they must appear after the MODEL statement.
label is twenty characters or less and is used on the printout to identify the contrast.
effect is the name of an effect that appears in the MODEL statement.
values are constants that are elements of the L vector associated with the effect.
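The pieces of a CONTRAST statement can be sketched as follows. The step assumes, hypothetically, that car has exactly three levels, so the L vector for car has three elements; the WALD option requests a Wald statistic in place of the default likelihood ratio statistic.

```sas
/* Sketch only: assumes car has three levels; adjust the coefficients otherwise. */
proc genmod data=claims;
   class car age;
   model c = car age / dist=poisson link=log offset=ln;
   /* label (20 characters or less), effect name, then the L-vector values */
   contrast 'car 1 vs 2' car 1 -1 0 / wald;
run;
```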
c) DEVIANCE variable= expression;
You can specify a probability distribution other than the built-in distributions by using the DEVIANCE
and VARIANCE statements. The defined variable identifies the deviance function to the procedure. The
expression may be any arithmetic expression supported by the DATA step language and is used to define
the dependence on the mean and the response. It may include variables defined in programming
statements. The VARIANCE statement must appear when the DEVIANCE statement is used to define the
deviance function. You can use the automatic variables _MEAN_ and _RESP_ to represent the mean and
the response in the expression.
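As an illustration, the sketch below spells out the Poisson variance and deviance functions by hand instead of using DIST=POISSON. The data set and variable names come from the example in section 3; the names a, y, d, var, and dev are arbitrary, and the guard for a zero response is an added assumption to avoid taking log(0).

```sas
/* Sketch only: user-defined Poisson variance and deviance functions. */
proc genmod data=claims;
   class car age;
   a = _MEAN_;                    /* fitted mean        */
   y = _RESP_;                    /* observed response  */
   /* Poisson deviance contribution; the y = 0 branch avoids log(0) */
   if y > 0 then d = 2 * ( y * log( y / a ) - ( y - a ) );
   else d = 2 * a;
   variance var = a;              /* variance function V(mu) = mu */
   deviance dev = d;
   model c = car age / link=log offset=ln;
run;
```

No DIST= option appears in the MODEL statement because the DEVIANCE and VARIANCE statements together define the distribution.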
d) FREQ variable;
The variable in the FREQ statement identifies a variable in the input data set containing the frequency of
occurrence of each observation. PROC GENMOD treats each observation as if it appears n times, where
n is the value of the FREQ variable for the observation. If it is not an integer, the frequency value is
truncated to an integer. If it is less than one or missing, the observation is not used.
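A minimal sketch of the FREQ statement, assuming a hypothetical data set grouped in which each row stands for several identical observations and the variable count holds how many:

```sas
/* Sketch only: grouped, y, car, and count are hypothetical names. */
proc genmod data=grouped;
   class car;
   freq count;    /* a row with count = 2.7 is treated as 2 observations */
   model y = car / dist=poisson link=log;
run;
```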
e) FWDLINK variable= expression;
You can specify a link function other than the built-in link functions in the FWDLINK statement. The
defined variable identifies the link function to the procedure. The expression may be any arithmetic
expression supported by the DATA step language and is used to define the functional dependence on the
mean. It may include variables defined in programming statements. Derivatives of the link function
required for iterative fitting are automatically computed by the procedure. The inverse of the link function
must be specified in the INVLINK statement when the FWDLINK statement is used to define the link
function. You can use the automatic variable _MEAN_ to represent the mean in the expression.
f) INVLINK variable= expression;
If a link function is specified in the FWDLINK statement, then the corresponding inverse link function
must be specified in the INVLINK statement. The defined variable identifies the inverse link function to
the procedure. The expression may be any arithmetic expression supported by the DATA step language,
and is used to define the functional dependence on the linear predictor. The expression may include
variables defined in programming statements. You can use the automatic variable _XBETA_ to represent
the linear predictor in the expression above.
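The example in section 3 defines the log link by hand. As a second sketch of the FWDLINK and INVLINK pairing, the step below defines a reciprocal link for a gamma response. The data set costs and the variables cost and age are hypothetical, and the names link and ilink are arbitrary.

```sas
/* Sketch only: costs, cost, and age are hypothetical names. */
proc genmod data=costs;
   fwdlink link  = 1 / _MEAN_;    /* linear predictor as a function of the mean */
   invlink ilink = 1 / _XBETA_;   /* mean as a function of the linear predictor */
   model cost = age / dist=gamma;
run;
```

The two expressions must be true inverses of each other; the procedure differentiates the link itself but does not check that the pair is consistent.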
g) MAKE 'table' OUT= SAS-data-set <options> ;
The MAKE statement is used to convert any table produced in PROC GENMOD to a SAS data set. You
can customize several of the tables by using the TEMPLATE procedure prior to running PROC
GENMOD. You can manage all of the tables by using the OUTPUT procedure after running PROC
GENMOD. The following table names can be used; either upper or lower case is accepted.
CLASSLEVELS    COEFFICIENTS#  CONTRAST       CORR
COV            GEEEMPPEST     GEEMODPEST     GEEMODINFO
GEENCOV        GEERCOV        GEEOBSTATS     GEEWCORR
ITERCONT       ITERLRCI       ITERPARMS      ITERPARMSGEE
ITERTYPE3      LAGRANGE       LASTGEEGRAD    LASTGRAD
LASTHESS       LRCI           MODFIT         MODINFO
OBSTATS        PARMEST        PARMINFO       TYPE1
TYPE3LR        TYPE3W         WALD           WALDCI
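A minimal sketch of the MAKE statement, reusing the claims data from section 3. It assumes that the PARMEST table holds the parameter estimates, as its name suggests; the output data set name est is arbitrary.

```sas
/* Sketch only: saves one GENMOD table as a data set, then prints it. */
proc genmod data=claims;
   class car age;
   model c = car age / dist=poisson link=log offset=ln;
   make 'parmest' out=est;
run;

proc print data=est;
run;
```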
h) MODEL response= effects / options ;
The MODEL statement specifies the response (dependent) variable and the explanatory (independent)
variables. The response may be specified as a single variable or, for a binomial response, in
events/trials form. The specification of effects is the same as for the GLM procedure. See the
documentation for PROC GLM for more information. The MODEL statement is required.
An intercept term is included in the model by default. If no other effects are specified, only an intercept
term is fitted. The intercept can be removed with the NOINT option.
The following options can be specified in the MODEL statement after a slash:
ALPHA=       CONVERGE=    CORRB        COVB
DIST=        DSCALE=      INITIAL=     INTERCEPT=
ITPRINT      LINK=        LRCI         MAXIT=
NOINT        NOSCALE      OFFSET=      OBSTATS
RESIDUALS    SCALE=       SCORING=     SINGULAR=
TYPE1        TYPE3        WALD         WALDCI
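As a sketch of the events/trials form and a few of the options listed above, the step below fits a logistic model. The data set assay and the variables r (events), n (trials), and dose are hypothetical.

```sas
/* Sketch only: assay, r, n, and dose are hypothetical names. */
proc genmod data=assay;
   /* r events out of n trials; request a Type 1 analysis and
      likelihood ratio confidence intervals */
   model r/n = dose / dist=binomial link=logit type1 lrci;
run;
```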
i) REPEATED SUBJECT=subject-effect < / options > ;
The REPEATED statement is used to specify the covariance structure of multivariate responses for
generalized estimating equation (GEE) model fitting in the GENMOD procedure. In addition, the
REPEATED statement controls the iterative fitting algorithm used in GEEs and specifies optional output.
The subject-effect identifies subjects in the input data set. It can be a single variable, an interaction
effect, a nested effect, or a combination of these. Each distinct level of the effect identifies a different subject.
Responses from different subjects are assumed to be statistically independent, and responses within
subjects are assumed to be correlated. A subject-effect must be specified, and variables used in defining
the subject-effect must be listed in the CLASS statement.
The options control the fitting of the model and the output that is produced. The following options can be
specified after a slash (/):
CONVERGE=    CORRW        COVB         INTERCEPT=
INITIAL=     MAXIT=       MODELSE      OBSTATS
SORTED       TYPE=        WITHINSUBJECT=
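A minimal GEE sketch, assuming a hypothetical longitudinal data set visits with a subject identifier id, a count response y, a treatment indicator trt, and a visit time time. TYPE=EXCH requests an exchangeable working correlation and CORRW prints the estimated working correlation matrix.

```sas
/* Sketch only: visits, id, y, trt, and time are hypothetical names. */
proc genmod data=visits;
   class id trt;                  /* the subject variable must be a CLASS variable */
   model y = trt time / dist=poisson link=log;
   repeated subject=id / type=exch corrw;
run;
```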
j) VARIANCE variable= expression;
You can specify a probability distribution other than the built-in distributions by using the DEVIANCE
and VARIANCE statements. The defined variable identifies the variance function to the procedure. The
expression may be any arithmetic expression supported by the DATA step language and is used to define
the functional dependence on the mean. You can use the automatic variable _MEAN_ to represent the
mean in the expression. The DEVIANCE statement must appear when the VARIANCE statement is used
to define the variance function.
k) SCWGT variable;
The SCWGT statement identifies a variable in the input data set to use as the weight for the
exponential family scale (dispersion) parameter. The scale parameter is divided by the SCWGT variable
value for each observation. This is done whether the scale parameter is estimated by the procedure or
specified by the SCALE= option in the MODEL statement. For the Poisson and binomial distributions, which
are not usually defined to have a scale parameter, the SCWGT variable weights the log likelihood. The
SCWGT variable need not be an integer; if it is less than or equal to 0 or missing, the corresponding
observation is not used.
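A minimal sketch of the SCWGT statement, assuming a hypothetical data set costs in which the variable w holds a positive weight for each observation:

```sas
/* Sketch only: costs, cost, age, and w are hypothetical names. */
proc genmod data=costs;
   scwgt w;       /* rows with w <= 0 or missing w are dropped */
   model cost = age / dist=gamma link=log;
run;
```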