Brief summary of Proc probit in SAS

advertisement
SAS Proc Probit
1.
Introduction
The PROBIT procedure calculates maximum-likelihood estimates of regression and threshold parameters
for binomial and multinomial biological assay data as well as other discrete event data.
The maximum-likelihood estimates are calculated for the parameters b and 0<=C<1 in the probit equation
p=C+(1-C)F(x'b) where F is a cumulative distribution function (the normal, logistic, or Gompertz)
x is a vector of independent variables p is the probability of a response. The data set used by PROC
PROBIT must include either a response variable giving the level of response for each observation or a pair
of variables giving the number of subjects tested and the number of subjects responding for each vector of
the independent variable values.
2.
Syntax
PROC PROBIT options;
CLASS variable-list;
MODEL response = variable-list / options;
WEIGHT variable;
BY variable-list;
OUTPUT <OUT= SAS-data-set> options;
3.
/* required */
Details
a) Options that can be used in the PROC PROBIT statement:
DATA= SAS-data-set
C= rate
COVOUT
HPROB= p
INVERSECL
LACKFIT
LOG|LN
LOG10
NOPRINT
OUTEST= SAS-data-set
OPTC
ORDER= INTERNAL|FREQ|DATA|FORMATTED
b) CLASS variable-list;
This statement names the classification variables to be used in the analysis. Classification variables can be
either character or numeric. If a single response variable is given in the MODEL statement, it must also
be listed in a CLASS statement.
c) <label>:MODEL response = variable-list/options;
The MODEL statement names the variables to be used as the response and the independent variables.
Additionally the distribution to be used to model the response can be specified as well as other options.
The INVERSECL, LACKFIT, and HPROB options on the PROC statement may also be specified as
options on individual MODEL statements.
Options that can be used in the MODEL statement:
CORRB
prints the estimate of the correlation matrix.
COVB
prints the estimated covariance matrix.
ITPRINT
prints the iteration history and the final values of the gradient and hessian of the log likelihood.
INTERCEPT=value
specifies initial value for intercept. Default: 0 .
INITIAL=value
sets initial estimates for other model parameters.
CONVERGE=value
specifies the convergence criterion. Default: 1E-3 .
SINGULAR=value
specifies the value used to determine linear dependencies among the independent variables. Default: 1E12 .
MAXIT=number
sets maximum number of iterations to attempt while estimating the parameters. Default: 50 .
NOINT
suppresses the intercept parameter from the model.
DISTRIBUTION | DIST | D=
NORMAL | LOGISTIC | GOMPERTZ determines the distribution to use in modeling the response
probabilities. Default: NORMAL
d) WEIGHT variable;
The WEIGHT statement can be used with PROC PROBIT to weight each observation by the value of the
variable specified. The contribution of each observation to the likelihood function is multiplied by the
value of the weight variable. Observations with zero, negative, or missing weights are not used in model
estimation.
e) BY <DESCENDING> variables ... <NOTSORTED>;
A BY statement is used with a procedure to obtain separate analyses on observations in groups defined by
the BY variables. The data set being processed need not have been previously sorted by the SORT
procedure. However, the data set must be in the same order as though PROC SORT had sorted it unless
NOTSORTED is specified. If you have used a FORMAT or ATTRIB statement to group a continuous
variable into discrete groups, the BY statement creates BY groups based on the formatted values.
You can also ensure that variables are processed in ascending order by creating an index for one or more
variables in the SAS data set. The usages of the BY statement differ in each procedure. Please refer to
the Users' Guide for the details.
f) OUTPUT <OUT= SAS-data-set> options;
The OUTPUT statement requests the creation of a SAS data set that contains the variables in the input data
set, the fitted probabilities, the estimates of x'b and the estimates of its standard error.
The following options may be specified in the OUTPUT statement.
OUT= SAS-data-set names the output SAS data set being created. If no name is given, then the DATAn
naming convention is used.
PROB | P= name requests that a variable containing the probability estimates be added to the output data
set and be given this name.
XBETA= name requests that a variable containing the estimates of x'b be added to the output data set and
be given this name.
STD= name requests that a variable containing the standard error estimates of x'b be added to the output
Download