SAS Proc Probit

1. Introduction

The PROBIT procedure calculates maximum-likelihood estimates of regression and threshold parameters for binomial and multinomial biological assay data, as well as other discrete event data. The maximum-likelihood estimates are calculated for the parameters b and C (0 <= C < 1) in the probit equation

    p = C + (1 - C) F(x'b)

where

    F is a cumulative distribution function (the normal, logistic, or Gompertz),
    x is a vector of independent variables,
    p is the probability of a response.

The data set used by PROC PROBIT must include either a response variable giving the level of response for each observation, or a pair of variables giving the number of subjects tested and the number of subjects responding for each vector of independent variable values.

2. Syntax

    PROC PROBIT options;
    CLASS variable-list;
    MODEL response = variable-list / options;    /* required */
    WEIGHT variable;
    BY variable-list;
    OUTPUT <OUT= SAS-data-set> options;

3. Details

a) Options that can be used in the PROC PROBIT statement:

    DATA= SAS-data-set
    C= rate
    COVOUT
    HPROB= p
    INVERSECL
    LACKFIT
    LOG | LN
    LOG10
    NOPRINT
    OUTEST= SAS-data-set
    OPTC
    ORDER= INTERNAL | FREQ | DATA | FORMATTED

b) CLASS variable-list;

This statement names the classification variables to be used in the analysis. Classification variables can be either character or numeric. If a single response variable is given in the MODEL statement, it must also be listed in a CLASS statement.

c) <label>: MODEL response = variable-list / options;

The MODEL statement names the variables to be used as the response and the independent variables. The distribution used to model the response can also be specified, along with other options. The INVERSECL, LACKFIT, and HPROB options on the PROC statement may also be specified as options on individual MODEL statements.

Options that can be used in the MODEL statement:

    CORRB
        prints the estimated correlation matrix.
    COVB
        prints the estimated covariance matrix.
    ITPRINT
        prints the iteration history and the final values of the gradient and Hessian of the log likelihood.
    INTERCEPT=value
        specifies the initial value for the intercept. Default: 0.
    INITIAL=value
        sets initial estimates for the other model parameters.
    CONVERGE=value
        specifies the convergence criterion. Default: 1E-3.
    SINGULAR=value
        specifies the tolerance used to detect linear dependencies among the independent variables. Default: 1E-12.
    MAXIT=number
        sets the maximum number of iterations to attempt while estimating the parameters. Default: 50.
    NOINT
        suppresses the intercept parameter from the model.
    DISTRIBUTION | DIST | D= NORMAL | LOGISTIC | GOMPERTZ
        determines the distribution used to model the response probabilities. Default: NORMAL.

d) WEIGHT variable;

The WEIGHT statement can be used with PROC PROBIT to weight each observation by the value of the specified variable. The contribution of each observation to the likelihood function is multiplied by the value of the weight variable. Observations with zero, negative, or missing weights are not used in model estimation.

e) BY <DESCENDING> variables ... <NOTSORTED>;

A BY statement is used with a procedure to obtain separate analyses of observations in groups defined by the BY variables. The data set being processed need not have been sorted by the SORT procedure. However, unless NOTSORTED is specified, the data set must be in the same order as if PROC SORT had sorted it. If you have used a FORMAT or ATTRIB statement to group a continuous variable into discrete groups, the BY statement creates BY groups based on the formatted values. You can also ensure that variables are processed in ascending order by creating an index for one or more variables in the SAS data set. Usage of the BY statement differs from procedure to procedure; refer to the Users' Guide for details.
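
As a rough illustration of how the MODEL and BY statements fit together, the sketch below fits a logistic dose-response model separately for each batch. The data set name assay, its variables batch, dose, n, and r, and all data values are hypothetical and invented for this example. The response is written in the events/trials form r/n, which is how the pair of variables (number responding, number tested) described in the Introduction is normally passed to the MODEL statement.

```sas
/* Hypothetical assay data: one row per batch and dose level.       */
/* n = number of subjects tested, r = number of subjects responding */
data assay;
   input batch dose n r;
   datalines;
1  1 20  2
1  2 20  5
1  4 20 10
1  8 20 15
1 16 20 18
2  1 20  1
2  2 20  4
2  4 20  9
2  8 20 14
2 16 20 19
;
run;

/* BY-group processing expects the data in sorted order */
proc sort data=assay;
   by batch;
run;

/* LOG10 analyzes the first independent variable on the log10 scale */
proc probit data=assay log10;
   model r/n = dose / d=logistic lackfit inversecl;
   by batch;
run;
```

The PROC SORT step is there only because the BY statement expects sorted input. For data that are already grouped but not in sorted order, specifying NOTSORTED on the BY statement would be the alternative.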

f) OUTPUT <OUT= SAS-data-set> options;

The OUTPUT statement requests the creation of a SAS data set that contains the variables in the input data set, the fitted probabilities, the estimates of x'b, and the estimates of its standard error. The following options may be specified in the OUTPUT statement.

    OUT= SAS-data-set
        names the output SAS data set being created. If no name is given, then the DATAn naming convention is used.
    PROB | P= name
        requests that a variable containing the probability estimates be added to the output data set and be given this name.
    XBETA= name
        requests that a variable containing the estimates of x'b be added to the output data set and be given this name.
    STD= name
        requests that a variable containing the standard error estimates of x'b be added to the output