eg21

Documentation of EG21 TEXT and related files:
last update: BQ 6/12/1991
Purpose: Fits multivariate binary regression models that allow more
than one class in each cluster, with a different regression for each
class and for the dependence within and between classes. The user can
choose between GEE1 and GEE2.
Version: 0.5 Beta test version.
Environment: IBM-VM/CMS
Necessary files: To run the program the following files are needed:
    EGEE    EXEC   A : the exec that executes the program
    EG21    TEXT   A : the executable code
    QMATRIX TXTLIB A : library needed by the program
    XXX     YYY    ? : the data file
    ZZZ     EGEE   A : the control file
    (XXX, YYY and ZZZ are specified by the user.)
Output:
    ZZZ     OUTPUT A : the output
    FILE    STDERR A : reserved for debugging purposes
    FILE    MFILE  A : reserved for debugging purposes
    FILE    PROBEF A : reserved for debugging purposes
The user need not be concerned with the three debugging files. Future
versions of the program will not generate them.
To run the program: From the CMS command line issue:
    EGEE ZZZ
assuming the control file is named ZZZ EGEE A.
Output will go to ZZZ OUTPUT A. If ZZZ OUTPUT A exists
it will be replaced (not appended to) by the new file.
Data file format: Free format, with one record per observation.
The variables are, in order:
    the cluster id
    the class number
    the response variable (y = 0/1)
    the regressor(s)
The data file can have RECFM F or V. There is no restriction
on the LRECL.
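As an illustration of the record layout, here is a minimal Python sketch that groups data records by cluster. It is not part of EG21: the function name `read_eg21_data`, keeping the cluster id as a string, and skipping blank lines are all assumptions.

```python
from collections import defaultdict


def read_eg21_data(lines):
    """Group free-format EG21 data records by cluster id (illustrative only).

    Each record holds: cluster id, class number, response y (0/1), then the
    regressors. Returns {cluster_id: [(class, y, [x1, x2, ...]), ...]},
    keeping records in file order within each cluster.
    """
    clusters = defaultdict(list)
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # assumption: blank records are skipped
        if len(fields) < 3:
            raise ValueError(f"record {lineno}: need cluster id, class and response")
        cluster_id = fields[0]  # assumption: kept as a string key
        cls = int(fields[1])
        y = int(fields[2])
        if y not in (0, 1):
            raise ValueError(f"record {lineno}: response must be 0 or 1, got {y}")
        x = [float(v) for v in fields[3:]]  # all remaining fields are regressors
        clusters[cluster_id].append((cls, y, x))
    return dict(clusters)
```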
Control file format:
--------------------
The control file can have RECFM F or V; the maximum LRECL is 133.
The first record must contain four zeros as follows:
    0 0 0 0
(This will not be needed in the final version of the program.)
The second and third records are titles that will be printed on the
output file.
The fourth record is the data file name.
The fifth record contains an integer, the number of classes. See below
for the maximum allowed.
The sixth record contains an integer, the number of variables that follow
the response in the data file. It is not necessary that all these
variables be used in the regressions.
The seventh record contains two integers, i1 and i2:
    i1 = number of parameters.
    i2 = number of parameters for main effects.
Naturally i1 is greater than or equal to i2 (not checked).
The parameters must be arranged so that the odds-ratio parameters
are the last in the parameter vector (not checked).
The eighth record contains a real number, the convergence criterion.
Iteration stops when the sum of the absolute changes in all
parameters between two iterations is less than that number or
the maximum number of iterations is reached, whichever occurs
first.
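The stopping rule above can be written out as follows. This is a hypothetical Python sketch: `update` stands in for one GEE iteration, which is not shown, and the names `converged` and `iterate` are illustrative only.

```python
def converged(old, new, criterion):
    """True when the sum of absolute parameter changes is below the criterion."""
    if len(old) != len(new):
        raise ValueError("parameter vectors differ in length")
    return sum(abs(b - a) for a, b in zip(old, new)) < criterion


def iterate(update, beta0, criterion, max_iter):
    """Run `update` until the change rule is met or max_iter is reached.

    Returns (estimates, iterations used, whether the criterion was met).
    """
    beta = list(beta0)
    for it in range(1, max_iter + 1):
        new = update(beta)  # hypothetical: one GEE1/GEE2 update step
        if converged(beta, new, criterion):
            return new, it, True
        beta = new
    return beta, max_iter, False  # stopped by the iteration limit
```

Whichever condition occurs first ends the loop, matching the description of records 8 and 9.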
The ninth record contains an integer, the maximum number of iterations.
The tenth record contains an integer i1, say. If i1 = 1 then the
current estimates of the parameters will be printed at each iteration.
The eleventh record contains an integer, i.
i = 1 : GEE1
i = 2 : GEE2
The twelfth record contains an integer i1, say. If i1 = 1 then the
Zhao and Prentice formulae for third and fourth order moments will be
used. If i1 = 2 then the exact solution will be used for these moments.
The thirteenth record is ignored.
The fourteenth and following records, as many as there are parameters,
specify labels for the parameters. These will be used to label the
output. Only the first 16 characters will be used.
The following record is ignored.
The following record(s) contain initial values for the parameters.
These may span one or more records.
The following record is ignored.
The following records specify the regressions. If the number of
classes is C, then
    C + C + C*(C-1)/2
records are required:
    C specifications for the main-effects regressions, one for each class;
    C specifications for the regressions for the within-class odds ratios;
    C*(C-1)/2 specifications for the regressions for the between-class
    odds ratios.
Examples:
      C    number of specifications
      1     2
      2     5
      3     9
      4    14
      5    20
      6    27
      7    35
      8    44
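The table follows directly from the formula C + C + C*(C-1)/2. A short Python check (the function name is hypothetical) reproduces it and enforces the 12-class limit given under the program limits:

```python
def n_specifications(c):
    """Regression records needed for C classes: C main effects, C within-class
    odds ratios, and C*(C-1)/2 between-class odds ratios."""
    if not 1 <= c <= 12:
        raise ValueError("EG21 allows 1 to 12 classes")
    return c + c + c * (c - 1) // 2


# Print the table above (C = 1..8).
for c in range(1, 9):
    print(c, n_specifications(c))
```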
Each regression is specified by a sequence of integers as follows:
    i1 i2 i3 i4 i5 i6 ...
where i1 and i2 are class numbers. To specify the main-effects
regression for a class, set i2 = 0. For the within-class odds ratio of
class c set i1 = i2 = c, and for the between-class odds ratio of
classes c1 and c2 set i1 = c1, i2 = c2 (see the example control file).
i3, i5, ... are parameter indices and i4, i6, ... are regressor indices.
If B is the regression parameter vector and x is the vector of
regressors in the data file, then the regression will be
    B(i3)*x(i4) + B(i5)*x(i6) + ...
If i3 = 0 then that regression is set to 0.
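A minimal Python sketch of how one regression record maps to the linear combination above, assuming whitespace-separated integers and 1-based indices as in the text. `parse_spec` and `linear_predictor` are hypothetical names, not EG21 routines; the i3 = 0 case mirrors the rule just stated.

```python
def parse_spec(line):
    """Split a regression record 'i1 i2 i3 i4 ...' into (i1, i2, pairs).

    pairs is a list of (parameter index, regressor index), both 1-based.
    """
    ints = [int(t) for t in line.split()]
    i1, i2, rest = ints[0], ints[1], ints[2:]
    if rest and rest[0] == 0:
        return i1, i2, []  # i3 = 0: this regression is set to 0
    if len(rest) % 2:
        raise ValueError("parameter/regressor indices must come in pairs")
    return i1, i2, list(zip(rest[0::2], rest[1::2]))


def linear_predictor(pairs, beta, x):
    """B(i3)*x(i4) + B(i5)*x(i6) + ...; an empty spec gives 0."""
    return sum(beta[p - 1] * x[r - 1] for p, r in pairs)
```

For example, `parse_spec("1 1 7 1")` yields class 1, class 1 and the single term B(7)*x(1).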
The parameters must be arranged so that the odds-ratio parameters
are the last in the parameter vector (not checked).
Each parameter should appear at least once in the regression
specifications.
The order of the specifications is not important.
Any following records will be ignored.
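Several of the rules above are not checked by the program, so a user may want to check them before a run. The sketch below assumes the regressions have already been parsed into (i1, i2, pairs) tuples, where pairs lists (parameter index, regressor index). It also assumes that "odds-ratio parameters last" means main-effects regressions (i2 = 0) use only parameters 1 to n_main (the second integer on the seventh record) and odds-ratio regressions use only the remaining ones. The function name `check_model` is hypothetical.

```python
def check_model(n_params, n_main, specs):
    """Return a list of problems found in a parsed model specification.

    n_params, n_main: the two integers of the seventh record.
    specs: list of (i1, i2, [(param, regressor), ...]).
    """
    problems = []
    if n_params < n_main:
        problems.append(f"total parameters {n_params} < main-effect parameters {n_main}")
    used = set()
    for i1, i2, pairs in specs:
        is_main = i2 == 0  # i2 = 0 marks a main-effects regression
        for p, _r in pairs:
            if not 1 <= p <= n_params:
                problems.append(f"record '{i1} {i2}': parameter {p} out of range")
                continue
            used.add(p)
            if is_main and p > n_main:
                problems.append(f"record '{i1} {i2}': main effects use odds-ratio parameter {p}")
            if not is_main and p <= n_main:
                problems.append(f"record '{i1} {i2}': odds ratio uses main-effect parameter {p}")
    for p in range(1, n_params + 1):
        if p not in used:
            problems.append(f"parameter {p} never appears in a regression")
    return problems
```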
Note: extra text following the numbers in the control file is allowed,
except on the following records:
    record number 4: the data file name
    the record(s) specifying the initial parameter values
    the record(s) specifying the regressions.
This is demonstrated by the example control file.
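One plausible way to honor this rule, sketched in Python: read only as many values as a record needs and treat the rest of the line as comment. This is an assumption about a reading strategy, not a description of EG21's own code; `read_values`, `parse_header` and `HEADER_LAYOUT` are hypothetical names.

```python
def read_values(record, count, kind=int):
    """Read the first `count` values of a record; the rest is treated as comment."""
    tokens = record.split()
    if len(tokens) < count:
        raise ValueError(f"expected {count} value(s) in record: {record!r}")
    return [kind(t) for t in tokens[:count]]


# (record number, number of values, type) for records 5 to 12
HEADER_LAYOUT = [
    (5, 1, int),    # number of classes
    (6, 1, int),    # number of variables following the response
    (7, 2, int),    # total parameters, main-effect parameters
    (8, 1, float),  # convergence criterion
    (9, 1, int),    # maximum number of iterations
    (10, 1, int),   # 1 = print estimates at each iteration
    (11, 1, int),   # 1 = GEE1, 2 = GEE2
    (12, 1, int),   # 1 = Zhao and Prentice, 2 = exact
]


def parse_header(records):
    """records[0] is record 1 of the control file."""
    return {recno: read_values(records[recno - 1], count, kind)
            for recno, count, kind in HEADER_LAYOUT}
```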
Current program limits:
-----------------------
    maximum number of classes = 12
    maximum cluster size = 12
    maximum number of observations: the sum of
        n + n*(n-1)/2
    over all clusters must be <= 16000, where n is the cluster size
    maximum number of parameters = 60
    maximum number of potential regressors in the input file = 64
    maximum number of iterations that can be specified = 100
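A hypothetical pre-run check of these limits, in Python. For each cluster of size n, the observation count adds the n responses plus the n*(n-1)/2 pairs of responses. The function name and argument list are illustrative, not part of EG21.

```python
def check_limits(cluster_sizes, n_classes, n_params, n_regressors, max_iter):
    """Return (observation count, list of violated EG21 limits)."""
    problems = []
    # n responses plus n*(n-1)/2 pairs for each cluster
    total = sum(n + n * (n - 1) // 2 for n in cluster_sizes)
    if n_classes > 12:
        problems.append(f"{n_classes} classes (maximum 12)")
    largest = max(cluster_sizes, default=0)
    if largest > 12:
        problems.append(f"cluster of size {largest} (maximum 12)")
    if total > 16000:
        problems.append(f"observation count {total} (maximum 16000)")
    if n_params > 60:
        problems.append(f"{n_params} parameters (maximum 60)")
    if n_regressors > 64:
        problems.append(f"{n_regressors} regressors (maximum 64)")
    if max_iter > 100:
        problems.append(f"{max_iter} iterations (maximum 100)")
    return total, problems
```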
Technical notes:
---------------
The values of the regressors used in the regression for the
within and between class associations should be the same for all
members of any given cluster. The program currently uses the
values from the last member in each cluster. Don't rely on this
"feature". This will change in future versions of the program.
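Until this changes, it may be worth confirming that the association regressors really are constant within each cluster. A minimal Python sketch, assuming the data have been grouped as {cluster id: [(class, y, x), ...]} and that the 1-based indices of the regressors used in the odds-ratio regressions are known; the function name is hypothetical.

```python
def varying_association_regressors(clusters, regressor_indices):
    """Return the ids of clusters whose association regressors differ
    between members (EG21 would silently use the last member's values)."""
    bad = []
    for cid, members in clusters.items():
        for r in regressor_indices:
            values = {x[r - 1] for _cls, _y, x in members}
            if len(values) > 1:
                bad.append(cid)
                break  # one varying regressor is enough to flag the cluster
    return bad
```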
The program does a fair amount of checking of the control file
and the data file. However, the checking is not exhaustive.
The model specification is very flexible. Completely ridiculous
models can be specified. The program has no way of recognizing these.
Care is needed here.
Example control file:
---------------------
0 0 0 0      (reserved, must be as shown)
Title1: example control file
Title2:
COPD1 DATA A
2            = number of classes. Suppose class 1 = P, class 2 = S
6            = dim (x): x1 x2 x3 x4 x5 x6
9 6          = total=9, main effects=6
0.001        = convergence criterion
50           = maximum number of iterations
1            = print current estimates each iteration. 1=yes, 0=no
2            = 1=GEE1, 2=GEE2
2            = exact (2=exact, 1=Z&P approx.)
labels for beta: these will appear on the output file
1 Intercept
2 Sex (F)
3 Race (B)
4 Age-50
5 Smoker
6 Ex smoker
7 P.P
8 S.S
9 P.S
Initial estimates:
-0.83188 -0.80439 -0.91741
0.03796 1.14924 0.39144
0.93362 0.934 0.934
model specification:
1 0 1 1 2 2 3 3 4 4 5 5 6 6
2 0 1 1 2 2 3 3 4 4 5 5 6 6
1 1 7 1
2 2 8 1
1 2 9 1
-- End of the control file --
The model specified above is:
Main effects:
For class 1 (P):
logit(Pr{Y=1}) = B1*x1 + B2*x2 + B3*x3 + B4*x4 + B5*x5 + B6*x6
For class 2 (S):
logit(Pr{Y=1}) = B1*x1 + B2*x2 + B3*x3 + B4*x4 + B5*x5 + B6*x6
Odds ratios:
Within class 1 (P.P):
log(odds ratio) = B7*x1
Within class 2 (S.S):
log(odds ratio) = B8*x1
Between classes 1 and 2 (P.S):
log(odds ratio) = B9*x1
Suppose that a model with no association between classes 1 and 2 is
required. Then the last line of the control file should be:
1 2 0
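To see what the odds-ratio parametrization implies for a pair of responses, the standard relation between two binary margins, their odds ratio and the joint probability Pr(Y1=1, Y2=1) can be sketched in Python. This is textbook algebra for a 2x2 table, not code taken from EG21, and the function names are hypothetical. Assuming x1 is the constant 1 for the intercept, the P.P odds ratio in the example would be exp(B7), and the margins would come from the inverse logit of the main-effects regressions.

```python
import math


def inv_logit(eta):
    """Pr{Y=1} from logit(Pr{Y=1}) = eta."""
    return 1.0 / (1.0 + math.exp(-eta))


def joint_prob(p1, p2, psi):
    """Pr(Y1=1, Y2=1) for binary Y1, Y2 with margins p1, p2 and odds ratio psi.

    Solves psi = p11*p00 / (p10*p01) for p11; psi = 1 means independence.
    """
    if abs(psi - 1.0) < 1e-12:
        return p1 * p2
    a = psi - 1.0
    b = 1.0 + a * (p1 + p2)
    disc = b * b - 4.0 * psi * a * p1 * p2
    return (b - math.sqrt(disc)) / (2.0 * a)  # the root lying in [0, 1]


def odds_ratio(p11, p1, p2):
    """Odds ratio of the 2x2 table implied by p11 and the margins."""
    p10, p01 = p1 - p11, p2 - p11
    p00 = 1.0 - p1 - p2 + p11
    return p11 * p00 / (p10 * p01)
```

Setting the between-class regression to 0, as in the record `1 2 0`, corresponds to psi = exp(0) = 1, so the joint probability reduces to p1*p2.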